通过Spark Rest 服务监控Spark任务执行情况

1、Rest服务

　　Spark源为了方便用户对任务做监控，从1.4版本启用Rest服务，用户可以通过访问地址，得到application的运行状态。

　　Spark的REST API返回的信息是JSON格式的，开发者们可以很方便地通过这个API来创建可视化的Spark监控工具。目前

　　这个API支持正在运行的应用程序，也支持历史服务器。在请求URL都有/api/v1。比如，对于历史服务器来说，我们可以通过

　　http://***:18080/api/v1 来获取一些信息，端口可以改；对于正在运行的Spark应用程序，我们可以通过 https://***/api/v1

　　来获取一些信息。

　　主要用途：通过rest服务，可以轻松对任务时长、stage等做监控，同时可以配合时间序列数据库，对集群各个任务做监控。

2、实例代码（Python）

 1 #!/usr/bin/env python
 2 # -*- coding: utf-8 -*-
 3 ‘‘‘
 4     Created by zhangy on Aug 25, 2017
 5 ‘‘‘
 6 import datetime
 7 import json, urllib2
 8 import os
 9 import time
10
11
12 if __name__ == ‘__main__‘:
13     command = "yarn application -list |grep Noce |awk -F‘\t‘ ‘{print $1}‘"
14     val = os.popen(command).read()
15     appids = val.split("\n")
16     for pid in appids:
17         if pid.__eq__(""):continue
18         url = "http://th04-znwg-sgi620-001:18088/api/v1/applications/" + pid
19         req = urllib2.Request(url)
20         res_data = urllib2.urlopen(req)
21         res = res_data.read()
22         jo = json.loads(res)
23         dict1 = jo[‘attempts‘][0]
24         st = dict1[‘startTime‘]
25         GMT_FORMAT = ‘%Y-%m-%dT%H:%M:%S.%fGMT‘
26         sti = datetime.datetime.strptime(st, GMT_FORMAT)
27         startTime = time.mktime(sti.timetuple()) + 8 * 60 * 60
28         nowTime = long(time.time())
29         sub = nowTime - startTime
30         if sub > 4 * 60 * 60:
31             killCommand = "yarn application -kill " + pid
32             res = os.popen(command).read()
33             cc = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(float(nowTime)))
34             f = open("/home/noce1/run_noce/his/monitor/" + pid + ".txt", "a")
35             f.write(cc + " : " + "pid ： " + "\n" + sub + " seconds")
36             f.write(res + "\n")
37             f.close()
38
39
40

　　测试实例中，只是对spark任务的时间做了监控，如果任务超过理想执行时长（4个小时），则终止任务，释放资源。

　结果：

 1 例如：
 2 http://132.12*****:18088/api/v1/applications/application_1502706935975_233268/
 3
 4 返回内容：json格式
 5 {
 6   "id" : "application_1502706935975_233268",
 7   "name" : "FRT3_73",
 8   "attempts" : [ {
 9     "startTime" : "2017-09-04T01:29:53.986GMT",
10     "endTime" : "2017-09-04T01:31:52.955GMT",
11     "sparkUser" : "noce1",
12     "completed" : true
13   } ]
14 }

3、官方其他（2.1.0版本，http:**** :18080/api/v1/）

Endpoint	Meaning
/applications	A list of all applications. ?status=[completed\|running] list only applications in the chosen state. ?minDate=[date] earliest start date/time to list. ?maxDate=[date] latest start date/time to list. ?minEndDate=[date] earliest end date/time to list. ?maxEndDate=[date] latest end date/time to list. ?limit=[limit] limits the number of applications listed. Examples: ?minDate=2015-02-10 ?minDate=2015-02-03T16:42:40.000GMT ?maxDate=2015-02-11T20:41:30.000GMT ?minEndDate=2015-02-12 ?minEndDate=2015-02-12T09:15:10.000GMT ?maxEndDate=2015-02-14T16:30:45.000GMT ?limit=10
/applications/[app-id]/jobs	A list of all jobs for a given application. ?status=[running\|succeeded\|failed\|unknown] list only jobs in the specific state.
/applications/[app-id]/jobs/[job-id]	Details for the given job.
/applications/[app-id]/stages	A list of all stages for a given application.
/applications/[app-id]/stages/[stage-id]	A list of all attempts for the given stage.
/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]	Details for the given stage attempt.
/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskSummary	Summary metrics of all tasks in the given stage attempt. ?quantiles summarize the metrics with the given quantiles. Example: ?quantiles=0.01,0.5,0.99
/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskList	A list of all tasks for the given stage attempt. ?offset=[offset]&length=[len] list tasks in the given range. ?sortBy=[runtime\|-runtime] sort the tasks. Example: ?offset=10&length=50&sortBy=runtime
/applications/[app-id]/executors	A list of all active executors for the given application.
/applications/[app-id]/allexecutors	A list of all(active and dead) executors for the given application.
/applications/[app-id]/storage/rdd	A list of stored RDDs for the given application.
/applications/[app-id]/storage/rdd/[rdd-id]	Details for the storage status of a given RDD.
/applications/[base-app-id]/logs	Download the event logs for all attempts of the given application as files within a zip file.
/applications/[base-app-id]/[attempt-id]/logs	Download the event logs for a specific application attempt as a zip file.
/applications/[app-id]/streaming/statistics	Statistics for the streaming context.
/applications/[app-id]/streaming/receivers	A list of all streaming receivers.
/applications/[app-id]/streaming/receivers/[stream-id]	Details of the given receiver.
/applications/[app-id]/streaming/batches	A list of all retained batches.
/applications/[app-id]/streaming/batches/[batch-id]	Details of the given batch.
/applications/[app-id]/streaming/batches/[batch-id]/operations	A list of all output operations of the given batch.
/applications/[app-id]/streaming/batches/[batch-id]/operations/[outputOp-id]	Details of the given operation and given batch.
/applications/[app-id]/environment	Environment details of the given application.

时间： 2024-12-09 00:29:07

通过Spark Rest 服务监控Spark任务执行情况的相关文章

sqlserver 监控自动化作业执行情况

ALTER procedure [dbo].[monitorJob] @name varchar(100) as begin declare @bd varchar(100) ; if exists( select * from msdb.dbo.sysjobhistory where job_id in (select job_id from msdb.dbo.sysjobs where [name][email protected] ) and run_date=convert(varch

使用Spring定时任务并且通过AOP监控任务执行情况

原文:http://www.open-open.com/code/view/1426250803279 本文讲的是通过Spring注解的方式实现任务调度.只要引入了spring-context包就能够在项目中使用注解方式的任务调度. 下面看具体配置需要在Spring配置文件中加入task的schema. <xmlns:task="http://www.springframework.org/schema/task" xsi:schemaLocation="http:/

FocusBI:《DW/BI项目管理》之SSIS执行情况

微信公众号:FocusBI关注可了解更多的商业智能.数据仓库.数据库开发.爬虫知识及沪深股市数据推送.问题或建议,请关注公众号发送消息留言;如果你觉得FocusBI对你有帮助,欢迎转发朋友圈或在文章末尾点赞[1] 在 FocusBI:SSIS体系结构.<SSIS开发案例>这两篇文章中讲到SSIS 开发完最终是要被执行的,但是被执行后会出现什么样的情况,如何去监控它的执行情况:这也是在BI实施中遇到的难题,当有上百个包我们应该如何管理这个SSIS的ETL 项目,虽然SSIS执行出错是有邮件通知出

Spark（六）Spark任务提交方式和执行流程

一.Spark中的基本概念 (1)Application:表示你的应用程序 (2)Driver:表示main()函数,创建SparkContext.由SparkContext负责与ClusterManager通信,进行资源的申请,任务的分配和监控等.程序执行完毕后关闭SparkContext (3)Executor:某个Application运行在Worker节点上的一个进程,该进程负责运行某些task,并且负责将数据存在内存或者磁盘上.在Spark on Yarn模式下,其进程名称为 Coar

CDH5上安装Hive,HBase,Impala,Spark等服务

Apache Hadoop的服务的部署比较繁琐,需要手工编辑配置文件.下载依赖包等.Cloudera Manager以GUI的方式的管理CDH集群,提供向导式的安装步骤.由于需要对Hive,HBase,Impala,Spark进行功能测试,就采用了Cloudera Manager方式进行安装. Cloudera Manager提供两种软件包安装源,Package 和 Parcel: Package就是一个个rpm文件,以yum的方式组织起来. Parcel是rpm包的压缩格式,以.parcel结

Spark（五十）：使用JvisualVM监控Spark Executor JVM

引导 Windows环境下JvisulaVM一般存在于安装了JDK的目录${JAVA_HOME}/bin/JvisualVM.exe,它支持(本地和远程)jstatd和JMX两种方式连接远程JVM. jstatd (Java Virtual Machine jstat Daemon)——监听远程服务器的CPU,内存,线程等信息 JMX(Java Management Extensions,即Java管理扩展)是一个为应用程序.设备.系统等植入管理功能的框架.JMX可以跨越一系列异构操作系统平台.

Spark Web UI 监控详解

Spark集群环境配置我们有2个节点,每个节点是一个worker,每个worker上启动一个Executor,其中Driver也跑在master上.每个Executor可使用的核数为2,可用的内存为2g,集群中所有Executor最大可用核数为4. conf/spark-defaults.conf 部分参数配置如下: spark.master spark://Master:7077 spark.executor.memory 2g spark.executor.cores 2 spark.co

讨论Spark的配置监控和性能优化

讨论Spark的配置监控和性能优化(某课程笔记) 上完这节课以后,你将能够描述集群的概念通过修改Spark的属性,环境变量,或者是日志属性来配置Spark 使用Web端界面,以及各种不同的外部工具来监控Spark和应用程序在Spark集群中有三种主要的组成部分.驱动程序,是放置主程序中SparkContext的地方,要运行一个集群,你需要一个集群管理器它可以是单机版的集群管理器,也可以是 Mesos 或者 Yarn 而worker节点,就是放置执行器的地方执行器,是运行计算和储存应用程序

Spark集群模式&Spark程序提交

Spark集群模式&Spark程序提交 1. 集群管理器 Spark当前支持三种集群管理方式 Standalone-Spark自带的一种集群管理方式,易于构建集群. Apache Mesos-通用的集群管理,可以在其上运行Hadoop MapReduce和一些服务应用. Hadoop YARN-Hadoop2中的资源管理器. Tip1: 在集群不是特别大,并且没有mapReduce和Spark同时运行的需求的情况下,用Standalone模式效率最高. Tip2: Spark可以在应用间(通过集