Explanation of the Time Metrics Shown in the Spark Web UI

The tooltip definitions below come from org.apache.spark.ui.ToolTips in the Spark source; they document what each timing and size metric displayed in the web UI actually measures.

package org.apache.spark.ui

private[spark] object ToolTips {
  val SCHEDULER_DELAY =
    """Scheduler delay includes time to ship the task from the scheduler to
       the executor, and time to send the task result from the executor to the scheduler. If
       scheduler delay is large, consider decreasing the size of tasks or decreasing the size
       of task results."""

  val TASK_DESERIALIZATION_TIME =
    """Time spent deserializing the task closure on the executor, including the time to read the
       broadcasted task."""

  val SHUFFLE_READ_BLOCKED_TIME =
    "Time that the task spent blocked waiting for shuffle data to be read from remote machines."

  val INPUT = "Bytes and records read from Hadoop or from Spark storage."

  val OUTPUT = "Bytes and records written to Hadoop."

  val STORAGE_MEMORY =
    "Memory used / total available memory for storage of data " +
      "like RDD partitions cached in memory. "

  val SHUFFLE_WRITE =
    "Bytes and records written to disk in order to be read by a shuffle in a future stage."

  val SHUFFLE_READ =
    """Total shuffle bytes and records read (includes both data read locally and data read from
       remote executors). """

  val SHUFFLE_READ_REMOTE_SIZE =
    """Total shuffle bytes read from remote executors. This is a subset of the shuffle
       read bytes; the remaining shuffle data is read locally. """

  val GETTING_RESULT_TIME =
    """Time that the driver spends fetching task results from workers. If this is large, consider
       decreasing the amount of data returned from each task."""

  val RESULT_SERIALIZATION_TIME =
    """Time spent serializing the task result on the executor before sending it back to the
       driver."""

  val GC_TIME =
    """Time that the executor spent paused for Java garbage collection while the task was
       running."""

  val JOB_TIMELINE =
    """Shows when jobs started and ended and when executors joined or left. Drag to scroll.
       Click Enable Zooming and use mouse wheel to zoom in/out."""

  val STAGE_TIMELINE =
    """Shows when stages started and ended and when executors joined or left. Drag to scroll.
       Click Enable Zooming and use mouse wheel to zoom in/out."""

  val JOB_DAG =
    """Shows a graph of stages executed for this job, each of which can contain
       multiple RDD operations (e.g. map() and filter()), and of RDDs inside each operation
       (shown as dots)."""

  val STAGE_DAG =
    """Shows a graph of RDD operations in this stage, and RDDs inside each one. A stage can run
       multiple operations (e.g. two map() functions) if they can be pipelined. Some operations
       also create multiple RDDs internally. Cached RDDs are shown in green.
    """
}
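
These constants are plain strings; the web UI attaches them as hover tooltips on the corresponding metric labels. As a minimal sketch of that pattern, assuming the scala-xml literals used throughout Spark's UI code (the helper name and markup below are illustrative, not Spark's exact internal API):

package org.apache.spark.ui  // ToolTips is private[spark], so a demo must live under org.apache.spark

import scala.xml.Node

object TooltipDemo {
  // Illustrative helper: wraps a metric label in Bootstrap-style tooltip
  // markup whose hover text comes from a ToolTips constant.
  def labelWithTooltip(label: String, tip: String): Seq[Node] =
    <span data-toggle="tooltip" data-placement="top" title={tip}>{label}</span>

  // Usage: the "Scheduler Delay" column header in a task table.
  val schedulerDelayHeader: Seq[Node] =
    labelWithTooltip("Scheduler Delay", ToolTips.SCHEDULER_DELAY)
}

Spark's own UI helpers follow essentially this shape, though the exact markup differs across versions.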
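Several of the tooltips end in tuning advice: SCHEDULER_DELAY and GETTING_RESULT_TIME both suggest shrinking per-task work and per-task results. A hedged sketch of acting on that advice (the input path and partition count are illustrative assumptions, not recommendations for any specific cluster):

import org.apache.spark.{SparkConf, SparkContext}

object TuningSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("tuning-sketch"))

    // More partitions mean smaller tasks, so each task ships a smaller
    // closure and returns a smaller result (the SCHEDULER_DELAY advice).
    val lines = sc.textFile("hdfs:///data/input").repartition(400)

    // Aggregate on the executors rather than collect()-ing raw data to the
    // driver (the GETTING_RESULT_TIME / RESULT_SERIALIZATION_TIME advice).
    val totalChars = lines.map(_.length.toLong).reduce(_ + _)
    println(s"total characters: $totalChars")

    sc.stop()
  }
}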

Posted: 2024-11-09 02:37:42
