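The Road to Spark Mastery: Broadcast Variable (Broadcast) Source Code Analysis

Overview

Work has been crazy lately... I actually went through the broadcast variable code a while ago but never got around to posting it.

This article is based on the Spark 1.0 source code and covers how broadcast variables are initialized, created, read, and cleaned up.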

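Class Relationships

The BroadcastManager class holds a reference to a BroadcastFactory object, and most of its operations are implemented by calling methods on that BroadcastFactory.

BroadcastFactory is a trait with two direct implementations, HttpBroadcastFactory and TorrentBroadcastFactory. These wrap HttpBroadcast and TorrentBroadcast respectively, and both of those classes extend the abstract class Broadcast.

As for a diagram... I'll skip it.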

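Initializing BroadcastManager

When SparkContext is initialized it creates a SparkEnv object, env. During that process the BroadcastManager constructor is called, and the resulting object is stored as a member of env: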

val broadcastManager = new BroadcastManager(isDriver, conf, securityManager)

Constructing a BroadcastManager calls its initialize method, which sets the broadcastFactory member according to the configuration and then calls that factory's initialize method.

  val broadcastFactoryClass =
    conf.get("spark.broadcast.factory", "org.apache.spark.broadcast.HttpBroadcastFactory")

  broadcastFactory =
    Class.forName(broadcastFactoryClass).newInstance.asInstanceOf[BroadcastFactory]

  // Initialize appropriate BroadcastFactory and BroadcastObject
  broadcastFactory.initialize(isDriver, conf, securityManager)

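The initialize method of each factory class simply delegates to the initialize method of its corresponding broadcast class, so the two are examined separately below.

The initialize Method of HttpBroadcast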

  def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager) {
    synchronized {
      if (!initialized) {
        bufferSize = conf.getInt("spark.buffer.size", 65536)
        compress = conf.getBoolean("spark.broadcast.compress", true)
        securityManager = securityMgr
        if (isDriver) {
          createServer(conf)
          conf.set("spark.httpBroadcast.uri",  serverUri)
        }
        serverUri = conf.get("spark.httpBroadcast.uri")
        cleaner = new MetadataCleaner(MetadataCleanerType.HTTP_BROADCAST, cleanup, conf)
        compressionCodec = CompressionCodec.createCodec(conf)
        initialized = true
      }
    }
  }

Apart from initializing a few variables, this does two main things: it calls createServer (on the driver only), and it creates a MetadataCleaner object.

createServer

  private def createServer(conf: SparkConf) {
    broadcastDir = Utils.createTempDir(Utils.getLocalDir(conf))
    server = new HttpServer(broadcastDir, securityManager)
    server.start()
    serverUri = server.uri
    logInfo("Broadcast server started at " + serverUri)
  }

It first creates a directory for storing broadcast variables. By default, it is created under:

conf.get("spark.local.dir",  System.getProperty("java.io.tmpdir")).split(',')(0)

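It then creates and starts an HttpServer object (a wrapper around Jetty). Startup includes loading the resource files and opening a port and threads to listen for requests. The details are in the org.apache.spark.HttpServer class and are not covered here.

Creating the MetadataCleaner Object

A MetadataCleaner object wraps a Timer that runs a callback function at a fixed interval. The callback passed in here is cleanup: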

  private def cleanup(cleanupTime: Long) {
    val iterator = files.internalMap.entrySet().iterator()
    while(iterator.hasNext) {
      val entry = iterator.next()
      val (file, time) = (entry.getKey, entry.getValue)
      if (time < cleanupTime) {
        iterator.remove()
        deleteBroadcastFile(file)
      }
    }
  }

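That is, it removes broadcast files that have existed for longer than a certain duration. When no duration is configured (the default), the timer is never scheduled, so nothing is removed: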

 if (delaySeconds > 0) {
    logDebug(
      "Starting metadata cleaner for " + name + " with delay of " + delaySeconds + " seconds " +
      "and period of " + periodSeconds + " secs")
    timer.schedule(task, periodSeconds * 1000, periodSeconds * 1000)
  }
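
The time-based cleanup described above can be illustrated with a small standalone sketch. This is not Spark code: CleanupSketch and its files parameter are hypothetical stand-ins for the bookkeeping that HttpBroadcast keeps, and the sketch only shows the idea of dropping entries older than a cutoff.

```scala
import scala.collection.mutable

// Hypothetical stand-in for HttpBroadcast's file bookkeeping (not Spark code).
object CleanupSketch {
  // `files` maps a file name to the timestamp (ms) at which it was recorded.
  // Removes every entry recorded before cleanupTime and returns the removed
  // names in sorted order.
  def cleanup(files: mutable.Map[String, Long], cleanupTime: Long): List[String] = {
    val expired = files.toList.filter(_._2 < cleanupTime).map(_._1).sorted
    expired.foreach(name => files.remove(name))
    expired
  }
}
```

In the real callback each expired file is also deleted from disk via deleteBroadcastFile; the sketch leaves that step out.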

The initialize Method of TorrentBroadcast

  def initialize(_isDriver: Boolean, conf: SparkConf) {
    TorrentBroadcast.conf = conf // TODO: we might have to fix it in tests
    synchronized {
      if (!initialized) {
        initialized = true
      }
    }
  }

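Torrent does almost nothing here, which already shows how it differs from Http: Torrent works in a decentralized, P2P fashion, while Http is a centralized service that has to start a server to accept requests.

Creating a Broadcast Variable

A broadcast variable is created by calling the def broadcast[T: ClassTag](value: T): Broadcast[T] method of SparkContext, which is implemented as follows: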

def broadcast[T: ClassTag](value: T): Broadcast[T] = {
    val bc = env.broadcastManager.newBroadcast[T](value, isLocal)
    cleaner.foreach(_.registerBroadcastForCleanup(bc))
    bc
  }

This calls the newBroadcast method of broadcastManager:

  def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean) = {
    broadcastFactory.newBroadcast[T](value_, isLocal, nextBroadcastId.getAndIncrement())
  }

That in turn calls the newBroadcast method of the factory class, which returns a Broadcast object.

newBroadcast in HttpBroadcastFactory

  def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean, id: Long) =
    new HttpBroadcast[T](value_, isLocal, id)

This creates and returns a new HttpBroadcast object.

Constructing the object does two main things:

 HttpBroadcast.synchronized {
    SparkEnv.get.blockManager.putSingle(
      blockId, value_, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
  }

  if (!isLocal) {
    HttpBroadcast.write(id, value_)
  }

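1. It stores the value in the blockManager under the variable's block id, without notifying the master.

2. It calls the write method of the companion object: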

def write(id: Long, value: Any) {
    val file = getFile(id)
    val out: OutputStream = {
      if (compress) {
        compressionCodec.compressedOutputStream(new FileOutputStream(file))
      } else {
        new BufferedOutputStream(new FileOutputStream(file), bufferSize)
      }
    }
    val ser = SparkEnv.get.serializer.newInstance()
    val serOut = ser.serializeStream(out)
    serOut.writeObject(value)
    serOut.close()
    files += file
  }

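The write method compresses and serializes the value as configured and writes it to a file. That file lives in the HttpServer's resource directory, and its name is derived from the id as follows: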

case class BroadcastBlockId(broadcastId: Long, field: String = "") extends BlockId {
  def name = "broadcast_" + broadcastId + (if (field == "") "" else "_" + field)
}

newBroadcast in TorrentBroadcastFactory

  def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean, id: Long) =
    new TorrentBroadcast[T](value_, isLocal, id)

Likewise, this creates and returns a TorrentBroadcast object, whose constructor runs:

  TorrentBroadcast.synchronized {
    SparkEnv.get.blockManager.putSingle(
      broadcastId, value_, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
  }

  if (!isLocal) {
    sendBroadcast()
  }

This does two things. The first step is the same as for Http; the second is:

  def sendBroadcast() {
    val tInfo = TorrentBroadcast.blockifyObject(value_)
    totalBlocks = tInfo.totalBlocks
    totalBytes = tInfo.totalBytes
    hasBlocks = tInfo.totalBlocks

    // Store meta-info
    val metaId = BroadcastBlockId(id, "meta")
    val metaInfo = TorrentInfo(null, totalBlocks, totalBytes)
    TorrentBroadcast.synchronized {
      SparkEnv.get.blockManager.putSingle(
        metaId, metaInfo, StorageLevel.MEMORY_AND_DISK, tellMaster = true)
    }

    // Store individual pieces
    for (i <- 0 until totalBlocks) {
      val pieceId = BroadcastBlockId(id, "piece" + i)
      TorrentBroadcast.synchronized {
        SparkEnv.get.blockManager.putSingle(
          pieceId, tInfo.arrayOfBlocks(i), StorageLevel.MEMORY_AND_DISK, tellMaster = true)
      }
    }
  }

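As shown, the meta-info is stored in the blockManager first, followed by the individual blocks. The method begins by splitting the value into blocks, which is done by the blockifyObject method of the companion object: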

def blockifyObject[T](obj: T): TorrentInfo

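This method splits the object obj into blocks (the default block size is 4 MB) and returns a TorrentInfo object made up of an array of TorrentBlock objects (each holding a block ID and a byte array), the number of blocks, and the total length in bytes of the serialized obj.

For the meta-info, the block id is the broadcast variable id plus a suffix, and the value holds the total number of blocks and the total number of bytes.

The data itself is stored block by block: each block's id is the broadcast variable id plus a suffix and the block number, and its value is a TorrentBlock object.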
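
The splitting and reassembly can be sketched in a few lines. This is an illustration only, not the Spark implementation: BlockifySketch, Block, blockify, and unblockify are hypothetical names, and the sketch works on an already serialized byte array instead of an arbitrary object.

```scala
// Hypothetical, simplified model of blockifyObject / unBlockifyObject.
object BlockifySketch {
  // A block is identified by its index and carries a slice of the payload.
  final case class Block(blockId: Int, bytes: Array[Byte])

  // Splits data into consecutive blocks of at most blockSize bytes.
  def blockify(data: Array[Byte], blockSize: Int): Array[Block] = {
    require(blockSize > 0, "blockSize must be positive")
    data.grouped(blockSize).zipWithIndex.map { case (chunk, i) => Block(i, chunk) }.toArray
  }

  // Reassembles the original bytes; blocks may be supplied in any order.
  def unblockify(blocks: Seq[Block]): Array[Byte] =
    blocks.sortBy(_.blockId).flatMap(_.bytes.toSeq).toArray
}
```

Sorting by blockId in unblockify matters because, as the read path shows, pieces are fetched in a random order.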

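Reading the Value of a Broadcast Variable

The value of a broadcast variable is obtained by calling bc.value. The real work happens in the deserialization method readObject.

Deserialization of HttpBroadcast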

 HttpBroadcast.synchronized {
      SparkEnv.get.blockManager.getSingle(blockId) match {
        case Some(x) => value_ = x.asInstanceOf[T]
        case None => {
          logInfo("Started reading broadcast variable " + id)
          val start = System.nanoTime
          value_ = HttpBroadcast.read[T](id)
          /*
           * We cache broadcast data in the BlockManager so that subsequent tasks using it
           * do not need to re-fetch. This data is only used locally and no other node
           * needs to fetch this block, so we don't notify the master.
           */
          SparkEnv.get.blockManager.putSingle(
            blockId, value_, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
          val time = (System.nanoTime - start) / 1e9
          logInfo("Reading broadcast variable " + id + " took " + time + " s")
        }
      }
    }

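It first checks whether the blockManager already holds the value. If so, the value is used directly; otherwise the read method of the companion object is called: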

def read[T: ClassTag](id: Long): T = {
    logDebug("broadcast read server: " +  serverUri + " id: broadcast-" + id)
    val url = serverUri + "/" + BroadcastBlockId(id).name

    var uc: URLConnection = null
    if (securityManager.isAuthenticationEnabled()) {
      logDebug("broadcast security enabled")
      val newuri = Utils.constructURIForAuthentication(new URI(url), securityManager)
      uc = newuri.toURL.openConnection()
      uc.setAllowUserInteraction(false)
    } else {
      logDebug("broadcast not using security")
      uc = new URL(url).openConnection()
    }

    val in = {
      uc.setReadTimeout(httpReadTimeout)
      val inputStream = uc.getInputStream
      if (compress) {
        compressionCodec.compressedInputStream(inputStream)
      } else {
        new BufferedInputStream(inputStream, bufferSize)
      }
    }
    val ser = SparkEnv.get.serializer.newInstance()
    val serIn = ser.deserializeStream(in)
    val obj = serIn.readObject[T]()
    serIn.close()
    obj
  }

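Using serverUri and the file name that corresponds to the block id, it opens a URLConnection to the central server, fetches the data, and then decompresses and deserializes it with the configured compression codec and serializer.

This shows that every executor that needs the broadcast value has to pull the content from the driver.

Once fetched, the value is cached in the blockManager for later use.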
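
The "check the local cache, otherwise fetch and then cache" pattern used here can be sketched as follows. ReadThroughCache and its fetchRemote parameter are hypothetical names introduced for illustration; the local map stands in for the executor's blockManager and fetchRemote for HttpBroadcast.read.

```scala
import scala.collection.mutable

// Hypothetical read-through cache (not Spark code).
class ReadThroughCache[K, V](fetchRemote: K => V) {
  private val local = mutable.Map[K, V]()
  var remoteFetches = 0 // counts how often the remote source was contacted

  // Returns the cached value if present; otherwise fetches it once and caches it.
  def get(key: K): V = synchronized {
    local.get(key) match {
      case Some(v) => v
      case None =>
        remoteFetches += 1
        val v = fetchRemote(key)
        local.put(key, v)
        v
    }
  }
}
```

With this shape, only the first task on an executor pays the cost of the remote fetch; later tasks on the same executor are served from the local copy.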

Deserialization of TorrentBroadcast

private def readObject(in: ObjectInputStream) {
    in.defaultReadObject()
    TorrentBroadcast.synchronized {
      SparkEnv.get.blockManager.getSingle(broadcastId) match {
        case Some(x) =>
          value_ = x.asInstanceOf[T]

        case None =>
          val start = System.nanoTime
          logInfo("Started reading broadcast variable " + id)

          // Initialize @transient variables that will receive garbage values from the master.
          resetWorkerVariables()

          if (receiveBroadcast()) {
            value_ = TorrentBroadcast.unBlockifyObject[T](arrayOfBlocks, totalBytes, totalBlocks)

            /* Store the merged copy in cache so that the next worker doesn't need to rebuild it.
             * This creates a trade-off between memory usage and latency. Storing copy doubles
             * the memory footprint; not storing doubles deserialization cost. Also,
             * this does not need to be reported to BlockManagerMaster since other executors
             * does not need to access this block (they only need to fetch the chunks,
             * which are reported).
             */
            SparkEnv.get.blockManager.putSingle(
              broadcastId, value_, StorageLevel.MEMORY_AND_DISK, tellMaster = false)

            // Remove arrayOfBlocks from memory once value_ is on local cache
            resetWorkerVariables()
          } else {
            logError("Reading broadcast variable " + id + " failed")
          }

          val time = (System.nanoTime - start) / 1e9
          logInfo("Reading broadcast variable " + id + " took " + time + " s")
      }
    }
  }

As with Http, it first checks whether the value is already cached in the blockManager. If not, it calls the receiveBroadcast method:

def receiveBroadcast(): Boolean = {
    // Receive meta-info about the size of broadcast data,
    // the number of chunks it is divided into, etc.
    val metaId = BroadcastBlockId(id, "meta")
    var attemptId = 10
    while (attemptId > 0 && totalBlocks == -1) {
      TorrentBroadcast.synchronized {
        SparkEnv.get.blockManager.getSingle(metaId) match {
          case Some(x) =>
            val tInfo = x.asInstanceOf[TorrentInfo]
            totalBlocks = tInfo.totalBlocks
            totalBytes = tInfo.totalBytes
            arrayOfBlocks = new Array[TorrentBlock](totalBlocks)
            hasBlocks = 0

          case None =>
            Thread.sleep(500)
        }
      }
      attemptId -= 1
    }
    if (totalBlocks == -1) {
      return false
    }

    /*
     * Fetch actual chunks of data. Note that all these chunks are stored in
     * the BlockManager and reported to the master, so that other executors
     * can find out and pull the chunks from this executor.
     */
    val recvOrder = new Random().shuffle(Array.iterate(0, totalBlocks)(_ + 1).toList)
    for (pid <- recvOrder) {
      val pieceId = BroadcastBlockId(id, "piece" + pid)
      TorrentBroadcast.synchronized {
        SparkEnv.get.blockManager.getSingle(pieceId) match {
          case Some(x) =>
            arrayOfBlocks(pid) = x.asInstanceOf[TorrentBlock]
            hasBlocks += 1
            SparkEnv.get.blockManager.putSingle(
              pieceId, arrayOfBlocks(pid), StorageLevel.MEMORY_AND_DISK, tellMaster = true)

          case None =>
            throw new SparkException("Failed to get " + pieceId + " of " + broadcastId)
        }
      }
    }

    hasBlocks == totalBlocks
  }

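As with writing, reading has two parts: first the meta-info is fetched, then the actual blocks are read based on it. Note that both are read from the blockManager, so blockManager.getSingle is examined next.

The call stack ends in the BlockManager.doGetRemote method, which contains this statement: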

 val locations = Random.shuffle(master.getLocations(blockId))

That is, the list of nodes holding this block is shuffled randomly, and the block is then fetched with:

 val data = BlockManagerWorker.syncGetBlock(
        GetBlock(blockId), ConnectionManagerId(loc.host, loc.port))

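From this we can see that the Torrent approach first splits the broadcast data into blocks and stores them in the BlockManager. When a node needs to read the broadcast variable, it reads block by block: for each block it looks up the block's locations and fetches it from one of the nodes holding it, trying them in random order. After fetching a block, the node reports it to the BlockManagerMaster, so the local node also becomes a peer in the broadcast network.

In sharp contrast to the Http approach, this is a decentralized network that only needs a single tracker, which is the idea behind P2P.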
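
To make the peer behaviour concrete, here is a toy in-memory model. It is only a sketch under simplifying assumptions: TorrentSketch, Tracker, and fetchAll are hypothetical names, Tracker stands in for the BlockManagerMaster, and no data is actually transferred. The sketch records only which node each piece would be fetched from.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical in-memory model of piece lookup and registration (not Spark code).
object TorrentSketch {
  class Tracker {
    private val locations = mutable.Map[String, mutable.Set[String]]()

    // Records that `node` now holds `pieceId`.
    def register(pieceId: String, node: String): Unit =
      locations.getOrElseUpdate(pieceId, mutable.Set[String]()) += node

    // Lists the nodes currently known to hold `pieceId`.
    def holders(pieceId: String): List[String] =
      locations.get(pieceId).map(_.toList).getOrElse(Nil)
  }

  // Visits the pieces in random order, picks a random holder for each one, and
  // registers `node` as an additional holder. Returns piece id -> chosen source.
  def fetchAll(tracker: Tracker, node: String, pieceIds: Seq[String], rnd: Random): Map[String, String] =
    rnd.shuffle(pieceIds).map { pieceId =>
      val candidates = tracker.holders(pieceId)
      require(candidates.nonEmpty, s"no holder for $pieceId")
      val source = rnd.shuffle(candidates).head
      tracker.register(pieceId, node)
      pieceId -> source
    }.toMap
}
```

The first executor can only fetch from the driver, but every later executor has a growing set of holders to choose from, which is how the load moves away from the driver.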

Cleaning Up Broadcast Variables

Right after a broadcast variable is created, this line runs:

cleaner.foreach(_.registerBroadcastForCleanup(bc))

cleaner is a ContextCleaner object, and the newly created broadcast variable is registered with it. The call chain is:

  def registerBroadcastForCleanup[T](broadcast: Broadcast[T]) {
    registerForCleanup(broadcast, CleanBroadcast(broadcast.id))
  }

  private def registerForCleanup(objectForCleanup: AnyRef, task: CleanupTask) {
    referenceBuffer += new CleanupTaskWeakReference(task, objectForCleanup, referenceQueue)
  }

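The broadcast variable is registered through a weak reference, so once it is no longer strongly reachable the garbage collector places that reference on referenceQueue (for background on weak references, see http://blog.csdn.net/lyfi01/article/details/6415726). The cleaner thread that consumes this queue is started when SparkContext is initialized: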

cleaner.foreach(_.start())

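The start method launches a thread that runs keepCleaning, which takes the enqueued cleanup tasks (for RDDs, shuffles, and broadcasts) off the reference queue and processes them one by one: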

private def keepCleaning(): Unit = Utils.logUncaughtExceptions {
    while (!stopped) {
      try {
        val reference = Option(referenceQueue.remove(ContextCleaner.REF_QUEUE_POLL_TIMEOUT))
          .map(_.asInstanceOf[CleanupTaskWeakReference])
        reference.map(_.task).foreach { task =>
          logDebug("Got cleaning task " + task)
          referenceBuffer -= reference.get
          task match {
            case CleanRDD(rddId) =>
              doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
            case CleanShuffle(shuffleId) =>
              doCleanupShuffle(shuffleId, blocking = blockOnCleanupTasks)
            case CleanBroadcast(broadcastId) =>
              doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
          }
        }
      } catch {
        case e: Exception => logError("Error in cleaning thread", e)
      }
    }
  }
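
The weak-reference mechanism behind this loop can be sketched on its own with the standard java.lang.ref classes. WeakRefSketch, CleanTask, TaskRef, and drain are hypothetical names. In a real run the garbage collector enqueues the reference once the referent is no longer strongly reachable; because that timing is not deterministic, the same path can be simulated by calling enqueue() on the reference by hand.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}

// Hypothetical miniature of the ContextCleaner bookkeeping (not Spark code).
object WeakRefSketch {
  // Cleanup task, analogous in spirit to CleanBroadcast(broadcastId).
  final case class CleanTask(id: Long)

  // A weak reference that carries the task to run once its referent is gone.
  // `referent` is only passed to the superclass, so no strong field is kept.
  class TaskRef(val task: CleanTask, referent: AnyRef, queue: ReferenceQueue[AnyRef])
    extends WeakReference[AnyRef](referent, queue)

  // Drains the queue without blocking and returns the tasks found on it.
  def drain(queue: ReferenceQueue[AnyRef]): List[CleanTask] = {
    val found = List.newBuilder[CleanTask]
    var ref = queue.poll()
    while (ref != null) {
      found += ref.asInstanceOf[TaskRef].task
      ref = queue.poll()
    }
    found.result()
  }
}
```

Keeping the task inside the reference object is what lets the cleaner know what to clean even though the referent itself is already unreachable.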

doCleanupBroadcast calls the following statement:

broadcastManager.unbroadcast(broadcastId, true, blocking)

which leads to:

  def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean) {
    broadcastFactory.unbroadcast(id, removeFromDriver, blocking)
  }

Each factory class then calls the unpersist method on the companion object of its corresponding broadcast class.

Cleanup in HttpBroadcast

 def unpersist(id: Long, removeFromDriver: Boolean, blocking: Boolean) = synchronized {
    SparkEnv.get.blockManager.master.removeBroadcast(id, removeFromDriver, blocking)
    if (removeFromDriver) {
      val file = getFile(id)
      files.remove(file)
      deleteBroadcastFile(file)
    }
  }

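This does two things: first it removes the cached blocks from the blockManager, and second, when removeFromDriver is true, it deletes the file persisted on the local disk.

Cleanup in TorrentBroadcast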

  def unpersist(id: Long, removeFromDriver: Boolean, blocking: Boolean) = synchronized {
    SparkEnv.get.blockManager.master.removeBroadcast(id, removeFromDriver, blocking)
  }

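Summary

Broadcast is suited to cases where executors use the same piece of data many times (a dictionary, for example). Http and Torrent correspond to the traditional client-server and P2P access models. When the broadcast variable is large or used frequently, the latter reduces the load on the driver.

BlockManager plays the role of the P2P tracker here. It is not covered in detail in this article; a later post will be dedicated to it.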

Notice: this is an original article. Commercial use of any kind is prohibited. When reposting, please credit the source: http://blog.csdn.net/asongoficeandfire/article/details/37584643


Date: 2024-11-02 14:46:35
