Intro
- 这篇是对一个Spark (Streaming)作业的log进行分析。用来加深对Spark application运行过程,优化空间的各种理解。
Here to Start
- 从我这个初学者写得一个Spark Streaming程序开始...
-
package com.wttttt.spark /** * Created with IntelliJ IDEA. * Description: * Author: wttttt * Github: https://github.com/wttttt-wang/hadoop_inaction * Date: 2017-05-19 * Time: 09:56 */ import java.util.regex.Pattern import org.apache.spark.SparkConf import org.apache.spark.streaming.{Milliseconds, StreamingContext} import org.slf4j.LoggerFactory import scala.collection.mutable object LocalTest { val logger = LoggerFactory.getLogger("LocalTest") def main(args: Array[String]) { val batchInterval = Milliseconds(10000) val slideInterval = Milliseconds(5000) val conf = new SparkConf() .setMaster("local[2]") .setAppName("LocalTest") // WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data, // otherwise Spark jobs will not get resources to process the received data. val sc = new StreamingContext(conf, Milliseconds(5000)) sc.checkpoint("flumeCheckpoint/") val stream = sc.socketTextStream("localhost", 9998) val counts = stream.mapPartitions{ events => val pattern = Pattern.compile("\\?Input=[^\\s]*\\s") val map = new mutable.HashMap[String, Int]() logger.info("Handling events, events is empty: " + events.isEmpty) while (events.hasNext){ // par is an Iterator!!! val line = events.next() val m = pattern.matcher(line) if (m.find()) { val words = line.substring(m.start(), m.end()).split("=")(1).toLowerCase() logger.info(s"Processing words $words") map.put(words, map.getOrElse(words, 0) + 1) } } map.iterator } val window = counts.reduceByKeyAndWindow(_+_, _-_, batchInterval, slideInterval) // window.print() // transform和它的变体trnasformWith运行在DStream上任意的RDD-to-RDD函数; // 可以用来使用那些不包含在DStrema API中RDD操作 val sorted = window.transform(rdd =>{ val sortRdd = rdd.map(t => (t._2, t._1)).sortByKey(false).map(t => (t._2, t._1)) val more = sortRdd.take(2) more.foreach(println) sortRdd }) sorted.print() sc.start() sc.awaitTermination() } }
- 看运行log (摘取的一部分)
-
17/05/20 10:20:13 INFO Utils: Successfully started service ‘sparkDriver‘ on port 52300. 17/05/20 10:20:17 INFO ReducedWindowedDStream: Checkpoint interval automatically set to 10000 ms 17/05/20 10:20:17 INFO SocketInputDStream: Slide time = 5000 ms 17/05/20 10:20:17 INFO ReceiverTracker: Receiver 0 started 17/05/20 10:20:17 INFO DAGScheduler: Got job 0 (start at LocalTest.scala:66) with 1 output partitions 17/05/20 10:20:17 INFO JobScheduler: Started JobScheduler 17/05/20 10:20:17 INFO StreamingContext: StreamingContext started 17/05/20 10:20:20 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58 17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 2 (mapPartitions at LocalTest.scala:36) 17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 4 (reduceByKeyAndWindow at LocalTest.scala:52) 17/05/20 10:20:20 INFO DAGScheduler: Got job 1 (sortByKey at LocalTest.scala:58) with 2 output partitions 17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 3 (sortByKey at LocalTest.scala:58) 17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 1, ShuffleMapStage 2) 17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2) 17/05/20 10:20:20 INFO DAGScheduler: Submitting ShuffleMapStage 2 (ParallelCollectionRDD[4] at reduceByKeyAndWindow at LocalTest.scala:52), which has no missing parents 17/05/20 10:20:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 2 (ParallelCollectionRDD[4] at reduceByKeyAndWindow at LocalTest.scala:52) 17/05/20 10:20:20 INFO DAGScheduler: Job 1 finished: sortByKey at LocalTest.scala:58, took 0.314005 s 17/05/20 10:20:20 INFO SparkContext: Starting job: take at LocalTest.scala:59 17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 7 (map at LocalTest.scala:58) 17/05/20 10:20:20 INFO DAGScheduler: Got job 2 (take at LocalTest.scala:59) with 1 output partitions 17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 7 (take at LocalTest.scala:59) 17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 6) 17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 6) 17/05/20 10:20:20 INFO DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[7] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[7] at map at LocalTest.scala:58) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 6.0 with 2 tasks 17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 5, localhost, executor driver, partition 0, PROCESS_LOCAL, 6109 bytes) 17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 6.0 (TID 5) 17/05/20 10:20:20 INFO BlockManager: Found block rdd60 locally 17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 6.0 (TID 5). 1062 bytes result sent to driver 17/05/20 10:20:20 INFO TaskSetManager: Starting task 1.0 in stage 6.0 (TID 6, localhost, executor driver, partition 1, PROCESS_LOCAL, 6109 bytes) 17/05/20 10:20:20 INFO Executor: Running task 1.0 in stage 6.0 (TID 6) 17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 5) in 14 ms on localhost (executor driver) (1/2) 17/05/20 10:20:20 INFO BlockManager: Found block rdd61 locally 17/05/20 10:20:20 INFO Executor: Finished task 1.0 in stage 6.0 (TID 6). 1062 bytes result sent to driver 17/05/20 10:20:20 INFO TaskSetManager: Finished task 1.0 in stage 6.0 (TID 6) in 9 ms on localhost (executor driver) (2/2) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool 17/05/20 10:20:20 INFO DAGScheduler: ShuffleMapStage 6 (map at LocalTest.scala:58) finished in 0.024 s 17/05/20 10:20:20 INFO DAGScheduler: looking for newly runnable stages 17/05/20 10:20:20 INFO DAGScheduler: running: Set(ResultStage 0) 17/05/20 10:20:20 INFO DAGScheduler: waiting: Set(ResultStage 7) 17/05/20 10:20:20 INFO DAGScheduler: failed: Set() 17/05/20 10:20:20 INFO DAGScheduler: Submitting ResultStage 7 (MapPartitionsRDD[11] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:20 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 4.0 KB, free 912.2 MB) 17/05/20 10:20:20 INFO MemoryStore: Block broadcast4piece0 stored as bytes in memory (estimated size 2.4 KB, free 912.2 MB) 17/05/20 10:20:20 INFO BlockManagerInfo: Added broadcast4piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.3 MB) 17/05/20 10:20:20 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 7 (MapPartitionsRDD[11] at map at LocalTest.scala:58) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 7.0 with 1 tasks 17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 7, localhost, executor driver, partition 0, PROCESS_LOCAL, 5903 bytes) 17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 7.0 (TID 7) 17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 2 blocks 17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 7.0 (TID 7). 1718 bytes result sent to driver 17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 7) in 146 ms on localhost (executor driver) (1/1) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool 17/05/20 10:20:20 INFO DAGScheduler: Job 2 finished: take at LocalTest.scala:59, took 0.198460 s 17/05/20 10:20:20 INFO SparkContext: Starting job: print at LocalTest.scala:64 17/05/20 10:20:20 INFO DAGScheduler: Got job 3 (print at LocalTest.scala:64) with 1 output partitions 17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 11 (print at LocalTest.scala:64) 17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 10) 17/05/20 10:20:20 INFO CheckpointWriter: Submitted checkpoint of time 1495246820000 ms to writer queue 17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List() 17/05/20 10:20:20 INFO DAGScheduler: Submitting ResultStage 11 (MapPartitionsRDD[11] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:20 INFO CheckpointWriter: Saving checkpoint for time 1495246820000 ms to file ‘file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000‘ 17/05/20 10:20:20 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 4.0 KB, free 912.2 MB) 17/05/20 10:20:20 INFO MemoryStore: Block broadcast5piece0 stored as bytes in memory (estimated size 2.4 KB, free 912.2 MB) 17/05/20 10:20:20 INFO BlockManagerInfo: Added broadcast5piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.3 MB) 17/05/20 10:20:20 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 11 (MapPartitionsRDD[11] at map at LocalTest.scala:58) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 11.0 with 1 tasks 17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 11.0 (TID 8, localhost, executor driver, partition 0, PROCESS_LOCAL, 6672 bytes) 17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 11.0 (TID 8) 17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 2 blocks 17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 11.0 (TID 8). 1718 bytes result sent to driver 17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 11.0 (TID 8) in 39 ms on localhost (executor driver) (1/1) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool 17/05/20 10:20:20 INFO DAGScheduler: ResultStage 11 (print at LocalTest.scala:64) finished in 0.041 s 17/05/20 10:20:20 INFO DAGScheduler: Job 3 finished: print at LocalTest.scala:64, took 0.053225 s Time: 1495246820000 ms 17/05/20 10:20:20 INFO JobScheduler: Finished job streaming job 1495246820000 ms.0 from job set of time 1495246820000 ms 17/05/20 10:20:20 INFO JobScheduler: Total delay: 0.954 s for time 1495246820000 ms (execution: 0.064 s) 17/05/20 10:20:21 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246110000.bk 17/05/20 10:20:21 INFO JobGenerator: Checkpointing graph for time 1495246820000 ms 17/05/20 10:20:21 INFO DStreamGraph: Updating checkpoint data for time 1495246820000 ms 17/05/20 10:20:21 INFO CheckpointWriter: Checkpoint for time 1495246820000 ms saved to file ‘file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000‘, took 3942 bytes and 125 ms 17/05/20 10:20:21 INFO DStreamGraph: Updated checkpoint data for time 1495246820000 ms 17/05/20 10:20:21 INFO CheckpointWriter: Submitted checkpoint of time 1495246820000 ms to writer queue 17/05/20 10:20:21 INFO CheckpointWriter: Saving checkpoint for time 1495246820000 ms to file ‘file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000‘ 17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast4piece0 on 192.168.6.90:52302 in memory (size: 2.4 KB, free: 912.3 MB) 17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast1piece0 on 192.168.6.90:52302 in memory (size: 1266.0 B, free: 912.3 MB) 17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast2piece0 on 192.168.6.90:52302 in memory (size: 2.7 KB, free: 912.3 MB) 17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast3piece0 on 192.168.6.90:52302 in memory (size: 2.9 KB, free: 912.3 MB) 17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast5piece0 on 192.168.6.90:52302 in memory (size: 2.4 KB, free: 912.3 MB) 17/05/20 10:20:21 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246110000 17/05/20 10:20:21 INFO CheckpointWriter: Checkpoint for time 1495246820000 ms saved to file ‘file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000‘, took 3938 bytes and 48 ms 17/05/20 10:20:21 INFO DStreamGraph: Clearing checkpoint data for time 1495246820000 ms 17/05/20 10:20:21 INFO DStreamGraph: Cleared checkpoint data for time 1495246820000 ms 17/05/20 10:20:21 INFO ReceivedBlockTracker: Deleting batches: 17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Attempting to clear 2 old log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000: file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata/log-1495246055009-1495246115009 file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata/log-1495246115167-1495246175167 17/05/20 10:20:21 INFO InputInfoTracker: remove old batch metadata: 17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Cleared log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000 17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Cleared log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000 17/05/20 10:20:21 INFO MemoryStore: Block input-0-1495246821600 stored as bytes in memory (estimated size 284.0 B, free 912.2 MB) 17/05/20 10:20:21 INFO BlockManagerInfo: Added input-0-1495246821600 in memory on 192.168.6.90:52302 (size: 284.0 B, free: 912.3 MB) 17/05/20 10:20:21 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s. 17/05/20 10:20:21 WARN BlockManager: Block input-0-1495246821600 replicated to only 0 peer(s) instead of 1 peers 17/05/20 10:20:21 INFO BlockGenerator: Pushed block input-0-1495246821600 17/05/20 10:20:24 INFO MemoryStore: Block input-0-1495246824600 stored as bytes in memory (estimated size 284.0 B, free 912.2 MB) 17/05/20 10:20:24 INFO BlockManagerInfo: Added input-0-1495246824600 in memory on 192.168.6.90:52302 (size: 284.0 B, free: 912.3 MB) 17/05/20 10:20:24 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s. 17/05/20 10:20:24 WARN BlockManager: Block input-0-1495246824600 replicated to only 0 peer(s) instead of 1 peers 17/05/20 10:20:24 INFO BlockGenerator: Pushed block input-0-1495246824600 17/05/20 10:20:25 INFO ShuffledDStream: Slicing from 1495246815000 ms to 1495246815000 ms (aligned to 1495246815000 ms and 1495246815000 ms) 17/05/20 10:20:25 INFO ShuffledDStream: Time 1495246815000 ms is invalid as zeroTime is 1495246815000 ms , slideDuration is 5000 ms and difference is 0 ms 17/05/20 10:20:25 INFO ShuffledDStream: Slicing from 1495246825000 ms to 1495246825000 ms (aligned to 1495246825000 ms and 1495246825000 ms) 17/05/20 10:20:25 INFO ReducedWindowedDStream: Marking RDD 16 for time 1495246825000 ms for checkpointing 17/05/20 10:20:25 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58 17/05/20 10:20:25 INFO DAGScheduler: Registering RDD 13 (mapPartitions at LocalTest.scala:36) 17/05/20 10:20:25 INFO DAGScheduler: Got job 4 (sortByKey at LocalTest.scala:58) with 2 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 15 (sortByKey at LocalTest.scala:58) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 12, ShuffleMapStage 13, ShuffleMapStage 14) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 13) 17/05/20 10:20:25 INFO DAGScheduler: Submitting ShuffleMapStage 13 (MapPartitionsRDD[13] at mapPartitions at LocalTest.scala:36), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 2.6 KB, free 912.2 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast6piece0 stored as bytes in memory (estimated size 1667.0 B, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast6piece0 in memory on 192.168.6.90:52302 (size: 1667.0 B, free: 912.3 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 13 (MapPartitionsRDD[13] at mapPartitions at LocalTest.scala:36) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 13.0 with 2 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 13.0 (TID 9, localhost, executor driver, partition 0, ANY, 6020 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 13.0 (TID 9) 17/05/20 10:20:25 INFO BlockManager: Found block input-0-1495246821600 locally 17/05/20 10:20:25 INFO LocalTest: Handling events, events is empty: false 17/05/20 10:20:25 INFO LocalTest: Processing words test1 17/05/20 10:20:25 INFO LocalTest: Processing words test2 17/05/20 10:20:25 INFO LocalTest: Processing words test2 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 13.0 (TID 9). 1498 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 13.0 (TID 10, localhost, executor driver, partition 1, ANY, 6020 bytes) 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 13.0 (TID 9) in 28 ms on localhost (executor driver) (1/2) 17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 13.0 (TID 10) 17/05/20 10:20:25 INFO BlockManager: Found block input-0-1495246824600 locally 17/05/20 10:20:25 INFO LocalTest: Handling events, events is empty: false 17/05/20 10:20:25 INFO LocalTest: Processing words test1 17/05/20 10:20:25 INFO LocalTest: Processing words test2 17/05/20 10:20:25 INFO LocalTest: Processing words test2 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 13.0 (TID 10). 1498 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 13.0 (TID 10) in 22 ms on localhost (executor driver) (2/2) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ShuffleMapStage 13 (mapPartitions at LocalTest.scala:36) finished in 0.048 s 17/05/20 10:20:25 INFO DAGScheduler: looking for newly runnable stages 17/05/20 10:20:25 INFO DAGScheduler: running: Set(ResultStage 0) 17/05/20 10:20:25 INFO DAGScheduler: waiting: Set(ResultStage 15) 17/05/20 10:20:25 INFO DAGScheduler: failed: Set() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 15 (MapPartitionsRDD[19] at sortByKey at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 6.4 KB, free 912.2 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast7piece0 stored as bytes in memory (estimated size 3.6 KB, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast7piece0 in memory on 192.168.6.90:52302 (size: 3.6 KB, free: 912.3 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 15 (MapPartitionsRDD[19] at sortByKey at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 15.0 with 2 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 15.0 (TID 11, localhost, executor driver, partition 0, PROCESS_LOCAL, 6185 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 15.0 (TID 11) 17/05/20 10:20:25 INFO BlockManager: Found block rdd60 locally 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:25 INFO MemoryStore: Block rdd140 stored as bytes in memory (estimated size 155.0 B, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd140 in memory on 192.168.6.90:52302 (size: 155.0 B, free: 912.3 MB) 17/05/20 10:20:25 INFO MemoryStore: Block rdd160 stored as bytes in memory (estimated size 155.0 B, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd160 in memory on 192.168.6.90:52302 (size: 155.0 B, free: 912.3 MB) 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 15.0 (TID 11). 2799 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 15.0 (TID 12, localhost, executor driver, partition 1, PROCESS_LOCAL, 6185 bytes) 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 15.0 (TID 11) in 31 ms on localhost (executor driver) (1/2) 17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 15.0 (TID 12) 17/05/20 10:20:25 INFO BlockManager: Found block rdd61 locally 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:25 INFO MemoryStore: Block rdd141 stored as bytes in memory (estimated size 180.0 B, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd141 in memory on 192.168.6.90:52302 (size: 180.0 B, free: 912.3 MB) 17/05/20 10:20:25 INFO MemoryStore: Block rdd161 stored as bytes in memory (estimated size 180.0 B, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd161 in memory on 192.168.6.90:52302 (size: 180.0 B, free: 912.3 MB) 17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 15.0 (TID 12). 2803 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 15.0 (TID 12) in 39 ms on localhost (executor driver) (2/2) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 15.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ResultStage 15 (sortByKey at LocalTest.scala:58) finished in 0.069 s 17/05/20 10:20:25 INFO DAGScheduler: Job 4 finished: sortByKey at LocalTest.scala:58, took 0.156155 s 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 127.1 KB, free 912.1 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast8piece0 stored as bytes in memory (estimated size 14.3 KB, free 912.1 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast8piece0 in memory on 192.168.6.90:52302 (size: 14.3 KB, free: 912.3 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 8 from sortByKey at LocalTest.scala:58 17/05/20 10:20:25 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58 17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 83 bytes 17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 3 is 157 bytes 17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 156 bytes 17/05/20 10:20:25 INFO DAGScheduler: Got job 5 (sortByKey at LocalTest.scala:58) with 2 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 19 (sortByKey at LocalTest.scala:58) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 16, ShuffleMapStage 17, ShuffleMapStage 18) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 19 (MapPartitionsRDD[16] at reduceByKeyAndWindow at LocalTest.scala:52), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 6.5 KB, free 912.1 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast9piece0 stored as bytes in memory (estimated size 3.6 KB, free 912.1 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast9piece0 in memory on 192.168.6.90:52302 (size: 3.6 KB, free: 912.3 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 19 (MapPartitionsRDD[16] at reduceByKeyAndWindow at LocalTest.scala:52) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 19.0 with 2 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 19.0 (TID 13, localhost, executor driver, partition 0, PROCESS_LOCAL, 6122 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 19.0 (TID 13) 17/05/20 10:20:25 INFO BlockManager: Found block rdd160 locally 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 19.0 (TID 13). 1085 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 19.0 (TID 14, localhost, executor driver, partition 1, PROCESS_LOCAL, 6122 bytes) 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 19.0 (TID 13) in 30 ms on localhost (executor driver) (1/2) 17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 19.0 (TID 14) 17/05/20 10:20:25 INFO BlockManager: Found block rdd161 locally 17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 19.0 (TID 14). 1172 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 19.0 (TID 14) in 85 ms on localhost (executor driver) (2/2) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 19.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ResultStage 19 (sortByKey at LocalTest.scala:58) finished in 0.115 s 17/05/20 10:20:25 INFO DAGScheduler: Job 5 finished: sortByKey at LocalTest.scala:58, took 0.129602 s 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 127.1 KB, free 912.0 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast10piece0 stored as bytes in memory (estimated size 14.3 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast10piece0 in memory on 192.168.6.90:52302 (size: 14.3 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 10 from sortByKey at LocalTest.scala:58 17/05/20 10:20:25 INFO ReliableRDDCheckpointData: Done checkpointing RDD 16 to file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/874437fe-0c21-4a9d-8793-8ae914f3f38b/rdd-16, new parent is RDD 20 17/05/20 10:20:25 INFO SparkContext: Starting job: take at LocalTest.scala:59 17/05/20 10:20:25 INFO DAGScheduler: Registering RDD 17 (map at LocalTest.scala:58) 17/05/20 10:20:25 INFO DAGScheduler: Got job 6 (take at LocalTest.scala:59) with 1 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 21 (take at LocalTest.scala:59) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 20) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 20) 17/05/20 10:20:25 INFO DAGScheduler: Submitting ShuffleMapStage 20 (MapPartitionsRDD[17] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 6.1 KB, free 911.9 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast11piece0 stored as bytes in memory (estimated size 3.5 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast11piece0 in memory on 192.168.6.90:52302 (size: 3.5 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 20 (MapPartitionsRDD[17] at map at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 20.0 with 2 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 20.0 (TID 15, localhost, executor driver, partition 0, PROCESS_LOCAL, 6169 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 20.0 (TID 15) 17/05/20 10:20:25 INFO BlockManager: Found block rdd160 locally 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 20.0 (TID 15). 1413 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 20.0 (TID 16, localhost, executor driver, partition 1, PROCESS_LOCAL, 6169 bytes) 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 20.0 (TID 15) in 44 ms on localhost (executor driver) (1/2) 17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 20.0 (TID 16) 17/05/20 10:20:25 INFO BlockManager: Found block rdd161 locally 17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 20.0 (TID 16). 1413 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 20.0 (TID 16) in 58 ms on localhost (executor driver) (2/2) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 20.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ShuffleMapStage 20 (map at LocalTest.scala:58) finished in 0.101 s 17/05/20 10:20:25 INFO DAGScheduler: looking for newly runnable stages 17/05/20 10:20:25 INFO DAGScheduler: running: Set(ResultStage 0) 17/05/20 10:20:25 INFO DAGScheduler: waiting: Set(ResultStage 21) 17/05/20 10:20:25 INFO DAGScheduler: failed: Set() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 21 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 4.0 KB, free 911.9 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast12piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast12piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 21 (MapPartitionsRDD[22] at map at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 21.0 with 1 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 21.0 (TID 17, localhost, executor driver, partition 0, ANY, 5903 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 21.0 (TID 17) 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 21.0 (TID 17). 1869 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 21.0 (TID 17) in 28 ms on localhost (executor driver) (1/1) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 21.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ResultStage 21 (take at LocalTest.scala:59) finished in 0.029 s 17/05/20 10:20:25 INFO DAGScheduler: Job 6 finished: take at LocalTest.scala:59, took 0.150449 s 17/05/20 10:20:25 INFO SparkContext: Starting job: take at LocalTest.scala:59 17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 4 is 160 bytes 17/05/20 10:20:25 INFO DAGScheduler: Got job 7 (take at LocalTest.scala:59) with 1 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 23 (take at LocalTest.scala:59) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 22) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 23 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 4.0 KB, free 911.9 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast13piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast13piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 23 (MapPartitionsRDD[22] at map at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 23.0 with 1 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 23.0 (TID 18, localhost, executor driver, partition 1, ANY, 5903 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 23.0 (TID 18) 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 23.0 (TID 18). 1869 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 23.0 (TID 18) in 75 ms on localhost (executor driver) (1/1) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 23.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ResultStage 23 (take at LocalTest.scala:59) finished in 0.077 s 17/05/20 10:20:25 INFO DAGScheduler: Job 7 finished: take at LocalTest.scala:59, took 0.087657 s 17/05/20 10:20:25 INFO JobScheduler: Added jobs for time 1495246825000 ms 17/05/20 10:20:25 INFO JobGenerator: Checkpointing graph for time 1495246825000 ms 17/05/20 10:20:25 INFO DStreamGraph: Updating checkpoint data for time 1495246825000 ms 17/05/20 10:20:25 INFO JobScheduler: Starting job streaming job 1495246825000 ms.0 from job set of time 1495246825000 ms (test3 ,8) (test2 ,4) 17/05/20 10:20:25 INFO DStreamGraph: Updated checkpoint data for time 1495246825000 ms 17/05/20 10:20:25 INFO SparkContext: Starting job: print at LocalTest.scala:64 17/05/20 10:20:25 INFO CheckpointWriter: Saving checkpoint for time 1495246825000 ms to file ‘file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246825000‘ 17/05/20 10:20:25 INFO CheckpointWriter: Submitted checkpoint of time 1495246825000 ms to writer queue 17/05/20 10:20:25 INFO DAGScheduler: Got job 8 (print at LocalTest.scala:64) with 1 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 25 (print at LocalTest.scala:64) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 24) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 25 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 4.0 KB, free 911.9 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast14piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast14piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 25 (MapPartitionsRDD[22] at map at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 25.0 with 1 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 25.0 (TID 19, localhost, executor driver, partition 0, ANY, 6672 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 25.0 (TID 19) 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:25 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246115000.bk 17/05/20 10:20:25 INFO CheckpointWriter: Checkpoint for time 1495246825000 ms saved to file ‘file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246825000‘, took 4010 bytes and 37 ms 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 25.0 (TID 19). 1869 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 25.0 (TID 19) in 37 ms on localhost (executor driver) (1/1) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 25.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ResultStage 25 (print at LocalTest.scala:64) finished in 0.039 s 17/05/20 10:20:25 INFO DAGScheduler: Job 8 finished: print at LocalTest.scala:64, took 0.046366 s 17/05/20 10:20:25 INFO SparkContext: Starting job: print at LocalTest.scala:64 17/05/20 10:20:25 INFO DAGScheduler: Got job 9 (print at LocalTest.scala:64) with 1 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 27 (print at LocalTest.scala:64) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 26) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 27 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 4.0 KB, free 911.9 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast15piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast15piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 27 (MapPartitionsRDD[22] at map at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 27.0 with 1 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 27.0 (TID 20, localhost, executor driver, partition 1, ANY, 6672 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 27.0 (TID 20) 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms Time: 1495246825000 ms (test3 ,8) (test2 ,4) (test1 ,2)
时间: 2024-11-05 19:42:22