Applications in Spark Standalone Mode
An Application in Spark is the analogue of a Hadoop Job: a user-submitted program. `sc` is the SparkContext created when the application initializes against the cluster. Spark operators fall into two kinds, transformations and actions, and the dependencies between RDDs are either wide or narrow. By default, Spark schedules the jobs within an application in FIFO order.
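The word-count pipeline used in the examples below requires a running Spark cluster, but the same chain of operators can be mirrored on plain Scala collections. This is a minimal sketch with made-up input lines, not the RDD API itself; `flatMap`/`map` correspond directly, and `groupBy` plus a sum plays the role of `reduceByKey(_ + _)`:

```scala
// Hypothetical input lines; on a cluster these would come from sc.textFile(...).
val lines = Seq("spark hadoop spark", "hadoop hdfs")

val counts = lines
  .flatMap(_.split(" "))                         // split each line into words
  .map(w => (w, 1))                              // pair each word with a count of 1
  .groupBy(_._1)                                 // collection analogue of reduceByKey
  .map { case (w, ps) => (w, ps.map(_._2).sum) } // sum the 1s per word

// Sorting by key, as sortByKey(true) / sortByKey(false) do on a pair RDD:
val ascending  = counts.toSeq.sortBy(_._1)
val descending = counts.toSeq.sortBy(_._1)(Ordering[String].reverse)
```

Like the RDD version, nothing here shuffles data until the grouping step; the per-record operators are simple element-wise functions.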
//Save word counts to disk with no explicit sort (default ordering)
scala> sc.textFile("/root/rdd1.txt").flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey(_+_).saveAsTextFile("/root/rddOut/noSort")
FileOutputCommitter: Saved output of task 'attempt_201507140546_0014_m_000000_14' to file:/root/rddOut/noSort/_temporary/0/task_201507140546_0014_m_000000
//Sort by key in ascending lexicographic order, then save to disk
scala> sc.textFile("/root/rdd1.txt").flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey(_+_).sortByKey(true).saveAsTextFile("/root/rddOut/zSort")
FileOutputCommitter: Saved output of task 'attempt_201507140546_0017_m_000000_17' to file:/root/rddOut/zSort/_temporary/0/task_201507140546_0017_m_000000
//Sort by key in descending lexicographic order, then save to disk
scala> sc.textFile("/root/rdd1.txt").flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey(_+_).sortByKey(false).saveAsTextFile("/root/rddOut/fSort")
FileOutputCommitter: Saved output of task 'attempt_201507140547_0020_m_000000_20' to file:/root/rddOut/fSort/_temporary/0/task_201507140547_0020_m_000000
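Note that `sortByKey` orders pairs only by their key (the word), never by the count. To rank words by count instead, a common pattern on a pair RDD is `map(_.swap).sortByKey(false).map(_.swap)`. The same trick, sketched on plain Scala collections with hypothetical counts (in the examples above these would come from `reduceByKey`):

```scala
// Hypothetical (word, count) pairs standing in for reduceByKey output.
val counts = Seq(("spark", 3), ("hdfs", 1), ("hadoop", 2))

// Swap to (count, word), sort by count descending, swap back -- mirrors
// rdd.map(_.swap).sortByKey(false).map(_.swap) on a pair RDD.
val byCountDesc = counts
  .map(_.swap)                          // (3,"spark"), (1,"hdfs"), (2,"hadoop")
  .sortBy(_._1)(Ordering[Int].reverse)  // order by count, largest first
  .map(_.swap)                          // back to (word, count)
```

After the final swap the result is `("spark",3), ("hadoop",2), ("hdfs",1)`: the original (word, count) shape, now ordered by frequency.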