Personal notes only, without much explanation.
Transformation
map
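filter: in the example, after filter there are still three partitions; the second one is now empty, but it does not disappear (filter never removes or merges partitions).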
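A plain-Scala sketch of why the empty partition survives. It models an RDD as a `Seq` of partitions using ordinary collections, not the Spark API; `FilterModel` and `filterPartitions` are hypothetical names.

```scala
// Plain-Scala model, not the Spark API: each inner Seq stands for one partition.
object FilterModel {
  type Partitioned[A] = Seq[Seq[A]]

  // filter runs inside each partition independently; nothing moves between
  // partitions, so a partition that loses all its elements stays as an empty Seq
  def filterPartitions[A](data: Partitioned[A])(p: A => Boolean): Partitioned[A] =
    data.map(_.filter(p))

  val input: Partitioned[Int] = Seq(Seq(1, 2), Seq(3, 5), Seq(6, 8))
  val evens: Partitioned[Int] = filterPartitions(input)(_ % 2 == 0)
}
```

In Spark itself, `sc.parallelize(Seq(1, 2, 3, 5, 6, 8), 3).filter(_ % 2 == 0).glom().collect()` should show the same layout, and `getNumPartitions` stays at 3.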
flatMap
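The difference between map and flatMap, sketched with Scala collections (whose `map`/`flatMap` behave per element like the RDD versions; `MapVsFlatMap` is a hypothetical name): map produces exactly one output per input, flatMap produces zero or more and flattens them.

```scala
object MapVsFlatMap {
  val lines: Seq[String] = Seq("hello spark", "hello rdd")

  // map: one output per input, so each line becomes one Seq of words
  val mapped: Seq[Seq[String]] = lines.map(_.split(" ").toSeq)

  // flatMap: each line yields several words, and the results are flattened
  val flatMapped: Seq[String] = lines.flatMap(_.split(" ").toSeq)
}
```

With an RDD, `sc.textFile(path).flatMap(_.split(" "))` (with `path` a placeholder) is the usual first step of a word count.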
reduceByKey
groupByKey()
sortByKey()
```scala
val pets = sc.parallelize(List(("cat", 1), ("dog", 1), ("cat", 2)))
pets.reduceByKey(_ + _) // => {(cat, 3), (dog, 1)}
pets.groupByKey()       // => {(cat, Iterable(1, 2)), (dog, Iterable(1))}
pets.sortByKey()        // => {(cat, 1), (cat, 2), (dog, 1)}
```
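reduceByKey and groupByKey can produce the same totals, but reduceByKey combines values inside each partition before the shuffle, while groupByKey ships every pair. A plain-Scala model of that difference (ordinary collections, not Spark; `ReduceModel` and its fields are hypothetical):

```scala
object ReduceModel {
  type Part = Seq[(String, Int)]

  // two partitions of (key, value) pairs
  val parts: Seq[Part] = Seq(
    Seq("cat" -> 1, "dog" -> 1, "cat" -> 2),
    Seq("cat" -> 4, "dog" -> 3)
  )

  // reduceByKey-style: combine inside each partition first (map-side combine),
  // so at most one record per key per partition crosses the shuffle
  val combinedPerPartition: Seq[Map[String, Int]] =
    parts.map(_.groupMapReduce(_._1)(_._2)(_ + _))
  val reduced: Map[String, Int] =
    combinedPerPartition.flatten.groupMapReduce(_._1)(_._2)(_ + _)

  // groupByKey-style: every raw pair is shuffled, then grouped by key
  val grouped: Map[String, Seq[Int]] = parts.flatten.groupMap(_._1)(_._2)

  // number of records that would cross the shuffle in each approach
  val shuffledByReduce: Int = combinedPerPartition.map(_.size).sum
  val shuffledByGroup: Int = parts.map(_.size).sum
}
```

Here 4 records cross the modeled shuffle for reduceByKey versus 5 for groupByKey; with many repeated keys per partition the gap grows.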
mapValues(_ + 1): mapValues ignores the key and applies the function to the value only; the key passes through unchanged.
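A plain-Scala sketch of the mapValues contract on a sequence of pairs (a hypothetical helper, not the Spark method): the function never sees the key.

```scala
object MapValuesModel {
  // mapValues-style: apply f to each value, keep each key exactly as it was
  def mapValues[K, V, W](pairs: Seq[(K, V)])(f: V => W): Seq[(K, W)] =
    pairs.map { case (k, v) => (k, f(v)) }

  val pets: Seq[(String, Int)] = Seq("cat" -> 1, "dog" -> 1, "cat" -> 2)
  val plusOne: Seq[(String, Int)] = mapValues(pets)(_ + 1)
}
```

Because the keys cannot change, Spark's `mapValues` keeps the parent RDD's partitioner, whereas an equivalent `map { case (k, v) => (k, v + 1) }` drops it.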
join RDD[(String, Int)].join(RDD[(String, Long)]) => RDD[(String, (Int, Long))]
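The two RDDs being joined may have different value types. The result's partitions are assigned by hashing the key (by default with a HashPartitioner).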
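A plain-Scala model of the join signature above, with `Int` values on the left and `Long` values on the right, plus a hash-based partition function modeled on the idea behind Spark's HashPartitioner (non-negative `hashCode` mod the partition count). `JoinModel`, `join`, and `partitionOf` are hypothetical names, not Spark APIs.

```scala
object JoinModel {
  // inner join on key: every (v, w) combination for keys present on both sides
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k, v)  <- left
      (k2, w) <- right
      if k == k2
    } yield (k, (v, w))

  // hash partitioning: the same key always lands in the same partition index
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  val counts: Seq[(String, Int)] = Seq("cat" -> 1, "dog" -> 2)
  val sizes: Seq[(String, Long)] = Seq("cat" -> 10L, "cat" -> 20L, "fox" -> 5L)
  val joined: Seq[(String, (Int, Long))] = join(counts, sizes)
}
```

Keys that appear on only one side (`dog`, `fox`) are dropped, and a key repeated on one side produces one output pair per match.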
union
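A plain-Scala model of union (not the Spark API; names hypothetical): the partitions of both inputs are simply concatenated, with no shuffle and no deduplication.

```scala
object UnionModel {
  // each inner Seq stands for one partition
  val a: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3))
  val b: Seq[Seq[Int]] = Seq(Seq(3, 4))

  // union-style: partitions are concatenated; the duplicate 3 is kept
  val u: Seq[Seq[Int]] = a ++ b
}
```

In the common case where the inputs do not share a partitioner, Spark's result likewise has as many partitions as both inputs combined; call `distinct()` afterwards if duplicates must go.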
cogroup
Implementing join with cogroup
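A plain-Scala sketch of cogroup and of join built on top of it (ordinary collections, not Spark; `CogroupModel` and its helpers are hypothetical). cogroup collects, for every key on either side, the list of left values and the list of right values; join then takes the cross product of those two lists, so a key missing on one side yields nothing.

```scala
object CogroupModel {
  // cogroup-style: key -> (all left values, all right values)
  def cogroup[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
    val keys = (left.map(_._1) ++ right.map(_._1)).distinct
    keys.map { k =>
      val vs = left.filter(_._1 == k).map(_._2)
      val ws = right.filter(_._1 == k).map(_._2)
      (k, (vs, ws))
    }.toMap
  }

  // join = cogroup, then every (v, w) combination per key;
  // an empty list on either side gives an empty product, so the key disappears
  def joinViaCogroup[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    cogroup(left, right).toSeq.flatMap { case (k, (vs, ws)) =>
      for (v <- vs; w <- ws) yield (k, (v, w))
    }

  val left: Seq[(String, Int)] = Seq("cat" -> 1, "dog" -> 2, "cat" -> 3)
  val right: Seq[(String, String)] = Seq("cat" -> "a", "fox" -> "b")
  val cogrouped: Map[String, (Seq[Int], Seq[String])] = cogroup(left, right)
  val joined: Seq[(String, (Int, String))] = joinViaCogroup(left, right)
}
```

With RDDs the same idea reads `left.cogroup(right).flatMapValues { case (vs, ws) => for (v <- vs.iterator; w <- ws.iterator) yield (v, w) }`, which is essentially how Spark's own `PairRDDFunctions.join` is written.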
sample(): draws a random sample from the dataset
cartesian(): computes the Cartesian product
pipe(): pipes the data through an external program
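A plain-Scala sketch of the idea behind `sample(withReplacement = false, fraction, seed)`: Bernoulli sampling, where each element is kept independently with probability `fraction`. `SampleModel` and `bernoulliSample` are hypothetical names, not Spark code.

```scala
import scala.util.Random

object SampleModel {
  // keep each element with probability `fraction`; a fixed seed makes it repeatable,
  // and the sample size is only approximately fraction * n, not exact
  def bernoulliSample[A](data: Seq[A], fraction: Double, seed: Long): Seq[A] = {
    val rng = new Random(seed)
    data.filter(_ => rng.nextDouble() < fraction)
  }

  val data: Seq[Int] = 1 to 1000
  val s: Seq[Int] = bernoulliSample(data, 0.1, seed = 42L)
}
```

So `rdd.sample(false, 0.1, 42L)` returns roughly, not exactly, 10% of the elements; with `withReplacement = true` Spark uses Poisson sampling instead, so an element can appear more than once.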
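A plain-Scala sketch of the Cartesian product (hypothetical helper, not the Spark method): every element of the first dataset paired with every element of the second, so the output has m × n elements.

```scala
object CartesianModel {
  // every (x, y) pair; output size is xs.size * ys.size
  def cartesian[A, B](xs: Seq[A], ys: Seq[B]): Seq[(A, B)] =
    for (x <- xs; y <- ys) yield (x, y)

  val pairs: Seq[(Int, String)] = cartesian(Seq(1, 2), Seq("a", "b", "c"))
}
```

Because the size multiplies, `rdd1.cartesian(rdd2)` gets expensive quickly; its partition count is also the product of the two inputs' partition counts.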
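A plain-Scala sketch of what pipe does for one partition, using `scala.sys.process`: the elements are written to the external command's stdin one per line, and each line of its stdout becomes an output `String`. It assumes a Unix-like system with `tr` on the PATH; `PipeModel` and `pipePartition` are hypothetical names.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import scala.sys.process._

object PipeModel {
  // one external process per partition: elements in via stdin, lines out via stdout
  def pipePartition(partition: Seq[String], command: Seq[String]): Seq[String] = {
    val input = partition.map(_ + "\n").mkString
    val stdin = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8))
    val stdout: String = (Process(command) #< stdin).!! // throws if the exit code is non-zero
    stdout.linesIterator.toList
  }

  // two partitions, each piped through `tr a-z A-Z` separately
  val upper: Seq[Seq[String]] =
    Seq(Seq("cat", "dog"), Seq("fox")).map(pipePartition(_, Seq("tr", "a-z", "A-Z")))
}
```

The Spark equivalent would be `rdd.pipe("tr a-z A-Z")`, which returns an `RDD[String]` regardless of the input element type.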
Action
collect()
take(2)
count()
reduce
foreach(println)
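A plain-Scala model of how these actions treat partitions (ordinary collections, not Spark; `ActionsModel` and its members are hypothetical). The point worth remembering is reduce: the function is applied inside each partition and then to the partial results, so it should be associative and commutative, otherwise the answer depends on the partitioning.

```scala
object ActionsModel {
  // each inner Seq stands for one partition; the middle one is empty
  val parts: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(), Seq(3, 4, 5))

  // collect(): all partitions, in partition order, brought back to the driver
  val collected: Seq[Int] = parts.flatten

  // take(n): the first n elements, scanning partitions in order
  def take(n: Int): Seq[Int] = parts.iterator.flatMap(_.iterator).take(n).toList

  // count(): per-partition sizes summed
  val count: Long = parts.map(_.size.toLong).sum

  // reduce(f): reduce each non-empty partition, then reduce the partial results
  def reduce(f: (Int, Int) => Int): Int =
    parts.filter(_.nonEmpty).map(_.reduce(f)).reduce(f)
}
```

With `_ + _` the model gives 15 either way, but with `_ - _` it gives 5, while a plain sequential `Seq(1, 2, 3, 4, 5).reduce(_ - _)` gives -13. Separately, `foreach(println)` runs on the executors, so on a cluster the output lands in executor logs; `rdd.take(n).foreach(println)` prints on the driver.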
Time: 2024-12-18 01:52:02