sc.intersection

用来找到两个rdd的交集,注意,最终的new rdd的分区数量取决于两个rdd中的最大分区数量。

测试一下:

val data1 = sc.parallelize(1 to 20,1)
val data2 = sc.parallelize(1 to 5,2)
val data3 = data1.intersection(data2)
data3.partitions.length
data3.collect

输出结果是1,2,3,4,5

data3的分区数量是2

时间: 2024-09-17 17:03:40

sc.intersection的相关文章

spark 教程三 spark Map filter flatMap union distinct intersection操作

RDD的创建 spark 所有的操作都围绕着弹性分布式数据集(RDD)进行,这是一个有容错机制的并可以被并行操作的元素集合,具有只读.分区.容错.高效.无需物化.可以缓存.RDD依赖等特征 RDD的创建基础RDD 1.并行集合(Parallelized Collections):接收一个已经存在的Scala集合,然后进行各种并行运算 var sc=new SparkContext(conf) var rdd=sc.parallelize(Array(2,4,9,3,5,7,8,1,6)); rd

RDD常用方法之subtract&intersection&cartesian

subtract Return an RDD with the elements from `this` that are not in `other` . def subtract(other: RDD[T]): RDD[T] def subtract(other: RDD[T], numPartitions: Int): RDD[T] def subtract(other: RDD[T], p: Partitioner): RDD[T] val a = sc.parallelize(1 to

[LeetCode] 349 Intersection of Two Arrays & 350 Intersection of Two Arrays II

这两道题都是求两个数组之间的重复元素,因此把它们放在一起. 原题地址: 349 Intersection of Two Arrays :https://leetcode.com/problems/intersection-of-two-arrays/description/ 350 Intersection of Two Arrays II:https://leetcode.com/problems/intersection-of-two-arrays-ii/description/ 题目&解法

350.求两个数组的交集 Intersection of Two Arrays II

Given two arrays, write a function to compute their intersection. Example:Given nums1 = [1, 2, 2, 1], nums2 = [2, 2], return [2, 2]. Note: Each element in the result should appear as many times as it shows in both arrays. The result can be in any ord

Intersection of Two Arrays I & II

题目链接:https://leetcode.com/problems/intersection-of-two-arrays/ 题目大意:要求两个数组的交集(注意集合是不能含有重复的元素的) 方法1) 先对两个数组进行排序,设置两个指针pA和pB,分别指向这两个数组,比较nums1[pA]和nums[pB] a. 如果想等,则为交集中的元素,++pA, ++pB b. 如果nums[pA] < nums[pB],则++pA c. 否则,++pB 注意数组中有重复的元素(实现代码中的小trick)

Intersection of Two Linked Lists

Write a program to find the node at which the intersection of two singly linked lists begins. For example, the following two linked lists: A: a1 → a2 c1 → c2 → c3 B: b1 → b2 → b3 begin to intersect at node c1. Notes: If the two linked lists have no i

哈希(4) - 求两个链表的交集(intersection)以及并集(union)

给定两个链表,求它们的交集以及并集.用于输出的list中的元素顺序可不予考虑. 例子: 输入下面两个链表: list1: 10->15->4->20 list2: 8->4->2->10 输出链表: 交集list: 4->10 并集list: 2->8->20->4->15->10 方法1 (简单方法) 可以参考链表系列中的"链表操作 - 求两个链表的交集(intersection)以及并集(union)" 方法2

[C++]LeetCode: 60 Intersection of Two Linked Lists

题目: Write a program to find the node at which the intersection of two singly linked lists begins. For example, the following two linked lists: A: a1 → a2 c1 → c2 → c3 B: b1 → b2 → b3 begin to intersect at node c1. Notes: If the two linked lists have

350. Intersection of Two Arrays II(LeetCode)

Given two arrays, write a function to compute their intersection. Example:Given nums1 = [1, 2, 2, 1], nums2 = [2, 2], return [2, 2]. Note: Each element in the result should appear as many times as it shows in both arrays. The result can be in any ord