jnd = join a by f1, b by f2;
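join performs an inner join by default: a record is kept only when its key matches on both sides.
For an outer join, Pig must know the schema of the side that gets padded with nulls:
For a left outer join, Pig needs the schema of the right-hand relation; where there is no match, the right-hand fields are filled with null.
For a right outer join, Pig needs the schema of the left-hand relation; where there is no match, the left-hand fields are filled with null.
For a full outer join, Pig needs the schemas of both relations; where there is no match, the fields of the missing side are filled with null.
By default, a join triggers a reduce phase.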
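Because the default join runs in a reduce phase, the number of reducers affects how it performs. The following is a minimal sketch, assuming the parallel clause is used to set that number; the value 10 and the extra fields x and y are illustrative and not from the original note.

```pig
-- illustrative schemas; only f1 and f2 matter for the join
a = load 'input1' as (f1, x);
b = load 'input2' as (y, f2);
-- parallel requests a number of reduce tasks for this join (10 is an arbitrary example)
jnd = join a by f1, b by f2 parallel 10;
```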
Basic usage
a = load 'input1';
b = load 'input2';
jnd = join a by $0, b by $1;
Joining on multiple fields
a = load 'input1' as (username, age, city);
b = load 'input2' as (orderid, user, city);
jnd = join a by (username, city), b by (user, city);
:: Referencing fields after a join
a = load 'input1' as (username, age, address);
b = load 'input2' as (orderid, user, money);
jnd = join a by username, b by user;
result = foreach jnd generate a::username, a::age, address, b::orderid;
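To see which names are available after a join, it can help to inspect the schema of the joined relation. The sketch below reuses the relations from the example above and assumes describe is run from the Grunt shell or a script; the final projection is only an illustration.

```pig
a = load 'input1' as (username, age, address);
b = load 'input2' as (orderid, user, money);
jnd = join a by username, b by user;
-- prints the schema of jnd; field names carry the a:: / b:: prefixes
describe jnd;
-- a name that occurs on one side only, such as address, can be used without a prefix
result = foreach jnd generate a::username, address, b::money;
```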
Joining multiple relations
a = load 'input1' as (username, age);
b = load 'input2' as (orderid, user);
c = load 'input3' as (user, acount);
jnd = join a by username, b by user, c by user;
Outer joins (two relations only)
a = load 'input1' as (username, age);
b = load 'input2' as (orderid, user);
jnd = join a by username left outer, b by user;
jnd = join a by username right, b by user;
jnd = join a by username full, b by user;
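A common use of the null padding in an outer join is to find records that have no match on the other side. The following is a minimal sketch using the same relations; the aliases no_orders and names are hypothetical and chosen only for illustration.

```pig
a = load 'input1' as (username, age);
b = load 'input2' as (orderid, user);
jnd = join a by username left outer, b by user;
-- rows from a with no match in b have null in all of b's fields
no_orders = filter jnd by b::user is null;
names = foreach no_orders generate a::username;
```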
Self join: load the same dataset twice under different aliases
a = load 'data' as (node, parentid, name);
b = load 'data' as (node, parentid, name);
jnd = join a by node, b by parentid;
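After a self join, every field name exists on both sides, so each reference needs a prefix. The sketch below pairs each row with its children under that assumption; the output names parent_name and child_name are hypothetical.

```pig
a = load 'data' as (node, parentid, name);
b = load 'data' as (node, parentid, name);
jnd = join a by node, b by parentid;
-- a is the parent row, b is the child row whose parentid equals a's node
pairs = foreach jnd generate a::name as parent_name, b::name as child_name;
```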
Time: 2024-10-12 03:07:03