MapReduce Multi-Table Join

Problem description:

There are two files: file 1 stores company names and city IDs, and file 2 stores city IDs and city names.

Table 1:

factoryname,addressed
Beijing Red Star,1
Shenzhen Thunder,3
Guangzhou Honda,2
Beijing Rising,1
Guangzhou Development Bank,2
Tencent,3
Back of Beijing,1

Table 2:

1,Beijing
2,Guangzhou
3,Shenzhen
4,Xian

The task is to output each company name together with the name of its city. For example:

Beijing Red Star Beijing

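This is similar to a multi-table join in a database, and the overall approach is much the same as for a single-table join. We again rely on the reduce phase to group records by city ID. In the map phase, every record is emitted with key = city ID and value = flag + "+" + city name or company name (flag 1 marks a city name, flag 2 a company name). The reducer then parses the flag to decide whether the rest of the value is a city name or a company name, puts it into one of two lists accordingly, and finally emits the Cartesian product of the two lists.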
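The approach can be tried out locally before involving Hadoop at all. The following is only a minimal sketch under stated assumptions: the class `ReduceSideJoinSketch` and its methods `mapLine` and `join` are hypothetical names introduced for illustration, a `TreeMap` stands in for the shuffle that groups values by key, and each result line is formatted as company name, tab, city name to match the example output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of the reduce-side join described above.
class ReduceSideJoinSketch {

    // "Map" step: turn one input line into {cityId, taggedValue}; null for the header row.
    static String[] mapLine(String line) {
        String[] f = line.split(",", 2);
        if (f[0].equals("factoryname")) {
            return null; // header row of table 1
        }
        if (f[0].length() == 1) {
            // Table 2 row (cityId,cityName); same one-character test as the mapper.
            return new String[] {f[0], "1+" + f[1]};
        }
        // Table 1 row (companyName,cityId).
        return new String[] {f[1], "2+" + f[0]};
    }

    // "Shuffle" and "reduce": group tagged values by city ID, then cross the two lists.
    static List<String> join(List<String> lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] kv = mapLine(line);
            if (kv != null) {
                groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (List<String> values : groups.values()) {
            List<String> cities = new ArrayList<>();
            List<String> companies = new ArrayList<>();
            for (String v : values) {
                if (v.startsWith("1+")) {
                    cities.add(v.substring(2));
                } else {
                    companies.add(v.substring(2));
                }
            }
            // Cartesian product; a city ID with no companies (or no city) yields nothing.
            for (String company : companies) {
                for (String city : cities) {
                    out.add(company + "\t" + city);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "factoryname,addressed",
            "Beijing Red Star,1",
            "Tencent,3",
            "1,Beijing",
            "3,Shenzhen",
            "4,Xian");
        for (String row : join(lines)) {
            System.out.println(row);
        }
    }
}
```

City ID 4 (Xian) has no matching company, so it contributes nothing to the result, which is the behaviour of an inner join.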

The code:

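import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        String line = ivalue.toString();
        StringTokenizer st = new StringTokenizer(line, ",");
        String value0 = st.nextToken();
        String value1 = st.nextToken();
        // Skip the header row of table 1.
        if (value0.compareTo("factoryname") != 0) {
            if (value0.length() == 1) {
                // Table 2 row (cityID,cityName): a one-character first field is
                // treated as a city ID, so this only works for single-digit IDs.
                context.write(new Text(value0), new Text("1" + "+" + value1));
            } else {
                // Table 1 row (companyName,cityID): key on the city ID.
                context.write(new Text(value1), new Text("2" + "+" + value0));
            }
        }
    }
}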
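The mapper tells the two tables apart by checking whether the first field is exactly one character long, which stops working once a city ID reaches 10. One possible alternative, shown here only as a sketch, is to test whether the first field consists entirely of digits. The class `RecordKind` and the method `isCityRow` are hypothetical names, not part of the original code, and the check assumes that no company name is purely numeric.

```java
// Hypothetical helper, not part of the original mapper.
class RecordKind {

    // Returns true when the first field looks like a numeric city ID (a table 2 row).
    // Assumption: company names are never made up of digits only.
    static boolean isCityRow(String firstField) {
        if (firstField.isEmpty()) {
            return false;
        }
        for (int i = 0; i < firstField.length(); i++) {
            if (!Character.isDigit(firstField.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isCityRow("3"));       // single-digit city ID
        System.out.println(isCityRow("12"));      // multi-digit city ID
        System.out.println(isCityRow("Tencent")); // company name
    }
}
```

In the mapper, the condition `value0.length()==1` could then be swapped for `RecordKind.isCityRow(value0)`. Another common option in Hadoop is to decide by the name of the input file the record came from, which avoids guessing from the content.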

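import java.io.IOException;
import java.util.ArrayList;
import java.util.StringTokenizer;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Split the values for this city ID into city names and company names.
        ArrayList<String> address = new ArrayList<String>();
        ArrayList<String> factory = new ArrayList<String>();
        for (Text val : values) {
            String line = val.toString();
            StringTokenizer st = new StringTokenizer(line, "+");
            int flag = Integer.parseInt(st.nextToken());
            if (flag == 1) {
                // Flag 1: the payload is a city name.
                String addressname = st.nextToken();
                address.add(addressname);
            } else if (flag == 2) {
                // Flag 2: the payload is a company name.
                String factoryname = st.nextToken();
                factory.add(factoryname);
            }
        }
        // Emit the Cartesian product once all values have been collected.
        if (address.size() != 0 && factory.size() != 0) {
            for (int i = 0; i < address.size(); i++) {
                for (int j = 0; j < factory.size(); j++) {
                    // Company name first, then city name, matching the example output.
                    context.write(new Text(factory.get(j)), new Text(address.get(i)));
                }
            }
        }
    }
}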
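One detail of the reducer is worth a small experiment: because it splits the tagged value with a `StringTokenizer` on "+", a company or city name that itself contains "+" would be cut off at that character. A possible way around this, sketched here under the assumption that the flag never contains "+", is to split only at the first separator. The class `TaggedValue` and its `parse` method are hypothetical names introduced for illustration.

```java
// Hypothetical helper, not part of the original reducer.
class TaggedValue {
    final int flag;       // 1 = city name, 2 = company name
    final String payload; // the name that follows the first '+'

    TaggedValue(int flag, String payload) {
        this.flag = flag;
        this.payload = payload;
    }

    // Splits "flag+payload" at the first '+' only, so the payload may contain '+'.
    static TaggedValue parse(String tagged) {
        int sep = tagged.indexOf('+');
        if (sep < 0) {
            throw new IllegalArgumentException("missing '+' separator: " + tagged);
        }
        int flag = Integer.parseInt(tagged.substring(0, sep));
        return new TaggedValue(flag, tagged.substring(sep + 1));
    }

    public static void main(String[] args) {
        TaggedValue v = parse("2+A+B Trading");
        System.out.println(v.flag + " -> " + v.payload);
    }
}
```

With the sample data in this post none of the names contain "+", so the original parsing gives the same result; the sketch only matters for less tidy input.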

Time: 2024-10-05 10:22:33
