Item-based collaborative filtering with MapReduce:
The implementation chains several MapReduce jobs.
Initial data (user,item,grade):
u1,i101,5.0
u1,i102,3.0
u1,i103,2.5
u2,i101,2.0
u2,i102,2.5
u2,i103,5.0
u2,i104,2.0
u3,i101,2.0
u3,i104,4.0
u3,i105,4.5
u3,i107,5.0
u4,i101,5.0
u4,i103,3.0
u4,i104,4.5
u4,i106,4.0
u5,i101,4.0
u5,i102,3.0
u5,i103,2.0
u5,i104,4.0
u5,i105,3.5
u5,i106,4.0
job1: build the user-item preference matrix
Input: the initial data
map:
key=userid
value=item:grade
reduce:
key=userid
value=item:grade,item:grade,...
Result:
u1 i101:5.0,i102:3.0,i103:2.5
u2 i101:2.0,i102:2.5,i103:5.0,i104:2.0
u3 i107:5.0,i105:4.5,i104:4.0,i101:2.0
u4 i106:4.0,i103:3.0,i101:5.0,i104:4.5
u5 i104:4.0,i105:3.5,i106:4.0,i101:4.0,i102:3.0,i103:2.0
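The grouping done by job1 can be tried locally without a cluster. The following is a minimal sketch in plain Java, not the original Hadoop code: the class name `Job1Sketch` and its methods are hypothetical, and the in-memory map only stands in for the shuffle that Hadoop would perform between map and reduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local stand-in for job1; not the original Hadoop classes.
class Job1Sketch {
    // Map phase: parse "user,item,grade" into key=user, value="item:grade".
    static String[] map(String line) {
        String[] f = line.split(",");
        return new String[] { f[0], f[1] + ":" + f[2] };
    }

    // Reduce phase: join all "item:grade" values of one user with commas.
    static String reduce(List<String> values) {
        return String.join(",", values);
    }

    // Stand-in for the shuffle: group map output by key, then reduce each group.
    static Map<String, String> run(List<String> lines) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            String[] kv = map(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.put(e.getKey(), reduce(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "u1,i101,5.0", "u1,i102,3.0", "u1,i103,2.5", "u2,i101,2.0");
        run(lines).forEach((user, items) -> System.out.println(user + " " + items));
    }
}
```

In a real job the order of values inside one user's line depends on the shuffle, which is why the job1 result above does not list items in sorted order.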
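job2: build the item-item co-occurrence matrix
Input: the output of job1
map:
For each user, pair up that user's items in a nested loop. For example, u1's items i101, i102, i103 yield every ordered pair, including each item paired with itself.
key=item:item
value=1
reduce:
key=item:item
value=n (the number of users who rated both items)
Result: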
i101:i101 5 i101:i102 3 i101:i103 4 i101:i104 4 i101:i105 2 i101:i106 2 i101:i107 1
i102:i101 3 i102:i102 3 i102:i103 3 i102:i104 2 i102:i105 1 i102:i106 1
i103:i101 4 i103:i102 3 i103:i103 4 i103:i104 3 i103:i105 1 i103:i106 2
i104:i101 4 i104:i102 2 i104:i103 3 i104:i104 4 i104:i105 2 i104:i106 2 i104:i107 1
i105:i101 2 i105:i102 1 i105:i103 1 i105:i104 2 i105:i105 2 i105:i106 1 i105:i107 1
i106:i101 2 i106:i102 1 i106:i103 2 i106:i104 2 i106:i105 1 i106:i106 2
i107:i101 1 i107:i104 1 i107:i105 1 i107:i107 1
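The pair counting of job2 can be sketched the same way. This is an illustrative plain-Java version under the assumption that each input value has the job1 form `item:grade,item:grade,...`; the class name `Job2Sketch` is hypothetical, and a sorted in-memory map replaces the shuffle and reduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for job2; not the original Hadoop classes.
class Job2Sketch {
    // Map phase: from one job1 value such as "i101:5.0,i102:3.0", emit every
    // ordered item pair "a:b", self-pairs included. Each pair counts as 1.
    static List<String> map(String itemGrades) {
        String[] entries = itemGrades.split(",");
        List<String> pairs = new ArrayList<>();
        for (String a : entries) {
            for (String b : entries) {
                pairs.add(a.split(":")[0] + ":" + b.split(":")[0]);
            }
        }
        return pairs;
    }

    // Stand-in for shuffle and reduce: sum the 1s emitted for each pair.
    static Map<String, Integer> run(List<String> job1Values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : job1Values) {
            for (String pair : map(value)) {
                counts.merge(pair, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "i101:5.0,i102:3.0,i103:2.5",
                "i101:2.0,i102:2.5,i103:5.0,i104:2.0");
        run(values).forEach((pair, n) -> System.out.println(pair + " " + n));
    }
}
```

Because every ordered pair is emitted, the resulting matrix is symmetric, and the diagonal entry for an item is simply the number of users who rated it.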
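job3: multiply the co-occurrence matrix by the user preference matrix
Input: the output of job1 and job2
map:
The two inputs need different handling; tell them apart by the directory the input split comes from:
// name of the parent directory of the file this split belongs to
FileSplit split = (FileSplit)context.getInputSplit();
dirName = split.getPath().getParent().getName();
job1 records after map (one record per item the user rated):
i101 B:u1,5.0
i102 B:u1,3.0
i103 B:u1,2.5
job2 records after map:
i101 A:i101,5
key=item
value=B:u1,5.0 or A:i101,5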
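The tagging logic of this map step can be isolated from Hadoop as a pure function. The sketch below makes several assumptions that are not in the original: the directory names `job1` and `job2` are hypothetical, input lines are assumed to be tab-separated key and value (the default of Hadoop's text output), and for a co-occurrence pair `a:b` the first item is used as the key. Since the co-occurrence matrix is symmetric, using the second item instead would yield the same set of records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical local stand-in for the job3 mapper; not the original Hadoop class.
class Job3MapSketch {
    // Returns {key, value} pairs for one input line, tagged A or B by source directory.
    static List<String[]> map(String dirName, String line) {
        String[] kv = line.split("\t");
        List<String[]> out = new ArrayList<>();
        if (dirName.equals("job1")) {
            // kv[0] = user, kv[1] = "item:grade,item:grade,..."
            for (String entry : kv[1].split(",")) {
                String[] itemGrade = entry.split(":");
                out.add(new String[] { itemGrade[0], "B:" + kv[0] + "," + itemGrade[1] });
            }
        } else {
            // kv[0] = "itemA:itemB", kv[1] = co-occurrence count
            String[] items = kv[0].split(":");
            out.add(new String[] { items[0], "A:" + items[1] + "," + kv[1] });
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] r : map("job1", "u1\ti101:5.0,i102:3.0,i103:2.5")) {
            System.out.println(r[0] + " " + r[1]);
        }
        for (String[] r : map("job2", "i101:i101\t5")) {
            System.out.println(r[0] + " " + r[1]);
        }
    }
}
```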
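reduce:
For each item key, multiply every A value (co-occurrence count) by every B value (user grade).
key=user
value=item,score
For example, under key i101, A:i105,2 and B:u2,2.0 give 2 × 2.0 = 4.0:
u2 i105,4.0
Result: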
The scores printed here are running totals within each reduce call rather than individual products: under key i101 the first record is 2 × 2.0 = 4.0, and the second, u1 i105,14.0, is 4.0 + 2 × 5.0. The job4 totals correspond to the individual products (for u1 and i105: 10.0 + 3.0 + 2.5 = 15.5).
(reduce key i101)
u2 i105,4.0 u1 i105,14.0 u4 i105,24.0 u3 i105,28.0 u5 i105,36.0
u2 i104,44.0 u1 i104,64.0 u4 i104,84.0 u3 i104,92.0 u5 i104,108.0
u2 i107,110.0 u1 i107,115.0 u4 i107,120.0 u3 i107,122.0 u5 i107,126.0
u2 i106,130.0 u1 i106,140.0 u4 i106,150.0 u3 i106,154.0 u5 i106,162.0
u2 i101,172.0 u1 i101,197.0 u4 i101,222.0 u3 i101,232.0 u5 i101,252.0
u2 i103,260.0 u1 i103,280.0 u4 i103,300.0 u3 i103,308.0 u5 i103,324.0
u2 i102,330.0 u1 i102,345.0 u4 i102,360.0 u3 i102,366.0 u5 i102,378.0
(reduce key i102)
u2 i105,2.5 u1 i105,5.5 u5 i105,8.5
u2 i104,13.5 u1 i104,19.5 u5 i104,25.5
u2 i106,28.0 u1 i106,31.0 u5 i106,34.0
u2 i101,41.5 u1 i101,50.5 u5 i101,59.5
u2 i103,67.0 u1 i103,76.0 u5 i103,85.0
u2 i102,92.5 u1 i102,101.5 u5 i102,110.5
(reduce key i103)
u2 i105,5.0 u1 i105,7.5 u4 i105,10.5 u5 i105,12.5
u2 i104,27.5 u1 i104,35.0 u4 i104,44.0 u5 i104,50.0
u2 i106,60.0 u1 i106,65.0 u4 i106,71.0 u5 i106,75.0
u2 i101,95.0 u1 i101,105.0 u4 i101,117.0 u5 i101,125.0
u2 i103,145.0 u1 i103,155.0 u4 i103,167.0 u5 i103,175.0
u2 i102,190.0 u1 i102,197.5 u4 i102,206.5 u5 i102,212.5
(reduce key i104)
u2 i105,4.0 u4 i105,13.0 u3 i105,21.0 u5 i105,29.0
u2 i104,37.0 u4 i104,55.0 u3 i104,71.0 u5 i104,87.0
u2 i107,89.0 u4 i107,93.5 u3 i107,97.5 u5 i107,101.5
u2 i106,105.5 u4 i106,114.5 u3 i106,122.5 u5 i106,130.5
u2 i101,138.5 u4 i101,156.5 u3 i101,172.5 u5 i101,188.5
u2 i103,194.5 u4 i103,208.0 u3 i103,220.0 u5 i103,232.0
u2 i102,236.0 u4 i102,245.0 u3 i102,253.0 u5 i102,261.0
(reduce key i105)
u3 i105,9.0 u5 i105,16.0
u3 i104,25.0 u5 i104,32.0
u3 i107,36.5 u5 i107,40.0
u3 i106,44.5 u5 i106,48.0
u3 i101,57.0 u5 i101,64.0
u3 i103,68.5 u5 i103,72.0
u3 i102,76.5 u5 i102,80.0
(reduce key i106)
u4 i105,4.0 u5 i105,8.0
u4 i104,16.0 u5 i104,24.0
u4 i106,32.0 u5 i106,40.0
u4 i101,48.0 u5 i101,56.0
u4 i103,64.0 u5 i103,72.0
u4 i102,76.0 u5 i102,80.0
(reduce key i107)
u3 i105,5.0 u3 i104,10.0 u3 i107,15.0 u3 i101,20.0
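The reduce step of job3 amounts to a cross product of the A and B values that share an item key. Here is a minimal plain-Java sketch of that idea, assuming the `A:item,count` and `B:user,grade` value formats described above; the class name `Job3ReduceSketch` is hypothetical. Each output score is computed as a fresh product, so nothing is carried over from one record to the next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local stand-in for the job3 reducer; not the original Hadoop class.
class Job3ReduceSketch {
    // values: all A and B records that arrived under one item key.
    // Returns lines of the form "user<TAB>item,score".
    static List<String> reduce(List<String> values) {
        Map<String, Integer> cooccurrence = new LinkedHashMap<>(); // A: other item -> count
        Map<String, Double> grades = new LinkedHashMap<>();        // B: user -> grade
        for (String v : values) {
            String[] f = v.substring(2).split(",");
            if (v.startsWith("A:")) {
                cooccurrence.put(f[0], Integer.parseInt(f[1]));
            } else if (v.startsWith("B:")) {
                grades.put(f[0], Double.parseDouble(f[1]));
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> a : cooccurrence.entrySet()) {
            for (Map.Entry<String, Double> b : grades.entrySet()) {
                // Independent product per record: count * grade.
                double score = a.getValue() * b.getValue();
                out.add(b.getKey() + "\t" + a.getKey() + "," + score);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A small part of what arrives under key i101.
        List<String> values = Arrays.asList("A:i105,2", "B:u2,2.0", "B:u1,5.0");
        reduce(values).forEach(System.out::println);
    }
}
```

Buffering both sides in memory is acceptable for this toy data set; with a large item catalogue a real job would need a different strategy, which is outside the scope of this sketch.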
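job4: sum the partial products of the matrix multiplication
map:
No special processing; records pass through unchanged.
key: user
value: item,score
reduce:
Add up the scores that share the same user and item.
key: user
value: item,score
Result: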
u1 i105:15.5 u1 i104:33.5 u1 i107:5.0 u1 i106:18.0 u1 i101:44.0 u1 i103:39.0 u1 i102:31.5
u2 i105:15.5 u2 i104:36.0 u2 i107:4.0 u2 i106:20.5 u2 i101:45.5 u2 i103:41.5 u2 i102:32.5
u3 i105:26.0 u3 i104:38.0 u3 i107:15.5 u3 i106:16.5 u3 i101:40.0 u3 i103:24.5 u3 i102:18.5
u4 i105:26.0 u4 i104:55.0 u4 i107:9.5 u4 i106:33.0 u4 i101:63.0 u4 i103:53.5 u4 i102:37.0
u5 i105:32.0 u5 i104:59.0 u5 i107:11.5 u5 i106:34.5 u5 i101:68.0 u5 i103:56.5 u5 i102:42.5
These totals are each user's preference score for every item.
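The summation in job4 can be checked by hand for a single user. The sketch below is a plain-Java illustration with a hypothetical class name `Job4Sketch`; it takes the `item,score` values that would arrive under one user key and adds them up per item. The sample input holds the individual products for u1: i105 receives 2 × 5.0 from i101, 1 × 3.0 from i102 and 1 × 2.5 from i103, and i107 receives 1 × 5.0 from i101.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for the job4 reducer; not the original Hadoop class.
class Job4Sketch {
    // values: "item,score" records that arrived under one user key.
    // Returns item -> summed score for that user.
    static Map<String, Double> reduce(List<String> values) {
        Map<String, Double> totals = new TreeMap<>();
        for (String v : values) {
            String[] f = v.split(",");
            totals.merge(f[0], Double.parseDouble(f[1]), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> u1 = Arrays.asList("i105,10.0", "i105,3.0", "i105,2.5", "i107,5.0");
        reduce(u1).forEach((item, score) -> System.out.println("u1 " + item + ":" + score));
    }
}
```

The two totals agree with the u1 entries for i105 and i107 in the job4 result.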
Time: 2024-10-27 13:00:37