Usage is as follows:
set mapred.reduce.tasks = 300;
add file /home/work/process.py;
insert overwrite directory '/mydir/'
select * from (
    from (
        select id, name from hive_table_one where name = '张三'
    ) one
    join (
        select id, name from hive_table_two where name = '李四'
    ) two
    on one.id = two.id
    reduce one.id, one.name, two.id, two.name
    using '/home/sharelib/python/bin/python process.py'
    as id, name
) redall
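Take care when the process.py script handles NULL values from a Hive table: in the rows Hive streams to the script, NULL arrives as the literal two-character string \N, not as an empty field.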
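# Check whether name is NULL when the rows are streamed to the script by Hive.
# The backslash is escaped so the literal is exactly the two characters \N.
if name == '\\N':
    pass  # handle the NULL case here

# If the query result is first saved as text and processed afterwards, the check becomes:
if name == 'NULL':
    pass  # handle the NULL case here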
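To show how the NULL check fits into a whole script, here is a minimal sketch of what process.py might look like. It is not the original script: it assumes Hive passes each row as one tab-separated line on standard input, in the order given in the reduce clause (one.id, one.name, two.id, two.name), and expects tab-separated id and name on standard output. The names HIVE_NULL, process_line and main, and the rule of falling back to two.name when one.name is NULL, are hypothetical and chosen only for illustration.

```python
import sys

# Hive streams NULL to the script as the two characters backslash + N.
HIVE_NULL = "\\N"


def process_line(line):
    """Turn one tab-separated input row into an "id<TAB>name" row, or None to skip it."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return None  # skip malformed rows
    one_id, one_name, two_id, two_name = fields
    # Hypothetical rule: prefer one.name, fall back to two.name when it is NULL.
    name = one_name if one_name != HIVE_NULL else two_name
    if name == HIVE_NULL:
        return None  # both names are NULL, so emit nothing
    return "\t".join([one_id, name])


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from Hive on stdin and write the processed rows to stdout."""
    for line in stdin:
        row = process_line(line)
        if row is not None:
            stdout.write(row + "\n")
```

To use this as a streaming script, end the file with a call to main(). Keeping the per-row logic in process_line makes it possible to test the NULL handling locally without running a Hive job.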
Time: 2024-10-11 00:47:17