1. Download the jars
wget https://repo1.maven.org/maven2/org/mongodb/mongo-hadoop/mongo-hadoop-hive/2.0.2/mongo-hadoop-hive-2.0.2.jar
wget https://repo1.maven.org/maven2/org/mongodb/mongo-hadoop/mongo-hadoop-core/2.0.2/mongo-hadoop-core-2.0.2.jar
wget https://repo1.maven.org/maven2/org/mongodb/mongo-java-driver/3.6.0/mongo-java-driver-3.6.0.jar
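The three URLs above all follow the standard Maven repository layout, so the downloads can be scripted. The following is only a minimal sketch: the function names `maven_jar_url` and `download_jars` and the variable `MAVEN_BASE` are hypothetical helpers introduced here, not part of mongo-hadoop, and the sketch assumes Maven Central is reached over HTTPS.

```shell
#!/bin/sh
# Base URL of Maven Central (assumption: accessed over HTTPS).
MAVEN_BASE="https://repo1.maven.org/maven2"

# maven_jar_url GROUP_PATH ARTIFACT VERSION
# Prints the jar URL using the standard Maven layout:
#   <base>/<group path>/<artifact>/<version>/<artifact>-<version>.jar
maven_jar_url() {
    printf '%s/%s/%s/%s/%s-%s.jar\n' "$MAVEN_BASE" "$1" "$2" "$3" "$2" "$3"
}

# download_jars DEST_DIR
# Fetches the three jars used in this article into DEST_DIR.
# Stops at the first failed download.
download_jars() {
    dest="$1"
    mkdir -p "$dest" || return 1
    wget -P "$dest" "$(maven_jar_url org/mongodb/mongo-hadoop mongo-hadoop-hive 2.0.2)" || return 1
    wget -P "$dest" "$(maven_jar_url org/mongodb/mongo-hadoop mongo-hadoop-core 2.0.2)" || return 1
    wget -P "$dest" "$(maven_jar_url org/mongodb mongo-java-driver 3.6.0)" || return 1
}
```

Calling `download_jars /tmp/mongo-jars` (the directory name is just an example) would then fetch all three files into one place before they are copied to the cluster nodes.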
2. Put the jars in Hive's lib directory
My Hadoop cluster is a CDH deployment, so Hive's lib directory is:
/opt/cloudera/parcels/CDH/lib/hive/lib
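Copy the three jars above into this directory on every node of the cluster, then create version-less symbolic links in that directory, as follows: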
ln -s mongo-hadoop-hive-2.0.2.jar mongo-hadoop-hive.jar
ln -s mongo-hadoop-core-2.0.2.jar mongo-hadoop-core.jar
ln -s mongo-java-driver-3.6.0.jar mongo-java-driver.jar
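Since the same links have to be created on every node, the three `ln -s` commands can be wrapped in a small helper. This is a sketch only: `link_unversioned` is a hypothetical function name, and it assumes each jar is named `<artifact>-<version>.jar` with a version that starts with a digit, as the three jars here are.

```shell
#!/bin/sh
# link_unversioned DIR JAR...
# For each versioned jar (e.g. foo-1.2.3.jar) that exists in DIR,
# create a relative symlink DIR/foo.jar -> foo-1.2.3.jar.
link_unversioned() {
    dir="$1"
    shift
    for jar in "$@"; do
        if [ ! -f "$dir/$jar" ]; then
            echo "missing: $dir/$jar" >&2
            return 1
        fi
        # Strip the trailing "-<version>.jar" and append ".jar" again.
        link="${jar%-[0-9]*.jar}.jar"
        # -f replaces an existing link, so the function can be re-run safely.
        ln -sf "$jar" "$dir/$link" || return 1
    done
}
```

On each node this could be run as `link_unversioned /opt/cloudera/parcels/CDH/lib/hive/lib mongo-hadoop-hive-2.0.2.jar mongo-hadoop-core-2.0.2.jar mongo-java-driver-3.6.0.jar`.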
3. Insert some test data into MongoDB:
db.student.insert({"name":"张三","age":"22","sex":"男","class":"计算机2班"}); // the collection is created if it does not exist yet
db.student.insert({"name":"李四","age":"23","sex":"女","class":"计算机3班"});
db.student.insert({"name":"王五","age":"24","sex":"男","class":"计算机2班"});
db.student.insert({"name":"刘六","age":"25","sex":"男","class":"计算机3班"});
db.student.insert({"name":"赵七","age":"26","sex":"女","class":"计算机3班"});
db.student.insert({"name":"吴八","age":"28","sex":"女","class":"计算机2班"});
4. Create the table in Hive
create external table student
( id string,
name string,
age string,
sex string,
class string
)
stored by 'com.mongodb.hadoop.hive.MongoStorageHandler'
with serdeproperties('mongo.columns.mapping'='{"id":"_id","name":"name","age":"age","sex":"sex","class":"class"}')
tblproperties('mongo.uri'='mongodb://root:[email protected]:40000/test_v3.student');
Query the data as follows:
select * from student;
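The data is synchronized in real time: each Hive query reflects the current contents of the MongoDB collection.
Insert a record in MongoDB: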
db.student.insert({"name":"杨十","age":"28","sex":"男","class":"计算机3班"});
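Run the query again and the new record shows up.
Update a record in MongoDB:
db.student.update({"name":"张三"},{$set:{"name":"张无忌"}}); // use $set to change only the value of one key
Delete a record in MongoDB:
db.student.remove({"name":"张无忌"}); // delete
As you can see, 张无忌 is gone from the query results; a moment of silence for him.
Original article: https://www.cnblogs.com/xiqing/p/9673834.html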