Overview: export the MongoDB data with mongodump, upload the dump to HDFS, and then create an external table over it in Hive.
1. Export the collection with mongodump
mongodump --host=localhost:27017 --db=mydb --collection=users --out=/tmp/root/mongodump0712
# mongodump --host=localhost:27017 --db=mydb --collection=users --out=/tmp/root/mongodump0712
2018-07-12T10:07:27.894+0800 writing mydb.users to
2018-07-12T10:07:27.896+0800 done dumping mydb.users (2 documents)
# cd /tmp/root
# ls
3604abd2-a359-4c53-a7b4-e4ea84185801  3604abd2-a359-4c53-a7b4-e4ea841858017799130181720133073.pipeout  dump  hive.log  hive.log.2018-07-11  mongodump0712
# ll
total 624
drwx------. 2 root root      6 Jul 12 09:34 3604abd2-a359-4c53-a7b4-e4ea84185801
-rw-r--r--. 1 root root      0 Jul 12 09:34 3604abd2-a359-4c53-a7b4-e4ea841858017799130181720133073.pipeout
drwxr-xr-x. 5 root root     44 Jul 12 10:04 dump
-rw-r--r--. 1 root root  88700 Jul 12 09:39 hive.log
-rw-r--r--. 1 root root 547126 Jul 11 21:07 hive.log.2018-07-11
drwxr-xr-x. 3 root root     18 Jul 12 10:07 mongodump0712
# cd mongodump0712/
# ls
mydb
# cd mydb
# ls
users.bson  users.metadata.json
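Before uploading, the dump can optionally be sanity-checked with bsondump (shipped with the MongoDB tools), which decodes a .bson file to JSON on stdout. The output shown below is an illustration reconstructed from the query results in step 3, not captured from the original run:
# bsondump /tmp/root/mongodump0712/mydb/users.bson
{"_id":{"$oid":"5b456e33a93daf7ae53e6419"},"user_id":"abc123","age":58,"status":"D"}
{"_id":{"$oid":"5b45705ca93daf7ae53e8b2a"},"user_id":"bcd001","age":45,"status":"C"}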
2. Upload the dump file to HDFS
hdfs dfs -mkdir /user/hive/warehouse/mongo
hdfs dfs -put /tmp/root/mongodump0712/mydb/users.bson /user/hive/warehouse/mongo/
# hdfs dfs -mkdir /user/hive/warehouse/mongo
# hdfs dfs -put /tmp/root/mongodump0712/mydb/users.bson /user/hive/warehouse/mongo/
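It is worth confirming that the file actually landed in HDFS before wiring up the Hive table; the listing should show a single users.bson entry under the new directory:
# hdfs dfs -ls /user/hive/warehouse/mongo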
3. Create the table and test
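The BSONSerDe, BSONFileInputFormat and HiveBSONFileOutputFormat classes used below come from the mongo-hadoop connector, so Hive needs the connector jars on its classpath before the DDL will run. One common approach is to add them per session (the paths and versions here are placeholders, adjust them to your installation); alternatively, the jars can be listed in hive.aux.jars.path.
hive> ADD JAR /path/to/mongo-hadoop-core-2.0.2.jar;
hive> ADD JAR /path/to/mongo-hadoop-hive-2.0.2.jar;
hive> ADD JAR /path/to/mongo-java-driver-3.6.3.jar;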
hive> create EXTERNAL table muser
> (
> id string,
> userid string,
> age bigint,
> status string
> )
    > row format serde 'com.mongodb.hadoop.hive.BSONSerDe'
    > WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","userid":"user_id","age":"age","status":"status"}')
    > stored as inputformat 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
    > outputformat 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
    > location '/user/hive/warehouse/muser';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:hdfs://ns1/user/hive/warehouse/muser is not a directory or unable to create one)
The first attempt fails because LOCATION points at /user/hive/warehouse/muser, a directory that does not exist and that Hive is unable to create (usually an HDFS permission issue). Retrying with LOCATION set to /user/hive/warehouse/mongo, the directory created in step 2 that already holds users.bson, succeeds:
hive> create EXTERNAL table muser
> (
> id string,
> userid string,
> age bigint,
> status string
> )
    > row format serde 'com.mongodb.hadoop.hive.BSONSerDe'
    > WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","userid":"user_id","age":"age","status":"status"}')
    > stored as inputformat 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
    > outputformat 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
    > location '/user/hive/warehouse/mongo';
OK
Time taken: 0.123 seconds
hive> select * from muser;
OK
5b456e33a93daf7ae53e6419 abc123 58 D
5b45705ca93daf7ae53e8b2a bcd001 45 C
Time taken: 0.181 seconds, Fetched: 2 row(s)
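Note that mongo.columns.mapping renames _id to id and user_id to userid on the Hive side, so further queries use the Hive column names. For example, filtering on age (given the two rows above, this should return only the abc123 row):
hive> select userid, age, status from muser where age > 50;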
Original post: https://www.cnblogs.com/abcdwxc/p/9298299.html