1. Creating tables
(1) The difference between internal (managed) and external tables
By default Hive creates an internal (managed) table. You may specify its directory; if you don't, a default directory is created under the warehouse. Once the table is dropped, that directory and the data in it are deleted as well.
When creating an external table you specify the storage directory with LOCATION; dropping the table removes only the metadata and leaves the directory and its data untouched.
# Create an external table
0: jdbc:hive2://192.168.163.102:10000> create external table t10(c1 int,c2 string) row format delimited fields terminated by ',' stored as textfile location "/dir1";
[[email protected] tmp]# hdfs dfs -put file1 /dir1
[[email protected] tmp]# hdfs dfs -ls -R /dir1
-rw-r--r-- 1 root supergroup 24 2017-11-25 20:53 /dir1/file1
0: jdbc:hive2://192.168.163.102:10000> drop table t10;
No rows affected (0.41 seconds)
[[email protected] tmp]# hdfs dfs -ls -R /dir1
17/11/25 20:56:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r-- 1 root supergroup 24 2017-11-25 20:53 /dir1/file1
# Create a default internal table
0: jdbc:hive2://192.168.163.102:10000> create table t2(c1 int,c2 string) row format delimited fields terminated by ',' stored as textfile;
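To confirm which kind of table a CREATE statement produced, desc formatted reports the table type (output trimmed to the relevant row; a sketch, not the full listing):
desc formatted t2;
-- ... among the output rows:
-- Table Type:    MANAGED_TABLE      (an external table shows EXTERNAL_TABLE here)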
(2) Storage file formats supported by Hive
textfile, sequencefile, orc, parquet, avro
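The CREATE syntax is identical for every format; only the STORED AS clause changes. For example (hypothetical table names, a sketch):
create table t_orc(c1 int, c2 string) stored as orc;
create table t_parquet(c1 int, c2 string) stored as parquet;
The sequencefile case is shown below with real session output.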
0: jdbc:hive2://192.168.163.102:10000> create table t5(c1 int,c2 string) row format delimited fields terminated by ',' stored as sequencefile ;
0: jdbc:hive2://192.168.163.102:10000> insert into t5 select * from t4;
# A file stored in sequencefile format cannot be viewed directly
[[email protected] tmp]# hdfs dfs -ls /user/hive/warehouse/testdb1.db/t5/
-rwxr-xr-x 1 root supergroup 146 2017-11-26 03:03 /user/hive/warehouse/testdb1.db/t5/000000_0
0: jdbc:hive2://192.168.163.102:10000> desc formatted t5;
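While hdfs dfs -cat prints raw binary for such a file, hdfs dfs -text understands the SequenceFile container and prints the decoded records:
# -text decodes SequenceFile (and compressed) content that -cat cannot display
hdfs dfs -text /user/hive/warehouse/testdb1.db/t5/000000_0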
2. Loading data into Hive
Syntax:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
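LOCAL means the file is read from the client machine's file system and copied in; without LOCAL the path refers to HDFS and the file is moved into the table directory. OVERWRITE first deletes whatever files the table (or partition) already holds. The PARTITION clause targets one partition of a partitioned table, e.g. (hypothetical partitioned table pt1, a sketch):
load data local inpath '/tmp/file1' into table pt1 partition (dt='2017-11-25');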
(1) Load a local file directly into a Hive table
0: jdbc:hive2://192.168.163.102:10000> load data local inpath '/tmp/file1' into table t1;
0: jdbc:hive2://192.168.163.102:10000> select * from t1;
+--------+--------+--+
| t1.c1 | t1.c2 |
+--------+--------+--+
| 1 | aaa |
| 2 | bbb |
| 3 | ccc |
| 4 | ddd |
+--------+--------+--+
(2) Load data into the table with OVERWRITE: all existing rows are replaced; in effect every file under the t1 directory is overwritten
0: jdbc:hive2://192.168.163.102:10000> load data local inpath '/tmp/file3' overwrite into table t1;
No rows affected (0.597 seconds)
0: jdbc:hive2://192.168.163.102:10000> select * from t1;
+--------+---------+--+
| t1.c1 | t1.c2 |
+--------+---------+--+
| 1 | yiyi |
| 2 | erer |
| 3 | sansan |
| 4 | sisi |
+--------+---------+--+
4 rows selected (0.073 seconds)
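This can be verified on HDFS: after the OVERWRITE, only the newly loaded file remains in the table directory (a sketch; names and timestamps will differ on your system):
hdfs dfs -ls /user/hive/warehouse/testdb1.db/t1
# expect a single file: the freshly loaded file3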
(3) Load a file on HDFS into a Hive table
[[email protected] tmp]# cat /tmp/file2
5,eee
[[email protected] tmp]# hdfs dfs -put /tmp/file2 /user/hive/warehouse/testdb1.db/t1
0: jdbc:hive2://192.168.163.102:10000> load data inpath '/user/hive/warehouse/testdb1.db/t1/file2' into table t1;
0: jdbc:hive2://192.168.163.102:10000> select * from t1;
+--------+--------+--+
| t1.c1 | t1.c2 |
+--------+--------+--+
| 1 | aaa |
| 2 | bbb |
| 3 | ccc |
| 4 | ddd |
| 5 | eee |
+--------+--------+--+
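Note that without LOCAL, LOAD DATA moves the HDFS file instead of copying it; in the example above the file was already inside the table directory, which hides the move. Staging the file elsewhere makes it visible (hypothetical /stage path, a sketch):
# stage the file somewhere outside the table directory
hdfs dfs -mkdir -p /stage
hdfs dfs -put /tmp/file2 /stage
# in beeline:  load data inpath '/stage/file2' into table t1;
# afterwards 'hdfs dfs -ls /stage' no longer shows file2: the load moved it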
(4) Create a new table from another table and insert the data in one statement (CTAS)
0: jdbc:hive2://192.168.163.102:10000> create table t2 as select * from t1;
(5) First create the table structure from another table, then insert the data
0: jdbc:hive2://192.168.163.102:10000> create table t3 like t1;
0: jdbc:hive2://192.168.163.102:10000> insert into t3 select * from t1;
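The two approaches differ slightly: CREATE TABLE ... LIKE copies the source table's storage format and properties, while CTAS builds the new table with default settings unless told otherwise. CTAS does accept a STORED AS clause (hypothetical table name, a sketch):
create table t3_orc stored as orc as select * from t1;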
3. Exporting query results to the file system
(1) Export query results to HDFS
0: jdbc:hive2://192.168.163.102:10000> select * from t1;
+--------+---------+--+
| t1.c1 | t1.c2 |
+--------+---------+--+
| 1 | yiyi |
| 2 | erer |
| 3 | sansan |
| 4 | sisi |
+--------+---------+--+
0: jdbc:hive2://192.168.163.102:10000> insert overwrite directory '/user/hive/warehouse/tmp' select * from testdb1.t1;
[[email protected] tmp]# hdfs dfs -ls -R /user/hive/warehouse/tmp
-rwxr-xr-x 1 root supergroup 30 2017-11-26 00:25 /user/hive/warehouse/tmp/000000_0
[[email protected] tmp]# hdfs dfs -get /user/hive/warehouse/tmp/000000_0 /tmp/
The field delimiter of the exported file is Ctrl+A, i.e. ASCII \001
[[email protected] tmp]# vim /tmp/000000_0
1^Ayiyi
2^Aerer
3^Asansan
4^Asisi
Use this file to create an external table, with \001 as the delimiter
0: jdbc:hive2://192.168.163.102:10000> create external table t5(c1 int,c2 string) row format delimited fields terminated by '\001' location '/user/hive/warehouse/tmp/';
0: jdbc:hive2://192.168.163.102:10000> select * from t5;
+--------+---------+--+
| t5.c1 | t5.c2 |
+--------+---------+--+
| 1 | yiyi |
| 2 | erer |
| 3 | sansan |
| 4 | sisi |
+--------+---------+--+
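If you would rather not deal with \001 at all, Hive 0.11+ lets INSERT OVERWRITE DIRECTORY take a ROW FORMAT clause, so the delimiter can be chosen at export time (hypothetical target directory, a sketch):
insert overwrite directory '/user/hive/warehouse/tmp_csv'
row format delimited fields terminated by ','
select * from testdb1.t1;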
(2) Export query results to the local file system
0: jdbc:hive2://192.168.163.102:10000> insert overwrite local directory '/tmp' select * from testdb1.t1;
[[email protected] tmp]# ls /tmp/000000_0
/tmp/000000_0
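Be careful with the target path: OVERWRITE replaces the contents of the directory you name, so pointing it at a shared directory such as /tmp is risky. A dedicated export directory is safer (hypothetical path, a sketch):
insert overwrite local directory '/tmp/t1_export' select * from testdb1.t1;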
4. INSERT
(1) Each INSERT in effect creates a new file under the table's directory
0: jdbc:hive2://192.168.163.102:10000> insert into t5 values(4,'sisi');
No rows affected (17.987 seconds)
0: jdbc:hive2://192.168.163.102:10000> dfs -ls /user/hive/warehouse/testdb1.db/t5 ;
+----------------------------------------------------------------------------------------------------------------+--+
| DFS Output |
+----------------------------------------------------------------------------------------------------------------+--+
| Found 2 items |
| -rwxr-xr-x 1 root supergroup 146 2017-11-26 03:03 /user/hive/warehouse/testdb1.db/t5/000000_0 |
| -rwxr-xr-x 1 root supergroup 106 2017-11-26 04:22 /user/hive/warehouse/testdb1.db/t5/000000_0_copy_1 |
+----------------------------------------------------------------------------------------------------------------+--+
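Because each INSERT lands in its own file, many small INSERTs leave many small files on HDFS. A common way to compact them is to rewrite the table over itself, which collapses the data into however many files the rewrite job emits (a sketch):
insert overwrite table t5 select * from t5;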
Original article: http://blog.51cto.com/darrenmemos/2066616