Hive Table Operations and Principles

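Hive
   Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides a SQL-like query language (HQL), translating queries into MapReduce jobs. Hive is not suited to online transaction processing (OLTP) and does not offer real-time queries; it works best for batch jobs over large volumes of immutable data.
   The Hive architecture has four parts:
       User interfaces:
           CLI: Hive's command-line mode, started with hive
           Client: Hive's remote service, started with hive --service hiveserver 10001 > /dev/null 2>/dev/null &
           WUI: Hive's web mode, started with hive --service hwi and accessed at http://hostname:9999/hwi
       Metadata storage:
           Hive stores its metadata in a database such as MySQL or Derby. The metadata includes table names, the columns and partitions of each table and their attributes, table properties (external or internal), and the directory that holds each table's data.
       Interpreter, compiler, and optimizer:
           These take an HQL statement through lexical analysis, parsing, compilation, optimization, and query plan generation. The generated plan is stored in HDFS and later executed by MapReduce.
       Data storage:
           Hive's data is stored in HDFS. Most queries are compiled into MapReduce jobs; only a few read the files directly, such as select * from tablename;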

Hive's default directory on HDFS is /user/hive/warehouse/, and every table is a directory on HDFS.

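The hive database in MySQL:
   TBLS table: stores information about every Hive table; MANAGED_TABLE marks an internal table and EXTERNAL_TABLE marks an external table
   COLUMNS_V2 table: stores the column information for every Hive table
   SDS table: stores the HDFS path of every Hive table's data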

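Internal table: the data is moved into the path the warehouse points to. On drop table tableName, Hive deletes the table structure from the metadata and also deletes the table's data from HDFS.
External table: Hive records the path where the data lives and does not change the data's location. On drop table tableName, Hive deletes only the table structure from the metadata and leaves the data on HDFS.
Partitioned table: Hive has no complex partition types (range, list, hash, or composite partitioning). A partition column is not an actual field in the table data but one or more pseudo-columns; the table's data files do not store the partition column's information or values.
Bucketed table: a Hive table can be divided into partitions and into buckets. Bucketing is implemented with the clustered by clause, and the data in each bucket can be ordered with sorted by. Buckets are mainly used for data sampling and for speeding up certain queries, such as map-side joins.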
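
To make the bucketing idea concrete, here is a minimal Python sketch of how rows might be assigned to buckets. It is a simplified model, not Hive's code: it assumes an integer bucketing column whose hash is the value itself, and the names bucket_for and distribute are hypothetical helpers introduced only for this illustration.

```python
# Simplified model of bucket assignment for an integer bucketing column.
# Assumption: the hash of an int is the int itself; other types hash differently.

INT_MAX = 0x7FFFFFFF  # mask that keeps the hash non-negative


def bucket_for(value: int, num_buckets: int) -> int:
    """Return the bucket index for an integer bucketing-column value."""
    return (value & INT_MAX) % num_buckets


def distribute(rows, num_buckets):
    """Group (id, name) rows into buckets keyed by the id column."""
    buckets = {i: [] for i in range(num_buckets)}
    for row in rows:
        buckets[bucket_for(row[0], num_buckets)].append(row)
    # Order rows inside each bucket by id, mirroring a sorted by (id) clause.
    for bucket_rows in buckets.values():
        bucket_rows.sort(key=lambda r: r[0])
    return buckets


if __name__ == "__main__":
    rows = [(7, "g"), (1, "a"), (4, "d"), (2, "b"), (5, "e")]
    for index, bucket_rows in distribute(rows, 3).items():
        print(index, bucket_rows)
```

Because equal ids always land in the same bucket, two tables bucketed the same way on the join key can be joined bucket by bucket, which is the intuition behind the map-side join mentioned above.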

Create an internal table
create table mytable1(id int, name string) row format delimited fields terminated by '\t';
Load local data into mytable1
load data local inpath '/usr/local/mytable' into table mytable1;
Load HDFS data into mytable1
load data inpath '/testdata/mytable' into table mytable1;
Load data with hadoop fs -put
hadoop fs -put mytable1 /user/hive/warehouse/mytable1

Create an external table
create external table mytable2(id int, name string) row format delimited fields terminated by '\t' location '/testtable';
Load local data into mytable2
load data local inpath '/usr/local/mytable1' into table mytable2;
Load HDFS data into mytable2
load data inpath '/testdata/mytable2' into table mytable2;
Load data with hadoop fs -put
hadoop fs -put /usr/local/mytable_test /testtable
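
The different drop table behavior of internal and external tables can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: ToyWarehouse is a hypothetical class, the metastore is a plain dict, and HDFS is a set of directory paths. It is not how Hive is implemented.

```python
class ToyWarehouse:
    """Hypothetical in-memory stand-in for the metastore plus HDFS."""

    def __init__(self):
        self.metastore = {}  # table name -> {"type": ..., "location": ...}
        self.hdfs = set()    # directories that currently hold data

    def create_table(self, name, location, external=False):
        table_type = "EXTERNAL_TABLE" if external else "MANAGED_TABLE"
        self.metastore[name] = {"type": table_type, "location": location}
        self.hdfs.add(location)

    def drop_table(self, name):
        # The metadata entry is removed for both kinds of table.
        table = self.metastore.pop(name)
        # The data directory is removed only for an internal (managed) table.
        if table["type"] == "MANAGED_TABLE":
            self.hdfs.discard(table["location"])


if __name__ == "__main__":
    warehouse = ToyWarehouse()
    warehouse.create_table("mytable1", "/user/hive/warehouse/mytable1")
    warehouse.create_table("mytable2", "/testtable", external=True)
    warehouse.drop_table("mytable1")
    warehouse.drop_table("mytable2")
    print(warehouse.metastore)  # both metadata entries are gone
    print(warehouse.hdfs)       # only the external table's directory remains
```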

Create a partitioned table
create table mypart(id int,name string) partitioned by (type string) row format delimited fields terminated by '\t';
Load local data into mypart
load data local inpath '/usr/local/mypart' into table mypart partition (type='pc');
Load HDFS data into mypart
load data inpath '/testdata/mypart2' into table mypart partition (type='phone');
Data placed by creating a directory and uploading files with hadoop fs -mkdir/-put cannot be queried yet (the dfs commands below are run from the Hive CLI)
dfs -mkdir /user/hive/warehouse/mypart/type=shoes;
dfs -put /usr/local/mypartshoes /user/hive/warehouse/mypart/type=shoes;
The data becomes queryable only after the following command makes Hive add a record for the shoes partition to the SDS table (add a partition)
alter table mypart add partition (type='shoes') location '/user/hive/warehouse/mypart/type=shoes';
Drop a partition
alter table mypart drop if exists partition (type='shoes');
List the partitions of mypart
show partitions mypart;
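
Why a manually uploaded partition directory stays invisible until it is registered can be sketched with another toy model. Assumptions: ToyPartitionedTable and partition_dir are hypothetical names, the metastore's partition records are a dict, and a query reads only directories that have such a record. The sketch also shows the partition value coming from the directory record rather than from the data file.

```python
def partition_dir(table_dir, spec):
    """Build the directory for a partition spec such as {"type": "shoes"}."""
    parts = [f"{col}={val}" for col, val in spec.items()]
    return "/".join([table_dir.rstrip("/")] + parts)


class ToyPartitionedTable:
    """Hypothetical model: files on disk plus partition records in a metastore."""

    def __init__(self, table_dir):
        self.table_dir = table_dir
        self.registered = {}  # partition values -> directory (metastore records)
        self.files = {}       # directory -> rows stored there (the file system)

    def put(self, directory, rows):
        # Like dfs -put: touches the file system only, not the metastore.
        self.files.setdefault(directory, []).extend(rows)

    def add_partition(self, spec, location=None):
        # Like alter table ... add partition: records the partition's directory.
        key = tuple(spec.values())
        self.registered[key] = location or partition_dir(self.table_dir, spec)

    def select_all(self):
        # Only registered partitions are read; the partition value is
        # appended as a pseudo-column because the files do not contain it.
        rows = []
        for values, directory in self.registered.items():
            for row in self.files.get(directory, []):
                rows.append(row + values)
        return rows


if __name__ == "__main__":
    table = ToyPartitionedTable("/user/hive/warehouse/mypart")
    shoes_dir = partition_dir(table.table_dir, {"type": "shoes"})
    table.put(shoes_dir, [(1, "boot")])
    print(table.select_all())  # nothing yet: the directory is not registered
    table.add_partition({"type": "shoes"})
    print(table.select_all())  # the row now appears with its partition value
```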

Show the statement that created a table
show create table mytable2;

Time: 2024-10-11 20:07:08
