Hive tips (1): automatic dynamic partitioning and renaming Hive table columns

Author: FuRenjie kwu

1. Assigning table partitions automatically with dynamic partitioning

set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table ods.fund2hundsunlg PARTITION(day)
select distinct fromHostIp, hundsunNodeIp, concat(substring(requestTime,0,10), ' ', substring(requestTime,12,8)), httpStatus, responseTimes, urlpath, responseCharts, postBody,
concat(substring(requestTime,0,4), substring(requestTime,6,2), substring(requestTime,9,2)) as day
from ods.fund2hundsunlog;

Notes:

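1) set hive.exec.dynamic.partition.mode=nonstrict; lets every partition column be filled dynamically. In the default strict mode Hive requires at least one static partition column, so PARTITION(day) with no fixed value would be rejected. On Hive versions before 0.9.0, hive.exec.dynamic.partition=true must also be set, because dynamic partitioning was disabled by default there.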

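2) concat(substring(requestTime,0,4), substring(requestTime,6,2), substring(requestTime,9,2)) as day derives the partition value by slicing the existing request timestamp, so a timestamp beginning with 2015-05-07 yields 20150507. Hive takes dynamic partition values from the last column(s) of the SELECT list, so this expression must come last; each distinct value is written to its own day=... partition.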
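The substring index arithmetic is easy to get wrong, so here is a minimal Python sketch that replays it outside Hive. It assumes Hive's substr semantics (1-based start position, a start of 0 treated like 1, a negative start counting back from the end) and a hypothetical requestTime format such as 2015-05-07T10:15:30+08:00. The helper names hive_substr, normalize_time, day_partition and route_rows are illustrative only; route_rows groups rows by their day value, roughly the way the dynamic partition insert writes each group under its own day=... directory.

```python
from collections import defaultdict


def hive_substr(s, pos, length):
    """Approximate Hive's substr(): 1-based start, 0 behaves like 1,
    a negative start counts back from the end of the string."""
    if length <= 0 or abs(pos) > len(s):
        return ""
    if pos > 0:
        start = pos - 1
    elif pos < 0:
        start = len(s) + pos
    else:
        start = 0
    return s[start:start + length]


def normalize_time(request_time):
    # concat(substring(requestTime,0,10), ' ', substring(requestTime,12,8))
    return hive_substr(request_time, 0, 10) + " " + hive_substr(request_time, 12, 8)


def day_partition(request_time):
    # concat(substring(requestTime,0,4), substring(requestTime,6,2), substring(requestTime,9,2))
    return (hive_substr(request_time, 0, 4)
            + hive_substr(request_time, 6, 2)
            + hive_substr(request_time, 9, 2))


def route_rows(request_times):
    """Group normalized timestamps by their target partition directory."""
    partitions = defaultdict(list)
    for t in request_times:
        partitions["day=" + day_partition(t)].append(normalize_time(t))
    return dict(partitions)


if __name__ == "__main__":
    # Hypothetical sample timestamps; the real requestTime format may differ.
    sample = ["2015-05-07T10:15:30+08:00", "2015-05-08T00:00:01+08:00"]
    print(route_rows(sample))
```

If your timestamps use a different layout, only the positions passed to hive_substr need to change; checking them here first is cheaper than rerunning the insert.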

2. Quickly renaming the columns of a Hive table

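1) Recreate the table with the new column names. Because ods.dratio is an EXTERNAL table, DROP TABLE removes only its metadata; the data files under /dw/ods/dratio stay where they are.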

drop table ods.dratio;

create EXTERNAL table ods.dratio (
dratioId string comment "User ID, e.g. user ID 860010-2370010130, registered user ID 860010-2370010131",
cookieId string comment "mcookie",
sex string comment "sex: 1 male, 2 female",
age string comment "age: 1 = 0-18, 2 = 19-29, 3 = 30-39, 4 = 40 and over",
ppt string comment "ppt: 1 high purchasing power, 2 medium purchasing power, 3 low purchasing power",
degree string comment "degree: 1 below undergraduate degree, 2 undergraduate degree or above",
favor string comment "preference information (variable length)",
commercial string comment "commercial value information (variable length)"
)
comment "User behavior analysis"
partitioned by (day string comment "daily partition column")
STORED AS TEXTFILE
location '/dw/ods/dratio';
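Recreating the table works without rewriting any files because Hive's default TEXTFILE SerDe maps fields to columns by position, not by name, and a partition column's value comes from the directory name rather than from the file. The Python sketch below imitates that read path under simplifying assumptions: Ctrl-A (\x01) as the field delimiter and \N as the NULL marker (Hive's defaults when no ROW FORMAT is given), and no escaping or collection types. parse_row, partition_from_path and the sample values are hypothetical.

```python
NULL_MARKER = "\\N"    # Hive's default serialization.null.format
FIELD_DELIM = "\x01"   # Hive's default TEXTFILE field delimiter (Ctrl-A)

NEW_COLUMNS = ["dratioId", "cookieId", "sex", "age",
               "ppt", "degree", "favor", "commercial"]


def partition_from_path(path):
    """Split the last path component, e.g. 'day=20150507' -> ('day', '20150507')."""
    last = path.rstrip("/").rsplit("/", 1)[-1]
    key, _, value = last.partition("=")
    return key, value


def parse_row(line, columns, partition):
    """Map delimited fields to column names purely by position."""
    fields = line.rstrip("\n").split(FIELD_DELIM)
    row = {}
    for i, name in enumerate(columns):
        # A missing trailing field reads as NULL (None here).
        value = fields[i] if i < len(fields) else None
        row[name] = None if value == NULL_MARKER else value
    # Fields beyond the declared columns are simply ignored.
    key, value = partition
    row[key] = value  # the partition value comes from the directory, not the file
    return row


if __name__ == "__main__":
    line = FIELD_DELIM.join(["860010-2370010130", "mc01", "1", "2", "1", "2", "sports", NULL_MARKER])
    print(parse_row(line, NEW_COLUMNS, partition_from_path("/dw/ods/dratio/day=20150507")))
```

Renaming therefore only swaps the name list. The flip side is that the recreated DDL must keep the original column order and count; otherwise the same bytes would be read into the wrong columns.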

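2) Re-attach the existing partition data; no data has to be moved. Dropping the table also dropped its partition metadata, so each existing day=... directory is registered again with ALTER TABLE ... ADD PARTITION.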

alter table ods.dratio add partition(day='20150507') location '/dw/ods/dratio/day=20150507';
alter table ods.dratio add partition(day='20150508') location '/dw/ods/dratio/day=20150508';
alter table ods.dratio add partition(day='20150509') location '/dw/ods/dratio/day=20150509';
alter table ods.dratio add partition(day='20150510') location '/dw/ods/dratio/day=20150510';
alter table ods.dratio add partition(day='20150511') location '/dw/ods/dratio/day=20150511';
alter table ods.dratio add partition(day='20150512') location '/dw/ods/dratio/day=20150512';
alter table ods.dratio add partition(day='20150513') location '/dw/ods/dratio/day=20150513';
alter table ods.dratio add partition(day='20150514') location '/dw/ods/dratio/day=20150514';
alter table ods.dratio add partition(day='20150515') location '/dw/ods/dratio/day=20150515';
alter table ods.dratio add partition(day='20150516') location '/dw/ods/dratio/day=20150516';
alter table ods.dratio add partition(day='20150517') location '/dw/ods/dratio/day=20150517';
alter table ods.dratio add partition(day='20150518') location '/dw/ods/dratio/day=20150518';
alter table ods.dratio add partition(day='20150519') location '/dw/ods/dratio/day=20150519';
alter table ods.dratio add partition(day='20150520') location '/dw/ods/dratio/day=20150520';
alter table ods.dratio add partition(day='20150521') location '/dw/ods/dratio/day=20150521';
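Typing one ALTER statement per day gets tedious for longer ranges. A small hypothetical Python helper, add_partition_statements, can generate them instead; it assumes the day=YYYYMMDD directory layout used above and only prints the statements, which you would then run in the Hive CLI or Beeline.

```python
from datetime import date, timedelta


def add_partition_statements(table, base_location, start, end):
    """Build one ADD PARTITION statement per day from start to end, inclusive."""
    statements = []
    current = start
    while current <= end:
        day = current.strftime("%Y%m%d")
        statements.append(
            f"alter table {table} add partition(day='{day}') "
            f"location '{base_location}/day={day}';"
        )
        current += timedelta(days=1)
    return statements


if __name__ == "__main__":
    for stmt in add_partition_statements("ods.dratio", "/dw/ods/dratio",
                                         date(2015, 5, 7), date(2015, 5, 21)):
        print(stmt)
```

Because these directories follow Hive's default key=value layout under the table location, MSCK REPAIR TABLE ods.dratio; should also register them in one step. Explicit statements remain useful when a partition lives outside the table location or only some days should be attached.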

Time: 2024-11-17 11:41:37
