hive 1.1.0 动态分区实现

hive (public)> explain insert overwrite table public_t_par partition(delivery_datekey) select * from public_oi_fact_partition;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-2 depends on stages: Stage-0
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: public_oi_fact_partition
            Statistics: Num rows: 110000 Data size: 35162211 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: order_datekey (type: int), oiid (type: bigint), custom_order_id (type: bigint), ciid (type: int), bi_name (type: string), siid (type: int), si_name (type: string), classify (type: string), status (type: int), status_text (type: string), class1_id (type: int), class1_name (type: string), class2_id (type: int), class2_name (type: string), city_id (type: int), city_name (type: string), operate_area (type: int), company_id (type: int), standard_item_num (type: int), package_num (type: double), expect_num (type: decimal(30,6)), price (type: decimal(30,6)), order_weight (type: decimal(30,6)), order_amount (type: decimal(30,6)), order_money (type: decimal(30,6)), ci_weight (type: decimal(30,6)), c_t (type: string), u_t (type: string), real_num (type: decimal(30,6)), real_weight (type: decimal(30,6)), real_money (type: decimal(30,6)), cost_price (type: decimal(30,6)), cost_money (type: decimal(30,6)), price_unit (type: string), order_money_coupon (type: decimal(30,6)), real_money_coupon (type: decimal(30,6)), real_price (type: decimal(30,6)), f_activity (type: int), activity_type (type: tinyint), is_activity (type: tinyint), original_price (type: decimal(30,6)), car_group_id (type: bigint), driver_id (type: string), expect_pay_way (type: int), desc (type: string), coupon_score_amount (type: decimal(30,6)), sale_area_id (type: int), delivery_area_id (type: int), tag (type: int), promote_tag_id (type: bigint), promote_tag_name (type: string), pop_id (type: bigint), delivery_datekey (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col39, _col40, _col41, _col42, _col43, _col44, _col45, _col46, _col47, _col48, _col49, _col50, _col51, _col52
              Statistics: Num rows: 110000 Data size: 35162211 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 110000 Data size: 35162211 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: public.public_t_par
  Stage: Stage-7
    Conditional Operator
  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://ns1/user/hive/warehouse/public.db/public_t_par/.hive-staging_hive_2018-06-08_15-41-18_222_4176438830382881060-1/-ext-10000
  Stage: Stage-0
    Move Operator
      tables:
          partition:
            delivery_datekey
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: public.public_t_par
  Stage: Stage-2
    Stats-Aggr Operator
  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: public.public_t_par
  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: public.public_t_par
  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://ns1/user/hive/warehouse/public.db/public_t_par/.hive-staging_hive_2018-06-08_15-41-18_222_4176438830382881060-1/-ext-10000

uploading-image-422377.png

uploading-image-83430.png

目录都完成后_tmp.-ext-1000会变成-ext-1000 并参见stage-6

原文地址:https://www.cnblogs.com/jiangxiaoxian/p/9565500.html

时间: 2024-10-09 01:14:13

hive 1.1.0 动态分区实现的相关文章

动态分区代码

管道程序 #include <stdio.h> #include <unistd.h> #include <sys/wait.h> #include <stdlib.h> int main(void) { int fds[2]; pid_t pid; if(pipe(fds) == -1) { perror("pipe error"); exit(1); } pid=fork(); if(pid == -1) { perror("

Hive动态分区

Hive默认是静态分区,我们在插入数据的时候要手动设置分区,如果源数据量很大的时候,那么针对一个分区就要写一个insert,比如说,我们有很多日志数据,我们要按日期作为分区字段,在插入数据的时候我们不可能手动的去添加分区,那样太麻烦了.还好,Hive提供了动态分区,动态分区简化了我们插入数据时的繁琐操作. 使用动态分区的时候必须开启动态分区(动态分区默认是关闭的),语句如下: [java] view plain copy set hive.exec.hynamic.partition=true;

Hive分区(静态分区+动态分区)

Hive分区的概念与传统关系型数据库分区不同. 传统数据库的分区方式:就oracle而言,分区独立存在于段里,里面存储真实的数据,在数据进行插入的时候自动分配分区. Hive的分区方式:由于Hive实际是存储在HDFS上的抽象,Hive的一个分区名对应一个目录名,子分区名就是子目录名,并不是一个实际字段. 所以可以这样理解,当我们在插入数据的时候指定分区,其实就是新建一个目录或者子目录,或者在原有的目录上添加数据文件. Hive分区的创建 Hive分区是在创建表的时候用Partitioned b

Hive的静态分区和动态分区

作者:Syn良子 出处:http://www.cnblogs.com/cssdongl/p/6831884.html 转载请注明出处 虽然之前已经用过很多次hive的分区表,但是还是找时间快速回顾总结一下加深理解. 举个栗子,基本需求就是Hive有一张非常详细的原子数据表original_device_open,而且还在不断随着时间增长,那么我需要给它进行分区,为什么要分区?因为我想缩小查询范围,提高速度和性能. 分区其实是物理上对hdfs不同目录进行数据的load操作,0.7之后的版本都会自动

Hive Experiment 2(表动态分区和IDE)

1.使用oracle sql developer 4.0.3作为hive query的IDE. 下载hive-jdbc driver http://www.cloudera.com/content/cloudera/en/downloads/connectors/hive/jdbc/hive-jdbc-v2-5-6.html Start    Oracle    SQL    Developer    and    navigate    to    Preferences    |    Da

hive从查询中获取数据插入到表或动态分区

(前人写的不错,很实用,负责任转发)转自:http://www.crazyant.net/1197.html Hive的insert语句能够从查询语句中获取数据,并同时将数据Load到目标表中.现在假定有一个已有数据的表staged_employees(雇员信息全量表),所属国家cnty和所属州st是该表的两个属性,我们做个试验将该表中的数据查询出来插入到另一个表employees中. 1 2 3 4 INSERT OVERWRITE TABLE employees PARTITION (cou

Hive架构层面优化之五合理设计表分区(静态分区和动态分区)

合理建表分区有效提高查询速度. 重要数据采用外部表存储,CREATE EXTERNAL TABLE,数据和表只是一个location的关联,drop表后数据不会丢失: 内部表也叫托管表,drop表后数据丢失:所以重要数据的表不能采用内部表的方式存储. 在全天的数据里查询某个时段的数据,性能很低效------可以通过增加小时级别的分区来改进! Trackreal为例,有三个分区: 日增量: 按日期分区: 小时增量:按日期.小时分区: 10分钟增量:按日期.小时.step分区:每个小时要导6次. 场

hive:默认允许动态分区个数为100,超出抛出异常:

在创建好一个分区表后,执行动态分区插入数据,抛出了错误: Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.e

Hive静态分区和动态分区

一.静态分区 1.创建分区表 1 hive (default)> create table order_mulit_partition( 2 > order_number string, 3 > event_time string 4 > ) 5 > PARTITIONED BY(event_month string, step string) 6 > row format delimited fields terminated by '\t'; 2.加载数据到分区表