Hive row-to-column conversion: splitting a delimited field into multiple rows

1. Problem

How can Hive convert

a       1,2,3
b       4,7
c       5

into:

a       1
a       2
a       3
b       4
b       7
c       5

2. Source data

cat row_column.txt
a       1,2,3
b       4,7
c       5

3. Solutions

3.1 Exploding a delimited string column

3.1.1 Create the table

-- Create the table
create table tmp.row_column
(
col1 string,
col3 string
)
row format delimited fields terminated by '\t'
stored as textfile;
-- Load the data
load data local inpath '/tmp/row_column.txt' into table tmp.row_column;

3.1.2 View the data:

hive> select * from tmp.row_column;
OK
a       1,2,3
b       4,7
c       5

3.1.3 Explode the column into rows

select col1,name
from tmp.row_column
lateral view explode(split(col3,',')) col3 as name;
---------------------------------------------------------------
Total MapReduce CPU Time Spent: 2 seconds 20 msec
OK
a       1
a       2
a       3
b       4
b       7
c       5
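
One caveat with the query above: a plain lateral view drops any row for which explode produces nothing, which is what happens when col3 is NULL. The sketch below shows the outer variant, which keeps such rows and fills name with NULL. The inline sample data (including row 'd') and the aliases sample, s and t are made up for illustration and are not part of the original table.

```sql
-- Hypothetical inline data: row 'd' has a NULL col3 (not in the original file).
with sample as (
  select 'a' as col1, '1,2,3' as col3
  union all
  select 'd' as col1, cast(null as string) as col3
)
select s.col1, t.name
from sample s
-- "outer" keeps rows whose exploded array is NULL or empty; name is NULL for them.
lateral view outer explode(split(s.col3, ',')) t as name;
```

Without the outer keyword, row 'd' would not appear in the result at all.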

3.2 Exploding an array column

3.2.1 Create the table

create table tmp.row_column_array
(
  col1 string,
  col3 array<int>
)
row format delimited
fields terminated by '\t'
collection items terminated by ','
stored as textfile;

3.2.2 Load the data

load data local inpath '/tmp/row_column.txt' into table tmp.row_column_array;

3.2.3 View the data

hive> select * from tmp.row_column_array;
OK
a       [1,2,3]
b       [4,7]
c       [5]

3.2.4 Explode the array into rows

select col1,name
from tmp.row_column_array
lateral view explode(col3) col3 as name;

3.2.5 Result

a       1
a       2
a       3
b       4
b       7
c       5
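
If the position of each element inside the array also matters, Hive's built-in posexplode can be used in place of explode. This is a minimal sketch against the tmp.row_column_array table from 3.2.1. The aliases t and pos are chosen here for illustration.

```sql
-- Assumes tmp.row_column_array from 3.2.1 is loaded as shown in 3.2.2.
-- posexplode emits one (position, value) pair per element; positions start at 0.
select col1, pos, name
from tmp.row_column_array
lateral view posexplode(col3) t as pos, name;
```

With the sample data, row a should yield positions 0, 1, 2 for the values 1, 2, 3.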

4. Additional notes

Accessing individual elements of the comma-separated column

select t.list[0],t.list[1],t.list[2] from (
select (split(col3,',')) list from tmp.row_column)t;
Total MapReduce CPU Time Spent: 1 seconds 740 msec
OK
1       2       3
4       7       NULL
5       NULL    NULL
Time taken: 15.264 seconds, Fetched: 3 row(s)

Getting the number of elements

select col1, size(split(col3,',')) list from tmp.row_column;
Total MapReduce CPU Time Spent: 1 seconds 690 msec
OK
a       3
b       2
c       1
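
The element count can also be combined with the explode query from 3.1.3, for example to expand only the rows that hold more than one value. This is a sketch under the assumption that tmp.row_column contains the three sample rows. The alias t and the threshold 1 are illustrative.

```sql
-- Assumes tmp.row_column from 3.1.1 with the sample data loaded.
-- Only rows whose col3 splits into more than one element are expanded.
select col1, name
from tmp.row_column
lateral view explode(split(col3, ',')) t as name
where size(split(col3, ',')) > 1;
```

With the sample data this keeps the rows for a and b and skips c, which has a single value.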
Time: 2024-10-08 16:38:35
