Using grouping sets and cube in Hive

grouping sets and cube basics

For background, see http://blog.csdn.net/mashroomxl/article/details/22578471

grouping sets is suited to multi-dimensional statistics and can replace the earlier lateral view explode approach.

cube is equivalent to grouping sets over every combination of the grouping columns.
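
To make the equivalence concrete, here is a minimal sketch. The table game_login and its columns game_id and platform are hypothetical names chosen for illustration; they do not come from the original post.

```sql
-- Hypothetical table: game_login(game_id, platform, user_id).
-- Explicit grouping sets: every combination of the two grouping columns.
SELECT game_id, platform, COUNT(*) AS cnt
FROM game_login
GROUP BY game_id, platform
GROUPING SETS ((game_id, platform), (game_id), (platform), ());

-- The same four groupings written with cube.
SELECT game_id, platform, COUNT(*) AS cnt
FROM game_login
GROUP BY game_id, platform WITH CUBE;
```

With n grouping columns, cube expands to 2^n grouping sets, so the explicit list grows quickly while the cube form stays the same length.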

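cube is also handy for ad hoc requests when you just want to look at the data. For example, the counts for a given game on android, ios, and _NONE can conveniently be written as a single SQL statement.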
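
An ad hoc query of this kind might look like the sketch below. The table game_login, the columns game_id and platform, and the literal 'some_game' are all placeholders assumed for illustration.

```sql
-- Hypothetical table: game_login(game_id, platform, user_id).
-- Returns one row per platform value plus a roll-up row for the overall total.
SELECT platform, COUNT(*) AS cnt
FROM game_login
WHERE game_id = 'some_game'   -- placeholder game identifier
GROUP BY platform WITH CUBE;
```

In the roll-up row the platform column is NULL, which is how Hive marks a column that was aggregated away.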

Comparing grouping sets with the lateral view explode approach

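Conclusion: the number of mappers and reducers is the same and the difference in running time is small, but the grouping sets version is simpler to write.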
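
For readers who have not seen the two styles side by side, the sketch below shows one common way the lateral view explode pattern is written, followed by its grouping sets counterpart. The table game_login, its string columns game_id and platform, and the 'ALL' label are assumptions made for illustration; the original post does not show its queries.

```sql
-- Hypothetical table: game_login(game_id STRING, platform STRING, user_id STRING).

-- lateral view explode style: each input row is duplicated so that every
-- dimension appears once with its real value and once with the label 'ALL'.
SELECT game_dim, platform_dim, COUNT(*) AS cnt
FROM game_login
LATERAL VIEW explode(array(game_id, 'ALL')) g AS game_dim
LATERAL VIEW explode(array(platform, 'ALL')) p AS platform_dim
GROUP BY game_dim, platform_dim;

-- grouping sets style: the same four groupings, with NULL instead of 'ALL'
-- in the columns that were rolled up.
SELECT game_id, platform, COUNT(*) AS cnt
FROM game_login
GROUP BY game_id, platform
GROUPING SETS ((game_id, platform), (game_id), (platform), ());
```

The explode version needs one lateral view per dimension, while the grouping sets version only lists the combinations it wants.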

Issues encountered when using cube

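With 5 or more dimensions and a distinct inside the aggregate, the query fails with the error below (a cube over 5 dimensions expands to 2^5 = 32 grouping sets, which is the figure quoted in the message):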

An additional MR job is introduced since the cardinality of grouping sets is more than hive.new.job.grouping.set.cardinality.

This functionality is not supported with distincts.

Either set hive.new.job.grouping.set.cardinality to a high number (higher than the number of rows per input row due to grouping sets in the query),

or rewrite the query to not use distincts. The number of rows per input row due to grouping sets is 32 (state=42000,code=10226)

Fix: as the error message itself suggests, either raise hive.new.job.grouping.set.cardinality or avoid distinct in the aggregate.
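
The first option is a session-level setting. The value 64 below is an arbitrary illustrative choice; per the error message, it only needs to be higher than the number of grouping sets in the query (32 here).

```sql
-- The documented default for this property is 30, which is why the error
-- first appears at 5 cube dimensions (2^4 = 16 fits, 2^5 = 32 does not).
-- 64 is an illustrative value, not a recommendation.
SET hive.new.job.grouping.set.cardinality=64;
```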

We currently take the latter approach: deduplicate with group by in a subquery so that the aggregate no longer needs distinct.
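
A sketch of that rewrite follows. The table t, the dimension columns d1 to d5, and user_id are hypothetical names; the original post does not show its query.

```sql
-- Hypothetical table: t(d1, d2, d3, d4, d5, user_id).

-- Form that triggers the error: distinct inside a 5-dimension cube.
-- SELECT d1, d2, d3, d4, d5, COUNT(DISTINCT user_id) AS uv
-- FROM t
-- GROUP BY d1, d2, d3, d4, d5 WITH CUBE;

-- Rewrite: deduplicate first, then aggregate without distinct.
SELECT d1, d2, d3, d4, d5, COUNT(user_id) AS uv
FROM (
  -- One row per (dimensions, user_id) combination.
  SELECT d1, d2, d3, d4, d5, user_id
  FROM t
  GROUP BY d1, d2, d3, d4, d5, user_id
) dedup
GROUP BY d1, d2, d3, d4, d5 WITH CUBE;
```

One caveat worth checking against your data: this matches COUNT(DISTINCT user_id) on every cube level only when each user_id occurs under a single combination of d1 to d5. If a user can appear under several combinations, the rolled-up rows count that user once per combination.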

Time: 2024-10-06 22:01:12
