Kylin Basics

CUBE

Table - The definition of a Hive table used as a cube source. Tables must be synced before cubes can be built.
Data Model - Describes a star schema data model, defining the fact and lookup tables and the filter condition.
Cube Descriptor - Describes the definition and settings of a cube instance: which data model to use, which dimensions and measures to include, how to partition the cube into segments, how to handle auto-merge, and so on.
Cube Instance - An instance of a cube, built from one cube descriptor and consisting of one or more cube segments according to the partition settings.
Partition - Users can define a DATE/STRING column as the partition column in the cube descriptor to split one cube into several segments covering different date periods.
Cube Segment - The actual carrier of cube data; each segment maps to one HTable in HBase. One build job creates one new segment for the cube instance. When data changes within a given period, the related segments can be refreshed, which avoids rebuilding the whole cube.
Aggregation Group - Each aggregation group is a subset of the dimensions, and cuboids are built from the combinations inside it. Its purpose is to prune cuboids for optimization.
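
To illustrate why aggregation groups prune the cube, here is a minimal Python sketch that counts dimension combinations with and without grouping. It is a simplified model, not Kylin's implementation: the dimension names and the helper functions cuboids and cuboids_for_groups are hypothetical, and details such as the base cuboid that a real cube always keeps are ignored.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every non-empty combination of the given dimensions."""
    dims = list(dimensions)
    return {
        frozenset(combo)
        for size in range(1, len(dims) + 1)
        for combo in combinations(dims, size)
    }


def cuboids_for_groups(groups):
    """Union of the combinations generated independently inside each group."""
    result = set()
    for group in groups:
        result |= cuboids(group)
    return result


# Hypothetical dimensions, chosen only for illustration.
all_dims = ["year", "month", "country", "city", "category", "seller"]

full = cuboids(all_dims)
grouped = cuboids_for_groups([
    ["year", "month", "category"],
    ["country", "city", "seller"],
])

print(len(full), len(grouped))
```

With six dimensions the unrestricted set has 2^6 - 1 = 63 combinations, while two disjoint groups of three dimensions produce only 7 + 7 = 14.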

DIMENSION & MEASURE

Mandatory - This dimension type is used for cuboid pruning: if a dimension is specified as “mandatory”, combinations without that dimension are pruned.
Hierarchy - This dimension type is used for cuboid pruning: if dimensions A, B, and C form a “hierarchy” relation, only the combinations A, AB, and ABC are kept.
Derived - On lookup tables, some dimensions can be derived from the table's PK, so there is a specific mapping between them and the FK on the fact table. These dimensions are DERIVED and do not participate in cuboid generation.
Count Distinct(HyperLogLog) - Computing COUNT DISTINCT directly is expensive, so an approximate algorithm, HyperLogLog, is used; it keeps the error rate low.
Count Distinct(Precise) - Precise COUNT DISTINCT is pre-calculated based on RoaringBitmap; currently only int and bigint are supported.
Top N - With this measure type, users can easily get a specified number of top records, for example the top sellers or buyers.
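
The mandatory and hierarchy pruning rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dimension names A, B, C, D and the functions keeps_mandatory and keeps_hierarchy are hypothetical, and the sketch assumes that a combination containing none of the hierarchy dimensions is still allowed.

```python
from itertools import combinations


def all_combinations(dims):
    """Every non-empty combination of the dimensions, as sets."""
    return [
        set(combo)
        for size in range(1, len(dims) + 1)
        for combo in combinations(dims, size)
    ]


def keeps_mandatory(cuboid, mandatory):
    """A cuboid survives only if it contains every mandatory dimension."""
    return mandatory <= cuboid


def keeps_hierarchy(cuboid, hierarchy):
    """A cuboid survives only if its hierarchy dimensions form a prefix
    of the hierarchy (A, AB, ABC); having none of them is assumed to be allowed."""
    present = [dim in cuboid for dim in hierarchy]
    count = sum(present)
    # The `count` dimensions that are present must be the first `count` levels.
    return all(present[:count])


dims = ["A", "B", "C", "D"]
hierarchy = ["A", "B", "C"]   # A -> B -> C
mandatory = {"D"}

kept = [
    cuboid
    for cuboid in all_combinations(dims)
    if keeps_mandatory(cuboid, mandatory) and keeps_hierarchy(cuboid, hierarchy)
]

for cuboid in kept:
    print(sorted(cuboid))
```

Of the 15 possible combinations, only D, AD, ABD, and ABCD remain under these two rules.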

CUBE ACTIONS

BUILD - Given an interval of the partition column, this action builds a new cube segment.
REFRESH - This action rebuilds the cube segment for a given partition period; it is used when the source table has received more data for that period.
MERGE - This action merges multiple contiguous cube segments into a single one. It can be automated with the auto-merge settings in the cube descriptor.
PURGE - Clears the segments under a cube instance. This only updates metadata and does not delete cube data from HBase.
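
As a rough model of how BUILD and MERGE act on segments, the following Python sketch treats a cube as an ordered list of date ranges. The Segment class and the build and merge functions are hypothetical names, and the half-open interval convention (start inclusive, end exclusive) is an assumption made for this example rather than something stated in the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Segment:
    start: date  # inclusive (assumption)
    end: date    # exclusive (assumption)


def build(segments, start, end):
    """BUILD: add one new segment for [start, end); reject overlaps."""
    new = Segment(start, end)
    for seg in segments:
        if new.start < seg.end and seg.start < new.end:
            raise ValueError(f"{new} overlaps {seg}")
    return sorted(segments + [new], key=lambda seg: seg.start)


def merge(segments, first, last):
    """MERGE: replace the contiguous segments at indexes first..last with one."""
    chosen = segments[first:last + 1]
    for left, right in zip(chosen, chosen[1:]):
        if left.end != right.start:
            raise ValueError("segments are not contiguous")
    merged = Segment(chosen[0].start, chosen[-1].end)
    return segments[:first] + [merged] + segments[last + 1:]


segments = []
segments = build(segments, date(2024, 1, 1), date(2024, 2, 1))
segments = build(segments, date(2024, 2, 1), date(2024, 3, 1))
segments = build(segments, date(2024, 3, 1), date(2024, 4, 1))

# Merge the January and February segments into one.
segments = merge(segments, 0, 1)
print(segments)
```

In this model a REFRESH would rebuild the data behind an existing Segment without changing its date range, so it needs no list manipulation.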

JOB STATUS

NEW - The job has just been created.
PENDING - The job has been paused by the job scheduler and is waiting for resources.
RUNNING - The job is in progress.
FINISHED - The job has finished successfully.
ERROR - The job was aborted with errors.
DISCARDED - The job was cancelled by an end user.

JOB ACTION

RESUME - Once a job is in ERROR status, this action tries to restore it from the latest successful point.
DISCARD - Whatever the status of a job, the user can end it and release its resources with the DISCARD action.
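
The job statuses and the two actions can be read as a small state machine. The Python sketch below is one possible reading, not Kylin's code: the JobStatus enum and the resume and discard functions are hypothetical, and the choice to move a resumed job straight to RUNNING is an assumption, since the text only says the job is restored from its latest successful point.

```python
from enum import Enum


class JobStatus(Enum):
    NEW = "NEW"
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    ERROR = "ERROR"
    DISCARDED = "DISCARDED"


def resume(status):
    """RESUME: only a job in ERROR status can be resumed.
    Returning RUNNING is an assumption made for this sketch."""
    if status is not JobStatus.ERROR:
        raise ValueError(f"cannot resume a job in status {status.name}")
    return JobStatus.RUNNING


def discard(status):
    """DISCARD: allowed from any status, as described above."""
    return JobStatus.DISCARDED


print(resume(JobStatus.ERROR).name)
print(discard(JobStatus.RUNNING).name)
```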

Time: 2024-10-18 21:40:13
