Junk Dimension

In data warehouse design, we frequently run into a situation where the source system contains yes/no indicator fields. Through business analysis, we know it is necessary to keep such information in the fact table. However, if we keep each of those indicator fields as its own dimension in the fact table, not only do we need to build many small dimension tables, but the amount of information stored in the fact table also increases tremendously, leading to possible performance and management issues.

A junk dimension is a way to solve this problem. In a junk dimension, we combine these indicator fields into a single dimension. This way, we'll only need to build a single dimension table, and both the number of fields in the fact table and the size of the fact table can be reduced. The junk dimension table contains every possible combination of the values of the individual indicator fields (that is, their Cartesian product).
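As a rough illustration, the junk dimension's contents can be generated mechanically from the list of possible values for each indicator. The sketch below is a minimal Python version under stated assumptions: the function name `build_junk_dimension`, the surrogate key column `JUNK_KEY`, and the `Y`/`N` values are all hypothetical choices, not part of the original example.

```python
from itertools import product


def build_junk_dimension(indicator_domains):
    """Build a junk dimension as the Cartesian product of indicator values.

    indicator_domains maps each indicator column name to its list of
    possible values. Returns a list of row dicts, each holding a
    hypothetical surrogate key (JUNK_KEY) plus one column per indicator.
    """
    names = list(indicator_domains)
    rows = []
    # product() yields every combination of the indicator values, in order;
    # enumerate() assigns surrogate keys starting at 1.
    for key, combo in enumerate(product(*(indicator_domains[n] for n in names)), start=1):
        row = {"JUNK_KEY": key}
        row.update(zip(names, combo))
        rows.append(row)
    return rows


# Hypothetical Y/N domains for two of the indicator fields
junk_dim = build_junk_dimension({
    "COUPON_IND": ["Y", "N"],
    "PREPAY_IND": ["Y", "N"],
})
for row in junk_dim:
    print(row)
```

Because the table is just the Cartesian product, it can be fully populated up front, before any facts are loaded, so every incoming fact row is guaranteed to find a matching junk key.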

Let's look at an example. Assume that we have the following fact table:

In this example, TXN_CODE, COUPON_IND, and PREPAY_IND are all indicator fields. In the existing format, each of them is a separate dimension. Using the junk dimension principle, we can combine them into a single junk dimension, resulting in the following fact table:

Note that the number of dimensions in the fact table has now gone from 7 to 5.
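During loading, each source row's three indicator values are swapped for the single junk key. The Python sketch below assumes an in-memory lookup keyed by the value combination; the TXN_CODE values `A`/`B`/`C`, the `Y`/`N` flags, and the other columns (`DATE_KEY`, `STORE_KEY`, `SALES_AMT`, `JUNK_KEY`) are hypothetical placeholders, since the original tables are not reproduced here.

```python
from itertools import product

INDICATORS = ("TXN_CODE", "COUPON_IND", "PREPAY_IND")

# Hypothetical value domains (placeholders, not the article's actual codes)
DOMAINS = {
    "TXN_CODE": ["A", "B", "C"],
    "COUPON_IND": ["Y", "N"],
    "PREPAY_IND": ["Y", "N"],
}

# Lookup from an (TXN_CODE, COUPON_IND, PREPAY_IND) tuple to its junk key;
# keys are assigned in the same order the junk dimension rows are generated.
JUNK_LOOKUP = {
    combo: key
    for key, combo in enumerate(product(*(DOMAINS[c] for c in INDICATORS)), start=1)
}


def to_junk_fact(fact_row):
    """Replace the three indicator columns with a single JUNK_KEY column."""
    combo = tuple(fact_row[c] for c in INDICATORS)
    new_row = {k: v for k, v in fact_row.items() if k not in INDICATORS}
    new_row["JUNK_KEY"] = JUNK_LOOKUP[combo]
    return new_row


source_row = {"DATE_KEY": 20240101, "STORE_KEY": 42, "TXN_CODE": "B",
              "COUPON_IND": "N", "PREPAY_IND": "Y", "SALES_AMT": 19.99}
print(to_junk_fact(source_row))
# {'DATE_KEY': 20240101, 'STORE_KEY': 42, 'SALES_AMT': 19.99, 'JUNK_KEY': 7}
```

In a real ETL job this lookup would more likely be a join against the junk dimension table itself; the dictionary here only stands in for that join.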

The content of the junk dimension table would look like the following:

In this case, we have 3 possible values for the TXN_CODE field, 2 possible values for the COUPON_IND field, and 2 possible values for the PREPAY_IND field. This results in a total of 3 x 2 x 2 = 12 rows for the junk dimension table.
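The row count follows directly from multiplying the number of distinct values per indicator. The short Python check below uses only the cardinalities stated above and, for comparison, also totals the rows that three separate small dimension tables would hold.

```python
from math import prod

# Distinct values per indicator field, as stated in the example
cardinalities = {"TXN_CODE": 3, "COUPON_IND": 2, "PREPAY_IND": 2}

# One junk dimension table: every combination of values (Cartesian product)
junk_rows = prod(cardinalities.values())

# Three separate dimension tables: one row per value in each table
separate_rows = sum(cardinalities.values())

print(junk_rows, separate_rows)  # 12 7
```

Because the size grows multiplicatively with each added indicator, the approach fits best when the combined fields are a handful of low-cardinality flags.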

By using a junk dimension to replace the 3 indicator fields, we have reduced the number of dimensions by 2 and the number of fields in the fact table by 2. This will result in a data warehousing environment that offers better performance and is easier to manage.

Time: 2024-10-07 22:19:36
