Junk Dimension

In data warehouse design, frequently we run into a situation where there are yes/no indicator fields in the source system. Through business analysis, we know it is necessary to keep such information in the fact table. However, if keep all those indicator fields in the fact table, not only do we need to build many small dimension tables, but the amount of information stored in the fact table also increases tremendously, leading to possible performance and management issues.

Junk dimension is the way to solve this problem. In a junk dimension, we combine these indicator fields into a single dimension. This way, we‘ll only need to build a single dimension table, and the number of fields in the fact table, as well as the size of the fact table, can be decreased. The content in the junk dimension table is the combination of all possible values of the individual indicator fields.

Let‘s look at an example. Assuming that we have the following fact table:

In this example, TXN_CODE, COUPON_IND, and PREPAY_IND are all indicator fields. In this existing format, each one of them is a dimension. Using the junk dimension principle, we can combine them into a single junk dimension, resulting in the following fact table:

Note that now the number of dimensions in the fact table went from 7 to 5.

The content of the junk dimension table would look like the following:

In this case, we have 3 possible values for the TXN_CODE field, 2 possible values for the COUPON_IND field, and 2 possible values for the PREPAY_IND field. This results in a total of 3 x 2 x 2 = 12 rows for the junk dimension table.

By using a junk dimension to replace the 3 indicator fields, we have decreased the number of dimensions by 2 and also decreased the number of fields in the fact table by 2. This will result in a data warehousing environment that offer better performance as well as being easier to manage.

时间： 2024-10-07 22:19:36

Junk Dimension

Junk Dimension的相关文章

数据仓库基础术语名词一览

数据仓库专题（10）-文本事实和杂项维度

效率飞速提高Four Dimension Technologies GeoTools v17.0 1CD+AutoHook.2017.v1.0.3.00 1CD

Dimension类

机器学习基石第七讲 The VC Dimension

【The VC Dimension】林轩田机器学习基石

命题作文：Dimension Tree区间查找与IP数据包分类

SMD Package Footprint/Dimension/Datasheet

VC Dimension -衡量模型与样本的复杂度