It's import to know before start learning DW/BI

Data Warehousing and Business Intelligence

Differences Between Data Warehousing and Business Intelligence

Filed under: Business Intelligence,Data Warehousing — Vincent Rainardi @ 6:38 pm 
Tags: Business IntelligenceData Warehousing

Try asking your colleague what is the difference between business intelligence and a data warehouse. I find that a lot of people, even those who work in BI projects and BI industry, do not understand the difference. A lot of people use these 2 terms interchangeably. Some people even prefer to use 1 term instead of the other because it simply “sounds better”. Many people think that business intelligence is not just a data warehouse, but there is more to it. But when asked “what business intelligence systems are not data warehouse systems?” or “what part of business intelligence systems are not data warehouses?”, most of them have difficulties explaining the answer.

These days, “business intelligence” is the norm used by most vendors in the industry, rather than “data warehouse”. Most of them call / classify their tools as business intelligence software, not data warehouse software. The name of Cognos product is “Cognos 8 Business Intelligence”. BusinessObjects label themselves as “BI software company” and “global leader in BI software”. The name of one of Hyperionproducts is “Hyperion System 9 BI+”. SAS Enterprise BI Server provides a fully integrated and comprehensive suite of business intelligence software. Microsoft promotes SQL Server 2005 as the end-to-end business intelligence platform. It seems that only Kimball Group who consistently use the term data warehouse. Bill Inmon, as the inventor of this term, also uses the term data warehouse.

So, let’s get into the details. This is an example of a data warehouse system:

It includes ETL from the source system, front end applications (those 10 boxes on the right hand side), and everything in between. It has a control system, an audit system and a data quality system (also known as data firewall). Not all data warehouse systems have all the components pictured above, for example, some data warehouse system may not have operational data stored (ODS), see this article for details.

The 2 blue items are data warehouse databases. The cylinder is in relational format (labelled as dimensional data store, DDS for short), the box is in multidimensional format (labelled as cubes in the picture above). This blue cube is also known as on line analytical processing cube, or OLAP cube for short.

The yellow items are business intelligence applications. Most business intelligence applications take data from multidimensional format data warehouse, but some do take data from the relational format. The whole diagram above is also known as business intelligence system.

Some business intelligence applications take data directly from the source system. For example, some dashboard systems may get sales summary data from the source system and display it in gauge meter format. In this case, we can not call the system a data warehouse system. It is still a business intelligence system, but it is not a data warehouse system, because it does not have a data warehouse database behind the gauge meter application.

Business intelligence systems, in the past also known as executive information systems, or decision support systems, are a non-transactional IT system used to support business decision making and solve management problems, normally used by top executives and managers. Many varied definitions exist in the market place today about the business intelligence system; one from Dr. Jay Liebowitz is arguably one of the better ones. Most people agree that OLAP and data warehouse systems are a major and important part of business intelligence systems. Most business intelligence systems are in the form of a data warehouse systems. Yes, there are business intelligence systems that do not use OLAP or data warehouses, as illustrated in the example of gauge meter application above, but they are more rare than the ones with OLAP or a data warehouse.

According to Ralph Kimball, in his book The Data Warehouse ETL Toolkit, a data warehouse is a system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making. He stressed that a data warehouse is not a product, a language, a project, a data model or a copy of transaction system. In an interview with Professional Association for SQL Server (PASS) on 30th April 2004, he explained about the relationship between data warehousing and business intelligence.

In their latest book, The Microsoft Data Warehouse Toolkit, Joy Mundy and Warren Thornthwaite do not differentiate data warehouse systems and business intelligence systems. They consistently use the term DW/BI system throughout the book. This is understandable because, as I describe above, most business intelligence systems are in the form of a data warehouse system.

Bill Inmon, who invented the term data warehouse, defines data warehouse as a source of data that is subject oriented, integrated, nonvolatile and time variant for the purpose of management’s decision processes. He pointed that the term data warehouse was never trademarked or copyrighted. As a result, anyone can call anything a data warehouse. He recently defined a new term, DW 2.0, and this one is trademarked so nobody can change the definition. He explained the architecture in his article in dmreview, along with the differences between the first generation of data warehouses and DW 2.0 and its advantages.

So, as a summary, back to the original question, what is the difference between data warehouse and business intelligence? Most business intelligence systems are based on data warehouse systems (the one with dimensional model, fact tables, dimension, etc), but some business intelligence systems are not data warehousing, i.e. taking data directly from the source system, like the example described above. Business intelligence application (as opposed to business intelligence system) is the yellow boxes on the diagram above, i.e. the front end applications. The data warehouse database (or sometimes people dropped the word database, so it becomes just ‘data warehouse’) is the blue cylinder and blue box on the diagram above, i.e. the dimensional storage, whether in relational database format or in multidimensional database format.

If people say ‘data warehouse’, be careful because it can mean either data warehouse system (the whole diagram above) or data warehouse database (just the blue items). If people say ‘business intelligence’, it can mean either business intelligence system (the whole diagram above, or a BI system without data warehouse) or business intelligence application (the yellow boxes).

I hope this article makes the terms clearer, but I am open to comments and suggestions. As Ralph Kimball said, if you ask 10 different people what data warehouse is you are likely to get 10 different answers.

Vincent Rainardi
1st May 2006

This is a repost from SQLServerCentral.

It's import to know before start learning DW/BI

时间: 2024-08-02 15:13:45

It's import to know before start learning DW/BI的相关文章

Python 内置函数 2018-08-02

abs():返回一个数的绝对值 divmod(a,b): 结合除法和取模运算,返回一个包含商和模的元祖(a//b, a%b), 在 python 2.3 版本之前不允许处理复数. >>> divmod(7,2) (3, 1) >>> divmod(-7,2) (-4, 1) >>> divmod(1+2j,1+0.5j) ((1+0j), 1.5j) input()和raw_input() Python3.x 中 input() 函数接受一个标准输入数

Python单行函数lambda(小米)加reduce、map、filter(步枪)应用

什么是lambda? lambda定义匿名函数,并不会带来程序运行效率的提高,只会使代码更简洁.为了减少单行函数的定义而存在的. lambda的使用大量简化了代码,使代码简练清晰.但是值得注意的是,这会在一定程度上降低代码的可读性.如果不是非常熟悉Python的人也许会对此很难理解. 如果可以使用for...in...if来完成的,坚决不用lambda. 如果使用lambda,lambda内不要包含循环,如果有,宁愿定义函数来完成,使代码获得可重用性和更好的可读性.如果你对你就喜欢用lambda

pytest参数化的两种方式

1.传统方式 1 import requests,pytest 2 from Learning.variable import * 3 4 # 定义变量 5 #url = "https://www.baidu.com" 6 7 class TestClass(object): 8 global url #在此获取全局变量,并将其设置为TestClass类的全局变量 9 def setup_class(self): 10 print("start...") 11 12

【tf.keras】AdamW: Adam with Weight decay

论文 Decoupled Weight Decay Regularization 中提到,Adam 在使用时,L2 regularization 与 weight decay 并不等价,并提出了 AdamW,在神经网络需要正则项时,用 AdamW 替换 Adam+L2 会得到更好的性能. TensorFlow 2.0 在 tensorflow_addons 库里面实现了 AdamW,目前在 Mac 和 Linux 上可以直接pip install tensorflow_addons进行安装,在

『cs231n』作业2选讲_通过代码理解优化器

1).Adagrad一种自适应学习率算法,实现代码如下: cache += dx**2 x += - learning_rate * dx / (np.sqrt(cache) + eps) 这种方法的好处是,对于高梯度的权重,它们的有效学习率被降低了:而小梯度的权重迭代过程中学习率提升了.要注意的是,这里开根号很重要.平滑参数eps是为了避免除以0的情况,eps一般取值1e-4 到1e-8. 2).RMSpropRMSProp方法对Adagrad算法做了一个简单的优化,以减缓它的迭代强度: ca

PowerBI开发 第十篇:R 脚本

R是一种专门用于数据分析和统计的脚本语言,广泛应用在每一个需要统计和数据分析的领域.PowerBI支持R脚本,两者强强结合,使PowerBI的功能更加强大.PowerBI Desktop默认没有安装R,在使用R脚本之前,必须向PowerBI Desktop中安装R引擎.用户可以使用R脚本加载数据.对数据进行转换和处理.使用R脚本图形化显示数据,这意味着,PowerBI对R的支持是深度融合的,在数据处理的各个阶段都能使用R.而且,为了便于开发人员使用R进行编程,PowerBI可以直接调用R外部ID

django 之csrf、auth模块及settings源码、插拔式设计

目录 基于django中间件拷贝思想 跨站请求伪造简介 跨站请求伪造解决思路 方式1:form表单发post请求解决方法 方式2:ajax发post请求解决方法 csrf相关的两个装饰器 csrf装饰器在CBV上的特例 django settings源码 auth模块简介 auth创建用户 auth扩展表 基于django settings配置文件实现插拔式设计 csrf:Cross Site Request Forgery protection 基于django中间件拷贝思想 # start.

Spark MLlib Deep Learning Convolution Neural Network (深度学习-卷积神经网络)3.1

3.Spark MLlib Deep Learning Convolution Neural Network (深度学习-卷积神经网络)3.1 http://blog.csdn.net/sunbow0 Spark MLlib Deep Learning工具箱,是根据现有深度学习教程<UFLDL教程>中的算法,在SparkMLlib中的实现.具体Spark MLlib Deep Learning(深度学习)目录结构: 第一章Neural Net(NN) 1.源码 2.源码解析 3.实例 第二章D

PLA Percentron Learning Algorithm #台大 Machine learning #

Percentron Learning Algorithm 于垃圾邮件的鉴别 这里肯定会预先给定一个关于垃圾邮件词汇的集合(keyword set),然后根据四组不通过的输入样本里面垃圾词汇出现的频率来鉴别是否是垃圾邮件.系统输出+1判定为垃圾邮件,否则不是.这里答案是第二组. 拿二维数据来做例子.我们要选取一条线来划分红色的叉叉,和蓝色的圈圈样本点(线性划分).怎么做呢?这里的困难之处就在于,其实可行的解可能存在无数条直线可以划分这些样本点.很难全部求解,或许实际生活中并不需要全部求解.于是,