之前部署公司BI项目例子,发现数据库表都没有设置主键、外键,一直以为是模拟项目,不严谨要求的原因。今天才知道数据仓库本来就不设计主键和外键。这些约束在ETL编程的时候就该做好,保证在满足数据源约束的所有数据都可以流进数据仓库。
Primary Key is there ... but Enforcing the primary key constraint in database level is not required.
If you think about this, technically a unique key or primary key is a key that uniquely defines the characteristics of each row. And it can be composed of more than one attributes of that entity. Now in the case of a Fact table, foreign keys flowing-in from the other dimension tables together already act as a compounded primary key. And these foreign-key combinations can uniquely identify each record in the fact table. So, this foreign key combination is the primary key for the fact table. Why not a Surrogate Key then? Now if you wanted, you could have defined one surrogate key for the fact table. But what purpose would that serve? You are never going to retrieve one record from that fact table referring its surrogate key (use Indexes instead). Neither you are going to use that surrogate key to join the fact with other tables. Such a surrogate key will be completely waste of space in the database. Enforcing Database Constraints When you define this conceptual primary key in the database level, database needs to ensure that this constraint is not getting violated in any of the DML operation performed over it. Ensuring this constraint is a overhead for your database. It might be insignificant for an OLTP system, but for a large OLAP system where data are loaded in batch, this may incur significant performance penalties. Beside, why do you want your database to ensure the integrity of the constraints when you can ensure the same during the data loading phase itself (typically through your ETL coding).
不需要主键,因为该记录各个维度的组合已经唯一定义了该记录。
不需要代理键,因为我们用索引检索数据,也用不到代理键关联其他表。
不需要多余的约束,因为我们优先保证所有清洗出来的数据都能够流进仓库。确保数据完备和一致是ETL编码的事。