Database Design (3): 11 important database designing rules which I follow

Original article: https://www.codeproject.com/Articles/359654/important-database-designing-rules-which-I-fo

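Damn it, I seem to have seen a Chinese version of this article somewhere already. Standards are slipping these days; plagiarism is everywhere.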

The anchors in the original article are either too long or badly formatted, so I have adjusted them here.

Table of Contents

  • Introduction
  • Rule 1: What is the nature of the application (OLTP or OLAP)?
  • Rule 2: Break your data into logical pieces, make life simpler
  • Rule 3: Do not get overdosed with rule 2
  • Rule 4: Treat duplicate non-uniform data as your biggest enemy
  • Rule 5: Watch for data separated by separators
  • Rule 6: Watch for partial dependencies
  • Rule 7: Choose derived columns preciously
  • Rule 8: Do not be hard on avoiding redundancy, if performance is the key
  • Rule 9: Multidimensional data is a different beast altogether
  • Rule 10: Centralize name value table design
  • Rule 11: For unlimited hierarchical data self-reference PK and FK


Introduction

Before you start reading this article, let me make clear that I am not a guru in database design. The 11 points below are what I have learnt through projects, my own experience, and my own reading. I personally think they have helped me a lot in DB design. Any criticism is welcome.

I am writing a full-blown article because developers designing a database tend to treat the three normal forms as a silver bullet. They assume normalization is the only way to design. Because of this mindset, they sometimes hit roadblocks as the project moves ahead.

If you are new to normalization, first read "3 normal forms in action", which explains all three normal forms step by step.

That said, normalization rules are important guidelines, but treating them as set in stone is asking for trouble. Below are the 11 rules I keep at the top of my head while doing DB design.

Rule 1: What is the nature of the application (OLTP or OLAP)?

When you start your database design, the first thing to analyze is the nature of the application you are designing for: is it transactional or analytical? Many developers apply normalization rules by default without considering the nature of the application, and later run into performance and customization issues. As said, there are two kinds of applications, transaction-based and analytics-based. Let's understand what these types are.

Transactional: In this kind of application, your end user is mostly interested in CRUD, i.e., creating, reading, updating, and deleting records. Such a database is formally called OLTP (Online Transaction Processing).

Analytical: In this kind of application, your end user is mostly interested in analysis, reporting, forecasting, etc. These databases see relatively few inserts and updates. The main intention here is to fetch and analyze data as fast as possible. Such a database is formally called OLAP (Online Analytical Processing).

In other words, if inserts, updates, and deletes dominate, go for a normalized table design; otherwise, create a flat, denormalized database structure.

The diagram below shows names and addresses stored as simple normalized tables on the left-hand side, and the flat table structure we get by denormalizing them.
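The following is a minimal sketch of that normalized-versus-flat contrast. It assumes SQLite through Python's built-in `sqlite3` module, and the `customer`, `address` and `customer_flat` tables and their rows are all hypothetical. Names and addresses live in two normalized tables. A flat copy is then built with one join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (OLTP-style): customers and their addresses in separate tables.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE address (
    address_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    city        TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO address VALUES (10, 1, 'Mumbai'), (11, 2, 'Pune');

-- Flat, denormalized copy for read-heavy (OLAP-style) reporting:
-- the join is paid once here instead of in every report query.
CREATE TABLE customer_flat AS
SELECT c.customer_id, c.name, a.city
FROM customer AS c
JOIN address AS a ON a.customer_id = c.customer_id;
""")

flat_rows = conn.execute(
    "SELECT name, city FROM customer_flat ORDER BY customer_id"
).fetchall()
print(flat_rows)  # [('Asha', 'Mumbai'), ('Ravi', 'Pune')]
```

Writes go to the normalized tables, and reports read `customer_flat` with no join at all. The cost of that read speed is keeping the flat copy in sync, for example by rebuilding it in a batch job.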

Rule 2: Break your data into logical pieces, make life simpler

This rule is essentially the first rule of the first normal form. A telltale sign that it is being violated is queries full of string-parsing functions such as substring, charindex, etc. If you see that, this rule probably needs to be applied.

For instance, look at the table below, which holds student names. If you ever want to find students whose name contains “Koirala” but not “Harisingh”, you can imagine what kind of query you will end up with.

So the better approach would be to break this field into further logical pieces so that we can write clean and optimal queries.
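Here is a minimal sketch of the difference, again assuming SQLite via Python's `sqlite3`. The table names and first names are hypothetical; only the surnames come from the example above. With a single name column, the query needs `substr`/`instr` parsing. With the decomposed columns, it is a plain equality test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Single-column design: the whole name in one field.
CREATE TABLE student_v1 (roll_no INTEGER PRIMARY KEY, full_name TEXT);
-- Decomposed design: each logical piece in its own column.
CREATE TABLE student_v2 (
    roll_no    INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT
);
INSERT INTO student_v1 VALUES (1, 'Shiv Koirala'), (2, 'Raju Harisingh');
INSERT INTO student_v2 VALUES (1, 'Shiv', 'Koirala'), (2, 'Raju', 'Harisingh');
""")

# v1: string parsing is needed, and it only works if every
# name is exactly "first last" separated by one space.
v1 = conn.execute("""
    SELECT roll_no FROM student_v1
    WHERE substr(full_name, instr(full_name, ' ') + 1) = 'Koirala'
""").fetchall()

# v2: a clean equality test that an index on last_name can serve.
v2 = conn.execute(
    "SELECT roll_no FROM student_v2 WHERE last_name = ?", ("Koirala",)
).fetchall()

print(v1, v2)  # [(1,)] [(1,)]
```

Both queries return the same row here. The parsing version breaks as soon as a name has a middle part, which is exactly the fragility this rule warns about.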

Rule 3: Do not get overdosed with rule 2

Developers are cute creatures. Tell them this is the way and they keep doing it; in fact, they overdo it, leading to unwanted consequences. This also applies to rule 2, which we just talked about. When you think about decomposing, pause and ask yourself whether it is needed. As said, the decomposition should be logical.

For instance, take the phone number field. You will rarely operate on the ISD codes of phone numbers separately (unless your application demands it). So it is wiser to leave it as a single field, since splitting it would only add complications.

Rule 4: Treat duplicate non-uniform data as your biggest enemy

Identify and refactor duplicate data. My personal worry about duplicate data is not the hard disk space it takes, but the confusion it creates.

For instance, in the diagram below, you can see that “5th Standard” and “Fifth standard” mean the same thing. You could argue this data got into your system through bad data entry or poor validation. But if you ever derive a report, it will show them as different entities, which is very confusing from the end user's point of view.

One solution is to move the data into a separate master table altogether and refer to it via foreign keys. In the figure below, you can see how we have created a new master table called “Standards” and linked to it using a simple foreign key.
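Below is a minimal sketch of the master-table fix. It assumes SQLite via Python's `sqlite3`, and the `standards`/`student` columns and rows are hypothetical. Because every student row points at one `standards` row, a report can no longer split the same standard into two spellings. The foreign key also rejects references to standards that do not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE standards (
    standard_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE student (
    roll_no     INTEGER PRIMARY KEY,
    standard_id INTEGER NOT NULL REFERENCES standards(standard_id)
);
INSERT INTO standards VALUES (5, '5th Standard'), (6, '6th Standard');
INSERT INTO student VALUES (1, 5), (2, 5), (3, 6);
""")

# All 5th-standard students share one master row, so the report groups
# them together instead of producing "5th Standard" and "Fifth standard".
report = conn.execute("""
    SELECT s.name, COUNT(*) FROM student AS st
    JOIN standards AS s ON s.standard_id = st.standard_id
    GROUP BY s.name ORDER BY s.name
""").fetchall()
print(report)  # [('5th Standard', 2), ('6th Standard', 1)]

# A student pointing at a standard that does not exist is rejected.
fk_rejected = False
try:
    conn.execute("INSERT INTO student VALUES (4, 99)")
except sqlite3.IntegrityError:
    fk_rejected = True
print("orphan row rejected:", fk_rejected)  # orphan row rejected: True
```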

Rule 5: Watch for data separated by separators

The second rule of the first normal form says: avoid repeating groups. The diagram below shows an example of a repeating group. If you look closely at the syllabus field, too much data is stuffed into a single field. Fields like this are termed “repeating groups”. Manipulating this data would require complex queries, and I would also worry about their performance.

Columns stuffed with separator-delimited data need special attention. A better approach is to move that data to a separate table and link it with keys for easier management.

So now let’s apply the second rule of 1st normal form: “Avoid repeating groups”. You can see in the above figure I have created a separate syllabus table and then made a many-to-many relationship with the subject table.

With this approach, the syllabus field in the main table no longer repeats and no longer contains data separators.
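The following is a minimal sketch of the many-to-many layout, assuming SQLite via Python's `sqlite3`. The `syllabus`, `subject` and `syllabus_subject` tables and their rows are hypothetical. Instead of a comma-separated string such as `'Maths,Science,English'`, each subject is a row, and a junction table links syllabuses to subjects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE syllabus (syllabus_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subject  (subject_id  INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- Junction table: one row per (syllabus, subject) pair, which models
-- the many-to-many relationship without any separators.
CREATE TABLE syllabus_subject (
    syllabus_id INTEGER REFERENCES syllabus(syllabus_id),
    subject_id  INTEGER REFERENCES subject(subject_id),
    PRIMARY KEY (syllabus_id, subject_id)
);
INSERT INTO syllabus VALUES (1, 'Syllabus A'), (2, 'Syllabus B');
INSERT INTO subject  VALUES (1, 'Maths'), (2, 'Science'), (3, 'English');
INSERT INTO syllabus_subject VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# "Which syllabuses include Maths?" is a plain join rather than
# LIKE '%Maths%' pattern matching on a delimited string.
rows = conn.execute("""
    SELECT sy.name FROM syllabus AS sy
    JOIN syllabus_subject AS ss ON ss.syllabus_id = sy.syllabus_id
    JOIN subject AS su ON su.subject_id = ss.subject_id
    WHERE su.name = 'Maths'
    ORDER BY sy.name
""").fetchall()
print(rows)  # [('Syllabus A',), ('Syllabus B',)]
```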

Rule 6: Watch for partial dependencies

Watch for fields which depend partially on primary keys. For instance in the above table we can see the primary key is created on roll number and standard. Now watch the syllabus field closely. The syllabus field is associated with a standard and not with a student directly (roll number).

The syllabus is associated with the standard in which the student is studying, not directly with the student. So if we want to update the syllabus tomorrow, we have to update it for every student, which is tedious and illogical. It makes more sense to move these fields out and associate them with the Standards table.

You can see how we have moved the syllabus field and attached it to the Standards table.

This rule is nothing but the 2nd normal form: every non-key column should depend on the full primary key, not on just part of it.
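Here is a minimal sketch of the 2NF fix, assuming SQLite via Python's `sqlite3`. The `enrolment` table name and the syllabus values are hypothetical. Because the syllabus is stored once per standard rather than once per (roll number, standard) row, changing it touches exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Syllabus depends only on the standard, so it lives on the standards
-- table, not on the (roll_no, standard_id) rows.
CREATE TABLE standards (
    standard_id INTEGER PRIMARY KEY,
    syllabus    TEXT
);
CREATE TABLE enrolment (
    roll_no     INTEGER,
    standard_id INTEGER REFERENCES standards(standard_id),
    PRIMARY KEY (roll_no, standard_id)
);
INSERT INTO standards VALUES (5, 'Syllabus 2023');
INSERT INTO enrolment VALUES (1, 5), (2, 5), (3, 5);
""")

# One UPDATE, one row, regardless of how many students are enrolled.
cur = conn.execute(
    "UPDATE standards SET syllabus = ? WHERE standard_id = ?", ("Syllabus 2024", 5)
)
print(cur.rowcount)  # 1

# Every enrolled student now sees the new syllabus through the join.
seen_by_students = conn.execute("""
    SELECT DISTINCT s.syllabus FROM enrolment AS e
    JOIN standards AS s ON s.standard_id = e.standard_id
""").fetchall()
print(seen_by_students)  # [('Syllabus 2024',)]
```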

Rule 7: Choose derived columns preciously

If you are working on OLTP applications, getting rid of derived columns is a good idea, unless there is a pressing performance reason to keep them. In OLAP, where we do a lot of summations and calculations, these fields are often necessary for performance.

In the above figure, you can see how the average field depends on the marks and subjects. This is also a form of redundancy. So for fields derived from other fields, ask yourself: are they really necessary?

This rule is also known as the 3rd normal form: “No column should depend on other non-primary key columns”. My personal view is not to apply this rule blindly; redundant data is not always bad. If the redundant data is calculated data, weigh the situation first and then decide whether to apply the 3rd normal form.
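The following is a minimal sketch of the OLTP-side choice, assuming SQLite via Python's `sqlite3`. The `marks` table and the `student_average` view are hypothetical. Instead of storing an average column, a view derives it on read, so it cannot drift out of sync when marks change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marks (
    roll_no INTEGER,
    subject TEXT,
    marks   INTEGER,
    PRIMARY KEY (roll_no, subject)
);
INSERT INTO marks VALUES (1, 'Maths', 80), (1, 'Science', 90), (2, 'Maths', 70);

-- No stored "average" column: the view computes it from the marks
-- each time it is read, so it always reflects the current data.
CREATE VIEW student_average AS
SELECT roll_no, AVG(marks) AS average
FROM marks GROUP BY roll_no;
""")

conn.execute("UPDATE marks SET marks = 100 WHERE roll_no = 1 AND subject = 'Maths'")
averages = conn.execute(
    "SELECT roll_no, average FROM student_average ORDER BY roll_no"
).fetchall()
print(averages)  # [(1, 95.0), (2, 70.0)]
```

In an OLAP setting, you might instead persist the result (for example with `CREATE TABLE ... AS SELECT`) and refresh it in batch. That trades freshness for cheaper reads, which matches the "see the situation" advice above.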

Rule 8: Do not be hard on avoiding redundancy, if performance is the key

Do not make it a strict rule to always avoid redundancy. If there is a pressing need for performance, consider denormalization. A normalized design requires joins across many tables; a denormalized design reduces the number of joins and can therefore improve performance.

Rule 9: Multidimensional data is a different beast altogether

OLAP projects mostly deal with multidimensional data. For instance, in the figure below, you want sales per country, customer, and date. Put simply, you are looking at sales figures at the intersection of three dimensions.

For such situations, a fact-and-dimension (star schema) design is a better approach. Put simply, you create a central sales fact table that holds the sales amount field and connects to every dimension table through a foreign key.
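Below is a minimal star-schema sketch, assuming SQLite via Python's `sqlite3`. The `dim_*` and `fact_sales` tables and all the figures are hypothetical. A single fact table holds the sales amount plus one foreign key per dimension. Slicing by a dimension is then a join and a `GROUP BY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the axes you slice by.
CREATE TABLE dim_country  (country_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, day TEXT);

-- Central fact table: the measure (amount) plus a FK to each dimension.
CREATE TABLE fact_sales (
    country_id  INTEGER REFERENCES dim_country(country_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    amount      REAL
);
INSERT INTO dim_country  VALUES (1, 'India'), (2, 'Nepal');
INSERT INTO dim_customer VALUES (1, 'C1'), (2, 'C2');
INSERT INTO dim_date     VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 100.0), (1, 2, 1, 50.0), (2, 1, 2, 30.0), (1, 1, 2, 20.0);
""")

# Sales per country: join the fact table to one dimension and aggregate.
# Swapping dim_country for dim_customer or dim_date gives the other views.
per_country = conn.execute("""
    SELECT c.name, SUM(f.amount) FROM fact_sales AS f
    JOIN dim_country AS c ON c.country_id = f.country_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(per_country)  # [('India', 170.0), ('Nepal', 30.0)]
```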

Rule 10: Centralize name value table design

I have often come across name-value tables. A name-value table holds only a key and some data associated with that key. For instance, in the figure below, you can see a currency table and a country table. If you look at the data closely, each really contains only a key and a value.

For such kinds of tables, creating a central table and differentiating the data by using a type field makes more sense.
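Here is a minimal sketch of a centralized name-value table, assuming SQLite via Python's `sqlite3`. The `lookup` table, its `type`/`code`/`value` columns, and the helper `lookup_values` are all hypothetical. Currencies and countries share one table, and the `type` column tells them apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One central table instead of separate currency and country tables;
-- "type" says which list a row belongs to.
CREATE TABLE lookup (
    type  TEXT NOT NULL,
    code  TEXT NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (type, code)
);
INSERT INTO lookup VALUES
    ('currency', 'INR', 'Indian Rupee'),
    ('currency', 'USD', 'US Dollar'),
    ('country',  'IN',  'India'),
    ('country',  'US',  'United States');
""")

def lookup_values(conn, type_):
    """Return (code, value) pairs for one lookup type, ordered by code."""
    return conn.execute(
        "SELECT code, value FROM lookup WHERE type = ? ORDER BY code", (type_,)
    ).fetchall()

print(lookup_values(conn, "currency"))  # [('INR', 'Indian Rupee'), ('USD', 'US Dollar')]
```

One trade-off is worth knowing. A foreign key from another table into this one cannot, on its own, restrict a reference to only the currency rows. That check has to come from an extra type column in the referencing table or from application logic.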

Rule 11: For unlimited hierarchical data self-reference PK and FK

We often come across data with an unlimited parent-child hierarchy. For instance, consider a multi-level marketing scenario where a salesperson can have multiple salespeople below them. For such scenarios, a self-referencing primary key and foreign key handles the hierarchy well.
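The following is a minimal sketch of a self-referencing hierarchy, assuming SQLite via Python's `sqlite3`. The `salesperson` table, its rows, and the `downline` helper are hypothetical. `parent_id` is a foreign key to the same table's primary key. A recursive common table expression (`WITH RECURSIVE`) then walks the tree to any depth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- parent_id points back at this table's own primary key,
-- so the hierarchy can be as deep as the data requires.
CREATE TABLE salesperson (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES salesperson(id)
);
INSERT INTO salesperson VALUES
    (1, 'A', NULL),  -- top of the tree
    (2, 'B', 1),
    (3, 'C', 1),
    (4, 'D', 2),
    (5, 'E', 4);
""")

def downline(conn, root_id):
    """Return (name, depth) for everyone below root_id, at any depth."""
    return conn.execute("""
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM salesperson WHERE id = ?
            UNION ALL
            SELECT s.id, s.name, t.depth + 1
            FROM salesperson AS s JOIN tree AS t ON s.parent_id = t.id
        )
        SELECT name, depth FROM tree WHERE depth > 0 ORDER BY depth, name
    """, (root_id,)).fetchall()

print(downline(conn, 1))  # [('B', 1), ('C', 1), ('D', 2), ('E', 3)]
```

Adding another level only means inserting a row with the right `parent_id`. Neither the schema nor the query has to change.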

This article does not say you should ignore the normal forms; it says do not follow them blindly. Look first at your project's nature and the type of data you are dealing with.

Below is a video which explains the three normal forms step by step using a simple school table.

You can also visit my website for step-by-step videos on Design Patterns, UML, SharePoint 2010, .NET Fundamentals, VSTS, SQL Server, MVC, and lots more.

Original post: https://www.cnblogs.com/tuhooo/p/8461375.html

Date: 2024-09-30 10:40:49
