What is the difference between a Clustered and Non Clustered Index?

A clustered index determines the order in which the rows of a table are stored on disk. If a table has a clustered index, then the rows of that table will be stored on disk in the same exact order as the clustered index. An example will help clarify what we mean by that.

An example of a clustered index

Suppose we have a table named Employee which has a column named EmployeeID. Let’s say we create a clustered index on the EmployeeID column. What happens when we create this clustered index? Well, all of the rows inside the Employee table will bephysically – sorted (on the actual disk) – by the values inside the EmployeeID column. What does this accomplish? Well, it means that whenever a lookup/search for a sequence of EmployeeID’s is done using that clustered index, then the lookup will be much faster because of the fact that the sequence of employee ID’s are physically stored right next to each other on disk – that is the advantage with the clustered index. This is because the rows in the table are sorted in the exact same order as the clustered index, and the actual table data is stored in the leaf nodes of the clustered index.

Remember that an index is usually a tree data structure – and leaf nodes are the nodes that are at the very bottom of that tree. In other words, a clustered index basically contains the actual table level data in the index itself. This is very different from most other types of indexes as you can read about below.

When would using a clustered index make sense?

Let’s go through an example of when and why using a clustered index would actually make sense. Suppose we have a table named Owners and a table named Cars. This is what the simple schema would look like – with the column names in each table:

Owners
Owner_Name
Owner_Age

Cars
Car_Type
Owner_Name

Let’s assume that a given owner can have multiple cars – so a single Owner_Name can appear multiple times in the Cars table.
Now, let’s say that we create a clustered index on the Owner_Name column in the Cars table. What does this accomplish for us? Well, because a clustered index is stored physically on the disk in the same order as the index, it would mean that a given Owner_Name would have all his/her car entries stored right next to each other on disk. In other words, if there is an owner named “Joe Smith” or “Raj Gupta”, then each owner would have all of his/her entries in the Cars table stored right next to each other on the disk.

When is using a clustered index an advantage?

What is the advantage of this? Well, suppose that there is a frequently run query which tries to find all of the cars belonging to a specific owner. With the clustered index, since all of the car entries belonging to a single owner would be right next to each other on disk,the query will run much faster than if the rows were being stored in some random order on the disk. And that is the key point to remember!

Why is it called a clustered index?

In our example, all of the car entries belonging to a single owner would be right next to each other on disk. This is the “clustering”, or grouping of similar values, which is referred to in the term “clustered” index.

Note that having an index on the Owner_Name would not necessarily be unique, because there are many people who share the same name. So, you might have to add another column to the clustered index to make sure that it’s unique.

What is a disadvantage to using a clustered index?

A disadvantage to using a clustered index is the fact that if a given row has a value updated in one of it’s (clustered) indexed columns what typically happens is that the database will have to move the entire row so that the table will continue to be sorted in the same order as the clustered index column. Consider our example above to clarify this. Suppose that someone named “Rafael Nadal” buys a car – let’s say it’s a Porsche – from “Roger Federer”. Remember that our clustered index is created on the Owner_Name column. This means that when we do a update to change the name on that row in the Cars table, the Owner_Name will be changed from “Roger Federer” to “Rafael Nadal”.

But, since a clustered index also tells the database in which order to physically store the rows on disk, when the Owner_Name is changed it will have to move an updated row so that it is still in the correct sorted order. So, now the row that used to belong to “Roger Federer” will have to be moved on disk so that it’s grouped (or clustered) with all the car entries that belong to “Rafael Nadal”. Clearly, this is a performance hit. This means that a simple UPDATE has turned into a DELETE and then an INSERT – just to maintain the order of the clustered index. For this exact reason, clustered indexes are usually created on primary keys or foreign keys, because of the fact that those values are less likely to change once they are already a part of a table.

A comparison of a non-clustered index with a clustered index with an example

As an example of a non-clustered index, let’s say that we have a non-clustered index on the EmployeeID column. A non-clustered index will store both the value of the EmployeeID AND a pointer to the row in the Employee table where that value is actually stored. But a clustered index, on the other hand, will actually store the row data for a particular EmployeeID – so if you are running a query that looks for an EmployeeID of 15, the data from other columns in the table like EmployeeName, EmployeeAddress, etc. will all actually be stored in the leaf node of the clustered index itself.

This means that with a non-clustered index extra work is required to follow that pointer to the row in the table to retrieve any other desired values, as opposed to a clustered index which can just access the row directly since it is being stored in the same order as the clustered index itself. So, reading from a clustered index is generally faster than reading from a non-clustered index.

A table can have multiple non-clustered indexes

A table can have multiple non-clustered indexes because they don’t affect the order in which the rows are stored on disk like clustered indexes.

Why can a table have only one clustered index?

Because a clustered index determines the order in which the rows will be stored on disk, having more than one clustered index on one table is impossible. Imagine if we have two clustered indexes on a single table – which index would determine the order in which the rows will be stored? Since the rows of a table can only be sorted to follow just one index, having more than one clustered index is not allowed.

Summary of the differences between clustered and non-clustered indexes

Here’s a summary of the differences:

  • A clustered index determines the order in which the rows of the table will be stored on disk – and it actually stores row level data in the leaf nodes of the index itself. A non-clustered index has no effect on which the order of the rows will be stored.
  • Using a clustered index is an advantage when groups of data that can be clustered are frequently accessed by some queries. This speeds up retrieval because the data lives close to each other on disk. Also, if data is accessed in the same order as the clustered index, the retrieval will be much faster because the physical data stored on disk is sorted in the same order as the index.
  • A clustered index can be a disadvantage because any time a change is made to a value of an indexed column, the subsequent possibility of re-sorting rows to maintain order is a definite performance hit.
  • A table can have multiple non-clustered indexes. But, a table can have only one clustered index.
  • Non clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes don’t need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index – and the clustered index actually stores the row-level data in it’s leaf nodes.
时间: 2024-10-12 21:10:12

What is the difference between a Clustered and Non Clustered Index?的相关文章

Skipping clustered volume group

问题:在这次RHCE考试中,使用图形化创建LVM时候不小心勾选了集群,结果出现以下错误: pvcreate command failed. Command attempted: "/sbin/pvcreate -M 2 /dev/sda5" - System Error Message:   Can't initialize physical volume "/dev/sda5" of volume group "clustered" witho

Oracle 11g Articles

发现一个比较有意思的网站,http://www.oracle-base.com/articles/11g/articles-11g.php Oracle 11g Articles Oracle Database 11g: New Features For Administrators OCP Exam Articles Oracle Database 11g Release 1: Miscellaneous Articles Oracle Database 11g Release 2: Misc

JI_4

(1) OOP and Design Patterns (1.1) Please explain difference among class, interface and abstract class. When you would use it rather than other? Feature Interface Abstract class Multiple inheritance A class may implement multiple interfaces. A class m

生成建表脚本up_CreateTable

已经很久没用使用这个脚本了,今天用到,并做修改,增加了生成扩展属性功能. Go if object_ID('[up_CreateTable]') is not null Drop Procedure [up_CreateTable] Go /* 生成建表脚本(V4.0) Andy 2017-3-28 */ Create Proc up_CreateTable ( @objectList nvarchar(max)=null ) as --With ENCRYPTION /* 参数说明: @obj

数据库设计开发规范

1 数据库命名约定  1.1 规则 (1) 命名富有意义英文词汇,多个单词组成的,中间以下划线分割. (2) 除数据库名称长度为1-8个字符,其余为1-30个字符,dblink名称也不要超过30个字符. (3)命名只能使用英文字母,数字和下划线,字母全部小写 (4)避免使用Oracle的保留字如level.关键字如type. 1. 2系统模块 编号 名称 英文 缩写 1 系统管理 system sys 2 配置管理 dictionary dic 3 设备系统 equipment equ 4 通讯

文档管理

库名: DocSystem 路径: E:\wzs\x.文档\C#文档系统\2.数据库 DocArticle 文章表 2012-05-23 20:51:02 序号 字段名 字段说明 自动 主键 类型 字节数 长度 小数位数 非空 关联表 关联编码 关联名称 其它说明 1 ArticleId 文档编号 √ √ int 4 4 0 √ 2 ArticleTitle 文档标题 nvarchar 200 100 0 3 MenuId 目录编号 int 4 4 0 DocMenu MenuId MenuNa

Identifying Duplicate Indexes

本文是在阅读<Troubleshooting SQL Server>->Chapter 5: Missing Indexes->Identifying Duplicate Indexes时,文中提及两个处理重复索引的链接.此处整理链接文章,方便自己后期查看,详细内容请参考原文:How can you tell if an index is REALLY a duplicate? and Removing duplicate indexes一.创建新的存储过程,返回表上的索引信息St

Hadoop实现Clustering by fast search and find of density peaks

Hadoop实现Clustering by fast search and find of density peaks 本篇博客参考:Clustering by fast search and find of density peaks论文以及http://www.cnblogs.com/peghoty/p/3945653.html. Hadoop版本:2.6.0,Myeclipse:10.0 代码可在https://github.com/fansy1990/fast_cluster下载. 1.

Dalvik虚拟机Java堆创建过程分析

使用C/C++开发应用程序最令头痛的问题就是内存管理.慎不留神,要么内存泄漏,要么内存破坏.虚拟机要解决的问题之一就是帮助应用程序自动分配和释放内存.为了达到这个目的,虚拟机在启动的时候向操作系统申请一大块内存当作对象堆.之后当应用程序创建对象时,虚拟机就会在堆上分配合适的内存块.而当对象不再使用时,虚拟机就会将它占用的内存块归还给堆.Dalvik虚拟机也不例外,本文就分析它的Java堆创建过程. 老罗的新浪微博:http://weibo.com/shengyangluo,欢迎关注! 从前面Da