High Performance MySQL Notes - Chapter 5 Indexing for High Performance - 005 Clustered Indexes

I. Introduction to clustered indexes

1. What is a clustered index?

InnoDB’s clustered indexes actually store a B-Tree index and the rows together in the same structure.

2. Why can a table have only one clustered index?

When a table has a clustered index, its rows are actually stored in the index’s leaf pages. The term “clustered” refers to the fact that rows with adjacent key values are stored close to each other. You can have only one clustered index per table, because you can’t store the rows in two places at once. (However, covering indexes let you emulate multiple clustered indexes; more on this later.)

3. Advantages of clustered indexes

• You can keep related data close together. For example, when implementing a mailbox, you can cluster by user_id, so you can retrieve all of a single user’s messages by fetching only a few pages from disk. If you didn’t use clustering, each message might require its own disk I/O.
• Data access is fast. A clustered index holds both the index and the data together in one B-Tree, so retrieving rows from a clustered index is normally faster than a comparable lookup in a nonclustered index.
• Queries that use covering indexes can use the primary key values contained at the leaf node.
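The mailbox example from the list above can be sketched in SQL. This is a minimal illustration, not a schema from the book: the table name `mailbox_message`, its columns, and the value 42 are assumptions. Because InnoDB clusters rows by primary key, leading the primary key with `user_id` keeps each user's messages adjacent in the clustered index.

```sql
-- Hypothetical mailbox table (names and column types are assumptions).
-- The composite primary key makes InnoDB store rows ordered by
-- (user_id, message_id), so one user's messages sit on neighboring pages.
CREATE TABLE mailbox_message (
    user_id    INT UNSIGNED    NOT NULL,
    message_id BIGINT UNSIGNED NOT NULL,  -- assumed to be assigned by the application
    subject    VARCHAR(255)    NOT NULL,
    body       TEXT            NOT NULL,
    PRIMARY KEY (user_id, message_id)
) ENGINE=InnoDB;

-- Reads a contiguous range of the clustered index for a single user.
SELECT message_id, subject
FROM mailbox_message
WHERE user_id = 42
ORDER BY message_id;
```

The trade-off in this sketch is that inserts arrive in `message_id` order per user rather than in overall primary key order, which ties into the insertion-order costs discussed in the disadvantages.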

4. Disadvantages of clustered indexes

• Clustering gives the largest improvement for I/O-bound workloads. If the data fits in memory, the order in which it’s accessed doesn’t really matter, so clustering doesn’t give much benefit.
• Insert speeds depend heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table. It might be a good idea to reorganize the table with OPTIMIZE TABLE after loading a lot of data if you didn’t load the rows in primary key order.
• Updating the clustered index columns is expensive, because it forces InnoDB to move each updated row to a new location.
• Tables built upon clustered indexes are subject to page splits when new rows are inserted, or when a row’s primary key is updated such that the row must be moved. A page split happens when a row’s key value dictates that the row must be placed into a page that is full of data. The storage engine must split the page into two to accommodate the row. Page splits can cause a table to use more space on disk.
• Clustered tables can be slower for full table scans, especially if rows are less densely packed or stored nonsequentially because of page splits.
• Secondary (nonclustered) indexes can be larger than you might expect, because their leaf nodes contain the primary key columns of the referenced rows.
• Secondary index accesses require two index lookups instead of one.
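The advice in the list above about reorganizing a table after an out-of-order load can be sketched as follows. The table `load_demo` and its rows are hypothetical and exist only to make the statement runnable.

```sql
-- Hypothetical table, loaded deliberately out of primary key order.
CREATE TABLE load_demo (
    id      INT          NOT NULL,
    payload VARCHAR(100) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

INSERT INTO load_demo (id, payload) VALUES (30, 'c'), (10, 'a'), (20, 'b');

-- Rebuild the table so rows are repacked in primary key order.
OPTIMIZE TABLE load_demo;
```

For InnoDB tables, the server typically reports that it performs a recreate plus analyze instead of an in-place optimize, so on a large table this is a full rebuild and should be scheduled accordingly.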

II. Differences between clustered indexes (InnoDB) and nonclustered indexes (MyISAM)

Table structure

CREATE TABLE layout_test (
    col1 int NOT NULL,
    col2 int NOT NULL,
    PRIMARY KEY(col1),
    KEY(col2)
);

1. MyISAM's layout

In fact, in MyISAM, there is no structural difference between a primary key and any other index. A primary key is simply a unique, nonnullable index named PRIMARY.

2. InnoDB's layout

At first glance, that might not look very different from Figure 5-5. But look again, and notice that this illustration shows the whole table, not just the index. Because the clustered index “is” the table in InnoDB, there’s no separate row storage as there is for MyISAM.

Each leaf node in the clustered index contains the primary key value, the transaction ID, and rollback pointer InnoDB uses for transactional and MVCC purposes, and the rest of the columns (in this case, col2). If the primary key is on a column prefix, InnoDB includes the full column value with the rest of the columns.

Also in contrast to MyISAM, secondary indexes are very different from clustered indexes in InnoDB. Instead of storing “row pointers,” InnoDB’s secondary index leaf nodes contain the primary key values, which serve as the “pointers” to the rows. This strategy reduces the work needed to maintain secondary indexes when rows move or when there’s a data page split. Using the row’s primary key values as the pointer makes the index larger, but it means InnoDB can move a row without updating pointers to it.
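One way to observe that InnoDB secondary index entries carry the primary key is to run EXPLAIN on a query that needs only those two columns. The sketch below reuses the layout_test structure under a hypothetical name, `layout_test_innodb`, with an explicit ENGINE clause and sample rows that are assumptions added for illustration.

```sql
-- Same structure as layout_test, with the engine stated explicitly.
CREATE TABLE layout_test_innodb (
    col1 INT NOT NULL,
    col2 INT NOT NULL,
    PRIMARY KEY (col1),
    KEY (col2)
) ENGINE=InnoDB;

INSERT INTO layout_test_innodb (col1, col2) VALUES (1, 10), (2, 20), (3, 10);

-- The query filters on col2 and returns only col1. Because the leaf
-- entries of KEY(col2) also hold col1, the secondary index alone should
-- be able to answer it without a second lookup in the clustered index.
EXPLAIN SELECT col1 FROM layout_test_innodb WHERE col2 = 10;
```

If the optimizer chooses the col2 index, the Extra column of the EXPLAIN output would be expected to include "Using index". Selecting a column that is in neither index (none exists in this two-column table) would instead force the second lookup through the primary key.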

III. Effect of sequential versus nonsequential primary keys with a clustered index

1.

Notice that not only does it take longer to insert the rows with the UUID primary key, but the resulting indexes are quite a bit bigger. Some of that is due to the larger primary key, but some of it is undoubtedly due to page splits and resultant fragmentation as well.

2. Why does it matter whether the primary key is sequential?

Inserting sequential primary keys

Inserting nonsequential primary keys

Drawbacks of inserting nonsequential primary keys:

• The destination page might have been flushed to disk and removed from the caches, or might not have ever been placed into the caches, in which case InnoDB will have to find it and read it from the disk before it can insert the new row. This causes a lot of random I/O.
• When insertions are done out of order, InnoDB has to split pages frequently to make room for new rows. This requires moving around a lot of data, and modifying at least three pages instead of one.
• Pages become sparsely and irregularly filled because of splitting, so the final data is fragmented.
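The comparison behind these drawbacks can be reproduced in outline with two tables that differ only in their primary key. This is a sketch, not the benchmark schema from the book: the table names `userinfo_seq` and `userinfo_uuid`, the `name` column, and the sample rows are assumptions.

```sql
-- Sequential key: each new row is appended at the end of the clustered index.
CREATE TABLE userinfo_seq (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(64)  NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Nonsequential key: each new row can land on an arbitrary page.
CREATE TABLE userinfo_uuid (
    uuid CHAR(36)    NOT NULL,
    name VARCHAR(64) NOT NULL,
    PRIMARY KEY (uuid)
) ENGINE=InnoDB;

INSERT INTO userinfo_seq (name) VALUES ('alice'), ('bob');
INSERT INTO userinfo_uuid (uuid, name) VALUES (UUID(), 'alice'), (UUID(), 'bob');

-- Compare on-disk size estimates once both tables hold the same data.
SELECT table_name, data_length, index_length
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name IN ('userinfo_seq', 'userinfo_uuid');
```

With only two rows the sizes will not differ meaningfully; the effect described above shows up only after loading a large, identical data set into both tables. The sizes reported by information_schema are estimates and may lag behind recent changes.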

Time: 2024-10-05 04:14:54
