Efficiently traversing InnoDB B+Trees with the page directory--slot


1、The purpose of the page directory

As described in the posts mentioned above, all records in INDEX pages are linked together in a singly-linked list in ascending order. However, list traversal through a page with potentially several hundred records in it is very expensive: every record’s key must be compared, and this needs to be done at each level of the B+Tree until the record sought is found on a leaf page.


The page directory greatly optimizes this search by providing a fixed-width data structure with direct pointers to 1 of every 4-8 records, in order. Thus, it can be used for a traditional binary search of the records in each page, starting at the mid-point of the directory and progressively pruning the directory by half until only a single entry remains, and then linear-scanning from there. Since the directory is effectively an array, it can be traversed in either ascending or descending order, despite the records being linked in only ascending order.
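The search described above can be sketched in Python. This is a minimal in-memory model under stated assumptions, not InnoDB's code: `Rec`, `build_page`, and `search` are hypothetical names, infimum and supremum are modeled as keys of -inf and +inf, and a slot is simply placed on every 4th user record rather than being maintained by InnoDB's own rules.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Rec:
    """Hypothetical record: a key plus a pointer to the next record."""
    key: float                       # -inf = infimum, +inf = supremum
    next_rec: Optional["Rec"] = None


def build_page(keys, slot_every: int = 4) -> List[Rec]:
    """Link records in ascending order and return the page directory.

    Assumption: every `slot_every`-th user record gets a slot, plus the
    mandatory infimum (slot 0) and supremum (last slot).
    """
    infimum, supremum = Rec(-math.inf), Rec(math.inf)
    prev, directory = infimum, [infimum]
    for i, key in enumerate(sorted(keys), start=1):
        rec = Rec(key)
        prev.next_rec = rec
        prev = rec
        if i % slot_every == 0:
            directory.append(rec)
    prev.next_rec = supremum
    directory.append(supremum)
    return directory


def search(directory: List[Rec], target) -> Optional[Rec]:
    """Binary-search the slots, then linear-scan the singly-linked list."""
    lo, hi = 0, len(directory) - 1
    # Invariant: directory[lo].key <= target < directory[hi].key
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if directory[mid].key <= target:
            lo = mid
        else:
            hi = mid
    # Only the few records owned by slot `hi` remain to be scanned.
    rec = directory[lo]
    while rec.key < target:
        rec = rec.next_rec
    return rec if rec.key == target else None


directory = build_page(range(0, 48, 2))  # 24 even keys: 0, 2, ..., 46
print(search(directory, 30).key)         # 30
print(search(directory, 31))             # None
```

The binary search only touches the directory array, so the number of full key comparisons grows with the logarithm of the slot count, plus a short walk of at most a handful of linked records.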


2、The physical structure of the page directory

The structure is actually very simple. The number of slots (the page directory length) is specified in the first field of the INDEX header of the page. The page directory always contains an entry for the infimum and supremum system records (so the minimum size is 2 entries), and may contain 0 or more additional entries, one for each 4-8 user records. A record is said to “own” another record if it represents it in the page directory. Each entry in the page directory “owns” the records following the previous entry in the directory, up to and including itself. The count of records “owned” by each record is stored in the record header that precedes each record.
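To make the physical layout concrete, here is a hedged Python sketch that decodes the directory from a raw page. Assumptions: the default 16 KiB page size, the COMPACT row format (consistent with the infimum/supremum offsets 99 and 112 shown in the innodb_space output), the INDEX header starting right after the 38-byte FIL header, slots stored backwards from byte 16376, and `n_owned` held in the low 4 bits of the header byte 5 bytes before each record's origin. `read_page_directory` is a hypothetical helper, not part of innodb_ruby.

```python
import struct

PAGE_SIZE = 16384                   # assumed default innodb_page_size
FIL_HEADER_SIZE = 38                # INDEX header starts right after this
FIL_TRAILER_START = PAGE_SIZE - 8   # 16376: the directory grows backwards from here
SLOT_SIZE = 2                       # each slot is a 2-byte big-endian record offset


def read_page_directory(page: bytes):
    """Return [(slot, record_offset, n_owned), ...] for a COMPACT INDEX page."""
    if len(page) != PAGE_SIZE:
        raise ValueError(f"expected {PAGE_SIZE} bytes, got {len(page)}")
    # First field of the INDEX header: number of directory slots.
    (n_slots,) = struct.unpack_from(">H", page, FIL_HEADER_SIZE)
    slots = []
    for slot in range(n_slots):
        pos = FIL_TRAILER_START - (slot + 1) * SLOT_SIZE  # slot 0 sits at 16374
        (rec_offset,) = struct.unpack_from(">H", page, pos)
        n_owned = page[rec_offset - 5] & 0x0F  # low 4 bits of that header byte
        slots.append((slot, rec_offset, n_owned))
    return slots


# Synthetic "empty table" page: only infimum (99) and supremum (112).
page = bytearray(PAGE_SIZE)
struct.pack_into(">H", page, FIL_HEADER_SIZE, 2)
struct.pack_into(">H", page, FIL_TRAILER_START - 2, 99)
struct.pack_into(">H", page, FIL_TRAILER_START - 4, 112)
page[99 - 5] |= 1
page[112 - 5] |= 1
print(read_page_directory(bytes(page)))  # [(0, 99, 1), (1, 112, 1)]
```

To try it against a real tablespace, one could open `t_page_directory.ibd` in binary mode, `seek(3 * PAGE_SIZE)`, and `read(PAGE_SIZE)`, again assuming the default page size and the COMPACT format.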


The page-directory-summary mode of innodb_space can be used to view the page directory contents, in this case for a completely empty table (with the same schema as the 1 million row table used in “A quick introduction to innodb_ruby”), showing the minimum possible page directory:

$ innodb_space -f t_page_directory.ibd -p 3 page-directory-summary
slot    offset  type          owned   key
0       99      infimum       1
1       112     supremum      1

If we insert a single record, we can see that it gets owned by the nearest record with a greater key than itself that has an entry in the page directory. In this case, supremum will own the record (as previously discussed, supremum represents a record higher than any possible key in the page):

$ innodb_space -f t_page_directory.ibd -p 3 page-directory-summary
slot    offset  type          owned   key
0       99      infimum       1
1       112     supremum      2
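The ownership rule can be expressed as a lookup over the keys of the slot-owning records: a new record is owned by the first slot whose key is greater than or equal to its own. A minimal Python sketch, assuming infimum and supremum are modeled as -inf and +inf; `owner_slot` is a hypothetical helper.

```python
import math
from bisect import bisect_left


def owner_slot(slot_keys, key):
    """Return the index of the slot that owns `key`.

    slot_keys holds, in ascending order, the keys of the records that have a
    directory entry; slot 0 is infimum (-inf) and the last is supremum (+inf).
    A key equal to a slot's key belongs to that slot (the record owns itself).
    """
    return bisect_left(slot_keys, key)


empty_page = [-math.inf, math.inf]
print(owner_slot(empty_page, 42))  # 1 -> supremum owns the new record

slots = [-math.inf, 3, 7, math.inf]
print(owner_slot(slots, 5))        # 2 -> owned by the record with key 7
```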

The infimum record always owns only itself, since no record can have a lower key. The supremum record always owns itself, but has no minimum record ownership. Each additional entry in the page directory should own a minimum of 4 records (itself plus 3 others) and a maximum of 8 records (itself plus 7 others).
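These rules can be turned into a small consistency check. This is a hedged Python sketch: `directory_is_valid` and `user_record_count` are hypothetical names, and the upper bound of 8 for supremum is taken from the split rule described in the next section.

```python
def directory_is_valid(owned):
    """Check the n_owned rules described above.

    owned[i] is the n_owned value of slot i; slot 0 is infimum and the
    last slot is supremum.
    """
    if len(owned) < 2:                # infimum and supremum are mandatory
        return False
    infimum, *middle, supremum = owned
    return (
        infimum == 1                  # infimum owns only itself
        and 1 <= supremum <= 8        # supremum owns at least itself, at most 8
        and all(4 <= n <= 8 for n in middle)
    )


def user_record_count(owned):
    """Every record is owned exactly once, so subtract infimum and supremum."""
    return sum(owned) - 2


print(directory_is_valid([1, 4, 5]), user_record_count([1, 4, 5]))  # True 8
print(directory_is_valid([1, 3, 5]))                                 # False
```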


To illustrate, each record with an entry in the page directory (bolded) owns the records immediately prior to it in the singly-linked list (K = Key, O = Number of Records Owned):

3、Growth of the page directory

Once any page directory slot would exceed 8 records owned, the page directory is rebalanced by splitting that slot, distributing its records into groups of roughly 4. If we insert 6 additional records into the table, supremum will now own a total of 8 records (7 user records plus itself):


$ innodb_space -f t_page_directory.ibd -p 3 page-directory-summary
slot    offset  type          owned   key
0       99      infimum       1
1       112     supremum      8

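The next insert will cause a reorganization: supremum's group would reach 9 records, so a new slot is created that owns 4 of them, leaving supremum with 5: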


$ innodb_space -f t_page_directory.ibd -p 3 page-directory-summary
slot    offset  type          owned   key
0       99      infimum       1
1       191     conventional  4
2       112     supremum      5
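The growth seen above can be reproduced with a simplified Python model. It assumes records arrive in ascending key order (so each one lands in supremum's group) and that an overflowing group of 9 is split so the new slot takes the first half (4) and supremum keeps the rest (5), which matches the 4 / 5 split in the output above. `insert_ascending` is a hypothetical name; this is not InnoDB's implementation, which must also cope with inserts elsewhere in the page and with deletes.

```python
MAX_OWNED = 8  # a slot may own at most 8 records


def insert_ascending(n_records):
    """Append keys 0..n_records-1 to an empty page, splitting full slots.

    Returns the directory as [(owner, n_owned), ...], where owner is
    'infimum', 'supremum', or the key of a slot-owning user record.
    """
    # Each group lists the records one slot owns; the owner is the last one.
    groups = [["infimum"], ["supremum"]]
    for key in range(n_records):
        sup_group = groups[-1]
        sup_group.insert(-1, key)               # ascending keys go just before supremum
        if len(sup_group) > MAX_OWNED:          # 9 records: split the group in two
            half = len(sup_group) // 2
            groups.insert(-1, sup_group[:half])  # new slot owns the first half
            groups[-1] = sup_group[half:]        # supremum keeps the rest
    return [(group[-1], len(group)) for group in groups]


print(insert_ascending(7))  # [('infimum', 1), ('supremum', 8)]
print(insert_ascending(8))  # [('infimum', 1), (3, 4), ('supremum', 5)]
```

Under the same assumption, `insert_ascending(24)` yields slots owned by keys 3, 7, 11, 15, and 19 with 4 records each, and supremum owning 5.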

4、A logical view of the page directory

At a logical level, the page directory (and records) for a page with 24 records (with keys from 0 to 23) would look like this:

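Infimum always owns only itself; its slot has n_owned = 1.

Supremum always owns the last few records in the page, and that number can be less than 4.

Every other slot owns at least 4 and at most 8 records.

The slots are stored in reverse order, growing backwards from byte 16376, i.e. the start of the FIL trailer.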

Take note that:

Records are singly linked from infimum to supremum through all 24 user records, as previously discussed.

Approximately each 4th record is entered into the page directory, represented in the illustration both by bolding that record and by noting its offset in the page directory array represented at the top of the illustration.

The page directory is stored “backwards” in the page, so it is reversed in this illustration compared to its ordering on disk.


http://blog.jcole.us/2013/01/

Date: 2024-11-05 16:10:28
