[Repost] How expensive are page splits in terms of transaction log?

By: Paul Randal

Page splits are always thought of as expensive, but just how bad are they? In this post I want to work through an example that shows how much more transaction log is generated when a page in an index has to split, using the sys.dm_tran_database_transactions DMV to measure it. You can find the list of columns, with a short explanation of each, in Books Online – I was reminded of the DMV's existence by someone on Twitter (sorry, I don't remember who it was and I couldn't find it in search).

In the example, I'm going to create a table with rows that are approximately 1,000 bytes long:

CREATE DATABASE PageSplitTest;
GO
USE PageSplitTest;
GO

CREATE TABLE BigRows (c1 INT, c2 CHAR (1000));
CREATE CLUSTERED INDEX BigRows_CL ON BigRows (c1);
GO

INSERT INTO BigRows VALUES (1, 'a');
INSERT INTO BigRows VALUES (2, 'a');
INSERT INTO BigRows VALUES (3, 'a');
INSERT INTO BigRows VALUES (4, 'a');
INSERT INTO BigRows VALUES (6, 'a');
INSERT INTO BigRows VALUES (7, 'a');
GO
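As a quick sanity check (my addition, not part of Paul's original post), you can confirm that these six rows all sit on a single clustered index leaf page before going any further:

-- Sketch: with DETAILED mode, sys.dm_db_index_physical_stats reports one row
-- per index level; here we expect a single leaf page (page_count = 1).
SELECT [index_level], [page_count], [avg_page_space_used_in_percent]
FROM sys.dm_db_index_physical_stats (
    DB_ID (N'PageSplitTest'), OBJECT_ID (N'BigRows'), NULL, NULL, 'DETAILED');
GO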

I've engineered the case where the clustered index data page has space for one more row, and I've left a 'gap' at c1=5. Let's add it as part of an explicit transaction and see how much transaction log is generated:

BEGIN TRAN
INSERT INTO BigRows VALUES (8, 'a');
GO

SELECT [database_transaction_log_bytes_used]
FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
1228

That's about what I'd expect for that row: the roughly 1,000-byte row image plus the log record overhead. Now what about when I cause a page split by inserting the 'missing' c1=5 row into the full page?

-- Commit the previous transaction
COMMIT TRAN
GO

BEGIN TRAN
INSERT INTO BigRows VALUES (5, 'a');
GO

SELECT [database_transaction_log_bytes_used]
FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
6724

Wow. 5.5x more bytes are written to the transaction log as part of the system transaction that does the split.
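If you want to see where those extra bytes come from, one option (my addition, using the undocumented and unsupported fn_dblog function, so treat the column and operation names as version-dependent) is to look at the individual log records while the transaction is still open:

-- Sketch: list the log records touching the BigRows allocation unit.
-- During a split you'd expect to see the rows moved to the new page logged
-- as inserts, plus a delete-split record against the old page.
SELECT [Current LSN], [Operation], [Context], [Log Record Length]
FROM fn_dblog (NULL, NULL)
WHERE [AllocUnitName] LIKE '%BigRows%';
GO

Roughly speaking, the rows that move to the new page get fully logged again, which is where most of the extra log volume comes from.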

The ratio gets worse as the row size gets smaller. With approximately 100-byte rows (use the same code as above, but change the column to CHAR (100), insert 67 rows with a 'gap' somewhere, then insert the 68th to cause the split), the two numbers are 328 and 5924 – the split caused 18 times more log to be generated! With approximately 10-byte rows, I got numbers of 240 and 10436, because I created skewed data (about 256 rows with the key value 8) and then inserted key value 5, which forced a (rare) non-middle page split. That's a ratio of more than 43 times more log generated! You can try this yourself if you want: I changed the code to use a CHAR (10), inserted values 1, 2, 3, 4, 6, 7, then inserted 256 key values of 8 and then 2 of 5. The resulting page had only 6 rows – it split after the key value 5 – the Storage Engine doesn't always do a 50/50 page split. And that's not even considering nasty cascading page splits, or splits that have to split a page multiple times to fit a new (variable-sized) row in.
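Here is a sketch of the CHAR (100) variation described above (my reconstruction from the description, not Paul's original script; the table name BigRows100, the gap position c1 = 35, and the exact rows-per-page count are my assumptions, so your byte counts may differ):

-- Sketch: ~100-byte rows, leave a gap, then split the full page.
CREATE TABLE BigRows100 (c1 INT, c2 CHAR (100));
CREATE CLUSTERED INDEX BigRows100_CL ON BigRows100 (c1);
GO

-- Insert 67 rows, skipping c1 = 35 so the (single) page fills up with a gap.
-- Adjust the upper bound if your page fills at a different row count.
SET NOCOUNT ON;
DECLARE @i INT;
SET @i = 1;
WHILE @i <= 68
BEGIN
    IF @i <> 35
        INSERT INTO BigRows100 VALUES (@i, 'a');
    SET @i = @i + 1;
END;
GO

-- Insert the missing row and see how much log the split generates.
BEGIN TRAN
INSERT INTO BigRows100 VALUES (35, 'a');

SELECT [database_transaction_log_bytes_used]
FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');

COMMIT TRAN
GO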

Bottom line: page splits don't just cause extra IOs and index fragmentation, they generate a *lot* more transaction log. And all that log has to be (potentially) backed up, log shipped, mirrored….
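One common way to reduce how often random inserts split pages (my addition, not part of the original post) is to leave free space at the leaf level when you build or rebuild the index, at the cost of lower page density:

-- Sketch: rebuild the clustered index leaving ~10% free space per leaf page.
ALTER INDEX BigRows_CL ON BigRows REBUILD WITH (FILLFACTOR = 90);
GO

The right fill factor depends on how quickly that free space gets used up between rebuilds, so treat 90 as a placeholder rather than a recommendation.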
