How expensive are page splits in terms of transaction log?
By: Paul Randal
Page splits are always thought of as expensive, but just how bad are they? In this post I want to create an example showing how much more transaction log is generated when a page in an index has to split, using the sys.dm_tran_database_transactions DMV. You can find the list of columns, with a brief explanation of each, in Books Online here – I was reminded of its existence by someone on Twitter (sorry, I don't remember who it was and I couldn't find it in search).
In the example, I'm going to create a table with approximately 1000-byte rows:
CREATE DATABASE PageSplitTest;
GO
USE PageSplitTest;
GO

CREATE TABLE BigRows (c1 INT, c2 CHAR (1000));
CREATE CLUSTERED INDEX BigRows_CL ON BigRows (c1);
GO

INSERT INTO BigRows VALUES (1, 'a');
INSERT INTO BigRows VALUES (2, 'a');
INSERT INTO BigRows VALUES (3, 'a');
INSERT INTO BigRows VALUES (4, 'a');
INSERT INTO BigRows VALUES (6, 'a');
INSERT INTO BigRows VALUES (7, 'a');
GO
I've engineered the case where the clustered index data page has space for one more row, and I've left a 'gap' at c1=5. Let's add it as part of an explicit transaction and see how much transaction log is generated:
BEGIN TRAN
INSERT INTO BigRows VALUES (8, 'a');
GO

SELECT [database_transaction_log_bytes_used]
FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
1228
That's about what I'd expect for that row. Now what about when I cause a page split by inserting the 'missing' c1=5 row into the full page?
-- Commit the previous transaction
COMMIT TRAN
GO

BEGIN TRAN
INSERT INTO BigRows VALUES (5, 'a');
GO

SELECT [database_transaction_log_bytes_used]
FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
6724
Wow. 5.5x more bytes are written to the transaction log as part of the system transaction that does the split.
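If you want to see the individual log records behind those byte counts, one way is the undocumented fn_dblog function. Here's a minimal sketch (fn_dblog is unsupported, so its column names and behavior aren't guaranteed – don't rely on it in production code):

-- Sketch: look at the log records for the current database using the
-- undocumented (and unsupported) fn_dblog function. Run this while the
-- split transaction is still open to see its log records.
SELECT [Current LSN], [Operation], [Context], [Log Record Length]
FROM fn_dblog (NULL, NULL);
GO

A page split typically shows up as a burst of records, including an LOP_DELETE_SPLIT operation for the rows moved off the splitting page – compare that with the single LOP_INSERT_ROWS record from the first insert.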
The ratio gets worse as the row size gets smaller. For a table with approximately 100-byte rows (use the same code as above, but change to a CHAR (100), insert 67 rows with a 'gap' somewhere, then insert the 68th to cause the split – see the sketch below), the two numbers are 328 and 5924 – the split caused 18 times more log to be generated! For a table with approximately 10-byte rows, I got numbers of 240 and 10436, because I created skewed data (about 256 rows with the key value 8) and then inserted key value 5, which forced a (rare) non-middle page split. That's a ratio of more than 43 times more log generated! You can try this yourself if you want: I changed the code to have a CHAR (10), inserted values 1, 2, 3, 4, 6, 7, then inserted 256 rows with key value 8 and then 2 with key value 5. The resulting page had only 6 rows – it split after the key value 5 – the Storage Engine doesn't always do a 50/50 page split. And that's not even counting nasty cascading page splits, or splits that have to split a page multiple times to fit a new (variable-sized) row in.
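Here's a sketch of the 100-byte repro described above, following the row counts in the text. The gap position (c1 = 35) and the loop construction are my assumptions – any gap on the full page will do:

-- Recreate the test table with ~100-byte rows (gap position assumed)
DROP TABLE BigRows;
CREATE TABLE BigRows (c1 INT, c2 CHAR (100));
CREATE CLUSTERED INDEX BigRows_CL ON BigRows (c1);
GO

-- Insert 67 rows, leaving a 'gap' at c1 = 35
SET NOCOUNT ON;
DECLARE @a INT;
SELECT @a = 1;
WHILE (@a <= 68)
BEGIN
    IF (@a != 35)
        INSERT INTO BigRows VALUES (@a, 'a');
    SELECT @a = @a + 1;
END;
GO

-- Insert the 68th row into the gap, forcing the page split
BEGIN TRAN
INSERT INTO BigRows VALUES (35, 'a');
GO

SELECT [database_transaction_log_bytes_used]
FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

COMMIT TRAN
GO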
Bottom line: page splits don't just cause extra IOs and index fragmentation, they generate a *lot* more transaction log. And all that log has to be (potentially) backed up, log shipped, mirrored....