Database Design
Rule |
Description |
Value |
Source |
Problem Description |
1 |
High Frequency queries having a high number of table joins. |
>4 |
Sys.dm_exec_sql_text Sys.dm_exec_cached_plans |
High frequency queries with lots of joins can be too normalized for high OLTP scalability. |
2 |
Frequently updated tables having # indexes. |
>3 |
Sys.indexes sys.dm_db_index_operational_stats |
Excessive index maintenance for OLTP. |
3 |
Big IOs Table Scans Range Scans |
>1 |
Perfmon object SQL Server Access Methods Sys.dm_exec_query_stats |
A missing index flushes the cache. |
4 |
Unused Indexes. |
Index not in Sys.dm_db_index_usage_stats. If an index is NEVER used, it will not appear in the DMV sys.dm_db_index_usage_stats |
Avoid Index maintenance for unused indexes. |
CPU
Rule |
Description |
Value |
Source |
Problem Description |
1 |
Signal Waits |
>25% |
Sys.dm_os_wait_stats |
Time in runnable queue is pure CPU wait. |
2 |
Plan reuse |
<90% |
Perfmon object SQL Server Statistics |
OLTP identical transactions should ideally have >95% plan reuse. |
3 |
Parallelism: Cxpacket waits |
>5% |
Sys.dm_os_wait_stats |
Parallelism reduces OLTP throughput. CXPACKET indicates that multiple CPUs are working in parallel, dividing up the query in smaller pieces. Ordinarily a well tuned OLTP application would not parallelize unless an index is missing, there is an incomplete WHERE clause, or the query is not a true OLTP transaction. |
Memory
Rule |
Description |
Value |
Source |
Problem Description |
1 |
Page life expectancy |
<300 sec |
Perfmon object SQL Server Buffer Manager SQL Server Buffer Nodes |
Page life expectancy is the average number of seconds a data page stays in cache. Low values could indicate a cache flush that is caused by a big read. Pure OLTP workloads do NOT issue big reads, thus possible missing index. |
2 |
Page life expectancy |
Drops by 50% |
Perfmon object SQL Server Buffer Manager |
Page life expectancy is the average number of seconds a data page stays in cache. Low values could indicate a cache flush that is caused by a big read. Pure OLTP workloads do NOT issue big reads, thus possible missing index. |
3 |
Memory Grants Pending |
>1 |
Perfmon object SQL Server Memory Manager |
Current number of processes waiting for a workspace memory grant. |
4 |
SQL cache hit ratio |
<90% |
SQL cache hit ratio falls under 90% for sustained periods of time greater than 60 sec. |
It is likely that large scans have to be performed, which in turn flushes out the buffer cache. |
IO
Rule |
Description |
Value |
Source |
Problem Description |
1 |
Average Disk sec/read |
>20 ms |
Perfmon object Physical Disk |
Reads should take 4-8 ms without any IO pressure. |
2 |
Average Disk sec/write |
>20 ms |
Perfmon object Physical Disk |
Writes (sequential) can be as fast as 1 ms for transaction log. |
3 |
Big IOs Table Scans Range Scans |
>1 |
Perfmon object SQL Server Access Methods |
A missing index flushes the cache. |
4 |
If Top 2 values for wait stats are any of the following: ASYNCH_IO_COMPLETION IO_COMPLETION LOGMGR WRITELOG PAGEIOLATCH_x |
Top 2 |
Sys.dm_os_wait_stats |
If top 2 wait_stats values include IO, there is an IO bottleneck. |
5 |
Low bytes per sec. |
Perfmon object Physical Disk |
Blocking
Rule |
Description |
Value |
Source |
Problem Description |
1 |
Block percentage |
>2% |
Sys.dm_db_index_operational_stats |
Frequency of blocks. |
2 |
Block process report |
30 sec |
Sp_configure profiler |
Report of statements. |
3 |
Average Row Lock Waits |
>100ms |
Sys.dm_db_index_operational_stats |
Duration of blocks. |
4 |
If Top 2 values for wait stats are any of the following: LCK_M_BU LCK_M_IS LCK_M_IU LCK_M_IX LCK_M_RIn_NL LCK_M_RIn_S LCK_M_RIn_U LCK_M_RIn_X LCK_M_RS_S LCK_M_RS_U LCK_M_RX_S LCK_M_RX_U LCK_M_RX_X LCK_M_S LCK_M_SCH_M LCK_M_SCH_S LCK_M_SIU LCK_M_SIX LCK_M_U LCK_M_UIX LCK_M_X |
Top 2 |
Sys.dm_os_wait_stats |
If top 2 wait_stats values include locking, there is a blocking bottleneck. |
5 |
High number of deadlocks |
>5 per hour |
Trace flag 1204 to display in the errorlog and or the profiler deadlock graph. |
If the deadlock occurs with the same participant SQL commands or operations multiple times, it is likely that there is a locking problem. |
Network
Rule |
Description |
Value |
Source |
Problem Description |
1 |
High network latency coupled with an application that has many round trips to the database. |
Output queue length >2 |
Perfmon object: Network Interface |
Indicates that the latency between the application server and the database is high. Could be caused by significant network infrastructure between the application and the instance of SQL Server. |
2 |
Network bandwidth is used up. |
Packets Outbound Discarded Packets Outbound Errors Packets Received Discarded Packets Received Errors |
Perfmon object: Network Interface |
Dropped packets are detected. |
In summary, given the high volume of identical small transactions that characterize OLTP, transactions per second and resource usage can be improved as follows:
- Database designs usually keep the number of indexes to a functional minimum as every insert, update, and delete incurs index maintenance.
- CPU can be reduced with plan reuse and join reduction.
- IO performance can be reduced with good indexing, join reduction, and high page life expectancy.
- Memory is optimal when there are no sudden drops in Page Life Expectancy.
- Sorts can be limited with index usage. That is, a certain sort order is supported by an index that is sorted the same way, either ascending or descending.
- Blocking can be reduced with index design and short transactions.