DBCC SHOW_STATISTICS: Viewing Statistics

DBCC SHOW_STATISTICS displays the statistics on a table or indexed view. The query optimizer uses statistics to make its estimates and produce a high-quality query plan. Statistics are not updated in real time, and if they are stale the query optimizer may fail to produce a good plan. You can therefore use DBCC SHOW_STATISTICS to check when the statistics were last updated and, if necessary, update them manually so that the query optimizer builds an efficient plan from accurate statistics.

Updating statistics ensures that queries compile with up-to-date statistics. However, updating statistics causes queries to recompile. We recommend not updating statistics too frequently because there is a performance tradeoff between improving query plans and the time it takes to recompile queries.
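As a rough sketch of how staleness might be checked before deciding to update, the query below lists each statistics object on a table together with its last update time and the number of modifications recorded since then. It assumes a version that provides sys.dm_db_stats_properties (to my knowledge SQL Server 2008 R2 SP2, 2012 SP1, and later) and uses dbo.dt_test, the sample table created later in this article.

```sql
-- One row per statistics object on the table, with freshness indicators.
SELECT  s.name                   AS stats_name,
        sp.last_updated,         -- when the statistics were last updated
        sp.rows,                 -- row count at the last update
        sp.rows_sampled,         -- rows actually read for the statistics
        sp.modification_counter  -- changes to the leading column since the last update
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.object_id = OBJECT_ID('dbo.dt_test');
```

A large modification_counter relative to rows suggests the statistics may be worth refreshing; what counts as "large" depends on the workload.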

Syntax

DBCC SHOW_STATISTICS ( table_or_indexed_view_name , target )
[ WITH [ NO_INFOMSGS ] < option > [ , n ] ]
< option > :: =
    STAT_HEADER | DENSITY_VECTOR | HISTOGRAM | STATS_STREAM

target

Name of the index, statistics, or column for which to display statistics information. If target is a name of an existing index or statistics on a table or indexed view, the statistics information about this target is returned. If target is the name of an existing column, and an automatically created statistics on this column exists, information about that auto-created statistic is returned. If an automatically created statistic does not exist for a column target, error message 2767 is returned.

In SSMS, expand the table node (the "+" sign) to see the Statistics category. It lists the statistics objects that SQL Server has created for the table.
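The same list can be obtained without SSMS by querying the catalog views. The following is a minimal sketch that assumes the table is dbo.hierarchy, the one used in the example that follows.

```sql
-- List every statistics object on the table and the columns it covers.
SELECT  s.name             AS stats_name,
        s.auto_created,    -- 1 = created automatically by the query optimizer
        s.user_created,    -- 1 = created explicitly with CREATE STATISTICS
        sc.stats_column_id, -- position of the column within the statistics object
        c.name             AS column_name
FROM    sys.stats AS s
JOIN    sys.stats_columns AS sc
        ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
JOIN    sys.columns AS c
        ON c.object_id = sc.object_id AND c.column_id = sc.column_id
WHERE   s.object_id = OBJECT_ID('dbo.hierarchy')
ORDER BY s.name, sc.stats_column_id;
```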

To view the details of the statistics named [PK__hierarch__3214EC27B3DD36CA], use the DBCC SHOW_STATISTICS statement:

DBCC SHOW_STATISTICS('[dbo].[hierarchy]', [PK__hierarch__3214EC27B3DD36CA])

A statistics object has three parts: a header with metadata about the statistics, a histogram with the distribution of values in the first key column of the statistics object, and a density vector to measure cross-column correlation.

The first result set is the header. Several of its columns are especially important.

Updated is the date and time the statistics were last updated. Use it to judge whether the statistics are stale.

Rows is the total number of rows in the table or indexed view when the statistics were last updated.

The second result set is the density vector, which describes the density of the key columns. The formula is simple: density is 1 / distinct values.

Results display density for each prefix of columns in the statistics object, one row per density. A distinct value is a distinct list of the column values per row and per columns prefix. For example, if the statistics object contains key columns (A, B, C), the results report the density of the distinct lists of values in each of these column prefixes: (A), (A,B), and (A, B, C). Using the prefix (A, B, C), each of these lists is a distinct value list: (3, 5, 6), (4, 4, 6), (4, 5, 6), (4, 5, 7). Using the prefix (A, B) the same column values have these distinct value lists: (3, 5), (4, 4), and (4, 5).

The third result set is the histogram, which is built on the first key column of the target.

Example

1. Create the sample table and data

-- Drop the sample table if it already exists
if object_id('dbo.dt_test') is not null
    drop table dbo.dt_test;

create table dbo.dt_test
(
id int,
code int,
name varchar(10)
);

-- Composite clustered index; its statistics object is examined below
create clustered index cix_dt_test_idcode
on dbo.dt_test(id,code);

insert into dbo.dt_test
values(1,1,'a'),(1,2,'b'),
(2,1,'c'),(2,2,'d'),
(3,1,'e'),(3,2,'f');

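2. View the index statistics

dbcc show_statistics('dbo.dt_test',[cix_dt_test_idcode])

In the first result set the Updated column is NULL, which shows that the statistics have not been updated yet. The index was created while the table was still empty, and the rows inserted afterwards have not triggered a statistics update.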

3. Update the statistics

First, check whether the database is configured to create and update statistics automatically:

SELECT  name AS dbName,
        is_auto_create_stats_on AS [Auto Create Stats],
        is_auto_update_stats_on AS [Auto Update Stats],
        is_read_only AS [Read Only]
FROM sys.databases
WHERE database_id = DB_ID();
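If either option turns out to be off, it can be switched on with ALTER DATABASE. This is only a sketch: ALTER DATABASE CURRENT assumes SQL Server 2012 or later (on older versions, name the database explicitly), and whether these options should be enabled is a decision for the database owner.

```sql
-- Let the optimizer create single-column statistics when it needs them.
ALTER DATABASE CURRENT SET AUTO_CREATE_STATISTICS ON;

-- Let the optimizer refresh statistics it considers out of date.
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS ON;
```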

Update the statistics manually:

update statistics dbo.dt_test [cix_dt_test_idcode];
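UPDATE STATISTICS also accepts options that control how much data is read. The variants below are illustrative. Which one fits depends on table size and on how accurate the statistics need to be.

```sql
-- Read every row; the most accurate and the most expensive option.
UPDATE STATISTICS dbo.dt_test cix_dt_test_idcode WITH FULLSCAN;

-- Read roughly half of the rows; cheaper on large tables, less precise.
UPDATE STATISTICS dbo.dt_test cix_dt_test_idcode WITH SAMPLE 50 PERCENT;

-- Update all statistics on the table using the default sampling.
UPDATE STATISTICS dbo.dt_test;
```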

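4. View and analyze the statistics

dbcc show_statistics('dbo.dt_test',[cix_dt_test_idcode])

4.1 Analyze the All Density column of the density vector. When Columns = id, there are three distinct id values, so All Density = 1/3.

When Columns = id, code, there are six distinct (id, code) pairs, so All Density = 1/6.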
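The two density values can be cross-checked by applying the 1 / distinct values formula directly to the sample table. This is a sketch for verification only, not how SQL Server computes the values internally.

```sql
-- Density for the prefix (id): 1 / number of distinct id values.
SELECT 1.0 / COUNT(DISTINCT id) AS density_id
FROM   dbo.dt_test;

-- Density for the prefix (id, code): 1 / number of distinct (id, code) pairs.
SELECT 1.0 / COUNT(*) AS density_id_code
FROM  (SELECT DISTINCT id, code FROM dbo.dt_test) AS d;
```

With the six sample rows, the results should be approximately 1/3 and 1/6, matching the All Density column.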

4.2 Analyze the histogram. SQL Server builds the histogram on the first key column of the statistics object. The histogram summarizes the data by range (step). Each range can cover many column values, and DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS, and the other figures are computed separately for each range.

A histogram measures the frequency of occurrence for each distinct value in a data set. The query optimizer computes a histogram on the column values in the first key column of the statistics object, selecting the column values by statistically sampling the rows or by performing a full scan of all rows in the table or view. If the histogram is created from a sampled set of rows, the stored totals for number of rows and number of distinct values are estimates and do not need to be whole integers.

RANGE_HI_KEY is the upper bound of the range.

EQ_ROWS is the number of rows whose first key column equals RANGE_HI_KEY.

RANGE_ROWS is the number of rows that fall inside the range, that is, rows whose value is less than this step's RANGE_HI_KEY and greater than the previous step's RANGE_HI_KEY.


| Column name | Description |
| --- | --- |
| RANGE_HI_KEY | Upper bound column value for a histogram step. The column value is also called a key value. |
| RANGE_ROWS | Estimated number of rows whose column value falls within a histogram step, excluding the upper bound. |
| EQ_ROWS | Estimated number of rows whose column value equals the upper bound of the histogram step. |
| DISTINCT_RANGE_ROWS | Estimated number of rows with a distinct column value within a histogram step, excluding the upper bound. |
| AVG_RANGE_ROWS | Average number of rows with duplicate column values within a histogram step, excluding the upper bound (RANGE_ROWS / DISTINCT_RANGE_ROWS for DISTINCT_RANGE_ROWS > 0). |

To create the histogram, the query optimizer sorts the column values, computes the number of values that match each distinct column value and then aggregates the column values into a maximum of 200 contiguous histogram steps. Each step includes a range of column values followed by an upper bound column value. The range includes all possible column values between boundary values, excluding the boundary values themselves. The lowest of the sorted column values is the upper boundary value for the first histogram step.
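On newer versions the histogram steps can also be read as an ordinary rowset, which makes them easier to filter or join. The sketch below assumes sys.dm_db_stats_histogram is available (to my knowledge, SQL Server 2016 SP1 CU2 and later) and targets the sample index from this article.

```sql
-- One row per histogram step for the chosen statistics object.
SELECT  h.step_number,
        h.range_high_key,       -- corresponds to RANGE_HI_KEY
        h.range_rows,           -- corresponds to RANGE_ROWS
        h.equal_rows,           -- corresponds to EQ_ROWS
        h.distinct_range_rows,  -- corresponds to DISTINCT_RANGE_ROWS
        h.average_range_rows    -- corresponds to AVG_RANGE_ROWS
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) AS h
WHERE   s.object_id = OBJECT_ID('dbo.dt_test')
  AND   s.name = 'cix_dt_test_idcode'
ORDER BY h.step_number;
```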

Reference: the MSDN article

DBCC SHOW_STATISTICS displays current query optimization statistics for a table or indexed view. The query optimizer uses statistics to estimate the cardinality or number of rows in the query result, which enables the query optimizer to create a high quality query plan. For example, the query optimizer could use cardinality estimates to choose the index seek operator instead of the index scan operator in the query plan, improving query performance by avoiding a resource-intensive index scan.

The query optimizer stores statistics for a table or indexed view in a statistics object. For a table, the statistics object is created on either an index or a list of table columns. The statistics object includes a header with metadata about the statistics, a histogram with the distribution of values in the first key column of the statistics object, and a density vector to measure cross-column correlation. The Database Engine can compute cardinality estimates with any of the data in the statistics object.

DBCC SHOW_STATISTICS displays the header, histogram, and density vector based on data stored in the statistics object. The syntax lets you specify a table or indexed view along with a target index name, statistics name, or column name. This topic describes how to display the statistics and how to understand the displayed results.

Syntax

DBCC SHOW_STATISTICS ( table_or_indexed_view_name , target )
[ WITH [ NO_INFOMSGS ] < option > [ , n ] ]
< option > :: =
    STAT_HEADER | DENSITY_VECTOR | HISTOGRAM | STATS_STREAM

Arguments

table_or_indexed_view_name              

Name of the table or indexed view for which to display statistics information.

target                               

Name of the index, statistics, or column for which to display statistics information. If target is a name of an existing index or statistics on a table or indexed view, the statistics information about this target is returned. If target is the name of an existing column, and an automatically created statistics on this column exists, information about that auto-created statistic is returned. If an automatically created statistic does not exist for a column target, error message 2767 is returned.

NO_INFOMSGS              

Suppresses all informational messages that have severity levels from 0 through 10.

STAT_HEADER | DENSITY_VECTOR | HISTOGRAM | STATS_STREAM [ ,n ]               

Specifying one or more of these options limits the result sets returned by the statement to the specified option or options. If no options are specified, all statistics information is returned.

STATS_STREAM is identified for informational purposes only. It is not supported, and future compatibility is not guaranteed.
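As an illustration of the options above, the statements below restrict the output to selected result sets. They use the sample table and index from earlier in this article.

```sql
-- Return only the header result set.
DBCC SHOW_STATISTICS ('dbo.dt_test', cix_dt_test_idcode) WITH STAT_HEADER;

-- Return only the density vector and the histogram.
DBCC SHOW_STATISTICS ('dbo.dt_test', cix_dt_test_idcode)
    WITH DENSITY_VECTOR, HISTOGRAM;
```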

Result Sets  

The following table describes the columns returned in the result set when STAT_HEADER is specified.


| Column name | Description |
| --- | --- |
| Name | Name of the statistics object. |
| Updated | Date and time the statistics were last updated. The STATS_DATE function is an alternate way to retrieve this information. |
| Rows | Total number of rows in the table or indexed view when the statistics were last updated. If the statistics are filtered or correspond to a filtered index, the number of rows might be less than the number of rows in the table. For more information, see Statistics. |
| Rows Sampled | Total number of rows sampled for statistics calculations. If Rows Sampled < Rows, the displayed histogram and density results are estimates based on the sampled rows. |
| Steps | Number of steps in the histogram. Each step spans a range of column values followed by an upper bound column value. The histogram steps are defined on the first key column in the statistics. The maximum number of steps is 200. |
| Density | Calculated as 1 / distinct values for all values in the first key column of the statistics object, excluding the histogram boundary values. This Density value is not used by the query optimizer and is displayed for backward compatibility with versions before SQL Server 2008. |
| Average Key Length | Average number of bytes per value for all of the key columns in the statistics object. |
| String Index | Yes indicates the statistics object contains string summary statistics to improve the cardinality estimates for query predicates that use the LIKE operator; for example, WHERE ProductName LIKE '%Bike'. String summary statistics are stored separately from the histogram and are created on the first key column of the statistics object when it is of type char, varchar, nchar, nvarchar, varchar(max), nvarchar(max), text, or ntext. |
| Filter Expression | Predicate for the subset of table rows included in the statistics object. NULL = non-filtered statistics. For more information about filtered predicates, see Create Filtered Indexes. For more information about filtered statistics, see Statistics. |
| Unfiltered Rows | Total number of rows in the table before applying the filter expression. If Filter Expression is NULL, Unfiltered Rows is equal to Rows. |
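The Updated value can also be read without DBCC, through the STATS_DATE function mentioned above. A minimal sketch against the sample table from this article:

```sql
-- Last update time of every statistics object on the table.
-- STATS_DATE returns NULL when the statistics have never been updated.
SELECT  s.name                              AS stats_name,
        STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM    sys.stats AS s
WHERE   s.object_id = OBJECT_ID('dbo.dt_test');
```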

The following table describes the columns returned in the result set when DENSITY_VECTOR is specified.


| Column name | Description |
| --- | --- |
| All Density | Density is 1 / distinct values. Results display density for each prefix of columns in the statistics object, one row per density. A distinct value is a distinct list of the column values per row and per columns prefix. For example, if the statistics object contains key columns (A, B, C), the results report the density of the distinct lists of values in each of these column prefixes: (A), (A,B), and (A, B, C). Using the prefix (A, B, C), each of these lists is a distinct value list: (3, 5, 6), (4, 4, 6), (4, 5, 6), (4, 5, 7). Using the prefix (A, B) the same column values have these distinct value lists: (3, 5), (4, 4), and (4, 5). |
| Average Length | Average length, in bytes, to store a list of the column values for the column prefix. For example, if the values in the list (3, 5, 6) each require 4 bytes the length is 12 bytes. |
| Columns | Names of columns in the prefix for which All density and Average length are displayed. |

The following table describes the columns returned in the result set when the HISTOGRAM option is specified.


| Column name | Description |
| --- | --- |
| RANGE_HI_KEY | Upper bound column value for a histogram step. The column value is also called a key value. |
| RANGE_ROWS | Estimated number of rows whose column value falls within a histogram step, excluding the upper bound. |
| EQ_ROWS | Estimated number of rows whose column value equals the upper bound of the histogram step. |
| DISTINCT_RANGE_ROWS | Estimated number of rows with a distinct column value within a histogram step, excluding the upper bound. |
| AVG_RANGE_ROWS | Average number of rows with duplicate column values within a histogram step, excluding the upper bound (RANGE_ROWS / DISTINCT_RANGE_ROWS for DISTINCT_RANGE_ROWS > 0). |

Remarks

Histogram

A histogram measures the frequency of occurrence for each distinct value in a data set. The query optimizer computes a histogram on the column values in the first key column of the statistics object, selecting the column values by statistically sampling the rows or by performing a full scan of all rows in the table or view. If the histogram is created from a sampled set of rows, the stored totals for number of rows and number of distinct values are estimates and do not need to be whole integers.

To create the histogram, the query optimizer sorts the column values, computes the number of values that match each distinct column value and then aggregates the column values into a maximum of 200 contiguous histogram steps. Each step includes a range of column values followed by an upper bound column value. The range includes all possible column values between boundary values, excluding the boundary values themselves. The lowest of the sorted column values is the upper boundary value for the first histogram step.

The following diagram shows a histogram with six steps. The area to the left of the first upper boundary value is the first step.

For each histogram step:

  • Bold line represents the upper boundary value (RANGE_HI_KEY) and the number of times it occurs (EQ_ROWS)
  • Solid area left of RANGE_HI_KEY represents the range of column values and the average number of times each column value occurs (AVG_RANGE_ROWS). The AVG_RANGE_ROWS for the first histogram step is always 0.
  • Dotted lines represent the sampled values used to estimate total number of distinct values in the range (DISTINCT_RANGE_ROWS) and total number of values in the range (RANGE_ROWS). The query optimizer uses RANGE_ROWS and DISTINCT_RANGE_ROWS to compute AVG_RANGE_ROWS and does not store the sampled values.

The query optimizer defines the histogram steps according to their statistical significance. It uses a maximum difference algorithm to minimize the number of steps in the histogram while maximizing the difference between the boundary values. The maximum number of steps is 200. The number of histogram steps can be fewer than the number of distinct values, even for columns with fewer than 200 boundary points. For example, a column with 100 distinct values can have a histogram with fewer than 100 boundary points.

Density Vector

The query optimizer uses densities to enhance cardinality estimates for queries that return multiple columns from the same table or indexed view. The density vector contains one density for each prefix of columns in the statistics object. For example, if a statistics object has the key columns CustomerId, ItemId, Price, density is calculated on each of the following column prefixes.


| Column prefix | Density calculated on |
| --- | --- |
| (CustomerId) | Rows with matching values for CustomerId |
| (CustomerId, ItemId) | Rows with matching values for CustomerId and ItemId |
| (CustomerId, ItemId, Price) | Rows with matching values for CustomerId, ItemId, and Price |
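One way to read a density value: multiplying it by the table's row count gives the average number of rows per distinct value of that prefix. The sketch below computes that figure directly for the id prefix of the sample table dbo.dt_test from this article. It illustrates the arithmetic only and makes no claim about the optimizer's internal logic.

```sql
-- rows * density = rows / distinct values = average rows per distinct id.
SELECT  COUNT(*)                          AS total_rows,
        COUNT(DISTINCT id)                AS distinct_ids,
        1.0 * COUNT(*) / COUNT(DISTINCT id) AS avg_rows_per_id
FROM    dbo.dt_test;
```

With six rows and three distinct id values, the average is two rows per id.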

Restrictions

DBCC SHOW_STATISTICS does not provide statistics for spatial or xVelocity memory optimized columnstore indexes.

Reference: MSDN

https://msdn.microsoft.com/en-us/library/ms174384(v=sql.110).aspx

Time: 2024-12-23 05:36:12
