Partition Table 查询性能

分区表的高效的查询性能是基于Partition Elimination 和 Partition Parallelism实现的。Partition Elimination 是指在执行TSql查询的时候,不是seek表的所有分区,而是根据Partition column排除部分分区,在符合 filtering the partition column 条件的 partition 上进行查询。Partition Parallelism是指分区之间可以并发执行查询。分区表查询使用更小的查询范围,更高的并发度,因而具有更高的查询性能。

一,Partition Elimination

A key benefit of table partitioning is partition elimination, whereby the query processor can eliminate inapplicable partitions of a table or index from a query plan. The more fine-grained the query filter is on the partition column, the more partitions can be eliminated and the more efficient the query can become. Partition elimination can occur with table partitions, index-aligned partitions, and partition-aligned indexed views in both SQL Server 2008 and SQL Server 2005. Partition elimination can also occur with nonaligned indexes.

实现分区消除主要有两种方式

1,Filtering the partition column

SQL Server 2008 adds a new partition-aware seek operation as the mechanism for partition elimination. It is based on a hidden computed column created internally to represent the ID of a table or index partition for a specific row. In effect, a new computed column, a partition ID, is added to the beginning of the clustered index of the partitioned table.

Partition elimination is most useful when a large set of table partitions are eliminated, because partition elimination helps reduce the amount of work needed to satisfy the query. So the key point is to filter on as small a number of partitions as possible based on filtering the partition column.

2,Join Collocation

If you query two tables that are partitioned with compatible partition functions, you may be able to join the two tables on their partition column. In that case, SQL Server may be able to pair up the partitions from each table and join them at the partition level. This is called ‘join collocation‘, implying compatibility between the partitions of the two tables that is leveraged in the query.

二,Partitioned Table Parallelism
Parallel operations are a key benefit of table partitioning. Indexes can be rebuilt in parallel, and the query processor can take advantage of multiple partitions to access a table in parallel.
If a partitioned table is sufficiently large and at least two CPU cores are available to SQL Server, a parallel execution strategy is used across the partitions that the query filter resolves to. Generally speaking, SQL Server attempts to balance the number of threads assigned to various partitions. The max degree of parallelism setting (which you set by using the sp_configure stored procedure or the MAXDOP query hint) determines the available thread count.
However, if the filter of the query specifically calls out ranges that determine a subset of the partitions so that partition elimination can occur, the number of partitions accessed will accordingly be less.

SQL Server 2005 was optimized for queries filtered to one partition. For such queries, more than one available thread could be assigned to scan the partition in parallel. However, for a filtered query touching more than one partition, only one thread could be assigned to any given partition. In SQL Server 2008, if the number of available threads is greater than the partitions accessed by a filtered query, more than one thread can be assigned to access data in each partition. This can improve performance in filtered queries that access more than one partition.

三,Designing Partitions to Improve Query Performance

Partitioning a table or index may improve query performance, based on the types of queries you frequently run and on your hardware configuration.

Partitioning for Join Queries

If you frequently run queries that involve an equi-join between two or more partitioned tables, their partitioning columns should be the same as the columns on which the tables are joined. Additionally, the tables, or their indexes, should be collocated. This means that they either use the same named partition function, or they use different ones that are essentially the same, in that they:

  • Have the same number of parameters that are used for partitioning, and the corresponding parameters are the same data types.
  • Define the same number of partitions.
  • Define the same boundary values for partitions.

In this way, the SQL Server query optimizer can process the join faster, because the partitions themselves can be joined. If a query joins two tables that are not collocated or are not partitioned on the join field, the presence of partitions may actually slow down query processing instead of accelerate it.

Taking Advantage of Multiple Disk Drives

It may be tempting to map your partitions to filegroups, each accessing a different physical disk drive, in order to improve I/O performance. When SQL Server performs data sorting for I/O operations, it sorts the data first by partition. Under this scenario, SQL Server accesses one drive at a time, and this might reduce performance. A better solution in terms of performance is to stripe the data files of your partitions across more than one disk by setting up a RAID. In this way, although SQL Server still sorts data by partition, it can access all the drives of each partition at the same time. This configuration can be designed regardless of whether all partitions are in one filegroup or multiple filegroups. For more information about how SQL Server works with different RAID levels, see RAID Levels and SQL Server.

Controlling Lock Escalation Behavior

Partitioning tables can improve performance by enabling lock escalation to a single partition instead of a whole table. To reduce lock contention by allowing lock escalation to the partition, use the LOCK_ESCALATION option of the ALTER TABLE statement.

参考文档:

https://msdn.microsoft.com/en-us/library/dd578580.aspx

https://msdn.microsoft.com/en-us/library/ms177411(v=sql.105).aspx

时间: 2024-07-31 14:31:50

Partition Table 查询性能的相关文章

PLSQL_性能优化系列09_Oracle Partition Table大数据分区表

2014-08-22 BaoXinjian 一.摘要 1.分区表: 随着表的不断增大,对于新纪录的增加.查找.删除等(DML)的维护也更加困难.对于数据库中的超大型表,可通过把它的数据分成若干个小表,从而简化数据库的管理活动.对于每一个简化后的小表,我们称为一个单个的分区 对于分区的访问,我们不需要使用特殊的SQL查询语句或特定的DML语句,而且可以单独的操作单个分区,而不是整个表.同时可以将不同分区的数据放置到不 同的表空间,比如将不同年份的销售数据,存放在不同的表空间,即年的销售数据存放到T

SQL Server-聚焦计算列或计算列持久化查询性能(二十二)

前言 上一节我们详细讲解了计算列以及计算列持久化的问题,本节我们依然如前面讲解来看看二者查询性能问题,简短的内容,深入的理解,Always to review the basics. 持久化计算列比非持久化计算列性能要好 我们开始创建两个一样的表并都插入100条数据来进行比较,对于计算列我们重新进行创建计算列和非计算列持久化. CREATE TABLE [dbo].[ComputeColumnCompare] (ID INT, FirstName VARCHAR(100), LastName C

《高性能MySQL》读书笔记--查询性能优化

对于高性能数据库操作,只靠设计最优的库表结构.建立最好的索引是不够的,还需要合理的设计查询.如果查询写得很糟糕,即使库表结构再合理.索引再合适,也无法实现高性能.查询优化.索引优化.库表结构优化需要齐头并进,一个不落. 6.1 为什么查询速度会慢 通常来说,查询的生命周期大致可以按照顺序来看:从客户端>>服务器>>在服务器上进行解析>>生成执行计划>>执行>>返回结果给客户端.其中执行可以认为是整个生命周期中最重要的阶段,这其中包括了大量为了检索

新建一个索引能够同时提升三条SQL的查询性能

如题 CREATE TABLE `score` (   `id` int(11) NOT NULL,   `studentid` int(11) NOT NULL,   `subjectid` int(11) NOT NULL,   `score` int(11) NOT NULL,   PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; -- 新建一个索引能够同时提升三条SQL的查询性能 ALTER TABLE `score` AD

聚焦-移除Bookmark Lookup、RID Lookup、Key Lookup提高SQL查询性能(六)

前言 前面几节都是讲的基础内容,本节我们讲讲索引性能优化,当对大数据进行处理时首先想到的就是索引,一旦遇到这样的问题则手忙脚乱,各种查资料,为何平常不扎实基本功呢,我们由浅入深,简短的内容,深入的理解,而非一上来就把问题给框死,立马给出解决方案,抛出问题,再到解决问题,你GET了没有. Bookmark Lookup.RID Lookup.Key Lookup定义 一说到这三者,如果对索引研究不深的童鞋估计是懵逼的,什么玩意,我们姑且将上面三者翻译为:标签查找.行ID查找.键查找.标签查找和键查

SQL Server-聚焦使用视图若干限制/建议、视图查询性能问题,你懵逼了?(二十五)

前言 上一节我们简单讲述了表表达式的4种类型,这一系列我们来讲讲使用视图的限制,简短的内容,深入的理解,Always to review the basics. 避免在视图中使用ORDER BY 上一节我们也讲述了使用表表达式必须满足的3个要求,其中就有一个无法保证顺序,也就是说的ORDER BY的问题,我们还是重点看看在视图中的限制.在常规查询中对于排序我们是这样做的. USE AdventureWorks2012 GO SELECT * FROM Sales.SalesOrderDetail

SQL Server-聚焦过滤索引提高查询性能(十)

前言 这一节我们还是继续讲讲索引知识,前面我们聚集索引.非聚集索引以及覆盖索引等,在这其中还有一个过滤索引,通过索引过滤我们也能提高查询性能,简短的内容,深入的理解. 过滤索引,在查询条件上创建非聚集索引(1) 过滤索引是SQL 2008的新特性,被应用在表中的部分行,所以利用过滤索引能够提高查询,相对于全表扫描它能减少索引维护和索引存储的代价.当我们在索引上应用WHERE条件时就是过滤索引.也就是满足如下格式: CREATE NONCLUSTERED INDEX <index name> O

HAWQ与Hive查询性能对比测试

一.实验目的 本实验通过模拟一个典型的应用场景和实际数据量,测试并对比HAWQ内部表.外部表与Hive的查询性能. 二.硬件环境 1. 四台VMware虚机组成的Hadoop集群.2. 每台机器配置如下:(1)15K RPM SAS 100GB(2)Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz,双核双CPU(3)8G内存,8GSwap(4)10000Mb/s虚拟网卡 三.软件环境 1. Linux:CentOS release 6.4,核心2.6.32-358.el

利用SET STATISTICS IO和SET STATISTICS TIME 优化SQL Server查询性能

首先需要说明的是这篇文章的内容并不是如何调节SQL Server查询性能的(有关这方面的内容能写一本书),而是如何在SQL Server查询性能的调节中利用SET STATISTICS IO和SET STATISTICS TIME这二条被经常忽略的Transact-SQL命令的. 从表面上看,查询性能的调节是一件十分简单的事.从本质上讲,我们希望查询的运行速度能够尽可能地快,无论是将查询运行的时间从10分钟缩减为1分钟,还是将运行的时间从2秒钟缩短为1秒种,我们最终的目标都是减少运行的时间. 尽