[Repost] Columnstore Performance Tuning

Fundamentals of Columnstore Index-Based Performance

Columnstore indexes can speed up some queries by a factor of 10X to 100X on the same hardware depending on the query and data. These key things make columnstore-based query processing so fast:

  • The columnstore index itself stores data in a highly compressed format, with each column kept in a separate group of pages. This substantially reduces I/O for most data warehouse queries because many data warehouse fact tables contain 30 or more columns, while a typical query might touch only 5 or 6 columns. Only the columns touched by the query must be read from disk, and only the more frequently accessed columns have to take up space in main memory. The clustered B-tree or heap containing the primary copy of the data is normally used only to build the columnstore, and will typically not be accessed for the large majority of query processing. It'll be paged out of memory and won't take main memory resources during normal periods of query processing.
  • There is a highly efficient, vector-based query execution method called "batch processing" that works with the columnstore index. A "batch" is an object that contains about 1000 rows. Each column within the batch is represented internally as a vector. Batch processing can reduce CPU consumption 7X to 40X compared to the older, row-based query execution methods. Efficient vector-based algorithms allow this by dramatically reducing the CPU overhead of basic filter, expression evaluation, projection, and join operations.
  • Segment elimination can skip large chunks of data to speed up scans. Each partition in a columnstore index is broken into chunks of roughly one million rows called segments. Each segment has metadata that stores the minimum and maximum value of each column for the segment. The storage engine checks filter conditions against the metadata. If it can detect that no rows will qualify, it skips the entire segment without even reading it from disk.
  • The storage engine pushes filters down into the scans of data. This eliminates data early during query execution, improving query response time.
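The segment elimination described in the list above can be illustrated with a small sketch. This is a toy model in Python, not SQL Server's implementation: the `Segment` class, `build_segments`, `scan_range`, and the four-row segment size are all hypothetical names and values chosen for illustration. It shows only the idea of checking a range filter against per-segment min/max metadata before reading any values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One column segment plus the min/max metadata kept for it (hypothetical model)."""
    values: List[int]
    min_value: int
    max_value: int


def build_segments(column: List[int], rows_per_segment: int) -> List[Segment]:
    """Split a column into fixed-size segments and record min/max for each one."""
    segments = []
    for start in range(0, len(column), rows_per_segment):
        chunk = column[start:start + rows_per_segment]
        segments.append(Segment(chunk, min(chunk), max(chunk)))
    return segments


def scan_range(segments: List[Segment], low: int, high: int) -> Tuple[List[int], int]:
    """Return (values with low <= v <= high, number of segments actually read)."""
    matches: List[int] = []
    segments_read = 0
    for seg in segments:
        # Metadata check: if the segment's range cannot overlap the filter
        # range, skip it without looking at its values.
        if seg.max_value < low or seg.min_value > high:
            continue
        segments_read += 1
        matches.extend(v for v in seg.values if low <= v <= high)
    return matches, segments_read


# Toy data: integer date keys loaded in ascending order, 4 rows per segment.
date_keys = [20120101, 20120102, 20120103, 20120104,
             20120201, 20120202, 20120203, 20120204,
             20120301, 20120302, 20120303, 20120304]
segments = build_segments(date_keys, rows_per_segment=4)
rows, segments_read = scan_range(segments, 20120201, 20120203)
print(rows, segments_read)
```

In this toy data the filter range falls entirely inside the second segment, so the other two are skipped on metadata alone. The sketch also suggests why data that is loaded in sorted order on a commonly filtered column tends to benefit most: the min/max ranges of the segments then overlap very little.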

The columnstore index and batch query execution mode are deeply integrated into SQL Server. A particular query can be processed entirely in batch mode, entirely in the standard row mode, or with a combination of batch and row-based processing. The key to getting the best performance is to make sure your queries process the large majority of data in batch mode. Even if the bulk of your query can't be executed in batch mode, you can still get significant performance benefits from columnstore indexes through reduced I/O, and through pushing down of predicates to the storage engine.
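To make the difference between row mode and batch mode more concrete, here is a minimal Python sketch of the two data layouts. It is an illustration only and makes several assumptions: the function names, the `quantity` and `amount` columns, and the exact batch size of 1000 are hypothetical, and plain Python will not reproduce the CPU savings described above. It shows only that a batch groups rows together with each column held as its own vector, so an operator can work on one column at a time.

```python
from typing import Dict, Iterator, List

BATCH_SIZE = 1000  # assumption: the article says a batch holds "about 1000 rows"


def batches(columns: Dict[str, List[int]], size: int = BATCH_SIZE) -> Iterator[Dict[str, List[int]]]:
    """Yield batches; inside a batch, each column is its own vector (list)."""
    n_rows = len(next(iter(columns.values())))
    for start in range(0, n_rows, size):
        yield {name: col[start:start + size] for name, col in columns.items()}


def sum_amount_batch_mode(columns: Dict[str, List[int]], min_quantity: int) -> int:
    """Filter on quantity, then sum amount, one column vector at a time."""
    total = 0
    for batch in batches(columns):
        # Filter: evaluate the predicate over the whole quantity vector.
        selected = [i for i, q in enumerate(batch["quantity"]) if q >= min_quantity]
        # Aggregate: touch only the amount vector, only at selected positions.
        amounts = batch["amount"]
        total += sum(amounts[i] for i in selected)
    return total


def sum_amount_row_mode(rows: List[Dict[str, int]], min_quantity: int) -> int:
    """The same query processed one row at a time."""
    total = 0
    for row in rows:
        if row["quantity"] >= min_quantity:
            total += row["amount"]
    return total


# Toy data: 2500 rows, so the batch version processes three batches.
columns = {
    "quantity": [i % 10 for i in range(2500)],
    "amount": list(range(2500)),
}
rows = [{"quantity": q, "amount": a}
        for q, a in zip(columns["quantity"], columns["amount"])]

batch_total = sum_amount_batch_mode(columns, min_quantity=5)
row_total = sum_amount_row_mode(rows, min_quantity=5)
print(batch_total == row_total)
```

Both functions compute the same answer; what differs is the unit of work handed to each operator, a vector of column values per batch versus a single row.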

To tell whether the main part of your query is running in batch mode, open the graphical showplan, hover the mouse pointer over the most expensive scan operator (usually a scan of a large fact table), and check the tooltip. It shows whether the estimated and actual execution mode was Row or Batch. See here for an example.

DOs and DON'Ts for Using Columnstores Effectively

Obeying the following dos and don'ts will help you get the most out of columnstores for your decision support workload.

DOs

  • Put columnstore indexes on large tables only. Typically, you will put them on your fact tables in your data warehouse, but not the dimension tables. If you have a large dimension table, containing more than a few million rows, then you may want to put a columnstore index on it as well.
  • Include every column of the table in the columnstore index. If you don't, a query that references a column not included in the index will benefit from the columnstore index very little or not at all.
  • Structure your queries as star joins with grouping and aggregation as much as possible. Avoid joining pairs of large tables. Join a single large fact table to one or more smaller dimensions using standard inner joins. Use a dimensional modeling approach for your data as much as possible to allow you to structure your queries this way.
  • Use best practices for statistics management and query design. This is independent of columnstore technology. Use good statistics and avoid query design pitfalls to get the best performance. See the white paper on SQL Server statistics for guidance. In particular, see the section "Best Practices for Managing Statistics."

DON'Ts

(Note: we are already working to improve the implementation to eliminate limitations associated with these "don'ts" and we anticipate fixing them sometime after the SQL Server 2012 release. We're not ready to announce a timetable yet.) Later, we'll describe how to work around the limitations.

  • Avoid string joins and string filters directly on columns of columnstore-indexed tables. String filters don't get pushed down into scans on columnstore indexes, and join processing on strings is less efficient than on integers. Filters on number and date types are pushed down. Consider using integer codes (or surrogate keys) instead of strings in columnstore-indexed fact tables. You can move the string values to a dimension table. Joins on the integer columns will normally be processed very efficiently.
  • Avoid use of OUTER JOIN on columnstore-indexed tables. Outer joins don't benefit from batch processing. Instead, SQL Server 2012 reverts to row-at-a-time processing.
  • Avoid use of NOT IN on columnstore-indexed tables. NOT IN (<subquery>) (which internally uses an operator called "anti-semi-join") can prevent batch processing and cause the system to revert to row mode. NOT IN (<list of constants>) typically works fine though.
  • Avoid use of UNION ALL to directly combine columnstore-indexed tables with other tables. Batch processing doesn't get pushed down over UNION ALL. So, for example, creating a view vFact that does a UNION ALL of two tables, one with a columnstore index and one without, and then querying vFact in a star join query, will not use batch processing.
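The advice in the first "don't" above, replacing strings in the fact table with integer codes and moving the strings to a dimension table, can be sketched as follows. This is a conceptual Python illustration under stated assumptions, not SQL Server code: `encode_with_surrogate_keys`, the `region` column, and the sample values are all hypothetical.

```python
from typing import Dict, List, Tuple


def encode_with_surrogate_keys(values: List[str]) -> Tuple[List[int], Dict[int, str]]:
    """Replace each distinct string with an integer key.

    Returns the integer-coded column (for the fact table) and a
    key -> string mapping (playing the role of the dimension table).
    """
    key_of: Dict[str, int] = {}
    keys: List[int] = []
    for value in values:
        if value not in key_of:
            key_of[value] = len(key_of) + 1  # surrogate keys start at 1
        keys.append(key_of[value])
    dimension = {key: value for value, key in key_of.items()}
    return keys, dimension


# Hypothetical string column that would otherwise live in the fact table.
fact_region = ["West", "East", "West", "North", "East"]
region_keys, region_dim = encode_with_surrogate_keys(fact_region)

# Filtering on "West": resolve the string once against the small dimension,
# then compare only integers against the large fact column.
wanted = {key for key, name in region_dim.items() if name == "West"}
matching_rows = [i for i, key in enumerate(region_keys) if key in wanted]
print(region_keys, region_dim, matching_rows)
```

The string comparison happens only against the small dimension, and the large fact column is filtered and joined on integers, which is the case the article says is handled efficiently.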

Original URL: SQL Server Columnstore Performance Tuning

Time: 2024-07-28 15:58:00
