HASH Partitioning--转载

原文地址:https://dev.mysql.com/doc/refman/5.1/en/partitioning-hash.html

HASH Partitioning

[+/-]

18.2.3.1 LINEAR HASH Partitioning

Partitioning by HASH is used primarily to ensure an even distribution of data among a predetermined number of partitions. With range or list partitioning, you must specify explicitly into which partition a given column value or set of column values is to be stored; with hash partitioning, MySQL takes care of this for you, and you need only specify a column value or expression based on a column value to be hashed and the number of partitions into which the partitioned table is to be divided.

To partition a table using HASH partitioning, it is necessary to append to the CREATE TABLE statement aPARTITION BY HASH (expr) clause, where expr is an expression that returns an integer. This can simply be the name of a column whose type is one of MySQL‘s integer types. In addition, you most likely want to follow this with PARTITIONS num, where num is a positive integer representing the number of partitions into which the table is to be divided.

Note

For simplicity, the tables in the examples that follow do not use any keys. You should be aware that, if a table has any unique keys, every column used in the partitioning expression for this this table must be part of every unique key, including the primary key. See Section 18.5.1, “Partitioning Keys, Primary Keys, and Unique Keys”, for more information.

The following statement creates a table that uses hashing on the store_id column and is divided into 4 partitions:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT ‘1970-01-01‘,
    separated DATE NOT NULL DEFAULT ‘9999-12-31‘,
    job_code INT,
    store_id INT
)
PARTITION BY HASH(store_id)
PARTITIONS 4;

If you do not include a PARTITIONS clause, the number of partitions defaults to 1.

Using the PARTITIONS keyword without a number following it results in a syntax error.

You can also use an SQL expression that returns an integer for expr. For instance, you might want to partition based on the year in which an employee was hired. This can be done as shown here:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT ‘1970-01-01‘,
    separated DATE NOT NULL DEFAULT ‘9999-12-31‘,
    job_code INT,
    store_id INT
)
PARTITION BY HASH( YEAR(hired) )
PARTITIONS 4;

expr must return a nonconstant, nonrandom integer value (in other words, it should be varying but deterministic), and must not contain any prohibited constructs as described in Section 18.5, “Restrictions and Limitations on Partitioning”. You should also keep in mind that this expression is evaluated each time a row is inserted or updated (or possibly deleted); this means that very complex expressions may give rise to performance issues, particularly when performing operations (such as batch inserts) that affect a great many rows at one time.

The most efficient hashing function is one which operates upon a single table column and whose value increases or decreases consistently with the column value, as this allows for “pruning” on ranges of partitions. That is, the more closely that the expression varies with the value of the column on which it is based, the more efficiently MySQL can use the expression for hash partitioning.

For example, where date_col is a column of type DATE, then the expression TO_DAYS(date_col) is said to vary directly with the value of date_col, because for every change in the value of date_col, the value of the expression changes in a consistent manner. The variance of the expression YEAR(date_col) with respect todate_col is not quite as direct as that of TO_DAYS(date_col), because not every possible change in date_colproduces an equivalent change in YEAR(date_col). Even so, YEAR(date_col) is a good candidate for a hashing function, because it varies directly with a portion of date_col and there is no possible change indate_col that produces a disproportionate change in YEAR(date_col).

By way of contrast, suppose that you have a column named int_col whose type is INT. Now consider the expression POW(5-int_col,3) + 6. This would be a poor choice for a hashing function because a change in the value of int_col is not guaranteed to produce a proportional change in the value of the expression. Changing the value of int_col by a given amount can produce by widely different changes in the value of the expression. For example, changing int_col from 5 to 6 produces a change of -1 in the value of the expression, but changing the value of int_col from 6 to 7 produces a change of -7 in the expression value.

In other words, the more closely the graph of the column value versus the value of the expression follows a straight line as traced by the equation y=cx where c is some nonzero constant, the better the expression is suited to hashing. This has to do with the fact that the more nonlinear an expression is, the more uneven the distribution of data among the partitions it tends to produce.

In theory, pruning is also possible for expressions involving more than one column value, but determining which of such expressions are suitable can be quite difficult and time-consuming. For this reason, the use of hashing expressions involving multiple columns is not particularly recommended.

When PARTITION BY HASH is used, MySQL determines which partition of num partitions to use based on the modulus of the result of the user function. In other words, for an expression expr, the partition in which the record is stored is partition number N, where N = MOD(exprnum). Suppose that table t1 is defined as follows, so that it has 4 partitions:

CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)
    PARTITION BY HASH( YEAR(col3) )
    PARTITIONS 4;

If you insert a record into t1 whose col3 value is ‘2005-09-15‘, then the partition in which it is stored is determined as follows:

MOD(YEAR(‘2005-09-01‘),4)
=  MOD(2005,4)
=  1

MySQL 5.1 also supports a variant of HASH partitioning known as linear hashing which employs a more complex algorithm for determining the placement of new rows inserted into the partitioned table. See Section 18.2.3.1, “LINEAR HASH Partitioning”, for a description of this algorithm.

The user function is evaluated each time a record is inserted or updated. It may also—depending on the circumstances—be evaluated when records are deleted.

Note

If a table to be partitioned has a UNIQUE key, then any columns supplied as arguments to the HASH user function or to the KEY‘s column_list must be part of that key.

时间: 2024-10-14 20:40:10

HASH Partitioning--转载的相关文章

字符串Hash总结(转载)

转载地址 Hash是什么意思呢?某度翻译告诉我们: hash 英[hæ?] 美[hæ?] n. 剁碎的食物; #号; 蔬菜肉丁; vt. 把…弄乱; 切碎; 反复推敲; 搞糟; 我觉得Hash是引申出 把...弄乱 的意思. 今天就来谈谈Hash的一种——字符串hash. 据我的理解,Hash就是一个像函数一样的东西,你放进去一个值,它给你输出来一个值.输出的值就是Hash值.一般Hash值会比原来的值更好储存(更小)或比较. 那字符串Hash就非常好理解了.就是把字符串转换成一个整数的函数.而

Oracle Hash分区的使用总结

近期项目需要用到分区表,但是分区键值有无法确定,因此只能使用hash分区(range.list分区以前常用,比hash分区简单),查询了文档,发现上面说的和实际使用时有点差距,就专门做实验验证下. 官方文档(11g.12c的解释都是一样的): docs.oracle.com/database/121/CNCPT/schemaob.htm Hash Partitioning In hash partitioning, the database maps rows to partitions bas

redis该如何分区-译文(原创)

写在最前,最近一直在研究redis的使用,包括redis应用场景.性能优化.可行性.这是看到redis官网中一个链接,主要是讲解redis数据分区的,既然是官方推荐的,那我就翻译一下,与大家共享. Partitioning: how to split data among multiple Redis instances. 分区:如何把数据存储在多个实例中. Partitioning is the process of splitting your data into multiple Redi

Oracle性能优化——总体介绍

最近参加Oracle的培训,在此做个学习总结. 1.Oracle数据库调优不能仅指望修改几项数据库参数就能有明显效果,问题更多出在应用方面,教育开发者正确地使用数据库是一项任重道远的工作. 2.Oracle数据库性能瓶颈主要有:CPU,网络,磁盘I/O,进程间协调: 3.数据库反应时间是影响用户体验和体现数据库性能的重要指标,一般为2ms: 4.数据库响应时间是用户感知到的上层应用响应时间的一部分: 5.SQL中的JOIN和集合运算 6.物理数据库设计: 1)索引设计(Index Design)

系统瓶颈优化

1.cpu问题       考虑使用更高的cpu代替当前cpu.        对于多cpu,考虑cpu之间的负载分配       考虑在其他体系上设计系统,当前值. 2.内存和高速缓存       内存的优化包括操作系统,数据库,应用程序的内存优化.       过多的分页与交换可能降低系统的性能.       内存分配也是影响系统性能的主要原因.       保证保留列表具有较大的邻接内存块.       调整数据块缓冲区大小(用数据块的个数表示)是一个重要内容.       将最频繁的使用

分区表与分区索引

(一)什么是分区 所谓分区,就是将一张巨型表或巨型索引分成若干个独立的组成部分进行存储和管理,每一个相对小的,可独立管理的部分,称为分区. (二)分区的优势 提高数据可管理性.对表进行分区,数据的加载.索引的创建与重建.数据的备份与恢复等操作都可以在分区表上进行,而不必在表级别上进行,提高了数据的可管理性: 增强数据库的可用性.某个分区出现问题,只影响该分区,其它分区照常运作: 改善查询性能.将对整个表的查询转化为对某个分区表的查询,提高了查询速度: 提高数据库操作的并行性.可对分区表进行并行操

Spark Streaming发行版笔记14:updateStateByKey和mapWithState源码解密

本篇从二个方面进行源码分析: 一.updateStateByKey解密 二.mapWithState解密 通过对Spark研究角度来研究jvm.分布式.图计算.架构设计.软件工程思想,可以学到很多东西. 进行黑名单动态生成和过滤例子中会用到updateStateByKey方法,此方法在DStream类中没有定义,需要在 DStream的object区域通过隐式转换来找,如下面的代码: object DStream {   // `toPairDStreamFunctions` was in Sp

自增长的聚集键值不会扩展(scale)

如何选择聚集键值的最佳实践是什么?一个好的聚集键值应该有下列属性: 范围小的(Narrow) 静态的(Static) 自增长的(Ever Increasing) 我们来具体看下所有这3个属性,还有在SQL Server里为什么自增长值实际上是不会扩展的. 范围小的(Narrow) 聚集键值应该i越小越好.为什么?因为它要占用空间,聚集键值也在每个非聚集索引的叶子曾作为逻辑指针.如果你的聚集键值很广,你的非聚集索引也会很大.如果你定义了非唯一非聚集索引(Non-Unique Non-Cluster

The NoSQL System

Unlike most of the other projects in this book, NoSQL is not a tool, but an ecosystem composed of several complimentary and competing tools. The tools branded with the NoSQL monicker provide an alternative to SQL-based relational database systems for