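MySQL Table Splitting and Partitioning
1. MySQL table splitting
What is table splitting?
Table splitting breaks one large table, according to some rule, into several physical tables that each have their own storage. With the MyISAM storage engine each table corresponds to three files: the .MYD data file, the .MYI index file, and the .frm table-definition file. With the InnoDB storage engine, the data and the indexes are stored together in the same file. The tables can live on the same disk or on different machines.
When the application reads or writes, it uses the predefined rule to work out the name of the target table and then operates on that table.
In other words, a single database table is split into several tables, and on each access an algorithm (for example a hash or a modulo of the key) decides which table is used. The data is spread across several tables, which reduces the load on any single table and improves database access performance.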
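The routing rule described above can be sketched on the application side roughly as follows. This is only an illustration: the table-name prefix t_order_, the count of 4 sub-tables, and both function names are assumptions made for the example, not something the text prescribes.

```python
import zlib

TABLE_COUNT = 4         # assumption: the big table was split into 4 sub-tables
BASE_NAME = "t_order_"  # hypothetical table-name prefix


def table_by_modulo(key: int) -> str:
    """Route a numeric key with a plain modulo."""
    return f"{BASE_NAME}{key % TABLE_COUNT}"


def table_by_hash(key: str) -> str:
    """Route a string key: hash it first, then take the modulo."""
    # zlib.crc32 gives the same value on every run, unlike Python's
    # built-in hash() for strings, so a key always maps to the same table.
    checksum = zlib.crc32(key.encode("utf-8"))
    return f"{BASE_NAME}{checksum % TABLE_COUNT}"


if __name__ == "__main__":
    print(table_by_modulo(10))
    print(table_by_hash("tom"))
```

Whatever rule is chosen, it has to stay stable: changing the number of sub-tables later changes where existing keys map, so the existing rows would have to be moved.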
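MySQL table splitting comes in two forms: vertical splitting and horizontal splitting.
Vertical splitting splits a table by columns: a table with many columns is divided into several tables.
Vertical splitting usually follows these principles:
put rarely used columns in a separate table;
move large columns such as TEXT and BLOB (binary large object) out into an auxiliary table;
keep columns that are frequently queried together in the same table.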
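As a rough illustration of these principles, the sketch below splits one wide row into a small main row and an auxiliary row that share the primary key. The column names profile and avatar, and the two helper functions, are hypothetical and only serve the example.

```python
# Hypothetical column layout: small, frequently read columns stay in the
# main table; large or rarely read columns move to the auxiliary table.
MAIN_COLUMNS = ("id", "name", "sex")
EXT_COLUMNS = ("id", "profile", "avatar")


def split_row(row: dict) -> tuple:
    """Split one wide row into (main_row, ext_row); both keep the id."""
    main = {col: row[col] for col in MAIN_COLUMNS}
    ext = {col: row[col] for col in EXT_COLUMNS}
    return main, ext


def join_rows(main: dict, ext: dict) -> dict:
    """Rebuild the wide row, as a join on id would do in SQL."""
    if main["id"] != ext["id"]:
        raise ValueError("rows belong to different ids")
    return {**main, **ext}


if __name__ == "__main__":
    wide = {"id": 1, "name": "tom", "sex": 1,
            "profile": "long text", "avatar": b"\x89PNG"}
    main_row, ext_row = split_row(wide)
    print(main_row)
    print(join_rows(main_row, ext_row) == wide)
```

Queries that need only the small columns then read the narrow main table; the auxiliary table is touched only when the large columns are actually required.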
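Horizontal splitting splits a table by rows: the rows of one table are stored across several tables.
Principles of horizontal splitting
Typically a hash or a modulo of a key is used to decide which table a row belongs to.
Once a table has been split this way, we have to constrain how users query it: a query should include the key that the split is based on, otherwise every sub-table has to be searched.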
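Why queries need to be constrained becomes visible once the statements are generated. The sketch below assumes a modulo split on a hypothetical user_id column across 4 hypothetical tables t_order_0 to t_order_3: with the split key, one table is queried; without it, all of them are.

```python
TABLE_COUNT = 4         # assumption: number of sub-tables
BASE_NAME = "t_order_"  # hypothetical table-name prefix


def tables_to_query(user_id=None):
    """One table when the split key is known, every table otherwise."""
    if user_id is None:
        return [f"{BASE_NAME}{i}" for i in range(TABLE_COUNT)]
    return [f"{BASE_NAME}{user_id % TABLE_COUNT}"]


def build_queries(user_id=None, status=None):
    """Return (sql, params) pairs, one per table that must be read."""
    conditions, params = [], []
    if user_id is not None:
        conditions.append("user_id = %s")
        params.append(user_id)
    if status is not None:
        conditions.append("status = %s")
        params.append(status)
    where = " where " + " and ".join(conditions) if conditions else ""
    return [(f"select * from {table}{where}", tuple(params))
            for table in tables_to_query(user_id)]


if __name__ == "__main__":
    print(build_queries(user_id=6))      # a single statement
    print(build_queries(status="paid"))  # one statement per sub-table
```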
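Ways to split a table:
1) Estimate in advance which tables will hold a lot of data and be accessed frequently, and split them into several tables up front.
2) Use the MERGE storage engine, which presents a set of identical MyISAM tables as one table.
Create one complete table that stores all member information (the table is named member)
and insert some data into it: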
mysql> select * from member;
+----+------+-----+
| id | name | sex |
+----+------+-----+
| 1 | tom | 1 |
| 2 | tom | 1 |
| 3 | tom | 1 |
| 4 | tom | 1 |
| 5 | tom | 1 |
| 6 | tom | 1 |
| 7 | tom | 1 |
| 8 | tom | 1 |
| 9 | tom | 1 |
| 10 | tom | 1 |
| 11 | tom | 1 |
| 12 | tom | 1 |
| 13 | tom | 1 |
| 14 | tom | 1 |
| 15 | tom | 1 |
| 16 | tom | 1 |
+----+------+-----+
Now split the table. Here member is split into two tables, tb_member1 and tb_member2.
mysql> use test;
mysql> create table tb_member1(
-> id bigint primary key,
-> name varchar(20),
-> sex tinyint not null default '0'
-> )engine=myisam default charset=utf8;
The following command is a more concise way to create a table identical to tb_member1:
mysql> create table tb_member2 like tb_member1;
Create the main table tb_member
mysql> create table tb_member(
-> id bigint primary key,
-> name varchar(20),
-> sex tinyint not null default '0'
-> ) engine=merge union=(tb_member1,tb_member2) insert_method=last charset=utf8;
Look at the structure of the tb_member table:
mysql> desc tb_member;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id | bigint(20) | NO | PRI | NULL | |
| name | varchar(20) | YES | | NULL | |
| sex | tinyint(4) | NO | | 0 | |
+-------+-------------+------+-----+---------+-------+
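Note: the column definitions of the sub-tables and of the main table must be identical.
Next, distribute the data into the two sub-tables: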
mysql> insert into tb_member1(id,name,sex) select id,name,sex from member where id%2=0;
mysql> insert into tb_member2(id,name,sex) select id,name,sex from member where id%2=1;
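The same even/odd rule can be mirrored in application code, for instance to decide which sub-table a given id belongs to. This is a small sketch of the id%2 rule used in the two statements above; the function names are made up for the example.

```python
def table_for_id(member_id: int) -> str:
    """Even ids go to tb_member1, odd ids to tb_member2 (the id%2 rule)."""
    return "tb_member1" if member_id % 2 == 0 else "tb_member2"


def split_member_ids(ids):
    """Return (ids for tb_member1, ids for tb_member2)."""
    tb_member1 = [i for i in ids if i % 2 == 0]
    tb_member2 = [i for i in ids if i % 2 == 1]
    return tb_member1, tb_member2


if __name__ == "__main__":
    evens, odds = split_member_ids(range(1, 17))
    print(evens)  # ids expected in tb_member1
    print(odds)   # ids expected in tb_member2
```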
Check the data in the two sub-tables:
mysql> select * from tb_member1;
+----+------+-----+
| id | name | sex |
+----+------+-----+
| 2 | tom | 1 |
| 4 | tom | 1 |
| 6 | tom | 1 |
| 8 | tom | 1 |
| 10 | tom | 1 |
| 12 | tom | 1 |
| 14 | tom | 1 |
| 16 | tom | 1 |
+----+------+-----+
8 rows in set (0.00 sec)
mysql> select * from tb_member2;
+----+------+-----+
| id | name | sex |
+----+------+-----+
| 1 | tom | 1 |
| 3 | tom | 1 |
| 5 | tom | 1 |
| 7 | tom | 1 |
| 9 | tom | 1 |
| 11 | tom | 1 |
| 13 | tom | 1 |
| 15 | tom | 1 |
+----+------+-----+
8 rows in set (0.00 sec)
Check the data in the main table:
mysql> select * from tb_member;
+----+------+-----+
| id | name | sex |
+----+------+-----+
| 2 | tom | 1 |
| 4 | tom | 1 |
| 6 | tom | 1 |
| 8 | tom | 1 |
| 10 | tom | 1 |
| 12 | tom | 1 |
| 14 | tom | 1 |
| 16 | tom | 1 |
| 1 | tom | 1 |
| 3 | tom | 1 |
| 5 | tom | 1 |
| 7 | tom | 1 |
| 9 | tom | 1 |
| 11 | tom | 1 |
| 13 | tom | 1 |
| 15 | tom | 1 |
+----+------+-----+
16 rows in set (0.00 sec)
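Summary: each sub-table has its own table files, while the main table is only a shell. It has a .frm definition and a .MRG file that lists its sub-tables, but no data or index files of its own.
# ls -l /usr/local/mysql/data/test/tb_member*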
-rw-r----- 1 mysql mysql 8614 Feb 13 21:44 /usr/local/mysql/data/test/tb_member1.frm
-rw-r----- 1 mysql mysql 160 Feb 13 21:47 /usr/local/mysql/data/test/tb_member1.MYD
-rw-r----- 1 mysql mysql 2048 Feb 13 21:47 /usr/local/mysql/data/test/tb_member1.MYI
-rw-r----- 1 mysql mysql 8614 Feb 13 21:44 /usr/local/mysql/data/test/tb_member2.frm
-rw-r----- 1 mysql mysql 160 Feb 13 21:47 /usr/local/mysql/data/test/tb_member2.MYD
-rw-r----- 1 mysql mysql 2048 Feb 13 21:47 /usr/local/mysql/data/test/tb_member2.MYI
-rw-r----- 1 mysql mysql 8614 Feb 13 21:46 /usr/local/mysql/data/test/tb_member.frm
-rw-r----- 1 mysql mysql 42 Feb 13 21:46 /usr/local/mysql/data/test/tb_member.MRG
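2. Partitioning
What is partitioning?
Partitioning versus table splitting: table splitting breaks a large table into several independent physical tables, whereas partitioning stores the data in segments in several places. After partitioning, the table is still a single table, but its data is spread across several locations.
The application still reads and writes using the one table name; the database organizes the partitioned data automatically.
Partitioning takes two main forms:
Horizontal partitioning: partitions the rows of a table. Every column defined in the table is present in every partition, so the table's structure is preserved.
Vertical partitioning: reduces the width of the target table by dividing it vertically, so that particular columns are assigned to particular partitions; each partition contains the rows for the columns it holds. MySQL's built-in partitioning supports only the horizontal form.
Partitioning support
Before 5.6, check whether the current configuration supports partitioning with the following command:
mysql> show variables like '%partition%';
Partitioning is supported if the have_partitioning variable (named have_partition_engine in the earliest 5.1 releases) shows YES.
From 5.6 onward, check like this instead:
mysql> show plugins;
If partition is listed as ACTIVE in the output, partitioning is supported.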
The following demonstrates partitioning a table by range (RANGE).
Create a RANGE-partitioned table
mysql> create table user(
-> id int not null auto_increment,
-> name varchar(30) not null default '',
-> sex int(1) not null default '0',
-> primary key(id)
-> )default charset=utf8 auto_increment=1
-> partition by range(id)(
-> partition p0 values less than (3),
-> partition p1 values less than (6),
-> partition p2 values less than (9),
-> partition p3 values less than (12),
-> partition p4 values less than maxvalue);
Insert data:
mysql> insert into user(name,sex) values('tom1','0');
mysql> insert into user(name,sex) values('tom2','1');
mysql> insert into user(name,sex) values('tom3','2');
mysql> insert into user(name,sex) select name,sex from user;
(Repeat the last statement a few times; each run doubles the number of rows.)
Look at the directory where the database table files are stored:
# ls -l /usr/local/mysql/data/test/user*
-rw-r----- 1 mysql mysql 8614 Feb 13 21:59 /usr/local/mysql/data/test/user.frm
-rw-r----- 1 mysql mysql 98304 Feb 13 22:00 /usr/local/mysql/data/test/user#P#p0.ibd
-rw-r----- 1 mysql mysql 98304 Feb 13 22:00 /usr/local/mysql/data/test/user#P#p1.ibd
-rw-r----- 1 mysql mysql 98304 Feb 13 22:00 /usr/local/mysql/data/test/user#P#p2.ibd
-rw-r----- 1 mysql mysql 98304 Feb 13 22:00 /usr/local/mysql/data/test/user#P#p3.ibd
-rw-r----- 1 mysql mysql 98304 Feb 13 22:00 /usr/local/mysql/data/test/user#P#p4.ibd
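The listing shows one .ibd file per partition. Which partition a row lands in follows from the VALUES LESS THAN bounds, which are exclusive upper limits. A minimal model of that lookup, using the bounds of the user table above (the helper names are made up for the example):

```python
from bisect import bisect_right
from collections import Counter

# Exclusive upper bounds of p0..p3 in the user table; p4 is MAXVALUE.
BOUNDS = [3, 6, 9, 12]
NAMES = ["p0", "p1", "p2", "p3", "p4"]


def partition_for(row_id: int) -> str:
    """Return the partition whose range contains row_id."""
    # bisect_right counts how many bounds are <= row_id, which is the
    # index of the first partition whose upper bound is above row_id.
    return NAMES[bisect_right(BOUNDS, row_id)]


def distribution(ids) -> Counter:
    """Count how many of the given ids fall into each partition."""
    return Counter(partition_for(i) for i in ids)


if __name__ == "__main__":
    print(partition_for(2))   # below 3, first partition
    print(partition_for(3))   # 3 is not "less than 3", so the next one
    print(dict(distribution(range(1, 13))))
```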
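Query the partition information from the partitions table in the information_schema system database:
mysql> select * from information_schema.partitions where table_schema='test' and table_name='user'\G
Merging partitions:
Example: merge p1 through p3 into two partitions, p01 and p02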
mysql> alter table user
-> reorganize partition p1,p2,p3 into
-> (partition p01 values less than (8),
-> partition p02 values less than (12)
-> );
Look at the table files again:
# ls -l /usr/local/mysql/data/test/user*
-rw-r----- 1 mysql mysql 8614 Feb 13 22:03 /usr/local/mysql/data/test/user.frm
-rw-r----- 1 mysql mysql 98304 Feb 13 22:03 /usr/local/mysql/data/test/user#P#p01.ibd
-rw-r----- 1 mysql mysql 98304 Feb 13 22:03 /usr/local/mysql/data/test/user#P#p02.ibd
-rw-r----- 1 mysql mysql 98304 Feb 13 22:00 /usr/local/mysql/data/test/user#P#p0.ibd
-rw-r----- 1 mysql mysql 98304 Feb 13 22:00 /usr/local/mysql/data/test/user#P#p4.ibd
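When RANGE partitions are reorganized, the new partitions are expected to account for the same overall range as the ones they replace. In the example, p1 to p3 covered ids 3 to 11, and so do p01 and p02. The sketch below is a simplified model of that check and of where rows end up afterwards; it is not how the server implements it, and the function names are invented for the example.

```python
from bisect import bisect_right


def same_overall_range(old_uppers, new_uppers) -> bool:
    """Simplified check: new bounds increase and end where the old ones did."""
    increasing = all(a < b for a, b in zip(new_uppers, new_uppers[1:]))
    return bool(new_uppers) and increasing and new_uppers[-1] == old_uppers[-1]


def partition_for(row_id, uppers, names) -> str:
    """Look up the partition for row_id; the last name stands for MAXVALUE."""
    return names[bisect_right(uppers, row_id)]


# Layout before and after the reorganize statement in the text.
OLD = ([3, 6, 9, 12], ["p0", "p1", "p2", "p3", "p4"])
NEW = ([3, 8, 12], ["p0", "p01", "p02", "p4"])

if __name__ == "__main__":
    print(same_overall_range([6, 9, 12], [8, 12]))
    for row_id in (2, 7, 8, 12):
        print(row_id, partition_for(row_id, *OLD), partition_for(row_id, *NEW))
```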
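Performance test: non-partitioned versus partitioned table
Create a non-partitioned table
mysql> create table tab1(c1 int,c2 varchar(30),c3 date);
Create a partitioned table with the same columns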
mysql> create table tab2(c1 int,c2 varchar(30),c3 date)
-> partition by range(year(c3))(partition p0 values less than (1995),
-> partition p1 values less than (1996),
-> partition p2 values less than (1997),
-> partition p3 values less than (1998),
-> partition p4 values less than (1999),
-> partition p5 values less than (2000),
-> partition p6 values less than (2001),
-> partition p7 values less than (2002),
-> partition p8 values less than (2003),
-> partition p9 values less than (2004),
-> partition p10 values less than (2010),
-> partition p11 values less than maxvalue);
Insert 10,000 rows through a stored procedure
Create the stored procedure:
mysql> delimiter $$
mysql> create procedure load_part_tab()
-> begin
-> declare v int default 0;
-> while v < 10000
-> do
-> insert into tab1
-> values (v,'testing partitions',adddate('1995-01-01',(rand(v)*36520) mod 3652));
-> set v = v + 1;
-> end while;
-> end
-> $$
mysql> delimiter ;
Run the stored procedure:
mysql> call load_part_tab();
Copy the data into tab2
mysql> insert into tab2 select * from tab1;
Test SQL performance
mysql> select count(*) from tab1 where c3 > '1995-01-01' and c3 < '1995-12-31';
+----------+
| count(*) |
+----------+
| 990 |
+----------+
1 row in set (0.11 sec)
mysql> select count(*) from tab2 where c3 > '1995-01-01' and c3 < '1995-12-31';
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.03 sec)
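In this run the query on the partitioned table finished much faster than the one on the non-partitioned table (0.03 sec versus 0.11 sec). Note, however, that the tab2 query returned 0 rows while the tab1 query returned 990, so the two tables evidently did not hold the same data when this output was captured.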
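One likely reason a partitioned table can answer such a query faster is partition pruning: with a date range that falls inside a single year, only the partition for that year needs to be read. The sketch below is a simplified model of that idea using the bounds from the tab2 definition. The partitions the server actually chooses may differ and can be inspected with EXPLAIN PARTITIONS (plain EXPLAIN in newer versions); the function names here are invented for the example.

```python
from bisect import bisect_right
from datetime import date

# Exclusive upper bounds (years) of p0..p10 in tab2; p11 is MAXVALUE.
YEAR_BOUNDS = [1995, 1996, 1997, 1998, 1999, 2000,
               2001, 2002, 2003, 2004, 2010]


def partition_for_year(year: int) -> str:
    """Partition that holds rows whose year(c3) equals the given year."""
    return f"p{bisect_right(YEAR_BOUNDS, year)}"


def partitions_for_range(start: date, end: date) -> list:
    """Partitions a query on c3 between start and end would need to read."""
    first = bisect_right(YEAR_BOUNDS, start.year)
    last = bisect_right(YEAR_BOUNDS, end.year)
    return [f"p{i}" for i in range(first, last + 1)]


if __name__ == "__main__":
    # The 1995 query from the test needs one partition out of twelve.
    print(partitions_for_range(date(1995, 1, 1), date(1995, 12, 31)))
    print(partitions_for_range(date(1996, 6, 1), date(1998, 1, 1)))
```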
Test after creating an index
mysql> create index idx_of_c3 on tab1(c3);
Query OK, 0 rows affected (0.28 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> create index idx_of_c3 on tab2(c3);
Query OK, 0 rows affected (0.22 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> select count(*) from tab1 where c3 > '1996-01-01' and c3 < '1996-12-31';
+----------+
| count(*) |
+----------+
| 1006 |
+----------+
1 row in set (0.11 sec)
Restart the MySQL service
mysql> select count(*) from tab1 where c3 > '1996-01-01' and c3 < '1996-12-31';
+----------+
| count(*) |
+----------+
| 1006 |
+----------+
1 row in set (0.00 sec)
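After the index is created, the partitioned and non-partitioned tables perform about the same (the difference becomes more noticeable as the data volume grows).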