insert select (制造百万条记录)
在开始百万级数据的查询之前,自己先动手制造百万级的记录来供我们使用,使用的方法是insert select方法
INSERT 一般用来给表插入一个指定列值的行。但是,INSERT 还存在另一种形式,可以利用它将一条SELECT 语句的结果插入表中。这就是所谓的INSERT SELECT, 顾名思义,它是有一条INSERT语句和一条SELECT语句组成的。
现在,有一个warning_reparied表,有2447条记录,如下:
mysql> select count(*) from warning_repaired; +----------+ | count(*) | +----------+ | 2447 | +----------+ 1 row in set (0.00 sec) mysql>
使用这个warning_repaired表创建出一个百万级数量的表:
首先,创建一个新表warning_repaired1,
mysql> CREATE TABLE `warning_repaired1` ( -> `id` int(11) NOT NULL AUTO_INCREMENT, -> `device_moid` varchar(36) NOT NULL, -> `device_name` varchar(128) DEFAULT NULL, -> `device_type` varchar(36) DEFAULT NULL, -> `device_ip` varchar(128) DEFAULT NULL, -> `warning_type` enum(‘0‘,‘1‘,‘2‘) NOT NULL, -> `domain_moid` varchar(36) NOT NULL, -> `domain_name` varchar(128) DEFAULT NULL, -> `code` smallint(6) NOT NULL, -> `level` varchar(16) NOT NULL, -> `description` varchar(128) DEFAULT NULL, -> `start_time` datetime NOT NULL, -> `resolve_time` datetime NOT NULL, -> PRIMARY KEY (`id`), -> UNIQUE KEY `id` (`id`) -> ) ENGINE=InnoDB AUTO_INCREMENT=4895 DEFAULT CHARSET=utf8; Query OK, 0 rows affected (0.39 sec) mysql> select count(*) from warning_repaired1; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) mysql> select count(*) from warning_repaired; +----------+ | count(*) | +----------+ | 2447 | +----------+ 1 row in set (0.00 sec) mysql>
其次,使用insert select语句插入把warning_repaired中的记录插入到warning_repaired1表中:
mysql> insert into warning_repaired1(device_moid, device_name, device_type, device_ip, warning_type, domain_moid, domain_name, code, level, description, start_time, resolve_time) select device_moid, device_name, device_type, device_ip, warning_type, domain_moid, domain_name, code, level, description, start_time, resolve_time from warning_repaired; Query OK, 2447 rows affected (1.07 sec) Records: 2447 Duplicates: 0 Warnings: 0 mysql> select count(*) from warning_repaired; +----------+ | count(*) | +----------+ | 2447 | +----------+ 1 row in set (0.00 sec) mysql> select count(*) from warning_repaired1; +----------+ | count(*) | +----------+ | 2447 | +----------+ 1 row in set (0.00 sec)
插入成功后,把INSERT SELECT语句用的查询表也改为warning_repaired1,如下:
insert into warning_repaired1(device_moid, device_name, device_type, device_ip, warning_type, domain_moid, domain_name, code, level, description, start_time, resolve_time) select device_moid, device_name, device_type, device_ip, warning_type, domain_moid, domain_name, code, level, description, start_time, resolve_time from warning_repaired1;
这样多运行几次(记录指数级的增长)就可以很快的制造出百万条的记录了。
最常见MYSQL 最基本的分页方式limit
mysql> select count(*) from warning_repaired; +----------+ | count(*) | +----------+ | 2447 | +----------+ 1 row in set (0.00 sec) mysql> select count(*) from warning_repaired5; +----------+ | count(*) | +----------+ | 5990256 | +----------+ 1 row in set (10.11 sec) mysql> select code,level,description from warning_repaired5 limit 1000,2; +------+----------+----------------+ | code | level | description | +------+----------+----------------+ | 1006 | critical | 注册GK失败 | | 1006 | critical | 注册GK失败 | +------+----------+----------------+ 2 rows in set (0.00 sec) mysql> select code,level,description from warning_repaired5 limit 10000,2; +------+----------+----------------+ | code | level | description | +------+----------+----------------+ | 1006 | critical | 注册GK失败 | | 1006 | critical | 注册GK失败 | +------+----------+----------------+ 2 rows in set (0.05 sec) mysql> select code,level,description from warning_repaired5 limit 100000,2; +------+----------+------------------------------------------------------+ | code | level | description | +------+----------+------------------------------------------------------+ | 2003 | critical | 服务器内存5分钟内平均使用率超过阈值 | | 2019 | critical | 网卡的吞吐量超阈值 | +------+----------+------------------------------------------------------+ 2 rows in set (0.26 sec) mysql> select code,level,description from warning_repaired5 limit 1000000,2; +------+----------+----------------+ | code | level | description | +------+----------+----------------+ | 1006 | critical | 注册GK失败 | | 1006 | critical | 注册GK失败 | +------+----------+----------------+ 2 rows in set (1.56 sec) mysql> select code,level,description from warning_repaired5 limit 5000000,2; +------+----------+----------------+ | code | level | description | +------+----------+----------------+ | 1006 | critical | 注册GK失败 | | 1006 | critical | 注册GK失败 | +------+----------+----------------+ 2 rows in set (7.15 sec) mysql>
在不超过100万条记录时,可以看出花费的时间还是比较少。所以在中小数量的情况下,这样的SQL足够用了,唯一需要注意的问题就是确保使用了索引。但是随着数据量的增加,页数会越来越多,在数据慢慢增长的过程中,可能出现limit 5000000,2这样的情况,limit 5000000,2的意思是扫描满足条件的l5000002行,扔掉前面的5000000行,返回最后的2行,问题就在这里,如果limit 5000000,2,需要扫描5000002行,在一个高并发的应用里,每次查询需要扫描超过500w行,性能肯定大打折扣。
这种方式有几个不足: 较大的偏移(OFFSET)会增加结果集,小比例的低效分页足够产生磁盘I/O瓶颈,需要扫描的行多。
简单的解决办法: 不显示记录总数,没用户在乎这个数字;不让用户访问页数比较大的记录,重定向他们;避免count(*),不显示总数,让用户通过"下一页"来翻页,缓存总数;单独统计总数,在插入和删除时递增/递减。
mysql> select code,level,description from warning_repaired5 limit 5000000,2; +------+----------+----------------+ | code | level | description | +------+----------+----------------+ | 1006 | critical | 注册GK失败 | | 1006 | critical | 注册GK失败 | +------+----------+----------------+ 2 rows in set (2.98 sec) mysql> select code,level,description from warning_repaired5 order by id desc limit 5000000,2; +------+----------+----------------+ | code | level | description | +------+----------+----------------+ | 1006 | critical | 注册GK失败 | | 1006 | critical | 注册GK失败 | +------+----------+----------------+ 2 rows in set (8.04 sec)
从上面可以看出再加了order by id desc后,花费的时间又增长了。
第二种就是分表,计算HASH值,这儿不做介绍。
第三种:偏移
mysql> select code,level,description from warning_repaired5 order by id desc limit 5000000,20; +------+----------+----------------+ | code | level | description | +------+----------+----------------+ | 1006 | critical | 注册GK失败 | …… | 1006 | critical | 注册GK失败 | +------+----------+----------------+ 20 rows in set (4.77 sec) mysql> select code,level,description from warning_repaired5 where id <=( select id from warning_repaired5 order by id desc limit 5000000,1) order by id desc limit 20; +------+----------+----------------+ | code | level | description | +------+----------+----------------+ | 1006 | critical | 注册GK失败 | | 1006 | critical | 注册GK失败 | …… | 1006 | critical | 注册GK失败 | +------+----------+----------------+ 20 rows in set (4.26 sec)
可以查出时间相对第一种少了一点。
整体来说在面对百万级数据的时候如果使用上面第三种方法来优化,系统性能上是能够得到很好的提升,在遇到复杂的查询时也尽量简化,减少运算量。 同时也尽量多的使用内存缓存,有条件的可以考虑分表、分库、阵列之类的大型解决方案了。
参考文章:
http://www.lvtao.net/database/mysql_page_limit.html
http://my.oschina.net/u/1540325/blog/477126