MySQL Join: Which Table Drives Which, the Big One or the Small One?

1. Preparation

mysql> create table dept(
    id int unsigned auto_increment not null primary key,
    name varchar(20) default '' not null,
    key(name)
)engine=innodb default charset=utf8mb4;
CREATE TABLE `userinfo` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8mb4_general_ci NOT NULL,
  `passwd` varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `phone` varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `dept` int NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `name` (`name`),
  KEY `dept` (`dept`)
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

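Note that the queries and EXPLAIN output below refer to the columns as user_id and dept_id, list an extra index name_2 on userinfo, and show no usable index on userinfo.dept_id, so the tables actually tested appear to differ slightly from the DDL above.

Generate fake data in Excel and import it: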

mysql> load data infile '/var/lib/mysql-files/new5.txt' into table dept fields terminated by ' ' lines terminated by '\n';
Query OK, 1000 rows affected (0.03 sec)
Records: 1000  Deleted: 0  Skipped: 0  Warnings: 0

mysql> load data infile '/var/lib/mysql-files/new.txt' into table userinfo fields terminated by ' ' lines terminated by '\n';
Query OK, 120000 rows affected (0.02 sec)
Records: 120000  Deleted: 0  Skipped: 0  Warnings: 0

dept has 1,000 rows and userinfo has 120,000 rows.
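The article does not show how the Excel data was laid out. As a rough alternative, the sketch below generates comparable files in Python. The column order, the a<number> naming scheme, the password and phone values, and the space separator (taken from the fields terminated by ' ' clause above) are all assumptions, not the author's actual data.

```python
import random


def make_dept_rows(n_depts, sep=" "):
    """Return one text line per dept row: id, name (hypothetical layout)."""
    return [f"{i}{sep}a{i}" for i in range(1, n_depts + 1)]


def make_userinfo_rows(n_users, n_depts, n_names, sep=" ", seed=0):
    """Return one text line per userinfo row: id, name, passwd, phone, dept."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    rows = []
    for i in range(1, n_users + 1):
        name = f"a{rng.randint(1, n_names)}"   # names repeat, like 'a25' in the article
        passwd = f"p{i}"                       # placeholder value
        phone = f"138{i:08d}"                  # placeholder value
        dept = rng.randint(1, n_depts)         # every user points at an existing dept
        rows.append(sep.join([str(i), name, passwd, phone, str(dept)]))
    return rows


def write_rows(path, rows):
    """Write the rows with '\n' line endings, matching lines terminated by '\n'."""
    with open(path, "w", newline="\n") as f:
        f.write("\n".join(rows) + "\n")


if __name__ == "__main__":
    # File names are illustrative; move the files to the server's secure_file_priv directory.
    write_rows("dept.txt", make_dept_rows(1000))
    write_rows("userinfo.txt", make_userinfo_rows(120000, 1000, 600))
```

The number of distinct names (600 here) controls roughly how many users share a name such as a25; it will not reproduce the exact counts shown below.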

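Part 1: Find the id, name, and dept name of users whose userinfo name is a25

With the condition userinfo.name = 'a25' in place, all four variants return the same result on this data, regardless of which table is written first or whether it is a join or a left join:

mysql> select count(1) from userinfo left join dept using(dept_id) where userinfo.name = 'a25';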
+----------+
| count(1) |
+----------+
|      187 |
+----------+
1 row in set

mysql> select count(1) from dept left join userinfo using(dept_id) where userinfo.name = 'a25';
+----------+
| count(1) |
+----------+
|      187 |
+----------+
1 row in set

mysql> select count(1) from dept join userinfo using(dept_id) where userinfo.name = 'a25';
+----------+
| count(1) |
+----------+
|      187 |
+----------+
1 row in set

mysql> select count(1) from userinfo join dept using(dept_id) where userinfo.name = 'a25';
+----------+
| count(1) |
+----------+
|      187 |
+----------+
1 row in set

So let's compare the efficiency of the four variants.

Enable the slow query log:

set long_query_time = 0.0001;
set global slow_query_log = ON;
mysql> show variables like "slow%";
+---------------------+--------------------------------------+
| Variable_name       | Value                                |
+---------------------+--------------------------------------+
| slow_launch_time    | 2                                    |
| slow_query_log      | ON                                   |
| slow_query_log_file | /var/lib/mysql/muqing-web-2-slow.log |
+---------------------+--------------------------------------+
tail -f /var/lib/mysql/muqing-web-2-slow.log   # watch query times in real time

1. userinfo left join dept

mysql> explain select userinfo.user_id, userinfo.name, dept.name from userinfo left join dept using(dept_id) where userinfo.name = 'a25';
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
| id | select_type | table    | partitions | type   | possible_keys | key     | key_len | ref                   | rows | filtered | Extra       |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
|  1 | SIMPLE      | userinfo | NULL       | ref    | name,name_2   | name    | 1022    | const                 |  187 |      100 | NULL        |
|  1 | SIMPLE      | dept     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | test.userinfo.dept_id |    1 |      100 | Using where |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
2 rows in set

mysql> show status like 'last_query_cost';
+-----------------+------------+
| Variable_name   | Value      |
+-----------------+------------+
| Last_query_cost | 130.899000 |
+-----------------+------------+
1 row in set

mysql> explain select straight_join userinfo.user_id, userinfo.name, dept.name from userinfo left join dept using(dept_id) where userinfo.name = 'a25';
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
| id | select_type | table    | partitions | type   | possible_keys | key     | key_len | ref                   | rows | filtered | Extra       |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
|  1 | SIMPLE      | userinfo | NULL       | ref    | name,name_2   | name    | 1022    | const                 |  187 |      100 | NULL        |
|  1 | SIMPLE      | dept     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | test.userinfo.dept_id |    1 |      100 | Using where |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
2 rows in set

mysql> show status like 'last_query_cost';
+-----------------+------------+
| Variable_name   | Value      |
+-----------------+------------+
| Last_query_cost | 130.899000 |
+-----------------+------------+
1 row in set

The optimizer chooses to read the userinfo table first. The warnings output:

mysql> explain select straight_join userinfo.user_id, userinfo.name, dept.name from userinfo left join dept using(dept_id) where userinfo.name = 'a25';
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
| id | select_type | table    | partitions | type   | possible_keys | key     | key_len | ref                   | rows | filtered | Extra       |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
|  1 | SIMPLE      | userinfo | NULL       | ref    | name,name_2   | name    | 1022    | const                 |  187 |      100 | NULL        |
|  1 | SIMPLE      | dept     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | test.userinfo.dept_id |    1 |      100 | Using where |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
2 rows in set

mysql> show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                                            |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select straight_join `test`.`userinfo`.`user_id` AS `user_id`,`test`.`userinfo`.`name` AS `name`,`test`.`dept`.`name` AS `name` from `test`.`userinfo` left join `test`.`dept` on((`test`.`userinfo`.`dept_id` = `test`.`dept`.`dept_id`)) where (`test`.`userinfo`.`name` = 'a25') |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

2. userinfo join dept

mysql> explain select userinfo.user_id, userinfo.name, dept.name from userinfo join dept using(dept_id) where userinfo.name = 'a25';
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
| id | select_type | table    | partitions | type   | possible_keys | key     | key_len | ref                   | rows | filtered | Extra       |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
|  1 | SIMPLE      | userinfo | NULL       | ref    | name,name_2   | name    | 1022    | const                 |  187 |      100 | NULL        |
|  1 | SIMPLE      | dept     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | test.userinfo.dept_id |    1 |      100 | Using where |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
2 rows in set

mysql> show status like 'last_query_cost';
+-----------------+------------+
| Variable_name   | Value      |
+-----------------+------------+
| Last_query_cost | 130.899000 |
+-----------------+------------+
1 row in set

mysql> explain select straight_join userinfo.user_id, userinfo.name, dept.name from userinfo join dept using(dept_id) where userinfo.name = 'a25';
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
| id | select_type | table    | partitions | type   | possible_keys | key     | key_len | ref                   | rows | filtered | Extra       |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
|  1 | SIMPLE      | userinfo | NULL       | ref    | name,name_2   | name    | 1022    | const                 |  187 |      100 | NULL        |
|  1 | SIMPLE      | dept     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | test.userinfo.dept_id |    1 |      100 | Using where |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+------+----------+-------------+
2 rows in set

mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                                         |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select straight_join `test`.`userinfo`.`user_id` AS `user_id`,`test`.`userinfo`.`name` AS `name`,`test`.`dept`.`name` AS `name` from `test`.`userinfo` join `test`.`dept` where ((`test`.`userinfo`.`name` = 'a25') and (`test`.`userinfo`.`dept_id` = `test`.`dept`.`dept_id`)) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

Slow query log entries for variants 1 and 2:

# Time: 2020-04-10T17:00:38.452393Z
# User@Host: root[root] @ localhost []  Id:    11
# Query_time: 0.000741  Lock_time: 0.000137 Rows_sent: 187  Rows_examined: 374
SET timestamp=1586538038;
select sql_no_cache userinfo.user_id, userinfo.name, dept.name from userinfo left join dept using(dept_id) where userinfo.name = 'a25';
# Time: 2020-04-10T17:01:49.946904Z
# User@Host: root[root] @ localhost []  Id:    11
# Query_time: 0.000766  Lock_time: 0.000149 Rows_sent: 187  Rows_examined: 374
SET timestamp=1586538109;
select straight_join userinfo.user_id, userinfo.name, dept.name from userinfo join dept using(dept_id) where userinfo.name = 'a25';

The two query times are about the same.

3. dept left join userinfo

mysql> explain select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from dept left join userinfo using(dept_id) where userinfo.name = 'a25';
+----+-------------+----------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | dept     | NULL       | index | PRIMARY       | name | 82      | NULL  | 1000 |      100 | Using index |
|  1 | SIMPLE      | userinfo | NULL       | ref   | name,name_2   | name | 1022    | const |  187 |       10 | Using where |
+----+-------------+----------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
2 rows in set

mysql> show status like 'last_query_cost';
+-----------------+---------------+
| Variable_name   | Value         |
+-----------------+---------------+
| Last_query_cost | 106671.417848 |
+-----------------+---------------+
1 row in set

Slow query log entry:

# Time: 2020-04-10T17:32:02.890446Z
# User@Host: root[root] @  [10.0.0.105]  Id:    13
# Query_time: 0.134147  Lock_time: 0.000128 Rows_sent: 187  Rows_examined: 188000
SET timestamp=1586539922;
select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from dept left join userinfo using(dept_id) where userinfo.name = 'a25';

This is clearly slower than the two variants above.

4. dept join userinfo

mysql> explain select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from dept join userinfo using(dept_id) where userinfo.name = 'a25';
+----+-------------+----------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | dept     | NULL       | index | PRIMARY       | name | 82      | NULL  | 1000 |      100 | Using index |
|  1 | SIMPLE      | userinfo | NULL       | ref   | name,name_2   | name | 1022    | const |  187 |       10 | Using where |
+----+-------------+----------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
2 rows in set

mysql> show status like 'last_query_cost';
+-----------------+---------------+
| Variable_name   | Value         |
+-----------------+---------------+
| Last_query_cost | 106671.417848 |
+-----------------+---------------+
1 row in set
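
Slow query log entry:

# Time: 2020-04-10T17:33:35.458204Z
# User@Host: root[root] @  [10.0.0.105]  Id:    13
# Query_time: 0.133475  Lock_time: 0.000164 Rows_sent: 187  Rows_examined: 188000
SET timestamp=1586540015;
select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from dept join userinfo using(dept_id) where userinfo.name = 'a25';

About the same speed as variant 3, and far slower than variants 1 and 2.

So in this case the query is faster when the big table (userinfo) drives the small one (dept). The value reported by show status like 'last_query_cost' also tracks the measured times quite well.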
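The Rows_examined figures in the slow log can be reproduced with simple nested-loop arithmetic: every outer row is read once, and for each outer row the inner access path reads some number of rows. A minimal sketch, where rows_examined is a hypothetical helper and the inputs are the values from the rows column of the EXPLAIN output above:

```python
def rows_examined(outer_rows, inner_rows_per_outer_row):
    """Rows read by a nested-loop join: the outer rows plus all inner reads."""
    return outer_rows + outer_rows * inner_rows_per_outer_row


# userinfo drives: 187 rows match name = 'a25'; each one probes dept by primary key (1 row).
userinfo_drives = rows_examined(187, 1)

# dept drives: 1000 dept rows; each one re-reads the 187 userinfo rows with name = 'a25'.
dept_drives = rows_examined(1000, 187)

print(userinfo_drives, dept_drives)  # 374 188000
```

Both results match the Rows_examined values logged for variants 1 and 2 (374) and variants 3 and 4 (188000).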

Part 2: Find the id, name, and dept name of users whose dept name is a25

The four queries are equivalent:

mysql> select count(1) from userinfo left join dept using(dept_id) where dept.name = 'a25';
+----------+
| count(1) |
+----------+
|      360 |
+----------+
1 row in set

mysql> select count(1) from userinfo join dept using(dept_id) where dept.name = 'a25';
+----------+
| count(1) |
+----------+
|      360 |
+----------+
1 row in set

mysql> select count(1) from dept join userinfo using(dept_id) where dept.name = 'a25';
+----------+
| count(1) |
+----------+
|      360 |
+----------+
1 row in set

mysql> select count(1) from dept left join userinfo using(dept_id) where dept.name = 'a25';
+----------+
| count(1) |
+----------+
|      360 |
+----------+
1 row in set

1. userinfo left join dept

mysql> explain select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from userinfo left join dept using(dept_id) where dept.name = 'a25';
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+--------+----------+-------------+
| id | select_type | table    | partitions | type   | possible_keys | key     | key_len | ref                   | rows   | filtered | Extra       |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+--------+----------+-------------+
|  1 | SIMPLE      | userinfo | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                  | 120234 |      100 | NULL        |
|  1 | SIMPLE      | dept     | NULL       | eq_ref | PRIMARY,name  | PRIMARY | 4       | test.userinfo.dept_id |      1 |        5 | Using where |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+--------+----------+-------------+
2 rows in set

mysql> show status like 'last_query_cost';
+-----------------+--------------+
| Variable_name   | Value        |
+-----------------+--------------+
| Last_query_cost | 54209.549000 |
+-----------------+--------------+
1 row in set

Slow query log entry:

# Time: 2020-04-10T17:40:53.462276Z
# User@Host: root[root] @  [10.0.0.105]  Id:    13
# Query_time: 0.110679  Lock_time: 0.000145 Rows_sent: 360  Rows_examined: 240000
SET timestamp=1586540453;
select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from userinfo left join dept using(dept_id) where dept.name = 'a25';

2. userinfo join dept

mysql> explain select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from userinfo join dept using(dept_id) where dept.name = 'a25';
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+--------+----------+-------------+
| id | select_type | table    | partitions | type   | possible_keys | key     | key_len | ref                   | rows   | filtered | Extra       |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+--------+----------+-------------+
|  1 | SIMPLE      | userinfo | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                  | 120234 |      100 | NULL        |
|  1 | SIMPLE      | dept     | NULL       | eq_ref | PRIMARY,name  | PRIMARY | 4       | test.userinfo.dept_id |      1 |        5 | Using where |
+----+-------------+----------+------------+--------+---------------+---------+---------+-----------------------+--------+----------+-------------+
2 rows in set

mysql> show status like 'last_query_cost';
+-----------------+--------------+
| Variable_name   | Value        |
+-----------------+--------------+
| Last_query_cost | 54209.549000 |
+-----------------+--------------+

Slow query log entry:

# Time: 2020-04-10T17:44:21.701356Z
# User@Host: root[root] @  [10.0.0.105]  Id:    13
# Query_time: 0.105262  Lock_time: 0.000137 Rows_sent: 360  Rows_examined: 240000
SET timestamp=1586540661;
select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from userinfo join dept using(dept_id) where dept.name = 'a25';

Marginally faster than the previous variant.

3. dept left join userinfo

mysql> explain select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from dept left join userinfo using(dept_id) where dept.name = 'a25';
+----+-------------+----------+------------+------+---------------+------+---------+-------+--------+----------+----------------------------------------------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref   | rows   | filtered | Extra                                              |
+----+-------------+----------+------------+------+---------------+------+---------+-------+--------+----------+----------------------------------------------------+
|  1 | SIMPLE      | dept     | NULL       | ref  | name          | name | 82      | const |      1 |      100 | Using index                                        |
|  1 | SIMPLE      | userinfo | NULL       | ALL  | NULL          | NULL | NULL    | NULL  | 120234 |      100 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+----------+------------+------+---------------+------+---------+-------+--------+----------+----------------------------------------------------+
2 rows in set

mysql> show status like 'last_query_cost';
+-----------------+--------------+
| Variable_name   | Value        |
+-----------------+--------------+
| Last_query_cost | 12128.032803 |
+-----------------+--------------+

Slow query log entry:

# Time: 2020-04-10T17:47:06.147081Z
# User@Host: root[root] @  [10.0.0.105]  Id:    13
# Query_time: 0.031848  Lock_time: 0.000137 Rows_sent: 360  Rows_examined: 120001
SET timestamp=1586540826;
select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from dept left join userinfo using(dept_id) where dept.name = 'a25';

Surprisingly, this is faster than the two variants above. Wasn't the big table supposed to drive the small one? Let's keep going.

4. dept join userinfo

mysql> explain select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from dept join userinfo using(dept_id) where dept.name = 'a25';
+----+-------------+----------+------------+------+---------------+------+---------+-------+--------+----------+----------------------------------------------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref   | rows   | filtered | Extra                                              |
+----+-------------+----------+------------+------+---------------+------+---------+-------+--------+----------+----------------------------------------------------+
|  1 | SIMPLE      | dept     | NULL       | ref  | PRIMARY,name  | name | 82      | const |      1 |      100 | Using index                                        |
|  1 | SIMPLE      | userinfo | NULL       | ALL  | NULL          | NULL | NULL    | NULL  | 120234 |       10 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+----------+------------+------+---------------+------+---------+-------+--------+----------+----------------------------------------------------+
2 rows in set

mysql> show status like 'last_query_cost';
+-----------------+--------------+
| Variable_name   | Value        |
+-----------------+--------------+
| Last_query_cost | 12128.032803 |
+-----------------+--------------+

Slow query log entry:

# Time: 2020-04-10T17:48:44.738339Z
# User@Host: root[root] @  [10.0.0.105]  Id:    13
# Query_time: 0.032005  Lock_time: 0.000143 Rows_sent: 360  Rows_examined: 120001
SET timestamp=1586540924;
select sql_no_cache straight_join userinfo.user_id, userinfo.name, dept.name from dept join userinfo using(dept_id) where dept.name = 'a25';

This is also faster than variants 1 and 2.

In theory, a left join should be slightly slower than a join. The pseudocode for a join is:

outer_iter = iterator over tb1 where col1 in (5, 6)
outer_row = outer_iter.next
while outer_row
  inner_iter = iterator over tb2 where col3 = outer_row.col3
  inner_row = inner_iter.next
  while inner_row
    output [ outer_row.col1, inner_row.col2 ]
    inner_row = inner_iter.next
  end
  outer_row = outer_iter.next
end

A left join performs one extra check per outer row and, when no inner row matches, additionally returns a row padded with NULLs.
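The difference can be sketched in Python. This is an illustrative nested-loop join over lists of dicts, not MySQL's actual executor; nested_loop_join and the sample rows are made up for the example.

```python
def nested_loop_join(outer_rows, inner_rows, key, left=False):
    """Join two lists of dicts on `key`; with left=True, keep unmatched outer rows."""
    result = []
    for outer in outer_rows:
        matched = False
        for inner in inner_rows:
            if outer[key] == inner[key]:
                matched = True
                result.append((outer, inner))
        # The extra step of a left join: one check per outer row, and a
        # NULL-padded row (None here) when nothing matched.
        if left and not matched:
            result.append((outer, None))
    return result


users = [
    {"user_id": 1, "name": "a25", "dept_id": 10},
    {"user_id": 2, "name": "a25", "dept_id": 99},  # no such dept
]
depts = [{"dept_id": 10, "name": "sales"}]

inner_result = nested_loop_join(users, depts, "dept_id")
left_result = nested_loop_join(users, depts, "dept_id", left=True)
print(len(inner_result), len(left_result))  # 1 2
```

The inner join drops the user whose dept_id has no match, while the left join keeps it paired with None.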

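My guess is that whichever table the WHERE condition filters down the most should be the outer (driving) table, so it is not necessarily the big table that drives the small one.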
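That guess can be checked against the numbers from both experiments. The sketch below compares the two join orders by estimated rows read and picks the smaller one. JoinOrder and cheapest are hypothetical names, the row counts are the round figures from the article (120,000 rather than the 120234 estimate shown by EXPLAIN), and MySQL's real cost model considers more than row counts.

```python
from dataclasses import dataclass


@dataclass
class JoinOrder:
    driving_table: str
    outer_rows: int            # rows left in the driving table after its WHERE filter
    inner_rows_per_probe: int  # rows read in the inner table for each outer row

    def rows_read(self):
        return self.outer_rows + self.outer_rows * self.inner_rows_per_probe


def cheapest(orders):
    """Return the join order that reads the fewest rows."""
    return min(orders, key=lambda order: order.rows_read())


# Part 1: the filter is on userinfo.name, leaving 187 userinfo rows.
part1 = [JoinOrder("userinfo", 187, 1), JoinOrder("dept", 1000, 187)]
# Part 2: the filter is on dept.name, leaving 1 dept row; userinfo is scanned in full.
part2 = [JoinOrder("userinfo", 120000, 1), JoinOrder("dept", 1, 120000)]

for label, orders in (("where userinfo.name = 'a25'", part1),
                      ("where dept.name = 'a25'", part2)):
    best = cheapest(orders)
    print(label, "->", best.driving_table, best.rows_read())
```

In both cases the cheaper order starts from the table that the WHERE condition narrows down, and the computed totals (374 and 120001) equal the Rows_examined values in the slow log.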

Original article: https://www.cnblogs.com/ValleyUp/p/12664289.html
