Left Join

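A developer had a statement that ran for more than 2 hours without returning a result, and asked me why it was taking so long.

The statement had the following form:

select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;

This is a typical misunderstanding. The intent was to filter table a first and then do the left join. Let's look at what a left join actually does.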



[[email protected] ~]$ psql bigdatagp

psql (8.2.15)

Type "help" for help.

bigdatagp=# drop table tgt1;

DROP TABLE

bigdatagp=# drop table tgt2;

DROP TABLE

bigdatagp=# create table tgt1(id int, name varchar(20));

NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.

HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.

CREATE TABLE

bigdatagp=# create table tgt2(id int, name varchar(20)); 

NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.

HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.

CREATE TABLE

bigdatagp=# insert into tgt1 select generate_series(1,3),('a','b');

ERROR:  column "name" is of type character varying but expression is of type record

HINT:  You will need to rewrite or cast the expression.

bigdatagp=# insert into tgt1 select generate_series(1,5),generate_series(1,5)||'a';

INSERT 0 5

bigdatagp=# insert into tgt2 select generate_series(1,2),generate_series(1,2)||'a';    

INSERT 0 2

bigdatagp=# select * from tgt1;

 id | name 

----+------

  2 | 2a

  4 | 4a

  1 | 1a

  3 | 3a

  5 | 5a

(5 rows)

bigdatagp=# select * from tgt1 order by id;

 id | name 

----+------

  1 | 1a

  2 | 2a

  3 | 3a

  4 | 4a

  5 | 5a

(5 rows)

bigdatagp=# select * from tgt2 order by id; 

 id | name 

----+------

  1 | 1a

  2 | 2a

(2 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id;

 id | name | id | name 

----+------+----+------

  3 | 3a   |    | 

  5 | 5a   |    | 

  1 | 1a   |  1 | 1a

  2 | 2a   |  2 | 2a

  4 | 4a   |    | 

(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id order by a.id;

 id | name | id | name 

----+------+----+------

  1 | 1a   |  1 | 1a

  2 | 2a   |  2 | 2a

  3 | 3a   |    | 

  4 | 4a   |    | 

  5 | 5a   |    | 

(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where id>=3 order by a.id;

ERROR:  column reference "id" is ambiguous

LINE 1: ...* from tgt1 a left join tgt2 b on a.id=b.id where id>=3 orde...

                                                             ^

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=3 order by a.id;

 id | name | id | name 

----+------+----+------

  3 | 3a   |    | 

  4 | 4a   |    | 

  5 | 5a   |    | 

(3 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=3 order by a.id;        

 id | name | id | name 

----+------+----+------

  1 | 1a   |    | 

  2 | 2a   |    | 

  3 | 3a   |    | 

  4 | 4a   |    | 

  5 | 5a   |    | 

(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=6 order by a.id; 

 id | name | id | name 

----+------+----+------

(0 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;     

 id | name | id | name 

----+------+----+------

  1 | 1a   |    | 

  2 | 2a   |    | 

  3 | 3a   |    | 

  4 | 4a   |    | 

  5 | 5a   |    | 

(5 rows)
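The difference between the two forms is easy to check by counting rows. The sketch below reuses the tgt1 and tgt2 tables from this session. The counts in the comments are what the five and two rows loaded above should give; they are not captured output.

```sql
-- Baseline: number of rows in the left table.
select count(*) from tgt1;                                                    -- 5

-- Extra predicate in ON: no row of tgt1 is dropped, so the count matches the baseline.
select count(*) from tgt1 a left join tgt2 b on a.id = b.id and a.id >= 6;    -- 5

-- Same predicate in WHERE: applied to the joined result, so rows of tgt1 are removed.
select count(*) from tgt1 a left join tgt2 b on a.id = b.id where a.id >= 6;  -- 0
```

If the second count equals the size of the left table, the predicate is not filtering the left table at all, which is what happened to the 2-hour statement.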

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=3 order by a.id;

                                                                    QUERY PLAN                                                                     

---------------------------------------------------------------------------------------------------------------------------------------------------

 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.18..7.19 rows=1 width=14)

   Merge Key: "?column5?"

   Rows out:  3 rows at destination with 21 ms to end, start offset by 559 ms.

   ->  Sort  (cost=7.18..7.19 rows=1 width=14)

         Sort Key: a.id

         Rows out:  Avg 1.0 rows x 3 workers.  Max 1 rows (seg52) with 5.452 ms to first row, 5.454 ms to end, start offset by 564 ms.

         Executor memory:  63K bytes avg, 74K bytes max (seg2).

         Work_mem used:  63K bytes avg, 74K bytes max (seg2). Workfile: (0 spilling, 0 reused)

         ->  Hash Left Join  (cost=2.04..7.15 rows=1 width=14)

               Hash Cond: a.id = b.id

               Rows out:  Avg 1.0 rows x 3 workers.  Max 1 rows (seg52) with 4.190 ms to first row, 4.598 ms to end, start offset by 565 ms.

               ->  Seq Scan on tgt1 a  (cost=0.00..5.06 rows=1 width=7)

                     Filter: id >= 3

                     Rows out:  Avg 1.0 rows x 3 workers.  Max 1 rows (seg52) with 0.156 ms to first row, 0.158 ms to end, start offset by 565 ms.

               ->  Hash  (cost=2.02..2.02 rows=1 width=7)

                     Rows in:  (No row requested) 0 rows (seg0) with 0 ms to end.

                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)

                           Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.

 Slice statistics:

   (slice0)    Executor memory: 332K bytes.

   (slice1)    Executor memory: 446K bytes avg x 64 workers, 4329K bytes max (seg52).  Work_mem: 74K bytes max.

 Statement statistics:

   Memory used: 128000K bytes

 Total runtime: 580.630 ms

(24 rows)

bigdatagp=# explain analyze  select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=3 order by a.id; 

                                                                       QUERY PLAN                                                                        

---------------------------------------------------------------------------------------------------------------------------------------------------------

 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.23..7.24 rows=1 width=14)

   Merge Key: "?column5?"

   Rows out:  5 rows at destination with 24 ms to end, start offset by 701 ms.

   ->  Sort  (cost=7.23..7.24 rows=1 width=14)

         Sort Key: a.id

         Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 6.292 ms to first row, 6.294 ms to end, start offset by 715 ms.

         Executor memory:  70K bytes avg, 74K bytes max (seg0).

         Work_mem used:  70K bytes avg, 74K bytes max (seg0). Workfile: (0 spilling, 0 reused)

         ->  Hash Left Join  (cost=2.04..7.17 rows=1 width=14)

               Hash Cond: a.id = b.id

               Join Filter: a.id >= 3

               Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 4.422 ms to first row, 5.055 ms to end, start offset by 717 ms.

               Executor memory:  1K bytes avg, 1K bytes max (seg42).

               Work_mem used:  1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)

               (seg42)  Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.

               ->  Seq Scan on tgt1 a  (cost=0.00..5.05 rows=1 width=7)

                     Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 0.179 ms to first row, 0.180 ms to end, start offset by 717 ms.

               ->  Hash  (cost=2.02..2.02 rows=1 width=7)

                     Rows in:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.194 ms to end, start offset by 721 ms.

                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)

                           Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.143 ms to first row, 0.145 ms to end, start offset by 721 ms.

 Slice statistics:

   (slice0)    Executor memory: 332K bytes.

   (slice1)    Executor memory: 581K bytes avg x 64 workers, 4353K bytes max (seg42).  Work_mem: 74K bytes max.

 Statement statistics:

   Memory used: 128000K bytes

 Total runtime: 725.316 ms

(27 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=6 order by a.id;  

                                                  QUERY PLAN                                                  

--------------------------------------------------------------------------------------------------------------

 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.17..7.18 rows=1 width=14)

   Merge Key: "?column5?"

   Rows out:  (No row requested) 0 rows at destination with 6.536 ms to end, start offset by 1.097 ms.

   ->  Sort  (cost=7.17..7.18 rows=1 width=14)

         Sort Key: a.id

         Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.

         Executor memory:  33K bytes avg, 33K bytes max (seg0).

         Work_mem used:  33K bytes avg, 33K bytes max (seg0). Workfile: (0 spilling, 0 reused)

         ->  Hash Left Join  (cost=2.04..7.15 rows=1 width=14)

               Hash Cond: a.id = b.id

               Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.

               ->  Seq Scan on tgt1 a  (cost=0.00..5.06 rows=1 width=7)

                     Filter: id >= 6

                     Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.

               ->  Hash  (cost=2.02..2.02 rows=1 width=7)

                     Rows in:  (No row requested) 0 rows (seg0) with 0 ms to end.

                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)

                           Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.

 Slice statistics:

   (slice0)    Executor memory: 332K bytes.

   (slice1)    Executor memory: 225K bytes avg x 64 workers, 225K bytes max (seg0).  Work_mem: 33K bytes max.

 Statement statistics:

   Memory used: 128000K bytes

 Total runtime: 8.615 ms

(24 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;        

                                                                       QUERY PLAN                                                                       

--------------------------------------------------------------------------------------------------------------------------------------------------------

 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.23..7.24 rows=1 width=14)

   Merge Key: "?column5?"

   Rows out:  5 rows at destination with 115 ms to end, start offset by 1.195 ms.

   ->  Sort  (cost=7.23..7.24 rows=1 width=14)

         Sort Key: a.id

         Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 6.979 ms to first row, 6.980 ms to end, start offset by 12 ms.

         Executor memory:  72K bytes avg, 74K bytes max (seg0).

         Work_mem used:  72K bytes avg, 74K bytes max (seg0). Workfile: (0 spilling, 0 reused)

         ->  Hash Left Join  (cost=2.04..7.17 rows=1 width=14)

               Hash Cond: a.id = b.id

               Join Filter: a.id >= 6

               Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 5.570 ms to first row, 6.157 ms to end, start offset by 12 ms.

               Executor memory:  1K bytes avg, 1K bytes max (seg42).

               Work_mem used:  1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)

               (seg42)  Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.

               ->  Seq Scan on tgt1 a  (cost=0.00..5.05 rows=1 width=7)

                     Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 0.050 ms to first row, 0.051 ms to end, start offset by 12 ms.

               ->  Hash  (cost=2.02..2.02 rows=1 width=7)

                     Rows in:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.153 ms to end, start offset by 18 ms.

                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)

                           Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.133 ms to first row, 0.135 ms to end, start offset by 18 ms.

 Slice statistics:

   (slice0)    Executor memory: 332K bytes.

   (slice1)    Executor memory: 583K bytes avg x 64 workers, 4353K bytes max (seg42).  Work_mem: 74K bytes max.

 Statement statistics:

   Memory used: 128000K bytes

 Total runtime: 116.997 ms

(27 rows)

bigdatagp=#  explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where id=6 order by a.id;

ERROR:  column reference "id" is ambiguous

LINE 1: ...* from tgt1 a left join tgt2 b on a.id=b.id where id=6 order...

                                                             ^

bigdatagp=#  explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id=6 order by a.id;

                                             QUERY PLAN                                              

-----------------------------------------------------------------------------------------------------

 Gather Motion 1:1  (slice1; segments: 1)  (cost=7.17..7.18 rows=4 width=14)

   Merge Key: "?column5?"

   Rows out:  (No row requested) 0 rows at destination with 3.212 ms to end, start offset by 339 ms.

   ->  Sort  (cost=7.17..7.18 rows=1 width=14)

         Sort Key: a.id

         Rows out:  (No row requested) 0 rows with 0 ms to end.

         Executor memory:  58K bytes.

         Work_mem used:  58K bytes. Workfile: (0 spilling, 0 reused)

         ->  Hash Left Join  (cost=2.04..7.14 rows=1 width=14)

               Hash Cond: a.id = b.id

               Rows out:  (No row requested) 0 rows with 0 ms to end.

               ->  Seq Scan on tgt1 a  (cost=0.00..5.06 rows=1 width=7)

                     Filter: id = 6

                     Rows out:  (No row requested) 0 rows with 0 ms to end.

               ->  Hash  (cost=2.02..2.02 rows=1 width=7)

                     Rows in:  (No row requested) 0 rows with 0 ms to end.

                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)

                           Filter: id = 6

                           Rows out:  (No row requested) 0 rows with 0 ms to end.

 Slice statistics:

   (slice0)    Executor memory: 252K bytes.

   (slice1)    Executor memory: 251K bytes (seg3).  Work_mem: 58K bytes max.

 Statement statistics:

   Memory used: 128000K bytes

 Total runtime: 342.067 ms

(25 rows)

bigdatagp=#  explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id=6 order by a.id;      

                                                                       QUERY PLAN                                                                       

--------------------------------------------------------------------------------------------------------------------------------------------------------

 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.23..7.24 rows=1 width=14)

   Merge Key: "?column5?"

   Rows out:  5 rows at destination with 435 ms to end, start offset by 1.130 ms.

   ->  Sort  (cost=7.23..7.24 rows=1 width=14)

         Sort Key: a.id

         Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 5.156 ms to first row, 5.158 ms to end, start offset by 7.597 ms.

         Executor memory:  58K bytes avg, 58K bytes max (seg0).

         Work_mem used:  58K bytes avg, 58K bytes max (seg0). Workfile: (0 spilling, 0 reused)

         ->  Hash Left Join  (cost=2.04..7.17 rows=1 width=14)

               Hash Cond: a.id = b.id

               Join Filter: a.id = 6

               Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 4.155 ms to first row, 4.813 ms to end, start offset by 7.930 ms.

               Executor memory:  1K bytes avg, 1K bytes max (seg42).

               Work_mem used:  1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)

               (seg42)  Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.

               ->  Seq Scan on tgt1 a  (cost=0.00..5.05 rows=1 width=7)

                     Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 0.126 ms to first row, 0.127 ms to end, start offset by 7.941 ms.

               ->  Hash  (cost=2.02..2.02 rows=1 width=7)

                     Rows in:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.103 ms to end, start offset by 12 ms.

                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)

                           Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.074 ms to first row, 0.076 ms to end, start offset by 12 ms.

 Slice statistics:

   (slice0)    Executor memory: 332K bytes.

   (slice1)    Executor memory: 569K bytes avg x 64 workers, 4337K bytes max (seg42).  Work_mem: 58K bytes max.

 Statement statistics:

   Memory used: 128000K bytes

 Total runtime: 436.384 ms

(27 rows)
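For a statement that is too slow to run under explain analyze, plain explain gives the same structural hint, because it only plans the statement and does not execute it. A minimal sketch, again using the demo tables from this session:

```sql
-- Predicate in ON: in the plans above this appears as
-- "Join Filter: a.id >= 6" on the Hash Left Join, with no Filter under
-- "Seq Scan on tgt1 a", so every row of tgt1 is scanned and returned.
explain select * from tgt1 a left join tgt2 b on a.id = b.id and a.id >= 6 order by a.id;

-- Predicate in WHERE: in the plans above this appears as
-- "Filter: id >= 6" under "Seq Scan on tgt1 a", so rows are discarded
-- before they reach the join.
explain select * from tgt1 a left join tgt2 b on a.id = b.id where a.id >= 6 order by a.id;
```

Checking whether the condition shows up as a Filter on the scan or as a Join Filter on the join is a quick way to confirm that the query filters the table you meant it to.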

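So, to filter table a, put the condition in the WHERE clause; to filter table b, put the condition in a subquery on b. The ON clause only decides which rows of b are matched to each row of a: a row of a that has no match is still returned, with NULLs in b's columns.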
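The two rewrites can be sketched as follows. The first is the corrected form of the developer's statement. The second uses a hypothetical predicate on b (id >= 2), chosen only for illustration against the demo data.

```sql
-- Filter the left table (a): the predicate goes in WHERE.
select *
from tgt1 a
left join tgt2 b on a.id = b.id
where a.id >= 6
order by a.id;

-- Filter the right table (b): pre-filter it in a subquery, then join.
-- The predicate id >= 2 is a made-up example, not from the original statement.
-- With the demo data, all five rows of tgt1 come back and only a.id = 2
-- has non-NULL values in b's columns.
select *
from tgt1 a
left join (select id, name from tgt2 where id >= 2) b on a.id = b.id
order by a.id;
```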

-EOF-

Time: 2024-08-09 22:03:02
