INNER JOIN vs. CROSS APPLY

Source: http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/

INNER JOIN is the most used construct in SQL: it joins two tables together, selecting only those row combinations for which a JOIN condition is true.

This query:

SELECT  *
FROM    table1
JOIN    table2
ON      table2.b = table1.a

reads:

For each row from table1, select all rows from table2 where the value of field b is equal to that of field a

Note that this query can be rewritten like this:

SELECT  *
FROM    table1, table2
WHERE   table2.b = table1.a

  

In this form, it reads as follows:

Make a set of all possible combinations of rows from table1 and table2 and of this set select only those rows where the value of field b is equal to that of field a

These queries are worded differently, but they yield the same result, and database systems are aware of that. Usually both queries are optimized to use the same execution plan.
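The equivalence of the two forms is easy to check on a toy data set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the sample values in table1 and table2 are made up for the demonstration.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient stand-in
# for SQL Server. The sample values are invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER);
    CREATE TABLE table2 (b INTEGER);
    INSERT INTO table1 (a) VALUES (1), (2), (3);
    INSERT INTO table2 (b) VALUES (2), (3), (3), (4);
""")

# Explicit JOIN syntax.
join_rows = conn.execute("""
    SELECT  *
    FROM    table1
    JOIN    table2
    ON      table2.b = table1.a
    ORDER BY table1.a, table2.b
""").fetchall()

# Comma syntax with the join condition moved into WHERE.
where_rows = conn.execute("""
    SELECT  *
    FROM    table1, table2
    WHERE   table2.b = table1.a
    ORDER BY table1.a, table2.b
""").fetchall()

print(join_rows)
print(where_rows)
```

Both statements return the same three row combinations. The ORDER BY is there only to make the two result lists directly comparable.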

The former is the explicit JOIN syntax introduced in ANSI SQL-92, usually just called ANSI syntax. It is generally considered more readable and is the recommended one.

However, Oracle did not support it until version 9i, which is why many hardcore Oracle developers are simply used to the latter syntax.

Actually, it's a matter of taste.

To use JOINs (with whatever syntax), both sets you are joining must be self-sufficient, i.e. the sets should not depend on each other. You can query either set without knowing anything about the contents of the other.

But for some tasks the sets are not self-sufficient. For instance, let's consider the following task:

We have two tables, table1 and table2. table1 has a column called rowcount.

For each row from table1 we need to select the first rowcount rows from table2, ordered by table2.id.

We cannot formulate a join condition here. The join condition, should it exist, would involve the row number, which is not present in table2, and there is no way to calculate a row number only from the values of the columns of any given row in table2.

That's where CROSS APPLY can be used.

CROSS APPLY is Microsoft's extension to SQL, which was originally intended to be used with table-valued functions (TVFs).

The query above would look like this:

SELECT  *
FROM    table1
CROSS APPLY
(
SELECT  TOP (table1.rowcount) *
FROM    table2
ORDER BY
id
) t2

  

It reads: for each row from table1, select the first table1.rowcount rows from table2, ordered by id.

The sets here are not self-sufficient: the query uses values from table1 to define the second set, not to JOIN with it.

The exact contents of t2 are not known until the corresponding row from table1 is selected.
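To make the "for each row" reading concrete, here is a minimal Python sketch of what the CROSS APPLY query computes. It models the result only, not how SQL Server executes the query. The function name cross_apply_top and the sample rows are hypothetical, chosen just for this illustration.

```python
def cross_apply_top(table1, table2):
    """Emulate CROSS APPLY (SELECT TOP (table1.rowcount) * FROM table2 ORDER BY id)."""
    for t1 in table1:
        # The right-hand set is built only now, from the current table1 row.
        t2_rows = sorted(table2, key=lambda row: row["id"])[: t1["rowcount"]]
        for t2 in t2_rows:
            yield (t1["id"], t1["rowcount"], t2["id"], t2["value"])


# Hypothetical sample data, not taken from the article.
table1 = [{"id": 1, "rowcount": 2}, {"id": 2, "rowcount": 1}]
table2 = [{"id": i, "value": f"Value {i}"} for i in (3, 1, 2)]

rows = list(cross_apply_top(table1, table2))
for row in rows:
    print(row)
```

The inner set cannot be computed before the outer loop reaches a row, which is exactly the dependency a plain join condition cannot express.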

I previously said that there is no way to join these two sets, which is true as long as we consider the sets as is. However, we can change the second set a little so that we get an additional computed field we can later join on.

The first option is simply to count all preceding rows in a subquery:

SELECT  *
FROM    table1 t1
JOIN    (
SELECT  t2o.*,
(
SELECT  COUNT(*)
FROM    table2 t2i
WHERE   t2i.id <= t2o.id
) AS rn
FROM    table2 t2o
) t2
ON      t2.rn <= t1.rowcount

  

The second option is to use a window function, also available in SQL Server since version 2005:

SELECT  *
FROM    table1 t1
JOIN    (
SELECT  t2o.*, ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM    table2 t2o
) t2
ON      t2.rn <= t1.rowcount

  

This function returns the ordinal number a row would have if the ORDER BY clause used in the function were applied to the whole query.

This is essentially the same result as the subquery used in the previous query.
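The claim that both ways of computing rn agree can be checked on a small table. The following sketch again uses Python's sqlite3 module as a stand-in for SQL Server. It assumes the bundled SQLite is version 3.25 or later (needed for window functions), and the four sample rows are invented; the ids are deliberately not sequential, to show that rn is a position and not the id itself.

```python
import sqlite3

# SQLite stand-in for SQL Server; window functions require SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, value TEXT NOT NULL);
    INSERT INTO table2 (id, value) VALUES (10, 'a'), (20, 'b'), (30, 'c'), (40, 'd');
""")

# Option 1: count the rows with an id not greater than the current one.
count_rn = conn.execute("""
    SELECT  t2o.id,
            (
            SELECT  COUNT(*)
            FROM    table2 t2i
            WHERE   t2i.id <= t2o.id
            ) AS rn
    FROM    table2 t2o
    ORDER BY t2o.id
""").fetchall()

# Option 2: let the window function number the rows.
window_rn = conn.execute("""
    SELECT  id, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM    table2
    ORDER BY id
""").fetchall()

print(count_rn)
print(window_rn)
```

The two lists are identical here because id is unique. With duplicate values in the ordering column, the COUNT(*) version would assign the same number to tied rows, while ROW_NUMBER() would not.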

Now, let's create the sample tables and check all these solutions for efficiency:

SET NOCOUNT ON
GO
DROP TABLE [20090716_cross].table1
DROP TABLE [20090716_cross].table2
DROP SCHEMA [20090716_cross]
GO
CREATE SCHEMA [20090716_cross]
CREATE TABLE table1
(
id INT NOT NULL PRIMARY KEY,
row_count INT NOT NULL
)
CREATE TABLE table2
(
id INT NOT NULL PRIMARY KEY,
value VARCHAR(20) NOT NULL
)
GO
BEGIN TRANSACTION
DECLARE @cnt INT
SET @cnt = 1
WHILE @cnt <= 100000
BEGIN
INSERT
INTO    [20090716_cross].table2 (id, value)
VALUES  (@cnt, 'Value ' + CAST(@cnt AS VARCHAR))
SET @cnt = @cnt + 1
END
INSERT
INTO    [20090716_cross].table1 (id, row_count)
SELECT  TOP 5
id, id % 2 + 1
FROM    [20090716_cross].table2
ORDER BY
id
COMMIT
GO

  

table2 contains 100,000 rows with sequential ids.

table1 contains the following:

id row_count
1 2
2 1
3 2
4 1
5 2

Now let's run the first query (with COUNT):

SELECT  *
FROM    [20090716_cross].table1 t1
JOIN    (
SELECT  t2o.*,
(
SELECT  COUNT(*)
FROM    [20090716_cross].table2 t2i
WHERE   t2i.id <= t2o.id
) AS rn
FROM    [20090716_cross].table2 t2o
) t2
ON      t2.rn <= t1.row_count
ORDER BY
t1.id, t2.id

  

id row_count id value rn
1 2 1 Value 1 1
1 2 2 Value 2 2
2 1 1 Value 1 1
3 2 1 Value 1 1
3 2 2 Value 2 2
4 1 1 Value 1 1
5 2 1 Value 1 1
5 2 2 Value 2 2
8 rows fetched in 0.0000s (498.4063s)
Table 'table1'. Scan count 2, logical reads 200002, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 100000, logical reads 8389920, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table2'. Scan count 4, logical reads 1077, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 947655 ms,  elapsed time = 498385 ms.

This query, as expected, is very inefficient: it runs for almost 500 seconds.

Here's the query plan:

SELECT
  Sort
    Compute Scalar
      Parallelism (Gather Streams)
        Inner Join (Nested Loops)
          Inner Join (Nested Loops)
            Clustered Index Scan ([20090716_cross].[table2])
            Compute Scalar
              Stream Aggregate
                Eager Spool
                  Clustered Index Scan ([20090716_cross].[table2])
          Clustered Index Scan ([20090716_cross].[table1])

For each row selected from table2, it counts all previous rows again and again, never recording the intermediate result. The complexity of such an algorithm is O(n^2), which is why it takes so long.
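A rough back-of-the-envelope model shows what O(n^2) means at this table size. The sketch below is an idealized count of rows examined, not a measurement of the actual plan, and both function names are hypothetical.

```python
def rows_visited_by_recount(n):
    """Rows examined if the count for row i re-reads rows 1..i from scratch."""
    return sum(range(1, n + 1))  # 1 + 2 + ... + n = n * (n + 1) / 2


def rows_visited_by_running_counter(n):
    """Rows examined if a single ordered pass keeps a running counter."""
    return n


n = 100_000  # number of rows in table2
print(rows_visited_by_recount(n))          # 5000050000
print(rows_visited_by_running_counter(n))  # 100000
```

Under this simplified model, recounting touches about five billion rows where a running counter would touch one hundred thousand, a factor of roughly 50,000.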

Let's run the second query, which uses ROW_NUMBER():

SELECT  *
FROM    [20090716_cross].table1 t1
JOIN    (
SELECT  t2o.*, ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM    [20090716_cross].table2 t2o
) t2
ON      t2.rn <= t1.row_count
ORDER BY
t1.id, t2.id

  

id row_count id value rn
1 2 1 Value 1 1
1 2 2 Value 2 2
2 1 1 Value 1 1
3 2 1 Value 1 1
3 2 2 Value 2 2
4 1 1 Value 1 1
5 2 1 Value 1 1
5 2 2 Value 2 2
8 rows fetched in 0.0006s (0.5781s)
Table 'Worktable'. Scan count 1, logical reads 214093, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table2'. Scan count 1, logical reads 522, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table1'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 578 ms,  elapsed time = 579 ms.

This is much faster: only about 0.58 seconds.

Let's look into the query plan:

SELECT
  Inner Join (Nested Loops)
    Clustered Index Scan ([20090716_cross].[table1])
    Lazy Spool
      Sequence Project (Compute Scalar)
        Compute Scalar
          Segment
            Clustered Index Scan ([20090716_cross].[table2])

This is much better, since this query plan keeps the intermediate results while calculating the ROW_NUMBER.

However, it still calculates ROW_NUMBERs for all 100,000 rows in table2, then puts them into a temporary index over rn created by Lazy Spool, and uses this index in a nested loop to select the range of rn values for each row from table1.

Calculating and indexing all ROW_NUMBERs is quite expensive, which is why we see 214,093 logical reads in the query statistics.

Finally, let's try a CROSS APPLY:

SELECT  *
FROM    [20090716_cross].table1 t1
CROSS APPLY
(
SELECT  TOP (t1.row_count) *
FROM    [20090716_cross].table2
ORDER BY
id
) t2
ORDER BY
t1.id, t2.id

  

id row_count id value
1 2 1 Value 1
1 2 2 Value 2
2 1 1 Value 1
3 2 1 Value 1
3 2 2 Value 2
4 1 1 Value 1
5 2 1 Value 1
5 2 2 Value 2
8 rows fetched in 0.0004s (0.0008s)
Table 'table2'. Scan count 5, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table1'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

This query is instant, as it should be.

The plan is quite simple:

SELECT
  Inner Join (Nested Loops)
    Clustered Index Scan ([20090716_cross].[table1])
    Top
      Clustered Index Scan ([20090716_cross].[table2])

For each row from table1, it just takes the first row_count rows from table2. So simple and so fast.
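The effect of the Top operator can be imitated with a lazy scan that is abandoned as soon as enough rows have been produced. This is a loose Python analogy, not a model of SQL Server internals: scan_table2 and the stats counter are hypothetical helpers, while the table1 contents are the five rows listed earlier.

```python
from itertools import islice

# (id, row_count) pairs, as listed for table1 earlier in the article.
table1 = [(1, 2), (2, 1), (3, 2), (4, 1), (5, 2)]

stats = {"rows_read": 0}


def scan_table2():
    """Hypothetical ordered scan of table2 that counts every row it hands out."""
    for t2_id in range(1, 100_001):
        stats["rows_read"] += 1
        yield (t2_id, f"Value {t2_id}")


result = []
for t1_id, row_count in table1:
    # islice plays the role of Top: it stops pulling rows after row_count of them.
    for t2_id, value in islice(scan_table2(), row_count):
        result.append((t1_id, row_count, t2_id, value))

print(len(result))         # 8
print(stats["rows_read"])  # 8
```

In this analogy only 2 + 1 + 2 + 1 + 2 = 8 rows of the 100,000 are ever read, one short scan per row of table1, which matches the scan count of 5 reported for table2 above.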

Summary:

While most queries which employ CROSS APPLY can be rewritten using an INNER JOIN, CROSS APPLY can yield a better execution plan and better performance, since it can limit the set being joined before the join occurs.
