原文出自:
问题:
大部分人都知道不要在where子句中调用函数,这样会影响你的性能。但是如果在SELECT中使用呢?本文将尝试移除select中的函数调用能戏剧性地增强性能,特别在返回大数据量时。
解决方案:
示例表和函数:
在此例中,我们将创建两个示例表和两个访问这些表的函数。至于填充表,你将不得不使用一种工具,如Visual
Studio来填充他们以提供一些合理的真实数据。在本例中,将对每个表填充20万数据。其中一件需要注意的事是,这些示例函数只返回每个买家的一笔记录。几时存在多个买家。
下面是代码:
-- Table creationlogic
CREATE TABLE[dbo].[CarSale](
[CarSaleID] [int] IDENTITY(1,1) NOT NULL,
[PurchaseDate] [smalldatetime] NOT NULL,
CONSTRAINT [PK_CarSale] PRIMARY KEYCLUSTERED ([CarSaleID] ASC)
);
CREATE TABLE[dbo].[Buyer](
[BuyerID] [int] IDENTITY(1,1) NOT NULL,
[CarSaleID] [int] NOT NULL,
[LastName] [varchar](50) NULL,
[FirstName] [varchar](100) NULL,
[CompanyName] [varchar](200) NULL,
CONSTRAINT [PK_Buyer] PRIMARY KEY NONCLUSTERED([BuyerID] ASC)
);
ALTER TABLE[dbo].[Buyer] WITH CHECK ADD CONSTRAINT[FK_Buyer_CarSale] FOREIGN KEY([CarSaleID])
REFERENCES[dbo].[CarSale] ([CarSaleID]) ON UPDATE CASCADE ON DELETE CASCADE;
CREATE CLUSTEREDINDEX [IX_Buyer_CarSalelID] ON [dbo].[Buyer]([CarSaleID] ASC);
-- Function creationlogic
CREATE FUNCTION[dbo].[fnGetBuyerFirstName]
(@CarSaleID INT)
RETURNS VARCHAR (500)
AS
BEGIN
RETURN (SELECT Top 1FirstName
FROM Buyer
WHERE [email protected]
ORDER BY BuyerID)
END
GO
CREATE FUNCTION[dbo].[fnGetBuyerLastName]
(@CarSaleID INT)
RETURNS VARCHAR (500)
AS
BEGIN
RETURN (SELECT Top 1coalesce(LastName,CompanyName)
FROM Buyer
WHERE [email protected]
ORDER BY BuyerID)
END
GO
原始查询:
SELECT cs.PurchaseDate,
dbo.fnGetBuyerFirstName(cs.CarSaleID),
dbo.fnGetBuyerLastName(cs.CarSaleID)
FROM CarSale cs
ORDER BY CarSaleID;
从上面代码中可以看出,每条记录都调用一次函数。并且查询了Buyer表两次。当CarSale表有大量数据时,这种做法并不高效。执行计划如下:
即使我们使用where子句限制查询并只查询一条数据,通过查看执行计划,如下,可以看到,依旧要对Buyer表做两次搜索。
修改后的查询:
SELECT cs.PurchaseDate,
dbo.fnGetBuyerFirstName(cs.CarSaleID),
dbo.fnGetBuyerLastName(cs.CarSaleID)
FROM CarSale cs
WHERE CarSaleID=5
ORDER BY CarSaleID;
值得注意的是,在这个例子中,只返回了一条记录。一下带有更广where条件从而返回更多数据的查询会变得越来越慢。
去除函数的例子:
现在移除select中的函数调用,并使用表关联来实现同样结果,其中一个是使用了where子句,另外一个没有限制:
SELECT cs.PurchaseDate,FirstName,LastName
FROM CarSale cs
INNER JOIN (SELECT CarSaleID,MIN(BuyerID) AS SingleBuyerID FROM Buyer GROUP BY CarSaleID) m2
ON cs.CarSaleID=m2.CarSaleID
INNER JOIN Buyer m ON m2.CarSaleID=cs.CarSaleID AND m2.SingleBuyerID=m.BuyerID
ORDER BY cs.CarSaleID;
SELECT cs.PurchaseDate,FirstName,LastName
FROM CarSale cs
INNER JOIN (SELECT CarSaleID,MIN(BuyerID) AS SingleBuyerID FROM Buyer GROUP BY CarSaleID) m2
ON cs.CarSaleID=m2.CarSaleID
INNER JOIN Buyer m ON m2.CarSaleID=cs.CarSaleID AND m2.SingleBuyerID=m.BuyerID
WHERE cs.CarSaleID=5
ORDER BY cs.CarSaleID;
通过查看执行计划,可以得出不用函数以后,不再需要每条记录都去重新查找。这是通过merge join来处理的。
为了确认这点,我们看看刚才去掉了函数之后的查询,通过sql Profiler的跟踪,可以得到多大的性能提升:
Query |
WHERE Clause |
CPU (ms) |
Reads |
Writes |
Duration |
Original |
NO |
10734 |
1239655 |
0 |
25879 |
YES |
0 |
9 |
0 |
0 |
|
No Function Call |
NO |
578 |
16337 |
0 |
2457 |
YES |
0 |
11 |
0 |
0 |
通过上面的结果可以看出,当返回的结果很大时,能从中得到相当大的好处,包括CPU、逻辑读、持续时间等。当只返回一个结果时,性能更好。
最终版本,使用CTE:
因为在本例中,使用函数来返回单独的买家,所以可以使用CTE来取得进一步的性能:
WITH summary AS (SELECT CarSaleID,
BuyerID,
FirstName,
LastName,
ROW_NUMBER() OVER(PARTITION BY CarSaleID ORDER BY CarSaleID) AS rk
FROM Buyer)
SELECT PurchaseDate,s.FirstName,s.LastName FROM CarSale cs INNER JOIN summary s
ON s.CarSaleID=cs.CarSaleID WHERE s.rk = 1;
WITH summary AS (SELECT CarSaleID,
BuyerID,
FirstName,
LastName,
ROW_NUMBER() OVER(PARTITION BY CarSaleID ORDER BY CarSaleID) AS rk
FROM Buyer)
SELECT PurchaseDate,s.FirstName,s.LastName FROM CarSale cs INNER JOIN summary s
ON s.CarSaleID=cs.CarSaleID WHERE s.rk = 1 AND cs.CarSaleID=5;
通过执行计划和sqlprofiler对比得到:
Query |
WHERE Clause |
CPU (ms) |
Reads |
Writes |
Duration |
No Function Call add WITH statement |
NO |
266 |
15796 |
0 |
1931 |
YES |
0 |
6 |
0 |
0 |
总结:
我同意第一种方式容易实现并容易阅读,但是对性能提升来说,性能上的提升比代码量更重要。