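Developers who are new to PostgreSQL (PG) almost always run into a few problems.
A common symptom is slow SQL. Don't be too quick to complain when this happens. Most of these cases are not real problems, and a small change to the query or some simple tuning usually fixes them. Below we analyze a few cases in detail.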
1: Slow index scans on Chinese-locale strings
test=# \d testidx
Table "test.testidx"
Column | Type | Modifiers
----------------+-----------------------------+-----------
id | numeric |
table_id | numeric |
description | character varying(4000) |
user_comment | character varying(4000) |
encoding | character varying(64) |
This is a very ordinary table. The database it lives in uses the UTF8 encoding, and its Collate and Ctype are both zh_CN.UTF-8.
To search the description column, we do the usual thing and create a btree index on it.
test=# create index idx_testidx on testidx(description);
CREATE INDEX
We search the column with a LIKE match and find, surprisingly, that the query plan does not use the index.
test=# explain select description from testidx where description like 'test%';
QUERY PLAN
-------------------------------------------------------------
Seq Scan on testidx (cost=0.00..30151.00 rows=64 width=28)
Filter: ((description)::text ~~ 'test%'::text)
(2 rows)
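Not ready to give up, we disable sequential scans to see what is going on. The index is now used, but the filter condition sits outside it. The Bitmap Index Scan has no Index Cond, so it returns every entry in the index (an estimated 1,000,000 rows) and builds a bitmap from them. The Bitmap Heap Scan then applies the filter. Clearly something is wrong.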
test=# set enable_seqscan=off;
SET
test=# explain select description from testidx where description like 'test%';
QUERY PLAN
------------------------------------------------------------------------------------
Bitmap Heap Scan on testidx (cost=29756.57..59907.57 rows=64 width=28)
Filter: ((description)::text ~~ 'test%'::text)
-> Bitmap Index Scan on idx_testidx (cost=0.00..29756.55 rows=1000000 width=0)
(3 rows)
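Actually running the query is not encouraging either. All 1,000,000 rows are fetched from the index, and the filter in the heap scan then discards every one of them.
The statement takes close to half a second (about 407 ms), which is far too slow.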
test=# explain analyze select description from testidx where description like 'test%';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on testidx (cost=29756.57..59907.57 rows=64 width=28) (actual time=407.548..407.548 rows=0 loops=1)
Filter: ((description)::text ~~ 'test%'::text)
Rows Removed by Filter: 1000000
-> Bitmap Index Scan on idx_testidx (cost=0.00..29756.55 rows=1000000 width=0) (actual time=166.581..166.581 rows=1000000 loops=1)
Total runtime: 407.590 ms
(5 rows)
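The reasons are simple:
1 In a zh_CN.UTF-8 database, string columns are compared according to that locale's rules. This applies to every operator on them (>, =, <, ~~ i.e. LIKE, and so on).
2 The index was created without specifying an operator class, so it uses the default one. The default class orders values by the locale's collation rather than by the standard "C" character-by-character comparison, and in a non-C locale such an index cannot be used for LIKE prefix matching.
3 Specifying a suitable operator class when creating the index lets LIKE queries use the index.
4 The operator class is specified the same way for btree, gin, and hash indexes.
5 In PG these rules apply to the char, varchar, and text data types.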
The PostgreSQL documentation explains:
The operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the types text, varchar, and char respectively. The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard "C" locale. As an example, you might index a varchar column like this:
CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
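Why can't an index ordered by a locale's collation serve LIKE 'test%' as a range scan? A locale's sort order does not have to keep all strings that share a prefix next to each other. The Python sketch below illustrates this with a deliberately toy collation, `toy_key`. It is hypothetical and is not zh_CN.UTF-8 or any real locale: the digraph "ch" sorts as a single letter after "h", similar in spirit to Czech collation. Under that ordering, a word that matches LIKE 'c%' falls outside the range ['c', 'd'). Under plain character order, which is the kind of comparison the *_pattern_ops classes use, the same word falls inside that range. The helper names are illustrative assumptions, not PostgreSQL APIs.

```python
def toy_key(s: str) -> list:
    """Hypothetical collation key: 'ch' counts as one letter sorted just after 'h'."""
    key = []
    i = 0
    while i < len(s):
        if s.startswith("ch", i):
            key.append(ord("h") + 0.5)  # weight between 'h' and 'i'
            i += 2
        else:
            key.append(float(ord(s[i])))
            i += 1
    return key


def in_range_by_code_point(value: str, low: str, high: str) -> bool:
    # Character-by-character comparison. Python compares str by code point,
    # which gives the same order as comparing UTF-8 bytes.
    return low <= value < high


def in_range_by_toy_collation(value: str, low: str, high: str) -> bool:
    # Locale-style comparison using the hypothetical toy collation.
    return toy_key(low) <= toy_key(value) < toy_key(high)


word = "chleba"
print(word.startswith("c"))                       # True: matches LIKE 'c%'
print(in_range_by_code_point(word, "c", "d"))     # True: inside ['c', 'd')
print(in_range_by_toy_collation(word, "c", "d"))  # False: "ch" sorts after "h"
```

If an index were ordered by `toy_key`, a scan of the range ['c', 'd') would silently miss "chleba". A planner therefore cannot safely rewrite the prefix match as a range on such an index unless it knows the ordering is character by character.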
Here is the CREATE INDEX syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]
collation
The name of the collation to use for the index. By default, the index uses the collation declared for the column to be indexed or the result collation of the expression to be indexed. Indexes with non-default collations can be useful for queries that involve expressions using non-default collations.
So we use the opclass part of this syntax to create an index that the LIKE query can use.
test=# create index idx_testidx2 on testidx(description varchar_pattern_ops);
CREATE INDEX
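Now look at the result. An Index Cond appears: the planner uses an index-only scan with the condition applied inside the index, and it estimates 64 matching rows.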
test=# explain select description from testidx where description like 'test%';
QUERY PLAN
------------------------------------------------------------------------------------
Index Only Scan using idx_testidx2 on testidx (cost=0.55..8.57 rows=64 width=28)
Index Cond: ((description ~>=~ 'test'::text) AND (description ~<~ 'tesu'::text))
Filter: ((description)::text ~~ 'test%'::text)
(3 rows)
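The Index Cond in this plan shows how the prefix match is rewritten. LIKE 'test%' becomes description ~>=~ 'test' AND description ~<~ 'tesu', where the upper bound is the prefix with its last character incremented. The Filter line then rechecks the full pattern. The Python sketch below mimics that idea, using a sorted list in character order as a stand-in for the varchar_pattern_ops index. `like_prefix_range`, `like_match`, and `index_range_scan` are hypothetical helpers. The bound computation is simplified compared with PostgreSQL's: it ignores LIKE escape characters and the case where the last character cannot be incremented.

```python
from __future__ import annotations

import bisect
import re


def like_prefix_range(pattern: str) -> tuple[str, str] | None:
    """Turn a LIKE pattern's literal prefix into [low, high) bounds (simplified)."""
    prefix = ""
    for ch in pattern:
        if ch in "%_\\":  # stop at the first wildcard (escapes are not handled)
            break
        prefix += ch
    if not prefix:
        return None  # no fixed prefix: a range scan cannot help
    # Upper bound: increment the last character, e.g. 'test' -> 'tesu'.
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, high


def like_match(value: str, pattern: str) -> bool:
    """Recheck a value against a LIKE pattern (% = any run, _ = one character)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def index_range_scan(sorted_values: list[str], pattern: str) -> list[str]:
    """Read only the slice of the sorted 'index' between the bounds, then filter."""
    bounds = like_prefix_range(pattern)
    if bounds is None:
        candidates = sorted_values  # no usable prefix: falls back to a full scan
    else:
        low, high = bounds
        start = bisect.bisect_left(sorted_values, low)
        end = bisect.bisect_left(sorted_values, high)
        candidates = sorted_values[start:end]
    # Like the plan's "Filter:" line, recheck the full pattern on each candidate.
    return [v for v in candidates if like_match(v, pattern)]


index = sorted(["alpha", "tesla", "test", "test-1", "testing", "tesu", "zeta"])
print(like_prefix_range("test%"))        # ('test', 'tesu')
print(index_range_scan(index, "test%"))  # ['test', 'test-1', 'testing']
print(index_range_scan(index, "te_t%"))  # ['test', 'test-1', 'testing']
```

For 'te_t%' only the prefix 'te' can become a range, ['te', 'tf'), so candidates such as "tesla" and "tesu" are read and then rejected by the recheck. The recheck is why the Filter line stays in the plan even when the index narrows the scan.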
Running the query confirms that the index works: the total runtime drops from about 407 ms to 0.105 ms.
test=# explain analyze select description from testidx where description like 'test%';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Index Only Scan using idx_testidx2 on testidx (cost=0.55..8.57 rows=64 width=28) (actual time=0.081..0.081 rows=0 loops=1)
Index Cond: ((description ~>=~ 'test'::text) AND (description ~<~ 'tesu'::text))
Filter: ((description)::text ~~ 'test%'::text)
Heap Fetches: 0
Total runtime: 0.105 ms
(5 rows)