Choosing an Index Type in PostgreSQL

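When you add an index in PostgreSQL to speed up full-text search or fuzzy queries, there are generally two options: GiST and GIN. The official documentation gives the following guidance: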

There are substantial performance differences between the two index types, so it is important to understand their characteristics.

A GiST index is lossy, meaning that the index may produce false matches, and it is necessary to check the actual table row to eliminate such false matches. (PostgreSQL does this automatically when needed.) GiST indexes are lossy because each document is represented in the index by a fixed-length signature. The signature is generated by hashing each word into a single bit in an n-bit string, with all these bits OR-ed together to produce an n-bit document signature. When two words hash to the same bit position there will be a false match. If all words in the query have matches (real or false) then the table row must be retrieved to see if the match is correct.
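The signature scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not PostgreSQL's actual code: the 64-bit width, the use of SHA-256 as the hash, and the function names are all assumptions made for the example.

```python
import hashlib

SIGNATURE_BITS = 64  # hypothetical width, chosen only for this sketch


def word_bit(word, nbits=SIGNATURE_BITS):
    """Hash a word to a single bit position in an n-bit string."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nbits


def signature(words, nbits=SIGNATURE_BITS):
    """OR the bits of all words together to form the document signature."""
    sig = 0
    for w in words:
        sig |= 1 << word_bit(w, nbits)
    return sig


def may_match(doc_sig, query_words, nbits=SIGNATURE_BITS):
    """True if every query word's bit is set: a real OR a false match."""
    q = signature(query_words, nbits)
    return (doc_sig & q) == q


def matches(doc_words, query_words, nbits=SIGNATURE_BITS):
    """Check the signature first, then recheck the actual row contents."""
    if not may_match(signature(doc_words, nbits), query_words, nbits):
        return False
    return set(query_words) <= set(doc_words)


doc = ["fat", "cat", "sat"]
# With a 1-bit signature every word collides, so the signature alone
# cannot rule out "dog"; only the recheck of the row does.
print(may_match(signature(doc, 1), ["dog"], 1))  # True (a false match)
print(matches(doc, ["dog"], 1))                  # False after the recheck
```

The signature can never miss a word that is really present, so it only errs in one direction. That is why a recheck of the table row is enough to make the result correct.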

Lossiness causes performance degradation due to unnecessary fetches of table records that turn out to be false matches. Since random access to table records is slow, this limits the usefulness of GiST indexes. The likelihood of false matches depends on several factors, in particular the number of unique words, so using dictionaries to reduce this number is recommended.
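To see why the number of unique words matters, the sketch below measures how full a signature becomes as more distinct words are hashed into it. Assuming the hash spreads words evenly, the fraction of bits set is roughly the chance that a single absent word produces a false match. The 64-bit width, the SHA-256 hash and the synthetic words `word0`, `word1`, ... are assumptions made for the illustration, not PostgreSQL internals.

```python
import hashlib


def word_bit(word, nbits):
    """Hash a word to a single bit position in an n-bit string."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nbits


def fill_ratio(num_unique_words, nbits=64):
    """Fraction of signature bits set by a document with this many distinct words."""
    sig = 0
    for i in range(num_unique_words):
        sig |= 1 << word_bit(f"word{i}", nbits)
    return bin(sig).count("1") / nbits


# The ratio can only grow as more unique words are added.
for n in (1, 8, 32, 128, 1024):
    print(n, fill_ratio(n))
```

Once most bits are set, nearly every lookup passes the signature test and has to be resolved by fetching the row, which is the degradation the documentation warns about.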

GIN indexes are not lossy for standard queries, but their performance depends logarithmically on the number of unique words. (However, GIN indexes store only the words (lexemes) of tsvector values, and not their weight labels. Thus a table row recheck is needed when using a query that involves weights.)
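GIN stands for Generalized Inverted Index: it maps each lexeme to the rows that contain it. The toy Python model below is a rough sketch of that idea under loud assumptions. The sample rows, the function names, and the sorted key list with binary search (standing in for the logarithmic lookup over unique words) are all invented for the example and do not reflect PostgreSQL's on-disk format.

```python
from bisect import bisect_left

# Hypothetical table: row id -> list of (lexeme, weight label) pairs
rows = {
    1: [("cat", "A"), ("fat", "D")],
    2: [("cat", "D"), ("rat", "D")],
}


def build_index(rows):
    """Build lexeme -> row ids, dropping the weight labels."""
    postings = {}
    for row_id, entries in rows.items():
        for lexeme, _weight in entries:  # the weight is not stored
            postings.setdefault(lexeme, set()).add(row_id)
    keys = sorted(postings)
    return keys, [postings[k] for k in keys]


def lookup(index, lexeme):
    """Exact lookup via binary search over the sorted unique lexemes."""
    keys, posting_lists = index
    i = bisect_left(keys, lexeme)
    if i < len(keys) and keys[i] == lexeme:
        return set(posting_lists[i])
    return set()


def lookup_weighted(index, rows, lexeme, weight):
    """The index has no weights, so each candidate row must be rechecked."""
    candidates = lookup(index, lexeme)
    return {r for r in candidates if (lexeme, weight) in rows[r]}


index = build_index(rows)
print(lookup(index, "cat"))                      # {1, 2}
print(lookup_weighted(index, rows, "cat", "A"))  # {1}
```

A plain lexeme lookup is answered from the index alone, with no false matches. Only the weighted query has to go back to the rows, which mirrors the exception noted in the parentheses above.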

In choosing which index type to use, GiST or GIN, consider these performance differences:

GIN index lookups are about three times faster than GiST

GIN indexes take about three times longer to build than GiST

GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled (see Section 54.3.1 for details)

GIN indexes are two-to-three times larger than GiST indexes

As a rule of thumb, GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update. Specifically, GiST indexes are very good for dynamic data and fast if the number of unique words (lexemes) is under 100,000, while GIN indexes will handle 100,000+ lexemes better but are slower to update.

Note that GIN index build time can often be improved by increasing maintenance_work_mem, while GiST index build time is not sensitive to that parameter.

Reference: http://www.postgresql.org/docs/9.2/static/textsearch-indexes.html

Date: 2024-07-28 20:02:16
