PostgreSQL index types and index bloating

An index is created on a table column with the CREATE INDEX command. For example:
warehouse_db=# create table item (item_id integer not null, item_name text, item_price numeric, item_data text);
CREATE TABLE
warehouse_db=# create index item_idx on item(item_id);
CREATE INDEX

warehouse_db=# \di item_idx
List of relations
Schema | Name | Type | Owner | Table
--------+----------+-------+----------+-------
public | item_idx | index | postgres | item
(1 row)
warehouse_db=# \h create index
Command: CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]
warehouse_db=# \di
List of relations
Schema | Name | Type | Owner | Table
--------+--------------------------------+-------+----------+---------------
public | PRIM_KEY | index | postgres | warehouse_tb1
public | PRM_KEY | index | postgres | history
public | cards_card_id_owner_number_key | index | postgres | cards
public | item_idx | index | postgres | item
public | item_item_id_idx | index | postgres | item
public | movies_title_copies_excl | index | postgres | movies
public | tools_pkey | index | postgres | tools
(7 rows)

warehouse_db=# \di item_item_id_idx
List of relations
Schema | Name | Type | Owner | Table
--------+------------------+-------+----------+-------
public | item_item_id_idx | index | postgres | item
(1 row)

warehouse_db=# drop index item_item_id_idx ;
DROP INDEX
For the complete CREATE INDEX syntax, see http://www.postgresql.org/docs/9.4/static/sql-createindex.html
Types of indexes

Single-column and multicolumn indexes
A single-column index is created on one column:
create index index_name on table_name(column);
warehouse_db=# create index item_single_index on item (item_id);
CREATE INDEX
A multicolumn index covers more than one column:
warehouse_db=# create index item_multi_index on item (item_id, item_price);
CREATE INDEX

The partial index
A partial index is created on a subset of a table, defined by a condition:

CREATE INDEX index_name ON table_name (column) WHERE (condition);

warehouse_db=# CREATE INDEX item_partial_index ON item (item_id) WHERE
(item_id < 106);
warehouse_db=# \d item;
Table "item"
Column | Type | Modifiers
------------+--------------------+-----------
item_id | integer | not null
item_name | text |
item_price | numeric |
item_data | text |
Indexes:
"item_index" btree (item_id)
"item_multi_index" btree (item_id, item_price)
"item_partial_index" btree (item_id) WHERE item_id < 106

The unique index
A unique index can be created on any column; it not only creates an index, but also
enforces uniqueness of the column.

warehouse_db=# CREATE UNIQUE INDEX item_unique_idx ON item (item_id);
CREATE INDEX
Time: 485.644 ms
warehouse_db=# \di item_unique_idx
List of relations
Schema | Name | Type | Owner | Table
--------+-----------------+-------+----------+-------
public | item_unique_idx | index | postgres | item
(1 row)

A unique index can be created explicitly with the CREATE UNIQUE INDEX command, or implicitly by declaring a primary key or a unique constraint on a table. Here is the implicit case, declaring a primary key (the item table from above is dropped and recreated first):
warehouse_db=# create table item
warehouse_db-# (
warehouse_db(# item_unique integer primary key,
warehouse_db(# item_name text,
warehouse_db(# item_price numeric,
warehouse_db(# item_data text);
CREATE TABLE
warehouse_db=# \d item
Table "public.item"
Column | Type | Modifiers
-------------+---------+-----------
item_unique | integer | not null
item_name | text |
item_price | numeric |
item_data | text |
Indexes:
"item_pkey" PRIMARY KEY, btree (item_unique), tablespace "tbs_yl"
Tablespace: "tbs_yl"

Here is an example of the implicit creation of a unique index by defining a unique
constraint:
warehouse_db=# alter table item add constraint primary_key unique(item_unique);
ALTER TABLE
warehouse_db=# \d item;
Table "public.item"
Column | Type | Modifiers
-------------+---------+-----------
item_unique | integer | not null
item_name | text |
item_price | numeric |
item_data | text |
Indexes:
"item_pkey" PRIMARY KEY, btree (item_unique), tablespace "tbs_yl"
"primary_key" UNIQUE CONSTRAINT, btree (item_unique), tablespace "tbs_yl"
Tablespace: "tbs_yl"
The ALTER command adds a unique constraint to the item_unique column, and the index it
implicitly creates (primary_key in the output above) enforces uniqueness just like a primary key index.

We can also create a unique index explicitly using the already discussed CREATE UNIQUE INDEX
command, as follows (the table is again dropped and recreated first):
warehouse_db=# create table item(
warehouse_db(# item_id integer primary key,
warehouse_db(# item_name text,
warehouse_db(# item_price numeric,
warehouse_db(# item_data text);
CREATE TABLE
warehouse_db=#
warehouse_db=# \d item
Table "public.item"
Column | Type | Modifiers
------------+---------+-----------
item_id | integer | not null
item_name | text |
item_price | numeric |
item_data | text |
Indexes:
"item_pkey" PRIMARY KEY, btree (item_id), tablespace "tbs_yl"
Tablespace: "tbs_yl"
warehouse_db=# create unique index idx_unique_id on item(item_id);
CREATE INDEX
warehouse_db=# \d item;
Table "public.item"
Column | Type | Modifiers
------------+---------+-----------
item_id | integer | not null
item_name | text |
item_price | numeric |
item_data | text |
Indexes:
"item_pkey" PRIMARY KEY, btree (item_id), tablespace "tbs_yl"
"idx_unique_id" UNIQUE, btree (item_id), tablespace "tbs_yl"
Tablespace: "tbs_yl"

warehouse_db=# insert into item values (1, 'boxing', 200, 'gloves');
INSERT 0 1
warehouse_db=# insert into item values (1, 'hockey', 300, 'shoes');
ERROR: duplicate key value violates unique constraint "item_pkey"
DETAIL: Key (item_id)=(1) already exists.
warehouse_db=# insert into item values (2, 'hockey', 300, 'shoes');
INSERT 0 1

The expression index
An expression index is built on the result of a function or expression rather than on a plain column.
For example, if we want to search for an item name case-insensitively,
the straightforward way of doing this is as follows:
warehouse_db=# SELECT * FROM item WHERE UPPER(item_name) LIKE 'COFFEE';
The preceding query scans every row of the table, converts item_name to uppercase, and
compares it with COFFEE; this is really expensive. The following is the command to create
an expression index on upper(item_name), which the planner can use for such queries:
warehouse_db=# create index item_expression_index on item(upper(item_name));
CREATE INDEX
warehouse_db=# \d item;
Table "public.item"
Column | Type | Modifiers
------------+---------+-----------
item_id | integer | not null
item_name | text |
item_price | numeric |
item_data | text |
Indexes:
"item_pkey" PRIMARY KEY, btree (item_id), tablespace "tbs_yl"
"idx_unique_id" UNIQUE, btree (item_id), tablespace "tbs_yl"
"item_expression_index" btree (upper(item_name)), tablespace "tbs_yl"
Tablespace: "tbs_yl"

The implicit index
An index that is created automatically by the database is called an implicit index. The
primary key or unique constraint implicitly creates an index on that column.

The concurrent index
Index creation is an expensive operation, and on a very large table it can take
hours to build an index. A normal CREATE INDEX also blocks write operations on the table
while it runs. To solve this issue, PostgreSQL provides concurrent index builds, which are
useful when you need to add indexes to a live database.

The syntax of a concurrent index is as follows:
CREATE INDEX CONCURRENTLY index_name ON table_name using btree(column);

A concurrent index build is slower than a normal one because it scans the table twice and
waits for existing transactions to finish. The difference can be seen in the following example,
which builds the same index both ways:

warehouse_db=# CREATE INDEX idx_id ON item (item_id);
Time: 8265.473 ms
Time taken to create the same index concurrently (after dropping idx_id first) using CREATE INDEX CONCURRENTLY:
warehouse_db=# CREATE INDEX CONCURRENTLY idx_id ON item (item_id);
Time: 51887.942 ms

Index types
PostgreSQL supports the B-tree, hash, GiST, and GIN index methods. The index method
or type is selected with the USING clause. Different types of indexes have different
purposes; for example, the B-tree index is effective when a query involves range and
equality operators, whereas the hash index is effective only when the equality operator
is used in a query.
Here is a simple example of how to use the index types:
warehouse_db=# CREATE INDEX index_name ON table_name USING btree(column);
The B-tree index
The B-tree index is effectively used when a query involves the equality operator (=) and
range operators (<, <=, >, >=, BETWEEN, and IN).
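For instance, with the item table from the earlier examples, a B-tree index supports both equality and range lookups; a minimal sketch (the index name is invented here, and whether the planner actually uses it depends on table size and statistics):
warehouse_db=# CREATE INDEX item_price_btree_idx ON item USING btree (item_price);
warehouse_db=# EXPLAIN SELECT * FROM item WHERE item_price BETWEEN 100 AND 200;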

The hash index
Hash indexes can be used only for simple equality comparisons. In the following example,
we use a hash index on the item table: the planner chooses the hash index for an equality
predicate in the WHERE clause, but does not use it for a range predicate.
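The plan below refers to an index named item_hash_index, whose creation is not shown in the original session; a minimal sketch of how it would be created is:
warehouse_db=# CREATE INDEX item_hash_index ON item USING hash (item_id);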
warehouse_db=# EXPLAIN SELECT COUNT(*) FROM item WHERE item_id = 100;
QUERY PLAN
------------------------------------------------------------------
Aggregate (cost=8.02..8.03 rows=1 width=0)
-> Index Scan using item_hash_index on item (cost=0.00..8.02 rows=1
width=0)
Index Cond: (item_id = 100)
(3 rows)
The hash index method is not suitable for range operators, so the planner will not select a
hash index for range queries:
warehouse_db=# EXPLAIN SELECT COUNT(*) FROM item WHERE item_id > 100;
QUERY PLAN
------------------------------------------------------------------
Aggregate (cost=25258.75..25258.76 rows=1 width=0)
-> Seq Scan on item (cost=0.00..22759.00 rows=999900 width=0)
Filter: (item_id > 100)
(3 rows)

To get the size of a table and an index, we can use the pg_relation_size() function:
SELECT pg_relation_size('table_name') AS table_size,
       pg_relation_size('index_name') AS index_size;
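For example, with the item table and its primary key index from the earlier examples (the sizes reported will depend on the data loaded):
warehouse_db=# SELECT pg_relation_size('item') AS table_size,
warehouse_db-#        pg_relation_size('item_pkey') AS index_size;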
The GiST index
The Generalized Search Tree (GiST) index makes it possible to build indexed access
methods for custom data types, and it supports a much richer set of query operators than
plain equality and range comparisons. A GiST index may be lossy, which means it can
return false matches that PostgreSQL then rechecks against the actual rows.
The syntax of the GiST index is as follows:
warehouse_db=# CREATE INDEX index_name ON table_name USING
gist(column_name);
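As a concrete illustration, GiST works well for geometric data; the places table and its columns below are invented for this sketch:
warehouse_db=# CREATE TABLE places (place_id integer, location point);
warehouse_db=# CREATE INDEX places_location_gist ON places USING gist (location);
-- the index supports geometric operators such as "point contained in box"
warehouse_db=# SELECT place_id FROM places WHERE location <@ box '((0,0),(10,10))';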

The GIN index
As the PostgreSQL documentation puts it, “GIN stands for Generalized Inverted Index. GIN is designed for handling cases
where the items to be indexed are composite values, and the queries to be handled by
the index need to search for element values that appear within the composite items.
For example, the items could be documents, and the queries could be searches for
documents containing specific words”

Here is the syntax for the creation of a GIN index:
warehouse_db=# CREATE INDEX index_name ON table_name USING
gin(column_name);
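To make the documents-and-words case concrete, a common use of GIN is full-text search over a tsvector expression; the docs table below is invented for this sketch:
warehouse_db=# CREATE TABLE docs (doc_id integer, body text);
warehouse_db=# CREATE INDEX docs_body_gin ON docs USING gin (to_tsvector('english', body));
-- a query that uses the same expression can be answered from the index
warehouse_db=# SELECT doc_id FROM docs WHERE to_tsvector('english', body) @@ to_tsquery('english', 'coffee');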
As a rough rule of thumb, a GIN index takes about three times more space than a comparable
GiST index and is slower to build, but searches are about three times faster. The following
session shows the benefit with the pg_trgm extension: an ordinary B-tree index does not help
a %a31% pattern search at all, while a trigram GIN index speeds the same query up dramatically.
warehouse_db=# create extension pg_trgm ;
CREATE EXTENSION
Time: 117.645 ms
warehouse_db=# create table words(lineno int,simple_words text,special_words text);
CREATE TABLE
Time: 32.913 ms
warehouse_db=# insert into words values (generate_series(1,2000000),md5(random()::text),md5(random()::text));
INSERT 0 2000000
Time: 18268.619 ms
warehouse_db=# select count(*) from words where simple_words like '%a31%' and special_words like '%a31%';
count
-------
115
(1 row)

Time: 669.342 ms
warehouse_db=# create index words_idx on words (simple_words,special_words);
CREATE INDEX
Time: 22136.229 ms
warehouse_db=# select count(*) from words where simple_words like '%a31%' and special_words like '%a31%';
count
-------
115
(1 row)

Time: 658.988 ms
warehouse_db=# create index words_idx on words using gin(simple_words gin_trgm_ops,special_words gin_trgm_ops);
ERROR: relation "words_idx" already exists
Time: 0.952 ms
warehouse_db=# drop index words_idx ;
DROP INDEX
Time: 75.698 ms
warehouse_db=# create index words_idx on words using gin(simple_words gin_trgm_ops,special_words gin_trgm_ops);
CREATE INDEX
Time: 271499.350 ms
warehouse_db=# select count(*) from words where simple_words like '%a31%' and special_words like '%a31%';
count
-------
115
(1 row)

Time: 10.260 ms

For more details on GIN and the pg_trgm extension, see:
http://www.sai.msu.su/~megera/wiki/Gin
http://www.postgresql.org/docs/9.4/static/pgtrgm.html

Index bloating

Because PostgreSQL's architecture is based on MVCC, tables can accumulate dead rows:
row versions that are no longer visible to any transaction. As rows in a table are deleted or
updated over time, these operations leave dead space behind. Dead space can potentially be
reused when new data is inserted, but when many dead rows pile up, the table and its indexes
become bloated. Bloating has various causes, and it needs to be fixed because it hurts
database performance. Autovacuum is the best protection against bloating, but it is
configurable and can be disabled or incorrectly configured. There are multiple ways to fix
index bloating, discussed in the following sections.
To know more about MVCC, check out
http://www.postgresql.org/docs/current/static/mvcc-intro.html

Dump and restore
In the case of bloating, the simplest remedy is to back up the table using pg_dump, drop the
table, and reload the data into a freshly created table. This is an expensive operation and
usually requires taking the table offline, so it is often too restrictive.
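A minimal sketch of this workflow for the item table, run from the shell (database and file names are examples):
$ pg_dump -t item -f item_backup.sql warehouse_db    # dump only the item table
$ psql -d warehouse_db -c "DROP TABLE item;"         # drop the bloated table
$ psql -d warehouse_db -f item_backup.sql            # recreate and reload it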

VACUUM
Vacuuming the table with the VACUUM command is another way to deal with bloat. A plain
VACUUM reclaims dead row versions and marks their space as reusable for future rows in the
same table; the data file on disk only shrinks when completely empty pages happen to lie at
the end of the file, which is the only case in which plain VACUUM reduces the file size. Its
syntax is as follows:
VACUUM table_name
The following example shows the usage of VACUUM on the item table:
warehouse_db=# VACUUM item;
A more aggressive variant is VACUUM FULL, which rewrites the whole table into a new file with
no dead space and returns the freed disk space to the operating system, but it holds an
exclusive lock on the table while it runs:
warehouse_db=# VACUUM FULL item;
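Before vacuuming, it can help to check how many dead rows a table currently carries; a minimal sketch using the statistics views (the numbers will vary):
warehouse_db=# SELECT relname, n_live_tup, n_dead_tup
warehouse_db-# FROM pg_stat_user_tables WHERE relname = 'item';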

CLUSTER
As discussed previously, rewriting and reordering the rows fixes bloat; this can be achieved
indirectly with a dump/restore, but that is an expensive operation. Another way to do it is the
CLUSTER command, which physically reorders the rows based on an index. CLUSTER builds a
complete new copy of the table and then drops the old copy, so it temporarily needs roughly
twice the table's disk space to hold the newly organized copy of the data, and it holds an
exclusive lock on the table while it runs. Its syntax is as follows:
CLUSTER table_name USING index_name
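For example, to rewrite the item table in primary key order (assuming the item_pkey index from the earlier examples):
warehouse_db=# CLUSTER item USING item_pkey;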

Reindexing
If an index becomes inefficient due to bloating or data becomes randomly scattered, then
reindexing is required to get the maximum performance from the index. Its syntax is as
follows:
warehouse_db=# REINDEX TABLE item;
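REINDEX can also rebuild a single index rather than every index on a table; for example, to rebuild only the expression index created earlier:
warehouse_db=# REINDEX INDEX item_expression_index;
Note that REINDEX blocks writes to the table while the index is being rebuilt.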

Points to ponder
When using indexes, keep the following things in mind:
It makes sense to index a table column only when the table holds a substantial number of rows.
Good candidates for an index are foreign key columns and columns frequently used with min()
and max() or in WHERE clauses when retrieving data; column selectivity is very important for
an index to be effective.
Don't forget to remove unused indexes for better performance, and consider running REINDEX
periodically (for example, monthly) to clean up dead tuples in the indexes.
Use table partitioning along with indexes if you have large amounts of data.
When indexing a column that contains many NULL values, consider a partial index
with WHERE column_name IS NOT NULL, as in the sketch after this list.
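A minimal sketch of the last point, using the item table (it assumes item_name can contain NULLs; the index name is invented):
warehouse_db=# CREATE INDEX item_name_not_null_idx ON item (item_name)
warehouse_db-# WHERE item_name IS NOT NULL;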
