Fulltext Index Study4: Management and Performance

Only one full-text index is allowed per table. To create a full-text index on a table, the table must have a unique, single-column, non-nullable index that serves as the full-text key. Columns of type char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, and varbinary(max) can be indexed for full-text search. Creating a full-text index on a column whose data type is varbinary, varbinary(max), image, or xml requires that you specify a type column. A type column is a table column in which you store the file extension (.doc, .pdf, .xls, and so forth) of the document in each row.
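As a minimal sketch of these requirements, the statements below create a table with a unique, non-nullable, single-column key and a type column, then build a full-text index over one text column and one binary column. The names dbo.Docs, PK_Docs, FileExt, and ftDocs are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE dbo.Docs
(
    DocId   int            NOT NULL,  -- full-text key: unique, single column, non-nullable
    FileExt nvarchar(8)    NOT NULL,  -- type column: '.doc', '.pdf', '.xls', ...
    Title   nvarchar(200)  NULL,
    Content varbinary(max) NULL,
    CONSTRAINT PK_Docs PRIMARY KEY (DocId)
);

-- A full-text index must belong to a full-text catalog.
CREATE FULLTEXT CATALOG ftDocs;

-- One full-text index per table; it can cover several columns.
CREATE FULLTEXT INDEX ON dbo.Docs
(
    Title,                        -- character column: no type column needed
    Content TYPE COLUMN FileExt   -- binary column: the type column selects the filter
)
KEY INDEX PK_Docs
ON ftDocs
WITH CHANGE_TRACKING AUTO;
```

An integer key such as DocId is used here because, as described later, an integer full-text key can serve directly as the DocId.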

The process of creating and maintaining a full-text index is called a population (also known as a crawl). There are three types of full-text index population: full population, change tracking-based population, and incremental timestamp-based population. For more information, see Populate Full-Text Indexes.
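The three population types map onto options of ALTER FULLTEXT INDEX. The sketch below assumes a hypothetical table dbo.Docs that already has a full-text index; the statements are alternatives to be run one at a time, not a script to run top to bottom.

```sql
-- Full population: builds index entries for every row of the table.
ALTER FULLTEXT INDEX ON dbo.Docs START FULL POPULATION;

-- Change tracking-based population, manual mode: changes are tracked,
-- but are applied to the index only when an update population is started.
ALTER FULLTEXT INDEX ON dbo.Docs SET CHANGE_TRACKING MANUAL;
ALTER FULLTEXT INDEX ON dbo.Docs START UPDATE POPULATION;

-- Incremental timestamp-based population: relies on a timestamp (rowversion)
-- column in the table to find rows changed since the last population.
ALTER FULLTEXT INDEX ON dbo.Docs START INCREMENTAL POPULATION;

-- Check whether a population is currently running (0 means idle).
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.Docs'), 'TableFulltextPopulateStatus') AS populate_status;
```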

1, Full-Text Index Structure

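A full-text index splits the value of a character column into substrings, each of which is a word. If a substring is a stopword, it is not stored in the full-text index, but its position is still counted; the position of a substring is the absolute position of that word within the column value. Because the same substring can occur many times, the full-text index structure identifies each entry by the combination of substring (keyword), ColumnID, KeyID, and Position, and uses a timestamp to tell newer data from older data. Internally, a full-text index is made up of multiple fragments. When a column value changes, SQL Server does not update the existing fragments; instead, the crawl creates a new fragment whose timestamp marks the column value as changed, and the full-text index returns data from the newest fragment.

If the table has many rows and the column values split into many substrings, the full-text index becomes very large.

Quoted from the docs: Create and Manage Full-Text Indexes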

A good understanding of the structure of a full-text index will help you understand how the Full-Text Engine works. This topic uses the following excerpt of the Document table in Adventure Works as an example table. This excerpt shows only two columns, the DocumentID column and the Title column, and three rows from the table.

For this example, we will assume that a full-text index has been created on the Title column.


DocumentID  Title
----------  ------------------------------------------------
1           Crank Arm and Tire Maintenance
2           Front Reflector Bracket and Reflector Assembly 3
3           Front Reflector Bracket Installation

For example, the following table, which shows Fragment 1, depicts the contents of the full-text index created on the Title column of the Document table. Full-text indexes contain more information than is presented in this table. The table is a logical representation of a full-text index and is provided for demonstration purposes only. The rows are stored in a compressed format to optimize disk usage.

Notice that the data has been inverted from the original documents. Inversion occurs because the keywords are mapped to the document IDs. For this reason, a full-text index is often referred to as an inverted index.

Also notice that the keyword "and" has been removed from the full-text index. This is done because "and" is a stopword, and removing stopwords from a full-text index can lead to substantial savings in disk space thereby improving query performance. For more information about stopwords, see Configure and Manage Stopwords and Stoplists for Full-Text Search.

Fragment 1


Keyword       ColId  DocId  Occurrence
------------  -----  -----  ----------
Crank         1      1      1
Arm           1      1      2
Tire          1      1      4
Maintenance   1      1      5
Front         1      2      1
Front         1      3      1
Reflector     1      2      2
Reflector     1      2      5
Reflector     1      3      2
Bracket       1      2      3
Bracket       1      3      3
Assembly      1      2      6
3             1      2      7
Installation  1      3      4

The Keyword column contains a representation of a single token extracted at indexing time. Word breakers determine what makes up a token.

The ColId column contains a value that corresponds to a particular column that is full-text indexed.

The DocId column contains values for an eight-byte integer that maps to a particular full-text key value in a full-text indexed table. This mapping is necessary when the full-text key is not an integer data type. In such cases, mappings between full-text key values and DocId values are maintained in a separate table called the DocId Mapping table. To query for these mappings use the sp_fulltext_keymappings system stored procedure. To satisfy a search condition, DocId values from the above table need to be joined with the DocId Mapping table to retrieve rows from the base table being queried. If the full-text key value of the base table is an integer type, the value directly serves as the DocId and no mapping is necessary. Therefore, using integer full-text key values can help optimize full-text queries.

The Occurrence column contains an integer value. For each DocId value, there is a list of occurrence values that correspond to the relative word offsets of the particular keyword within that DocId. Occurrence values are useful in determining phrase or proximity matches, for example, phrases have numerically adjacent occurrence values. They are also useful in computing relevance scores; for example, the number of occurrences of a keyword in a DocId may be used in scoring.
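The logical columns described above can be inspected on a real index. The following sketch assumes a hypothetical full-text indexed table dbo.Docs in the current database. Note that sys.dm_fts_index_keywords_by_document reports how many times a keyword occurs in a document, not the individual word offsets.

```sql
-- Keyword, column, and document entries of the full-text index on dbo.Docs.
SELECT display_term,       -- human-readable form of the keyword
       column_id,          -- corresponds to ColId
       document_id,        -- corresponds to DocId
       occurrence_count    -- number of occurrences, not the offsets themselves
FROM sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID(N'dbo.Docs'))
ORDER BY display_term, document_id;

-- Mappings between DocId values and full-text key values for the same table.
DECLARE @table_id int = OBJECT_ID(N'dbo.Docs');
EXEC sp_fulltext_keymappings @table_id;
```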

2, Full-Text Index Fragments

The logical full-text index is usually split across multiple internal tables. Each internal table is called a full-text index fragment. Some of these fragments might contain newer data than others. For example, if a user updates the following row whose DocId is 3 and the table is auto change-tracked, a new fragment is created.


DocumentID  Title
----------  --------------
3           Rear Reflector

In the following example, which shows Fragment 2, the fragment contains newer data about DocId 3 compared to Fragment 1. Therefore, when the user queries for "Rear Reflector" the data from Fragment 2 is used for DocId 3. Each fragment is marked with a creation timestamp that can be queried by using the sys.fulltext_index_fragments catalog view.
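A query along the following lines lists the fragments and their creation timestamps. It is a sketch against the sys.fulltext_index_fragments catalog view and runs in whichever database holds the full-text index.

```sql
-- One row per fragment; newer fragments have later timestamps.
SELECT OBJECT_NAME(table_id) AS table_name,
       fragment_id,
       [timestamp],     -- creation timestamp of the fragment
       data_size,       -- logical size of the fragment
       row_count,       -- number of rows in the fragment
       status           -- 4 or 6 means the fragment is queryable
FROM sys.fulltext_index_fragments
ORDER BY table_id, [timestamp];
```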

Fragment 2


Keyword    ColId  DocId  Occurrence
---------  -----  -----  ----------
Rear       1      3      1
Reflector  1      3      2

As can be seen from Fragment 2, full-text queries need to query each fragment internally and discard older entries. Therefore, too many full-text index fragments in the full-text index can lead to substantial degradation in query performance. To reduce the number of fragments, reorganize the fulltext catalog by using the REORGANIZE option of the ALTER FULLTEXT CATALOG Transact-SQL statement. This statement performs a master merge, which merges the fragments into a single larger fragment and removes all obsolete entries from the full-text index.
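To decide whether a master merge is worthwhile, one option is to count the queryable fragments per table and then reorganize the catalog. This is a sketch only: the catalog name ftDocs is hypothetical, and what counts as too many fragments depends on the workload.

```sql
-- Count the queryable fragments (status 4 or 6) for each full-text indexed table.
SELECT OBJECT_SCHEMA_NAME(f.table_id) AS schema_name,
       OBJECT_NAME(f.table_id)        AS table_name,
       COUNT(*)                       AS queryable_fragments,
       SUM(f.data_size)               AS total_data_size
FROM sys.fulltext_index_fragments AS f
WHERE f.status IN (4, 6)
GROUP BY f.table_id
ORDER BY queryable_fragments DESC;

-- Master merge: merges the fragments and removes obsolete entries.
ALTER FULLTEXT CATALOG ftDocs REORGANIZE;
```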

After being reorganized, the example index would contain the following rows:


Keyword      ColId  DocId  Occurrence
-----------  -----  -----  ----------
Crank        1      1      1
Arm          1      1      2
Tire         1      1      4
Maintenance  1      1      5
Front        1      2      1
Rear         1      3      1
Reflector    1      2      2
Reflector    1      2      5
Reflector    1      3      2
Bracket      1      2      3
Assembly     1      2      6
3            1      2      7

3, Configure Stopwords

Reference doc: Configure and Manage Stopwords and Stoplists for Full-Text Search

To prevent a full-text index from becoming bloated, SQL Server has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings are called stopwords. During index creation, the Full-Text Engine omits stopwords from the full-text index. This means that full-text queries will not search on stopwords.

Although it ignores the inclusion of stopwords, the full-text index does take into account their position.

Use CREATE FULLTEXT STOPLIST (Transact-SQL) and DROP FULLTEXT STOPLIST (Transact-SQL) to create and drop stoplists, and use ALTER FULLTEXT STOPLIST (Transact-SQL) to add stopwords to a stoplist or remove them from it.

ALTER FULLTEXT STOPLIST stoplist_name
{
    ADD [N] 'stopword' LANGUAGE language_term
  | DROP
    {
        'stopword' LANGUAGE language_term
      | ALL LANGUAGE language_term
      | ALL
    }
};

Use sys.fulltext_stoplists (Transact-SQL) and sys.fulltext_stopwords (Transact-SQL) to view the stoplists and stopwords that already exist in the current database.
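Put together, a stoplist workflow might look like the sketch below. The stoplist name docsStoplist, the stopword 'bike', and the table dbo.Docs are hypothetical; the statements and catalog views are the ones named above.

```sql
-- Create a stoplist that starts as a copy of the system stoplist.
CREATE FULLTEXT STOPLIST docsStoplist FROM SYSTEM STOPLIST;

-- Add a stopword for one language.
ALTER FULLTEXT STOPLIST docsStoplist ADD N'bike' LANGUAGE 'English';

-- Attach the stoplist to an existing full-text index.
ALTER FULLTEXT INDEX ON dbo.Docs SET STOPLIST docsStoplist;

-- List the stoplists in the current database and their stopwords.
SELECT l.name AS stoplist_name, w.stopword, w.[language]
FROM sys.fulltext_stoplists AS l
JOIN sys.fulltext_stopwords AS w
  ON w.stoplist_id = l.stoplist_id
ORDER BY l.name, w.[language], w.stopword;

-- Remove the stopword again.
ALTER FULLTEXT STOPLIST docsStoplist DROP 'bike' LANGUAGE 'English';
```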

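4, Maintain the Fulltext Catalog

A full-text index can contain multiple fragments. When data is updated, new fragments are created but the old fragments are not deleted, so the number of fragments grows and query performance degrades. Because every full-text index belongs to a catalog, rebuilding or reorganizing the catalog re-creates or reorganizes the structure of its full-text indexes and improves query performance.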

ALTER FULLTEXT CATALOG catalog_name
{ REBUILD [ WITH ACCENT_SENSITIVITY = { ON | OFF } ]
| REORGANIZE
| AS DEFAULT
}

REBUILD      

Tells SQL Server to rebuild the entire catalog. When a catalog is rebuilt, the existing catalog is deleted and a new catalog is created in its place. All the tables that have full-text indexing references are associated with the new catalog. Rebuilding resets the full-text metadata in the database system tables.
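As an illustration, a rebuild that also changes accent sensitivity could be issued as follows. The catalog name ftDocs is hypothetical, and because a rebuild repopulates every full-text index in the catalog, it is best scheduled for a quiet period.

```sql
-- Rebuild the whole catalog and make it accent-insensitive.
ALTER FULLTEXT CATALOG ftDocs REBUILD WITH ACCENT_SENSITIVITY = OFF;

-- Check the accent sensitivity setting and whether the repopulation is still running.
SELECT FULLTEXTCATALOGPROPERTY(N'ftDocs', 'AccentSensitivity') AS accent_sensitive,  -- 1 = sensitive, 0 = insensitive
       FULLTEXTCATALOGPROPERTY(N'ftDocs', 'PopulateStatus')    AS populate_status;   -- 0 = idle
```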

REORGANIZE  

Tells SQL Server to perform a master merge, which involves merging the smaller indexes created in the process of indexing into one large index. Merging the full-text index fragments can improve performance and free up disk and memory resources. If there are frequent changes to the full-text catalog, use this command periodically to reorganize the full-text catalog.

REORGANIZE also optimizes internal index and catalog structures.

Keep in mind that, depending on the amount of indexed data, a master merge may take some time to complete. Master merging a large amount of data can create a long running transaction, delaying truncation of the transaction log during checkpoint. In this case, the transaction log might grow significantly under the full recovery model. As a best practice, ensure that your transaction log contains sufficient space for a long-running transaction before reorganizing a large full-text index in a database that uses the full recovery model. For more information, see Manage the Size of the Transaction Log File.
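Before reorganizing a large catalog, it may help to check the recovery model and the available log space, and afterwards to watch the merge status. The following is a sketch that assumes a hypothetical catalog named ftDocs in the current database.

```sql
-- Recovery model and the current reason (if any) that log truncation is delayed.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE database_id = DB_ID();

-- Transaction log size and usage for the current database.
SELECT total_log_size_in_bytes,
       used_log_space_in_bytes,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;

-- 1 while a master merge is in progress, 0 otherwise.
SELECT FULLTEXTCATALOGPROPERTY(N'ftDocs', 'MergeStatus') AS merge_in_progress;
```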

Reference doc: ALTER FULLTEXT CATALOG (Transact-SQL)

Reference docs:

Create and Manage Full-Text Indexes

Manage Full-Text Indexes

Improve the Performance of Full-Text Indexes

Time: 2024-12-29 23:31:48
