Sphinx + MySQL incremental (delta) indexing

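I. What incremental indexing means

Rows newly inserted into a table are the increment. Sphinx answers queries from its index, not from the table, so new rows cannot be found until the index has been updated. Both the main index and the delta index therefore have to be refreshed, and the condition that separates new rows from already indexed ones has to be chosen carefully.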

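II. When to set up incremental indexing: the database already holds a large amount of data, new rows keep arriving, and those rows should be searchable too. Rebuilding the entire index every time wastes resources, because the rows that need indexing are a small fraction of the total. For example, the existing data may run to several million rows while only a few thousand are new. In that situation a "main index + delta index" scheme gives near-real-time updates.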

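III. How it works: the scheme uses two data sources and two indexes. The main index covers the data that rarely changes, and the delta index covers the newly added rows. The main index can be rebuilt infrequently (for example once a day at midnight), while the delta index is rebuilt at short intervals (every few minutes). When a user searches, both indexes are queried together.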

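Incremental indexing needs a counter table that records the last document id of the indexed table at the moment the main index was rebuilt. The counter is updated on every main-index build. The tables follow example.sql.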

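IV. Configuration

1. Create the counter table (the two indexes themselves are defined in the configuration file in step 2)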

create table sph_counter(counter_id integer primary key not null,max_doc_id integer not null);

2. Edit the configuration file csft.conf

# main data source
source src1
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass = root
    sql_db = test
    sql_port = 3306 # optional, default is 3306
    sql_sock = /tmp/mysql.sock

    sql_query_pre = SET NAMES utf8
    sql_query_pre = SET SESSION query_cache_type=OFF

    # Update the counter before indexing: REPLACE INTO overwrites max_doc_id
    # if the row with counter_id 1 already exists and inserts it otherwise
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM news

    sql_query = SELECT id, title, content FROM news WHERE id <= (SELECT max_doc_id FROM sph_counter WHERE counter_id=1)
}

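# Note: ddj inherits from src1, so it must define its own sql_query_pre.
# Otherwise it inherits the REPLACE INTO statement above, the counter is reset
# on every delta build, and the delta index always comes out empty.

# delta data source
source ddj : src1
{
    sql_ranged_throttle = 100

    sql_query_pre = SET NAMES utf8

    # select only the rows added since the last main-index build
    sql_query = SELECT id, title, content FROM news WHERE id > (SELECT max_doc_id FROM sph_counter WHERE counter_id=1)
}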

# main index
index test1
{
    source = src1
    path = /usr/local/coreseek/var/data/test1
}

# delta index, inherits its settings from test1
index ddj : test1
{
    source = ddj
    path = /usr/local/coreseek/var/data/ddj
    morphology = stem_en
}

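Note: around line 400 of the configuration file, set the Chinese word-segmentation options (Coreseek spells the first option charset_type):

charset_type = zh_cn.utf-8
charset_dictpath = /usr/local/mmseg3/etc/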

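3. Rebuild the indexes

If Sphinx is running, stop the searchd service first, then build the indexes from the configuration file.

Rebuild every index:
/usr/local/sphinx/bin/indexer --all

Rebuild a single index, here test1 (pass the name of whichever index you want to rebuild):
/usr/local/sphinx/bin/indexer test1

Full sequence with an explicit configuration file: stop searchd, rebuild everything, start searchd again:
/usr/local/sphinx/bin/searchd --stop
/usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/sphinx.conf --all
/usr/local/sphinx/bin/searchd -c /usr/local/sphinx/etc/sphinx.conf

To rebuild while searchd keeps running, add --rotate so that searchd switches to the new index files once they are ready:
/usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/sphinx.conf --all --rotate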

4. Merge the indexes

For example, to merge the index delta into the index main:

indexer --merge main delta
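A minimal sketch of the same merge for the indexes defined in this article, assuming the index names test1 and ddj and the /usr/local/sphinx paths used in step 3. The run helper, the DRY_RUN switch and the function name merge_delta_into_main are hypothetical and only there for illustration. The sketch adds --rotate on the assumption that searchd is running during the merge, and it does not touch sph_counter, which in this setup is only updated when the main index is rebuilt.

```shell
#!/bin/sh
# Hypothetical helper script; paths can be overridden from the environment.
SPHINX_BIN="${SPHINX_BIN:-/usr/local/sphinx/bin}"
SPHINX_CONF="${SPHINX_CONF:-/usr/local/sphinx/etc/sphinx.conf}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Merge the delta index (ddj) into the main index (test1).
# --rotate is an assumption: it is meant for the case where searchd is running.
merge_delta_into_main() {
    run "$SPHINX_BIN/indexer" -c "$SPHINX_CONF" --merge test1 ddj --rotate
}
```

Setting DRY_RUN=1 before calling merge_delta_into_main prints the indexer command instead of running it, which is a cheap way to check the paths before scheduling the script.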

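5. Update the indexes automatically

This is done with scripts run from cron.

The scripts need the right permissions for the cron user. Edit the schedule with crontab -e:

*/30 * * * * /bin/sh /usr/local/sphinx/etc/build_delta_index.sh > /dev/null 2>&1

30 2 * * * /bin/sh /usr/local/sphinx/etc/build_main_index.sh > /dev/null 2>&1

The first entry runs /usr/local/sphinx/etc/build_delta_index.sh every 30 minutes and discards its output.

The second entry runs /usr/local/sphinx/etc/build_main_index.sh every day at 02:30 and discards its output.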
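The article does not show the two scripts themselves, so the following is only a sketch of what they might contain, assuming the index names test1 and ddj from the configuration and the /usr/local/sphinx paths from step 3. The run helper, the DRY_RUN switch, the function names and the dispatch on the script's file name are all hypothetical. The idea is that the same file can be saved (or symlinked) under both names used in the crontab entries.

```shell
#!/bin/sh
# Hypothetical shared body for build_delta_index.sh and build_main_index.sh.
SPHINX_BIN="${SPHINX_BIN:-/usr/local/sphinx/bin}"
SPHINX_CONF="${SPHINX_CONF:-/usr/local/sphinx/etc/sphinx.conf}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rebuild only the delta index; searchd keeps running and rotates it in.
build_delta() {
    run "$SPHINX_BIN/indexer" -c "$SPHINX_CONF" ddj --rotate
}

# Rebuild the main index; its sql_query_pre also refreshes sph_counter.
build_main() {
    run "$SPHINX_BIN/indexer" -c "$SPHINX_CONF" test1 --rotate
}

# Choose the job from the name the script was invoked under.
case "${0##*/}" in
    build_delta_index.sh) build_delta ;;
    build_main_index.sh)  build_main ;;
esac
```

Because the main build moves the counter forward, the next delta build after it starts again from an almost empty range, which is what keeps the 30-minute job cheap.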

Date: 2024-10-09 00:14:18
