Sphinx real-time indexing and highlighting
Date: 2014-06-25 14:50:58  Source: linux技术分享 - 欧阳博客
Original: http://www.wantlearn.net/825
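Last time I covered the differences between coreseek and sphinx, documented the coreseek installation in detail, and added the sphinx module to PHP; see my article "coreseek详解" for the details. This time the focus is how sphinx achieves real-time indexing.

First go to the coreseek configuration directory and edit the original configuration file. In brief, a coreseek configuration file consists of the main data source, the delta (incremental) data source, the main index, the delta index, the indexer settings, and the searchd daemon settings. Large systems may also use distributed indexes and distributed delta indexes; distributed indexing is too complex to cover here. Below is the sphinx configuration file I used in my project.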
    ## Main data source
    source main
    {
        type                = mysql
        sql_host            = localhost
        sql_user            = root
        sql_pass            =
        sql_db              = test
        sql_port            = 3306    # optional, default is 3306
        sql_sock            = /tmp/mysql.sock
        sql_query_pre       = SET NAMES utf8
        # sql_query_pre     = SET SESSION query_cache_type=OFF
        sql_query_pre       = replace into sph_counter select 1,max(id) from post
        sql_query           = select id,title,content from post where id <=(select max_doc_id from sph_counter where count_id = 1)
        sql_ranged_throttle = 0
        sql_query_info      = SELECT * FROM post WHERE id=$id
    }

    # Delta data source
    source delta : main
    {
        sql_query_pre       = set names utf8
        sql_query           = select id,title,content from post where id >(select max_doc_id from sph_counter where count_id = 1)
    }

    # Main index
    index main
    {
        source              = main
        path                = /usr/local/coreseek/var/data/main
        docinfo             = extern
        mlock               = 0
        morphology          = none
        min_word_len        = 1
        charset_type        = zh_cn.utf-8
        charset_dictpath    = /usr/local/mmseg/etc/
        html_strip          = 0
    }

    # Delta index
    index delta : main
    {
        source              = delta
        path                = /usr/local/coreseek/var/data/delta
        # morphology        = stem_en
    }

    ## Indexer
    indexer
    {
        mem_limit           = 128M
    }

    ### Daemon settings
    searchd
    {
        log                 = /usr/local/coreseek/var/log/searchd.log
        query_log           = /usr/local/coreseek/var/log/query.log
        read_timeout        = 5
        client_timeout      = 300
        max_children        = 30
        pid_file            = /usr/local/coreseek/var/log/searchd.pid
        max_matches         = 1000
        seamless_rotate     = 1
        preopen_indexes     = 0
        unlink_old          = 1
        mva_updates_pool    = 1M
        max_packet_size     = 8M
        max_filters         = 256
        max_filter_values   = 4096
    }
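Before any search can run, the indexes defined in this configuration have to be built once and searchd has to be started. The following is a minimal sketch of that first-time step, not the author's own procedure. The config path in CONF is an assumption (the post only mentions "the coreseek configuration directory"), and the function name first_build is made up for illustration; `indexer --all` and `searchd -c` are standard Sphinx command-line options.

```shell
#!/bin/bash
# First-time setup sketch: build every index, then start searchd.
# BIN_DIR follows the paths used in this post; CONF is an ASSUMED config location.
BIN_DIR="${BIN_DIR:-/usr/local/coreseek/bin}"
CONF="${CONF:-/usr/local/coreseek/etc/csft.conf}"

# Hypothetical helper name, not part of Sphinx or coreseek.
first_build() {
    # --all builds every index defined in the config (main and delta here).
    # No --rotate yet: searchd is not running, so there is nothing to swap.
    "$BIN_DIR/indexer" -c "$CONF" --all || return 1
    # Start the daemon with the same config file.
    "$BIN_DIR/searchd" -c "$CONF"
}

# Run only when asked explicitly, e.g.: ./first_build.sh run
if [ "${1:-}" = "run" ]; then
    first_build
fi
```

Once searchd is running, later rebuilds need `--rotate` so that the daemon picks up the new index files, which is what the scripts later in this post do.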
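Pay attention to how the SQL statements above are written. They are the core of the setup and decide whether the sphinx configuration works: before the main index is built, sql_query_pre stores the current max(id) of post in sph_counter; the main source then indexes rows with id up to that value, and the delta source indexes only rows with a larger id. Below are the table structures of sph_counter and post. sph_counter is the table that supports sphinx real-time indexing.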
    CREATE TABLE `post` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `title` varchar(254) NOT NULL,
      `content` text,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=42 DEFAULT CHARSET=utf8;

    CREATE TABLE `sph_counter` (
      `count_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `max_doc_id` int(11) DEFAULT NULL,
      PRIMARY KEY (`count_id`)
    ) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
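To check that the counter row really splits the table the way the two sql_query statements expect, one option is to count the rows on each side of the stored max_doc_id. This is only a sketch: the function name split_counts and the MYSQL and DB variables are made up for illustration, and it assumes the mysql command-line client can reach the test database (extra arguments such as -uroot are passed straight through).

```shell
#!/bin/bash
# Sketch: report how many rows of `post` currently fall into main vs. delta.
# MYSQL and DB are overridable; extra arguments (e.g. -uroot) go to the mysql client.
MYSQL="${MYSQL:-mysql}"
DB="${DB:-test}"

# Hypothetical helper name, not part of Sphinx or coreseek.
split_counts() {
    # Rows at or below the stored watermark belong to main;
    # rows above it are waiting for the next delta build.
    local query="SELECT SUM(p.id <= c.max_doc_id) AS in_main, SUM(p.id > c.max_doc_id) AS in_delta FROM post p, sph_counter c WHERE c.count_id = 1"
    # -N drops the header row, -B prints tab-separated values.
    "$MYSQL" "$@" -N -B -e "$query" "$DB"
}

# Run only when asked explicitly, e.g.: ./split_counts.sh run -uroot
if [ "${1:-}" = "run" ]; then
    shift
    split_counts "$@"
fi
```

The second number should grow as rows are inserted and drop back to zero right after the main index is rebuilt, because that rebuild rewrites max_doc_id.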
The following program shows how sphinx implements highlighting and real-time indexing. First the search form, then find.php:
    <html>
    <head>
      <title>sphinx</title>
      <meta charset="utf-8" />
    </head>
    <body>
      <form action="find.php" method="post">
        <input type="text" name="search"/>
        <input type="submit" value="Submit">
      </form>
    </body>
    </html>
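    <?php
    // find.php: search with sphinx, then fetch the matching rows and highlight the keyword
    header("content-type:text/html;charset=utf-8");
    $keyword = isset($_POST['search']) ? $_POST['search'] : '';

    $sphinx = new SphinxClient();
    $sphinx->SetServer("localhost", 9312);
    $sphinx->SetMatchMode(SPH_MATCH_ANY);
    // "*" searches every index, so both main and delta are queried
    $result = $sphinx->query($keyword, "*");

    // Stop early when the query failed or nothing matched; "id in ()" would be invalid SQL
    if (empty($result['matches'])) {
        exit('no results');
    }
    // The keys of the matches array are the document ids
    $ids = implode(',', array_keys($result['matches']));

    // Legacy mysql_* extension, as in the original post (removed in PHP 7)
    $conn = mysql_connect('localhost', 'root', '') or die('mysql connect failed');
    mysql_select_db('test');
    mysql_set_charset('utf8', $conn);
    $sql = "select * from post where id in($ids)";
    $res = mysql_query($sql);

    $opt = array(
        "before_match" => "<font style='font-weight:bold;color:#f00'>",
        "after_match"  => "</font>",
    );
    while ($row = mysql_fetch_assoc($res)) {
        echo '<pre>';
        // sphinx highlighting: wrap each occurrence of the keyword in the row's fields
        $rows = $sphinx->buildExcerpts($row, "main", $keyword, $opt);
        print_r($rows);
        echo '</pre>';
    }
    $sphinx->close();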
Running it prints each matching row with the search keyword highlighted.
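At this point most of the work is done, but indexing is not yet real-time: rows added to the database table after the indexes were built cannot be found by a search. To fix that I wrote two shell scripts. Script main.sh: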
    #!/bin/bash
    # Rebuild the main index and tell the running searchd to switch to it
    /usr/local/coreseek/bin/indexer main --rotate >>/usr/local/coreseek/var/log/main.log
Script delta.sh:
    #!/bin/bash
    # Rebuild the delta index and tell the running searchd to switch to it
    /usr/local/coreseek/bin/indexer delta --rotate >>/usr/local/coreseek/var/log/delta.log
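Then add both scripts to the Linux cron scheduler. With the entries below, delta.sh runs every 5 minutes and main.sh runs once a day at 03:00: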
    */5 * * * * /usr/local/coreseek/init/delta.sh
    00 03 * * * /usr/local/coreseek/init/main.sh
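With a short cron interval, a slow rebuild can still be running when cron starts the next one. One way to guard against that, offered here as a sketch rather than something from the original setup, is a single wrapper that takes the index name and skips the run while a lock directory exists. The function name reindex, the lock location, and the INDEXER, LOG_DIR and LOCK_DIR variables are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: rebuild one index, skipping the run if the previous one is still going.
# Defaults follow the paths used in this post; LOCK_DIR is an ASSUMED location.
INDEXER="${INDEXER:-/usr/local/coreseek/bin/indexer}"
LOG_DIR="${LOG_DIR:-/usr/local/coreseek/var/log}"
LOCK_DIR="${LOCK_DIR:-/tmp}"

# Hypothetical helper name, not part of Sphinx or coreseek.
reindex() {
    local index="$1"
    local lock="$LOCK_DIR/sphinx-$index.lock"
    # mkdir is atomic: it fails if the directory already exists.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "$(date '+%F %T') $index: previous run still active, skipping" >>"$LOG_DIR/$index.log"
        return 1
    fi
    # Same command as main.sh / delta.sh, with stderr captured in the log as well.
    "$INDEXER" "$index" --rotate >>"$LOG_DIR/$index.log" 2>&1
    local status=$?
    rmdir "$lock"
    return "$status"
}

# Run only when an index name is given, e.g.: ./reindex.sh delta
if [ -n "${1:-}" ]; then
    reindex "$1"
fi
```

Saved under a name such as reindex.sh (hypothetical), it would be called from cron as `reindex.sh delta` and `reindex.sh main`. A run that is killed midway leaves its lock directory behind, which then has to be removed by hand.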
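That is all. One more point: any table that sphinx indexes must have a primary key, because sphinx needs a unique integer document id as the first column returned by sql_query.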