Installing, configuring, and using Sphinx full-text search

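Our company project recently imported a large amount of product data, and product search on the site became very slow. The search was a fuzzy match done with a plain SQL LIKE, which was acceptable at around 200,000 rows but sluggish once the table passed a million, so it needed optimizing.

After some thought, I plan to optimize it with an architecture of Sphinx full-text search plus a database middleware (Atlas/Mycat).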

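My environment:

CentOS 6.5, 64-bit

lnmp 1.3 one-click environment package

Installing Sphinx and the Sphinx PHP extension on CentOS 6.4 x64

Before installing, make sure the usual components are installed, then download the latest Sphinx release from the official site: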

yum install -y python python-devel

http://sphinxsearch.com/downloads/release/

Install Sphinx

tar zxvf sphinx-2.2.10-release.tar.gz
cd sphinx-2.2.10-release
./configure --prefix=/usr/local/sphinx --with-mysql
make && make install
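
A quick sanity check after make install can save some confusion later. The sketch below only tests that the two binaries used in the rest of this post exist under the prefix; the function name check_sphinx_install is made up for illustration and is not part of Sphinx.

```shell
# Hypothetical helper: report which expected Sphinx binaries are missing
# under an install prefix (defaults to the prefix used in this post).
check_sphinx_install() {
    prefix=${1:-/usr/local/sphinx}
    missing=0
    for bin in indexer searchd; do
        if [ ! -x "$prefix/bin/$bin" ]; then
            echo "missing: $prefix/bin/$bin"
            missing=1
        fi
    done
    # Exit status 0 means both binaries were found and are executable.
    return "$missing"
}
```

Calling check_sphinx_install with no argument checks /usr/local/sphinx; it prints nothing and returns 0 when both binaries are in place.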

If make fails with an "undefined reference to libiconv" error, see http://www.lvtao.net/database/sphinx-make-error.html for a fix.

Install libsphinxclient (required by the PHP module)

cd api/libsphinxclient
./configure --prefix=/usr/local/sphinx
make &&  make install

Install the Sphinx module for PHP
Download page: http://pecl.php.net/package/sphinx

wget http://pecl.php.net/get/sphinx-1.3.3.tgz
tar zxf sphinx-1.3.3.tgz
cd sphinx-1.3.3
/usr/local/php/bin/phpize
./configure --with-php-config=/usr/local/php/bin/php-config --with-sphinx=/usr/local/sphinx/
make && make install

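Add the extension to PHP. To find where php.ini lives, run php --ini.

Edit the configuration with vi /usr/local/php/etc/php.ini (in vi, :$ jumps to the end of the file) and add: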

extension_dir="/usr/local/php/lib/php/extensions/no-debug-non-zts-20131226/"
[sphinx]
extension=sphinx.so

Run php -m or call phpinfo() to check whether the extension has been loaded.
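
The php -m check can be scripted. The sketch below reads a module list on stdin, one name per line as php -m prints it, and succeeds if the named module is present; has_php_module is a name I made up for this example.

```shell
# Hypothetical helper: succeed if the module named in $1 appears as a whole
# line (case-insensitive) in the module list read from stdin.
has_php_module() {
    grep -qix "$1"
}

# Usage, assuming the PHP path from this post:
#   /usr/local/php/bin/php -m | has_php_module sphinx && echo "sphinx extension loaded"
```

Matching the whole line with -x avoids a false positive from some other module whose name merely contains "sphinx".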

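First the indexes have to be built on the server so that PHP can query them over the searchd port.

Copy the default configuration file to create a new one: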

cp /usr/local/sphinx/etc/sphinx-min.conf.dist /usr/local/sphinx/etc/sphinx.conf

sphinx.conf.dist is the full default configuration and contains a lot. Here I copied the minimal version, sphinx-min.conf.dist, which is enough for basic queries.

#
# Minimal Sphinx configuration sample (clean, simple, functional)
#

source src1
{
        type                    = mysql

        sql_host                = localhost
        sql_user                = root
        sql_pass                = root
        sql_db                  = allchips_test
        sql_port                = 3306  # optional, default is 3306

        sql_query               = select * from products

        #sql_attr_uint          = id
        #sql_attr_timestamp     = date_added

        sql_field_string        = product_id
        sql_field_string        = partNo
}

source src2
{
        type                    = mysql

        sql_host                = localhost
        sql_user                = root
        sql_pass                = root
        sql_db                  = allchips_test
        sql_port                = 3306  # optional, default is 3306

        sql_query               = select * from product_prices

}

source src3
{
        type                    = mysql

        sql_host                = localhost
        sql_user                = root
        sql_pass                = root
        sql_db                  = allchips_test
        sql_port                = 3306  # optional, default is 3306

        sql_query               = select * from product_attrs

}

index products
{
        source                  = src1
        path                    = /mnt/data/products
        min_infix_len           = 1
        infix_fields            = partNo,short_desc

}

index prices
{
        source                  = src2
        path                    = /mnt/data/prices

}

index attrs
{
        source                  = src3
        path                    = /mnt/data/attrs

}

indexer
{
        mem_limit               = 128M
}

searchd
{
        listen                  = 9312
        listen                  = 9306:mysql41
        log                     = /mnt/data/log/searchd.log
        query_log               = /mnt/data/log/query.log
        read_timeout            = 5
        max_children            = 30
        pid_file                = /mnt/data/log/searchd.pid
        seamless_rotate         = 1
        preopen_indexes         = 1
        unlink_old              = 1
        workers                 = threads # for RT to work
        binlog_path             = /mnt/data
}

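The indexer and searchd sections at the bottom configure index building and the query daemon respectively. For the most part you only need to point the log paths where you want them.

The important part is the upper half: the source and index sections.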

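Looking at my requirements, product search mainly involves three tables:

the product table, products (id, product_id),

the product price table, product_prices,

the product attribute table, product_attrs.

The three are related one-to-many through the product table's product_id.

source src1 corresponds to index products

source src2 corresponds to index prices

source src3 corresponds to index attrs

A source can declare which fields are returned with the results.

For example, from the config above: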
sql_field_string = product_id
sql_field_string = partNo

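Once the configuration is in place, build the indexes.

When I ran /usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/sphinx.conf --all --rotate, I got a "no rotate" message. To be sure the indexes were actually generated, I built each one separately: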

/usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/sphinx.conf products

/usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/sphinx.conf prices

/usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/sphinx.conf attrs
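
The three commands can be wrapped in a loop that stops at the first failure. This is a sketch under my own assumptions, not part of Sphinx: build_indexes and the INDEXER and CONF variables are hypothetical names, defaulting to the paths used in this post. It leaves out --rotate on purpose, since that option is meant for rebuilding an index that a running searchd is already serving.

```shell
# Hypothetical helper: build each named index in turn, stopping at the
# first failure. INDEXER and CONF may be overridden from the environment.
build_indexes() {
    indexer=${INDEXER:-/usr/local/sphinx/bin/indexer}
    conf=${CONF:-/usr/local/sphinx/etc/sphinx.conf}
    for idx in "$@"; do
        echo "building $idx"
        "$indexer" -c "$conf" "$idx" || {
            echo "indexer failed on $idx" >&2
            return 1
        }
    done
}

# Usage:
#   build_indexes products prices attrs
```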

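If nothing is wrong, indexer finishes each run without reporting errors.

Next, start searchd, which runs as the Sphinx daemon on the server:

/usr/local/sphinx/bin/searchd -c /usr/local/sphinx/etc/sphinx.conf
(The test.conf in the original screenshot was left over from earlier testing; just use sphinx.conf.)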

A common error is a missing directory; if a directory does not exist, create it.
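
Rather than creating directories one error at a time, the needed directories can be read out of the config. The sketch below assumes one "key = value" setting per line, as in the sphinx.conf shown earlier; conf_dirs is a hypothetical helper name, and it only looks at the path, log, query_log, pid_file and binlog_path settings.

```shell
# Hypothetical helper: print the directories that the file paths in a
# sphinx.conf depend on, one per line, without duplicates.
conf_dirs() {
    awk -F'=' '
        { sub(/#.*/, "") }                        # drop trailing comments
        $1 ~ /^[ \t]*(path|log|query_log|pid_file)[ \t]*$/ {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)        # trim surrounding whitespace
            sub("/[^/]*$", "", v)                 # strip the file name, keep the directory
            print v
        }
        $1 ~ /^[ \t]*binlog_path[ \t]*$/ {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            print v                               # binlog_path is already a directory
        }
    ' "$1" | sort -u
}

# Usage:
#   conf_dirs /usr/local/sphinx/etc/sphinx.conf | xargs mkdir -p
```

For the config in this post that would yield /mnt/data and /mnt/data/log.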

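If searchd is already running and holding the port, there are two ways to stop it:

1. /usr/local/sphinx/bin/searchd -c /usr/local/sphinx/etc/sphinx.conf --stop

2. netstat -tnl to check whether port 9312 is being listened on

lsof -i:9312 to see what is using port 9312 and get its pid

kill {pid}

After killing the process, run the searchd command again to start it.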
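
Once searchd is back up, one way to confirm it answers queries is through the MySQL-protocol listener that the config above opens on port 9306 (listen = 9306:mysql41), using an ordinary mysql client. The helper below only builds the query text; sphinxql_match is a name I made up, it escapes just backslashes and single quotes, and it does nothing about the operators of Sphinx's own query syntax.

```shell
# Hypothetical helper: build a SphinxQL full-text query for one index.
# Backslashes and single quotes in the search term are backslash-escaped.
sphinxql_match() {
    index=$1
    term=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")
    printf "SELECT * FROM %s WHERE MATCH('%s') LIMIT 5;\n" "$index" "$term"
}

# Usage, assuming a mysql client is installed (use 127.0.0.1, not localhost,
# so the client connects over TCP rather than the MySQL socket):
#   mysql -h 127.0.0.1 -P 9306 -e "$(sphinxql_match products usb)"
```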

==========

PHP side

<?php
    // index.php
    // phpinfo(); die;   // uncomment to confirm the sphinx extension is loaded
    $s = new SphinxClient;
    $s->setServer("127.0.0.1", 9312);

    $s->setMatchMode(SPH_MATCH_PHRASE);
    $s->setMaxQueryTime(30);   // value is in milliseconds

    // query(search term, index name)
    $res1 = $s->query('usb', 'products');
    $res2 = $s->query('53e6dde17a667c4b2af1d38ba0a466c4', 'prices');
    $res3 = $s->query('53e6dde17a667c4b2af1d38ba0a466c4', 'attrs');

    //$res = $s->query('开关', 'products');
    //$res = $s->query('products');

    // Only reflects the most recent query
    $err = $s->getLastError();
    //var_dump(array_keys($res['matches']));
    // echo "<br>"."Use the returned IDs to read the rows from the database."."<br>";

    echo '<pre>';

    // query() returns false on failure, so guard before reading 'matches'
    $products = !empty($res1['matches']) ? $res1['matches'] : "";
    $prices = !empty($res2['matches']) ? $res2['matches'] : "";
    $attrs = !empty($res3['matches']) ? $res3['matches'] : "";

    print_r($products);
    print_r($prices);
    print_r($attrs);

    if (!empty($err)) {
        print_r($err);
    }

    $s->close();

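The coreseek official site is down and I could not download it, so I am not handling Chinese for now. When I have time I will download a Chinese dictionary and add it.

The script prints the matches from each query result. For the full structure of a query result, see the PHP manual: http://php.net/manual/zh/sphinxclient.query.php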

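Return data structure
Key	Description
"matches"	A hash mapping each document ID to another hash that holds the document's weight and attribute values
"total"	Total number of matches retrieved on the server for this query (that is, the size of the server-side result set, which depends on the related settings)
"total_found"	Total number of matching documents in the index (that were found and processed on the server)
"words"	A hash mapping each query keyword (after case folding, stemming, and other processing) to a small hash of statistics about it ("docs": how many documents it occurs in; "hits": how many times it occurs in total)
"error"	Error message reported by searchd
"warning"	Warning message reported by searchd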

================================================================

I hear a lot of people use Atlas. Installation and testing are in progress; to be continued.

Time: 2024-07-28 18:52:47
