Solr -- Real-Time Search

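Solr offers three approaches to real-time search:

① Soft commit. This is really near-real-time (NRT) search, not fully real-time.

② RealTimeGet. This is truly real-time, but it only supports lookups by document ID.

③ Similar to the first, except that the client triggers the commit explicitly.

In short, there are really two kinds: real-time (②) and near-real-time (① and ③).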

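How to use NRT in Solr 4.0 and later, and the configuration it requires

Approach 1

Use soft commit to get near-real-time search.

To use soft commit, solrconfig.xml must be configured. Two places need to be modified.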

<autoCommit>
      <!-- maximum uncommitted docs before autocommit is triggered -->
      <maxDocs>10000</maxDocs>
      <!-- maximum time (in ms) after adding a doc before an autocommit is triggered -->
      <maxTime>15000</maxTime>
      <!-- Solr 4.0. Optionally don't open a searcher on hard commit. This is useful to
           minimize the size of transaction logs that keep track of uncommitted updates. -->
      <openSearcher>false</openSearcher>
</autoCommit>

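Here the hard commit's openSearcher is changed to true. (Since autoSoftCommit already makes new documents visible, leaving it at false also works and is the more commonly recommended setting.) Choose the hard commit interval according to your system's capacity and needs. A hard commit is a heavy operation with a noticeable performance cost, so a somewhat longer interval is better, but not so long that a sudden power failure loses a large amount of data: unless the update log is enabled, documents that have not been hard-committed exist only in memory.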

<!-- SoftAutoCommit
         Perform a 'soft' commit automatically under certain conditions.
         This commit avoids ensuring that data is synced to disk.
         maxDocs - Maximum number of documents to add since the last
                   soft commit before automatically triggering a new soft commit.
         maxTime - Maximum amount of time in ms that is allowed to pass
                   since a document was added before automatically
                   triggering a new soft commit.
      -->
     <autoSoftCommit>
       <maxTime>2000</maxTime>
     </autoSoftCommit>

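Enable soft commit (the default configuration has this node commented out). The maxTime value is how soon you want new documents to be searchable; I set it to 2 s here. Adjust it as needed: the smaller the value, the better the NRT effect, but the larger the performance cost. If the indexing request volume is not especially high, the value can be lowered, for example to 1000. Going below 1000 is not recommended, since it brings no real benefit.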

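With both settings in place (the autoCommit and autoSoftCommit blocks above), newly indexed documents can be found through the ordinary SearchHandler. The SearchHandler REST endpoints in Solr's default configuration are "/select", "/query", and "/browse".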

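Note that under very heavy indexing load, Solr cannot guarantee that the newest documents become searchable within the configured time; a delay on the order of seconds is common.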
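Because visibility after a soft commit is not instantaneous, it can help to measure how long a freshly indexed document actually takes to appear in "/select". The following is a minimal Python sketch of such a check, not a definitive implementation. The core URL `http://localhost:8983/solr/mycore`, the `id` field name, and the helper names (`select_url`, `wait_until_searchable`) are assumptions made for illustration; the `fetch` parameter exists only so the polling logic can be exercised without a running Solr.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed core URL; replace with your own host, port, and core name.
SOLR_CORE = "http://localhost:8983/solr/mycore"


def select_url(doc_id, core_url=SOLR_CORE):
    """Build a /select URL that searches for one document by its id field.

    Note: doc_id is wrapped in double quotes but not otherwise escaped,
    so ids that themselves contain quotes are not handled by this sketch.
    """
    params = {"q": 'id:"%s"' % doc_id, "wt": "json", "rows": 1}
    return core_url + "/select?" + urllib.parse.urlencode(params)


def http_get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_searchable(doc_id, timeout_s=5.0, interval_s=0.2,
                          fetch=http_get_json):
    """Poll /select until the document is visible or the timeout expires.

    Returns the elapsed time in seconds, or None if the document did not
    become searchable within timeout_s.
    """
    start = time.monotonic()
    while True:
        body = fetch(select_url(doc_id))
        if body["response"]["numFound"] > 0:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)
```

Calling `wait_until_searchable("doc1")` right after indexing `doc1` gives a rough idea of the real delay under your load, which can then be compared against the configured autoSoftCommit maxTime.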

Approach 2

solrconfig.xml must be configured. Two places need to be modified.

<autoCommit>
      <!-- maximum uncommitted docs before autocommit is triggered -->
      <maxDocs>10000</maxDocs>
      <!-- maximum time (in ms) after adding a doc before an autocommit is triggered -->
      <maxTime>15000</maxTime>
      <!-- Solr 4.0. Optionally don't open a searcher on hard commit. This is useful to
           minimize the size of transaction logs that keep track of uncommitted updates. -->
      <openSearcher>false</openSearcher>
</autoCommit>

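In addition, the RealTimeGetHandler must be configured in solrconfig.xml. According to the Solr documentation, this handler is exposed at "/get". It uses a dedicated component, RealTimeGetComponent, which calls SolrCore's getRealtimeSearcher() method; the lookup first checks the updateLog and then falls back to an ordinary search.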
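A lookup against "/get" can be sketched in Python as follows. This is an illustrative sketch under assumptions: the core URL and the helper names `realtime_get_url` and `realtime_get` are hypothetical, while `id` and `ids` are the request parameters the handler accepts for a single ID and a comma-separated list of IDs. As far as I know, real-time get also depends on the update log being enabled in solrconfig.xml.

```python
import json
import urllib.parse
import urllib.request

# Assumed core URL; replace with your own host, port, and core name.
SOLR_CORE = "http://localhost:8983/solr/mycore"


def realtime_get_url(doc_ids, core_url=SOLR_CORE):
    """Build a /get URL for one id (str) or several ids (iterable of str)."""
    if isinstance(doc_ids, str):
        params = {"id": doc_ids}
    else:
        # Several ids are passed as one comma-separated "ids" parameter.
        params = {"ids": ",".join(doc_ids)}
    params["wt"] = "json"
    return core_url + "/get?" + urllib.parse.urlencode(params)


def realtime_get(doc_id, core_url=SOLR_CORE):
    """Fetch a single document by id; returns None if it does not exist.

    For a single-id request the response carries the document under the
    "doc" key, which is null when no such document is found.
    """
    with urllib.request.urlopen(realtime_get_url(doc_id, core_url)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body.get("doc")
```

Unlike a "/select" query, `realtime_get("doc1")` should return the latest version of `doc1` even before any commit has happened, but it cannot answer anything other than an ID lookup.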

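Approach 3

The third way to use NRT still requires the autoCommit setting from Approach 1, and it again goes through the ordinary SearchHandler. It relies on Solr's commit and commitWithin parameters: after each document is indexed, the client explicitly sends a commit or commitWithin command, so the newly indexed data becomes searchable right away. Because the command has to travel over the network, the timing is less predictable, and overall speed is lower than with Approach 1. Here commit is a hard commit request, sent as commit=true; commitWithin is a soft commit request, sent as commitWithin=2000, which asks for the soft commit to complete within 2 s. As the earlier discussion shows, a soft commit is lighter than a hard commit, so of the two, commitWithin makes new documents searchable sooner. This approach is quite flexible and suits certain special cases.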
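The two request styles described above can be sketched in Python like this. It is a minimal sketch under assumptions: the core URL, the JSON document format posted to "/update", and the helper name `build_update_request` are illustrative choices, not something the original text specifies. Only `commit=true` and `commitWithin=2000` come from the text.

```python
import json
import urllib.parse
import urllib.request

# Assumed core URL; replace with your own host, port, and core name.
SOLR_CORE = "http://localhost:8983/solr/mycore"


def build_update_request(docs, commit_within_ms=None, hard_commit=False,
                         core_url=SOLR_CORE):
    """Build a POST request that indexes docs and asks Solr to commit.

    hard_commit=True      adds commit=true (hard commit right away).
    commit_within_ms=2000 adds commitWithin=2000 (commit within 2 s).
    """
    params = {}
    if hard_commit:
        params["commit"] = "true"
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)
    url = core_url + "/update"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request to Solr and return the decoded JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_update_request([{"id": "doc1"}], commit_within_ms=2000))` indexes one document and asks for it to be searchable within 2 s, while passing `hard_commit=True` instead requests an immediate hard commit.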

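In summary, NRT can be achieved by different means (including the workaround in Approach 3), but configuring soft commit as in Approach 1 is the best choice, which is exactly why the feature exists. In some special scenarios Approach 3 can still be used as needed, for example when indexing is frequent but search volume is very low.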

Date: 2024-10-25 21:44:59
