Solr feature: Schemaless Mode (automatically adding fields to the schema)

Wiki: https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode

Introduction:

Schemaless Mode is a set of Solr features that, when used together, allow users to rapidly construct an effective schema by simply indexing sample data, without having to manually edit the schema. These Solr features, all specified in solrconfig.xml, are:

  1. Managed schema: Schema modifications are made through Solr APIs rather than manual edits - see Managed Schema Definition in SolrConfig.
  2. Field value class guessing: Previously unseen fields are run through a cascading set of value-based parsers, which guess the Java class of field values - parsers for Boolean, Integer, Long, Float, Double, and Date are currently available.
  3. Automatic schema field addition, based on field value class(es): Previously unseen fields are added to the schema, based on field value Java classes, which are mapped to schema field types - see Solr Field Types.
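
The cascading idea in item 2 can be pictured with a small sketch. This is not Solr's implementation: the function name guess_value and the short DATE_FORMATS list are made up for illustration, and the literals accepted here follow Python's parsing rules rather than Solr's. It only shows the "try each parser in order, keep the first that succeeds" pattern.

```python
from datetime import datetime

# Hypothetical, shortened list of date patterns tried in order.
DATE_FORMATS = ["%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def guess_value(raw: str):
    """Return raw converted by the first parser that accepts it, else raw unchanged."""
    text = raw.strip()

    # 1. Boolean
    if text.lower() in ("true", "false"):
        return text.lower() == "true"

    # 2. Integer / Long
    try:
        return int(text)
    except ValueError:
        pass

    # 3. Float / Double
    try:
        return float(text)
    except ValueError:
        pass

    # 4. Date: the first matching pattern wins
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue

    # No parser matched: keep the value as a plain string.
    return raw
```

The order matters: "42" would also parse as a float, so the integer parser has to run first for the value to stay an integer.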

Configuration:

1. Enable Managed Schema

As described in the section Managed Schema Definition in SolrConfig, changing the schemaFactory will allow the schema to be modified by the Schema API. Your solrconfig.xml should have a section like the one below (and the ClassicIndexSchemaFactory should be commented out or removed).

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

2. Define an UpdateRequestProcessorChain

The UpdateRequestProcessorChain allows Solr to guess field types, and you can define the default field type classes to use. To start, you should define it as follows (see the javadoc links below for update processor factory documentation):

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <!-- UUIDUpdateProcessorFactory will generate an id if none is present in the incoming document -->
  <processor class="solr.UUIDUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </processor>
  <processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss</str>
      <str>yyyy-MM-dd'T'HH:mmZ</str>
      <str>yyyy-MM-dd'T'HH:mm</str>
      <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd HH:mm:ssZ</str>
      <str>yyyy-MM-dd HH:mm:ss</str>
      <str>yyyy-MM-dd HH:mmZ</str>
      <str>yyyy-MM-dd HH:mm</str>
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="defaultFieldType">strings</str>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Boolean</str>
      <str name="fieldType">booleans</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.util.Date</str>
      <str name="fieldType">tdates</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Long</str>
      <str name="valueClass">java.lang.Integer</str>
      <str name="fieldType">tlongs</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Number</str>
      <str name="fieldType">tdoubles</str>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
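
To make two steps of this chain more concrete, here is a rough Python sketch of what the field-name replacement and the typeMapping lookup amount to. It is an illustration only, not Solr code: the helper names sanitize_field_name and field_type_for are invented, Python types stand in for the Java value classes, and the regex is written as [^\w\-.] with re.ASCII because Python's re module rejects the unescaped hyphen and because the sketch assumes the Java pattern uses its default ASCII-only \w.

```python
import re
from datetime import datetime

# Python spelling of the chain's pattern [^\w-\.] (hyphen escaped for Python,
# re.ASCII assumed to mirror Java's default \w).
INVALID_CHARS = re.compile(r"[^\w\-.]", re.ASCII)


def sanitize_field_name(name: str) -> str:
    """Replace every character outside [A-Za-z0-9_], '-' and '.' with '_'."""
    return INVALID_CHARS.sub("_", name)


def field_type_for(value) -> str:
    """Map a parsed value to a field type name, mirroring the typeMapping entries."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "booleans"
    if isinstance(value, datetime):
        return "tdates"
    if isinstance(value, int):
        return "tlongs"
    if isinstance(value, float):
        return "tdoubles"
    # Corresponds to defaultFieldType in the configuration above.
    return "strings"
```

With these helpers, an incoming field called "first name" would be stored as "first_name", and a value that stayed an unparsed string would fall through to the strings type.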

3. Make the UpdateRequestProcessorChain the Default for the UpdateRequestHandler

Once the UpdateRequestProcessorChain has been defined, you must instruct your UpdateRequestHandlers to use it when working with index updates (i.e., adding, removing, replacing documents). Here is an example using InitParams to set the defaults on all /update request handlers:

<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</initParams>
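
Once the chain is the default for /update, a quick way to try it out is to post a document with fields the schema has never seen and then list the schema's fields. The sketch below uses only the Python standard library. The base URL, the collection name gettingstarted, and the sample field names are assumptions, and demo() needs a running Solr instance, so it is defined but not called.

```python
import json
from urllib.request import Request, urlopen

# Assumed location of a local Solr collection; adjust to your setup.
SOLR_BASE = "http://localhost:8983/solr/gettingstarted"


def build_update_request(docs, base_url=SOLR_BASE):
    """Build a POST request that sends docs as JSON to the /update handler."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base_url}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_schema_fields(base_url=SOLR_BASE):
    """Return the field names reported by the Schema API."""
    with urlopen(f"{base_url}/schema/fields?wt=json") as resp:
        return [field["name"] for field in json.load(resp)["fields"]]


def demo():
    """Index one document with unknown fields, then print the schema's fields."""
    # Field names and values are made up for illustration.
    request = build_update_request([{"id": "1", "Sold": "true", "Price": "19.95"}])
    with urlopen(request) as resp:
        print(resp.status)
    print(list_schema_fields())
```

If the configuration above is in place, calling demo() should show the new field names in the output of list_schema_fields().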

Time: 2024-10-27 00:08:42
