Elasticsearch custom analyzers
Installing the pinyin and IK analysis plugins
Pinyin plugin: https://github.com/medcl/elasticsearch-analysis-pinyin/releases
IK plugin: https://github.com/medcl/elasticsearch-analysis-ik/releases
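If you download the source code, you have to build the plugin with Maven first.
If you download a prebuilt archive, unzip it directly into the plugins folder under the Elasticsearch installation directory (the extracted folder can be renamed). Pick the release whose version matches your Elasticsearch version, then restart Elasticsearch so the plugins are loaded.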
1. Setting up the analyzers directly in ES
Create the index and add the analysis settings:
PUT myindex
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "ik_pinyin_analyzer": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["pinyin_filter"]
          }
        },
        "filter": {
          "pinyin_filter": {
            "type": "pinyin",
            "keep_separate_first_letter": false,
            "keep_full_pinyin": true,
            "keep_original": false,
            "limit_first_letter_length": 10,
            "lowercase": true,
            "remove_duplicated_term": true
          }
        }
      }
    }
  }
}
Then add the fields by setting the mapping:
PUT myindex/_mapping/users
{
  "properties": {
    "uname": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin": {
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    },
    "age": {
      "type": "integer"
    }
  }
}
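The two console requests above can also be sent without Kibana. The following is a minimal sketch that uses the JDK's java.net.http client (Java 11+). It assumes an unsecured node at http://localhost:9200, and the class and method names are hypothetical. It sets the Content-Type header because Elasticsearch 6.0 and later reject request bodies that lack one. The typed path myindex/_mapping/users is the 6.x form used in this article; later versions drop the type from the path.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper that sends Kibana-console-style requests from plain Java (JDK 11+).
final class EsConsoleRequests {

    // Assumption: an unsecured Elasticsearch node on the default HTTP port.
    static final String BASE_URL = "http://localhost:9200";

    // Builds the equivalent of a console line "PUT <path>" followed by a JSON body.
    static HttpRequest put(String baseUrl, String path, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + path))
                // Elasticsearch 6.0+ rejects request bodies without an explicit content type.
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Sends the request and returns "<status> <body>"; this needs a running node.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }
}
```

To create the index, call `send(put(BASE_URL, "myindex", settingsJson))`. To add the mapping, call `send(put(BASE_URL, "myindex/_mapping/users", mappingJson))`. Here settingsJson and mappingJson are hypothetical strings that hold the two request bodies above.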
2. Setting up the analyzers with Spring Data Elasticsearch
Create the entity class:
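import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Mapping;
import org.springframework.data.elasticsearch.annotations.Setting;

@Mapping(mappingPath = "elasticsearch_mapping.json") // load the mapping from this file
@Setting(settingPath = "elasticsearch_setting.json") // load the index settings from this file
@Document(indexName = "myindex", type = "users")
public class User {
    @Id
    private Integer id;

    // @Field(type = FieldType.Keyword, analyzer = "pinyin_analyzer", searchAnalyzer = "pinyin_analyzer") // had no effect
    private String name1;

    @Field(type = FieldType.Keyword)
    private String userName;

    @Field(type = FieldType.Nested)
    private List<Product> products;
}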
Create an elasticsearch_mapping.json file under resources:
{
  "properties": {
    "uname": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin": {
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    },
    "age": {
      "type": "integer"
    }
  }
}
Create an elasticsearch_setting.json file under resources:
{
  "index": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["pinyin_filter"]
        }
      },
      "filter": {
        "pinyin_filter": {
          "type": "pinyin",
          // true: emit the joined first letters
          "keep_first_letter": true,
          // false: do not emit each first letter as a separate term
          "keep_separate_first_letter": false,
          // true: emit the full pinyin of each character
          "keep_full_pinyin": true,
          "keep_original": false,
          // maximum length of the first-letter term
          "limit_first_letter_length": 10,
          // lowercase non-Chinese letters
          "lowercase": true,
          // remove duplicate terms
          "remove_duplicated_term": true
        }
      }
    }
  }
}
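The next block is a toy Java model of what each pinyin_filter option adds for a single token. It is only an illustration under stated assumptions, not the plugin's code. The four-entry pinyin table is hypothetical, and the term order is this sketch's own. Characters missing from the table are treated as one non-Chinese run. The option semantics follow the plugin's documented examples (刘德华 gives liu, de, hua for keep_full_pinyin and ldh for keep_first_letter).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of the pinyin token filter options; NOT the plugin's implementation.
final class PinyinFilterSketch {

    // A subset of the pinyin_filter options from elasticsearch_setting.json.
    static final class Options {
        final boolean keepFirstLetter, keepSeparateFirstLetter, keepFullPinyin, keepOriginal;
        final int limitFirstLetterLength;
        final boolean lowercase, removeDuplicatedTerm;

        Options(boolean keepFirstLetter, boolean keepSeparateFirstLetter, boolean keepFullPinyin,
                boolean keepOriginal, int limitFirstLetterLength, boolean lowercase,
                boolean removeDuplicatedTerm) {
            this.keepFirstLetter = keepFirstLetter;
            this.keepSeparateFirstLetter = keepSeparateFirstLetter;
            this.keepFullPinyin = keepFullPinyin;
            this.keepOriginal = keepOriginal;
            this.limitFirstLetterLength = limitFirstLetterLength;
            this.lowercase = lowercase;
            this.removeDuplicatedTerm = removeDuplicatedTerm;
        }
    }

    // The values used in elasticsearch_setting.json.
    static final Options FROM_SETTINGS = new Options(true, false, true, false, 10, true, true);

    // Hypothetical four-entry table: U+5218 liu, U+5FB7 de, U+534E hua, U+7684 de.
    // Unicode escapes keep this file ASCII-only; the real plugin ships a full dictionary.
    static final Map<Character, String> PINYIN = Map.of(
            '\u5218', "liu", '\u5fb7', "de", '\u534e', "hua", '\u7684', "de");

    static List<String> filter(String token, Options o) {
        List<String> terms = new ArrayList<>();
        StringBuilder firstLetters = new StringBuilder();
        StringBuilder other = new StringBuilder(); // pending run of characters not in the table
        for (char c : token.toCharArray()) {
            String py = PINYIN.get(c);
            if (py == null) {
                other.append(c);
                continue;
            }
            flushOther(other, terms, firstLetters, o);
            if (o.keepFullPinyin) terms.add(py);                          // e.g. liu, de, hua
            if (o.keepSeparateFirstLetter) terms.add(py.substring(0, 1)); // e.g. l, d, h
            firstLetters.append(py.charAt(0));
        }
        flushOther(other, terms, firstLetters, o);
        if (o.keepFirstLetter && firstLetters.length() > 0) {             // e.g. ldh, truncated to the limit
            int n = Math.min(firstLetters.length(), o.limitFirstLetterLength);
            terms.add(firstLetters.substring(0, n));
        }
        if (o.keepOriginal) terms.add(token);
        // LinkedHashSet drops repeated terms while keeping first-seen order.
        return o.removeDuplicatedTerm ? new ArrayList<>(new LinkedHashSet<>(terms)) : terms;
    }

    // Emits a buffered non-Chinese run as one term and appends it to the first-letter string.
    private static void flushOther(StringBuilder other, List<String> terms,
                                   StringBuilder firstLetters, Options o) {
        if (other.length() == 0) return;
        String s = o.lowercase ? other.toString().toLowerCase(Locale.ROOT) : other.toString();
        terms.add(s);
        firstLetters.append(s);
        other.setLength(0);
    }
}
```

With FROM_SETTINGS, the token 刘德华 yields [liu, de, hua, ldh], and 刘德华AT2016 yields [liu, de, hua, at2016, ldhat2016]. Because keep_original is false, the my_pinyin subfield holds only pinyin terms. The Chinese text itself remains searchable through the main uname field, which the mapping analyzes with ik_smart.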
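- ik_max_word: splits the text at the finest granularity and exhausts every possible combination. For example, 中华人民共和国国歌 ("National Anthem of the People's Republic of China") is split into 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌;
- ik_smart: splits the text at the coarsest granularity. For example, 中华人民共和国国歌 is split into 中华人民共和国 and 国歌.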
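The difference between the two modes can be illustrated with a toy dictionary segmenter. This is not IK's algorithm: IK has its own dictionary and ambiguity handling, and ik_max_word also emits single characters and overlapping fragments that this toy misses. The sketch only contrasts two approaches. The first lists every dictionary word at every position (fine-grained, in the spirit of ik_max_word). The second uses greedy forward maximum matching (coarse-grained, in the spirit of ik_smart). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy contrast between fine-grained and coarse-grained dictionary segmentation (not IK).
final class SegmentationSketch {

    // Fine-grained: every dictionary word found at every start position, longest first.
    static List<String> allWords(String text, Set<String> dict, int maxLen) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int len = Math.min(maxLen, text.length() - i); len >= 1; len--) {
                String w = text.substring(i, i + len);
                if (dict.contains(w)) out.add(w);
            }
        }
        return out;
    }

    // Coarse-grained: forward maximum matching. Take the longest dictionary word at the
    // current position and jump past it; an unmatched character becomes a one-character token.
    static List<String> forwardMaxMatch(String text, Set<String> dict, int maxLen) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxLen, text.length() - i);
            while (len > 1 && !dict.contains(text.substring(i, i + len))) len--;
            out.add(text.substring(i, i + len));
            i += len;
        }
        return out;
    }
}
```

Consider a hypothetical dictionary {中华人民共和国, 中华, 人民, 国歌} with maxLen 7. On 中华人民共和国国歌, forwardMaxMatch returns [中华人民共和国, 国歌], matching the ik_smart example. On the same text, allWords returns [中华人民共和国, 中华, 人民, 国歌].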
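After the application starts, the analyzers have not actually been applied to the index. Once the entity class is in place, add the following call so that the created index picks up the analyzers: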
elasticsearchTemplate.putMapping(User.class);
Original article: https://www.cnblogs.com/double-yuan/p/9742567.html
Date: 2024-12-23 23:25:17