Insert test data
PUT /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }
View the generated mapping:
GET /forum/_mapping/article
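Result (besides type, articleID also has a fields entry):
With type=text, two fields are created by default. One is the field itself (here articleID), which is analyzed into tokens. The other is field.keyword (here articleID.keyword), which is not analyzed; because of ignore_above: 256, values longer than 256 characters are not indexed in this sub-field.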
{
  "forum": {
    "mappings": {
      "article": {
        "properties": {
          "articleID": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "hidden": { "type": "boolean" },
          "postDate": { "type": "date" },
          "userID": { "type": "long" }
        }
      }
    }
  }
}
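The way one articleID value feeds both fields can be sketched in a few lines of Python. This is a toy model, not Elasticsearch code: the `analyze` and `index_terms` helpers are hypothetical names, and the regex split is only a rough stand-in for the default analyzer.

```python
import re

IGNORE_ABOVE = 256  # mirrors "ignore_above": 256 in the mapping above


def analyze(text):
    """Rough stand-in for the default analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def index_terms(value):
    """Return the terms each of the two fields would contribute for one value."""
    terms = {"articleID": analyze(value)}
    # The keyword sub-field keeps the whole string as a single term,
    # but skips values longer than ignore_above entirely.
    terms["articleID.keyword"] = [value] if len(value) <= IGNORE_ABOVE else []
    return terms


short = index_terms("XHDK-A-1293-#fJ3")
long_value = index_terms("ab " * 100)  # 300 characters, above the limit

print(short)
print(long_value["articleID.keyword"])  # [] : nothing indexed in the sub-field
```

In this model the long value still produces tokens for articleID, but contributes nothing to articleID.keyword, which is why a term lookup on the sub-field cannot find it.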
Exact-match query for userID 1
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "userID": "1" }
      },
      "boost": 1.2
    }
  }
}
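constant_score: wraps a filter and gives every matching document the same fixed score (equal to boost) instead of computing relevance.
query + constant_score + filter + term: exact-value lookup without relevance scoring.
term vs. terms: terms is the plural form of term, used as "terms": {"userID": ["1","2"]}. term matches exactly one value, while terms matches any of several values.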
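To make the three pieces concrete, here is a small Python sketch over the four test documents. It only imitates the behavior described above; `term_filter`, `terms_filter`, and `constant_score` are hypothetical helper names, and userID is compared as an integer here rather than as the string used in the request.

```python
DOCS = {
    1: {"articleID": "XHDK-A-1293-#fJ3", "userID": 1, "hidden": False, "postDate": "2017-01-01"},
    2: {"articleID": "KDKE-B-9947-#kL5", "userID": 1, "hidden": False, "postDate": "2017-01-02"},
    3: {"articleID": "JODL-X-1937-#pV7", "userID": 2, "hidden": False, "postDate": "2017-01-01"},
    4: {"articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": True, "postDate": "2017-01-02"},
}


def terms_filter(docs, field, values):
    """Keep documents whose field equals any of the given values."""
    wanted = set(values)
    return [doc_id for doc_id, doc in docs.items() if doc[field] in wanted]


def term_filter(docs, field, value):
    """term is the single-value case of terms."""
    return terms_filter(docs, field, [value])


def constant_score(doc_ids, boost=1.0):
    """Assign the same score to every match; no relevance is computed."""
    return {doc_id: boost for doc_id in doc_ids}


hits = constant_score(term_filter(DOCS, "userID", 1), boost=1.2)
both = terms_filter(DOCS, "userID", [1, 2])

print(hits)  # {1: 1.2, 2: 1.2}
print(both)  # [1, 2, 3, 4]
```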
Query by articleID
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "articleID": "XHDK-A-1293-#fJ3" }
      },
      "boost": 1.2
    }
  }
}
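Quoted explanation: the result is empty. articleID.keyword is a field that newer ES versions create automatically, and it is not analyzed. So when an articleID value comes in, it is indexed twice. Once as the field itself (articleID), which is analyzed, with the resulting tokens placed in the inverted index. And once as articleID.keyword, which is not analyzed: the complete string goes into the inverted index as a single term, provided it is at most 256 characters long.
So when applying a term filter to a text field, consider matching against the built-in field.keyword. One caveat: by default only values of up to 256 characters are indexed there, so it is still preferable to define the mapping yourself and mark the field as not_analyzed. In newer ES versions not_analyzed is not needed; setting type=keyword is enough.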
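As an illustration of defining the mapping by hand, the snippet below builds a request body that maps articleID directly as keyword and prints it in console form. It is a sketch under assumptions: the typed layout with the `article` mapping type matches the requests in this post and only fits ES versions that still accept mapping types, and `mapping_body` is just a name chosen here.

```python
import json

# Request body for creating the index with articleID mapped as keyword.
# Assumption: a typed mapping ("article"), as used elsewhere in this post.
mapping_body = {
    "mappings": {
        "article": {
            "properties": {
                "articleID": {"type": "keyword"}
            }
        }
    }
}

# Print the request in the same console style as the other examples.
print("PUT /forum")
print(json.dumps(mapping_body, indent=2))
```

Since the type of an existing field cannot be changed in place, a body like this would be sent when the index is first created, before any documents are inserted.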
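My own understanding: term is an exact lookup, so it searches for the literal string XHDK-A-1293-#fJ3.
The catch is that when the index is built, text fields are analyzed by default before being indexed, so the full string is not found.
The keyword field, however, is indexed without analysis, so the same lookup against it succeeds.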
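The difference between the two lookups can be reproduced with a toy inverted index in Python. This is a simplified model, not how Elasticsearch stores its index; `analyze`, `term_lookup`, and the two index dictionaries are hypothetical names, and the regex split only approximates the default analyzer.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for the default analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


ARTICLE_IDS = {
    1: "XHDK-A-1293-#fJ3",
    2: "KDKE-B-9947-#kL5",
    3: "JODL-X-1937-#pV7",
    4: "QQPX-R-3956-#aD8",
}

text_index = defaultdict(set)     # models articleID (analyzed)
keyword_index = defaultdict(set)  # models articleID.keyword (not analyzed)

for doc_id, article_id in ARTICLE_IDS.items():
    for token in analyze(article_id):
        text_index[token].add(doc_id)
    keyword_index[article_id].add(doc_id)


def term_lookup(index, term):
    """A term lookup uses the search string as-is, without analyzing it."""
    return sorted(index.get(term, set()))


miss = term_lookup(text_index, "XHDK-A-1293-#fJ3")            # no such token
token_hit = term_lookup(text_index, "xhdk")                   # a single token matches
keyword_hit = term_lookup(keyword_index, "XHDK-A-1293-#fJ3")  # whole string matches

print(miss, token_hit, keyword_hit)  # [] [1] [1]
```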
Solution:
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "articleID.keyword": "XHDK-A-1293-#fJ3" }
      },
      "boost": 1.2
    }
  }
}
For a deeper understanding, look at the tokens that analysis produces for this value:
GET /forum/_analyze
{
  "field": "articleID",
  "text": "XHDK-A-1293-#fJ3"
}
Result:
{
  "tokens": [
    { "token": "xhdk", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 0 },
    { "token": "a", "start_offset": 5, "end_offset": 6, "type": "<ALPHANUM>", "position": 1 },
    { "token": "1293", "start_offset": 7, "end_offset": 11, "type": "<NUM>", "position": 2 },
    { "token": "fj3", "start_offset": 13, "end_offset": 16, "type": "<ALPHANUM>", "position": 3 }
  ]
}
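The tokens and offsets in this result can be approximated with a short Python function. It is only an approximation for this particular input: the real analyzer follows Unicode text segmentation rules rather than a simple regex, and `analyze_with_offsets` is a hypothetical name. The type values (`<ALPHANUM>`, `<NUM>`) are not modeled.

```python
import re


def analyze_with_offsets(text):
    """Split on non-alphanumerics, lowercase, and record offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),  # exclusive, like the _analyze output
            "position": position,
        })
    return tokens


tokens = analyze_with_offsets("XHDK-A-1293-#fJ3")
for tok in tokens:
    print(tok)
```

The hyphens and the # character never appear in any token, and every token is lowercased, which is why the original mixed-case string cannot match a term lookup on the analyzed field.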
Reference: https://www.jianshu.com/p/e1430282378d
Original post: https://www.cnblogs.com/parent-absent-son/p/11063765.html