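Lucene provides three highlighters: highlighter, fast-vector-highlighter, and postings-highlighter.
- highlighter: the most basic, default highlighter. It has to re-analyze the _source text of each hit at highlight time, so it is the slowest of the three, but it needs no extra index storage.
- postings-highlighter: requires "index_options" : "offsets" in the field mapping. Its speed is in the middle, but in queries that include a phrase query it highlights each word of the phrase separately.
- fast-vector-highlighter: requires "term_vector" : "with_positions_offsets" in the field mapping. It is the fastest but takes the most storage, a typical space-for-speed trade-off.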
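The mapping requirements in the list above can be summarized in a small lookup. This is only a sketch: `REQUIRED_OPTION` and `field_mapping` are hypothetical helper names, and the keys "plain" and "postings" are assumed to be the values of the highlight "type" option that go with "fvh".

```python
import json

# Extra field-mapping option each highlighter needs, as stated in the list above.
# The key names are assumptions for the highlight "type" option.
REQUIRED_OPTION = {
    "plain": {},  # re-analyzes the text at highlight time, nothing extra stored
    "postings": {"index_options": "offsets"},
    "fvh": {"term_vector": "with_positions_offsets"},
}


def field_mapping(highlighter):
    """Return a string-field mapping prepared for the given highlighter (hypothetical helper)."""
    if highlighter not in REQUIRED_OPTION:
        raise ValueError("unknown highlighter: %r" % highlighter)
    mapping = {"type": "string", "store": "yes"}
    mapping.update(REQUIRED_OPTION[highlighter])
    return mapping


if __name__ == "__main__":
    # Print the mapping fragment for each highlighter.
    for name in ("plain", "postings", "fvh"):
        print(name, json.dumps(field_mapping(name), sort_keys=True))
```

The returned dictionaries have the same shape as the per-field entries under "properties" in an index mapping.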
Test
- mapping
curl -XPUT 'localhost:9200/hl-test' -d '{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "tm": {
      "properties": {
        "content1": {
          "type": "string",
          "analyzer": "default",
          "store": "yes",
          "term_vector": "with_positions_offsets"
        },
        "content2": {
          "type": "string",
          "analyzer": "default",
          "store": "yes",
          "index_options": "offsets"
        },
        "content3": {
          "type": "string",
          "store": "yes",
          "analyzer": "default"
        },
        "content4": {
          "type": "string",
          "store": "yes",
          "index": "not_analyzed"
        }
      }
    }
  }
}'
Note
offsets: Store docs, freqs, positions, and the start and end character offsets of each term in the original string. This information is used by the postings highlighter but is disabled by default.
Source: https://www.elastic.co/guide/en/elasticsearch/guide/current/stopwords-phrases.html#index-options
- Test data
curl -XPUT 'http://localhost:9200/hl-test/tm/1' -d '{
  "content1": "In the above case, the content field will be highlighted for each search hit (there will be another element in each search hit, called highlight, which includes the highlighted fields and the highlighted fragments)."
}'
- Query DSL
{
  "query": {
    "term": {
      "content1": "the"
    }
  },
  "highlight": {
    "pre_tags": [
      "<tag1>"
    ],
    "post_tags": [
      "</tag1>"
    ],
    "fields": {
      "content1": {
        "type": "fvh",
        "fragment_size": 30,
        "number_of_fragments": 1,
        "force_source": true,
        "order": "score",
        "fragment_offset": 3,
        "no_match_size": 2
      },
      "content2": {
        "fragment_size": 250,
        "number_of_fragments": 0
      },
      "content3": {
        "fragment_size": 250,
        "number_of_fragments": 3,
        "force_source": true
      }
    }
  }
}
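To try the query above against the test index, a script along these lines might help. It is a minimal sketch under assumptions: the search URL http://localhost:9200/hl-test/tm/_search is not given in the text, and `build_highlight_query`, `search`, and `highlighted_fragments` are hypothetical helper names. The reply is assumed to carry a "highlight" element in each search hit, as the test document's own text describes.

```python
import json
import urllib.request


def build_highlight_query(field, term, highlighter=None,
                          fragment_size=30, number_of_fragments=1):
    """Build a term query with highlighting on one field (hypothetical helper)."""
    options = {
        "fragment_size": fragment_size,
        "number_of_fragments": number_of_fragments,
    }
    if highlighter is not None:
        options["type"] = highlighter  # e.g. "fvh", as in the query above
    return {
        "query": {"term": {field: term}},
        "highlight": {
            "pre_tags": ["<tag1>"],
            "post_tags": ["</tag1>"],
            "fields": {field: options},
        },
    }


def search(body, url="http://localhost:9200/hl-test/tm/_search"):
    """POST the body to the (assumed) search endpoint and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def highlighted_fragments(reply):
    """Collect the highlight fragments of every hit, keyed by field name."""
    fragments = {}
    for hit in reply.get("hits", {}).get("hits", []):
        for field, parts in hit.get("highlight", {}).items():
            fragments.setdefault(field, []).extend(parts)
    return fragments
```

With a node running and the test document indexed, `highlighted_fragments(search(build_highlight_query("content1", "the", "fvh")))` would return the fragments for content1. Nothing is sent over the network until `search` is called.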
Time: 2024-10-29 19:08:02