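Elasticsearch completion: suggesting only from a filtered subset of data with the context suggester

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/suggester-context.html

All demos below are based on elasticsearch 5.x and Python 3.x.

A recent project used the elasticsearch completion feature to complete searches on the author name (author) of every article (article). The article mapping looks roughly like this: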

ARTICLE = {
    'properties': {
        'id': {
            'type': 'integer',
            'index': 'not_analyzed',
        },
        'author': {
            'type': 'text',
        },
        'author_completion': {
            'type': 'completion',
        },
        'removed': {
            'type': 'boolean',
        }
    }
}

MAPPINGS = {
    'mappings': {
        'article': ARTICLE,
    }
}

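The requirement now is that articles that have been taken down (removed is True) must not show up in completion suggestions.

For the demo, first insert some data with the following code: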

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from elasticsearch.helpers import bulk
from elasticsearch import Elasticsearch

ES_HOSTS = [{'host': 'localhost', 'port': 9200}, ]

ES = Elasticsearch(hosts=ES_HOSTS)

INDEX = 'test_article'
TYPE = 'article'

ARTICLE = {
    'properties': {
        'id': {
            'type': 'integer',
            'index': 'not_analyzed',
        },
        'author': {
            'type': 'text',
        },
        'author_completion': {
            'type': 'completion',
        },
        'removed': {
            'type': 'boolean',
        }
    }
}

MAPPINGS = {
    'mappings': {
        'article': ARTICLE,
    }
}

def create_index():
    """
    Create the index before inserting data
    (any existing index with the same name is deleted first)
    """
    ES.indices.delete(index=INDEX, ignore=404)
    ES.indices.create(index=INDEX, body=MAPPINGS)

def insert_data():
    """
    Add the test data
    :return:
    """
    test_datas = [
        {
            'id': 1,
            'author': 'tom',
            'author_completion': 'tom',
            'removed': False
        },
        {
            'id': 2,
            'author': 'tom_cat',
            'author_completion': 'tom_cat',
            'removed': True
        },
        {
            'id': 3,
            'author': 'kitty',
            'author_completion': 'kitty',
            'removed': False
        },
        {
            'id': 4,
            'author': 'tomato',
            'author_completion': 'tomato',
            'removed': False
        },
    ]
    bulk_data = []
    for data in test_datas:
        action = {
            '_index': INDEX,
            '_type': TYPE,
            '_id': data.get('id'),
            '_source': data
        }
        bulk_data.append(action)

    success, failed = bulk(client=ES, actions=bulk_data, stats_only=True)

    print('success', success, 'failed', failed)

if __name__ == '__main__':
    create_index()
    insert_data()

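The 4 test documents are inserted successfully. Next, test fetching completion suggestions for author names with the following code: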

def get_suggestions(keywords):
    body = {
        # 'size': 0,  # size 0 would suppress the regular search hits (author, id, etc.); left out here so the test returns them
        '_source': 'suggest',
        'suggest': {
            'author_prefix_suggest': {
                'prefix': keywords,
                'completion': {
                    'field': 'author_completion',
                    'size': 10,
                }
            }
        },
        # For removed articles, I naively assumed that adding the filter below would be enough
        'query': {
            'term': {
                'removed': False
            }
        }
    }
    suggest_data = ES.search(index=INDEX, doc_type=TYPE, body=body)
    return suggest_data

if __name__ == '__main__':
    # create_index()
    # insert_data()

    suggestions = get_suggestions('t')
    print(suggestions)
    """
    suggestions = {
        'took': 0,
        'timed_out': False,
        '_shards': {
            'total': 5,
            'successful': 5,
            'skipped': 0,
            'failed': 0
        },
        'hits': {
            'total': 3,
            'max_score': 0.6931472,
            'hits': [
                {'_index': 'test_article', '_type': 'article', '_id': '4', '_score': 0.6931472,
                 '_source': {}},
                {'_index': 'test_article', '_type': 'article', '_id': '1', '_score': 0.2876821,
                 '_source': {}},
                {'_index': 'test_article', '_type': 'article', '_id': '3', '_score': 0.2876821,
                 '_source': {}}]},
        'suggest': {
            'author_prefix_suggest': [{'text': 't', 'offset': 0, 'length': 1, 'options': [
                {'text': 'tom', '_index': 'test_article', '_type': 'article', '_id': '1', '_score': 1.0,
                 '_source': {}},
                {'text': 'tom_cat', '_index': 'test_article', '_type': 'article', '_id': '2', '_score': 1.0,
                 '_source': {}},
                {'text': 'tomato', '_index': 'test_article', '_type': 'article', '_id': '4', '_score': 1.0,
                 '_source': {}}]}]
        }
    }
    """

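The result shows tom_cat, whose removed is True, sitting right there in the suggestions, even though the request contains

'query': {
    'term': {
        'removed': False
    }
}

It had no effect on the suggestions. The query only restricts the hits (hits.total is 3, without tom_cat); the completion suggester is evaluated separately and ignores it. Does elasticsearch really not support this requirement!? Surely not...

The documentation has the solution: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/suggester-context.html

With the cause identified, first rework the mapping and re-insert the test data as follows: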

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from elasticsearch.helpers import bulk
from elasticsearch import Elasticsearch

ES_HOSTS = [{'host': 'localhost', 'port': 9200}, ]

ES = Elasticsearch(hosts=ES_HOSTS)

INDEX = 'test_article'
TYPE = 'article'

ARTICLE = {
    'properties': {
        'id': {
            'type': 'integer',
            'index': 'not_analyzed'
        },
        'author': {
            'type': 'text',
        },
        'author_completion': {
            'type': 'completion',
            'contexts': [  # this is the key part
                {
                    'name': 'removed_tab',
                    'type': 'category',
                    'path': 'removed'
                }
            ]
        },
        'removed': {
            'type': 'boolean',
        }
    }
}

MAPPINGS = {
    'mappings': {
        'article': ARTICLE,
    }
}

def create_index():
    """
    Create the index before inserting data
    (any existing index with the same name is deleted first)
    """
    ES.indices.delete(index=INDEX, ignore=404)
    ES.indices.create(index=INDEX, body=MAPPINGS)

def insert_data():
    """
    Add the test data
    :return:
    """
    test_datas = [
        {
            'id': 1,
            'author': 'tom',
            'author_completion': 'tom',
            'removed': False
        },
        {
            'id': 2,
            'author': 'tom_cat',
            'author_completion': 'tom_cat',
            'removed': True
        },
        {
            'id': 3,
            'author': 'kitty',
            'author_completion': 'kitty',
            'removed': False
        },
        {
            'id': 4,
            'author': 'tomato',
            'author_completion': 'tomato',
            'removed': False
        },
    ]
    bulk_data = []
    for data in test_datas:
        action = {
            '_index': INDEX,
            '_type': TYPE,
            '_id': data.get('id'),
            '_source': data
        }
        bulk_data.append(action)

    success, failed = bulk(client=ES, actions=bulk_data, stats_only=True)

    print('success', success, 'failed', failed)

if __name__ == '__main__':
    create_index()
    insert_data()

Duang! An unexpected problem shows up:

elasticsearch.helpers.BulkIndexError: ('4 document(s) failed to index.', [{'index': {'_index': 'test_article', '_type': 'article', '_id': '1', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Failed to parse context field [removed], only keyword and text fields are accepted'}, 'data': {'id': 1, 'author': 'tom', 'author_completion': 'tom', 'removed': False}}}, {'index': {'_index': 'test_article', '_type': 'article', '_id': '2', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Failed to parse context field [removed], only keyword and text fields are accepted'}, 'data': {'id': 2, 'author': 'tom_cat', 'author_completion': 'tom_cat', 'removed': True}}}, {'index': {'_index': 'test_article', '_type': 'article', '_id': '3', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Failed to parse context field [removed], only keyword and text fields are accepted'}, 'data': {'id': 3, 'author': 'kitty', 'author_completion': 'kitty', 'removed': False}}}, {'index': {'_index': 'test_article', '_type': 'article', '_id': '4', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Failed to parse context field [removed], only keyword and text fields are accepted'}, 'data': {'id': 4, 'author': 'tomato', 'author_completion': 'tomato', 'removed': False}}}])

In other words, a context path only accepts keyword and text fields, while removed above is a boolean. Fine, rework the mapping once more and change removed to the keyword type; the test data then has to store removed as the strings 'True' and 'False'.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from elasticsearch.helpers import bulk
from elasticsearch import Elasticsearch

ES_HOSTS = [{'host': 'localhost', 'port': 9200}, ]

ES = Elasticsearch(hosts=ES_HOSTS)

INDEX = 'test_article'
TYPE = 'article'

ARTICLE = {
    'properties': {
        'id': {
            'type': 'integer',
            'index': 'not_analyzed'
        },
        'author': {
            'type': 'text',
        },
        'author_completion': {
            'type': 'completion',
            'contexts': [  # this is the key part
                {
                    'name': 'removed_tab',
                    'type': 'category',
                    'path': 'removed'
                }
            ]
        },
        'removed': {
            'type': 'keyword',
        }
    }
}

MAPPINGS = {
    'mappings': {
        'article': ARTICLE,
    }
}

def create_index():
    """
    Create the index before inserting data
    (any existing index with the same name is deleted first)
    """
    ES.indices.delete(index=INDEX, ignore=404)
    ES.indices.create(index=INDEX, body=MAPPINGS)

def insert_data():
    """
    Add the test data (removed is now stored as a string)
    :return:
    """
    test_datas = [
        {
            'id': 1,
            'author': 'tom',
            'author_completion': 'tom',
            'removed': 'False'
        },
        {
            'id': 2,
            'author': 'tom_cat',
            'author_completion': 'tom_cat',
            'removed': 'True'
        },
        {
            'id': 3,
            'author': 'kitty',
            'author_completion': 'kitty',
            'removed': 'False'
        },
        {
            'id': 4,
            'author': 'tomato',
            'author_completion': 'tomato',
            'removed': 'False'
        },
    ]
    bulk_data = []
    for data in test_datas:
        action = {
            '_index': INDEX,
            '_type': TYPE,
            '_id': data.get('id'),
            '_source': data
        }
        bulk_data.append(action)

    success, failed = bulk(client=ES, actions=bulk_data, stats_only=True)

    print('success', success, 'failed', failed)

if __name__ == '__main__':
    create_index()
    insert_data()

Mission success. A look at the index mapping confirms it is ok.
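Because the context path only accepts keyword or text values, a boolean flag coming from application code has to be turned into a string before indexing. The following is only a minimal sketch of that conversion; `to_context_value` and `build_actions` are hypothetical helper names that appear neither in the original script nor in the elasticsearch client, and the index and type defaults simply mirror the demo above.

```python
def to_context_value(flag):
    """Map a Python bool to the string stored in the keyword field (hypothetical helper)."""
    return 'True' if flag else 'False'


def build_actions(docs, index='test_article', doc_type='article'):
    """Build bulk actions whose 'removed' value is a string usable as a category context."""
    actions = []
    for doc in docs:
        source = dict(doc)  # copy so the caller's dict is left untouched
        source['removed'] = to_context_value(doc['removed'])
        actions.append({
            '_index': index,
            '_type': doc_type,
            '_id': doc['id'],
            '_source': source,
        })
    return actions
```

The returned list has the same shape as bulk_data in the script above, so it could be handed to bulk(client=ES, actions=...) in the same way.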

Next, fetch the completion suggestions:

def get_suggestions(keywords):
    body = {
        'size': 0,
        '_source': 'suggest',
        'suggest': {
            'author_prefix_suggest': {
                'prefix': keywords,
                'completion': {
                    'field': 'author_completion',
                    'size': 10,
                    'contexts': {
                        'removed_tab': ['False', ]  # only suggest entries whose removed is 'False'
                    }
                }
            }
        },
    }
    suggest_data = ES.search(index=INDEX, doc_type=TYPE, body=body)
    return suggest_data

if __name__ == '__main__':
    # create_index()
    # insert_data()
    suggestions = get_suggestions('t')
    print(suggestions)

    """
    suggestions = {
        'took': 0,
        'timed_out': False,
        '_shards': {
            'total': 5,
            'successful': 5,
            'skipped': 0, 'failed': 0
        },
        'hits': {
            'total': 0,
            'max_score': 0.0,
            'hits': []
        },
        'suggest': {
            'author_prefix_suggest': [
                {'text': 't', 'offset': 0, 'length': 1, 'options': [
                    {'text': 'tom', '_index': 'test_article', '_type': 'article', '_id': '1', '_score': 1.0,
                     '_source': {},
                     'contexts': {'removed_tab': ['False']}},
                    {'text': 'tomato', '_index': 'test_article', '_type': 'article', '_id': '4', '_score': 1.0,
                     '_source': {},
                     'contexts': {'removed_tab': ['False']}}]}]}}
    """

This time tom_cat, whose removed is 'True', has been filtered out. Done!
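In practice only the suggested texts are usually needed, not the whole response. As a rough sketch, assuming the response has the shape printed above, they could be pulled out like this; `extract_suggestion_texts` is a hypothetical helper and not part of the elasticsearch client.

```python
def extract_suggestion_texts(response, suggest_name='author_prefix_suggest'):
    """Collect the 'text' of every option returned for the named suggester."""
    texts = []
    # Each entry corresponds to one input text; each entry carries a list of options.
    for entry in response.get('suggest', {}).get(suggest_name, []):
        for option in entry.get('options', []):
            texts.append(option['text'])
    return texts
```

Called on the response shown above, this yields the author names tom and tomato; a response without a suggest section yields an empty list.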

Original article: https://www.cnblogs.com/ALXPS/p/10238312.html

Time: 2024-10-29 17:37:06
