[ElasticSearch] How to Use the ik Chinese Tokenizer and the stconvert Traditional/Simplified Conversion Plugin

I. Environment Setup

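Installing ElasticSearch (ES below) is already fairly simple, but plugins that need configuration are still a bit of a hassle for newcomers. Fortunately, medcl provides ES-rtf, a preconfigured ES distribution, so newcomers do not have to waste time on parameter configuration.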

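The elasticsearch-rtf repository on GitHub includes detailed usage instructions, so I won't repeat them here. (P.S. The ansj and string2int plugins require Redis; if you don't need them, you can remove these two plugins.)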

II. Using the ik Chinese Tokenizer Plugin

The syntax below is demonstrated in ES Sense, using the example from the ik Chinese tokenizer plugin:

1. Create an index named myindex

PUT http://localhost:9200/myindex
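
The requests in this post are written for Sense. As a rough sketch, the same call could also be prepared from a script with Python's standard library. `ES_URL` and the `build_request` helper are hypothetical names introduced for illustration; the request is only built here and is sent only if the commented lines are enabled against a running node.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node, as in this post


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the ES REST API."""
    data = None
    headers = {}
    if body is not None:
        # ensure_ascii=False keeps Chinese text readable in the payload
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        ES_URL + path, data=data, headers=headers, method=method
    )


create_index = build_request("PUT", "/myindex")
print(create_index.get_method(), create_index.full_url)

# To actually send it (requires a running node):
# with urllib.request.urlopen(create_index) as resp:
#     print(resp.read().decode("utf-8"))
```

The same helper can carry a JSON body, so the later steps (mapping, documents, search) can reuse it by passing a Python dict as `body`.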

2. Create a mapping for the type named fulltext. If fulltext already contains data, ES will report an error.

POST http://localhost:9200/myindex/fulltext/_mapping
{
    "fulltext": {
        "_all": {
            "indexAnalyzer": "ik",
            "searchAnalyzer": "ik",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "content": {
                "type": "string",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "indexAnalyzer": "ik",
                "searchAnalyzer": "ik",
                "include_in_all": "true",
                "boost": 8
            }
        }
    }
}

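3. Next, we can add some data under /myindex/fulltext. Here the content key from the website's example is written as "content", with double quotes. Running it unquoted in Sense works without problems, but Play Framework has trouble parsing the JSON, so it is best to add the double quotes.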

POST http://localhost:9200/myindex/fulltext/1
{"content":"美国留给伊拉克的是个烂摊子吗"}

POST http://localhost:9200/myindex/fulltext/2
{"content":"公安部：各地校车将享最高路权"}

POST http://localhost:9200/myindex/fulltext/3
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}

POST http://localhost:9200/myindex/fulltext/4
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
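
The quoting remark in step 3 can be checked with any strict JSON parser. The sketch below uses Python's `json` module as a stand-in (it is not Play Framework, and `is_strict_json` is a hypothetical helper) to show that a body with an unquoted key is rejected, while the quoted form parses.

```python
import json

quoted = '{"content": "美国留给伊拉克的是个烂摊子吗"}'
unquoted = '{content: "美国留给伊拉克的是个烂摊子吗"}'


def is_strict_json(text):
    """Return True if a standards-compliant parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_strict_json(quoted))    # True
print(is_strict_json(unquoted))  # False
```

Sense is more lenient about the request body, which is why the unquoted form appears to work there.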

4. Next, search using the keyword "中国"

POST http://localhost:9200/myindex/fulltext/_search
{
    "query" : { "term" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}

The search results are shown on the ik GitHub page, so they are not repeated here.
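
When the search is issued from a program, the highlighted fragments have to be pulled out of the response. A minimal sketch of that step follows. `extract_highlights` is a hypothetical helper, and `sample_response` is hand-written to mimic the general shape of an ES search response (`hits.hits[].highlight`); it is not output captured from a real cluster.

```python
def extract_highlights(response, field="content"):
    """Collect (doc id, highlighted fragments) pairs from a search response."""
    results = []
    for hit in response.get("hits", {}).get("hits", []):
        fragments = hit.get("highlight", {}).get(field, [])
        results.append((hit["_id"], fragments))
    return results


# Hand-written sample in the general shape of an ES search response.
# It is NOT captured output; real responses carry more fields (scores, etc.).
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "4",
                "_source": {"content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"},
                "highlight": {
                    "content": ["<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"]
                },
            }
        ]
    }
}

for doc_id, fragments in extract_highlights(sample_response):
    print(doc_id, fragments)
```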

5. How can a query for the Traditional Chinese "中國" return the same search results?

GET /myindex/fulltext/_search?pretty
{
    "query": {
        "match": {
            "content": {
                "analyzer": "t2s_convert",
                "query": "中國"
            }
        }
    },
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}

Here the analyzer setting "t2s_convert" tells the query to use the t2s_convert analyzer (Traditional-to-Simplified conversion).

This way, queries written in Traditional Chinese can search data stored in Simplified Chinese.
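
To illustrate why the conversion makes the query match, here is a toy sketch of Traditional-to-Simplified conversion applied to the query text. The `T2S_TABLE` mapping and the `t2s` function are assumptions made up for this example; the stconvert plugin uses its own, far larger dictionary and runs inside ES, not in client code.

```python
# Toy Traditional -> Simplified table; the real plugin ships a far larger
# dictionary. These few entries are for illustration only.
T2S_TABLE = str.maketrans({"國": "国", "車": "车", "漁": "渔", "館": "馆"})


def t2s(text):
    """Convert Traditional characters to Simplified using the toy table."""
    return text.translate(T2S_TABLE)


indexed_term = "中国"  # what the Simplified documents contain
user_query = "中國"    # what a Traditional Chinese user types

print(t2s(user_query) == indexed_term)  # True
print(user_query == indexed_term)       # False: no match without conversion
```

Because the conversion happens at query time, the Simplified documents indexed earlier do not need to be reindexed.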

Time: 2024-10-07 08:42:00
