Analyze

Performs the analysis process on a text and returns the token breakdown of the text.

It can be used without specifying an index, against one of the many built-in analyzers:

GET _analyze
{
  "analyzer" : "standard",
  "text" : "this is a test"
}
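
The token breakdown comes back as a JSON object with a tokens array. The following is a minimal Python sketch of reading it, assuming a response of the shape shown; the sample is hand-written for illustration rather than captured from a live cluster, and token_texts is a hypothetical helper name:

```python
import json

# Hand-written sample in the shape the analyze API returns for the
# request above; the values are illustrative, not captured output.
SAMPLE_RESPONSE = """
{
  "tokens": [
    {"token": "this", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 0},
    {"token": "is", "start_offset": 5, "end_offset": 7, "type": "<ALPHANUM>", "position": 1},
    {"token": "a", "start_offset": 8, "end_offset": 9, "type": "<ALPHANUM>", "position": 2},
    {"token": "test", "start_offset": 10, "end_offset": 14, "type": "<ALPHANUM>", "position": 3}
  ]
}
"""


def token_texts(response_body):
    """Return the token strings from an analyze response, in order."""
    parsed = json.loads(response_body)
    return [entry["token"] for entry in parsed.get("tokens", [])]


print(token_texts(SAMPLE_RESPONSE))
```

Each entry also carries character offsets and a position, which can help when checking how a given analyzer splits the input.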

If the text parameter is provided as an array of strings, it is analyzed as a multi-valued field.

GET _analyze
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}
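
From a script, the same request might be built with Python's standard library. This is a sketch only: the base URL http://localhost:9200 is an assumption about where the node listens, and build_analyze_request is a hypothetical helper name. The request is built but not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="standard",
                          base_url="http://localhost:9200"):
    """Build (but do not send) a GET _analyze request with a JSON body.

    `text` may be a single string or a list of strings; a list is
    analyzed as a multi-valued field. `base_url` is an assumed address.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


request = build_analyze_request(["this is a test", "the second text"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it against a running node: urllib.request.urlopen(request)
```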

The analysis can also be run by building a custom transient analyzer out of tokenizers, token filters, and char filters. Token filters can use the shorter filter parameter name:

GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}

GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}

Deprecated in 5.0.0. Use filter and char_filter instead of filters and char_filters. The token_filters parameter has been removed.

Custom tokenizers, token filters, and character filters can be specified in the request body as follows:

GET _analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}
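
To build intuition for what this chain does, here is a rough local approximation in Python. It is not how Elasticsearch implements these components, and approximate_analyze is a hypothetical name; it only mirrors the order of operations (tokenize, then apply each token filter in turn).

```python
def approximate_analyze(text, stopwords):
    """Roughly mimic: whitespace tokenizer -> lowercase -> stop filter."""
    tokens = text.split()                  # whitespace tokenizer: split on whitespace
    tokens = [t.lower() for t in tokens]   # lowercase token filter
    # stop token filter: drop every token listed in `stopwords`
    return [t for t in tokens if t not in stopwords]


print(approximate_analyze("this is a test", {"a", "is", "this"}))  # ['test']
```

Because filters run in the order listed, lowercasing happens before stop word removal, so a capitalized "This" would also be dropped in this sketch.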

It can also run against a specific index:

GET twitter/_analyze
{
  "text" : "this is a test"
}

The above will run an analysis on the "this is a test" text, using the default index analyzer associated with the twitter index. An analyzer parameter can also be provided to use a different analyzer:

GET twitter/_analyze
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}

Also, the analyzer can be derived based on a field mapping, for example:

GET twitter/_analyze
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}

This will cause the analysis to happen based on the analyzer configured in the mapping for obj1.field1 (and, if none is configured, the default index analyzer).
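
The index-scoped variants differ only in the request path and in which of analyzer or field appears in the body. A small Python sketch of that choice follows; analyze_call is a hypothetical helper, and the guard that rejects field without an index is this sketch's own assumption, made because a field mapping belongs to an index.

```python
def analyze_call(text, index=None, analyzer=None, field=None):
    """Return (path, body) for an analyze request.

    With no index the path is /_analyze; with an index it is
    /<index>/_analyze. `analyzer` and `field` are optional.
    """
    path = "/_analyze" if index is None else "/" + index + "/_analyze"
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    if field is not None:
        # Assumption of this sketch: the analyzer is looked up in the
        # index mapping, so a field without an index is rejected here.
        if index is None:
            raise ValueError("field requires an index")
        body["field"] = field
    return path, body


print(analyze_call("this is a test", index="twitter", field="obj1.field1"))
```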

Deprecated in 5.1.0. Request parameters are deprecated and will be removed in the next major release. Please use JSON parameters in the request body instead of request parameters.

All parameters can also be supplied as request parameters. For example:

GET /_analyze?tokenizer=keyword&filter=lowercase&text=this+is+a+test
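
Since request parameters are deprecated, existing calls of this form may need to be rewritten as JSON bodies. The following Python sketch shows one way to do that with the standard library. query_to_json_body is a hypothetical helper, and treating filter and char_filter as list-valued while everything else is a single value is an assumption based on the JSON examples earlier in this page.

```python
from urllib.parse import parse_qs, urlsplit

# Parameters that the JSON examples on this page express as arrays.
LIST_PARAMS = {"filter", "char_filter"}


def query_to_json_body(url):
    """Turn the query string of an analyze URL into a JSON-style dict."""
    query = parse_qs(urlsplit(url).query)  # also decodes '+' as a space
    body = {}
    for name, values in query.items():
        if name in LIST_PARAMS:
            body[name] = values            # keep every occurrence as a list
        else:
            body[name] = values[-1]        # single-valued: last one wins
    return body


print(query_to_json_body(
    "/_analyze?tokenizer=keyword&filter=lowercase&text=this+is+a+test"))
```

The resulting dictionary can be serialized with json.dumps and sent as the body of GET _analyze.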

For backwards compatibility, we also accept the text parameter as the body of the request, provided it doesn’t start with {:

curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filter=lowercase&char_filter=reverse' -d 'this is a test' -H 'Content-Type: text/plain'

Deprecated in 5.1.0. Passing the text parameter as the body of the request is deprecated, and this feature will be removed in the next major release. Please use the JSON text parameter instead.

Time: 2024-10-10 06:27:34
