Query DSL for elasticsearch Query

Query DSL

Query DSL (source: http://www.elasticsearch.cn/guide/reference/query-dsl/)
http://elasticsearch.qiniudn.com/

--Introduction--

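elasticsearch provides a full Query DSL based on JSON (DSL stands for domain-specific language). In general, there are basic queries such as term or prefix, and compound queries such as bool. Queries can also have filters associated with them, as in the filtered or constant_score queries.
Think of the Query DSL as an abstract syntax tree (AST) of queries. Certain queries can contain other queries (like bool), others can contain filters (like constant_score), and some can contain both a query and a filter (like filtered). Each of these can contain any query from the set of queries that ES supports, or any filter from the set of filters, which makes it possible to build arbitrarily complex (and maybe quite interesting) queries.
Both queries and filters can be used in various APIs, for example in a search query or as a facet filter. This section describes the queries and filters that can be used to build the AST.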
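To make the tree structure concrete, here is a minimal Python sketch that assembles a small AST from plain dictionaries and serializes it to JSON. It only reuses query and filter shapes that appear later in this document (filtered, bool, term, range). The helper names (term, range_filter, bool_query, filtered) and the field values are illustrative assumptions, not part of any client library.

```python
import json

def term(field, value):
    # Leaf query: matches an exact term in a field.
    return {"term": {field: value}}

def range_filter(field, lower, upper):
    # Leaf filter: uses the from/to form shown in this document.
    return {"range": {field: {"from": lower, "to": upper}}}

def bool_query(must=None, should=None, must_not=None):
    # Compound query: only the clause lists that were provided are included.
    body = {}
    if must:
        body["must"] = must
    if should:
        body["should"] = should
    if must_not:
        body["must_not"] = must_not
    return {"bool": body}

def filtered(query, filter_):
    # Compound query that holds both a query and a filter.
    return {"filtered": {"query": query, "filter": filter_}}

ast = filtered(
    bool_query(must=[term("user", "kimchy")], should=[term("tag", "wow")]),
    range_filter("age", 10, 20),
)
print(json.dumps(ast, indent=4))
```

Because every node is just a dictionary, any query helper can be nested inside any compound helper, which is the flexibility the AST description refers to.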

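Tip: filters are very useful because they are faster than plain queries (no document scoring is performed) and their results are cached automatically.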

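Filters and Caching
Filters are a great candidate for caching. Caching the result of a filter does not require much memory, and other queries that use the same filter (with the same parameters) can reuse the cached result, which makes them very fast.
Some filters produce a result that is easy to cache, and the only difference between caching and not caching them is whether the result is placed in the cache. These filters, which include term, terms, prefix, and range, are cached by default, and are recommended over the equivalent queries.
Other filters, which usually work with field data loaded into memory, are not cached by default.
These filters are already very fast, and caching their results requires extra processing so that the results can be reused by other queries. These filters, which include the geo filters,
numeric_range, and script, are not cached by default.
The last type of filter combines other filters: and, not, and or. These filters are not cached, since they mainly operate on the filters they wrap, so caching them is not needed.

All filters allow setting a _cache element to control caching explicitly. They also allow setting a _cache_key, which is used as the cache key for that filter. This is very useful when filtering on large sets (such as a terms filter with many elements).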

--Text Query--

A text query can be used to process all kinds of text. For example:
{
    "text" : {
        "message" : "this is a test"
    }
}
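Note that, despite its name, it can also be used for exact matching (similar to term) on numbers and dates.
Here, message is the field name; substitute the name of any field you actually use (including _all).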

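Types of Text Queries
boolean
The default text query is of type boolean. This means that the text provided is analyzed and the analysis process constructs a boolean query from it. The operator flag can be set to or or and to control how the boolean clauses are combined (defaults to or).
The analyzer can be set to control which analyzer performs the analysis process on the text. It defaults to the analyzer defined in the field mapping, or to the default analyzer of the index if none is defined.
fuzziness can be set to a value (depending on the relevant type; for string types it should be a value between 0.0 and 1.0) to construct fuzzy queries for each analyzed term. In this case, prefix_length and max_expansions can be set to control the fuzzy process.
Here is an example that provides additional parameters (note the change in structure; message is the field name):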

{
    "text" : {
        "message" : {
            "query" : "this is a test",
            "operator" : "and"
        }
    }
}
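The two shapes above (a plain string per field versus an object with a query key plus options) can be produced by one small helper. This is a minimal sketch in Python; the function name text_query is a hypothetical helper, and it simply passes through whatever option names the caller supplies (operator, analyzer, and so on) without validating them.

```python
import json

def text_query(field, text, **options):
    # Without options, use the short form: {"text": {field: text}}.
    if not options:
        return {"text": {field: text}}
    # With options, the field maps to an object whose "query" key holds the text.
    body = {"query": text}
    body.update(options)
    return {"text": {field: body}}

simple = text_query("message", "this is a test")
extended = text_query("message", "this is a test", operator="and")
print(json.dumps(simple))
print(json.dumps(extended))
```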
phrase

The text_phrase query analyzes the text and creates a phrase query out of the analyzed text. For example:
{
    "text_phrase" : {
        "message" : "this is a test"
    }
}
Since text_phrase is only a type of the text query, it can also be written as follows:
{
    "text" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase"
        }
    }
}
A phrase query maintains the order of the terms up to a configurable slop (which defaults to 0).
The analyzer can be set to control which analyzer performs the analysis process on the text. It defaults to the explicit mapping definition of the field, or to the default search analyzer. For example:
{
    "text_phrase" : {
        "message" : {
            "query" : "this is a test",
            "analyzer" : "my_analyzer"
        }
    }
}
text_phrase_prefix

The text_phrase_prefix query is the same as text_phrase, except that it allows for
prefix matches on the last term in the text. For example:

{
    "text_phrase_prefix" : {
        "message" : "this is a test"
    }
}
Or:

{
    "text" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase_prefix"
        }
    }
}
It accepts the same parameters as the phrase type. In addition, it accepts a
max_expansions parameter that controls how many prefixes the last term will be
expanded to. Setting it to an acceptable value is highly recommended in order to
control the execution time of the query. For example:
{
    "text_phrase_prefix" : {
        "message" : {
            "query" : "this is a test",
            "max_expansions" : 10
        }
    }
}
Comparison to query_string / field

The text family of queries does not go through a “query parsing”
process. It does not support field name prefixes, wildcard characters,
or other “advanced” features. For this reason, the chances of it failing are
very small or nonexistent, and it behaves very well when the goal is simply
to analyze some text and run it as a query (which is usually what a text
search box does). Also, the phrase_prefix type can provide a great
“as you type” behavior to automatically load search results.

--Bool Query--

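A query that matches documents matching boolean combinations of other queries. It corresponds to Lucene's BooleanQuery. It is built from one or more query clauses, each with its own occurrence type. The possible occurrence types are:

Occur Description
must The clause (query) must appear in matching documents.
should The clause (query) should appear in matching documents. If a bool query contains no must clauses, a matching document must satisfy one or more should clauses. The minimum number of should clauses to match can be set using the minimum_number_should_match parameter.
must_not The clause (query) must not appear in matching documents. Note that it is not possible to search for documents using only a must_not clause.
The bool query also supports the disable_coord parameter (defaults to false).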

{
    "bool" : {
        "must" : {
            "term" : { "user" : "kimchy" }
        },
        "must_not" : {
            "range" : {
                "age" : { "from" : 10, "to" : 20 }
            }
        },
        "should" : [
            {
                "term" : { "tag" : "wow" }
            },
            {
                "term" : { "tag" : "elasticsearch" }
            }
        ],
        "minimum_number_should_match" : 1,
        "boost" : 1.0
    }
}
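The matching rules for the three occurrence types can be summarized in a few lines of Python. This is only a sketch of the decision logic described in this section, not of how Lucene executes or scores the query. Each argument is a list of booleans saying whether the corresponding clause matched a given document; the function name and the treatment of a must_not-only query as matching nothing are assumptions based on the note about must_not.

```python
def bool_matches(must, should, must_not, minimum_should_match=None):
    """Decide whether one document matches, given per-clause match results."""
    # A query made only of must_not clauses cannot select documents.
    if not must and not should:
        return False
    # Every must clause has to match.
    if not all(must):
        return False
    # No must_not clause may match.
    if any(must_not):
        return False
    # Without must clauses, at least one should clause is required by default.
    if minimum_should_match is None:
        minimum_should_match = 0 if must else 1
    return sum(should) >= minimum_should_match

# must matched, neither tag matched, and at least one should clause is required.
print(bool_matches([True], [False, False], [False], minimum_should_match=1))
```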

--Boosting Query--

The boosting query can be used to effectively demote results that match a
given query. Unlike the “NOT” clause in bool query, this still selects
documents that contain undesirable terms, but reduces their overall
score.

{
    "boosting" : {
        "positive" : {
            "term" : {
                "field1" : "value1"
            }
        },
        "negative" : {
            "term" : {
                "field2" : "value2"
            }
        },
        "negative_boost" : 0.2
    }
}

--Ids Query--

Matches only documents that have the provided ids. Note that this query
does not require the _id field to be indexed, since it works using the
_uid field.
{
    "ids" : {
        "type" : "my_type",
        "values" : ["1", "4", "100"]
    }
}
The type is optional and can be omitted. It can also accept an array of values.

--Custom Score Query--

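The custom_score query wraps another query and customizes its scoring. A script expression can compute the score from (numeric) field values of the matching document. Here is a simple example:
"custom_score" : {
    "query" : {
        ....
    },
    "script" : "_score * doc['my_numeric_field'].value"
}
In addition to document fields and script expressions, the _score variable can be used to retrieve the score of the wrapped query.

Script Parameters
Scripts are cached for faster execution. If the script needs parameters, the recommended approach is to keep the script itself the same and pass the parameters in:

"custom_score" : {
    "query" : {
        ....
    },
    "params" : {
        "param1" : 2,
        "param2" : 3.1
    },
    "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
}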
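To see what the parameterized script computes, the same expression can be written as a plain Python function. This is only a sketch of the arithmetic; the function name and the sample values for the score and the field are assumptions, while param1 and param2 reuse the values from the example.

```python
import math

def custom_score(score, my_numeric_field, param1, param2):
    # Mirrors: _score * doc['my_numeric_field'].value / pow(param1, param2)
    return score * my_numeric_field / math.pow(param1, param2)

# Same script, different parameter values: only the params change between calls.
print(custom_score(1.5, 10.0, param1=2, param2=3.1))
print(custom_score(1.5, 10.0, param1=2, param2=3.0))
```

Keeping the script text constant and varying only the parameters is what lets the cached script be reused.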

--Constant Score Query--

A query that wraps a filter or another query and simply returns a
constant score equal to the query boost for every document in the
filter. Maps to Lucene ConstantScoreQuery.
{
    "constant_score" : {
        "filter" : {
            "term" : { "user" : "kimchy"}
        },
        "boost" : 1.2
    }
}
The filter object can hold only filter elements, not queries. Filters
can be much faster compared to queries since they don’t perform any
scoring, especially when they are cached.

A query can also be wrapped in a constant_score query:
{
    "constant_score" : {
        "query" : {
            "term" : { "user" : "kimchy"}
        },
        "boost" : 1.2
    }
}

--Dis Max Query--

A query that generates the union of documents produced by its
subqueries, and that scores each document with the maximum score for
that document as produced by any subquery, plus a tie breaking increment
for any additional matching subqueries.

This is useful when searching for a word in multiple fields with
different boost factors (so that the fields cannot be combined
equivalently into a single search field). We want the primary score to
be the one associated with the highest boost, not the sum
of the field scores (as Boolean Query would give). If the query is
“albino elephant” this ensures that “albino” matching one field and
“elephant” matching another gets a higher score than “albino” matching
both fields. To get this result, use both Boolean
Query and DisjunctionMax Query: for each term a DisjunctionMaxQuery
searches for it in each field, while the set of these
DisjunctionMaxQuery’s is combined into a BooleanQuery.

The tie breaker capability allows results that include the same term in
multiple fields to be judged better than results that include this term
in only the best of those multiple fields, without confusing this with
the better case of two different terms in
the multiple fields. The default tie_breaker is 0.0.

This query maps to Lucene DisjunctionMaxQuery.
{
    "dis_max" : {
        "tie_breaker" : 0.7,
        "boost" : 1.2,
        "queries" : [
            {
                "term" : { "age" : 34 }
            },
            {
                "term" : { "age" : 35 }
            }
        ]
    }
}
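The scoring rule described above (maximum subquery score plus a tie breaking increment for the other matching subqueries) can be written out as a short Python sketch. The function name and the sample scores are illustrative, and the boost and any normalization are left out on purpose.

```python
def dis_max_score(subquery_scores, tie_breaker=0.0):
    # subquery_scores: scores of the subqueries that matched this document.
    if not subquery_scores:
        return 0.0
    best = max(subquery_scores)
    others = sum(subquery_scores) - best
    # The best score dominates; the rest only contribute through tie_breaker.
    return best + tie_breaker * others

scores = [2.0, 1.0]
print(dis_max_score(scores))                   # tie_breaker 0.0: best score only
print(dis_max_score(scores, tie_breaker=0.5))  # other matches add half their score
print(sum(scores))                             # what a plain sum of scores would give
```

With tie_breaker set to 1.0 the result equals the plain sum, and with 0.0 it equals the best single score.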

--Field Query--

A query that executes a query string against a specific field. It is a
simplified version of the query_string query (with the default_field set
to the field this query is executed against). In its simplest form:
{
    "field" : { 
        "name.first" : "+something -else"
    }
}
Most of the query_string parameters are allowed with the field query as
well. In such a case, the query should be formatted as follows:
{
    "field" : { 
        "name.first" : {
            "query" : "+something -else",
            "boost" : 2.0,
            "enable_position_increments": false
        }
    }
}

--Filtered Query--

Corresponds to Lucene's FilteredQuery. It applies a filter to the results of another query.
{
    "filtered" : {
        "query" : {
            "term" : { "tag" : "wow" }
        },
        "filter" : {
            "range" : {
                "age" : { "from" : 10, "to" : 20 }
            }
        }
    }
}
The filter object can hold only filter elements, not queries. Filters can be much faster than queries since they do not perform any scoring, especially when their results are cached.

--Fuzzy Like This Query--

The fuzzy_like_this query finds documents that are “like” the provided text by running it against one or more fields.
{
    "fuzzy_like_this" : {
        "fields" : ["name.first", "name.last"],
        "like_text" : "text like this one",
        "max_query_terms" : 12
    }
}
fuzzy_like_this can be shortened to flt.

The fuzzy_like_this top level parameters include:

Parameter Description
fields A list of the fields to run the more like this query against. Defaults to the _all field.
like_text The text to find documents like it, required.
ignore_tf Should term frequency be ignored. Defaults to false.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
min_similarity The minimum similarity of the term variants. Defaults to 0.5.
prefix_length Length of required common prefix on variant terms. Defaults to 0.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.
How it Works
Fuzzifies ALL terms provided as strings and then picks the best n
differentiating terms. In effect this mixes the behaviour of FuzzyQuery
and MoreLikeThis but with special consideration of fuzzy scoring
factors. This generally produces good results for queries
where users may provide details in a number of fields, have no
knowledge of boolean query syntax, and also want a degree of fuzzy
matching and a fast query.

For each source term the fuzzy variants are held in a BooleanQuery with
no coord factor (because we are not looking for matches on multiple
variants in any one doc). Additionally, a specialized TermQuery is used
for variants and does not use that variant term’s
IDF, because this would favour rarer terms, e.g. misspellings. Instead, all
variants use the same IDF ranking (the one for the source query term)
and this is factored into the variant’s boost. If the source query term
does not exist in the index, the average IDF
of the variants is used.

--Fuzzy Like This Field Query--

The fuzzy_like_this_field query is the same as the fuzzy_like_this
query, except that it runs against a single field. It provides a nicer
query DSL than the generic fuzzy_like_this query, and supports typed
field queries (it automatically wraps typed fields with a type
filter to match only on the specific type).
{
    "fuzzy_like_this_field" : {
        "name.first" : {
            "like_text" : "text like this one",
            "max_query_terms" : 12
        }
    }
}
fuzzy_like_this_field can be shortened to flt_field.

The fuzzy_like_this_field top level parameters include:

Parameter Description
like_text The text to find documents like it, required.
ignore_tf Should term frequency be ignored. Defaults to false.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
min_similarity The minimum similarity of the term variants. Defaults to 0.5.
prefix_length Length of required common prefix on variant terms. Defaults to 0.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.

--Fuzzy Query--

A fuzzy based query that uses similarity based on Levenshtein (edit distance) algorithm.

Warning: this query is not very scalable with its default prefix length
of 0. In this case, every term will be enumerated and cause an edit
score calculation. The same concern applies when max_expansions is not set.

Here is a simple example:
{
    "fuzzy" : { "user" : "ki" }
}
More complex settings can be set (the values here are the default values):
{
    "fuzzy" : {
        "user" : {
            "value" : "ki",
            "boost" : 1.0,
            "min_similarity" : 0.5,
            "prefix_length" : 0
        }
    }
}
The max_expansions parameter (unbounded by default) controls the number of terms the fuzzy query will expand to.
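For readers unfamiliar with the underlying measure, here is a minimal Python sketch of the Levenshtein edit distance, together with the kind of common-prefix check that prefix_length implies. It illustrates why a non-zero prefix length cuts down the number of candidate terms that need a distance calculation. How Lucene converts an edit distance into the 0.0 to 1.0 similarity compared against min_similarity is not modeled here, and the function names and sample terms are assumptions.

```python
def edit_distance(a, b):
    # Classic dynamic programming over two rows: insert, delete, substitute.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]

def shares_prefix(term, candidate, prefix_length):
    # Candidates that do not share the required prefix are skipped outright.
    return candidate.startswith(term[:prefix_length])

indexed_terms = ["kimchy", "kimchi", "lucene", "ki"]
for candidate in indexed_terms:
    if shares_prefix("kimchy", candidate, prefix_length=2):
        print(candidate, edit_distance("kimchy", candidate))
```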

Numeric / Date Fuzzy

A fuzzy query on a numeric field results in a range query “around” the value, using the min_similarity value. For example:
{
    "fuzzy" : {
        "price" : {
            "value" : 12,
            "min_similarity" : 2
        }
    }
}
This results in a range query between 10 and 14. The same applies to dates,
with support for time formats in the min_similarity field:
{
    "fuzzy" : {
        "created" : {
            "value" : "2010-02-05T12:05:07",
            "min_similarity" : "1d"
        }
    }
}
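The translation from a numeric or date fuzzy query into a range can be sketched as follows. This Python sketch assumes the range is simply the value plus or minus min_similarity, which matches the 10 to 14 example above. Reading "1d" as exactly one day, the timestamp format string, and the function names are assumptions, and whether the bounds are inclusive is not modeled.

```python
from datetime import datetime, timedelta

def fuzzy_numeric_range(value, min_similarity):
    # The fuzzy value becomes a window centered on the requested value.
    return {"from": value - min_similarity, "to": value + min_similarity}

def fuzzy_date_range(value, days):
    # Assumes timestamps like 2010-02-05T12:05:07 and a window given in days.
    center = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    delta = timedelta(days=days)
    return {
        "from": (center - delta).isoformat(),
        "to": (center + delta).isoformat(),
    }

print(fuzzy_numeric_range(12, 2))
print(fuzzy_date_range("2010-02-05T12:05:07", days=1))
```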
In the mapping, numeric and date types allow configuring a
fuzzy_factor mapping value (defaults to 1), which is used to
multiply the fuzzy value when it is used in a query_string type query.
For example, for dates, a fuzzy factor of “1d” results
in multiplying whatever fuzzy value is provided in min_similarity by
it. Note that this is explicitly supported because the query_string query only
allows similarity values between 0.0 and 1.0.

--Has Child Query--

The has_child query simply wraps a has_child filter in a constant_score query. It has the same syntax as the has_child filter:
{
    "has_child" : {
        "type" : "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        }
    }
}
Scope

A _scope can be defined on the filter, which allows running facets under the same
scope name against the child documents. For example:
{
    "has_child" : {
        "_scope" : "my_scope",
        "type" : "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        }
    }
}
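Memory Considerations
With the current implementation, all _id values are loaded into memory (heap) to support fast lookups, so make sure there is enough memory for them.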

--Match All Query--

A query that matches all documents. Maps to Lucene MatchAllDocsQuery.
{
    "match_all" : { }
}
A boost can also be associated with it:

{
    "match_all" : { "boost" : 1.2 }
}
Index Time Boost

When indexing, a boost value can either be associated on the document
level, or per field. The match all query does not take boosting into
account by default. In order to take boosting into account, the
norms_field needs to be provided in order to explicitly
specify which field the boosting will be done on (Note, this will
result in slower execution time). For example:
{
    "match_all" : { "norms_field" : "my_field" }
}

--More Like This Query--

The more_like_this query finds documents that are “like” the provided text by running it against one or more fields.
{
    "more_like_this" : {
        "fields" : ["name.first", "name.last"],
        "like_text" : "text like this one",
        "min_term_freq" : 1,
        "max_query_terms" : 12
    }
}
more_like_this can be shortened to mlt.

The more_like_this top level parameters include:

Parameter Description
fields A list of the fields to run the more like this query against. Defaults to the _all field.
like_text The text to find documents like it, required.
percent_terms_to_match The percentage of terms to match on (float value). Defaults to 0.3 (30 percent).
min_term_freq The frequency below which terms will be ignored in the source doc. The default frequency is 2.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
stop_words An array of stop words. Any word in this set is considered “uninteresting” and ignored. Even if your Analyzer allows stopwords, you might want to tell the MoreLikeThis code to ignore them, as for the purposes of document similarity it seems reasonable to assume that “a stop word is never interesting”.
min_doc_freq The frequency at which words will be ignored which do not occur in at least this many docs. Defaults to 5.
max_doc_freq The maximum frequency in which words may still appear. Words that appear in more than this many docs will be ignored. Defaults to unbounded.
min_word_len The minimum word length below which words will be ignored. Defaults to 0.
max_word_len The maximum word length above which words will be ignored. Defaults to unbounded (0).
boost_terms Sets the boost factor to use when boosting terms. Defaults to 1.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.

--More Like This Field Query--

The more_like_this_field query is the same as the more_like_this query,
except that it runs against a single field. It provides a nicer query DSL than
the generic more_like_this query, and supports typed field queries
(it automatically wraps typed fields with a type filter
to match only on the specific type).
{
    "more_like_this_field" : {
        "name.first" : {
            "like_text" : "text like this one",
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}
more_like_this_field can be shortened to mlt_field.

The more_like_this_field top level parameters include:

Parameter Description
like_text The text to find documents like it, required.
percent_terms_to_match The percentage of terms to match on (float value). Defaults to 0.3 (30 percent).
min_term_freq The frequency below which terms will be ignored in the source doc. The default frequency is 2.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
stop_words An array of stop words. Any word in this set is considered “uninteresting” and ignored. Even if your Analyzer allows stopwords, you might want to tell the MoreLikeThis code to ignore them, as for the purposes of document similarity it seems reasonable to assume that “a stop word is never interesting”.
min_doc_freq The frequency at which words will be ignored which do not occur in at least this many docs. Defaults to 5.
max_doc_freq The maximum frequency in which words may still appear. Words that appear in more than this many docs will be ignored. Defaults to unbounded.
min_word_len The minimum word length below which words will be ignored. Defaults to 0.
max_word_len The maximum word length above which words will be ignored. Defaults to unbounded (0).
boost_terms Sets the boost factor to use when boosting terms. Defaults to 1.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.

--Prefix Query--

Matches documents that have fields containing terms with a specified
prefix (not analyzed). The prefix query maps to Lucene PrefixQuery. The
following matches documents where the user field contains a term that
starts with ki:
{
    "prefix" : { "user" : "ki" }
}
A boost can also be associated with the query:
{
    "prefix" : { "user" :  { "value" : "ki", "boost" : 2.0 } }
}
Or :
{
    "prefix" : { "user" :  { "prefix" : "ki", "boost" : 2.0 } }
}
The rewrite parameter controls how this multi term query gets rewritten.

--Query String Query--

A query that uses a query parser in order to parse its content. Here is an example:
{
    "query_string" : {
        "default_field" : "content",
        "query" : "this AND that OR thus"
    }
}
The query_string top level parameters include:

Parameter Description
query The actual query to be parsed.
default_field The default field for query terms if no prefix field is specified. Defaults to the _all field.
default_operator The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with a default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR.
analyzer The analyzer name used to analyze the query string.
allow_leading_wildcard When set, * or ? are allowed as the first character. Defaults to true.
lowercase_expanded_terms Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Defaults to true.
enable_position_increments Set to true to enable position increments in result queries. Defaults to true.
fuzzy_prefix_length Set the prefix length for fuzzy queries. Defaults to 0.
fuzzy_min_sim Set the minimum similarity for fuzzy queries. Defaults to 0.5.
phrase_slop Sets the default slop for phrases. If zero, then exact phrase matches are required. Defaults to 0.
boost Sets the boost value of the query. Defaults to 1.0.
analyze_wildcard By default, wildcard terms in a query string are not analyzed. By setting this value to true, a best effort will be made to analyze those as well.
auto_generate_phrase_queries Defaults to false.
minimum_should_match A percent value (for example 20%) controlling how many “should” clauses in the resulting boolean query should match.
When a multi term query is being generated, one can control how it gets rewritten using the rewrite parameter.

Multi Field
The query_string query can also run against multiple fields. This works by
internally creating several queries for the same query string, each with a
default_field that matches one of the fields provided. Since
several queries are generated, they can be combined automatically
using either a dis_max query or a simple bool query. For example (the
name field is boosted by 5 using the ^5 notation):
{
    "query_string" : {
        "fields" : ["content", "name^5"],
        "query" : "this AND that OR thus",
        "use_dis_max" : true
    }
}
When running the query_string query against multiple fields, the following additional parameters are allowed:

Parameter Description
use_dis_max Should the queries be combined using dis_max (set it to true), or a bool query (set it to false). Defaults to true.
tie_breaker When using dis_max, the disjunction max tie breaker. Defaults to 0.
The fields parameter can also include pattern-based field names,
which expand automatically to the relevant fields (dynamically
introduced fields included). For example:
{
    "query_string" : {
        "fields" : ["content", "name.*^5"],
        "query" : "this AND that OR thus",
        "use_dis_max" : true
    }
}
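The two conventions used in the fields list (a ^boost suffix and a * pattern) can be sketched in Python as follows. This is only an illustration of the idea: the helper names, the list of known field names, and the use of shell-style matching from fnmatch to stand in for the pattern expansion are all assumptions, not a description of how elasticsearch resolves field patterns internally.

```python
from fnmatch import fnmatchcase

def parse_field_spec(spec):
    # "name^5" -> ("name", 5.0); "content" -> ("content", 1.0)
    name, separator, boost = spec.partition("^")
    return name, (float(boost) if separator else 1.0)

def expand_fields(specs, known_fields):
    expanded = []
    for spec in specs:
        pattern, boost = parse_field_spec(spec)
        if "*" in pattern:
            # A pattern expands to every known field it matches.
            matches = [field for field in known_fields if fnmatchcase(field, pattern)]
        else:
            matches = [pattern]
        expanded.extend((field, boost) for field in matches)
    return expanded

known_fields = ["content", "name.first", "name.last"]  # hypothetical mapping
print(expand_fields(["content", "name.*^5"], known_fields))
```

Each resulting (field, boost) pair corresponds to one of the internally generated queries that are then combined with dis_max or bool.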
Syntax Extension
There are several syntax extensions to the Lucene query language.

missing / exists

The _exists_ and _missing_ syntax allows matching docs in which a field
exists (has a value) or is missing. The syntax
is: _exists_:field1, _missing_:field and can be used anywhere a query
string is used.

--Range Query--

Matches documents with fields that have terms within a certain range.
The type of the Lucene query depends on the field type: for string
fields it is a TermRangeQuery, while for number/date fields it is a
NumericRangeQuery. The following example returns
all documents where age is between 10 (inclusive) and 20 (exclusive):
{
    "range" : {
        "age" : { 
            "from" : 10, 
            "to" : 20, 
            "include_lower" : true, 
            "include_upper": false, 
            "boost" : 2.0
        }
    }
}
The range query top level parameters include:

Name Description
from The lower bound. Defaults to start from the first.
to The upper bound. Defaults to unbounded.
include_lower Should the lower bound (from, if set) be inclusive or not. Defaults to true.
include_upper Should the upper bound (to, if set) be inclusive or not. Defaults to true.
gt Same as setting from to the value, and include_lower to false.
gte Same as setting from to the value, and include_lower to true.
lt Same as setting to to the value, and include_upper to false.
lte Same as setting to to the value, and include_upper to true.
boost Sets the boost value of the query. Defaults to 1.0.
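The relationship between the shorthand parameters (gt, gte, lt, lte) and the from/to form can be sketched in Python. The function names are hypothetical, and the sketch only models the bound and inclusiveness rules listed in the table, not how the Lucene range query is built.

```python
def normalize_range(params):
    # Start from the documented defaults: unbounded, both ends inclusive.
    bounds = {"from": None, "to": None, "include_lower": True, "include_upper": True}
    for key in ("from", "to", "include_lower", "include_upper"):
        if key in params:
            bounds[key] = params[key]
    # The shorthand forms set a bound and its inclusiveness together.
    if "gt" in params:
        bounds["from"], bounds["include_lower"] = params["gt"], False
    if "gte" in params:
        bounds["from"], bounds["include_lower"] = params["gte"], True
    if "lt" in params:
        bounds["to"], bounds["include_upper"] = params["lt"], False
    if "lte" in params:
        bounds["to"], bounds["include_upper"] = params["lte"], True
    return bounds

def in_range(value, bounds):
    lower, upper = bounds["from"], bounds["to"]
    if lower is not None:
        if value < lower or (value == lower and not bounds["include_lower"]):
            return False
    if upper is not None:
        if value > upper or (value == upper and not bounds["include_upper"]):
            return False
    return True

example = normalize_range({"from": 10, "to": 20, "include_lower": True, "include_upper": False})
print(in_range(10, example), in_range(20, example))
print(normalize_range({"gte": 10, "lt": 20}) == example)
```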

--Span First Query--

Matches spans near the beginning of a field. The span first query maps to Lucene SpanFirstQuery. Here is an example:
{
    "span_first" : {
        "match" : {
            "span_term" : { "user" : "kimchy" }
        },
        "end" : 3
    }
}    
The match clause can be any other span type query. The end controls the maximum end position permitted in a match.
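The role of end can be illustrated with a small Python sketch over a list of already analyzed tokens. It assumes 0-based, consecutive token positions, and that a single-term span at position p ends at p + 1, so it matches when p + 1 is no greater than end. The function name and the sample tokens are illustrative.

```python
def span_first_matches(tokens, term, end):
    # A span_term match at position p covers [p, p + 1), so its end is p + 1.
    return any(
        token == term and position + 1 <= end
        for position, token in enumerate(tokens)
    )

tokens = ["hello", "from", "kimchy", "today"]
print(span_first_matches(tokens, "kimchy", end=3))  # third token, within the limit
print(span_first_matches(tokens, "today", end=3))   # fourth token, too far in
```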

--Span Near Query--

Matches spans which are near one another. One can specify slop, the
maximum number of intervening unmatched positions, as well as whether
matches are required to be in-order. The span near query maps to Lucene
SpanNearQuery. Here is an example:
{
    "span_near" : {
        "clauses" : [
            { "span_term" : { "field" : "value1" } },
            { "span_term" : { "field" : "value2" } },
            { "span_term" : { "field" : "value3" } }
        ],
        "slop" : 12,
        "in_order" : false,
        "collect_payloads" : false
    }
}
The clauses element is a list of one or more other span type queries and
the slop controls the maximum number of intervening unmatched positions
permitted.

--Span Not Query--

Removes matches which overlap with another span query. The span not query maps to Lucene SpanNotQuery. Here is an example:
{
    "span_not" : {
        "include" : {
            "span_term" : { "field1" : "value1" }
        },
        "exclude" : {
            "span_term" : { "field2" : "value2" }
        }
    }
}
The include and exclude clauses can be any span type query. The include
clause is the span query whose matches are filtered, and the exclude
clause is the span query whose matches must not overlap those returned.

--Span Or Query--

Matches the union of its span clauses. The span or query maps to Lucene SpanOrQuery. Here is an example:
{
    "span_or" : {
        "clauses" : [
            { "span_term" : { "field" : "value1" } },
            { "span_term" : { "field" : "value2" } },
            { "span_term" : { "field" : "value3" } }
        ]
    }
}
The clauses element is a list of one or more other span type queries.

--Span Term Query--

Matches spans containing a term. The span term query maps to Lucene SpanTermQuery. Here is an example:
{
    "span_term" : { "user" : "kimchy" }
}    
A boost can also be associated with the query:
{
    "span_term" : { "user" : { "value" : "kimchy", "boost" : 2.0 } }
}    
Or :
{
    "span_term" : { "user" : { "term" : "kimchy", "boost" : 2.0 } }
}

--Top Children Query--

The top_children query runs the child query with an estimated hits size,
and aggregates the hit docs into parent docs. If there
aren’t enough parent docs matching the requested from/size search
request, then it is run again with a wider (more hits)
search.

The top_children query also provides scoring capabilities, with the ability to specify max, sum or avg as the score type.

One downside of using top_children is that if more child docs match
than the number of hits requested when executing the child query, then the
total_hits result of the search response will be incorrect.

How many hits are asked for in the first child query run is controlled
using the factor parameter (defaults to 5). For example, when asking for
10 docs with from 0, the child query will execute with 50 hits
expected. If not enough parents are found (in
our example, 10), and there are still more child docs to query, then
the search hits are expanded by multiplying by the incremental_factor
(defaults to 2).
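The arithmetic in this paragraph can be sketched in Python. The sketch assumes the first run asks for (from + size) * factor child hits, which is consistent with the 10 docs and 50 hits example, and that each further run multiplies the previous number by incremental_factor. The function name and the rounds argument are hypothetical.

```python
def child_hits_schedule(from_, size, factor=5, incremental_factor=2, rounds=3):
    # Number of child hits requested in each successive run of the child query.
    hits = (from_ + size) * factor
    schedule = []
    for _ in range(rounds):
        schedule.append(hits)
        hits *= incremental_factor
    return schedule

print(child_hits_schedule(from_=0, size=10))
```

In practice the query stops widening as soon as enough parent docs are found or no more child docs remain.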

The required parameters are the query and type (the child type to
execute the query on). Here is an example with all different parameters,
including the default values:
{
    "top_children" : {
        "type": "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        },
        "score" : "max",
        "factor" : 5,
        "incremental_factor" : 2
    }
}
Scope
A _scope can be defined on the query, which allows running facets under the same
scope name against the child documents. For example:
{
    "top_children" : {
        "_scope" : "my_scope",
        "type": "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        }
    }
}
Memory Considerations
With the current implementation, all _id values are loaded into memory
(heap) in order to support fast lookups, so make sure there is enough
memory for them.

--Wildcard Query--

Matches documents that have fields matching a wildcard expression (not
analyzed). Supported wildcards are *, which matches any character
sequence (including the empty one), and ?, which matches any single
character. Note this query can be slow, as it needs
to iterate over many terms. In order to prevent extremely slow wildcard
queries, a wildcard term should not start with one of the wildcards *
or ?. The wildcard query maps to Lucene WildcardQuery.
{
    "wildcard" : { "user" : "ki*y" }
}
A boost can also be associated with the query:
{
    "wildcard" : { "user" : { "value" : "ki*y", "boost" : 2.0 } }
}    
Or :
{
    "wildcard" : { "user" : { "wildcard" : "ki*y", "boost" : 2.0 } }
}    
The rewrite parameter controls how this multi term query gets rewritten.
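The meaning of the two wildcards can be checked with a small Python sketch that translates a wildcard expression into a regular expression and matches it against whole terms. It only models the matching rule stated above (* for any character sequence, including the empty one, and ? for exactly one character); it says nothing about how Lucene enumerates terms, and the function names are illustrative.

```python
import re

def wildcard_to_regex(pattern):
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")   # any sequence, including the empty one
        elif char == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)

def wildcard_matches(pattern, term):
    # The whole term has to match, since the term itself is not analyzed.
    return wildcard_to_regex(pattern).fullmatch(term) is not None

for term in ["kimchy", "kiy", "kimchi"]:
    print(term, wildcard_matches("ki*y", term))
```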

--Nested Query--

The nested query allows querying nested objects / docs (see nested mapping).
The query is executed against the nested objects / docs as if they were
indexed as separate docs (which they are, internally), and results in the
root parent doc (or parent nested mapping).
Here is a sample mapping we will work with:
{
    "type1" : {
        "properties" : {
            "obj1" : {
                "type" : "nested"
            }
        }
    }
}
And here is a sample nested query usage:
{
    "nested" : {
        "path" : "obj1",
        "score_mode" : "avg",
        "query" : {
            "bool" : {
                "must" : [
                    {
                        "text" : {"obj1.name" : "blue"}
                    },
                    {
                        "range" : {"obj1.count" : {"gt" : 5}}
                    }
                ]
            }
        }
    }
}
The path points to the nested object path, and the query (or
filter) is the query that runs on the nested docs matching that
direct path, joining with the root parent docs.

The score_mode controls how the scores of matching inner children affect the
score of the parent. It defaults to avg, but can also be total, max, or none.

Multi level nesting is automatically supported, and detected, resulting
in an inner nested query to automatically match the relevant nesting
level (and not root) if it exists within another nested query.

--Custom Filters Score Query--

A custom_filters_score query executes a query and, if a hit
matches one of the provided filters (which are ordered), uses either the boost
or the script associated with that filter to compute the score. Here is an example:
{
    "custom_filters_score" : {
        "query" : {
            "match_all" : {}
        },
        "filters" : [
            {
                "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
                "boost" : "3"
            },
            {
                "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } },
                "boost" : "2"
            }
        ]
    }
}
This can considerably simplify parameterized scoring and improve its
performance, since filters are easily cached and the boost or script
is considerably simpler.

Score Mode

A score_mode can be defined to control how multiple matching filters
control the score. By default, it is set to first which means the first
matching filter will control the score of the result. It can also be set
to max/total/avg which will aggregate the result
from all matching filters based on the aggregation type.
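How score_mode combines the boosts of several matching filters can be sketched in Python. The sketch assumes that the combined value is multiplied with the score of the wrapped query, and that a hit matching no filter keeps its original score; both are assumptions for illustration, as are the function names and the float boosts.

```python
def filters_factor(matched_boosts, score_mode="first"):
    # matched_boosts: boosts of the filters that matched, in definition order.
    if not matched_boosts:
        return 1.0
    if score_mode == "first":
        return matched_boosts[0]
    if score_mode == "max":
        return max(matched_boosts)
    if score_mode == "total":
        return sum(matched_boosts)
    if score_mode == "avg":
        return sum(matched_boosts) / len(matched_boosts)
    raise ValueError("unsupported score_mode: " + score_mode)

def custom_filters_score(query_score, matched_boosts, score_mode="first"):
    return query_score * filters_factor(matched_boosts, score_mode)

# A hit that matches both filters of the example, with boosts 3 and 2.
for mode in ("first", "max", "total", "avg"):
    print(mode, custom_filters_score(1.0, [3.0, 2.0], mode))
```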

Script

A script can be used instead of boost for more complex score
calculations. Optional params and lang can be provided (on the same level as query
and filters).

--Official site--

from: http://www.elasticsearch.cn/guide/reference/query-dsl/

other: http://elasticsearch.qiniudn.com/

Time: 2024-10-18 08:09:06
