ElasticSearch: nested mappings, filters, and queries

ElasticSearch - nested mappings and filters

Because nested objects are indexed as separate hidden documents, we can’t query them directly. Instead, we have to use the nested query to access them:

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }}, 
        {
          "nested": {
            "path": "comments", 
            "query": {
              "bool": {
                "must": [ 
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age":  28     }}
                ]
        }}}}
      ]
}}}

① The title clause operates on the root document.
② The nested clause “steps down” into the nested comments field. It no longer has access to fields in the root document, nor to fields in any other nested document.
③ The comments.name and comments.age clauses operate on the same nested document.

A nested field can contain other nested fields. Similarly, a nested query can contain other nested queries. The nesting hierarchy is applied as you would expect.

Of course, a nested query could match several nested documents. Each matching nested document would have its own relevance score, but these multiple scores need to be reduced to a single score that can be applied to the root document.

By default, the nested query averages the scores of the matching nested documents. This can be controlled by setting the score_mode parameter to avg, max, sum, or even none (in which case the root document gets a constant score of 1.0).

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }},
        {
          "nested": {
            "path":       "comments",
            "score_mode": "max", 
            "query": {
              "bool": {
                "must": [
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age":  28     }}
                ]
        }}}}
      ]
}}}

① Give the root document the _score from the best-matching nested document.
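The score reduction described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Elasticsearch implements it: the function name reduce_nested_scores and the sample scores are made up, and the constant 1.0 for none simply follows the description in this article.

```python
from statistics import mean


def reduce_nested_scores(scores, score_mode="avg"):
    """Collapse the scores of matching nested documents into one root score.

    Hypothetical helper for illustration; it only mirrors the score_mode
    options named in the text (avg, max, sum, none).
    """
    if not scores:
        return None  # no nested document matched, so the root does not match
    if score_mode == "avg":
        return mean(scores)
    if score_mode == "max":
        return max(scores)
    if score_mode == "sum":
        return sum(scores)
    if score_mode == "none":
        return 1.0  # constant score, as described in the text
    raise ValueError(f"unsupported score_mode: {score_mode}")


# Made-up scores for three matching comments of one blog post.
comment_scores = [0.5, 1.5, 1.0]
for mode in ("avg", "max", "sum", "none"):
    print(mode, reduce_nested_scores(comment_scores, mode))
```

With score_mode set to max, as in the request above, only the best-matching comment decides how the blog post ranks.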

If placed inside the filter clause of a Boolean query, a nested query behaves much like a regular nested query, except that it doesn’t accept the score_mode parameter. Because it is being used as a non-scoring query (it includes or excludes, but doesn’t score), a score_mode doesn’t make sense: there is nothing to score.

curl -XPOST "http://localhost:9200/index-1/movie/" -d '{
   "title": "The Matrix",
   "cast": [
      {
         "firstName": "Keanu",
         "lastName": "Reeves"
      },
      {
         "firstName": "Laurence",
         "lastName": "Fishburne"
      }
   ]
}'

Given many such movies in our index we can find all movies with an actor named "Keanu" using a search request such as:

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d '{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

Running the above query indeed returns The Matrix. The same is true if we try to find movies that have an actor with the first name "Keanu" and last name "Reeves":

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d '{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "reeves"
                     }
                  }
               ]
            }
         }
      }
   }
}'

Or at least so it seems. However, let's see what happens if we search for movies with an actor with "Keanu" as first name and "Fishburne" as last name.

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d '{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "fishburne"
                     }
                  }
               ]
            }
         }
      }
   }
}'

At first glance this should not match The Matrix, as there's no such actor amongst its cast. However, ElasticSearch will return The Matrix for the above query. After all, the movie does contain an actor with "Keanu" as first name and an (albeit different) actor with "Fishburne" as last name. Based on the above query, ElasticSearch has no way of knowing that we want the two term filters to match the same object in the list of actors. And even if it did, the way the data is indexed means it wouldn't be able to handle that requirement.
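To see why the false match happens, it can help to imitate the flattening in plain Python. The sketch below is a rough model under stated assumptions, not ElasticSearch's actual indexing code: flatten and matches_all_terms are hypothetical helpers, and lowercasing plus whitespace splitting is only a crude stand-in for the default analyzer.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Flatten arrays of objects into multi-valued, dotted field names."""
    fields = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Object boundaries are lost here: every sub-field of every
                # object ends up in the same per-field list.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields[sub_path].extend(sub_values)
            elif isinstance(item, str):
                fields[path].extend(item.lower().split())  # crude "analysis"
            else:
                fields[path].append(item)
    return dict(fields)


def matches_all_terms(flat_doc, terms):
    """Imitate a bool filter whose must clauses are all term filters."""
    return all(term in flat_doc.get(field, []) for field, term in terms.items())


matrix = {
    "title": "The Matrix",
    "cast": [
        {"firstName": "Keanu", "lastName": "Reeves"},
        {"firstName": "Laurence", "lastName": "Fishburne"},
    ],
}

flat = flatten(matrix)
print(flat["cast.firstName"])  # ['keanu', 'laurence']
print(flat["cast.lastName"])   # ['reeves', 'fishburne']
print(matches_all_terms(flat, {"cast.firstName": "keanu",
                               "cast.lastName": "fishburne"}))  # True
```

Once the two lists are built, nothing records that "keanu" and "fishburne" came from different cast members, so both term checks succeed.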

Nested mapping and filter to the rescue

Luckily, ElasticSearch provides a way to filter on multiple fields within the same object in an array: mapping such fields as nested. To try this out, let's create a new index with the "cast" field mapped as nested.

curl -XPUT "http://localhost:9200/index-2" -d '{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested"
            }
         }
      }
   }
}'

After indexing the same movie document into the new index we can now find movies based on multiple properties of each actor by using a nested filter. Here's how we would search for movies starring an actor named "Keanu Fishburne":

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d '{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "bool": {
                     "must": [
                        {
                           "term": {
                              "firstName": "keanu"
                           }
                        },
                        {
                           "term": {
                              "lastName": "fishburne"
                           }
                        }
                     ]
                  }
               }
            }
         }
      }
   }
}'

As you can see we've wrapped our initial bool filter in a nested filter. The nested filter contains a path property where we specify that the filter applies to the cast property of the searched document. It also contains a filter (or a query) which will be applied to each value within the nested property.

As intended, running the above query doesn't return The Matrix, while modifying it to match "Reeves" as last name instead will make it match The Matrix. However, there's one caveat.
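The per-object behaviour of the nested filter can be modelled as "some single object satisfies every term". The following Python sketch makes that explicit. It is an analogy only: nested_match is a hypothetical helper, and comparing lowercased strings stands in for real term matching.

```python
def nested_match(doc, path, terms):
    """Return True if one object under `path` satisfies every term.

    Hypothetical model of a nested filter: each object in the array is
    checked on its own, the way a separately indexed document would be.
    """
    return any(
        all(str(obj.get(field, "")).lower() == term
            for field, term in terms.items())
        for obj in doc.get(path, [])
    )


matrix = {
    "title": "The Matrix",
    "cast": [
        {"firstName": "Keanu", "lastName": "Reeves"},
        {"firstName": "Laurence", "lastName": "Fishburne"},
    ],
}

print(nested_match(matrix, "cast",
                   {"firstName": "keanu", "lastName": "fishburne"}))  # False
print(nested_match(matrix, "cast",
                   {"firstName": "keanu", "lastName": "reeves"}))     # True
```

The inner all() runs once per cast member, so values from two different actors can never combine into a match.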

Including nested values in parent documents

If we go back to our very first query, which filters only on actors' first names without using a nested filter (see the request below), we won't get any hits.

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d '{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

This happens because movie documents no longer have cast.firstName fields. Instead each element in the cast array is, internally in ElasticSearch, indexed as a separate document.

Obviously we can still search for movies based only on first names amongst the cast, by using nested filters though. Like this:

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d '{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "term": {
                     "firstName": "keanu"
                  }
               }
            }
         }
      }
   }
}'

The above request returns The Matrix. However, having to use nested filters or queries when all we want to do is filter on a single property is a bit tedious. To keep the power of nested filters for complex criteria while still being able to filter on values in arrays as if we hadn't mapped such properties as nested, we can modify our mappings so that the nested values are also included in the parent document. This is done using the include_in_parent property, like this:

curl -XPUT "http://localhost:9200/index-3" -d '{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested",
               "include_in_parent": true
            }
         }
      }
   }
}'

In an index such as the one created with the above request, we can filter on combinations of values within the same object in the cast array using nested filters, and still filter on single fields without using nested filters. However, we now need to carefully consider where to use, and where not to use, nested filters in our queries: a query for "Keanu Fishburne" will match The Matrix using a regular bool filter, while it won't when the filter is wrapped in a nested filter. In other words, when using include_in_parent we may get unexpected results from queries matching documents that they shouldn't if we forget to use nested filters.
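The pitfall can be illustrated by keeping both views of the same movie side by side, which is roughly what include_in_parent amounts to. This Python sketch is an assumption-laden model, not ElasticSearch internals: parent_fields, bool_filter, and nested_filter are hypothetical names, and the values are stored already lowercased for simplicity.

```python
# The nested view: each cast member is kept as its own object.
matrix_cast = [
    {"firstName": "keanu", "lastName": "reeves"},
    {"firstName": "laurence", "lastName": "fishburne"},
]

# The extra flattened copy that include_in_parent adds to the parent.
parent_fields = {
    "cast.firstName": [actor["firstName"] for actor in matrix_cast],
    "cast.lastName": [actor["lastName"] for actor in matrix_cast],
}


def bool_filter(first, last):
    """Regular bool filter: checks the flattened parent fields."""
    return (first in parent_fields["cast.firstName"]
            and last in parent_fields["cast.lastName"])


def nested_filter(first, last):
    """Nested filter: both values must come from the same cast member."""
    return any(actor["firstName"] == first and actor["lastName"] == last
               for actor in matrix_cast)


print(bool_filter("keanu", "fishburne"))    # True  (the unexpected hit)
print(nested_filter("keanu", "fishburne"))  # False
print(bool_filter("keanu", "reeves"))       # True
print(nested_filter("keanu", "reeves"))     # True
```

Both filters agree whenever the values belong to one actor; they only diverge for cross-actor combinations, which is exactly the case that is easy to overlook.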


Array Type

Read the doc on elasticsearch.org

As its name suggests, it can be an array of native types (string, int, …) but also an array of objects (the basis used for “objects” and “nested”).

Here are some valid indexing examples:

{
    "Article" : [
      {
        "id" : 12,
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [
            {
                "firstname" : "Francois",
                "surname": "francoisg",
                "id" : 18
            },
            {
                "firstname" : "Gregory",
                "surname" : "gregquat",
                "id" : "2"
            }
        ]
      },
      {
        "id" : 13,
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [
            {
                "firstname" : "Gregory",
                "surname" : "gregquat",
                "id" : "2"
            }
        ]
      }
    ]
}

This example contains several kinds of arrays:

  • categories: array of integers
  • tag: array of strings
  • author: array of objects (inner objects or nested)

We explicitly call out the “simple” types because it can be easier and more maintainable to store a flattened value than the complete object.
Using a non-relational structure should make you think about a specific model for your search engine:

  • To filter: if you just want to filter/search/aggregate on the textual value of an object, flatten the value into the parent object.
  • To get the list of objects that are linked to a parent (when you do not need to filter on or index these objects), just store the list of ids and hydrate them with Doctrine and Symfony.

Inner objects

Inner objects are simply JSON objects embedded in a parent document, such as the “author” entries in the example above. The mapping for this example could be:

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author :
                            type : object
                            properties :
                                firstname : ~
                                surname : ~
                                id :
                                    type : integer

You can filter or query on these “inner objects”. For example:

query: author.firstname=Francois will return the post with the id 12 (and not the one with the id 13).

You can read more on the Elasticsearch website

Inner objects are easy to configure. As Elasticsearch documents are “schema-less”, you can index them without specifying any mapping.

The limitation of this method lies in the way ElasticSearch stores your data. Reusing the above example, here is the internal representation of our objects:

[
      {
        "id" : 12,
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author.firstname" : ["Francois","Gregory"],
        "author.surname" : ["francoisg","gregquat"],
        "author.id" : [18,2]
      },
      {
        "id" : 13,
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author.firstname" : ["Gregory"],
        "author.surname" : ["gregquat"],
        "author.id" : [2]
      }
]

The consequence is that the query:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "author.firstname": "francois" }},
            { "term": { "author.surname": "gregquat" }}
          ]
        }
      }
    }
  }
}

(author.firstname=francois AND author.surname=gregquat) will return document 12. With inner objects, this query can be translated as “Which documents have at least one author.surname = gregquat and at least one author.firstname = francois?”, even though no single author has both values.

To fix this problem, you must use the nested type.

Nested

First important difference: nested must be specified in your mapping.

The mapping looks like the object one; only the type changes:

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author :
                            type : nested
                            properties :
                                firstname : ~
                                surname : ~
                                id :
                                    type : integer

This time, the internal representation will be :

[
      {
        "id" : 12,
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [{
            "firstname" : "Francois",
            "surname" : "francoisg",
            "id" : 18
        },
        {
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      },
      {
        "id" : 13,
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [{
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      }
]

This time, we keep the object structure.

Nested fields have their own filters, which let you filter on each nested object individually. Continuing our example (the one that exposed the limitation of inner objects), we can write this query:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested" : {
          "path" : "author",
          "filter": {
            "bool": {
              "must": [
                {
                  "term" : {
                    "author.firstname": "francois"
                  }
                },
                },
                {
                  "term" : {
                    "author.surname": "gregquat"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}

We can translate it as “Who has an author object whose surname is equal to ‘gregquat’ and whose firstname is ‘francois’”. This query will return no result.

One problem remains, and it is costly when working with big objects: when you want to change a single value of a nested object, you have to reindex the whole parent document (including all of its nested objects).
If the objects are heavy and often updated, the impact on performance can be significant.

To fix this problem, you can use the parent/child associations.

Parent/Child

Parent/child associations are very similar to OneToMany relationships (one parent, several children).
The relationship remains hierarchical: a child type is associated with only one parent type, and it’s impossible to create a ManyToMany relationship.

We are going to link our article to a category:

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                category :
                    mappings :
                        id : ~
                        name : ~
                        description : ~
                article :
                    mappings:
                        title : ~
                        tag : ~
                        author : ~
                    _routing:
                        required: true
                        path: category
                    _parent:
                        type : "category"
                        identifier: "id" #optional as id is the default value
                        property : "category" #optional as the default value is the type value

When indexing an article, a reference to the Category will also be indexed (category.id).
So, we can index categories and articles separately while keeping the references between them.

As with nested, there are filters and queries that allow you to search on parents or children:

  • Has Parent Filter / Has Parent Query: filter/query on parent fields, returns the child objects. In our case, we could filter articles whose parent category contains “symfony” in its description.
  • Has Child Filter / Has Child Query: filter/query on child fields, returns the parent object. In our case, we could filter Categories for which “francoisg” has written an article.

{
  "query": {
    "has_child": {
      "type": "article",
      "query" : {
        "filtered": {
          "query": { "match_all": {}},
          "filter" : {
              "term": {"tag": "symfony"}
          }
        }
      }
    }
  }
}

This query will return the Categories that have at least one article tagged with “symfony”.
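The direction of each lookup is easy to mix up, so here is a small in-memory Python model of the two. It only mimics the semantics described above and says nothing about how Elasticsearch executes these queries: the category names, the article with id 14, the "parent" key, and both helper functions are hypothetical.

```python
# Hypothetical sample data: two categories and three child articles.
categories = {
    1: {"name": "PHP", "description": "symfony and friends"},
    2: {"name": "Search", "description": "search engines"},
}
articles = [
    {"id": 12, "parent": 1, "tag": ["elasticsearch", "symfony"]},
    {"id": 13, "parent": 1, "tag": ["symfony"]},
    {"id": 14, "parent": 2, "tag": ["elasticsearch"]},
]


def has_child(categories, articles, predicate):
    """Return ids of parents that have at least one child matching predicate."""
    parent_ids = {article["parent"] for article in articles if predicate(article)}
    return [cat_id for cat_id in categories if cat_id in parent_ids]


def has_parent(categories, articles, predicate):
    """Return ids of children whose parent matches predicate."""
    return [article["id"] for article in articles
            if predicate(categories[article["parent"]])]


# Categories with at least one article tagged "symfony".
print(has_child(categories, articles, lambda a: "symfony" in a["tag"]))  # [1]

# Articles whose category description mentions "symfony".
print(has_parent(categories, articles,
                 lambda c: "symfony" in c["description"]))  # [12, 13]
```

Note that has_child returns each parent once, however many of its children match.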

The queries are here written in JSON, but are easily transformable into PHP with the Elastica library.

Date: 2024-10-21 19:36:50