elasticSearch indices VS type

elasticSearch 的中文文档 http://es.xiaoleilu.com/010_Intro/05_What_is_it.html

https://www.elastic.co/blog/index-vs-type

Who has never wondered whether new data should be put into a new type of an existing index, or into a new index? This is a recurring question for new users, that can’t be answered without understanding how both are implemented.

In the past we tried to make elasticsearch easier to understand by building an analogy with relational databases: indices would be like a database, and types like a table in a database. This was a mistake: the way data is stored is so different that any comparisons can hardly make sense, and this ultimately led to an overuse of types in cases where they were more harmful than helpful.

What is an index?

An index is stored in a set of shards, which are themselves Lucene indices. This already gives you a glimpse of the limits of using a new index all the time: Lucene indices have a small yet fixed overhead in terms of disk space, memory usage and file descriptors used. For that reason, a single large index is more efficient than several small indices: the fixed cost of the Lucene index is better amortized across many documents.

Another important factor is how you plan to search your data. While each shard is searched independently, Elasticsearch eventually needs to merge results from all the searched shards. For instance if you search across 10 indices that have 5 shards each, the node that coordinates the execution of a search request will need to merge 5x10=50 shard results. Here again you need to be careful: if there are too many shard results to merge and/or if you ran an heavy request that produces large shard responses (which can easily happen with aggregations), the task of merging all these shard results can become very resource-intensive, both in terms of CPU and memory. Again this would advocate for having fewer indices.

What is a type?

This is where types help: types are a convenient way to store several types of data in the same index, in order to keep the total number of indices low for the reasons exposed above. In terms of implementation it works by adding a “_type” field to every document that is automatically used for filtering when searching on a specific type. One nice property of types is that searching across several types of the same index comes with no overhead compared to searching a single type: it does not change how many shard results need to be merged.

However this comes with limitations as well:

  • Fields need to be consistent across types. For instance if two fields have the same name in different types of the same index, they need to be of the same field type (string, date, etc.) and have the same configuration.
  • Fields that exist in one type will also consume resources for documents of types where this field does not exist. This is a general issue with Lucene indices: they don’t like sparsity. Sparse postings lists can’t be compressed efficiently because of high deltas between consecutive matches. And the issue is even worse with doc values: for speed reasons, doc values often reserve a fixed amount of disk space for every document, so that values can be addressed efficiently. This means that if Lucene establishes that it needs one byte to store all value of a given numeric field, it will also consume one byte for documents that don’t have a value for this field. Future versions of Elasticsearch will have improvements in this area but I would still advise you to model your data in a way that will limit sparsity as much as possible.
  • Scores use index-wide statistics, so scores of documents in one type can be impacted by documents from other types.

This means types can be helpful, but only if all types from a given index have mappings that are similar. Otherwise, the fact that fields also consume resources in documents where they don’t exist could make things worse than if the data had been stored in separate indices.

Which one should I use?

This is a tough question, and the answer will depend on your hardware, data and use-case. First it is important to realize that types are useful because they can help reduce the number of Lucene indices that Elasticsearch needs to manage. But there is another way that you can reduce this number: creating indices that have fewer shards. For instance, instead of folding 5 types into the same index, you could create 5 indices with 1 primary shard each.

I will try to summarize the questions you should ask yourself to make a decision:

  • Are you using parent/child? If yes this can only be done with two types in the same index.
  • Do your documents have similar mappings? If no, use different indices.
  • If you have many documents for each type, then the overhead of Lucene indices will be easily amortized so you can safely use indices, with fewer shards than the default of 5 if necessary.
  • Otherwise you can consider putting documents in different types of the same index. Or even in the same type.

In conclusion, you may be surprised that there are not as many use cases for types as you expected. And this is right: there are actually few use cases for having several types in the same index for the reasons that we mentioned above. Don’t hesitate to allocate different indices for data that would have different mappings, but still keep in mind that you should keep a reasonable number of shards in your cluster, which can be achieved by reducing the number of shards for indices that don’t require a high write throughput and/or will store low numbers of documents.

时间: 2024-12-13 00:08:23

elasticSearch indices VS type的相关文章

在ElasticSearch中,集群(Cluster),节点(Node),分片(Shard),Indices(索引),replicas(备份)之间是什么关系?

最近在知乎上看到了这个问题,自己也搞了半学期的Elasticsearch,于是就想用自己所知道的浅陋知识来回答一下这个问题. Cluster包含多个node,Indices不应该理解成动词索引,Indices可理解成关系数据库中的databases,Indices可包含多个Index,Index对应关系数据库中的database,它是用来存储相关文档的.Elasticsearch与关系数据的类比对应关系如下: Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒

elasticsearch index 之 put mapping

mapping机制使得elasticsearch索引数据变的更加灵活,近乎于no schema.mapping可以在建立索引时设置,也可以在后期设置.后期设置可以是修改mapping(无法对已有的field属性进行修改,一般来说只是增加新的field)或者对没有mapping的索引设置mapping.put mapping操作必须是master节点来完成,因为它涉及到集群matedata的修改,同时它跟index和type密切相关.修改只是针对特定index的特定type. 在Action su

ElasticSearch的基本用法与集群搭建

ElasticSearch的基本用法与集群搭建 一.简介 ElasticSearch和Solr都是基于Lucene的搜索引擎,不过ElasticSearch天生支持分布式,而Solr是4.0版本后的SolrCloud才是分布式版本,Solr的分布式支持需要ZooKeeper的支持. 这里有一个详细的ElasticSearch和Solr的对比:http://solr-vs-elasticsearch.com/ 二.基本用法 Elasticsearch集群可以包含多个索引(indices),每一个索

(转)Elasticsearch .net client NEST使用说明 2.x

Elasticsearch.Net与NEST是Elasticsearch为C#提供的一套客户端驱动,方便C#调用Elasticsearch服务接口.Elasticsearch.Net是较基层的对Elasticsearch服务接口请求响应的实现,NEST是在前者基础之上进行的封装.本文是针对NEST的使用的总结. 引用 Install-Package NEST 包含以下dll NEST.dll Elasticsearch.Net.dll   Newtonsoft.Json.dll 概念 存储结构:

Elasticsearch 学习笔记1 入门

参考资料 官方学习文档:https://www.elastic.co/guide/index.html 入门中文权威指南: http://es.xiaoleilu.com/ 1 认识es elasticsearch功能: - 全文检索(可以解决DB检索问题(文本检索性能,全文检索等)) - 分布式的实时文件存储,每个字段都被索引并可被搜索(牛逼) - 分布式的实时分析搜索引擎(实时性能好) - 可以扩展到上百台服务器,处理PB级结构化或非结构化数据(容易扩展,支持大数据量) 基于apache l

ElasticSearch Recovery 分析

上周出现了一次故障,recovery的过程比较慢,然后发现Shard 在做恢复的过程一般都是卡在TRANSLOG阶段,所以好奇这块是怎么完成的,于是有了这篇文章 这是一篇源码分析类的文章,大家需要先建立一个整体的概念,建议参看 这篇文章 另外你可能还需要了解下 Recovery 阶段迁移过程: INIT -> INDEX -> VERIFY_INDEX -> TRANSLOG -> FINALIZE -> DONE 概览 Recovery 其实有两种: Primary的迁移/

Elasticsearch的学习笔记

在介绍Elasticsearch的用法之前先讲讲为什么要用它吧.首先学习搜索引擎,肯定不可避免的都听过lucene,solr和Elasticsearch都是基于它的.spinx文章很多,但是数据库的入侵性太强(插件模式).Elasticsearch是当下最流行的分布式搜索引擎之一.solr也稍微玩过,文章也多.同时也希望能通过Elasticsearch进一步学习完善自己对于分布式的学习.更深入的同学可以考虑开始学习ELK(Elasticsearch, Logstash, Kibana). 推荐:

Heka–>Elasticsearch 索引数据过程的优化

Heka 的参数配置跟Elasticsearch的参数没有关系,Heka只负责按照配置发送数据,所以索引的优化主要在 Elaticsearch端来完成. 下面是Elasticsearch的一些相关概念和知识点: 一些概念 在Elasticsearch中,文档归属于一种类型(type),而这些类型存在于索引(index)中,我们可以画一些简单的对比图来类比传统关系型数据库: Relational DB -> Databases -> Tables -> Rows -> Columns

全文搜索引擎Elasticsearch初探

前言: 在Web应用或后台数据管理中,随着数据量的倍数增长,搜索引擎特别是全文搜索引擎的应用越来越迫切.基于技术和成本考虑,我们不可能去开发一个搜索引擎以满足我们的需求,庆幸的是业界已有许多优秀的开源搜索引擎可供我们使用,Elasticsearch便是其中之一. 简介: Elasticsearch是一个基于Apache Lucene(TM)的开源搜索引擎.无论在开源还是专有领域,Lucene可以被认为是迄今为止最先进.性能最好的.功能最全的搜索引擎库.但是,Lucene只是一个库.想要使用它,你