Elasticsearch - glossary

From http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html

glossary of terms

analysis

Analysis is the process of converting full text to terms. Depending on which analyzer is used, these phrases: FOO BARFoo-Barfoo,bar will probably all result in the terms foo and bar. These terms are what is actually stored in the index. A full text query (not a term query) for FoO:bAR will also be analyzed to the terms foo,bar and will thus match the terms stored in the index. It is this process of analysis (both at index time and at search time) that allows elasticsearch to perform full text queries. Also see text and term.

cluster

A cluster consists of one or more nodes which share the same cluster name. Each cluster has a single master node which is chosen automatically by the cluster and which can be replaced if the current master node fails.

document

A document is a JSON document which is stored in elasticsearch. It is like a row in a table in a relational database. Each document is stored in an index and has a type and an id. A document is a JSON object (also known in other languages as a hash / hashmap / associative array) which contains zero or more fields, or key-value pairs. The original JSON document that is indexed will be stored in the _source field, which is returned by default when getting or searching for a document.

id

The ID of a document identifies a document. The index/type/id of a document must be unique. If no ID is provided, then it will be auto-generated. (also see routing)

field

document contains a list of fields, or key-value pairs. The value can be a simple (scalar) value (eg a string, integer, date), or a nested structure like an array or an object. A field is similar to a column in a table in a relational database. The mapping for each field has a field type (not to be confused with document type) which indicates the type of data that can be stored in that field, eg integerstringobject. The mapping also allows you to define (amongst other things) how the value for a field should be analyzed.

index

An index is like a database in a relational database. It has a mapping which defines multipletypes. An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.

mapping

A mapping is like a schema definition in a relational database. Each index has a mapping, which defines each type within the index, plus a number of index-wide settings. A mapping can either be defined explicitly, or it will be generated automatically when a document is indexed.

node

A node is a running instance of elasticsearch which belongs to a cluster. Multiple nodes can be started on a single server for testing purposes, but usually you should have one node per server. At startup, a node will use unicast (or multicast, if specified) to discover an existing cluster with the same cluster name and will try to join that cluster.

primary shard

Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard. By default, an index has 5 primary shards. You can specify fewer or more primary shards to scale the number ofdocuments that your index can handle. You cannot change the number of primary shards in an index, once the index is created. See also routing

replica shard

Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes:

  1. increase failover: a replica shard can be promoted to a primary shard if the primary fails
  2. increase performance: get and search requests can be handled by primary or replica shards. By default, each primary shard has one replica, but the number of replicas can be changed dynamically on an existing index. A replica shard will never be started on the same node as its primary shard.

routing

When you index a document, it is stored on a single primary shard. That shard is chosen by hashing the routing value. By default, the routing value is derived from the ID of the document or, if the document has a specified parent document, from the ID of the parent document (to ensure that child and parent documents are stored on the same shard). This value can be overridden by specifying a routing value at index time, or a routing field in the mapping.

shard

A shard is a single Lucene instance. It is a low-level “worker” unit which is managed automatically by elasticsearch. An index is a logical namespace which points to primary andreplica shards. Other than defining the number of primary and replica shards that an index should have, you never need to refer to shards directly. Instead, your code should deal only with an index. Elasticsearch distributes shards amongst all nodes in the cluster, and can move shards automatically from one node to another in the case of node failure, or the addition of new nodes.

source field

By default, the JSON document that you index will be stored in the _source field and will be returned by all get and search requests. This allows you access to the original object directly from search results, rather than requiring a second step to retrieve the object from an ID. Note: the exact JSON string that you indexed will be returned to you, even if it contains invalid JSON. The contents of this field do not indicate anything about how the data in the object has been indexed.

term

A term is an exact value that is indexed in elasticsearch. The terms fooFooFOO are NOT equivalent. Terms (i.e. exact values) can be searched for using term queries. See also text andanalysis.

text

Text (or full text) is ordinary unstructured text, such as this paragraph. By default, text will beanalyzed into terms, which is what is actually stored in the index. Text fields need to be analyzed at index time in order to be searchable as full text, and keywords in full text queries must be analyzed at search time to produce (and search for) the same terms that were generated at index time. See also term and analysis.

type

A type is like a table in a relational database. Each type has a list of fields that can be specified for documents of that type. The mapping defines how each field in the document is analyzed.

时间: 2024-10-11 05:04:36

Elasticsearch - glossary的相关文章

全文搜索之 Elasticsearch

概述 Elasticsearch (ES)是一个基于 Lucene 的开源搜索引擎,它不但稳定.可靠.快速,而且也具有良好的水平扩展能力,是专门为分布式环境设计的. 特性 安装方便:没有其他依赖,下载后安装非常方便:只用修改几个参数就可以搭建起来一个集群 JSON:输入/输出格式为 JSON,意味着不需要定义 Schema,快捷方便 RESTful:基本所有操作(索引.查询.甚至是配置)都可以通过 HTTP 接口进行 分布式:节点对外表现对等(每个节点都可以用来做入口):加入节点自动均衡 多租户

使用logstash+elasticsearch+kibana快速搭建日志平台

日志的分析和监控在系统开发中占非常重要的地位,系统越复杂,日志的分析和监控就越重要,常见的需求有: 根据关键字查询日志详情 监控系统的运行状况 统计分析,比如接口的调用次数.执行时间.成功率等 异常数据自动触发消息通知 基于日志的数据挖掘 很多团队在日志方面可能遇到的一些问题有: 开发人员不能登录线上服务器查看详细日志,经过运维周转费时费力 日志数据分散在多个系统,难以查找 日志数据量大,查询速度慢 一个调用会涉及多个系统,难以在这些系统的日志中快速定位数据 数据不够实时 常见的一些重量级的开源

ElasticSearch

一.概述 1.简介 ElasticSearch是一个基于Lucene实现的开源.分布式.Restful的全文本搜索引擎:此外,它还是一个分布式实时文档存储,其中每个文档的每个field均是被索引的数据,且可被搜索:也是一个带实时分析功能的分布式搜索引擎,能够扩展至数以百计的节点实时处理PB级的数据. 应用场景:当我们建立一个网站或应用程序,并要添加搜索功能,但是想要完成搜索工作的创建是非常困难的.我们希望搜索解决方案要运行速度快.能有一个零配置和一个完全免费的搜索模式.能够简单地使用JSON通过

学习elasticsearch(一)linux环境搭建(2)——启动elasticsearch

在启动访问es的过程中遇到了各种的奇葩问题. 1.网上各种版本的启动方式让人眼花缭乱不知如何启动.简单粗暴--到es的bin目录下直接 执行 ./elasticsearch //显示启动,ctrl+c可停止,如要操作,换个终端 ./elasticsearch -d 后台启动,可在当前终端继续操作 //后台启动,如要停止执行 kill -9 pid //哈哈,直接杀掉进程 //搜索es进程pid可以酱紫 ps aux | grep elasticsearch //注意,不确定那个是pid的话多执行

ELK学习笔记(一)安装Elasticsearch、Kibana、Logstash和X-Pack

最近在学习ELK的时候踩了不少的坑,特此写个笔记记录下学习过程. 日志主要包括系统日志.应用程序日志和安全日志.系统运维和开发人员可以通过日志了解服务器软硬件信息.检查配置过程中的错误及错误发生的原因.经常分析日志可以了解服务器的负荷,性能安全性,从而及时采取措施纠正错误. 通常,日志被分散的储存不同的设备上.如果你管理数十上百台服务器,你还在使用依次登录每台机器的传统方法查阅日志.这样是不是感觉很繁琐和效率低下.当务之急我们使用集中化的日志管理,例如:开源的syslog,将所有服务器上的日志收

elasticsearch index 之 put mapping

mapping机制使得elasticsearch索引数据变的更加灵活,近乎于no schema.mapping可以在建立索引时设置,也可以在后期设置.后期设置可以是修改mapping(无法对已有的field属性进行修改,一般来说只是增加新的field)或者对没有mapping的索引设置mapping.put mapping操作必须是master节点来完成,因为它涉及到集群matedata的修改,同时它跟index和type密切相关.修改只是针对特定index的特定type. 在Action su

Elasticsearch VS Solr

最近公司用到了ES搜索引擎,调研发现大公司常用的搜索引擎还有Solr. 鉴于 Lucene 强大的特性和稳定性,有很多种基于 Lucene 封装的企业级搜索平台.其中最流行有两个:Apache Solr 和 Elastic search. Apache Solr:它本身是 Apache Lucene 项目下的开源企业搜索平台,算是 Lucene 的直系.美团.阿里搜索服务是基于 Solr 来搭建的. Elastic Search:简称 ES,由 Elastic 公司开发.Elastic 成立于

Elasticsearch实践指南

http://nginxs.blog.51cto.com/ 从2014年到现在接触ES(Elasticsearch)已经两年多了,感触良多尤其ES的开盒即用特性完全区别于之前接触复杂的hadoop和solor.ES不需要你对它了解就能很快入门,而且ES的实时搜索,自动拓展,自愈功能深深吸引我.最近很多朋友也开始使用向我问了很多常见问题,我在这总结了一些使用中踩过的坑希望大家对ES有更多的了解. 简介 Elasticsearch是基于Lucene开发的一个准实时搜索服务,搜索延时在秒级.ES存储主

[Elasticsearch] 关于字段重复值的常用查询和操作总结

1. 取得某个索引中某个字段中的所有出现过的值 这种操作类似于使用SQL的SELECT UNIQUE语句.当需要获取某个字段上的所有可用值时,可以使用terms聚合查询完成: GET /index_streets/_search?search_type=count { "aggs": { "street_values": { "terms": { "field": "name.raw", "siz