explore your hadoop data and get real-time results

deep api integration makes getting value from your big data easy


Elasticsearch is quickly becoming the de facto search and analytics solution that organizations use to provide real-time insights into their Hadoop data. Elasticsearch for Hadoop, affectionately known as es-hadoop, is a two-way connector that lets you index Hadoop data into Elasticsearch and query it in real time. With a native API implementation, fast indexing, and a rich query language, es-hadoop is optimized for performance and efficiency, making it an elegant solution for your big data projects. With support for a wide range of Hadoop libraries, including MapReduce, Hive, Pig, Cascading, Spark, and Storm, Elasticsearch helps you make better use of your data across the entire Hadoop ecosystem.

data can seamlessly move between Elasticsearch and Hadoop

  • Index directly into Elasticsearch from Hadoop

    The native integration lets you efficiently push data into Elasticsearch using the existing Hadoop tools you know and love.

  • Query Elasticsearch from Hadoop

    The rich query API of Elasticsearch lets you ask complex questions and use the real-time results in Hadoop.

  • Use HDFS as a long-term archive for Elasticsearch

    es-hadoop allows Elasticsearch to back up data to HDFS, and restore it from there, using the built-in snapshot and restore capability.
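At the wire level, pushing Hadoop records into Elasticsearch means sending them in batches to the Elasticsearch bulk API, which is what es-hadoop does when it writes. The sketch below shows the newline-delimited JSON body such a batch takes, assuming records arrive as Python dicts. The helper `build_bulk_body` is hypothetical and not part of es-hadoop; its `id_field` argument plays a role similar to es-hadoop's `es.mapping.id` setting, which names the field holding the document id.

```python
import json
from typing import Iterable, Optional


def build_bulk_body(records: Iterable[dict], index: str,
                    id_field: Optional[str] = None) -> str:
    """Serialize records into an Elasticsearch bulk-API body (NDJSON).

    Hypothetical helper: each record becomes an action line followed by a
    source line. If id_field is given, its value becomes the document _id.
    """
    lines = []
    for record in records:
        action = {"_index": index}
        if id_field is not None:
            action["_id"] = str(record[id_field])
        lines.append(json.dumps({"index": action}))  # action/metadata line
        lines.append(json.dumps(record))             # document source line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rows = [{"user": "alice", "clicks": 3}, {"user": "bob", "clicks": 5}]
    print(build_bulk_body(rows, "clickstream", id_field="user"), end="")
```

Batching many documents into one request, rather than indexing them one at a time, is what keeps indexing from a large Hadoop job fast.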

how people are using Elasticsearch and Hadoop

      • Klout Queries Over 400M Users’ Data To Build Marketing Campaigns

        Using HDFS to store user data and index it into Elasticsearch, Klout builds real-time targeted marketing campaigns that are generated in seconds rather than minutes.

      • MutualMind Replaces 15-Minute Batch Process with Real-Time Analysis

        With customers like AT&T, Kraft, Nestle, and Starbucks interested in keeping a pulse on their brands, MutualMind uses Elasticsearch to get quick insight and Hadoop for batch-based statistical analysis.

      • International Financial Services Firm Quickly Analyzes Access Logs

        Instead of waiting hours for MapReduce jobs to analyze access logs, a global financial institution gets value from its data with Elasticsearch in minutes, and it has even expanded the window of log data it processes from one hour to a full week.

works with any flavor of Hadoop distribution

We are official partners with a number of organizations within the Hadoop ecosystem, including Cloudera, MapR, Hortonworks, Databricks, and Concurrent. Whether you're using vanilla Apache Hadoop or a distribution such as CDH, HDP, or MapR, Elasticsearch has you covered. As an added bonus, we are also certified on Cloudera Enterprise 5 and are a Certified Technology Partner with Hortonworks.

take a look under the hood

visualize your big data

Elasticsearch works with the visualization tool Kibana to help you explore your big data in real time. With beautifully designed graphs, charts, and maps, Kibana turns your data into real-time, customizable dashboards that let you visualize its value.

leave the real-time analytics to us

Gone are the days of waiting hours or more for a batch process to run in order to get insight into your Hadoop data. Elasticsearch provides responses in milliseconds, which can significantly reduce a Hadoop job’s execution time and the cost associated with
it, especially on “rented resources” such as Amazon EMR or EC2.

ask more sophisticated questions

Elasticsearch provides a robust query DSL that lets users ask sophisticated questions that yield more complete answers, faster.
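As an illustration of the kind of question the query DSL can express, the sketch below builds a `bool` query that combines a scored full-text match with non-scoring exact filters, plus a `terms` aggregation computed in the same request. The helper `build_search` and the field names (`message`, `status`, `@timestamp`, `user`) are assumptions for illustration, and the aggregation assumes `user` is mapped as a keyword field. The resulting dict is a `_search` request body; a body containing only the `query` part has the shape that es-hadoop's `es.query` setting accepts.

```python
import json


def build_search(text: str, status: str, since: str, top_n: int = 5) -> dict:
    """Build a hypothetical _search body: full-text match, filters, aggregation."""
    return {
        "query": {
            "bool": {
                # Scored full-text clause.
                "must": [{"match": {"message": text}}],
                # Non-scoring filters: exact term and time range (date math allowed).
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"@timestamp": {"gte": since}}},
                ],
            }
        },
        # Most frequent users among the matching documents.
        "aggs": {"top_users": {"terms": {"field": "user", "size": top_n}}},
        "size": 0,  # return only aggregation results, no individual hits
    }


if __name__ == "__main__":
    print(json.dumps(build_search("timeout", "error", "now-1h"), indent=2))
```

One request thus answers "which users produced the most error messages mentioning a timeout in the last hour", a question that would otherwise need a full batch job over the raw data.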

prepared for when things go awry

Elasticsearch is designed to tolerate hardware failures, and es-hadoop keeps communicating with the cluster even when individual nodes fail.
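A minimal sketch of the client-side failover idea described above, assuming a list of node addresses and a hypothetical `send` callable that raises `ConnectionError` when a node is unreachable: the request is tried against each node in turn and, if every node fails, the whole round is retried after a pause. This is a generic illustration, not es-hadoop's actual code; the connector implements its own version and exposes related settings such as `es.batch.write.retry.count` and `es.batch.write.retry.wait` for bulk-write retries.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def send_with_failover(nodes: Sequence[str], send: Callable[[str], T],
                       retries: int = 3, wait_seconds: float = 1.0) -> T:
    """Try each node in order; if all fail, wait and retry the whole round.

    `send` is a hypothetical callable that performs the request against one
    node and raises ConnectionError if that node is unreachable.
    """
    last_error = None
    for attempt in range(retries + 1):
        for node in nodes:
            try:
                return send(node)
            except ConnectionError as err:
                last_error = err  # node is down: fall through to the next one
        if attempt < retries:
            time.sleep(wait_seconds)
    raise ConnectionError(f"all nodes failed after {retries} retries") from last_error


if __name__ == "__main__":
    down = {"es-node-1:9200"}  # simulate one failed node

    def fake_send(node: str) -> str:
        if node in down:
            raise ConnectionError(node)
        return f"ok from {node}"

    print(send_with_failover(["es-node-1:9200", "es-node-2:9200"], fake_send))
```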

added efficiency with our native integration

Elasticsearch is natively integrated with Hadoop, so there is no gap for the user to bridge. We provide a dedicated InputFormat and OutputFormat for vanilla MapReduce, Taps for reading and writing data in Cascading, storage implementations for Pig and Hive, a native Spark Resilient Distributed Dataset (RDD) for both Java and Scala, and support for Storm's bolt and spout abstractions, so you can access Elasticsearch just as if the data were in HDFS.
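To make Elasticsearch look like just another input, a connector has to turn paged search results into a plain stream of records. The sketch below assumes a hypothetical `fetch_page(scroll_id)` callable that returns a batch of hits plus the next scroll id, with an empty batch meaning the results are exhausted; this mirrors how the Elasticsearch scroll API pages through large result sets. It illustrates the idea only and is not es-hadoop's internal reader, which additionally divides the work by shard so that each Hadoop task reads its own partition.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page is (hits, next_scroll_id); each hit is a dict carrying a "_source".
Page = Tuple[List[dict], Optional[str]]


def read_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield document sources one by one, hiding the paging from the caller."""
    scroll_id: Optional[str] = None
    while True:
        hits, scroll_id = fetch_page(scroll_id)
        if not hits:  # an empty page means the scroll is exhausted
            return
        for hit in hits:
            yield hit["_source"]


if __name__ == "__main__":
    # Simulated cluster: two pages of two hits each, then an empty page.
    pages = {
        None: ([{"_source": {"n": 1}}, {"_source": {"n": 2}}], "s1"),
        "s1": ([{"_source": {"n": 3}}, {"_source": {"n": 4}}], "s2"),
        "s2": ([], "s2"),
    }
    for record in read_all(lambda sid: pages[sid]):
        print(record)
```

The consuming job simply iterates over records, much as it would iterate over lines of a file in HDFS.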

enhance your workflow to get the best of both worlds

Get maximum flexibility with the es-hadoop connector by leveraging everything that Hadoop has to offer (via MapReduce, Hive, Pig, Cascading, Spark, and Storm) and combining it with the real-time search and analytics capabilities of Elasticsearch.

need to grow? just add more nodes.

Elasticsearch can be scaled in the same way as your Hadoop cluster: add more Elasticsearch nodes and the data will be automatically rebalanced.
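Elasticsearch scales out by moving whole shards, not individual documents: a document's shard is fixed by its routing value and the index's number of primary shards, so adding a node only changes which node hosts which shard. The sketch below is a deliberately simplified balancer, not Elasticsearch's real allocator (which also weighs disk usage, replicas, and allocation rules). It moves one shard at a time from the most loaded node to the least loaded one until shard counts differ by at most one; the function name `rebalance` and the node names are hypothetical.

```python
from typing import Dict, List


def rebalance(assignment: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return a new node -> shard-ids assignment with shard counts evened out.

    Nodes with no shards yet (e.g. freshly added ones) should appear with an
    empty list. The input dict is not modified.
    """
    result = {node: list(shards) for node, shards in assignment.items()}
    while True:
        busiest = max(result, key=lambda n: len(result[n]))
        idlest = min(result, key=lambda n: len(result[n]))
        if len(result[busiest]) - len(result[idlest]) <= 1:
            return result
        # Relocate one whole shard; the documents inside it move with it.
        result[idlest].append(result[busiest].pop())


if __name__ == "__main__":
    before = {"node-1": [0, 1, 2], "node-2": [3, 4, 5]}
    after = rebalance({**before, "node-3": []})  # add a third node
    print({node: len(shards) for node, shards in after.items()})
```

Because no document changes shards, rebalancing is just a sequence of shard relocations, which is why adding nodes requires no reindexing.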

Original article: http://www.elasticsearch.com/products/hadoop/

Date: 2024-10-24 09:38:02
