Hadoop Distributions

Azure HDInsight

Azure HDInsight is Microsoft's distribution of Hadoop. The Azure HDInsight ecosystem includes the following components: Pig, Hive, HBase, Sqoop, Oozie, Ambari, Microsoft Avro Library, YARN, Cluster Dashboard, and Tez.

In addition to these components, a few other tools enable reporting and analytics on data stored in Azure HDInsight.

More information: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-introduction

Here are a few highlights of Azure HDInsight:

  • Azure HDInsight is based on Hortonworks Data Platform.
  • Azure HDInsight offers Apache Hadoop as a service in the Microsoft Azure cloud, so users get the benefits of cloud computing.
  • Azure HDInsight offers strong support for PowerShell through the HDInsight PowerShell cmdlets.
  • Windows Azure and HDInsight PowerShell cmdlets can be used for tasks such as uploading and downloading data, moving data between Azure Blob Storage and on-premises file systems, and configuring, executing, and post-processing jobs on HDInsight.
  • Because Azure HDInsight is a Hadoop service in the cloud, you can provision a cluster, process the data, destroy the cluster, and pay only for the resources used.
  • Microsoft also offers an HDInsight Emulator that lets developers explore HDInsight on-premises without an Azure account.



Cloudera

Cloudera was the first company formed to build enterprise solutions based on Hadoop. Its Hadoop distribution is known as Cloudera's Distribution for Hadoop (CDH). Here is a simplified representation of Cloudera's Hadoop ecosystem.

Source: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html

Cloudera's Hadoop ecosystem includes the following components: Apache Avro, Apache Crunch, Apache DataFu, Apache Flume, Apache Hadoop, Apache HBase, Apache Hive, Hue, Cloudera Impala, Kite SDK (formerly CDK), LLAMA, Apache Mahout, Apache Oozie, Parquet, Apache Pig, Cloudera Search, Apache Sentry, Apache Spark, Apache Sqoop, and Apache ZooKeeper.

More Information: http://www.cloudera.com/content/dev-center/en/home/developer-admin-resources/cdh-components.html

Here are a few highlights of CDH:

  • CDH can be deployed on-premises as well as in the cloud.
  • Cloudera Manager simplifies the deployment and management of Hadoop and the other components in Cloudera's Hadoop ecosystem.
  • Cloudera offers a proprietary enterprise edition, Cloudera Enterprise, which comes in three variants: Basic, Flex, and Data Hub.
  • The Express edition is available as a free download.
  • The Cloudera Enterprise Data Hub edition is supported on the AWS cloud.



Hortonworks

Hortonworks has a Hadoop distribution known as Hortonworks Data Platform (HDP). Here is a simplified representation of Hortonworks Data Platform.

Source: http://hortonworks.com/hdp/

Hortonworks Data Platform includes the following components: Apache Hadoop, Apache Pig, Apache Hive, Apache HBase, Apache ZooKeeper, Apache Oozie, Apache Sqoop, Apache Flume, Apache Ambari, Hue, Apache Mahout, Apache Knox, Apache Storm, Apache Tez, Apache Phoenix, Apache Accumulo, and Apache Falcon.

More Information: http://hortonworks.com/hadoop/

Here are a few highlights of Hortonworks Data Platform:

  • HDP can be deployed on-premises as well as in the cloud.
  • HDP supports deployment on both Linux and Windows platforms.
  • HDP is developed in the open through Apache projects.
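Since HDInsight is based on HDP and all of these distributions package Apache projects, it can help to see how much the component lists above actually overlap. The following is a minimal Python sketch that copies the lists exactly as this article gives them and intersects them. The matching rule (drop a leading "Apache " and ignore case) and the helper names `normalize` and `shared` are assumptions made for illustration, and the lists are not a complete or current inventory of any distribution.

```python
# Component lists copied from this article; NOT an authoritative or current inventory.
HDINSIGHT = ["Pig", "Hive", "HBase", "Sqoop", "Oozie", "Ambari",
             "Microsoft Avro Library", "YARN", "Cluster Dashboard", "Tez"]
CDH = ["Apache Avro", "Apache Crunch", "Apache DataFu", "Apache Flume",
       "Apache Hadoop", "Apache HBase", "Apache Hive", "Hue", "Cloudera Impala",
       "Kite SDK", "LLAMA", "Apache Mahout", "Apache Oozie", "Parquet",
       "Apache Pig", "Cloudera Search", "Apache Sentry", "Apache Spark",
       "Apache Sqoop", "Apache ZooKeeper"]
HDP = ["Apache Hadoop", "Apache Pig", "Apache Hive", "Apache HBase",
       "Apache ZooKeeper", "Apache Oozie", "Apache Sqoop", "Apache Flume",
       "Apache Ambari", "Hue", "Apache Mahout", "Apache Knox", "Apache Storm",
       "Apache Tez", "Apache Phoenix", "Apache Accumulo", "Apache Falcon"]


def normalize(name: str) -> str:
    """Hypothetical matching rule: strip a leading 'Apache ' and lowercase."""
    name = name.strip()
    if name.startswith("Apache "):
        name = name[len("Apache "):]
    return name.lower()


def shared(*lists: list[str]) -> list[str]:
    """Return the normalized components present in every given list, sorted."""
    sets = [{normalize(n) for n in lst} for lst in lists]
    return sorted(set.intersection(*sets))


if __name__ == "__main__":
    print("Common to HDInsight, CDH and HDP:", shared(HDINSIGHT, CDH, HDP))
    print("Common to CDH and HDP:", shared(CDH, HDP))
```

With these lists, the components common to all three are HBase, Hive, Oozie, Pig, and Sqoop. The normalization is deliberately simple: it does not treat "Microsoft Avro Library" and "Apache Avro" as the same component, which would need a hand-written alias table.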



Amazon Elastic MapReduce (EMR)

Amazon Web Services (AWS) Elastic MapReduce (EMR) was among the first Hadoop offerings on the market. Here is a high-level view of the Amazon EMR architecture and job flow.

Source: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-what-is-emr.html

Amazon EMR integrates most of the popular components, such as Hive, Pig, HBase, DistCp, and Ganglia.

Here are a few highlights of Amazon EMR:

  • EMR is a Hadoop distribution in the cloud.
  • It uses AWS's Elastic Compute Cloud (EC2) for computation.
  • It uses AWS's Simple Storage Service (S3) for storage.
  • It is tightly integrated with other AWS services.
  • Deployment and management are simplified by the AWS Management Console and the AWS Toolkit.



MapR

MapR is another major distribution available in the market. Below is a simplified architecture of MapR Data Platform.

Source: http://www.mapr.com/products/product-overview/overview

Here are a few highlights of MapR:

  • MapR is available in the cloud through several leading cloud providers: Amazon Web Services (AWS), Google Compute Engine, CenturyLink Technology Solutions, and OpenStack.
  • MapR integrates and supports more than 20 open-source projects.
  • MapR supports multiple versions of the individual projects it integrates into its data platform, which lets users migrate to newer versions at their own pace.


Apart from the distributions listed above, there are various other distributions available in the market from leading providers like Intel, Oracle, HP, and many others.


Time: 2024-10-13 17:56:01
