What Is Apache Hadoop?

http://hadoop.apache.org/

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

It is designed to scale up from single servers to thousands of machines, each offering local
computation and storage. Rather than rely on hardware to deliver high availability, the
library itself is designed to detect and handle failures at the application layer, thereby
delivering a highly available service on top of a cluster of computers, each of which may be
prone to failure.
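
To make the idea of handling failures at the application layer more concrete, here is a minimal Python sketch. It is not Hadoop code and does not use any Hadoop API: the names `NodeFailure`, `run_with_failover`, `healthy_node` and `broken_node` are hypothetical, and each "node" is simulated by a plain function. The sketch only illustrates the general pattern of detecting a failed worker in software and rerunning the work elsewhere.

```python
class NodeFailure(Exception):
    """Raised by a simulated node that cannot complete its task (hypothetical)."""


def run_with_failover(task, nodes):
    """Try the task on each node in turn and return the first successful result."""
    errors = []
    for node in nodes:
        try:
            return node(task)
        except NodeFailure as exc:
            # Record the failure and fall through to the next candidate node.
            errors.append(str(exc))
    raise RuntimeError(f"all nodes failed: {errors}")


def healthy_node(task):
    """Simulated node that runs the task normally."""
    return task()


def broken_node(task):
    """Simulated node that always fails."""
    raise NodeFailure("node-1 is unreachable")


# The first node fails, so the task is rerun on the second one.
result = run_with_failover(lambda: sum(range(10)), [broken_node, healthy_node])
print(result)
```

The caller never sees the failure of the first node; it only fails if every candidate node fails, which is the property the paragraph above describes at cluster scale.
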
The project includes these modules:
• Hadoop Common: The common utilities that support the other Hadoop modules.
• Hadoop Distributed File System (HDFS™): A distributed file system that provides
high-throughput access to application data.
• Hadoop YARN: A framework for job scheduling and cluster resource management.
• Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
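
As an illustration of the "simple programming model" behind MapReduce, the following is a minimal in-memory Python sketch of a word count. It is an assumption-laden toy, not the Hadoop MapReduce API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this example, and everything runs in one process rather than on a YARN cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different machines, with the framework handling the grouping step and data movement; the sketch keeps only the shape of the computation.
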
Other Hadoop-related projects at Apache include:
• Ambari™: A web-based tool for provisioning, managing, and monitoring Apache
Hadoop clusters, which includes support for Hadoop HDFS, Hadoop MapReduce, Hive,
HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard
for viewing cluster health, such as heatmaps, and the ability to view MapReduce, Pig and
Hive applications visually, along with features to diagnose their performance
characteristics in a user-friendly manner.
• Avro™: A data serialization system.
• Cassandra™: A scalable multi-master database with no single points of failure.
• Chukwa™: A data collection system for managing large distributed systems.
• HBase™: A scalable, distributed database that supports structured data storage for large
tables.
• Hive™: A data warehouse infrastructure that provides data summarization and ad hoc
querying.
• Mahout™: A scalable machine learning and data mining library.
• Pig™: A high-level data-flow language and execution framework for parallel
computation.
• Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple
and expressive programming model that supports a wide range of applications, including
ETL, machine learning, stream processing, and graph computation.
• Tez™: A generalized data-flow programming framework, built on Hadoop YARN,
which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to
process data for both batch and interactive use-cases. Tez is being adopted by Hive™,
Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial
software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution
engine.
• ZooKeeper™: A high-performance coordination service for distributed applications.

Copyright © 2014 The Apache Software Foundation. All rights reserved.


Time: 2024-11-09 03:37:52
