1. The four main modules of the Hadoop project
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
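2. YARN: "the operating system of the cloud" (as described by Hortonworks, a vendor of a commercial Hadoop distribution)
- Allocates resources to the applications deployed on YARN
- Manages resources
- Schedules jobs/applications
3. Skills
- Cloud computing: Hadoop 2.x
- Service bus: SOA/OSB, Dubbo
- Full-text search: Lucene, Solr, Nutch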
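4. Compiling the Hadoop 2.x source code
4.1. Environment:
1) 64-bit Linux, CentOS 6.4, running in a virtual machine built with VMware
2) The virtual machine has Internet access
4.2. Official build instructions:
Extract the source archive: tar -zxvf hadoop-2.2.0-src.tar.gz
Then change into the extracted directory and read BUILDING.txt, for example with more BUILDING.txt (press the space bar to page down). It includes the following: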
Requirements:
* Unix System
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
----------------------------------------------------------------------------------
Maven main modules:
hadoop (Main Hadoop project)
- hadoop-project (Parent POM for all Hadoop Maven modules. )
(All plugins & dependencies versions are defined here.)
- hadoop-project-dist (Parent POM for modules that generate distributions.)
- hadoop-annotations (Generates the Hadoop doclet used to generated the Java
docs)
- hadoop-assemblies (Maven assemblies used by the different modules)
- hadoop-common-project (Hadoop Common)
- hadoop-hdfs-project (Hadoop HDFS)
- hadoop-mapreduce-project (Hadoop MapReduce)
- hadoop-tools (Hadoop tools like Streaming, Distcp, etc.)
- hadoop-dist (Hadoop distribution assembler)
----------------------------------------------------------------------------------
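The requirements listed above can be checked before starting a build. The following is a minimal sketch and is not part of BUILDING.txt: `missing_tools` is a hypothetical helper name, and the tool list in the example is an assumption based on the requirements above.

```shell
# Hypothetical helper (not from BUILDING.txt): print each named tool that
# is not found on PATH, and return non-zero if at least one is missing.
missing_tools() {
    mt_status=0
    for mt_tool in "$@"; do
        if ! command -v "$mt_tool" >/dev/null 2>&1; then
            echo "$mt_tool"
            mt_status=1
        fi
    done
    return "$mt_status"
}

# Example: report which build prerequisites still need to be installed.
if missing_tools java javac mvn protoc cmake; then
    echo "all listed tools found"
else
    echo "install the tools listed above before building"
fi
```

This only confirms that the commands exist; it does not check their versions.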
After the build completes, you can inspect the compiled native library. The line below looks like the output of running file on libhadoop.so.1.0.0 and shows a 64-bit build:
libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
# pwd
/opt/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/lib/native
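To make the 64-bit check repeatable, the output line shown above can be matched with a shell pattern. This is a minimal sketch; `is_64bit_shared_object` is a hypothetical helper, and the sample string is copied from the output above.

```shell
# Hypothetical helper: succeed if a line of `file` output describes a
# 64-bit x86-64 ELF shared object.
is_64bit_shared_object() {
    case "$1" in
        *"ELF 64-bit"*"shared object"*"x86-64"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample line taken from the listing above.
sample='libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped'
if is_64bit_shared_object "$sample"; then
    echo "64-bit native library"
fi
```

In the lib/native directory, the same check could be run on real output with is_64bit_shared_object "$(file libhadoop.so.1.0.0)".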
4.3. Before compiling: install the dependencies
Install the Linux system packages
- yum install autoconf automake libtool cmake
- yum install ncurses-devel
- yum install openssl-devel
- yum install lzo-devel zlib-devel gcc gcc-c++
Install Maven
- Download: apache-maven-3.0.5-bin.tar.gz
- Extract: tar -zxvf apache-maven-3.0.5-bin.tar.gz
- Set the environment variables by adding the following to /etc/profile:
- export MAVEN_HOME=/opt/apache-maven-3.0.5
- export PATH=$PATH:$MAVEN_HOME/bin
- Apply the change: source /etc/profile or . /etc/profile (note the space after the dot)
- Verify: mvn -v
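Editing /etc/profile by hand more than once can leave duplicate export lines. The following is a minimal sketch of an idempotent append; `append_once` is a hypothetical helper, and the example writes to a temporary file rather than to /etc/profile.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (grep -x matches whole lines, -F literal text).
append_once() {
    ao_file=$1
    ao_line=$2
    if ! grep -qxF -e "$ao_line" "$ao_file" 2>/dev/null; then
        printf '%s\n' "$ao_line" >> "$ao_file"
    fi
}

# Example against a throwaway file; the third call changes nothing.
demo_profile=$(mktemp)
append_once "$demo_profile" 'export MAVEN_HOME=/opt/apache-maven-3.0.5'
append_once "$demo_profile" 'export PATH=$PATH:$MAVEN_HOME/bin'
append_once "$demo_profile" 'export MAVEN_HOME=/opt/apache-maven-3.0.5'
cat "$demo_profile"
```

The single quotes keep $PATH and $MAVEN_HOME unexpanded, so the literal text is written to the file.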
Install protobuf
- Extract: tar -zxvf protobuf-2.5.0.tar.gz
- Change into the extracted directory and configure the build: ./configure
- Build, test, and install: make && make check && make install
- Verify: protoc --version
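BUILDING.txt lists ProtocolBuffer 2.5.0, and protoc --version prints "libprotoc 2.5.0" for that release. The following is a minimal sketch of a version check; `protoc_version_ok` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the given `protoc --version` output
# reports the 2.5.0 release listed in BUILDING.txt.
protoc_version_ok() {
    [ "$1" = "libprotoc 2.5.0" ]
}

# Query the installed protoc, if any; an empty string means "not found".
found=$(protoc --version 2>/dev/null || true)
if protoc_version_ok "$found"; then
    echo "protoc 2.5.0 is installed"
else
    echo "expected 'libprotoc 2.5.0', got '${found:-nothing}'"
fi
```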
Install FindBugs
- Extract: tar -zxvf findbugs.tar.gz
- Set the environment variables:
- export FINDBUGS_HOME=/opt/findbugs-3.0.0
- export PATH=$PATH:$FINDBUGS_HOME/bin
- Verify: findbugs -version
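If you see the error java.lang.UnsupportedClassVersionError: Unsupported major.minor version 51.0, the java being run is older than Java 7 (class-file version 51.0 corresponds to Java 7). In my case the cause was the preinstalled OpenJDK; uninstalling it and installing the Sun (Oracle) JDK fixed it. See: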
http://www.blogjava.net/Jay2009/archive/2009/04/23/267108.html
http://www.cnblogs.com/zhoulf/archive/2013/02/04/2891608.html
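The 51.0 in the error message above is a class-file major version. For Java 5 and later, the Java release number is the major version minus 44, so 51 means Java 7. A minimal sketch of that arithmetic follows; `class_major_to_java` is a hypothetical helper name.

```shell
# Hypothetical helper: map a class-file major version to the Java release
# that introduced it. The "minus 44" rule holds from Java 5 (major 49) on.
class_major_to_java() {
    if [ "$1" -ge 49 ]; then
        echo $(( $1 - 44 ))
    else
        echo "pre-5"
    fi
}

class_major_to_java 51   # prints 7
```

A JVM can only load classes whose major version is at most its own, which is why a Java 6 runtime rejects version 51.0 classes.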
If the javac command does not work:
# javac
Error: could not find libjava.so
Error: could not find Java 2 Runtime Environment.
but running it by its full path works (/usr/lib/jvm/jdk1.7.0_71/bin/javac -version), then an older JRE is probably getting in the way. Remove the OpenJDK JRE and install the Sun (Oracle) JRE.
After downloading the rpm package, install it with rpm -ivh jre-7u71-linux-x64.rpm. Afterwards:
# java -version
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
# javac -version
javac 1.7.0_71
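Note
Hadoop is written in Java. In this setup it did not work with the OpenJDK preinstalled on Linux, so install a JDK (1.6 or later) before installing Hadoop.
Hadoop 2.2.0 also has a build bug.
To work around it, edit /opt/hadoop-2.2.0-src/hadoop-common-project/hadoop-auth/pom.xml. The original post marked the added part in bold, which is lost here; after the change the file should contain both of the following dependencies:
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty</artifactId>
  <scope>test</scope>
</dependency>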
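4.4. How to compile
Change into the Hadoop source directory, /opt/hadoop-2.2.0-src, and run one of the commands below (the original post highlighted them in red; parts in square brackets are optional):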
Building distributions:
Create binary distribution without native code and without documentation:
$ mvn package -Pdist -DskipTests -Dtar
Create binary distribution with native code and with documentation:
$ mvn package -Pdist,native,docs -DskipTests -Dtar
Create source distribution:
$ mvn package -Psrc -DskipTests
Create source and binary distributions with native code and documentation:
$ mvn -e -X package -Pdist,native[,docs,src] -DskipTests -Dtar
Create a local staging version of the website (in /tmp/hadoop-site)
$ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
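The binary-distribution commands above differ only in the comma-separated profile list passed to -P. The following is a minimal sketch that assembles the command line from the extra profiles; `hadoop_build_cmd` is a hypothetical helper, and it only prints the command rather than running Maven.

```shell
# Hypothetical helper: print the mvn command for a binary distribution,
# always including the dist profile plus any extra profiles given.
hadoop_build_cmd() {
    hb_profiles=dist
    for hb_p in "$@"; do
        hb_profiles="$hb_profiles,$hb_p"
    done
    echo "mvn package -P$hb_profiles -DskipTests -Dtar"
}

hadoop_build_cmd               # without native code or documentation
hadoop_build_cmd native docs   # with native code and documentation
```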
4.5. Before compiling, you may need to configure a Maven mirror located in China
- Go to the Maven conf directory, /opt/modules/apache-maven-3.0.5/conf, and edit settings.xml
* Edit the <mirrors> section:
<mirror>
<id>nexus-osc</id>
<mirrorOf>*</mirrorOf>
<name>Nexus osc</name>
<url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
* Edit the <profiles> section:
<profile>
<id>jdk-1.6</id>
<activation>
<jdk>1.6</jdk>
</activation>
<repositories>
<repository>
<id>nexus</id>
<name>local private nexus</name>
<url>http://maven.oschina.net/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>nexus</id>
<name>local private nexus</name>
<url>http://maven.oschina.net/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</pluginRepository>
</pluginRepositories>
</profile>
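- Copy the configuration
Copy this settings file into the user's home directory so that Maven uses it on every run.
* Check whether the .m2 folder exists in the home directory (/home/hadoop); if not, create it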
$ cd /home/hadoop
$ mkdir .m2
* Copy the file
$ cp /opt/modules/apache-maven-3.0.5/conf/settings.xml ~/.m2/
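The check, create, and copy steps above can be combined so that they are safe to repeat. The following is a minimal sketch; `install_maven_settings` is a hypothetical helper, and the example uses throwaway directories instead of a real home directory.

```shell
# Hypothetical helper: copy a Maven settings file into <home>/.m2,
# creating the directory first if needed (mkdir -p is a no-op if it exists).
install_maven_settings() {
    ims_src=$1
    ims_home=$2
    mkdir -p "$ims_home/.m2" && cp "$ims_src" "$ims_home/.m2/settings.xml"
}

# Example against a throwaway directory.
demo_home=$(mktemp -d)
demo_src=$(mktemp)
echo '<settings/>' > "$demo_src"
install_maven_settings "$demo_src" "$demo_home"
ls "$demo_home/.m2"
```

For the setup described here, the call would be install_maven_settings /opt/modules/apache-maven-3.0.5/conf/settings.xml "$HOME".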
4.6. Configure DNS
Edit /etc/resolv.conf (vi /etc/resolv.conf) and add:
nameserver 8.8.8.8
nameserver 8.8.4.4
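To confirm which name servers are configured after the edit, the nameserver lines can be listed. The following is a minimal sketch; `list_nameservers` is a hypothetical helper, and the example reads a temporary file with the two entries above rather than the real /etc/resolv.conf.

```shell
# Hypothetical helper: print the address from every "nameserver" line
# of a resolv.conf-style file, ignoring comments and other directives.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Example with the two entries shown above.
demo_resolv=$(mktemp)
printf 'nameserver 8.8.8.8\nnameserver 8.8.4.4\n' > "$demo_resolv"
list_nameservers "$demo_resolv"
```

On the build machine the call would be list_nameservers /etc/resolv.conf.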
4.7. Importing the Hadoop project into Eclipse
Importing projects to eclipse
When you import the project to eclipse, install hadoop-maven-plugins at first.
$ cd hadoop-maven-plugins
$ mvn install
Then, generate eclipse project files.
$ mvn eclipse:eclipse -DskipTests
At last, import to eclipse by specifying the root directory of the project via
[File] > [Import] > [Existing Projects into Workspace].
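Note:
If a JDK- or JRE-related problem appears during the build, such as "JAVA_HOME environment variable is not set.", see
http://www.cnblogs.com/RandyS/p/3909717.html
Add the JAVA_HOME export (as in the /etc/profile listing at the end of these notes) to the end of /etc/profile, then run . /etc/profile or source /etc/profile to apply it.
If you see bash: javac: command not found, run
yum install java-devel
For the reason, see http://stackoverflow.com/questions/5407703/javac-command-not-found
Other problems encountered during the build: http://blog.csdn.net/xichenguan/article/details/17636905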
Error: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (site) on project hadoop-hdfs: An Ant BuildException has occured:
input file /opt/hadoop-2.2.0-src/hadoop-hdfs-project/hadoop-hdfs/target/findbugsXml.xml does not exist
Solution:
cd ~/hadoop-2.2.0-src/
mvn clean package -Pdist,native,docs -DskipTests -Dtar
After fixing an error that stopped the build partway, you can resume from a given module instead of starting over by appending -rf :<module>, naming the module to resume from. If the "hadoop-hdfs/target/findbugsXml.xml does not exist" error appears, remove docs from the profile list and rerun, for example: mvn package -Pdist,native -DskipTests -Dtar -rf :hadoop-pipes
After a successful build, hadoop-2.2.0.tar.gz under /opt/hadoop-2.2.0-src/hadoop-dist/target is the compiled tarball.
Error: Could not find goal 'protoc' in plugin org.apache.hadoop:hadoop-maven-plugins:2.2.0 among available
Solution: add the following to /etc/profile, then run source /etc/profile
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/protobuf/lib
export PATH=$PATH:/usr/local/bin
It is generally recommended to install protobuf under /usr/local by passing --prefix=/usr/local/protobuf to configure. If an error occurs, run make clean and try again.
My /etc/profile:
export MAVEN_HOME=/opt/apache-maven-3.0.5
export PATH=$PATH:$MAVEN_HOME/bin
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71/
export JRE_HOME=/usr/lib/jvm/jdk1.7.0_71/jre
export ANT_HOME=/usr/lib/jvm/apache-ant/
export CLASSPATH=.:$JRE_HOME/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$ANT_HOME/bin
export FINDBUGS_HOME=/opt/findbugs-3.0.0
export PATH=$PATH:$FINDBUGS_HOME/bin:/opt/protoc/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/protoc/lib
[Figure: first build result]
[Figure: second build result]