Configure the basic Linux environment:
--> Java, IP address, hostname, hosts file, iptables, chkconfig, and SSH setup
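The checklist above can be verified with a small read-only script before going further. This is a minimal sketch, not a full setup procedure: the hostname hadoop0 is taken from the core-site.xml shown later in these notes, and the key file name ~/.ssh/id_rsa.pub is an assumption about how the SSH key was generated.

```shell
#!/bin/sh
# Read-only pre-flight checks; nothing here changes the system.
# MASTER_HOST defaults to the hostname used in core-site.xml in these notes.
MASTER_HOST="${MASTER_HOST:-hadoop0}"

# have <cmd>: succeeds if the command is on PATH
have() { command -v "$1" >/dev/null 2>&1; }

# report <label> <command...>: run the command quietly, print OK or MISSING
report() {
    label="$1"; shift
    if "$@" >/dev/null 2>&1; then
        printf '%-8s%s\n' OK "$label"
    else
        printf '%-8s%s\n' MISSING "$label"
    fi
}

report "java on PATH"                   have java
report "ssh client on PATH"             have ssh
report "hostname resolves: $MASTER_HOST" getent hosts "$MASTER_HOST"
# Assumed key name; adjust if the key was generated under another name.
report "ssh public key in ~/.ssh"       test -f "$HOME/.ssh/id_rsa.pub"
```

Any line reported as MISSING points at the checklist item to revisit.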
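Running Hadoop 2.2 on a 64-bit Linux machine requires building it from source, because the native libraries in the official binary release are built for 32-bit:
First install the build tools and Google's protobuf: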
yum install glibc-headers
yum install gcc
yum install gcc-c++
yum install make
yum install cmake
yum install openssl-devel
yum install ncurses-devel
tar zxvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0 && ./configure && make && make check && make install
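The Hadoop 2.2.0 build expects protoc 2.5.0, so it can be worth confirming the installed version before starting Maven. A minimal sketch; the helper name is_required_protoc is illustrative only.

```shell
#!/bin/sh
# The Hadoop 2.2.0 build expects this protoc version.
REQUIRED_PROTOC="2.5.0"

# is_required_protoc "<output of protoc --version>" (hypothetical helper)
is_required_protoc() {
    [ "$1" = "libprotoc $REQUIRED_PROTOC" ]
}

if command -v protoc >/dev/null 2>&1; then
    found="$(protoc --version 2>/dev/null)"
    if is_required_protoc "$found"; then
        echo "protoc OK: $found"
    else
        echo "protoc mismatch: got '$found', want 'libprotoc $REQUIRED_PROTOC'"
    fi
else
    # A source install usually lands under /usr/local; PATH or ldconfig may need attention.
    echo "protoc not found on PATH"
fi
```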
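Then install Maven 3.0.5 and add it to the environment variables.
Download hadoop-2.2.0-src, unpack it, and build with: mvn package -Pdist,native -DskipTests
Edit the Hadoop configuration files in the etc/hadoop directory: core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml.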
File core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop0:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-2.2.0/tmp</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
</configuration>
File hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
File yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
File mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
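The 2.2.0 distribution ships mapred-site.xml only as mapred-site.xml.template, so the file may need to be created before it can be edited. A minimal sketch, assuming the current directory is the Hadoop install root unless HADOOP_CONF_DIR is set; the function name ensure_mapred_site is illustrative.

```shell
#!/bin/sh
# ensure_mapred_site <conf_dir>: copy the shipped template into place,
# but only if mapred-site.xml does not exist yet (never overwrites).
ensure_mapred_site() {
    conf_dir="$1"
    if [ ! -f "$conf_dir/mapred-site.xml" ] && [ -f "$conf_dir/mapred-site.xml.template" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
}

# Assumption: run from the Hadoop install root, or with HADOOP_CONF_DIR exported.
ensure_mapred_site "${HADOOP_CONF_DIR:-etc/hadoop}"
```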
Startup
Startup: format the NameNode (first start only):
bin/hdfs namenode -format
Startup: HDFS
sbin/hadoop-daemon.sh start namenode   # web UI on port 50070
sbin/hadoop-daemon.sh start datanode
Startup: YARN
sbin/yarn-daemon.sh start resourcemanager   # web UI on port 8088
sbin/yarn-daemon.sh start nodemanager
Startup: JobHistory server
sbin/mr-jobhistory-daemon.sh start historyserver
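Once the five daemons above have been started, jps (shipped with the JDK) lists the running Java processes, which gives a quick way to see whether any of them failed to come up. A minimal sketch; the helper name missing_daemons is illustrative.

```shell
#!/bin/sh
# Process names that jps should show after the start commands above.
EXPECTED="NameNode DataNode ResourceManager NodeManager JobHistoryServer"

# missing_daemons: reads `jps` output ("<pid> <name>" per line) on stdin
# and prints each expected daemon that is not running.
missing_daemons() {
    running="$(awk '{print $2}')"
    for d in $EXPECTED; do
        # -x matches the whole line, so SecondaryNameNode does not count as NameNode
        echo "$running" | grep -qx "$d" || echo "$d"
    done
}

if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

Empty output means all five daemons are running; otherwise the log files under the logs directory of the install are the place to look.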
Test an upload: bin/hadoop fs -put LICENSE.txt /license
The default block size is 64 MB in Hadoop 1 and 128 MB in Hadoop 2.
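The effect of the block size on a file can be worked out with a ceiling division. A minimal sketch, using a hypothetical 300 MB file purely as an illustration:

```shell
#!/bin/sh
# blocks_for <file_size_bytes> <block_size_bytes>: number of HDFS blocks (ceiling division)
blocks_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

MB=$((1024 * 1024))
FILE_SIZE=$((300 * MB))   # hypothetical 300 MB file

echo "Hadoop 1 (64 MB blocks):  $(blocks_for "$FILE_SIZE" $((64 * MB)))"    # 5 blocks
echo "Hadoop 2 (128 MB blocks): $(blocks_for "$FILE_SIZE" $((128 * MB)))"   # 3 blocks
```

On a running Hadoop 2 install, the configured value can be read with bin/hdfs getconf -confKey dfs.blocksize.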
Test word count:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /license /out
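To sanity-check the job's result, the same count can be approximated locally with standard tools and compared against the files written under /out. This is a rough sketch, assuming the example splits words on whitespace only; the function name local_wordcount is illustrative and not part of Hadoop.

```shell
#!/bin/sh
# local_wordcount: reads text on stdin, prints "word<TAB>count" per line,
# sorted in byte order (approximates what the wordcount example produces).
local_wordcount() {
    tr -s '[:space:]' '\n' |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{print $2 "\t" $1}'
}

# Only runs if the file used in the upload test is in the current directory.
if [ -f LICENSE.txt ]; then
    local_wordcount < LICENSE.txt | head -n 5
fi
```

The job's own output can be printed with bin/hadoop fs -cat on the part files under /out for comparison.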
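To view the logs of completed jobs, the JobHistory server must be running: sbin/mr-jobhistory-daemon.sh start historyserver
Hadoop applications run in containers on the NodeManagers.
The ApplicationMaster for MapReduce is called MRAppMaster.
A cluster has multiple NodeManagers but only one ResourceManager.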