1. Planning
192.168.10.135 Master
192.168.10.132 Slave1
192.168.10.133 Slave2
Note: SELinux and firewalld have been disabled on all three nodes.
2. Pre-deployment Preparation
a. Add the hadoop user and set its password
# useradd hadoop
# echo "<password>" | passwd --stdin hadoop    ## --stdin reads the new password from standard input
b. Grant sudo privileges
# ls -la /etc/sudoers
# chmod u+w /etc/sudoers          ## add write permission
# vi /etc/sudoers
 98 root    ALL=(ALL)       ALL
 99 hadoop  ALL=(ALL)       ALL   ## add the hadoop user
# chmod u-w /etc/sudoers          ## revoke write permission
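A safer alternative (not part of the original steps) is visudo, which edits /etc/sudoers without changing its permissions and syntax-checks the file before saving:
# visudo                          ## opens /etc/sudoers with syntax checking
hadoop  ALL=(ALL)       ALL       ## add this line below the root entry, then save and exit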
c. Install the required packages
$ sudo yum -y install java-1.7.0-openjdk java-1.7.0-openjdk-devel rsync openssh-server openssh-clients
$ java -version
java version "1.7.0_91"
OpenJDK Runtime Environment (rhel-2.6.2.3.el7-x86_64 u91-b00)
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)
d. Configure passwordless SSH login
Master:
# su -l hadoop                                       ## switch to the hadoop user
$ ssh-keygen -t rsa -P ""                            ## generate a key pair
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    ## append id_rsa.pub to the authorized keys
$ chmod 600 ~/.ssh/authorized_keys                   ## fix permissions
$ scp ~/.ssh/id_rsa.pub [email protected]:~/
$ scp ~/.ssh/id_rsa.pub [email protected]:~/
Slave:
# su -l hadoop                                 ## switch to the hadoop user
$ mkdir ~/.ssh
$ chmod 700 ~/.ssh
$ cat ~/id_rsa.pub >> ~/.ssh/authorized_keys   ## append to the "authorized_keys" file
$ chmod 600 ~/.ssh/authorized_keys             ## fix permissions
$ rm ~/id_rsa.pub                              ## remove the copied public key
Test from the Master:
$ ssh localhost
$ ssh slave1
$ ssh slave2
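As an alternative to copying id_rsa.pub by hand, openssh-clients on CentOS 7 ships ssh-copy-id, which appends the key to the remote authorized_keys and fixes the permissions in one step (a sketch; run as hadoop on the Master and enter the hadoop password of each Slave once):
$ ssh-copy-id [email protected]
$ ssh-copy-id [email protected]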
3. Hadoop Deployment
Perform the following steps while logged in as the hadoop user:
a. Download Hadoop
$ wget http://apache.fayea.com/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
b. Extract and install
$ tar -zxvf hadoop-2.7.1.tar.gz
$ sudo mv hadoop-2.7.1 /usr/local/hadoop
$ sudo chown -R hadoop:hadoop /usr/local/hadoop/
c. Configure environment variables
$ vi /home/hadoop/.bashrc
# Hadoop Start
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Hadoop End
$ source /home/hadoop/.bashrc
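As a quick sanity check (not in the original steps), confirm that JAVA_HOME points at the real JDK location and that the Hadoop binaries are on the PATH; note that the exact OpenJDK directory name can differ between CentOS builds:
$ readlink -f $(which java)    ## prints .../jre/bin/java; JAVA_HOME is the path up to (but excluding) /jre/bin/java
$ hadoop version               ## should report Hadoop 2.7.1 now that steps b and c are done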
d. Add hosts entries
$ sudo vi /etc/hosts
192.168.10.135 Master
192.168.10.132 Slave1
192.168.10.133 Slave2
e. Configure the Master
① Configure core-site.xml:
$ sudo vi /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>
Note:
fs.defaultFS is the address of the default file system's name node (i.e. the NameNode), in the form "hdfs://hostname(or IP):port";
io.file.buffer.size is the buffer size used when reading and writing SequenceFiles; a larger buffer can reduce the number of I/O operations.
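Once Hadoop is installed and on the PATH (steps b and c above), a quick way to check which value is actually picked up from core-site.xml is hdfs getconf (a sanity check, not part of the original steps):
$ hdfs getconf -confKey fs.defaultFS    ## should print hdfs://Master:9000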
② Configure hdfs-site.xml:
$ sudo vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:50090</value>
    </property>
</configuration>
Note:
dfs.namenode.name.dir is the local file system directory where the NameNode stores the namespace and edit-log metadata; it defaults to "/tmp/hadoop-{username}/dfs/name";
dfs.datanode.data.dir is the local file system directory where a DataNode stores HDFS blocks, given as "file://local-path"; it defaults to "/tmp/hadoop-{username}/dfs/data";
dfs.replication is the number of replicas kept for each HDFS block; it should not exceed the number of DataNodes (2 here);
dfs.namenode.secondary.http-address is the host and port of the SecondaryNameNode (this property can be omitted if no separate SecondaryNameNode role is needed).
③ Configure mapred-site.xml:
$ sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
$ sudo vi /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>
Note:
mapreduce.framework.name selects the runtime framework used to execute MapReduce jobs; it defaults to "local" and must be set to "yarn" here.
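The two mapreduce.jobhistory.* addresses only take effect if the JobHistory server is actually running; it is not started by start-all.sh. A minimal way to start it (after the startup step in section 4) is:
$ mr-jobhistory-daemon.sh start historyserver    ## its web UI is then available at Master:19888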
④ Configure yarn-site.xml:
$ sudo vi /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.acl.enable</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.admin.acl</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Note:
yarn.nodemanager.aux-services specifies the auxiliary shuffle service used by MapReduce applications (implemented by the ShuffleHandler class configured above).
⑤ Specify the JAVA_HOME installation directory
$ sudo vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
 26 export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk
⑥ Specify the slave nodes managed by the master node (NameNode, ResourceManager)
$ sudo vi /usr/local/hadoop/etc/hadoop/slaves
Slave1
Slave2
⑦ Copy Hadoop to the Slaves
$ scp -r /usr/local/hadoop slave1:/usr/local/
$ scp -r /usr/local/hadoop slave2:/usr/local/
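The scp above assumes the hadoop user can write to /usr/local on the Slaves. If it cannot, a common workaround (a sketch, not from the original steps) is to copy into the home directory first and then move and re-own the tree locally on each Slave:
Master$ scp -r /usr/local/hadoop slave1:~/
Slave1$ sudo mv ~/hadoop /usr/local/ && sudo chown -R hadoop:hadoop /usr/local/hadoop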
4. Running Hadoop
a. Format the distributed file system
$ hdfs namenode -format
15/12/21 12:23:49 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master/192.168.10.135      ## host
STARTUP_MSG:   args = [-format]                  ## format flag
STARTUP_MSG:   version = 2.7.1                   ## Hadoop version
......
STARTUP_MSG:   java = 1.7.0_91                   ## Java version
************************************************************/
......
15/12/21 12:24:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master/192.168.10.135
************************************************************/
b. Start Hadoop
$ start-all.sh
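start-all.sh is deprecated in Hadoop 2.x and simply delegates to the per-layer start scripts; starting HDFS and YARN separately is equivalent and makes failures easier to localize:
$ start-dfs.sh     ## NameNode, SecondaryNameNode, DataNodes
$ start-yarn.sh    ## ResourceManager, NodeManagers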
c. Check that the daemons started
① Check the background Java processes with jps (the first column is the PID, the second the process name)
Master:
$ jps
9863 NameNode
10459 Jps
10048 SecondaryNameNode
10202 ResourceManager
Slave:
# jps
2217 NodeManager
2138 DataNode
2377 Jps
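To confirm from the Master that both NodeManagers have registered with the ResourceManager, yarn node -list can be used (an extra check, not in the original steps):
$ yarn node -list    ## should list Slave1 and Slave2 in RUNNING state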
② Check HDFS usage
$ hadoop dfsadmin -report
Configured Capacity: 39631978496 (36.91 GB)
Present Capacity: 33985548288 (31.65 GB)
DFS Remaining: 33985531904 (31.65 GB)
DFS Used: 16384 (16 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.10.132:50010 (Slave1)
Hostname: Slave1
Decommission Status : Normal
Configured Capacity: 19815989248 (18.46 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2777395200 (2.59 GB)
DFS Remaining: 17038585856 (15.87 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.98%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Dec 22 19:27:27 CST 2015

Name: 192.168.10.133:50010 (Slave2)
Hostname: Slave2
Decommission Status : Normal
Configured Capacity: 19815989248 (18.46 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2869035008 (2.67 GB)
DFS Remaining: 16946946048 (15.78 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.52%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Dec 22 19:27:27 CST 2015
③ Check that everything is running via the web UIs
a. Open http://localhost:8088 in a browser on the Master to reach the ResourceManager web UI
b. Open http://localhost:50070 to reach the HDFS (NameNode) web UI
5. Testing and Verification
a. First create the required directories (one level at a time):
$ hadoop dfs -mkdir /user
$ hadoop dfs -mkdir /user/hadoop
$ hadoop dfs -mkdir /user/hadoop/input
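If you prefer to create the whole path in one step, the -p flag of the FsShell mkdir does so; note also that the hadoop dfs form is deprecated in 2.x in favor of hdfs dfs:
$ hdfs dfs -mkdir -p /user/hadoop/input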
b. Create a test file
$ vi test.txt
hello hadoop
hello World
Hello Java
CentOS System
c. Put the test file into the input directory
$ hadoop dfs -put test.txt /user/hadoop/input
d. Run the WordCount example
$ cd /usr/local/hadoop/
$ hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.1-sources.jar org.apache.hadoop.examples.WordCount /user/hadoop/input /user/hadoop/output
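The command above points at the sources jar (it appears to work because the compiled examples jar is already on the Hadoop classpath); the more conventional invocation uses the compiled examples jar and its registered program name, with the same result assuming the default 2.7.1 layout:
$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /user/hadoop/input /user/hadoop/output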
e. View the generated word counts
$ cd /usr/local/hadoop/
$ hadoop dfs -ls /user/hadoop/output
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2015-12-22 19:54 /user/hadoop/output/_SUCCESS
-rw-r--r--   2 hadoop supergroup         58 2015-12-22 19:53 /user/hadoop/output/part-r-00000
$ hadoop dfs -cat /user/hadoop/output/part-r-00000
CentOS  1
Hello   1
Java    1
System  1
World   1
hadoop  1
hello   2
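MapReduce refuses to start if the output directory already exists, so to rerun the job either choose a new output path or remove the old one first:
$ hdfs dfs -rm -r /user/hadoop/output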
For details, see: http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ClusterSetup.html