Hadoop Cluster Fully Distributed Configuration and Deployment
Unless a step says otherwise, run it on all three servers. For convenience, everything is done as the root user.
1. Preparation
1.1 Three CentOS 6 servers
Manually assign the following settings on the three servers:

hostname | IP            | netmask       | gateway      | DNS           | remark
master   | 172.17.138.82 | 255.255.255.0 | 172.17.138.1 | 202.203.85.88 | server 1
slave1   | 172.17.138.83 | 255.255.255.0 | 172.17.138.1 | 202.203.85.88 | server 2
slave2   | 172.17.138.84 | 255.255.255.0 | 172.17.138.1 | 202.203.85.88 | server 3
PC       | 172.17.138.61 | 255.255.255.0 | 172.17.138.1 | 202.203.85.88 | Windows PC
1.2 Software packages
hadoop-2.7.6.tar.gz
jdk-8u171-linux-x64.tar.gz
Upload both packages to the /soft directory on all three servers.
(Download: https://pan.baidu.com/s/1a_Pjl8uJ2d_-r1hbN05fWA)
1.3 Disable the firewall
Disable the firewall and check that it is stopped:
[root@localhost ~]# chkconfig iptables off
[root@localhost ~]# service iptables stop
[root@localhost ~]# service iptables status
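If you want to double-check that iptables stays off across reboots, chkconfig can list its per-runlevel state (a verification step added here, not in the original post):
[root@localhost ~]# chkconfig --list iptables
iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off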
1.4 Disable SELinux
Temporarily disable it:
[root@localhost ~]# setenforce 0
To disable it permanently, change SELINUX=enforcing to SELINUX=disabled:
[root@localhost ~]# vi /etc/selinux/config
#SELINUX=enforcing
SELINUX=disabled
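As a quick check (my addition), getenforce should now report Permissive for the current session, and Disabled after a reboot with the edited config:
[root@localhost ~]# getenforce
Permissive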
1.5 Start sshd; connect to the three VMs from Windows with Xshell to make configuration easier (copy and paste)
[root@localhost ~]# service sshd start
Connect to the three VMs from Windows with Xshell.
(Xshell download: https://pan.baidu.com/s/1K052DJT9Pq0xy8XAVa764Q)
1.6 Install the JDK
Unpack the JDK:
[root@localhost ~]# mkdir -p /soft/java
[root@localhost soft]# tar -zxvf jdk-8u171-linux-x64.tar.gz -C /soft/java/
Configure the environment variables (note the escaped \$JAVA_HOME in the CLASSPATH line, so the variable is expanded when /etc/profile is sourced rather than when echo runs):
[root@localhost soft]# echo -e "\nexport JAVA_HOME=/soft/java/jdk1.8.0_171" >> /etc/profile
[root@localhost soft]# echo -e "\nexport PATH=\$PATH:\$JAVA_HOME/bin" >> /etc/profile
[root@localhost soft]# echo -e "\nexport CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar" >> /etc/profile
[root@localhost soft]# source /etc/profile
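To confirm the JDK is active in the current shell (an added sanity check; the exact build string may differ):
[root@localhost soft]# java -version
java version "1.8.0_171"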
1.7 Configure hostnames
1) On master (172.17.138.82):
[root@localhost soft]# hostname master
[root@master ~]# vi /etc/hostname
master
2) On slave1 (172.17.138.83):
[root@localhost soft]# hostname slave1
[root@slave1 ~]# vi /etc/hostname
slave1
3) On slave2 (172.17.138.84):
[root@localhost soft]# hostname slave2
[root@slave2 ~]# vi /etc/hostname
slave2
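Note (added here): /etc/hostname is where CentOS 7 keeps the persistent hostname; a stock CentOS 6 install reads it from /etc/sysconfig/network instead. If the name does not survive a reboot, set it there as well, e.g. on master:
[root@master ~]# sed -i 's/^HOSTNAME=.*/HOSTNAME=master/' /etc/sysconfig/network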
1.8 Configure /etc/hosts
Run on all three servers:
[root@master ~]# echo '172.17.138.82 master' >> /etc/hosts
[root@master ~]# echo '172.17.138.83 slave1' >> /etc/hosts
[root@master ~]# echo '172.17.138.84 slave2' >> /etc/hosts
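To verify that the names resolve (my addition):
[root@master ~]# ping -c 1 slave1
[root@master ~]# ping -c 1 slave2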
1.9 Passwordless SSH login
On master:
[root@master home]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
1d:33:50:ac:03:2f:d8:10:8f:3d:48:95:d3:f8:7a:05 root@master
The key's randomart image is:
+--[ RSA 2048]----+
| oo.+.o. |
| ..== E.. |
| o++= o+ |
| . o.=..+ |
| oSo. |
| . . |
| . |
| |
| |
+-----------------+
[root@master home]#
Press Enter at every prompt; the output shows the path of .ssh/id_rsa.pub.
[root@master ~]# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
Check whether /root on slave1 and slave2 contains a .ssh directory, and create it if not; note that you need ll -a (or ls -la) to see hidden entries.
On slave1 and slave2:
[root@slave1 ~]# ll -a /root/
total 36
dr-xr-x---. 2 root root 4096 Nov 16 17:31 .
dr-xr-xr-x. 18 root root 4096 Nov 17 16:49 ..
-rw-------. 1 root root 953 Nov 16 17:27 anaconda-ks.cfg
-rw-------. 1 root root 369 Nov 17 18:12 .bash_history
-rw-r--r--. 1 root root 18 Dec 29 2013 .bash_logout
-rw-r--r--. 1 root root 176 Dec 29 2013 .bash_profile
-rw-r--r--. 1 root root 176 Dec 29 2013 .bashrc
-rw-r--r--. 1 root root 100 Dec 29 2013 .cshrc
-rw-r--r--. 1 root root 129 Dec 29 2013 .tcshrc
[root@slave1 ~]# mkdir /root/.ssh
Copy /root/.ssh/authorized_keys from master to /root/.ssh/ on slave1 and slave2.
On master:
[root@master ~]# scp /root/.ssh/authorized_keys root@slave1:/root/.ssh/
[root@master ~]# scp /root/.ssh/authorized_keys root@slave2:/root/.ssh/
On master, slave1, and slave2:
[root@master ~]# chmod 700 /root/.ssh
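Aside (not in the original post): ssh-copy-id appends the key and fixes permissions in one step, and sshd also expects authorized_keys itself not to be group-writable, so an equivalent sketch is:
[root@master ~]# ssh-copy-id root@slave1
[root@master ~]# ssh-copy-id root@slave2
[root@master ~]# chmod 600 /root/.ssh/authorized_keys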
Verify
On master, run ssh master, ssh slave1, and ssh slave2:
[root@master .ssh]# ssh slave1
Last failed login: Fri Nov 18 16:52:28 CST 2016 from master on ssh:notty
There were 2 failed login attempts since the last successful login.
Last login: Fri Nov 18 16:22:23 2016 from 192.168.174.1
[root@slave1 ~]# logout
Connection to slave1 closed.
[root@master .ssh]# ssh slave2
The authenticity of host 'slave2 (172.17.138.84)' can't be established.
ECDSA key fingerprint is 95:76:9a:bc:ef:5e:f2:b3:cf:35:67:7a:3e:da:0e:e2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave2' (ECDSA) to the list of known hosts.
Last failed login: Fri Nov 18 16:57:12 CST 2016 from master on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Fri Nov 18 16:22:40 2016 from 192.168.174.1
[root@slave2 ~]# logout
Connection to slave2 closed.
[root@master .ssh]# ssh master
Last failed login: Fri Nov 18 16:51:45 CST 2016 from master on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Fri Nov 18 15:33:56 2016 from 192.168.174.1
[root@master ~]#
2. Configure the Hadoop cluster
Unless otherwise noted, run the following on all three servers.
2.1 Unpack
[root@master soft]# mkdir -p /soft/hadoop/
[root@master soft]# tar -zxvf hadoop-2.7.6.tar.gz -C /soft/hadoop/
2.2 Configure the environment
[root@master ~]# vim /root/.bashrc
#HADOOP START
#export HADOOP_HOME=/soft/hadoop
export HADOOP_HOME=/soft/hadoop/hadoop-2.7.6
#HADOOP END
export PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin:/soft/hadoop/hadoop-2.7.6/bin:/soft/hadoop/hadoop-2.7.6/sbin
[root@master ~]# source ~/.bashrc
[root@master hadoop-2.7.6]# source /etc/profile
[root@master hadoop-2.7.6]# hadoop version
Hadoop 2.7.6
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /soft/hadoop/hadoop-2.7.6/share/hadoop/common/hadoop-common-2.7.6.jar
[root@master hadoop-2.7.6]#
Modify the Hadoop configuration files
Add the JAVA_HOME setting to hadoop-env.sh and yarn-env.sh:
[root@master soft]# echo -e "export JAVA_HOME=/soft/java/jdk1.8.0_171" >> /soft/hadoop/hadoop-2.7.6/etc/hadoop/hadoop-env.sh
[root@master soft]# echo -e "export JAVA_HOME=/soft/java/jdk1.8.0_171" >> /soft/hadoop/hadoop-2.7.6/etc/hadoop/yarn-env.sh
Create the directories /hadoop/tmp, /hadoop/hdfs/data, and /hadoop/hdfs/name:
[root@master hadoop]# mkdir -p /hadoop/tmp
[root@master hadoop]# mkdir -p /hadoop/hdfs/data
[root@master hadoop]# mkdir -p /hadoop/hdfs/name
Modify core-site.xml:
[root@master ~]# vi /soft/hadoop/hadoop-2.7.6/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
</configuration>
Modify hdfs-site.xml:
[root@master ~]# vi /soft/hadoop/hadoop-2.7.6/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Copy mapred-site.xml.template to mapred-site.xml and edit it:
[root@master hadoop]# cd /soft/hadoop/hadoop-2.7.6/etc/hadoop/
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@master hadoop]# vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>master:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>http://master:9001</value>
</property>
</configuration>
Modify yarn-site.xml:
[root@master hadoop]# vi yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
In /soft/hadoop/hadoop-2.7.6/etc/hadoop/slaves, delete the default entry and add slave1 and slave2:
[root@master hadoop]# echo -e "slave1\nslave2" > /soft/hadoop/hadoop-2.7.6/etc/hadoop/slaves
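If you prefer to edit the configuration files on master only, here is a sketch (my addition) for pushing the finished config directory to both slaves over the passwordless SSH set up earlier:
[root@master ~]# scp -r /soft/hadoop/hadoop-2.7.6/etc/hadoop root@slave1:/soft/hadoop/hadoop-2.7.6/etc/
[root@master ~]# scp -r /soft/hadoop/hadoop-2.7.6/etc/hadoop root@slave2:/soft/hadoop/hadoop-2.7.6/etc/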
2.3 Start
Format the NameNode; run this on master only:
[root@master hadoop]# cd /soft/hadoop/hadoop-2.7.6/bin/
[root@master bin]# ./hadoop namenode -format
Start the cluster, also on master only:
[root@master bin]# cd /soft/hadoop/hadoop-2.7.6/sbin/
[root@master sbin]# ./start-all.sh
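Before moving on, you can ask HDFS and YARN directly whether both slaves registered (an added check; both commands ship with Hadoop 2.7):
[root@master sbin]# hdfs dfsadmin -report
[root@master sbin]# yarn node -list
The report should show two live datanodes, and the node list should contain slave1 and slave2.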
3. Verification
The screens match the previous pseudo-distributed post exactly; refer to its screenshots.
3.1 Check the processes on each node with jps
master
[root@master sbin]# jps
3337 Jps
2915 SecondaryNameNode
3060 ResourceManager
2737 NameNode
[root@master sbin]#
slave1
[root@slave1 hadoop]# jps
2608 DataNode
2806 Jps
2706 NodeManager
[root@slave1 hadoop]#
slave2
[root@slave2 hadoop]# jps
2614 DataNode
2712 NodeManager
2812 Jps
[root@slave2 hadoop]#
In a browser, open the NameNode UI on master's port 50070, e.g. http://172.17.138.82:50070, and the ResourceManager UI at http://172.17.138.82:8088/.
If both pages load, the Hadoop cluster is working.
3.2 Create input data, using the /etc/protocols file as the test input
First create the target directory and copy the file into HDFS (the mkdir step is added here, since -put fails if the parent directory does not exist):
[root@master sbin]# hdfs dfs -mkdir -p /user/hadoop/input
[root@master sbin]# hdfs dfs -put /etc/protocols /user/hadoop/input
3.3 Run the Hadoop WordCount example (word-frequency count)
# If output from a previous run still exists, the job will fail because Hadoop refuses to overwrite it, so delete the old output directory first.
$ hadoop jar /soft/hadoop/hadoop-2.7.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount /user/hadoop/input /user/hadoop/output
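For reference (my addition), deleting a previous run's output uses the standard FsShell recursive remove:
$ hdfs dfs -rm -r /user/hadoop/output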
3.4 View the generated word counts
$ hdfs dfs -cat /user/hadoop/output/*
3.5 Stop
[root@master bin]# cd /soft/hadoop/hadoop-2.7.6/sbin/
[root@master sbin]# ./stop-all.sh
Original post: https://www.cnblogs.com/AndyWong/p/9201756.html