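A while ago I attended a training course and installed Hadoop by following its tutorial. Back home I installed it again from scratch to straighten out the steps and deepen my understanding.
The basic setup is three nodes: one namenode and two datanodes.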
Node | IP
Namenode | 192.168.59.144
Datanode1 | 192.168.59.145
Datanode2 | 192.168.59.146
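Since this is an experiment on virtual machines, I simply use NAT networking with DHCP for now.
(1) Make the network interface start at boot: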
# vim /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:0C:29:67:C5:E2
TYPE=Ethernet
UUID=5cb1d564-a7e8-4b57-bdb4-7e76e92f460a
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=dhcp
Then restart the network service:
# service network restart
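When the same edit has to be repeated on several VMs, it can be scripted. A minimal sketch, assuming bash and GNU sed; set_onboot is a hypothetical helper name, not a CentOS tool:

```bash
#!/usr/bin/env bash
# Hypothetical helper: make an ifcfg file bring the interface up at boot.
# Replaces an existing ONBOOT= line, or appends one if there is none.
set_onboot() {
  local cfg="${1:-/etc/sysconfig/network-scripts/ifcfg-eth0}"
  if grep -q '^ONBOOT=' "$cfg"; then
    sed -i 's/^ONBOOT=.*/ONBOOT=yes/' "$cfg"
  else
    echo 'ONBOOT=yes' >> "$cfg"
  fi
}
```

Run as root, e.g. set_onboot && service network restart.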
Being lazy, I wanted to clone the virtual machine, but networking in the clone would not come up. A search turned up a fix, see: http://www.cnblogs.com/bonjour-chen/articles/4448029.html
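The linked fix is not reproduced here. One common cause with cloned CentOS 6 VMs (an assumption about this particular case) is that the clone receives a new MAC address while ifcfg-eth0 still carries the source VM's HWADDR and UUID, and udev's persistent-net rules still bind the old MAC to eth0, so the new NIC appears as eth1. A minimal sketch of the usual cleanup; fix_cloned_nic is a hypothetical helper and the paths are the CentOS 6 defaults:

```bash
#!/usr/bin/env bash
# Hypothetical helper for a freshly cloned CentOS 6 VM (run as root, then reboot).
fix_cloned_nic() {
  local cfg="${1:-/etc/sysconfig/network-scripts/ifcfg-eth0}"
  local rules="${2:-/etc/udev/rules.d/70-persistent-net.rules}"
  # Drop the MAC address and UUID inherited from the source VM
  sed -i -e '/^HWADDR=/d' -e '/^UUID=/d' "$cfg"
  # Remove the stale MAC-to-eth0 mapping; udev regenerates this file at boot
  rm -f "$rules"
}
```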
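(2) Disable the firewall. chkconfig only keeps iptables from starting at boot; to stop the running firewall right away, also run service iptables stop (the reboot in step (3) has the same effect):
[root@namenode wb]# chkconfig iptables off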
(3) Disable the SELinux security subsystem
[root@namenode wb]# /usr/sbin/sestatus -v
SELinux status: enabled
SELinuxfs mount: /selinux
Current mode: enforcing
Mode from config file: enforcing
Policy version: 24
Policy from config file: targeted
SELinux is enabled, so edit /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled, and reboot (# reboot).
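The same change can be made without an editor. A minimal sketch assuming GNU sed; disable_selinux is a hypothetical helper. The anchored pattern ^SELINUX= leaves the SELINUXTYPE= line alone. Note that setenforce 0 only switches to permissive mode until the next boot; the config change itself takes effect after the reboot.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set SELINUX=disabled in the SELinux config file.
disable_selinux() {
  local cfg="${1:-/etc/selinux/config}"
  # ^SELINUX= does not match SELINUXTYPE=, so the policy type line is kept
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

Run as root, e.g. disable_selinux && reboot.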
(4) Add the IP-to-hostname mappings for the cluster
[root@namenode wb]# vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.59.144 namenode
192.168.59.145 datanode1
192.168.59.146 datanode2
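When the same mappings are needed on several machines, they can be appended idempotently. A minimal sketch assuming bash and awk; add_cluster_hosts is a hypothetical helper, and the IP/name pairs are the ones from the node table at the top:

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the cluster's host mappings, skipping names
# that are already mapped, so running it twice does not duplicate lines.
add_cluster_hosts() {
  local hosts_file="${1:-/etc/hosts}" entry ip name
  for entry in "192.168.59.144 namenode" \
               "192.168.59.145 datanode1" \
               "192.168.59.146 datanode2"; do
    read -r ip name <<< "$entry"
    # awk exits 0 if some line already lists $name as a hostname or alias
    if ! awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                           END { exit !found }' "$hosts_file"; then
      echo "$ip $name" >> "$hosts_file"
    fi
  done
}
```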
Test it:
[root@namenode wb]# ping datanode1
PING datanode1 (192.168.59.145) 56(84) bytes of data.
64 bytes from datanode1 (192.168.59.145): icmp_seq=1 ttl=64 time=2.45 ms
64 bytes from datanode1 (192.168.59.145): icmp_seq=2 ttl=64 time=0.367 ms
64 bytes from datanode1 (192.168.59.145): icmp_seq=3 ttl=64 time=0.291 ms
64 bytes from datanode1 (192.168.59.145): icmp_seq=4 ttl=64 time=0.312 ms
^C
--- datanode1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3320ms
rtt min/avg/max/mdev = 0.291/0.856/2.457/0.925 ms
(5) Hadoop needs openssh and rsync. Check that they are installed; my CentOS installation already included both:
[root@namenode wb]# rpm -qa | grep ssh
openssh-server-5.3p1-94.el6.x86_64
libssh2-1.4.2-1.el6.x86_64
openssh-clients-5.3p1-94.el6.x86_64
openssh-5.3p1-94.el6.x86_64
openssh-askpass-5.3p1-94.el6.x86_64
[root@namenode wb]# rpm -qa | grep rsync
rsync-3.0.6-9.el6_4.1.x86_64
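To check several packages in one pass, rpm -q openssh-server openssh-clients rsync reports any that are missing. The sketch below does the same against saved rpm -qa output; missing_packages is a hypothetical helper, and it assumes rpm's usual name-version-release.arch format with a version that starts with a digit:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "rpm -qa" output on stdin and print each
# package name given as an argument that does not appear in it.
missing_packages() {
  local installed pkg
  installed=$(cat)
  for pkg in "$@"; do
    # Match "name-<digit>..." so "openssh" is not satisfied by "openssh-server-..."
    if ! grep -q "^${pkg}-[0-9]" <<< "$installed"; then
      echo "$pkg"
    fi
  done
}
```

Usage: rpm -qa | missing_packages openssh-server openssh-clients rsync prints nothing when all three are installed.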
(6) Create the hadoop user
Do the following on every node:
[root@namenode wb]# groupadd hadoop
[root@namenode wb]# useradd -g hadoop hadoop
[root@namenode wb]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: it is based on a dictionary word
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
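(7) Configure passwordless SSH login between the nodes
Hadoop has to manage daemons on remote machines: once Hadoop is started, the NameNode uses SSH to start and stop the daemons on each DataNode. Commands run between nodes therefore must not prompt for a password, so we configure SSH public-key authentication. This lets the NameNode log in to the DataNodes without a password, and the DataNodes log in to the NameNode the same way.
In /etc/ssh/sshd_config, uncomment the following lines (remove the leading #) so that public-key authentication is explicitly enabled: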
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
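The edit can also be done non-interactively. A minimal sketch assuming GNU sed with -E; enable_pubkey_auth is a hypothetical helper that only strips the leading # from these three directives and leaves every other line untouched:

```bash
#!/usr/bin/env bash
# Hypothetical helper: uncomment the three public-key directives in sshd_config.
enable_pubkey_auth() {
  local cfg="${1:-/etc/ssh/sshd_config}"
  # "#RSAAuthentication yes" -> "RSAAuthentication yes" (same for the other two);
  # the captured whitespace after the keyword is kept as-is
  sed -i -E 's/^#[[:space:]]*(RSAAuthentication|PubkeyAuthentication|AuthorizedKeysFile)([[:space:]])/\1\2/' "$cfg"
}
```

Run it as root, then restart sshd as below.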
Then restart the SSH service:
[root@namenode wb]# service sshd restart
Stopping sshd: [ OK ]
Starting sshd: [ OK ]
Now for the actual configuration, as the hadoop user:
[hadoop@namenode wb]$ ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
ea:99:37:05:6d:b1:84:43:93:56:e4:0a:e7:f0:c5:a4 hadoop@namenode
The key's randomart image is:
+--[ RSA 2048]----+
| .o=+ |
| =*o |
| o.E++o |
| *.o+ |
| So |
| . . |
| . . |
| . oo |
| +. . |
+-----------------+
[hadoop@namenode wb]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # authorize the public key
[hadoop@namenode wb]$ chmod 600 ~/.ssh/authorized_keys
This creates a .ssh directory under /home/hadoop that holds the public and private keys:
[hadoop@namenode ~]$ ls -a
. .. .bash_logout .bash_profile .bashrc .gnome2 .mozilla .ssh
[hadoop@namenode ~]$ cd .ssh/
[hadoop@namenode .ssh]$ ls -a
. .. authorized_keys id_rsa id_rsa.pub
Copy the .ssh directory to the hadoop home directory on each of the other nodes:
[hadoop@namenode ~]$ scp -r .ssh hadoop@datanode1:/home/hadoop/.ssh
The authenticity of host 'datanode1 (192.168.59.145)' can't be established.
RSA key fingerprint is 6e:93:b0:ff:a3:bc:96:be:f0:35:09:bb:7b:12:37:74.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'datanode1,192.168.59.145' (RSA) to the list of known hosts.
hadoop@datanode1's password:
known_hosts 100% 797 0.8KB/s 00:00
id_rsa 100% 1671 1.6KB/s 00:00
id_rsa.pub 100% 397 0.4KB/s 00:00
authorized_keys 100% 397 0.4KB/s 00:00
[hadoop@namenode ~]$ scp -r .ssh hadoop@datanode2:/home/hadoop/.ssh
The authenticity of host 'datanode2 (192.168.59.146)' can't be established.
RSA key fingerprint is 6e:93:b0:ff:a3:bc:96:be:f0:35:09:bb:7b:12:37:74.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'datanode2,192.168.59.146' (RSA) to the list of known hosts.
hadoop@datanode2's password:
known_hosts 100% 1203 1.2KB/s 00:00
id_rsa 100% 1671 1.6KB/s 00:00
id_rsa.pub 100% 397 0.4KB/s 00:00
authorized_keys 100% 397 0.4KB/s 00:00
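On each node, give the hadoop user ownership of the copied .ssh directory and restrict its permissions (as root):
# chown -R hadoop:hadoop /home/hadoop/.ssh
# chmod -R 700 /home/hadoop/.ssh
Passwordless SSH is now configured; logging in without a password works:
[hadoop@namenode conf]$ ssh datanode1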
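If ssh datanode1 still asks for a password, a frequent cause is permissions: with StrictModes (on by default), sshd ignores authorized_keys when the home directory, ~/.ssh, or the file itself is writable by group or others. A minimal sketch of that check, assuming GNU stat; check_ssh_perms is a hypothetical helper and covers only the write bits, not ownership:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report group/world-writable paths that make sshd
# (StrictModes) reject key-based login for the given home directory.
check_ssh_perms() {
  local home="${1:-$HOME}" path mode status=0
  for path in "$home" "$home/.ssh" "$home/.ssh/authorized_keys"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      status=1
      continue
    fi
    mode=$(stat -c %a "$path")
    # 022 = group-write and other-write bits
    if (( 8#$mode & 8#022 )); then
      echo "group/world writable: $path ($mode)"
      status=1
    fi
  done
  return "$status"
}
```

Run it on each node, e.g. check_ssh_perms /home/hadoop; no output and exit status 0 means the write bits are fine.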
(8) Download the JDK and Hadoop. The versions used here are hadoop-1.0.4 and jdk-6u45-linux-x64.
Upload the files to /home/hadoop.
Create two directories:
mkdir -p /home/hadoop/software/java
mkdir -p /home/hadoop/software/hadoop
Unpack and install:
$ cd /home/hadoop/software/hadoop
$ tar -zxvf /home/hadoop/hadoop-1.0.4.tar.gz
$ cp /home/hadoop/jdk-6u45-linux-x64.bin /home/hadoop/software/java/
$ cd /home/hadoop/software/java/
$ chmod u+x jdk-6u45-linux-x64.bin
$ ./jdk-6u45-linux-x64.bin
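I recommend uninstalling the JDK that ships with the system first; otherwise you may run into version mismatches. If two JDKs are installed and both are configured in the environment variables, which one takes precedence? The shell runs the first java it finds while scanning the PATH directories from left to right, so the order of the entries in PATH decides.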
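To see which java actually wins, bash's built-in type -a java lists every match in PATH order. The lookup rule itself fits in a few lines; first_in_path is a hypothetical helper written only to illustrate it:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first executable NAME found in a PATH-style
# string, scanning the directories from left to right.
first_in_path() {
  local name="$1" search="${2-$PATH}" dir
  local -a dirs
  # Split on ":" without touching the global IFS and without globbing
  IFS=: read -r -a dirs <<< "$search"
  for dir in "${dirs[@]}"; do
    # An empty PATH entry stands for the current directory
    [ -n "$dir" ] || dir=.
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      echo "$dir/$name"
      return 0
    fi
  done
  return 1
}
```

With PATH=$PATH:$JAVA_HOME/bin, a java in a system directory such as /usr/bin is returned before the one under $JAVA_HOME/bin.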
Java installation path: /home/hadoop/software/java/jdk1.6.0_45
Hadoop installation path: /home/hadoop/software/hadoop/hadoop-1.0.4
As root, add the environment variables:
vim /etc/profile
#set java
export JAVA_HOME=/home/hadoop/software/java/jdk1.6.0_45
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
#set hadoop
export HADOOP_HOME=/home/hadoop/software/hadoop/hadoop-1.0.4
export PATH=$HADOOP_HOME/bin:$PATH
Verify (run source /etc/profile first, or log in again, so that the new variables take effect):
[root@namenode hadoop]# java -version
java version "1.6.0_28"
OpenJDK Runtime Environment (IcedTea6 1.13.0pre) (rhel-1.66.1.13.0.el6-x86_64)
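OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)
Note that this reports the system OpenJDK 1.6.0_28, not the installed 1.6.0_45: the profile above appends $JAVA_HOME/bin to the end of PATH, so the system java (typically /usr/bin/java) is found first. Either uninstall the system JDK as suggested above, or prepend the JDK instead: export PATH=$JAVA_HOME/bin:$PATH. Hadoop itself is not affected, because hadoop-env.sh sets JAVA_HOME explicitly.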
[root@namenode hadoop]# hadoop version
Warning: $HADOOP_HOME is deprecated.
Hadoop 1.0.4
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290
Compiled by hortonfo on Wed Oct 3 05:13:58 UTC 2012
From source with checksum fe2baea87c4c81a2c505767f3f9b71f4
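A quick check that the java the shell would run is the one under JAVA_HOME, rather than another JDK installed on the system. A minimal sketch; check_java_path is a hypothetical helper, meant to be run after source /etc/profile:

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that JAVA_HOME is set and that the java the
# shell resolves on PATH is $JAVA_HOME/bin/java.
check_java_path() {
  local found
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  found=$(command -v java) || { echo "java is not on PATH"; return 1; }
  if [ "$found" != "$JAVA_HOME/bin/java" ]; then
    echo "PATH resolves java to $found, not $JAVA_HOME/bin/java"
    return 1
  fi
  echo "ok: $found"
}
```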
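(9) Edit the Hadoop configuration files
cd /home/hadoop/software/hadoop/hadoop-1.0.4/conf
vim masters (the host that runs the SecondaryNameNode; despite the name, it is a checkpointing helper, not a standby master)
datanode2
vim slaves (the hosts that run a DataNode and a TaskTracker)
datanode1
datanode2
hadoop-env.sh
export JAVA_HOME=/home/hadoop/software/java/jdk1.6.0_45
hdfs-site.xml (in each *-site.xml file, the properties below go inside the <configuration> element)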
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.http.address</name>
<value>192.168.59.144:50070</value>
<description>
The address and the base port where the dfs namenode web ui will listen on.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.balance.bandwidthPerSec</name>
<value>10485760</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.59.144:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1440</value>
<description>
Number of minutes between trash checkpoints.
If zero, the trash feature is disabled.
</description>
</property>
mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>192.168.59.144:9001</value>
</property>
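The fragments above are only the property elements; in the actual files they sit inside a configuration root element. A minimal sketch that writes a complete file from name/value pairs; write_site_xml is a hypothetical helper, and it does not XML-escape values, so it assumes values without &, < or >:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a Hadoop *-site.xml file from NAME VALUE pairs.
write_site_xml() {
  local out="$1"
  shift
  if (( $# % 2 != 0 )); then
    echo "usage: write_site_xml FILE NAME VALUE [NAME VALUE ...]" >&2
    return 1
  fi
  {
    echo '<?xml version="1.0"?>'
    echo '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>'
    echo '<configuration>'
    while (( $# > 0 )); do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$out"
}
```

For example: write_site_xml core-site.xml fs.default.name hdfs://192.168.59.144:9000 hadoop.tmp.dir /home/hadoop/tmp fs.trash.interval 1440 (the description elements are omitted).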
(10) Copy the software directory to datanode1 and datanode2, and set up the same Java and Hadoop environment variables in /etc/profile on both
[hadoop@namenode ~]$ scp -r /home/hadoop/software hadoop@datanode1:/home/hadoop
[hadoop@namenode ~]$ scp -r /home/hadoop/software hadoop@datanode2:/home/hadoop
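With more datanodes, the copy is easier as a loop. A minimal sketch; push_software and run are hypothetical helpers, and it assumes the hadoop user and the passwordless SSH from step (7). Setting DRY_RUN=1 prints the commands instead of running them:

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: copy /home/hadoop/software to each node given.
push_software() {
  local node
  for node in "$@"; do
    run scp -r /home/hadoop/software "hadoop@${node}:/home/hadoop"
  done
}
```

Preview with DRY_RUN=1 push_software datanode1 datanode2, then run it again without DRY_RUN.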
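(11) Start Hadoop
As the hadoop user, on the master (namenode) node, format the namenode and then start the cluster. Format only on first setup; reformatting an existing cluster erases the HDFS namespace.
[hadoop@namenode ~]$ cd /home/hadoop/software/hadoop/hadoop-1.0.4/bin
[hadoop@namenode bin]$ hadoop namenode -format
[hadoop@namenode bin]$ start-all.sh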
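Because formatting should happen only once, a guard makes the start-up step safe to re-run. A minimal sketch assuming the Hadoop 1.x default dfs.name.dir of ${hadoop.tmp.dir}/dfs/name (here /home/hadoop/tmp/dfs/name) and that a formatted directory contains current/VERSION; namenode_formatted is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the NameNode storage directory is formatted.
namenode_formatted() {
  local name_dir="${1:-/home/hadoop/tmp/dfs/name}"
  [ -f "$name_dir/current/VERSION" ]
}
```

Usage: namenode_formatted || hadoop namenode -format, followed by start-all.sh.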
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /home/hadoop/software/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-namenode-namenode.out
datanode1: starting datanode, logging to /home/hadoop/software/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-datanode-datanode1.out
datanode2: starting datanode, logging to /home/hadoop/software/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-datanode-datanode2.out
datanode2: starting secondarynamenode, logging to /home/hadoop/software/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-secondarynamenode-datanode2.out
starting jobtracker, logging to /home/hadoop/software/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-jobtracker-namenode.out
datanode1: starting tasktracker, logging to /home/hadoop/software/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-tasktracker-datanode1.out
datanode2: starting tasktracker, logging to /home/hadoop/software/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-tasktracker-datanode2.out
Run jps on each machine to check the running processes:
[hadoop@namenode ~]$ jps
6070 NameNode
6301 Jps
6223 JobTracker
[hadoop@datanode1 ~]$ jps
27793 DataNode
27881 TaskTracker
27921 Jps
[hadoop@datanode2 ~]$ jps
4628 SecondaryNameNode
4756 Jps
4547 DataNode
4716 TaskTracker
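Comparing jps output by eye gets tedious with more nodes. A minimal sketch that reads jps output on stdin and reports expected daemons that are missing; check_daemons is a hypothetical helper, and the expected lists follow the masters and slaves files above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read jps output on stdin and report each expected
# daemon (given as arguments) that is not running.
check_daemons() {
  local out d missing=0
  out=$(cat)
  for d in "$@"; do
    # jps prints "<pid> <MainClass>"; compare the class name exactly
    if ! awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' <<< "$out"; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: jps | check_daemons NameNode JobTracker on namenode, jps | check_daemons DataNode TaskTracker on datanode1, and the same plus SecondaryNameNode on datanode2.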
Check the datanode information:
[hadoop@namenode bin]$ hadoop dfsadmin -report
Warning: $HADOOP_HOME is deprecated.
Configured Capacity: 37139136512 (34.59 GB)
Present Capacity: 27002056734 (25.15 GB)
DFS Remaining: 27001999360 (25.15 GB)
DFS Used: 57374 (56.03 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 192.168.59.146:50010
Decommission Status : Normal
Configured Capacity: 18569568256 (17.29 GB)
DFS Used: 28687 (28.01 KB)
Non DFS Used: 4953018353 (4.61 GB)
DFS Remaining: 13616521216(12.68 GB)
DFS Used%: 0%
DFS Remaining%: 73.33%
Last contact: Thu Apr 23 16:01:29 CST 2015
Name: 192.168.59.145:50010
Decommission Status : Normal
Configured Capacity: 18569568256 (17.29 GB)
DFS Used: 28687 (28.01 KB)
Non DFS Used: 5184061425 (4.83 GB)
DFS Remaining: 13385478144(12.47 GB)
DFS Used%: 0%
DFS Remaining%: 72.08%
Last contact: Thu Apr 23 16:01:30 CST 2015
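For scripted checks, the live-datanode count can be pulled out of this report. A minimal sketch, assuming the "Datanodes available:" line format shown above; live_datanodes is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "hadoop dfsadmin -report" output on stdin and
# print the number of live datanodes from the "Datanodes available:" line.
live_datanodes() {
  sed -n 's/^Datanodes available: \([0-9][0-9]*\).*/\1/p'
}
```

Usage: [ "$(hadoop dfsadmin -report | live_datanodes)" -eq 2 ] succeeds when both datanodes are up.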
Web interfaces for managing Hadoop:
HDFS (NameNode):
http://192.168.59.144:50070
Map/Reduce (JobTracker):
http://192.168.59.144:50030