[Gandalf] Trying Out a Hadoop 2.4.1 Deployment + Complete Configuration Files / 憋错料

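Introduction

Before I knew it, the stable release of Hadoop had reached 2.4.1. The community really is powerful! When will 3.0 be released?

Today I did some research and tried out a distributed deployment of 2.4.1, including NN HA (a 2.2.0 NN HA cluster was already in place, so I reused its ZK and ZKFC). Along the way, following the official documentation at http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/ClusterSetup.html, I sorted out and filled in the key configuration properties and grouped related properties together, so that the files are easier to read and modify later and can serve as templates.

What follows records how I deployed 2.4.1, based on the official documentation and past experience.

Reposting is welcome; please credit the source: http://blog.csdn.net/u010967382/article/details/37653177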

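Notes

1. This post records only the configuration files, not the rest of the deployment procedure. The remaining steps are the same as for 2.2.0; see:

http://blog.csdn.net/u010967382/article/details/20380387

http://blog.csdn.net/u010967382/article/details/30976935

2. All paths, IPs and hostnames in the configuration must be changed to match your own environment.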

1. Experiment environment:

A 4-node cluster with 3 ZK nodes. The hosts file and the role assignment of each node are as follows:

hosts:

192.168.66.91 master
192.168.66.92 slave1
192.168.66.93 slave2
192.168.66.94 slave3
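
Since every configuration file below refers to nodes by hostname, it can help to confirm the mappings before going further. The following is a minimal sketch, not part of the original procedure: lookup_host is a hypothetical helper, and the script works on a temporary copy of the four entries above rather than on the real /etc/hosts.

```shell
#!/bin/sh
# Hypothetical helper: print the IP mapped to a hostname in a hosts-format file.
# $1 = hosts file, $2 = hostname
lookup_host() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }   # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# Sample copy of the entries above, written to a temp file for the demo.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.66.91 master
192.168.66.92 slave1
192.168.66.93 slave2
192.168.66.94 slave3
EOF

# Report the mapping for every node of the cluster.
for h in master slave1 slave2 slave3; do
    ip=$(lookup_host "$hosts_file" "$h")
    if [ -n "$ip" ]; then
        echo "$h -> $ip"
    else
        echo "$h is missing from $hosts_file" >&2
    fi
done
```

On a real node, pointing hosts_file at /etc/hosts instead of the temporary file would run the same check against the live mapping.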

Role assignment:

         Active NN   Standby NN   DN   JournalNode   Zookeeper   FailoverController
master   V           -            -    V             V           V
slave1   -           V            V    V             V           V
slave2   -           -            V    V             V           -
slave3   -           -            V    -             -           -
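
The table places JournalNode and Zookeeper on three nodes each. Both work on a majority basis, so an ensemble of n nodes keeps working as long as more than half of them are up. A small sketch of that arithmetic follows; tolerated_failures is an illustrative name, not a Hadoop or ZooKeeper command.

```shell
#!/bin/sh
# Number of node failures a majority-based ensemble of n nodes can survive.
# $1 = ensemble size
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5; do
    echo "$n nodes tolerate $(tolerated_failures "$n") failure(s)"
done
```

A fourth node would not raise the tolerance above that of three, which is why odd ensemble sizes such as the 3 used here are the usual choice.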

2. hadoop-env.sh: only the following three settings need to be changed

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_07

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by the user that will run the hadoop daemons. Otherwise there is the potential for a symlink attack.
export HADOOP_PID_DIR=/home/yarn/Hadoop/hadoop-2.4.1/hadoop_pid_dir
export HADOOP_SECURE_DN_PID_DIR=/home/yarn/Hadoop/hadoop-2.4.1/hadoop_pid_dir
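
The NOTE above asks for a PID directory that only the daemon user can write to. A minimal sketch of preparing such a directory is shown below, assuming it is run as the user that starts the Hadoop daemons. PID_DIR is an illustrative variable; it falls back to a scratch directory when HADOOP_PID_DIR is not set, so the sketch can be tried anywhere.

```shell
#!/bin/sh
# Use HADOOP_PID_DIR if it is set; otherwise fall back to a scratch location
# (an assumption for the demo, not a Hadoop default).
PID_DIR=${HADOOP_PID_DIR:-$(mktemp -d)/hadoop_pid_dir}

# Create the directory and restrict it to its owner (rwx------).
mkdir -p "$PID_DIR"
chmod 700 "$PID_DIR"

# Show the resulting owner and permissions.
ls -ld "$PID_DIR"
```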

3. core-site.xml: complete file

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you
may not use this file except in compliance with the License. You may obtain
a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless
required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
OR CONDITIONS OF ANY KIND, either express or implied. See the License for
the specific language governing permissions and limitations under the License.
See accompanying LICENSE file. -->

<configuration>

<property>
<name>fs.defaultFS</name>
<value>hdfs://myhadoop</value>
<description>NameNode URI, in the form hdfs://host:port/. If the NN HA
feature is enabled, set this to the logical name of the cluster instead; for details see my blog post http://blog.csdn.net/u010967382/article/details/30976935
</description>
</property>

<property>
<name>hadoop.tmp.dir</name>
<value>/home/yarn/Hadoop/hadoop-2.4.1/tmp</value>
</property>

<property>
<name>io.file.buffer.size</name>
<description>Size of read/write buffer used in SequenceFiles.
</description>
</property>

<property>
<name>ha.zookeeper.quorum</name>
<value>master:2181,slave1:2181,slave2:2181</value>
<description>Note: once ZK is configured, ZK must be started before formatting or starting the NameNode; otherwise connection errors will be reported.
</description>
</property>

</configuration>
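
Because ZK has to be running before the NameNode is formatted or started, it can be useful to walk the ha.zookeeper.quorum value and check each member first. The sketch below is only an illustration under a few assumptions: quorum_members and zk_is_up are hypothetical helper names, the probe relies on nc being installed and on the ZooKeeper servers answering the "ruok" four-letter command, and nothing is contacted unless PROBE_ZK=1 is set.

```shell
#!/bin/sh
# Value copied from ha.zookeeper.quorum in core-site.xml.
QUORUM="master:2181,slave1:2181,slave2:2181"

# Print one "host port" pair per line for a comma-separated quorum string.
quorum_members() {
    echo "$1" | tr ',' '\n' | while IFS=: read -r host port; do
        echo "$host $port"
    done
}

# Return 0 if the ZooKeeper server at $1:$2 answers "imok" to "ruok".
zk_is_up() {
    [ "$(echo ruok | nc -w 2 "$1" "$2" 2>/dev/null)" = "imok" ]
}

# Dry run by default; set PROBE_ZK=1 to actually contact the servers.
quorum_members "$QUORUM" | while read -r host port; do
    if [ "${PROBE_ZK:-0}" != "1" ]; then
        echo "would probe $host:$port"
    elif zk_is_up "$host" "$port"; then
        echo "$host:$port is up"
    else
        echo "$host:$port is NOT answering" >&2
    fi
done
```

Once the cluster configuration is in place, running hdfs getconf -confKey fs.defaultFS on a node is a quick way to confirm that the logical name hdfs://myhadoop is the value actually being picked up.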

4. hdfs-site.xml: complete file