Environment
1. CentOS 6.5 (64-bit)
Machine plan and node distribution
Host | Role | NameNode | DataNode | JournalNode | ZooKeeper | Hive |
---|---|---|---|---|---|---|
192.168.115.132 | master | yes | | yes | yes | yes |
192.168.115.133 | slave1 | yes | yes | yes | yes | yes |
192.168.115.134 | slave2 | | yes | yes | yes | |
Directory settings
dfs.namenode.name.dir = file:/home/hadoop/data/name
dfs.datanode.data.dir = file:/home/hadoop/data/datanode
dfs.namenode.edits.dir = file:/home/hadoop/data/hdfs/edits
dfs.journalnode.edits.dir = /home/hadoop/data/journaldata/jn
dfs.hosts.exclude = /home/hadoop/app/hadoop-2.6.0-cdh5.8.0/etc/hadoop/excludes (a file, not a directory)
PID directory: /home/hadoop/data/pid
Temporary directory: /home/hadoop/data/tmp
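Installation
1. Create a hadoop user on each of the three cluster nodes
2. Install the JDK
Download jdk-8u65-linux-x64.tar.gz, then set the JDK environment variables in .bash_profile in the hadoop user's home directory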
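The JDK step can be sketched as a small helper that appends the variables only once. The function name set_jdk_env is hypothetical, and the JAVA_HOME path is assumed to be the one used in hadoop-env.sh further down; adjust it if the JDK is unpacked somewhere else.

```shell
# Append JAVA_HOME and PATH to a profile file, but only if JAVA_HOME
# is not already exported there (so the helper can be re-run safely).
set_jdk_env() {
    profile="$1"      # e.g. ~/.bash_profile
    java_home="$2"    # e.g. /usr/local/java/jdk1.8.0_65 (assumed install path)
    if ! grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
        {
            echo "export JAVA_HOME=$java_home"
            echo 'export PATH=$JAVA_HOME/bin:$PATH'
        } >> "$profile"
    fi
}

# Usage (as the hadoop user):
#   set_jdk_env ~/.bash_profile /usr/local/java/jdk1.8.0_65
#   source ~/.bash_profile && java -version
```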
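3. Configure host aliases
- 1. Edit the hosts file with vi /etc/hosts (note: the entries must be added on all three machines; I got bitten by this)
192.168.115.132 master
192.168.115.133 slave1
192.168.115.134 slave2
Run the matching command on each node (one command per node)
hostname master
hostname slave1
hostname slave2
The hostname command does not survive a reboot, so also set HOSTNAME in /etc/sysconfig/network (vi /etc/sysconfig/network)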
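Since the same three entries have to land on every machine, it may help to script the edit. The sketch below uses two hypothetical helpers, add_host_entry and persist_hostname, which take the target file as an argument so they can be tried on a copy first; on the real nodes they need root.

```shell
# Add "ip name" to a hosts file unless the name is already mapped.
add_host_entry() {
    hosts_file="$1"; ip="$2"; name="$3"
    if ! grep -Eq "[[:space:]]$name([[:space:]]|\$)" "$hosts_file"; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Set HOSTNAME=<name> in a CentOS 6 style network file,
# replacing an existing HOSTNAME line or appending one.
persist_hostname() {
    net_file="$1"; name="$2"
    if grep -q '^HOSTNAME=' "$net_file"; then
        sed -i "s/^HOSTNAME=.*/HOSTNAME=$name/" "$net_file"
    else
        echo "HOSTNAME=$name" >> "$net_file"
    fi
}

# Usage on master (as root):
#   add_host_entry /etc/hosts 192.168.115.132 master
#   add_host_entry /etc/hosts 192.168.115.133 slave1
#   add_host_entry /etc/hosts 192.168.115.134 slave2
#   persist_hostname /etc/sysconfig/network master
```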
4. Turn off the firewall
1. Check the firewall status: service iptables status
2. Stop the firewall: service iptables stop
3. Keep the firewall off at boot:
- chkconfig --list | grep iptables
- chkconfig iptables off
5. Disable SELinux
1. Open the file with vi /etc/sysconfig/selinux and set SELINUX=disabled
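The SELinux edit can also be done non-interactively. This is a sketch with a hypothetical helper name, disable_selinux_config. One caveat worth knowing: on CentOS 6, /etc/sysconfig/selinux is a symlink to /etc/selinux/config, and GNU sed -i replaces a symlink with a regular file unless --follow-symlinks is given, so the helper is best pointed at /etc/selinux/config directly.

```shell
# Rewrite the SELINUX= line of an SELinux config file to "disabled".
# The SELINUXTYPE= line is left untouched because the pattern requires
# "=" immediately after "SELINUX".
disable_selinux_config() {
    cfg="$1"
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}

# Usage (as root); the file change takes effect after a reboot,
# while setenforce 0 switches to permissive mode for the current boot:
#   disable_selinux_config /etc/selinux/config
#   setenforce 0
```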
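6. Grant sudo privileges to the regular user
7. Configure passwordless SSH login
- 1. In the user's home directory, run ssh-keygen -t rsa and press Enter at every prompt
- 2. In ~/.ssh, run: cp id_rsa.pub authorized_keys
- 3. Run ssh-keygen on the other machines as well to generate their private and public keys
- 4. Copy the contents of the public keys generated on the other machines into the authorized_keys file on the master node
- 5. Distribute the authorized_keys file, which now holds the public keys of all three machines, to the other machines: scp authorized_keys [email protected]:~/.ssh/
- 6. Fix the permissions while logged in as the hadoop user: chmod 700 ~/.ssh and chmod 600 ~/.ssh/authorized_keys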
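Steps 4 and 6 above (collecting the public keys and fixing permissions) can be sketched as one helper. The name merge_pubkeys is hypothetical; it assumes the other nodes' id_rsa.pub files have already been copied to the master under distinct names.

```shell
# Merge one or more public key files into an authorized_keys file,
# dropping duplicate lines, then restrict the file to its owner.
merge_pubkeys() {
    dest="$1"; shift
    tmp="$(mktemp)"
    # "$dest" may not exist yet on the first run; cat's error is ignored
    cat "$@" "$dest" 2>/dev/null | sort -u > "$tmp"
    mv "$tmp" "$dest"
    chmod 600 "$dest"
}

# Usage on master (slave1.pub and slave2.pub are assumed file names):
#   merge_pubkeys ~/.ssh/authorized_keys ~/.ssh/id_rsa.pub slave1.pub slave2.pub
#   chmod 700 ~/.ssh
```

Sorting changes the order of the entries, which sshd does not care about; the benefit is that re-running the helper never duplicates a key.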
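ZooKeeper cluster installation
- 1. Unpack the ZooKeeper archive and edit conf/zoo.cfg
- Create the ZooKeeper data directory under the hadoop home directory: mkdir -p /home/hadoop/data/zookeeper (same configuration on every machine)
- In /home/hadoop/data/zookeeper, create a myid file containing that node's server ID. Then start ZooKeeper from the bin directory: ./zkServer.sh start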
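The note does not show the zoo.cfg contents, so the following is only a sketch of what a three-node setup could look like. The server ID mapping (1=master, 2=slave1, 3=slave2), the peer ports 2888/3888, and the timing values (taken from the stock zoo_sample.cfg) are assumptions; the helper names are hypothetical.

```shell
# Write a minimal three-node zoo.cfg. Every server.N line must appear
# identically on all three machines.
write_zoo_cfg() {
    cfg="$1"
    cat > "$cfg" <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/data/zookeeper
clientPort=2181
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
EOF
}

# Write the myid file; the number must match this host's server.N line.
write_myid() {
    data_dir="$1"; id="$2"
    mkdir -p "$data_dir"
    echo "$id" > "$data_dir/myid"
}

# Usage on slave1 (assumed to be server.2):
#   write_zoo_cfg conf/zoo.cfg
#   write_myid /home/hadoop/data/zookeeper 2
```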
hadoop-2.6.0-cdh5.8.0.tar cluster installation
- Create the required directories under the hadoop home directory
mkdir -p /home/hadoop/data/name
mkdir -p /home/hadoop/data/datanode
mkdir -p /home/hadoop/data/hdfs/edits
mkdir -p /home/hadoop/data/journaldata/jn
mkdir -p /home/hadoop/data/pid
mkdir -p /home/hadoop/data/tmp
- Edit the configuration files
hadoop-env.sh file
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/usr/local/java/jdk1.8.0_65
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.8.0
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:${HADOOP_HOME}/lib/native/Linux-amd64-64

# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol. Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=512
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Xmx512m -Xms512m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xmx256m -Xms256m -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol. This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored. $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

###
# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""

###
# Advanced Users Only!
###

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
export HADOOP_PID_DIR=/home/hadoop/data/pid
export HADOOP_SECURE_DN_PID_DIR=/home/hadoop/data/pid

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
core-site.xml file
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
        <description>Default file system URI; points at the HA nameservice</description>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/data/tmp</value>
        <description>Base directory for temporary data</description>
    </property>

    <property>
        <name>io.native.lib.available</name>
        <value>true</value>
        <description>Should native hadoop libraries, if present, be used.</description>
    </property>

    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>

    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
        <description>compression</description>
    </property>

</configuration>
hdfs-site.xml file
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
        <description>Comma-separated list of nameservices</description>
    </property>

    <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:50011</value>
        <description>The datanode server address and port for data transfer</description>
    </property>

    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:50076</value>
        <description>The datanode http server address and port.</description>
    </property>

    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:50021</value>
        <description>The datanode ipc server address and port.</description>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/data/name</value>
        <description>
            Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
        </description>
        <final>true</final>
    </property>

    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>file:/home/hadoop/data/hdfs/edits</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/data/datanode</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <property>
        <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
        <value>true</value>
        <description>
            Boolean which enables backend datanode-side support for the experimental DistributedFileSystem#getFileVBlockStorageLocations API.
        </description>
    </property>

    <property>
        <name>dfs.permissions.enabled</name>
        <value>true</value>
        <description>If true, enable permission checking in HDFS.</description>
    </property>

    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>master:8030</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>slave1:8030</value>
    </property>

    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>master:50082</value>
    </property>

    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>slave1:50082</value>
    </property>

    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://master:8488;slave1:8488;slave2:8488/test</value>
    </property>

    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/data/journaldata/jn</value>
    </property>

    <property>
        <name>dfs.journalnode.rpc-address</name>
        <value>0.0.0.0:8488</value>
    </property>

    <property>
        <name>dfs.journalnode.http-address</name>
        <value>0.0.0.0:8483</value>
    </property>

    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Fencing methods are listed one per line and tried in order;
         shell(/bin/true) always succeeds and acts as the fallback. -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence
        shell(/bin/true)</value>
    </property>

    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>10000</value>
    </property>

    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- All three ZooKeeper nodes from the machine plan -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master:2181,slave1:2181,slave2:2181</value>
    </property>

    <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>8192</value>
    </property>

    <property>
        <name>dfs.datanode.max.transfer.threads</name>
        <value>4096</value>
    </property>

    <property>
        <name>dfs.blocksize</name>
        <value>64m</value>
    </property>

    <property>
        <name>dfs.namenode.handler.count</name>
        <value>10</value>
    </property>

    <property>
        <name>dfs.datanode.du.reserved</name>
        <value>5368709120</value>
    </property>

    <property>
        <name>dfs.namenode.fs-limits.min-block-size</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.namenode.fs-limits.max-blocks-per-file</name>
        <value>1048576</value>
    </property>

    <property>
        <name>dfs.datanode.balance.bandwidthPerSec</name>
        <value>3145728</value>
    </property>

    <property>
        <name>dfs.hosts.exclude</name>
        <value>/home/hadoop/app/hadoop-2.6.0-cdh5.8.0/etc/hadoop/excludes</value>
    </property>

    <property>
        <name>dfs.image.compress</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.image.compression.codec</name>
        <value>org.apache.hadoop.io.compress.DefaultCodec</value>
    </property>

    <property>
        <name>dfs.image.transfer.timeout</name>
        <value>60000</value>
    </property>

    <property>
        <name>dfs.image.transfer.bandwidthPerSec</name>
        <value>4194304</value>
    </property>

    <property>
        <name>dfs.image.transfer.chunksize</name>
        <value>65536</value>
    </property>

    <property>
        <name>dfs.namenode.edits.noeditlogchannelflush</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.datanode.failed.volumes.tolerated</name>
        <value>0</value>
    </property>
</configuration>
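With a site file this long it is easy to define the same property more than once, and as far as I know Hadoop then keeps the last definition (unless an earlier one is marked final), so an earlier value can be overridden silently. A rough check, using a hypothetical helper named dup_props and plain text matching rather than a real XML parser:

```shell
# Print every property name that occurs more than once in a
# Hadoop *-site.xml file. Works on <name>...</name> text only,
# so names inside XML comments would also be counted.
dup_props() {
    grep -o '<name>[^<]*</name>' "$1" \
        | sed -e 's/<name>//' -e 's#</name>##' \
        | sort \
        | uniq -d
}

# Usage (from the Hadoop installation directory):
#   dup_props etc/hadoop/hdfs-site.xml
```

An empty result means no name is repeated.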
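- Initialize the HDFS file system
Start ZooKeeper: zkServer.sh start
Start the JournalNodes (on every JournalNode host)
./sbin/hadoop-daemon.sh start journalnode (this and all following commands are run from the Hadoop installation directory)
Run the initialization on the primary NameNode (master)
./bin/hdfs namenode -format
./bin/hdfs zkfc -formatZK
./bin/hdfs namenode (runs in the foreground; leave it running for the next step)
Sync data to the standby node
On the standby node (slave1), from the Hadoop installation directory, run ./bin/hdfs namenode -bootstrapStandby (copies the primary NameNode's metadata to the standby)
Stop Hadoop
On master, press Ctrl+C to stop the NameNode, then stop the JournalNode on each node (from the Hadoop installation directory): ./sbin/hadoop-daemon.sh stop journalnode
Start HDFS with a single command: ./sbin/start-dfs.sh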
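Once HDFS is up, a quick way to confirm that one NameNode is active and the other standby is hdfs haadmin -getServiceState. The sketch below wraps it in a hypothetical helper, check_ha_state; nn1 and nn2 are the NameNode IDs from hdfs-site.xml, and the HDFS_BIN variable is an assumption that lets the path to the hdfs command be overridden.

```shell
# Print the HA state of both NameNodes, or "unreachable" if the
# command returns nothing (for example when the NameNode is down).
check_ha_state() {
    hdfs_bin="${HDFS_BIN:-./bin/hdfs}"
    for nn in nn1 nn2; do
        state="$("$hdfs_bin" haadmin -getServiceState "$nn" 2>/dev/null)"
        echo "$nn: ${state:-unreachable}"
    done
}

# Usage (from the Hadoop installation directory):
#   check_ha_state
```

With automatic failover enabled, stopping the active NameNode and running the check again should show the other one become active after a short delay.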