I. Hadoop Basics
1. Pseudo-Distributed Mode (Single Node)
1.1 Configure environment variables for the default JDK 1.7 on CentOS 7
[root@localhost ~]# vim /etc/profile.d/java.sh
export JAVA_HOME=/usr
[root@localhost ~]# source /etc/profile.d/java.sh
Install the OpenJDK devel package:
[root@localhost ~]# yum install java-1.7.0-openjdk-devel.x86_64
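As an optional sanity check (assuming the OpenJDK packages installed cleanly), confirm that both the runtime and the compiler are available and that JAVA_HOME resolves to the value exported above:
[root@localhost ~]# java -version
[root@localhost ~]# javac -version
[root@localhost ~]# echo $JAVA_HOME
/usr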
1.2 Create the Hadoop directory and extract Hadoop into it
[root@localhost ~]# mkdir /bdapps
[root@localhost ~]# tar xf hadoop-2.6.2.tar.gz -C /bdapps/
[root@localhost ~]# cd /bdapps/
Create a symlink:
[root@localhost bdapps]# ln -sv hadoop-2.6.2 hadoop
1.3 Set the Hadoop environment variables
[root@localhost hadoop]# vim /etc/profile.d/hadoop.sh
export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
Reload the file:
[root@localhost ~]# source /etc/profile.d/hadoop.sh
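With hadoop.sh reloaded, a quick check that the Hadoop binaries are now on the PATH (this only prints version information; nothing is started yet):
[root@localhost ~]# hadoop version
[root@localhost ~]# which hadoop
/bdapps/hadoop/bin/hadoop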
1.4 Create the users and directories for the Hadoop processes
Create the group:
[root@localhost ~]# groupadd hadoop
Create the users and add them to the hadoop group:
[root@localhost ~]# useradd -g hadoop yarn
[root@localhost ~]# useradd -g hadoop hdfs
[root@localhost ~]# useradd -g hadoop mapred
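If you want to double-check the accounts (purely optional), id shows each user and its membership in the hadoop group:
[root@localhost ~]# id yarn; id hdfs; id mapred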
Create the data directories:
[root@localhost ~]# mkdir -pv /data/hadoop/hdfs/{nn,snn,dn}
Change ownership of the data directories:
[root@localhost ~]# chown -R hdfs:hadoop /data/hadoop/hdfs
[root@localhost ~]# ll /data/hadoop/hdfs
total 0
drwxr-xr-x 2 hdfs hadoop 6 Apr 19 08:44 dn
drwxr-xr-x 2 hdfs hadoop 6 Apr 19 08:44 nn
drwxr-xr-x 2 hdfs hadoop 6 Apr 19 08:44 snn
Create the log directory and set its group permissions (done inside the installation directory):
[root@localhost ~]# cd /bdapps/hadoop
[root@localhost hadoop]# mkdir logs
[root@localhost hadoop]# chmod g+w logs/
[root@localhost hadoop]# chown -R yarn:hadoop logs
[root@localhost hadoop]# ll | grep log
drwxrwxr-x 2 yarn hadoop 6 Apr 19 08:47 logs
Change the owner and group of the installation directory:
[root@localhost hadoop]# chown -R yarn:hadoop ./*
1.5 Configure Hadoop
Configure the default filesystem namespace (core-site.xml):
[root@localhost hadoop]# pwd
/bdapps/hadoop/etc/hadoop
[root@localhost hadoop]# vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
        <final>true</final>
    </property>
</configuration>
Configure the HDFS-related properties:
[root@localhost hadoop]# vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/hdfs/nn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/hdfs/dn</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
    <property>
        <name>fs.checkpoint.edits.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
</configuration>
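Once the file is saved (and hadoop.sh has been sourced), the effective values can be read back with hdfs getconf; this is just a local configuration lookup, no daemon is contacted, and it should echo the values configured above:
[root@localhost hadoop]# hdfs getconf -confKey fs.defaultFS
hdfs://localhost:8020
[root@localhost hadoop]# hdfs getconf -confKey dfs.replication
1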
Configure MapReduce (mapred-site.xml):
[root@localhost hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@localhost hadoop]# vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Configure YARN:
[root@localhost hadoop]# vim yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>10.201.106.131:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>
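Because hand-edited XML breaks easily, a quick well-formedness check with xmllint (shipped with the libxml2 package on CentOS 7; install it if missing) can save a confusing startup failure later:
[root@localhost hadoop]# xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml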
1.6 Define the slave nodes; in pseudo-distributed mode the node itself is the only slave, so the default needs no changes
[root@localhost hadoop]# cat slaves
localhost
1.7 Format HDFS
Switch to the hdfs user:
[root@localhost ~]# su - hdfs
View the help for the hdfs command:
[hdfs@localhost ~]$ hdfs --help
Format the NameNode:
[hdfs@localhost ~]$ hdfs namenode -format
Check the result:
[hdfs@localhost ~]$ ls /data/hadoop/hdfs/nn/current/
fsimage_0000000000000000000 seen_txid
fsimage_0000000000000000000.md5 VERSION
1.8 Start Hadoop
1.8.1 Start the HDFS daemons
Start the processes as the hdfs user.
Start the NameNode:
[hdfs@localhost ~]$ hadoop-daemon.sh start namenode
Check the Java processes:
[hdfs@localhost ~]$ jps
9127 NameNode
9220 Jps
View detailed Java process information:
[hdfs@localhost ~]$ jps -v
Start the SecondaryNameNode:
[hdfs@localhost ~]$ hadoop-daemon.sh start secondarynamenode
Start the DataNode:
[hdfs@localhost ~]$ hadoop-daemon.sh start datanode
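With all three daemons started (assuming none of them exited during startup), jps run as the hdfs user should now list NameNode, SecondaryNameNode and DataNode alongside Jps:
[hdfs@localhost ~]$ jps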
Upload a file to HDFS as a test:
[hdfs@localhost ~]$ hdfs dfs -mkdir /test
[hdfs@localhost ~]$ hdfs dfs -put /etc/fstab /test/fstab
[hdfs@localhost ~]$ hdfs dfs -ls /test
Found 1 items
-rw-r--r-- 1 hdfs supergroup 1065 2018-04-20 15:04 /test/fstab
This block is the fstab file that was just uploaded:
[hdfs@localhost ~]$ cat /data/hadoop/hdfs/dn/current/BP-908063675-10.201.106.131-1524136482474/current/finalized/subdir0/subdir0/blk_1073741825
The directory on the local host's filesystem where the DataNode stores its blocks:
[hdfs@localhost ~]$ ls /data/hadoop/hdfs/dn/current/
BP-908063675-10.201.106.131-1524136482474 VERSION
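To confirm the DataNode has registered with the NameNode (optional; the capacity figures depend on your disk), hdfs dfsadmin -report summarizes the configured capacity and the number of live datanodes:
[hdfs@localhost ~]$ hdfs dfsadmin -report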
1.8.2 Start the YARN daemons
Switch to the yarn user:
[root@localhost ~]# su - yarn
Start the ResourceManager:
[yarn@localhost ~]$ yarn-daemon.sh start resourcemanager
Start the NodeManager:
[yarn@localhost ~]$ yarn-daemon.sh start nodemanager
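As with HDFS, jps run as the yarn user should now list ResourceManager and NodeManager, and yarn node -list should report one registered NodeManager once startup has finished:
[yarn@localhost ~]$ jps
[yarn@localhost ~]$ yarn node -list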
1.9 Check the Hadoop status
Open in a browser: http://10.201.106.131:50070 (HDFS NameNode web UI)
Open in a browser: http://10.201.106.131:8088 (YARN ResourceManager web UI)
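If either page is unreachable, a quick check from the node itself is to confirm the ports are listening and that firewalld is not blocking them (adjust or stop the firewall as appropriate for your environment):
[root@localhost ~]# ss -tnl | grep -E '50070|8088'
[root@localhost ~]# firewall-cmd --state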
1.10 Submit and run a program on Hadoop
1.10.1 Run the MapReduce example program
Switch user:
[root@localhost mapreduce]# su - hdfs
Run the examples jar (invoked without arguments it prints the list of available example programs):
[hdfs@localhost ~]$ yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar
Count the words in the uploaded file:
[hdfs@localhost ~]$ yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /test/fstab /test/fstab.out
View the results:
[hdfs@localhost ~]$ hdfs dfs -cat /test/fstab.out/part-r-00000
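The output directory itself can also be listed; a successful job leaves a _SUCCESS marker next to the part-r-00000 result file (these names are the standard MapReduce defaults):
[hdfs@localhost ~]$ hdfs dfs -ls /test/fstab.out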
Original article: http://blog.51cto.com/zhongle21/2106524