Setting Up a Pseudo-Distributed Big Data Cluster
Since readers interested in big data presumably already have some familiarity with Linux, this guide does not walk through creating the virtual machines in detail.
Setting Up the Base Environment
1. System environment
Platform: VMware Workstation Pro
OS: CentOS 7
Hadoop version: Apache Hadoop 3.0.0
This walkthrough builds one master node and two worker nodes (node1 and node2). The main goal is to demonstrate the Hadoop pseudo-distributed setup workflow, so if your machine has less than 8 GB of RAM, allocate roughly 1.5 GB to each node to keep the experience smooth.
Set the hostname and disable SELinux (do this on every node)
[root@master ~]# hostnamectl set-hostname master
[root@master ~]# vi /etc/sysconfig/selinux      (set SELINUX=disabled)
[root@master ~]# setenforce 0
[root@master ~]# getenforce
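The commands above are shown on master; for completeness, a sketch of the matching commands on the other two nodes (the same SELinux edit, setenforce 0, and getenforce check apply on each):
[root@node1 ~]# hostnamectl set-hostname node1
[root@node2 ~]# hostnamectl set-hostname node2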
Configure host mappings
[root@master ~]# vi /etc/hosts
Add:
192.168.200.111 master
192.168.200.112 node1
192.168.200.113 node2
(do this on every node)
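A quick sanity check that the mappings resolve, assuming all three machines are already up and reachable:
[root@master ~]# ping -c 1 node1
[root@master ~]# ping -c 1 node2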
Configure passwordless SSH login
The goal is to let the master node communicate with and transfer files to the two worker nodes without password prompts.
[root@master ~]# ssh-keygen -t rsa
(press Enter through every prompt to generate the key pair)
[root@master ~]# cd .ssh/
[root@master .ssh]# ls
id_rsa  id_rsa.pub
(append the public key to authorized_keys)
[root@master .ssh]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(restrict the permissions on authorized_keys)
[root@master .ssh]# chmod 600 ~/.ssh/authorized_keys
[root@master .ssh]# ls -l
total 16
-rw-------. 1 root root  393 Mar 15 10:19 authorized_keys
-rw-------. 1 root root 1675 Mar 15 10:18 id_rsa
-rw-r--r--. 1 root root  393 Mar 15 10:18 id_rsa.pub
(add the private key to the ssh-agent cache)
[root@master .ssh]# ssh-agent bash
[root@master .ssh]# ssh-add ~/.ssh/id_rsa
(copy authorized_keys to the home directory on node1 and node2)
[root@master .ssh]# scp ~/.ssh/authorized_keys root@node1:~/
[root@master .ssh]# scp ~/.ssh/authorized_keys root@node2:~/
On node1:
[root@node1 ~]# ssh-keygen -t rsa
[root@node1 ~]# mv authorized_keys ~/.ssh/
Run the same two commands on node2.
Back on master, verify that passwordless login works:
[root@master .ssh]# ssh node1
Last login: Thu Mar 19 06:56:50 2020 from 192.168.200.1
[root@node1 ~]#
[root@master .ssh]# ssh node2
Last login: Thu Mar 19 06:56:58 2020 from 192.168.200.1
[root@node2 ~]#
If the output looks like the above, the setup succeeded; otherwise go back and re-check each step carefully.
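As an aside, on systems that ship the ssh-copy-id helper (part of openssh-clients on CentOS 7), the copy, move, and permission steps above can be condensed into one command per node; a sketch:
[root@master ~]# ssh-copy-id root@node1     (appends ~/.ssh/id_rsa.pub to node1's authorized_keys and sets its permissions)
[root@master ~]# ssh-copy-id root@node2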
Installing the JDK
1. Upload the JDK archive to the /opt directory on the master node with SecureFX.
2. Pick a location and create a bigdata directory; here it is created under /opt.
[root@master opt]# ls
bigdata  centos  hadoop-3.0.0.tar.gz  jdk-8u161-linux-x64.tar.gz
Extract the JDK archive:
[root@master opt]# tar -zxvf jdk-8u161-linux-x64.tar.gz
[root@master opt]# mv jdk1.8.0_161 bigdata/
(move the extracted jdk1.8.0_161 into bigdata)
[root@master bigdata]# ls
jdk1.8.0_161
Configure the Java environment variables:
[root@master bigdata]# vi /etc/profile
Add:
export JAVA_HOME="/opt/bigdata/jdk1.8.0_161"   (use the absolute path of your own jdk1.8.0_161 directory here)
export PATH=$JAVA_HOME/bin:$PATH
Reload the environment and verify:
[root@master bigdata]# source /etc/profile
[root@master bigdata]# java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
3. Copy the JDK from master to node1 and node2, then configure the Java environment variables there the same way:
[root@master /]# scp -r /opt/bigdata/ node1:/opt/
[root@master /]# scp -r /opt/bigdata/ node2:/opt/
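Rather than editing /etc/profile by hand on each worker, one option is to push master's copy over and verify in one pass; a sketch, assuming all three machines are identical CentOS 7 installs so overwriting /etc/profile is safe:
[root@master /]# scp /etc/profile node1:/etc/profile
[root@master /]# scp /etc/profile node2:/etc/profile
[root@master /]# ssh node1 "source /etc/profile && java -version"
[root@master /]# ssh node2 "source /etc/profile && java -version"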
Installing Hadoop
1. Upload the Hadoop archive to the /opt directory with SecureFX, then extract it:
[root@master /]# cd /opt/
[root@master opt]# ls
bigdata  hadoop-3.0.0.tar.gz  jdk-8u161-linux-x64.tar.gz
[root@master opt]# tar -zxvf hadoop-3.0.0.tar.gz
[root@master opt]# mv hadoop-3.0.0 bigdata
(move the extracted hadoop-3.0.0 directory into bigdata)
[root@master opt]# cd bigdata
[root@master bigdata]# ls
hadoop-3.0.0  jdk1.8.0_161
2. Configure the Hadoop environment variables:
[root@master ~]# vi /etc/profile
Add:
export HADOOP_HOME=/opt/bigdata/hadoop-3.0.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Reload the environment:
[root@master bigdata]# source /etc/profile
Verify the configuration:
[root@master bigdata]# hadoop version
Hadoop 3.0.0
Source code repository https://git-wip-us.apache.org/repos/asf/hadoop.git -r c25427ceca461ee979d30edd7a4b0f50718e6533
Compiled by andrew on 2017-12-08T19:16Z
Compiled with protoc 2.5.0
From source with checksum 397832cb5529187dc8cd74ad54ff22
This command was run using /opt/bigdata/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar
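If you prefer to skip the interactive vi edit, the same two lines can be appended non-interactively; a sketch using a quoted heredoc so the variables are written literally rather than expanded at append time:
[root@master ~]# cat >> /etc/profile <<'EOF'
> export HADOOP_HOME=/opt/bigdata/hadoop-3.0.0
> export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
> EOF
[root@master ~]# source /etc/profile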
Configuring Hadoop
Six files need to be modified in total, all under /opt/bigdata/hadoop-3.0.0/etc/hadoop/.
1. Edit hadoop-env.sh:
[root@master hadoop]# vi hadoop-env.sh
Configure:
export JAVA_HOME=/opt/bigdata/jdk1.8.0_161
(JAVA_HOME here must be an absolute path)
2. [root@master hadoop]# vi core-site.xml
Add:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/bigdata/hadoop-3.0.0/tmp</value>
    </property>
</configuration>
3. [root@master hadoop]# vi hdfs-site.xml
Add:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/bigdata/hadoop-3.0.0/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/bigdata/hadoop-3.0.0/hdfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1:9001</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>0.0.0.0:50070</value>
    </property>
</configuration>
4. [root@master hadoop]# vi mapred-site.xml
Add:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.job.tracker.http.address</name>
        <value>0.0.0.0:50030</value>
    </property>
    <property>
        <name>mapred.task.tracker.http.address</name>
        <value>0.0.0.0:50060</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /opt/bigdata/hadoop-3.0.0/etc/hadoop,
            /opt/bigdata/hadoop-3.0.0/share/hadoop/common/*,
            /opt/bigdata/hadoop-3.0.0/share/hadoop/common/lib/*,
            /opt/bigdata/hadoop-3.0.0/share/hadoop/hdfs/*,
            /opt/bigdata/hadoop-3.0.0/share/hadoop/hdfs/lib/*,
            /opt/bigdata/hadoop-3.0.0/share/hadoop/mapreduce/*,
            /opt/bigdata/hadoop-3.0.0/share/hadoop/yarn/*,
            /opt/bigdata/hadoop-3.0.0/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
5. [root@master hadoop]# vi yarn-site.xml
Add:
<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8099</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>/opt/bigdata/hadoop-3.0.0/etc/hadoop:/opt/bigdata/hadoop-3.0.0/share/hadoop/common/lib/*:/opt/bigdata/hadoop-3.0.0/share/hadoop/common/*:/opt/bigdata/hadoop-3.0.0/share/hadoop/hdfs:/opt/bigdata/hadoop-3.0.0/share/hadoop/hdfs/lib/*:/opt/bigdata/hadoop-3.0.0/share/hadoop/hdfs/*:/opt/bigdata/hadoop-3.0.0/share/hadoop/mapreduce/*:/opt/bigdata/hadoop-3.0.0/share/hadoop/yarn:/opt/bigdata/hadoop-3.0.0/share/hadoop/yarn/lib/*:/opt/bigdata/hadoop-3.0.0/share/hadoop/yarn/*</value>
    </property>
</configuration>
6. Configure workers:
[root@master hadoop]# vi workers
Add:
node1
node2
7. Copy the configured Hadoop to node1 and node2 (it is worth validating the XML first; see the sketch after this list):
[root@master bigdata]# scp -r hadoop-3.0.0/ node1:/opt/bigdata
[root@master bigdata]# scp -r hadoop-3.0.0/ node2:/opt/bigdata
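A single malformed tag in any of these files will keep the daemons from starting, so it pays to confirm that each file parses as well-formed XML before copying it out to the workers; a sketch using xmllint (from the libxml2 package; install it first if it is missing):
[root@master hadoop]# cd /opt/bigdata/hadoop-3.0.0/etc/hadoop
[root@master hadoop]# for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do xmllint --noout "$f" && echo "$f OK"; done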
Configuring the Startup Scripts
1. In /opt/bigdata/hadoop-3.0.0/sbin, edit start-dfs.sh and stop-dfs.sh:
[root@master sbin]# vi start-dfs.sh
Add:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Both scripts get the same additions.
2. Then edit start-yarn.sh and stop-yarn.sh and add:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
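The same edits can be applied without opening an editor; a sketch using GNU sed (the default sed on CentOS 7) to insert the variables right after the shebang line of each script:
[root@master sbin]# cd /opt/bigdata/hadoop-3.0.0/sbin
[root@master sbin]# for f in start-dfs.sh stop-dfs.sh; do sed -i '2i HDFS_DATANODE_USER=root\nHDFS_DATANODE_SECURE_USER=hdfs\nHDFS_NAMENODE_USER=root\nHDFS_SECONDARYNAMENODE_USER=root' "$f"; done
[root@master sbin]# for f in start-yarn.sh stop-yarn.sh; do sed -i '2i YARN_RESOURCEMANAGER_USER=root\nHADOOP_SECURE_DN_USER=yarn\nYARN_NODEMANAGER_USER=root' "$f"; done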
Disabling the Firewall and Starting Hadoop
1. Disable the firewall (do this on every node, so the daemons can reach each other):
[root@master ~]# systemctl stop firewalld.service
[root@master ~]# systemctl disable firewalld.service
2. Format the NameNode before the first start:
[root@master ~]# hadoop namenode -format
3. Start everything:
[root@master ~]# start-all.sh
Starting namenodes on [master]
Last login: Thu Mar 19 10:06:13 EDT 2020 from 192.168.200.1 on pts/2
Starting datanodes
Last login: Thu Mar 19 10:13:28 EDT 2020 on pts/2
Starting secondary namenodes [node1]
Last login: Thu Mar 19 10:13:31 EDT 2020 on pts/2
Starting resourcemanager
Last login: Thu Mar 19 10:13:46 EDT 2020 on pts/2
Starting nodemanagers
Last login: Thu Mar 19 10:13:56 EDT 2020 on pts/2
4. Check the processes on each node with jps:
[root@master ~]# jps
2753 NameNode
3505 Jps
3155 ResourceManager
[root@node1 ~]# jps
2658 Jps
2500 DataNode
2617 NodeManager
2557 SecondaryNameNode
[root@node2 ~]# jps
2741 Jps
2569 NodeManager
2509 DataNode
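Beyond jps, a quick functional check confirms that HDFS actually accepts writes and that both DataNodes registered with the NameNode; a sketch:
[root@master ~]# hdfs dfs -mkdir /test
[root@master ~]# hdfs dfs -put /etc/hosts /test/
[root@master ~]# hdfs dfs -ls /test
[root@master ~]# hdfs dfsadmin -report | grep -E 'Live datanodes|Name:'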
Finally, open http://master:50070 (or http://192.168.200.111:50070 if the browsing machine has no hosts mapping) to view the HDFS web UI; the YARN web UI is at http://master:8099, as configured in yarn-site.xml above.
That completes the pseudo-distributed big data setup. Questions and discussion are welcome!
Original article: https://www.cnblogs.com/lfz0/p/12530817.html