Preface: A while back I managed to crash the hadoop01 virtual machine, and I had no backup, so I re-cloned it from the hadoop02 VM. After that, the hadoop-eclipse plugin, compiled exactly the same way as before, simply refused to work. I spent three days hunting for the cause and never found it, so I fell back to testing with Hadoop shell commands. No big loss in the end, just less convenient.
Feeling drained...
Main content:
Unpack and install Hadoop
[hadoop@hadoop01 ~]$ cp /home/hadoop/Resources/hadoop-3.2.0.tar.gz ~/
[hadoop@hadoop01 ~]$ tar -zxvf ~/hadoop-3.2.0.tar.gz
[hadoop@hadoop01 ~]$ cd hadoop-3.2.0
[hadoop@hadoop01 hadoop-3.2.0]$ ls -l
total 184
drwxr-xr-x. 2 hadoop hadoop    203 Jan  8  2019 bin
drwxr-xr-x. 3 hadoop hadoop     20 Jan  8  2019 etc
drwxr-xr-x. 2 hadoop hadoop    106 Jan  8  2019 include
drwxr-xr-x. 3 hadoop hadoop     20 Jan  8  2019 lib
drwxr-xr-x. 4 hadoop hadoop   4096 Jan  8  2019 libexec
-rw-rw-r--. 1 hadoop hadoop 150569 Oct 19  2018 LICENSE.txt
-rw-rw-r--. 1 hadoop hadoop  22125 Oct 19  2018 NOTICE.txt
-rw-rw-r--. 1 hadoop hadoop   1361 Oct 19  2018 README.txt
drwxr-xr-x. 3 hadoop hadoop   4096 Jan  8  2019 sbin
drwxr-xr-x. 4 hadoop hadoop     31 Jan  8  2019 share
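Before touching any config, a quick sanity check (a minimal sketch; it only assumes the tarball unpacked cleanly) is to ask the bundled binary for its version, which should report Hadoop 3.2.0:

[hadoop@hadoop01 hadoop-3.2.0]$ bin/hadoop version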
Configure the Hadoop environment variables
[hadoop@hadoop01 hadoop-3.2.0]$ gedit /home/hadoop/hadoop-3.2.0/etc/hadoop/hadoop-env.sh
Edit the file and save:
# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
# export JAVA_HOME=
export JAVA_HOME=/usr/java/jdk1.8.0_11/
Configure the YARN environment variables
[hadoop@hadoop01 hadoop-3.2.0]$ gedit /home/hadoop/hadoop-3.2.0/etc/hadoop/yarn-env.sh
Edit the file and save:
export JAVA_HOME=/usr/java/jdk1.8.0_11/
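Both env files hard-code the same JDK path, so it is worth confirming that the path really is a JDK before moving on (a hedged check; substitute your own JAVA_HOME if it differs):

[hadoop@hadoop01 hadoop-3.2.0]$ /usr/java/jdk1.8.0_11/bin/java -version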
Configure the core components file (core-site.xml)
[hadoop@hadoop01 hadoop-3.2.0]$ gedit /home/hadoop/hadoop-3.2.0/etc/hadoop/core-site.xml
Edit the file and save:
<configuration>
  <!-- default HDFS address and port -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9802</value>
  </property>
  <!-- HDFS temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoopdata</value>
  </property>
</configuration>
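You can verify the client actually picks this value up from the config, even before the cluster is started; hdfs getconf is a stock subcommand and should print hdfs://hadoop01:9802 here:

[hadoop@hadoop01 hadoop-3.2.0]$ bin/hdfs getconf -confKey fs.defaultFS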
Configure the file system (hdfs-site.xml)
[hadoop@hadoop01 hadoop-3.2.0]$ gedit /home/hadoop/hadoop-3.2.0/etc/hadoop/hdfs-site.xml
Edit the file and save:
<configuration>
  <!-- HDFS web UI address -->
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop01:50070</value>
  </property>
  <!-- replication factor -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- whether to enable HDFS permission checks; false disables them -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <!-- block size, in bytes by default; suffixes k, m, g, t, p, e are accepted -->
  <property>
    <name>dfs.blocksize</name>
    <!-- 128m -->
    <value>134217728</value>
  </property>
  <!-- NameNode and DataNode directory paths -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hdfs/data</value>
  </property>
</configuration>
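Nothing in this step creates the name and data directories themselves. A small precaution (a sketch, assuming the same /home/hadoop/hdfs layout on every node) is to create them up front on each machine so the NameNode format and the DataNodes find writable paths:

mkdir -p /home/hadoop/hdfs/name /home/hadoop/hdfs/data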
Configure the yarn-site.xml file
[hadoop@hadoop01 hadoop-3.2.0]$ gedit /home/hadoop/hadoop-3.2.0/etc/hadoop/yarn-site.xml
Edit the file and save:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- cluster master -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
  </property>
  <!-- auxiliary service running on the NodeManager -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- environment variables that containers may override, rather than using the NodeManager defaults -->
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ</value>
  </property>
  <!-- disable memory checking; needed on VMs, jobs fail without it -->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
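yarn.resourcemanager.hostname only works if every node can resolve hadoop01; a quick hedged check (it assumes the names live in /etc/hosts or DNS) is:

getent hosts hadoop01 hadoop02 hadoop03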
Configure the MapReduce framework file
[hadoop@hadoop01 hadoop-3.2.0]$ gedit /home/hadoop/hadoop-3.2.0/etc/hadoop/mapred-site.xml
Edit the file and save:
<configuration>
  <!-- local means run locally, classic means the classic MapReduce framework, yarn means the new framework -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- must keep the original value if map and reduce tasks access native libraries (compression, etc.);
       when this value is empty, the command that sets the execution environment depends on the OS:
       Linux:   LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native
       Windows: PATH=%PATH%;%HADOOP_COMMON_HOME%\bin -->
  <property>
    <name>mapreduce.admin.user.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <!-- sets environment variables on the AM (ApplicationMaster) side;
       if this is missing, MapReduce jobs may fail -->
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
</configuration>
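Both properties lean on $HADOOP_HOME being resolvable at job time. One hedged way to see which installation a job will pick up is to print the classpath the client resolves (hadoop classpath is a stock subcommand):

[hadoop@hadoop01 hadoop-3.2.0]$ bin/hadoop classpath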
[Optional] Configure the slaves file (on Hadoop 2.x, edit slaves)
[hadoop@hadoop01 hadoop-3.2.0]$ gedit /home/hadoop/hadoop-3.2.0/etc/hadoop/slaves
Edit the file and save:
hadoop02
hadoop03
[Optional] Configure the workers file (on Hadoop 3.x, edit workers)
[hadoop@hadoop01 hadoop-3.2.0]$ gedit /home/hadoop/hadoop-3.2.0/etc/hadoop/workers
Edit the file and save:
hadoop01
hadoop02
hadoop03
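start-all.sh logs into every host listed in workers over SSH, so passwordless SSH from hadoop01 to each of them (itself included) must already work. A minimal check, assuming the keys have been distributed beforehand:

for h in hadoop01 hadoop02 hadoop03; do ssh "$h" hostname; done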
Copy Hadoop from hadoop01 to the hadoop02 and hadoop03 nodes
scp -r /home/hadoop/hadoop-3.2.0 hadoop@hadoop02:~/
scp -r /home/hadoop/hadoop-3.2.0 hadoop@hadoop03:~/
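To confirm the copies landed intact, a hedged spot check over SSH (it assumes the same hadoop user exists on every node):

ssh hadoop@hadoop02 'ls ~/hadoop-3.2.0/bin/hadoop'
ssh hadoop@hadoop03 'ls ~/hadoop-3.2.0/bin/hadoop'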
Configure the OS environment variables (must be done on all nodes, as the regular user)
gedit ~/.bash_profile
Edit the file and save:
# the lines below are newly added
export JAVA_HOME=/usr/java/jdk1.8.0_11/
export PATH=$JAVA_HOME/bin:$PATH
# hadoop
export HADOOP_HOME=/home/hadoop/hadoop-3.2.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Then reload it:
source ~/.bash_profile
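With the profile reloaded, the hadoop command should resolve from any directory; a quick check on each node:

echo $HADOOP_HOME
hadoop version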
Create the Hadoop data directory (on all nodes)
mkdir /home/hadoop/hadoopdata
Format the file system (on the master node)
hdfs namenode -format
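Formatting wipes the NameNode metadata, so run it only once, on first setup. One hedged way to confirm it worked is that a VERSION file now exists under the dfs.namenode.name.dir configured earlier:

cat /home/hadoop/hdfs/name/current/VERSION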
Start and stop Hadoop
cd ~/hadoop-3.2.0
sbin/start-all.sh
sbin/stop-all.sh
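start-all.sh is just a convenience wrapper; when something refuses to come up, starting the two layers separately (these scripts ship in the same sbin directory) makes it easier to see which one is at fault:

sbin/start-dfs.sh
sbin/start-yarn.sh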
Output after a successful start:
[hadoop@hadoop01 hadoop-3.2.0]$ jps
20848 DataNode
21808 Jps
21076 SecondaryNameNode
21322 ResourceManager
20668 NameNode
21468 NodeManager
[hadoop@hadoop01 hadoop-3.2.0]$
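jps only proves the daemons are alive on this node. To check that the worker DataNodes and NodeManagers actually registered, two stock subcommands help, alongside the web UI at http://hadoop01:50070 configured earlier:

hdfs dfsadmin -report
yarn node -list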
[Final test] Run a program on the Hadoop cluster
Run the Java example jar that estimates pi
[hadoop@hadoop01 hadoop-3.2.0]$ cd ~/hadoop-3.2.0/share/hadoop/mapreduce
[hadoop@hadoop01 mapreduce]$ ls
hadoop-mapreduce-client-app-3.2.0.jar     hadoop-mapreduce-client-hs-plugins-3.2.0.jar       hadoop-mapreduce-client-shuffle-3.2.0.jar   lib
hadoop-mapreduce-client-common-3.2.0.jar  hadoop-mapreduce-client-jobclient-3.2.0.jar        hadoop-mapreduce-client-uploader-3.2.0.jar  lib-examples
hadoop-mapreduce-client-core-3.2.0.jar    hadoop-mapreduce-client-jobclient-3.2.0-tests.jar  hadoop-mapreduce-examples-3.2.0.jar         sources
hadoop-mapreduce-client-hs-3.2.0.jar      hadoop-mapreduce-client-nativetask-3.2.0.jar       jdiff
[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-3.2.0.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-3.2.0.jar pi 10 10
Number of Maps  = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
2019-08-27 13:47:11,866 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.1.100:8032
2019-08-27 13:47:12,179 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1566884685380_0001
2019-08-27 13:47:12,285 INFO input.FileInputFormat: Total input files to process : 10
2019-08-27 13:47:12,341 INFO mapreduce.JobSubmitter: number of splits:10
2019-08-27 13:47:12,372 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2019-08-27 13:47:12,479 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1566884685380_0001
2019-08-27 13:47:12,480 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-08-27 13:47:12,645 INFO conf.Configuration: resource-types.xml not found
2019-08-27 13:47:12,645 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-08-27 13:47:13,018 INFO impl.YarnClientImpl: Submitted application application_1566884685380_0001
2019-08-27 13:47:13,099 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1566884685380_0001/
2019-08-27 13:47:13,099 INFO mapreduce.Job: Running job: job_1566884685380_0001
2019-08-27 13:47:20,205 INFO mapreduce.Job: Job job_1566884685380_0001 running in uber mode : false
2019-08-27 13:47:20,209 INFO mapreduce.Job:  map 0% reduce 0%
2019-08-27 13:47:27,371 INFO mapreduce.Job:  map 20% reduce 0%
2019-08-27 13:47:46,535 INFO mapreduce.Job:  map 20% reduce 7%
2019-08-27 13:47:50,559 INFO mapreduce.Job:  map 40% reduce 7%
2019-08-27 13:47:51,570 INFO mapreduce.Job:  map 50% reduce 7%
2019-08-27 13:47:53,586 INFO mapreduce.Job:  map 60% reduce 7%
2019-08-27 13:47:58,631 INFO mapreduce.Job:  map 60% reduce 20%
2019-08-27 13:47:59,641 INFO mapreduce.Job:  map 80% reduce 20%
2019-08-27 13:48:00,665 INFO mapreduce.Job:  map 100% reduce 20%
2019-08-27 13:48:01,682 INFO mapreduce.Job:  map 100% reduce 100%
2019-08-27 13:48:01,708 INFO mapreduce.Job: Job job_1566884685380_0001 completed successfully
2019-08-27 13:48:01,780 INFO mapreduce.Job: Counters: 54
	File System Counters
		FILE: Number of bytes read=226
		FILE: Number of bytes written=2443397
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=2640
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=45
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
		HDFS: Number of bytes read erasure-coded=0
	Job Counters
		Launched map tasks=10
		Launched reduce tasks=1
		Data-local map tasks=10
		Total time spent by all maps in occupied slots (ms)=270199
		Total time spent by all reduces in occupied slots (ms)=31653
		Total time spent by all map tasks (ms)=270199
		Total time spent by all reduce tasks (ms)=31653
		Total vcore-milliseconds taken by all map tasks=270199
		Total vcore-milliseconds taken by all reduce tasks=31653
		Total megabyte-milliseconds taken by all map tasks=276683776
		Total megabyte-milliseconds taken by all reduce tasks=32412672
	Map-Reduce Framework
		Map input records=10
		Map output records=20
		Map output bytes=180
		Map output materialized bytes=280
		Input split bytes=1460
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=280
		Reduce input records=20
		Reduce output records=0
		Spilled Records=40
		Shuffled Maps =10
		Failed Shuffles=0
		Merged Map outputs=10
		GC time elapsed (ms)=67681
		CPU time spent (ms)=63700
		Physical memory (bytes) snapshot=2417147904
		Virtual memory (bytes) snapshot=30882955264
		Total committed heap usage (bytes)=2966421504
		Peak Map Physical memory (bytes)=382750720
		Peak Map Virtual memory (bytes)=2810384384
		Peak Reduce Physical memory (bytes)=181923840
		Peak Reduce Virtual memory (bytes)=2815541248
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=1180
	File Output Format Counters
		Bytes Written=97
Job Finished in 49.977 seconds
Estimated value of Pi is 3.20000000000000000000
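With the eclipse plugin out of action, the same shell-only approach covers other tests too. A hedged wordcount run from the same examples jar against the bundled README (the /input and /output HDFS paths are just examples, pick your own):

hdfs dfs -mkdir -p /input
hdfs dfs -put ~/hadoop-3.2.0/README.txt /input
hadoop jar hadoop-mapreduce-examples-3.2.0.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000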
Original post: https://www.cnblogs.com/CQ-LQJ/p/11602927.html