Hadoop Deep Dive - 001

Doc by xvGe

What is Hadoop?

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.

Problems Hadoop solves:

-- Massive data storage

-- Massive data analysis

-- Resource management and scheduling

Creator: Doug Cutting

*********************************

(1) Hadoop distributions and core components:

*********************************

Distributions:

Apache: the official community version.

Cloudera (CDH): stable, with commercial support; recommended.

HDP: the Hortonworks distribution.

Hadoop core:

-- HDFS: distributed file system

-- YARN: resource management and scheduling system

-- MapReduce: distributed computation framework

********************************

(2) How HDFS works and file system concepts:

********************************

1. Capacity scales linearly: add nodes to add storage.

2. A replication mechanism gives reliable storage and high throughput.

3. With a namenode in place, clients only need to specify a path on HDFS.

How it works:

1. Files are split into blocks for storage.

2. Clients do not need to care about the distribution details; HDFS presents a single abstract directory tree.

3. Each file can keep multiple replicas.

4. The mapping between files and the physical locations of their blocks is managed by a dedicated server, the namenode; the sketch below illustrates the abstraction.
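A minimal sketch of that abstraction, assuming the pseudo-distributed cluster set up later in this note is running (the file name notes.txt is just an example): the client addresses files purely by path and never sees block locations.

hdfs dfs -mkdir -p /user/root              # create a directory in the abstract tree
hdfs dfs -put ./notes.txt /user/root/      # upload; HDFS splits the file into blocks behind the scenes
hdfs dfs -ls /user/root                    # list by path; block placement stays hidden
hdfs dfs -cat /user/root/notes.txt         # read back; the namenode resolves block locations transparently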

***********************

(3) The basic idea of MapReduce:

***********************

1. A processing job is split into two phases: a map phase and a reduce phase.

2. The problems common to all distributed computation are pushed into the framework itself (jar distribution, task launching, fault tolerance, scheduling, grouping and transfer of intermediate results, ...).

MapReduce (offline/batch computation) is just one implementation of a distributed computation framework; similar frameworks include Storm (stream computation) and Spark (in-memory iterative computation). A runnable example follows.
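Once the cluster below is up, the two-phase model can be tried with the wordcount example that ships with Hadoop; the jar path matches the 2.4.1 layout used in this note, and /wc is an arbitrary example directory.

hdfs dfs -mkdir -p /wc/input
hdfs dfs -put /etc/profile /wc/input       # any text file serves as sample input
hadoop jar /app/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /wc/input /wc/output
                                           # map: emit (word, 1) per word; reduce: sum the counts per word
hdfs dfs -cat /wc/output/part-r-00000      # inspect the per-word counts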

********************

(4) Pseudo-distributed cluster setup:

********************

1. Configure network parameters:

-------------------------------------------------------------------------------------------------------

vim /etc/sysconfig/network     # edit the network configuration

NETWORKING=yes

HOSTNAME=node0

:wq

vim /etc/sysconfig/network-scripts/ifcfg-eth0    # edit the NIC configuration

DEVICE=eth0

TYPE=Ethernet

ONBOOT=yes

BOOTPROTO=none

IPADDR=192.168.10.3

PREFIX=24

GATEWAY=192.168.10.1

:wq

/etc/init.d/network restart     # restart the network service

Shutting down interface eth0:                              [  OK  ]

Shutting down loopback interface:                          [  OK  ]

Bringing up loopback interface:                            [  OK  ]

Bringing up interface eth0:  Determining if ip address 192.168.10.3 is already in use for device eth0...

[  OK  ]

vim /etc/hosts   # edit the local hostname resolution file

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.10.3 node0

:wq

/etc/init.d/iptables stop    # stop the firewall

chkconfig iptables off       # disable firewall autostart at boot

chkconfig iptables --list    # check the firewall's runlevel settings

iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off

vim /etc/selinux/config   # edit the SELinux configuration

SELINUX=disabled          # disable SELinux

:wq

reboot   # reboot the server so the hostname and SELinux changes take effect
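After the reboot, a quick sanity check (expected results are in the comments; output omitted):

hostname                 # should print node0
ip addr show eth0        # should show 192.168.10.3/24
getenforce               # should print Disabled
ping -c 3 node0          # the hosts entry should resolve to 192.168.10.3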

2. Deploy the JDK

-------------------------------------------------------------------------------------------------------------

mkdir /app/    # create the application directory

tar -zxvf ./jdk-8u131-linux-x64.tar.gz -C /app/     # extract the archive

ln -s /app/jdk1.8.0_131/ /app/jdk        # create a symlink

vim /etc/profile        # edit the environment variables

export JAVA_HOME=/app/jdk

export PATH=$PATH:$JAVA_HOME/bin

:wq

source /etc/profile     # reload the environment variable file

java                    # test the java command

Usage: java [-options] class [args...]
           (to execute a class)
   or  java [-options] -jar jarfile [args...]
           (to execute a jar file)

... (full option listing omitted; seeing this usage text confirms java is on the PATH)

See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.

javac                      # test the javac command

Usage: javac <options> <source files>

... (full option listing omitted; seeing this usage text confirms the JDK compiler is installed)

java -version                  # check the Java version

java version "1.8.0_131"

Java(TM) SE Runtime Environment (build 1.8.0_131-b11)

Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

3. Deploy Hadoop

----------------------------------------------------------------------------------------------------------

tar -zxvf ./hadoop-2.4.1.tar.gz -C /app/              # extract the Hadoop archive

ln -s /app/hadoop-2.4.1/ /app/hadoop                  # create a symlink
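All the configuration files edited below live in one directory; listing it first helps orient (the comment shows only the files touched here):

ls /app/hadoop/etc/hadoop/
# hadoop-env.sh  core-site.xml  hdfs-site.xml  mapred-site.xml.template  yarn-site.xml  ...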

##########################################################################################################

vim /app/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/app/jdk

:wq

##########################################################################################################

vim /app/hadoop/etc/hadoop/core-site.xml

<configuration>
    <property>
        <!-- URI of the default file system, i.e. the address of the HDFS namenode -->
        <name>fs.defaultFS</name>
        <value>hdfs://node0:9000</value>
    </property>
    <property>
        <!-- base directory for files Hadoop generates at runtime -->
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/tmpdata</value>
    </property>
</configuration>

:wq

##########################################################################################################

vim /app/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <!-- number of replicas per block; the default is 3 -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

:wq

##########################################################################################################

cp  /app/hadoop/etc/hadoop/mapred-site.xml.template /app/hadoop/etc/hadoop/mapred-site.xml

vim /app/hadoop/etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <!-- run MapReduce on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

:wq

##########################################################################################################

vim /app/hadoop/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <!-- hostname of the YARN ResourceManager -->
        <name>yarn.resourcemanager.hostname</name>
        <value>node0</value>
    </property>
    <property>
        <!-- auxiliary service through which reducers fetch map output -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

:wq

##########################################################################################################

vim /etc/profile

export JAVA_HOME=/app/jdk

export HADOOP_HOME=/app/hadoop

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile
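A quick check that the new PATH entries took effect:

hadoop version            # should report Hadoop 2.4.1
which hdfs                # should resolve to /app/hadoop/bin/hdfs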

4. Format the namenode

-------------------------------------------------------------------------------------------------------------

hdfs namenode -format

17/08/13 05:52:00 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = node0/192.168.10.3

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 2.4.1

STARTUP_MSG:   classpath = /app/hadoop-2.4.1/etc/hadoop:/app/hadoop-2.4.1/share/hadoop/common/lib/... (full classpath listing omitted)

STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common -r 1604318; compiled by 'jenkins' on 2014-06-21T05:43Z

STARTUP_MSG:   java = 1.8.0_131

************************************************************/

17/08/13 05:52:00 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]

17/08/13 05:52:00 INFO namenode.NameNode: createNameNode [-format]

Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /app/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.

It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

17/08/13 05:52:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Formatting using clusterid: CID-0f84f197-b0d5-4cd1-a4e4-14a5acfa009e

17/08/13 05:52:01 INFO namenode.FSNamesystem: fsLock is fair:true

17/08/13 05:52:01 INFO namenode.HostFileManager: read includes:

HostSet(

)

17/08/13 05:52:01 INFO namenode.HostFileManager: read excludes:

HostSet(

)

17/08/13 05:52:01 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000

17/08/13 05:52:01 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true

17/08/13 05:52:01 INFO util.GSet: Computing capacity for map BlocksMap

17/08/13 05:52:01 INFO util.GSet: VM type       = 64-bit

17/08/13 05:52:01 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB

17/08/13 05:52:01 INFO util.GSet: capacity      = 2^21 = 2097152 entries

17/08/13 05:52:01 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false

17/08/13 05:52:01 INFO blockmanagement.BlockManager: defaultReplication         = 1

17/08/13 05:52:01 INFO blockmanagement.BlockManager: maxReplication             = 512

17/08/13 05:52:01 INFO blockmanagement.BlockManager: minReplication             = 1

17/08/13 05:52:01 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2

17/08/13 05:52:01 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false

17/08/13 05:52:01 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000

17/08/13 05:52:01 INFO blockmanagement.BlockManager: encryptDataTransfer        = false

17/08/13 05:52:01 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000

17/08/13 05:52:01 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)

17/08/13 05:52:01 INFO namenode.FSNamesystem: supergroup          = supergroup

17/08/13 05:52:01 INFO namenode.FSNamesystem: isPermissionEnabled = true

17/08/13 05:52:01 INFO namenode.FSNamesystem: HA Enabled: false

17/08/13 05:52:01 INFO namenode.FSNamesystem: Append Enabled: true

17/08/13 05:52:02 INFO util.GSet: Computing capacity for map INodeMap

17/08/13 05:52:02 INFO util.GSet: VM type       = 64-bit

17/08/13 05:52:02 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB

17/08/13 05:52:02 INFO util.GSet: capacity      = 2^20 = 1048576 entries

17/08/13 05:52:02 INFO namenode.NameNode: Caching file names occuring more than 10 times

17/08/13 05:52:02 INFO util.GSet: Computing capacity for map cachedBlocks

17/08/13 05:52:02 INFO util.GSet: VM type       = 64-bit

17/08/13 05:52:02 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB

17/08/13 05:52:02 INFO util.GSet: capacity      = 2^18 = 262144 entries

17/08/13 05:52:02 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033

17/08/13 05:52:02 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0

17/08/13 05:52:02 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000

17/08/13 05:52:02 INFO namenode.FSNamesystem: Retry cache on namenode is enabled

17/08/13 05:52:02 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis

17/08/13 05:52:02 INFO util.GSet: Computing capacity for map NameNodeRetryCache

17/08/13 05:52:02 INFO util.GSet: VM type       = 64-bit

17/08/13 05:52:02 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB

17/08/13 05:52:02 INFO util.GSet: capacity      = 2^15 = 32768 entries

17/08/13 05:52:02 INFO namenode.AclConfigFlag: ACLs enabled? false

17/08/13 05:52:02 INFO namenode.FSImage: Allocated new BlockPoolId: BP-833512525-192.168.10.3-1502574722280

17/08/13 05:52:02 INFO common.Storage: Storage directory /hadoop/tmpdata/dfs/name has been successfully formatted.   # this line confirms the format succeeded

17/08/13 05:52:02 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

17/08/13 05:52:02 INFO util.ExitUtil: Exiting with status 0

17/08/13 05:52:02 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at node0/192.168.10.3

************************************************************/
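Optionally, inspect the freshly created metadata directory (the file names in the comment are what a successful 2.4.1 format typically produces):

ls /hadoop/tmpdata/dfs/name/current/
# fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION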

5. Start Hadoop

start-dfs.sh                      # start HDFS (dfs and yarn may be started in either order)

Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /app/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.

It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

17/08/13 06:12:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [node0]

root@node0's password:            # enter the password

node0: starting namenode, logging to /app/hadoop-2.4.1/logs/hadoop-root-namenode-node0.out

root@localhost's password:        # enter the password

localhost: starting datanode, logging to /app/hadoop-2.4.1/logs/hadoop-root-datanode-node0.out

Starting secondary namenodes [0.0.0.0]

root@0.0.0.0's password:          # enter the password

0.0.0.0: starting secondarynamenode, logging to /app/hadoop-2.4.1/logs/hadoop-root-secondarynamenode-node0.out

Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /app/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.

It‘s highly recommended that you fix the library with ‘execstack -c <libfile>‘, or link it with ‘-z noexecstack‘.

17/08/13 06:13:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

start-yarn.sh                     # start YARN

starting yarn daemons

resourcemanager running as process 31652. Stop it first.

root@localhost's password:        # enter the password

localhost: nodemanager running as process 31937. Stop it first.

jps                               # verify the running daemons with jps

32864 SecondaryNameNode

31937 NodeManager

32707 DataNode

31652 ResourceManager

32584 NameNode

33064 Jps

http://192.168.10.3:50070 (HDFS web UI)

http://192.168.10.3:8088  (YARN/MapReduce web UI)
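A quick health check against the new cluster:

hdfs dfsadmin -report     # should report one live datanode
hdfs dfs -df -h /         # shows the capacity of the new file system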

*******************

(5) Passwordless SSH login:

*******************

Generate a key pair on the client:

ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

03:c8:7a:54:c7:a4:fc:74:cc:15:23:5b:ba:51:7b:b2 root@<client-hostname>

The key's randomart image is:

+--[ RSA 2048]----+

|      .oo . *.   |

|   . + o.o B o   |

|    + + . B o .  |

|   o   + . o +   |

|  . .   S . E    |

|   .     .       |

|                 |

|                 |

|                 |

+-----------------+

cd .ssh/

ll

total 12

-rw------- 1 root root 1675 Aug 13 07:11 id_rsa

-rw-r--r-- 1 root root  392 Aug 13 07:11 id_rsa.pub

-rw-r--r-- 1 root root 1180 Aug 13 06:11 known_hosts

Copy the client's public key file to the server:

ssh-copy-id 192.168.10.3

The authenticity of host '192.168.10.3 (192.168.10.3)' can't be established.

RSA key fingerprint is b9:21:f9:a4:33:de:3e:79:6e:69:45:01:e6:5d:47:54.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added '192.168.10.3' (RSA) to the list of known hosts.

root@192.168.10.3's password:

Now try logging into the machine, with "ssh '192.168.10.3'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

Test from the client:

ssh root@192.168.10.3

Last login: Sun Aug 13 04:44:30 2017 from 192.168.10.2

In the same way, configure passwordless login from the server to itself:

ssh-keygen -t rsa     # use the RSA algorithm

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Created directory '/root/.ssh'.

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

44:55:bb:1d:c4:b9:d8:e0:e5:6b:c2:58:19:f5:c0:57 root@node0

The key's randomart image is:

+--[ RSA 2048]----+

|        .....++.E|

|       .    oo=o.|

|        .  ..O.o.|

|       .    =o+. |

|        S  +. .. |

|          . o o  |

|             o   |

|                 |

|                 |

+-----------------+

[root@node0 ~]# ssh-copy-id 192.168.10.3

The authenticity of host '192.168.10.3 (192.168.10.3)' can't be established.

RSA key fingerprint is b9:21:f9:a4:33:de:3e:79:6e:69:45:01:e6:5d:47:54.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added '192.168.10.3' (RSA) to the list of known hosts.

root@192.168.10.3's password:

Now try logging into the machine, with "ssh '192.168.10.3'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

Skipping the interactive 'yes' confirmation when adding hosts to known_hosts:

SSH can now authenticate without a password, but when many servers authenticate to one another, the first connection to each host still requires typing 'yes' to add that host's key to the known_hosts file. To skip the prompt:

vim .ssh/config

StrictHostKeyChecking no

:wq

To avoid having to update known_hosts after a server's IP changes, or to avoid conflicts caused by a stale known_hosts file:

vim .ssh/config

UserKnownHostsFile /dev/null

:wq
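Combined, a minimal ~/.ssh/config for the cluster might look like the sketch below. The Host pattern is an assumption; since disabling host key checking trades security for convenience, it is scoped to the cluster subnet rather than all hosts.

Host 192.168.10.*
    StrictHostKeyChecking no          # skip the yes/no prompt on first connect
    UserKnownHostsFile /dev/null      # never record or check host keys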

/etc/init.d/sshd restart          # restart the sshd service

Stopping sshd:                                             [  OK  ]

Starting sshd:                                             [  OK  ]
