Cloudera CDH 5 Cluster Setup (yum-based installation)

1      Cluster Environment

Master nodes

master001 ~~ master006

Slave nodes

slave001 ~~ slave064

2      Install the CDH 5 YUM Repository

rpm -Uvh http://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm

wget http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo

mv cloudera-cdh5.repo /etc/yum.repos.d/
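
Optionally verify that yum can now see the Cloudera repository before installing anything (the grep filter is just a convenience):

yum clean all

yum repolist | grep cloudera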

3      ZooKeeper

3.1    Node Assignment

ZooKeeper Server:

master002, master003, master004, master005, master006

ZooKeeper Client:

master001, master002, master003, master004, master005, master006

3.2    Installation

ZooKeeper Client nodes:

yum install -y zookeeper

ZooKeeper Server nodes:

yum install -y zookeeper-server

3.3    Configuration

1. Edit the ZooKeeper configuration file on every ZooKeeper node

/etc/zookeeper/conf/zoo.cfg

maxClientCnxns=50

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

dataDir=/data/disk01/zookeeper/zk_data

dataLogDir=/data/disk01/zookeeper/zk_log

# the port at which the clients will connect

clientPort=2181

server.2=master002:2888:3888

server.3=master003:2888:3888

server.4=master004:2888:3888

server.5=master005:2888:3888

server.6=master006:2888:3888

2. Initialize the nodes

master002:

service zookeeper-server init --myid=2

master003:

service zookeeper-server init --myid=3

master004:

service zookeeper-server init --myid=4

master005:

service zookeeper-server init --myid=5

master006:

service zookeeper-server init --myid=6

3. Start ZooKeeper

service zookeeper-server start

3.4    Installation Paths

Program path

/usr/lib/zookeeper/

Configuration file path

/etc/zookeeper/conf

Log path

/var/log/zookeeper

3.5    Start | Stop | Status

ZooKeeper

service zookeeper-server start|stop|status

3.6    Common Commands

Check the status of a ZooKeeper node

zookeeper-server status
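
To check the whole ensemble from any client node, ZooKeeper's standard four-letter commands can be sent to the client port; the hostnames below follow this cluster's naming, and nc (netcat) is assumed to be installed:

echo stat | nc master002 2181

echo ruok | nc master003 2181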

Manually purge old snapshots and transaction logs

/usr/lib/zookeeper/bin/zkCleanup.sh dataLogDir [snapDir] -n count

Automatically purge old snapshots and transaction logs

autopurge.purgeInterval sets how often the purge task runs, in hours. It must be an integer of 1 or greater; the default is 0, which disables automatic purging.

autopurge.snapRetainCount is used together with the previous parameter and sets how many recent snapshots (and their corresponding transaction logs) are retained. The default is 3.
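
For example, to purge once a day while keeping the three most recent snapshots, the following lines can be added to /etc/zookeeper/conf/zoo.cfg on every server (the values here are only illustrative):

autopurge.purgeInterval=24

autopurge.snapRetainCount=3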

3.7    Testing

https://github.com/phunt/zk-smoketest

3.8    References

ZooKeeper parameter configuration

http://my.oschina.net/u/128568/blog/194820

Common ZooKeeper administration and operations

http://nileader.blog.51cto.com/1381108/1032157

4      HDFS

4.1    Node Assignment (NameNode HA)

namenode, zkfc:

master002, master003

datanode:

slave001-slave064

journalnode:

master002, master003, master004

4.2    Installation

namenode:

yum install hadoop-hdfs-namenode

yum install hadoop-hdfs-zkfc

(yum install -y hadoop-hdfs-namenode hadoop-hdfs-zkfc hadoop-client)

datanode:

yum install hadoop-hdfs-datanode

(yum install -y hadoop-hdfs-datanode hadoop-client)

journalnode:

yum install hadoop-hdfs-journalnode

(yum install -y hadoop-hdfs-journalnode)

All nodes:

yum install hadoop-client

4.3    Configuration

1. Configuration files

/etc/hadoop/conf/core-site.xml

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://bdcluster</value>

</property>

<property>

<name>fs.trash.interval</name>

<value>1440</value>

</property>

<property>

<name>hadoop.proxyuser.httpfs.hosts</name>

<value>*</value>

</property>

<property>

<name>hadoop.proxyuser.httpfs.groups</name>

<value>*</value>

</property>

</configuration>

/etc/hadoop/conf/hdfs-site.xml

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>dfs.nameservices</name>

<value>bdcluster</value>

</property>

<property>

<name>dfs.ha.namenodes.bdcluster</name>

<value>nn002,nn003</value>

</property>

<property>

<name>dfs.namenode.rpc-address.bdcluster.nn002</name>

<value>master002:8020</value>

</property>

<property>

<name>dfs.namenode.rpc-address.bdcluster.nn003</name>

<value>master003:8020</value>

</property>

<property>

<name>dfs.namenode.http-address.bdcluster.nn002</name>

<value>master002:50070</value>

</property>

<property>

<name>dfs.namenode.http-address.bdcluster.nn003</name>

<value>master003:50070</value>

</property>

<property>

<name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://master002:8485;master003:8485;master004:8485/bdcluster</value>

</property>

<property>

<name>dfs.journalnode.edits.dir</name>

<value>/data/disk01/hadoop/hdfs/journalnode</value>

</property>

<property>

<name>dfs.client.failover.proxy.provider.bdcluster</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/var/lib/hadoop-hdfs/.ssh/id_dsa</value>

</property>

<property>

<name>dfs.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

<property>

<name>ha.zookeeper.quorum</name>

<value>master002:2181,master003:2181,master004:2181,master005:2181,master006:2181</value>

</property>

<property>

<name>dfs.permissions.superusergroup</name>

<value>hadoop</value>

</property>

<property>

<name>dfs.namenode.name.dir</name>

<value>/data/disk01/hadoop/hdfs/namenode</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>/data/disk01/hadoop/hdfs/datanode,/data/disk02/hadoop/hdfs/datanode,/data/disk03/hadoop/hdfs/datanode,/data/disk04/hadoop/hdfs/datanode,/data/disk05/hadoop/hdfs/datanode,/data/disk06/hadoop/hdfs/datanode,/data/disk07/hadoop/hdfs/datanode</value>

</property>

<property>

<name>dfs.datanode.failed.volumes.tolerated</name>

<value>3</value>

</property>

<property>

<name>dfs.datanode.max.xcievers</name>

<value>4096</value>

</property>

<property>

<name>dfs.webhdfs.enabled</name>

<value>true</value>

</property>

</configuration>

/etc/hadoop/conf/slaves

slave001

slave002

slave064

2. Configure passwordless SSH login for the hdfs user (required by the sshfence fencing method configured above)
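
Because dfs.ha.fencing.methods is set to sshfence with the key /var/lib/hadoop-hdfs/.ssh/id_dsa, the hdfs user on each NameNode must be able to SSH to the other NameNode without a password. A rough sketch of one way to do this on master002 (repeat in the opposite direction on master003; if the hdfs account has no password, append the public key to /var/lib/hadoop-hdfs/.ssh/authorized_keys on the peer by hand instead of using ssh-copy-id):

sudo -u hdfs mkdir -p /var/lib/hadoop-hdfs/.ssh

sudo -u hdfs ssh-keygen -t dsa -f /var/lib/hadoop-hdfs/.ssh/id_dsa -N ''

sudo -u hdfs ssh-copy-id -i /var/lib/hadoop-hdfs/.ssh/id_dsa.pub hdfs@master003

sudo -u hdfs ssh hdfs@master003 hostname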

3. Create the data directories

namenode

mkdir -p /data/disk01/hadoop/hdfs/namenode

chown -R hdfs:hdfs /data/disk01/hadoop/hdfs/

chown -R hdfs:hdfs /data/disk01/hadoop/hdfs/namenode

chmod 700 /data/disk01/hadoop/hdfs/namenode

datanode

mkdir -p /data/disk01/hadoop/hdfs/datanode

chmod 700 /data/disk01/hadoop/hdfs/datanode

chown -R hdfs:hdfs /data/disk01/hadoop/hdfs/

mkdir -p /data/disk02/hadoop/hdfs/datanode

chmod 700 /data/disk02/hadoop/hdfs/datanode

chown -R hdfs:hdfs /data/disk02/hadoop/hdfs/

mkdir -p /data/disk03/hadoop/hdfs/datanode

chmod 700 /data/disk03/hadoop/hdfs/datanode

chown -R hdfs:hdfs /data/disk03/hadoop/hdfs/

mkdir -p /data/disk04/hadoop/hdfs/datanode

chmod 700 /data/disk04/hadoop/hdfs/datanode

chown -R hdfs:hdfs /data/disk04/hadoop/hdfs/

mkdir -p /data/disk05/hadoop/hdfs/datanode

chmod 700 /data/disk05/hadoop/hdfs/datanode

chown -R hdfs:hdfs /data/disk05/hadoop/hdfs/

mkdir -p /data/disk06/hadoop/hdfs/datanode

chmod 700 /data/disk06/hadoop/hdfs/datanode

chown -R hdfs:hdfs /data/disk06/hadoop/hdfs/

mkdir -p /data/disk07/hadoop/hdfs/datanode

chmod 700 /data/disk07/hadoop/hdfs/datanode

chown -R hdfs:hdfs /data/disk07/hadoop/hdfs/

journalnode

mkdir -p /data/disk01/hadoop/hdfs/journalnode

chown -R hdfs:hdfs /data/disk01/hadoop/hdfs/journalnode

4. Start the JournalNodes

service hadoop-hdfs-journalnode start

5. Format the NameNode (on master002)

sudo -u hdfs hadoop namenode -format

6. Initialize the HA state in ZooKeeper (on NameNode master002)

hdfs zkfc -formatZK

7. Initialize the shared edits directory (on master002)

hdfs namenode -initializeSharedEdits

8. Start the NameNodes

Formatted NameNode (master002):

service hadoop-hdfs-namenode start

Standby NameNode (master003):

sudo -u hdfs hdfs namenode -bootstrapStandby

service hadoop-hdfs-namenode start

9. Start the DataNodes

service hadoop-hdfs-datanode start

10. Start the ZKFC daemons (on the NameNode hosts)

service hadoop-hdfs-zkfc start

11. Initialize the HDFS directory layout

/usr/lib/hadoop/libexec/init-hdfs.sh
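
Optionally verify the HA state and that HDFS is usable (nn002 and nn003 are the NameNode IDs defined in hdfs-site.xml above):

sudo -u hdfs hdfs haadmin -getServiceState nn002

sudo -u hdfs hdfs haadmin -getServiceState nn003

sudo -u hdfs hdfs dfs -ls /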

4.4    Installation Paths

Program path

/usr/lib/hadoop-hdfs

Configuration file path

/etc/hadoop/conf

Log path

/var/log/hadoop-hdfs

4.5    Start | Stop | Status

NameNode

service hadoop-hdfs-namenode start|stop|status

DataNode

service hadoop-hdfs-datanode start|stop|status

JournalNode

service hadoop-hdfs-journalnode start|stop|status

zkfc

service hadoop-hdfs-zkfc start|stop|status

4.6    Common Commands

Check cluster status

sudo -u hdfs hdfs dfsadmin -report

Check a file and its block replicas

sudo -u hdfs hdfs fsck [path] -files -blocks -locations -racks
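
For example, to audit everything under /user (an arbitrary example path) and confirm that the NameNode has left safe mode:

sudo -u hdfs hdfs fsck /user -files -blocks -locations -racks

sudo -u hdfs hdfs dfsadmin -safemode get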

5      YARN

5.1    Node Assignment

resourcemanager:

master004

nodemanager, mapreduce:

slave001-slave064

mapreduce-historyserver:

master006

5.2    Installation

resourcemanager:

yum -y install hadoop-yarn-resourcemanager

nodemanager:

yum -y install hadoop-yarn-nodemanager hadoop-mapreduce

mapreduce-historyserver:

yum -y install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver

All nodes:

yum -y install hadoop-client

5.3    Configuration

1. Configuration files

/etc/hadoop/conf/mapred-site.xml

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

<property>

<name>mapreduce.task.io.sort.mb</name>

<value>1024</value>

</property>

<property>

<name>mapred.child.java.opts</name>

<value>-XX:-UseGCOverheadLimit -Xms1024m -Xmx2048m</value>

</property>

<property>

<name>yarn.app.mapreduce.am.command-opts</name>

<value>-Xmx2048m</value>

</property>

<property>

<name>mapreduce.jobhistory.address</name>

<value>master006:10020</value>

<description>MapReduce JobHistory Server IPC host:port</description>

</property>

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>master006:19888</value>

<description>MapReduce JobHistory Server Web UI host:port</description>

</property>

<property>

<name>mapreduce.map.memory.mb</name>

<value>2048</value>

</property>

<property>

<name>mapreduce.reduce.memory.mb</name>

<value>4096</value>

</property>

<property>

<name>mapreduce.jobhistory.intermediate-done-dir</name>

<value>/user/history/done_intermediate</value>

</property>

<property>

<name>mapreduce.jobhistory.done-dir</name>

<value>/user/history/done</value>

</property>

</configuration>

/etc/hadoop/conf/yarn-site.xml

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>master004:8031</value>

</property>

<property>

<name>yarn.resourcemanager.address</name>

<value>master004:8032</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>master004:8030</value>

</property>

<property>

<name>yarn.resourcemanager.admin.address</name>

<value>master004:8033</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>master004:8088</value>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

<property>

<name>yarn.log-aggregation-enable</name>

<value>true</value>

</property>

<property>

<description>List of directories to store localized files in.</description>

<name>yarn.nodemanager.local-dirs</name>

<value>/data/disk01/hadoop/yarn/local,/data/disk02/hadoop/yarn/local,/data/disk03/hadoop/yarn/local,/data/disk04/hadoop/yarn/local,/data/disk05/hadoop/yarn/local</value>

</property>

<property>

<description>Where to store container logs.</description>

<name>yarn.nodemanager.log-dirs</name>

<value>/data/disk01/hadoop/yarn/logs,/data/disk02/hadoop/yarn/logs,/data/disk03/hadoop/yarn/logs,/data/disk04/hadoop/yarn/logs,/data/disk05/hadoop/yarn/logs</value>

</property>

<!--property>

<description>Where to aggregate logs to.</description>

<name>yarn.nodemanager.remote-app-log-dir</name>

<value>/var/log/hadoop-yarn/apps</value>

</property-->

<property>

<description>Classpath for typical applications.</description>

<name>yarn.application.classpath</name>

<value>

$HADOOP_CONF_DIR,

$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,

$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,

$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,

$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*

</value>

</property>

<property>

<name>yarn.app.mapreduce.am.staging-dir</name>

<value>/user</value>

</property>

<property>

<description>The minimum allocation for every container request at the RM,

in MBs. Memory requests lower than this won't take effect,

and the specified value will get allocated at minimum.</description>

<name>yarn.scheduler.minimum-allocation-mb</name>

<value>1024</value>

</property>

<property>

<description>The maximum allocation for every container request at the RM,

in MBs. Memory requests higher than this won't take effect,

and will get capped to this value.</description>

<name>yarn.scheduler.maximum-allocation-mb</name>

<value>16384</value>

</property>

<property>

<description>The minimum allocation for every container request at the RM,

in terms of virtual CPU cores. Requests lower than this won't take effect,

and the specified value will get allocated the minimum.</description>

<name>yarn.scheduler.minimum-allocation-vcores</name>

<value>1</value>

</property>

<property>

<description>The maximum allocation for every container request at the RM,

in terms of virtual CPU cores. Requests higher than this won't take effect,

and will get capped to this value.</description>

<name>yarn.scheduler.maximum-allocation-vcores</name>

<value>32</value>

</property>

<property>

<description>Number of CPU cores that can be allocated

for containers.</description>

<name>yarn.nodemanager.resource.cpu-vcores</name>

<value>48</value>

</property>

<property>

<description>Amount of physical memory, in MB, that can be allocated

for containers.</description>

<name>yarn.nodemanager.resource.memory-mb</name>

<value>120000</value>

</property>

<property>

<description>Ratio between virtual memory to physical memory when

setting memory limits for containers. Container allocations are

expressed in terms of physical memory, and virtual memory usage

is allowed to exceed this allocation by this ratio.

</description>

<name>yarn.nodemanager.vmem-pmem-ratio</name>

<value>6</value>

</property>

</configuration>

2. Create local directories on the NodeManager hosts

mkdir -p /data/disk01/hadoop/yarn/local /data/disk02/hadoop/yarn/local /data/disk03/hadoop/yarn/local /data/disk04/hadoop/yarn/local /data/disk05/hadoop/yarn/local

mkdir -p /data/disk01/hadoop/yarn/logs /data/disk02/hadoop/yarn/logs /data/disk03/hadoop/yarn/logs /data/disk04/hadoop/yarn/logs /data/disk05/hadoop/yarn/logs

chown -R yarn:yarn /data/disk01/hadoop/yarn /data/disk02/hadoop/yarn /data/disk03/hadoop/yarn /data/disk04/hadoop/yarn /data/disk05/hadoop/yarn

chown -R yarn:yarn /data/disk01/hadoop/yarn/local /data/disk02/hadoop/yarn/local /data/disk03/hadoop/yarn/local /data/disk04/hadoop/yarn/local /data/disk05/hadoop/yarn/local

chown -R yarn:yarn /data/disk01/hadoop/yarn/logs /data/disk02/hadoop/yarn/logs /data/disk03/hadoop/yarn/logs /data/disk04/hadoop/yarn/logs /data/disk05/hadoop/yarn/logs

3. Create the MapReduce history directories in HDFS

sudo -u hdfs hadoop fs -mkdir /user/history

sudo -u hdfs hadoop fs -chmod -R 1777 /user/history

sudo -u hdfs hadoop fs -chown yarn /user/history

4. Start the services

resourcemanager:

sudo service hadoop-yarn-resourcemanager start

nodemanager:

sudo service hadoop-yarn-nodemanager start

mapreduce-historyserver:

sudo service hadoop-mapreduce-historyserver start
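
Once the ResourceManager, the NodeManagers and the history server are up, a small test job can be submitted to exercise the whole pipeline; the jar path below is where the CDH 5 hadoop-mapreduce package normally places the examples jar, so adjust it if your layout differs:

sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100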

5.4    Installation Paths

Program path

/usr/lib/hadoop-yarn

Configuration file path

/etc/hadoop/conf

Log path

/var/log/hadoop-yarn

5.5    Start | Stop | Status

resourcemanager:

service hadoop-yarn-resourcemanager start|stop|status

nodemanager:

service hadoop-yarn-nodemanager start|stop|status

mapreduce-historyserver:

service hadoop-mapreduce-historyserver start|stop|status

5.6    Common Commands

List the status of the cluster nodes

yarn node -list -all

ResourceManager administration

yarn rmadmin ...
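
For example, the following standard rmadmin subcommands reload the scheduler queues and the include/exclude node lists:

yarn rmadmin -refreshQueues

yarn rmadmin -refreshNodes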

6      HBase

6.1    Node Assignment

hbase-master

master004, master005, master006

hbase-regionserver

slave001 ~~ slave064

hbase-thrift

master004, master005, master006

hbase-rest

master004, master005, master006

6.2    Installation

hbase-master

yum install -y hbase hbase-master

hbase-regionserver

yum install -y hbase hbase-regionserver

hbase-thrift

yum install -y hbase-thrift

hbase-rest

yum install -y hbase-rest

6.3    Configuration

1. Configuration files

/etc/security/limits.conf

hdfs - nofile 32768

hbase - nofile 32768

/etc/hbase/conf/hbase-site.xml

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>hbase.rest.port</name>

<value>60050</value>

</property>

<property>

<name>hbase.zookeeper.quorum</name>

<value>master002,master003,master004,master005,master006</value>

</property>

<property>

<name>hbase.cluster.distributed</name>

<value>true</value>

</property>

<property>

<name>hbase.tmp.dir</name>

<value>/tmp/hadoop/hbase</value>

</property>

<property>

<name>hbase.rootdir</name>

<value>hdfs://bdcluster/hbase/</value>

</property>

</configuration>

/etc/hbase/conf/hbase-env.sh

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,

# so try to keep things idempotent unless you want to take an even deeper look

# into the startup scripts (bin/hbase, etc.)

# The java implementation to use.  Java 1.6 required.

# export JAVA_HOME=/usr/java/default/

# Extra Java CLASSPATH elements.  Optional.

# export HBASE_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.

# export HBASE_HEAPSIZE=1000

# Extra Java runtime options.

# Below are what we set by default.  May only work with SUN JVM.

# For more on why as well as other possible settings,

# see http://wiki.apache.org/hadoop/PerformanceTuning

export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps $HBASE_GC_OPTS"

export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M $HBASE_GC_OPTS"

# This enables basic gc logging to its own file.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps $HBASE_GC_OPTS"

# This enables basic gc logging to its own file.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.

# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="

# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.

export HBASE_USE_GC_LOGFILE=true

# Uncomment and adjust to enable JMX exporting

# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.

# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html

#

# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"

# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"

# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"

# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"

# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"

# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.

# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident

#HBASE_REGIONSERVER_MLOCK=true

#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.

# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options.  Empty by default.

# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored.  $HBASE_HOME/logs by default.

# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers

# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"

# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"

# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"

# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.

# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.

# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.

# export HBASE_PID_DIR=/var/hadoop/pids

# Seconds to sleep between slave commands.  Unset by default.  This

# can be useful in large clusters, where, e.g., slave rsyncs can

# otherwise arrive faster than the master can service them.

# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of Zookeeper or not.

export HBASE_MANAGES_ZK=false

# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the

# RFA appender. Please refer to the log4j.properties file to see more details on this appender.

# In case one needs to do log rolling on a date change, one should set the environment property

# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".

# For example:

# HBASE_ROOT_LOGGER=INFO,DRFA

# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as

# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.

2. Start the services

hbase-master

service hbase-master start

hbase-regionserver

service hbase-regionserver start

hbase-thrift

service hbase-thrift start

hbase-rest

service hbase-rest start

6.4    Installation Paths

Installation path

/usr/lib/hbase

Configuration file path

/etc/hbase/conf

Log path

/var/log/hbase

6.5    Start | Stop | Status

hbase-master:

service hbase-master start|stop|status

hbase-regionserver:

service hbase-regionserver start|stop|status

hbase-thrift:

service hbase-thrift start|stop|status

hbase-rest:

service hbase-rest start|stop|status

6.6    Common Commands

hbase shell
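
A minimal smoke test from any node with the HBase client installed; the table and column family names below are arbitrary examples. Inside the shell, run:

status

create 'smoke_test', 'cf'

put 'smoke_test', 'row1', 'cf:greeting', 'hello'

scan 'smoke_test'

disable 'smoke_test'

drop 'smoke_test'

exit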

7      Spark

7.1    Node Assignment

master002 ~~ master006

7.2    Installation

yum install spark-core spark-master spark-worker spark-python

7.3    Configuration

1. /etc/spark/conf/spark-env.sh

export SPARK_HOME=/usr/lib/spark

2. Deploy the Spark assembly to HDFS

source /etc/spark/conf/spark-env.sh

sudo -u hdfs hdfs dfs -mkdir -p /user/spark/share/lib

sudo -u hdfs hdfs dfs -put /usr/lib/spark/assembly/lib/spark-assembly_2.10-0.9.0-cdh5.0.0-hadoop2.3.0-cdh5.0.0.jar /user/spark/share/lib/spark-assembly.jar

7.4    Installation Paths

Program path

/usr/lib/spark

Configuration file path

/etc/spark/conf

Log path

/var/log/spark

Spark assembly path in HDFS

/user/spark/share/lib/spark-assembly.jar

7.5    Example Program

source /etc/spark/conf/spark-env.sh

SPARK_JAR=hdfs://bdcluster/user/spark/share/lib/spark-assembly.jar APP_JAR=$SPARK_HOME/examples/lib/spark-examples_2.10-0.9.0-cdh5.0.0.jar $SPARK_HOME/bin/spark-class org.apache.spark.deploy.yarn.Client --jar $APP_JAR --class org.apache.spark.examples.SparkPi --args yarn-standalone --args 10
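
When the job is submitted through org.apache.spark.deploy.yarn.Client, the SparkPi result is written to the application logs rather than to the submitting console. Because log aggregation is enabled in yarn-site.xml above, the logs can be pulled back once the application finishes; the application ID placeholder below comes from the submission output or from yarn application -list:

yarn logs -applicationId <application_id>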
