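Today I have been reading up on HBase and Hadoop. As I understand it, HBase 1.1.x is the stable release line, Hadoop 2.5.x is the newest Hadoop line that supports it, and it needs JDK 7. (This paragraph is based on the official Apache documentation.)
1. This table maps JDK versions to HBase versions:
2. The following table maps HBase versions to Hadoop versions.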
Hadoop version support matrix
- "S" = supported
- "X" = not supported
- "NT" = Not tested
The official documentation strongly recommends installing Hadoop 2.x:
Hadoop 2.x is recommended. Hadoop 2.x is faster and includes features, such as short-circuit reads, which will help improve your HBase random read profile. Hadoop 2.x also includes important bug fixes that will improve your overall HBase experience. HBase 0.98 drops support for Hadoop 1.0, deprecates use of Hadoop 1.1+, and HBase 1.0 will not support Hadoop 1.x.
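I wanted to build this environment myself but could not find any machines, so I looked up an article dedicated to installing and configuring it. I am pasting it below until I get a chance to set up a cluster of my own.
The detailed installation and configuration guide below is reposted from: http://blog.csdn.net/yuansen1999/article/details/50542018
=================================== Full text follows:
Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
[Notes]
The release of HBase 1.0 marked the point at which HBase was ready for enterprise production use. Several 1.x releases followed, and the 1.1.2 release used here is one of the stable ones.
Because HBase depends on the Hadoop libraries, and HBase 1.1.2 expects the Hadoop 2.5.1 libraries, Hadoop 2.5.1 is used as the base environment. If you use a different Hadoop version, you also need to replace the Hadoop jar files under HBase's lib directory with the ones from your Hadoop version; otherwise you will get errors saying the native libraries cannot be found. The actual installation steps follow.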
1. Software versions
Component | Version | Notes
Operating system | CentOS release 6.4 (Final) | 64-bit
JDK | jdk-7u80-linux-x64.gz |
Hadoop | hadoop-2.5.1.tar.gz |
ZooKeeper | zookeeper-3.4.6.tar.gz |
HBase | hbase-1.1.2.tar.gz |
2. Host planning
IP | Host | Deployed daemons
192.168.8.127 | master | QuorumPeerMain DataNode ResourceManager HRegionServer NodeManager SecondaryNameNode NameNode HMaster
192.168.8.128 | slave01 | DataNode QuorumPeerMain HRegionServer NodeManager
192.168.8.129 | slave02 | QuorumPeerMain HRegionServer NodeManager DataNode
3. Directory planning
IP | Directories
192.168.8.127 | Three mount points: root directory: /dev/sda1 /; swap directory: tmpfs /dev/shm; hadoop directory: /dev/sda3 /hadoop
192.168.8.128 | Three mount points: root directory: /dev/sda1 /; swap directory: tmpfs /dev/shm; hadoop directory: /dev/sda3 /hadoop
192.168.8.129 | Three mount points: root directory: /dev/sda1 /; swap directory: tmpfs /dev/shm; hadoop directory: /dev/sda3 /hadoop
[root@master ~]# df -h
4. Create the user hadoop, in the group hadoop, on every host
4.1 Create the group hadoop:
[root@master ~]# groupadd hadoop
4.2 Create the user hadoop and add it to the group hadoop:
[root@master ~]# useradd hadoop -g hadoop
4.3 Set the password of the user hadoop to hadoop:
[root@master ~]# passwd hadoop
5. Edit and configure the hostnames
[root@master ~]# vi /etc/hosts
127.0.0.1 localhost
192.168.8.127 master
192.168.8.128 slave01
192.168.8.129 slave02
[root@master ~]# vi /etc/sysconfig/network
Reboot:
[root@master ~]# reboot
Check the hostname:
Change the owner of the hadoop directory:
[root@master ~]# chown hadoop:hadoop -R /hadoop
[root@master ~]# ls -l /
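Every /etc/hosts line needs whitespace between the IP address and the hostname, and every node has to resolve all three names. As a rough sketch (the function name check_hosts_file is my own, not part of any tool), a check along these lines could be run on each host before going further:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every given hostname appears as its
# own whitespace-separated field (after the IP) on a non-comment line.
check_hosts_file() {
  local file=$1; shift
  local name missing=0
  for name in "$@"; do
    if ! awk -v n="$name" '
          /^[[:space:]]*#/ { next }
          { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
          END { exit found ? 0 : 1 }' "$file"; then
      echo "missing or malformed entry: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on a cluster node (not executed here):
#   check_hosts_file /etc/hosts master slave01 slave02
```

An entry such as "192.168.8.129slave02" is reported as missing, because the hostname never shows up as a separate field.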
6. Upload the installation packages to the hadoop user's home directory
7. Install the JDK
7.1 Unpack the JDK
[root@master ~]# cd /usr/local/
[root@master local]# tar -zxvf jdk-7u80-linux-x64.gz
7.2 Configure the JDK environment variables
export JAVA_HOME=/usr/local/jdk1.7.0_80
export JRE_HOME=/usr/local/jdk1.7.0_80/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
7.3 Apply the environment variables
[hadoop@master ~]$ source .bashrc
7.4 Check that the JDK was installed successfully:
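The JDK check can be as simple as running `java -version`. The sketch below wraps that in a small function (check_java_home is a made-up name) that also confirms JAVA_HOME points at a directory with an executable bin/java, which is what Hadoop and HBase rely on later:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a JDK directory (default: $JAVA_HOME)
# contains an executable bin/java, then print its version banner.
check_java_home() {
  local home=${1:-${JAVA_HOME:-}}
  if [ -z "$home" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$home/bin/java" ]; then
    echo "no executable java under $home/bin" >&2
    return 1
  fi
  # java -version writes its banner to stderr.
  "$home/bin/java" -version
}

# Usage after sourcing .bashrc (not executed here):
#   check_java_home
```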
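8. Configure passwordless SSH between the nodes:
8.1 Create the directory:
[hadoop@master ~]$ mkdir .ssh
8.2 Change into the .ssh directory:
[hadoop@master ~]$ cd .ssh/
8.3 Generate the key pair:
[hadoop@master .ssh]$ ssh-keygen -t rsa
Note: just press Enter at every prompt.
8.4 Append the generated public key to the authorization file:
[hadoop@master .ssh]$ cat id_rsa.pub >> authorized_keys
8.5 Set the permissions of the .ssh directory to 700:
[hadoop@master .ssh]$ chmod 700 ~/.ssh
This is required on some machines and optional on others.
8.6 Set the permissions of the authorization file to 600:
[hadoop@master .ssh]$ chmod 600 authorized_keys
It must be 600, otherwise the login will not succeed.
8.7 Test passwordless SSH login:
[hadoop@master hadoop]$ ssh master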
Last login: Tue Jan 19 13:58:27 2016 from 192.168.8.1
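8.8 Set up passwordless SSH login on the other nodes in the same way
8.9 Append the master node's public key to the other nodes (shown here for master to slave01)
8.9.1 Copy master's public key id_rsa.pub to the .ssh directory on slave01, renaming it to master.pub:
[hadoop@master .ssh]$ scp id_rsa.pub slave01:/home/hadoop/.ssh/master.pub
In this step, take care not to overwrite slave01's own id_rsa.pub.
8.9.2 Switch to slave01 and append master.pub to the authorization file authorized_keys:
[hadoop@slave01 .ssh]$ cat master.pub >> authorized_keys
8.9.3 Repeat the same steps for slave02
Note: a password has to be entered on the first login.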
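Re-running the append step adds the same key to authorized_keys again, and forgetting the chmod calls is an easy way to end up with a login that still asks for a password. A minimal sketch of an idempotent version follows; install_pubkey is a hypothetical helper name, and the default directory is assumed to be the current user's ~/.ssh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to authorized_keys only if that
# exact line is not there yet, then apply the 700/600 permissions.
install_pubkey() {
  local pubkey_file=$1
  local ssh_dir=${2:-$HOME/.ssh}
  local auth="$ssh_dir/authorized_keys"
  local key
  key=$(cat "$pubkey_file")
  mkdir -p "$ssh_dir"
  touch "$auth"
  # -F: fixed string, -x: whole-line match, -q: quiet
  if ! grep -qxF -- "$key" "$auth"; then
    printf '%s\n' "$key" >> "$auth"
  fi
  chmod 700 "$ssh_dir"
  chmod 600 "$auth"
}

# Usage on slave01 after copying master.pub over (not executed here):
#   install_pubkey ~/.ssh/master.pub
```

Where the OpenSSH `ssh-copy-id` tool is installed, it does a similar copy-and-append in one step.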
9. Install Hadoop:
9.1 Unpack the archive:
[hadoop@master ~]$ cd /hadoop
[hadoop@master hadoop]$ tar -zxvf hadoop-2.5.1.tar.gz
9.2 Configure the Hadoop environment variables:
[hadoop@master ~]$ vi .bashrc
export HADOOP_HOME=/hadoop/hadoop-2.5.1
export HADOOP_CONF_DIR=/hadoop/hadoop-2.5.1/etc/hadoop
export PATH=.:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
9.3 Apply the environment variables:
[hadoop@master ~]$ source .bashrc
9.4 Change into the Hadoop configuration directory and configure the files as listed below:
Note: first copy the attached fairscheduler.xml into /hadoop/hadoop-2.5.1/etc/hadoop
[hadoop@master hadoop]$ pwd
/hadoop/hadoop-2.5.1/etc/hadoop
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.groups</name>
    <value>*</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop/dfs/data</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <description>fair-scheduler conf location</description>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
  </property>
  <property>
    <description>List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/hadoop/local</value>
  </property>
  <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
  <property>
    <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>30720</value>
  </property>
  <property>
    <description>Number of CPU cores that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
slaves:
master
slave01
slave02
hadoop-env.sh:
export JAVA_HOME=/hadoop/jdk1.7.0_80
Note: add this as the last line of the file.
9.5 Configure the environment on each worker node:
9.5.1 On master, copy hadoop-2.5.1, jdk1.7.0_80 and the environment file .bashrc to the other nodes
9.5.2 On slave01 and slave02, run the command that applies the environment variables:
[hadoop@slave01 ~]$ source .bashrc
9.6 Format the NameNode:
[hadoop@master hadoop]$ hadoop namenode -format
9.7 Start Hadoop:
[hadoop@master hadoop]$ start-all.sh
[hadoop@master hadoop]$ mr-jobhistory-daemon.sh start historyserver
9.8 Check the running processes:
9.8.1 Processes on the master node:
[hadoop@master hadoop]$ jps
3456 Jps
2305 NameNode
3418 JobHistoryServer
2592 SecondaryNameNode
2844 NodeManager
2408 DataNode
2739 ResourceManager
9.8.2 Processes on the slave01 and slave02 nodes:
[hadoop@slave01 ~]$ jps
2567 Jps
2249 DataNode
2317 NodeManager
[hadoop@slave02 ~]$ jps
2298 NodeManager
2560 Jps
2229 DataNode
9.9 Disable the firewall on every node:
[root@master ~]# iptables -F
[root@master ~]# service iptables save
[root@master ~]# service iptables stop
[root@master ~]# chkconfig iptables off
Do the same for ip6tables where it is present:
[root@master ~]# ip6tables -F
[root@master ~]# service ip6tables save
[root@master ~]# service ip6tables stop
[root@master ~]# chkconfig ip6tables off
9.10 Open the web UI:
http://master:8088/cluster/cluster
10. Install ZooKeeper:
10.1 Install on master:
10.1.1 Unpack the archive:
[hadoop@master ~]$ cd /hadoop
[hadoop@master hadoop]$ tar -zxvf zookeeper-3.4.6.tar.gz
10.1.2 Configure the environment variables:
[hadoop@master ~]$ vi .bashrc
export ZOOKEEPER_HOME=/hadoop/zookeeper-3.4.6
export PATH=.:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf:$PATH
10.1.3 Apply the environment variables:
[hadoop@master ~]$ source .bashrc
10.1.4 Change into the ZooKeeper configuration directory:
[hadoop@master ~]$ cd /hadoop/zookeeper-3.4.6/conf/
10.1.5 Create the ZooKeeper configuration file:
[hadoop@master conf]$ cp zoo_sample.cfg zoo.cfg
10.1.6 Edit zoo.cfg:
Setting | Notes
dataDir=/hadoop/zookeeperdata | 1. Modified entry; 2. hadoop is the user name
clientPort=2181 | Modified entry
server.1=master:2888:3888 | Added entry
server.2=slave01:2888:3888 | Added entry
server.3=slave02:2888:3888 | Added entry
10.1.7 In the /hadoop directory, do the following:
[hadoop@master ~]$ cd /hadoop
[hadoop@master hadoop]$ mkdir zookeeperdata
[hadoop@master hadoop]$ echo "1" > /hadoop/zookeeperdata/myid
10.2 Install on slave01:
10.2.1 Copy zookeeper-3.4.6 from /hadoop on master to /hadoop on slave01:
[hadoop@master hadoop]$ scp -r zookeeper-3.4.6 slave01:/hadoop
10.2.2 Copy the .bashrc file from master to slave01:
[hadoop@master ~]$ cd
[hadoop@master ~]$ scp .bashrc slave01:/home/hadoop
10.2.3 Apply the environment variables on slave01:
[hadoop@slave01 ~]$ source .bashrc
10.2.4 On slave01, do the following:
[hadoop@slave01 ~]$ cd /hadoop
[hadoop@slave01 hadoop]$ mkdir zookeeperdata
[hadoop@slave01 hadoop]$ echo "2" > /hadoop/zookeeperdata/myid
10.3 Install on slave02 (copy steps omitted):
[hadoop@slave02 ~]$ source .bashrc
[hadoop@slave02 ~]$ cd /hadoop
[hadoop@slave02 hadoop]$ mkdir zookeeperdata
[hadoop@slave02 hadoop]$ echo "3" > /hadoop/zookeeperdata/myid
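The number written to each host's myid file has to match the N of that host's server.N line in zoo.cfg, and the file has to live in the directory named by dataDir. Deriving both from one ordered host list keeps them from drifting apart. This is only a sketch: the three function names are invented for illustration, and ports 2888/3888 are simply the ones used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: derive zoo.cfg server lines and myid values from
# one ordered host list so the two always agree.

# Print one "server.N=host:2888:3888" line per host, numbered from 1.
zk_server_lines() {
  local id=1 host
  for host in "$@"; do
    printf 'server.%d=%s:2888:3888\n' "$id" "$host"
    id=$((id + 1))
  done
}

# Print the 1-based position of a host in the list; fail if it is absent.
zk_id_for_host() {
  local target=$1; shift
  local id=1 host
  for host in "$@"; do
    if [ "$host" = "$target" ]; then
      printf '%d\n' "$id"
      return 0
    fi
    id=$((id + 1))
  done
  return 1
}

# Create the data directory if needed and write the id into its myid file.
zk_write_myid() {
  local data_dir=$1 id=$2
  mkdir -p "$data_dir"
  printf '%s\n' "$id" > "$data_dir/myid"
}

# Usage on slave01 (not executed here):
#   zk_write_myid /hadoop/zookeeperdata "$(zk_id_for_host slave01 master slave01 slave02)"
```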
10.4 Start the ZooKeeper service on all nodes:
[hadoop@master hadoop]$ zkServer.sh start
JMX enabled by default
Using config: /hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@slave01 ~]$ zkServer.sh start
JMX enabled by default
Using config: /hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@slave02 hadoop]$ zkServer.sh start
JMX enabled by default
Using config: /hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
10.5 Check the processes with jps:
[hadoop@master hadoop]$ jps
2305 NameNode
3608 Jps
3418 JobHistoryServer
2592 SecondaryNameNode
2844 NodeManager
2408 DataNode
2739 ResourceManager
3577 QuorumPeerMain
Here "QuorumPeerMain" is our ZooKeeper process.
[hadoop@slave01 ~]$ jps
2249 DataNode
2662 Jps
2317 NodeManager
2616 QuorumPeerMain
[hadoop@slave02 hadoop]$ jps
2599 QuorumPeerMain
2298 NodeManager
2652 Jps
2229 DataNode
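Comparing jps listings by eye against the host plan gets tedious with three nodes. A small filter can report which expected daemons are absent. This is a sketch only: missing_daemons is a hypothetical name, and it assumes the plain "pid name" format that jps prints above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a jps listing on stdin and print each expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
  local listing name
  listing=$(cat)
  for name in "$@"; do
    # jps prints "<pid> <main class>"; compare the second field exactly.
    if ! printf '%s\n' "$listing" |
         awk -v n="$name" '$2 == n { ok = 1 } END { exit ok ? 0 : 1 }'; then
      printf '%s\n' "$name"
    fi
  done
}

# Usage on slave01 at this stage (not executed here):
#   jps | missing_daemons DataNode NodeManager QuorumPeerMain
```

No output means every expected daemon is running.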
11. Install HBase:
11.1 Configure the NTP time synchronization service:
11.1.1 Server (master) configuration:
[hadoop@master ~]$ su - root
Password:
[root@master ~]# vi /etc/ntp.conf
Change the following settings:
#restrict default kod nomodify notrap nopeer noquery
restrict default kod nomodify
restrict -6 default kod nomodify notrap nopeer noquery
After making the changes, start ntpd:
[root@master ~]# service ntpd start
[root@master ~]# chkconfig ntpd on
11.1.2 Client configuration:
[hadoop@slave01 ~]$ su - root
Password:
[root@slave01 ~]# crontab -e
Enter the following line:
0-59/10 * * * * /usr/sbin/ntpdate 192.168.8.127 && /sbin/hwclock -w
This syncs the clock with the master every 10 minutes.
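Editing the crontab by hand on every client is easy to get slightly wrong, for example by losing the space before the command. As a sketch, the same entry can be merged in non-interactively. The function name ensure_cron_line is hypothetical, and `*/10` in the minute field is used as the shorter equivalent of `0-59/10`:

```shell
#!/usr/bin/env bash
# The entry from the step above, with */10 standing in for 0-59/10.
NTP_CRON_LINE='*/10 * * * * /usr/sbin/ntpdate 192.168.8.127 && /sbin/hwclock -w'

# Hypothetical helper: read an existing crontab on stdin and print it back
# with the given line appended, unless that exact line is already present.
ensure_cron_line() {
  local line=$1 current
  current=$(cat)
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Usage as root on each client (not executed here):
#   crontab -l 2>/dev/null | ensure_cron_line "$NTP_CRON_LINE" | crontab -
```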
11.2 Install HBase
11.2.1 Unpack the HBase archive:
[hadoop@master ~]$ cd /hadoop
[hadoop@master hadoop]$ tar -zxvf hbase-1.1.2-bin.tar.gz
11.2.2 Configure the environment variables:
[hadoop@master hadoop]$ vi ~/.bashrc
Add the HBase directories:
export HBASE_HOME=/hadoop/hbase-1.1.2
export PATH=.:$HBASE_HOME/bin:$HBASE_HOME/conf:$PATH
11.2.3 Apply the environment variables:
[hadoop@master hadoop]$ source ~/.bashrc
11.2.4 Change into the HBase configuration directory:
[hadoop@master hadoop]$ cd /hadoop/hbase-1.1.2/conf
11.2.5 Edit hbase-env.sh:
[hadoop@master conf]$ vi hbase-env.sh
Setting | Notes
export HBASE_MANAGES_ZK=false | Modified entry
11.2.6 Edit hbase-site.xml:
[hadoop@master conf]$ vi hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave01,slave02</value>
  </property>
</configuration>
11.2.7 Edit the regionservers file:
[hadoop@master conf]$ vi regionservers
master
slave01
slave02
11.2.8 Configure slave01 and slave02:
Same as the master configuration.
11.2.9 Start HBase (make sure Hadoop and ZooKeeper are already running):
[hadoop@master conf]$ start-hbase.sh
11.2.10 Check the processes with jps:
[hadoop@master hadoop]$ jps
2305 NameNode
3418 JobHistoryServer
2592 SecondaryNameNode
2844 NodeManager
2408 DataNode
2739 ResourceManager
3577 QuorumPeerMain
3840 HMaster
4201 Jps
3976 HRegionServer
11.2.11 Enter the HBase shell and run some queries:
[hadoop@master hadoop]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26 20:11:27 PDT 2015
hbase(main):005:0> list
TABLE
0 row(s) in 0.0270 seconds
=> []
Let's create a table and see whether it works:
hbase(main):006:0> create 'test','info'
0 row(s) in 2.3150 seconds
=> Hbase::Table - test
hbase(main):007:0>
That seems to have worked. Now add a row and see whether it is stored.
hbase(main):008:0> put 'test','u00001','info:username','yuansen'
0 row(s) in 0.1400 seconds
hbase(main):009:0> scan 'test'
ROW COLUMN+CELL
u00001 column=info:username, timestamp=1453186521452, value=yuansen
1 row(s) in 0.0550 seconds
It did indeed work.
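The same smoke test can be kept in a file and replayed after every reinstall, since `hbase shell` accepts the path of a command file as its argument. The sketch below only writes that file. The function name and the /tmp path in the usage comment are my own choices, and it assumes the table test does not exist yet:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the create/put/scan smoke test from the
# session above into a command file for the HBase shell.
write_hbase_smoke_script() {
  local out=$1
  # The final "exit" makes the shell quit instead of dropping to a prompt.
  cat > "$out" <<'EOF'
create 'test', 'info'
put 'test', 'u00001', 'info:username', 'yuansen'
scan 'test'
exit
EOF
}

# Usage on a running cluster (not executed here):
#   write_hbase_smoke_script /tmp/hbase_smoke.txt
#   hbase shell /tmp/hbase_smoke.txt
```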