Hadoop + HBase Distributed Cluster Architecture: The Complete Guide

1. Getting to know Hadoop and HBase

1.1 A brief introduction to Hadoop

  Hadoop is an open-source Apache framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. Applications built on Hadoop run in an environment that provides distributed storage and computation across the cluster. Hadoop is designed to scale from a single server to thousands of machines, each offering local computation and storage.

1.2 Hadoop架构

The Hadoop framework consists of the following four modules:

  • Hadoop Common: the Java libraries and utilities required by the other Hadoop modules. They provide filesystem- and OS-level abstractions and contain the Java files and scripts needed to start Hadoop.
  • Hadoop YARN: the framework for job scheduling and cluster resource management.
  • Hadoop Distributed File System (HDFS): a distributed file system providing high-throughput access to application data.
  • Hadoop MapReduce: a YARN-based system for parallel processing of large datasets.

Together, these four components form the core of the Hadoop framework.

  Since 2012, the term "Hadoop" has often referred not just to the base modules above but also to the wider ecosystem of software that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, and Apache Spark.

1.3 How does Hadoop work?

(1) Stage 1

  A user or application submits a job to Hadoop (via the Hadoop job client) by specifying the following:

  • The locations of the input and output files in the distributed file system.
  • The Java classes, packaged as a jar file, that implement the map and reduce functions.
  • The job configuration, set through various job-specific parameters.

(2) Stage 2

  The Hadoop job client then submits the job (jar, executable, etc.) and its configuration to the JobTracker, which distributes the software and configuration to the slaves, schedules the tasks, monitors them, and reports status and diagnostics back to the job client.

(3) Stage 3

  The TaskTrackers on the various nodes execute the tasks according to the MapReduce implementation and write the output of the reduce function to output files on the file system.
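To make the three stages concrete, here is a minimal sketch that submits the WordCount example bundled with Hadoop 3.2.0; the /input and /output HDFS paths are illustrative assumptions, not paths used elsewhere in this guide:

# put some input into HDFS, submit the bundled WordCount job, then read the result
hdfs dfs -mkdir -p /input
hdfs dfs -put /etc/hosts /input/
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000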

1.4 Advantages of Hadoop

  • The Hadoop framework lets users write and test distributed systems quickly. It is efficient: it distributes the data and the work across machines automatically, exploiting the underlying parallelism of the CPU cores.
  • Hadoop does not rely on hardware for fault tolerance and high availability (FTHA); instead, the Hadoop libraries themselves are designed to detect and handle failures at the application layer.
  • Servers can be added to or removed from the cluster dynamically, and Hadoop keeps running without interruption.
  • Another big advantage of Hadoop is that, besides being open source, it is compatible with all platforms, since it is Java-based.

1.5 Introduction to HBase

  HBase stands for Hadoop Database: it is Hadoop's database, a distributed storage system. HBase uses Hadoop's HDFS as its file storage layer, uses Hadoop's MapReduce to process the massive amounts of data it stores, and uses ZooKeeper as its coordination service.

1.6 HBase architecture

Client

  • Contains the interfaces for accessing HBase and maintains a cache to speed up access.

ZooKeeper

  • Guarantees that there is only one active Master in the cluster at any time.
  • Stores the addressing entry point for all regions.
  • Monitors RegionServers coming online and going offline in real time, and notifies the Master.
  • Stores HBase schema and table metadata.

Master

  • Assigns regions to RegionServers.
  • Balances the load across RegionServers.
  • Detects failed RegionServers and reassigns their regions.
  • Handles users' create, delete, and alter operations on tables.

RegionServer

  • A RegionServer maintains regions and serves I/O requests against those regions.
  • A RegionServer splits regions that grow too large during operation.

HLog (WAL log)

  • An HLog file is an ordinary Hadoop SequenceFile. The SequenceFile key is an HLogKey object, which records the provenance of the written data: besides the table and region names, it carries a sequence number and a timestamp. The timestamp is the write time; the sequence number starts at 0, or at the last sequence number persisted to the file system.
  • The SequenceFile value of the HLog is an HBase KeyValue object, i.e. the same KeyValue that is stored in HFiles.

Region

  • HBase automatically partitions a table horizontally into regions; each region holds a contiguous range of the table's rows. Each table starts with a single region; as data is inserted the region grows, and when it reaches a threshold it is split in half into two new regions.
  • As the rows in a table keep growing, more and more regions appear, so a complete table ends up stored across multiple RegionServers.

MemStore and StoreFile

  • A region is made up of multiple stores; each store corresponds to one column family (CF).
  • A store consists of an in-memory MemStore and on-disk StoreFiles. Writes go to the MemStore first; when the data in the MemStore reaches a threshold, the HRegionServer flushes it to disk, each flush producing a separate StoreFile.
  • When the number of StoreFiles grows past a threshold, the system compacts them (minor and major compactions); during a major compaction, versions are merged and deletes are applied, producing larger StoreFiles.
  • When the total size of all StoreFiles in a region exceeds a threshold, the region is split in two, and the HMaster assigns the halves to the appropriate RegionServers for load balancing.
  • When a client reads data, it checks the MemStore first, and only then the StoreFiles.
  • An HRegion is the smallest unit of distributed storage and load balancing in HBase; that is, different HRegions can live on different RegionServers.
  • An HRegion consists of one or more stores, each holding one column family.
  • Each store in turn consists of one MemStore and zero or more StoreFiles. (A shell sketch for triggering flushes and compactions by hand follows this list.)
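Flushes and compactions normally happen automatically, but they can also be triggered by hand from the hbase shell, which makes the MemStore-to-StoreFile path easy to observe. A minimal sketch, assuming a table named 'demo' like the one created later in this guide:

hbase(main):001:0> flush 'demo'            # force each region's MemStore out to a new StoreFile
hbase(main):002:0> major_compact 'demo'    # merge StoreFiles, applying version merges and deletes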

2. Installing Hadoop

2.1 Cluster layout

This cluster consists of three machines:

Hostname    IP                Roles
hadoop01    192.168.10.101    NameNode, ResourceManager, DataNode, NodeManager
hadoop02    192.168.10.102    SecondaryNameNode, DataNode, NodeManager
hadoop03    192.168.10.103    DataNode, NodeManager

2.2 Pre-installation preparation

2.2.1 Machine configuration

$ cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
$ uname -r
3.10.0-514.el7.x86_64

Note: every process in this cluster is started by the along user, and these steps must be performed on every server in the cluster.

2.2.2 Disable SELinux and the firewall

[root@hadoop01 ~]$ sestatus
SELinux status:                 disabled
[root@hadoop01 ~]$ iptables -F
[root@hadoop01 ~]$ systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

  
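If SELinux or firewalld is still enabled on a node, the following sketch turns both off permanently (the sed edit takes effect on the next reboot); run it as root on every server:

systemctl disable --now firewalld                               # stop firewalld and keep it off after reboots
setenforce 0                                                    # switch SELinux to permissive right away
sed -ri 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # disable SELinux permanently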

2.2.3 Create the user

$ id along
uid=1000(along) gid=1000(along) groups=1000(along)

  
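If the along user does not exist yet, a minimal sketch for creating it on each node; the password shown is a placeholder of my own, and the sudoers entry matches the passwordless sudo the later rsync scripts assume:

useradd along                                                  # create the user with a home directory
echo 'CHANGE_ME' | passwd --stdin along                        # set a password (--stdin works on CentOS)
echo 'along ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/along    # allow passwordless sudo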

2.2.4 Edit the hosts file for name resolution

$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.10.101 hadoop01
192.168.10.102 hadoop02
192.168.10.103 hadoop03

2.2.5 Synchronize time

$ yum -y install ntpdate
$ sudo ntpdate cn.pool.ntp.org

  
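A one-off ntpdate run will drift again over time; a hedged sketch that keeps the clocks in sync with a cron.d entry (any reachable NTP server can replace cn.pool.ntp.org):

# resync the clock every 30 minutes
echo '*/30 * * * * root /usr/sbin/ntpdate cn.pool.ntp.org >/dev/null 2>&1' | sudo tee /etc/cron.d/ntpdate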

2.2.6 Set up passwordless SSH between nodes

(1) Generate a key pair; just press Enter at every prompt.

[along@hadoop01 ~]$ ssh-keygen

(2) Make sure every server has every other server's public key.

--- as the along user
[along@hadoop01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 127.0.0.1
[along@hadoop01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop01
[along@hadoop01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop02
[along@hadoop01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop03
--- as the root user
[root@hadoop01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 127.0.0.1
[root@hadoop01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop01
[root@hadoop01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop02
[root@hadoop01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop03

Note: perform this on every server in the cluster.

(3) Verify passwordless login.

[along@hadoop01 ~]$ ssh along@hadoop01
[along@hadoop01 ~]$ ssh along@hadoop02
[along@hadoop01 ~]$ ssh along@hadoop03

  

2.3 Install the JDK

Perform these steps on all three machines.

[root@hadoop01 ~]# tar -xvf jdk-8u201-linux-x64.tar.gz -C /usr/local
[root@hadoop01 ~]# chown along.along -R /usr/local/jdk1.8.0_201/
[root@hadoop01 ~]# ln -s /usr/local/jdk1.8.0_201/ /usr/local/jdk
[root@hadoop01 ~]# cat /etc/profile.d/jdk.sh
export JAVA_HOME=/usr/local/jdk
PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
[root@hadoop01 ~]# source /etc/profile.d/jdk.sh
[along@hadoop01 ~]$ java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

2.4 Install Hadoop

[root@hadoop01 ~]# wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
[root@hadoop01 ~]# tar -xvf hadoop-3.2.0.tar.gz -C /usr/local/
[root@hadoop01 ~]# chown along.along -R /usr/local/hadoop-3.2.0/
[root@hadoop01 ~]# ln -s /usr/local/hadoop-3.2.0/  /usr/local/hadoop

  

3. Configuring and starting Hadoop

3.1 hadoop-env.sh: Hadoop environment variables

[along@hadoop01 ~]$ cd /usr/local/hadoop/etc/hadoop/
[along@hadoop01 hadoop]$ vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

  

3.2 core-site.xml: HDFS defaults

[along@hadoop01 hadoop]$ vim core-site.xml
<configuration>
    <!-- the default (NameNode) HDFS endpoint -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <!-- where Hadoop stores the files it generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/tmp</value>
    </property>
</configuration>
[root@hadoop01 ~]# mkdir /data/hadoop

3.3 hdfs-site.xml: NameNode configuration

[along@hadoop01 hadoop]$ vim hdfs-site.xml
<configuration>
    <!-- the NameNode http address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop01:50070</value>
    </property>
    <!-- the SecondaryNameNode http address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop02:50090</value>
    </property>
    <!-- where the NameNode stores its data -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/hadoop/name</value>
    </property>
    <!-- the HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- where DataNodes store their data -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/hadoop/datanode</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
[root@hadoop01 ~]# mkdir /data/hadoop/name -p
[root@hadoop01 ~]# mkdir /data/hadoop/datanode -p

3.4 mapred-site.xml: MapReduce framework

[along@hadoop01 hadoop]$ vim mapred-site.xml
<configuration>
    <!-- tell the framework to use YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
        /usr/local/hadoop/etc/hadoop,
        /usr/local/hadoop/share/hadoop/common/*,
        /usr/local/hadoop/share/hadoop/common/lib/*,
        /usr/local/hadoop/share/hadoop/hdfs/*,
        /usr/local/hadoop/share/hadoop/hdfs/lib/*,
        /usr/local/hadoop/share/hadoop/mapreduce/*,
        /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
        /usr/local/hadoop/share/hadoop/yarn/*,
        /usr/local/hadoop/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>

3.5 yarn-site.xml: ResourceManager configuration

[along@hadoop01 hadoop]$ vim yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop01</value>
    </property>
    <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>
    <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
    </property>
    <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>
</configuration>

3.6 Configure the masters and workers files

[along@hadoop01 hadoop]$ echo 'hadoop02' >> /usr/local/hadoop/etc/hadoop/masters
[along@hadoop01 hadoop]$ echo -e 'hadoop01\nhadoop02\nhadoop03' >> /usr/local/hadoop/etc/hadoop/workers

Note: Hadoop 3.x reads etc/hadoop/workers (the 2.x slaves file is no longer used), one hostname per line; all three nodes run a DataNode here. The masters file is legacy; the SecondaryNameNode placement actually comes from dfs.namenode.secondary.http-address in hdfs-site.xml.

3.7 Pre-start preparation

3.7.1 Prepare the start scripts

The start scripts all live under /usr/local/hadoop/sbin:

(1) Add the following to both start-dfs.sh and stop-dfs.sh:

[along@hadoop01 ~]$ vim /usr/local/hadoop/sbin/start-dfs.sh
[along@hadoop01 ~]$ vim /usr/local/hadoop/sbin/stop-dfs.sh
HDFS_DATANODE_USER=along
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=along
HDFS_SECONDARYNAMENODE_USER=along

(2) Add the following to both start-yarn.sh and stop-yarn.sh:

[along@hadoop01 ~]$ vim /usr/local/hadoop/sbin/start-yarn.sh
[along@hadoop01 ~]$ vim /usr/local/hadoop/sbin/stop-yarn.sh
YARN_RESOURCEMANAGER_USER=along
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=along

3.7.2 Fix ownership

[root@hadoop01 ~]# chown -R along.along /usr/local/hadoop-3.2.0/
[root@hadoop01 ~]# chown -R along.along /data/hadoop/

  

3.7.3 Add the hadoop commands to the PATH

[root@hadoop01 ~]# vim /etc/profile.d/hadoop.sh
[root@hadoop01 ~]# cat /etc/profile.d/hadoop.sh
export HADOOP_HOME=/usr/local/hadoop
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

  

3.7.4 Distribute files across the cluster

[root@hadoop01 ~]# vim /data/hadoop/rsync.sh
# create the required directories on every machine in the cluster
for i in hadoop02 hadoop03
    do
         sudo rsync -a /data/hadoop $i:/data/
done
# copy the hadoop configuration to the other machines
for i in hadoop02 hadoop03
    do
         sudo rsync -a  /usr/local/hadoop-3.2.0/etc/hadoop $i:/usr/local/hadoop-3.2.0/etc/
done
[root@hadoop01 ~]# /data/hadoop/rsync.sh

3.8 Start the Hadoop cluster

3.8.1 Format HDFS before the first start. The author runs this on every server (strictly, only the NameNode host needs hdfs namenode -format):

[along@hadoop01 ~]$ hdfs namenode -format
... ...
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop01/192.168.10.101
************************************************************/
[along@hadoop02 ~]$ hdfs namenode -format
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop02/192.168.10.102
************************************************************/
[along@hadoop03 ~]$ hdfs namenode -format
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop03/192.168.10.103
************************************************************/

  

3.8.2 Start and verify the cluster

(1) Start the NameNode and DataNodes.

[along@hadoop01 ~]$ start-dfs.sh
[along@hadoop02 ~]$ start-dfs.sh
[along@hadoop03 ~]$ start-dfs.sh
[along@hadoop01 ~]$ jps
4480 DataNode
4727 Jps
4367 NameNode
[along@hadoop02 ~]$ jps
4082 Jps
3958 SecondaryNameNode
3789 DataNode
[along@hadoop03 ~]$ jps
2689 Jps
2475 DataNode

(2) Start YARN.

[along@hadoop01 ~]$ start-yarn.sh
[along@hadoop02 ~]$ start-yarn.sh
[along@hadoop03 ~]$ start-yarn.sh
[along@hadoop01 ~]$ jps
4480 DataNode
4950 NodeManager
5447 NameNode
5561 Jps
4842 ResourceManager
[along@hadoop02 ~]$ jps
3958 SecondaryNameNode
4503 Jps
3789 DataNode
4367 NodeManager
[along@hadoop03 ~]$ jps
12353 Jps
12226 NodeManager
2475 DataNode

  

3.9 The cluster is up

(1) Open http://hadoop01:8088 in a browser.

This is the ResourceManager web UI; you should see the cluster's three Active Nodes there.

(2) Open http://hadoop01:50070/dfshealth.html#tab-datanode.

This is the NameNode web UI.
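The same checks can be done from the command line with standard HDFS and YARN tools; a quick sketch:

hdfs dfsadmin -report | head -20   # should report 3 live datanodes
yarn node -list                    # should list 3 running NodeManagers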

That completes the Hadoop cluster setup!

4. Installing and configuring HBase

4.1 Install HBase

[root@hadoop01 ~]# wget https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/1.4.9/hbase-1.4.9-bin.tar.gz
[root@hadoop01 ~]# tar -xvf hbase-1.4.9-bin.tar.gz -C /usr/local/
[root@hadoop01 ~]# chown -R along.along /usr/local/hbase-1.4.9/
[root@hadoop01 ~]# ln -s /usr/local/hbase-1.4.9/ /usr/local/hbase

Note: as of this writing (2019-03-08), hbase-2.1 gave me trouble; it may also have been my configuration, but HBase kept failing to start, so I downgraded to hbase-1.4.9.

4.2 Configure HBase

4.2.1 hbase-env.sh: HBase environment variables

[root@hadoop01 ~]# cd /usr/local/hbase/conf/
[root@hadoop01 conf]# vim hbase-env.sh
export JAVA_HOME=/usr/local/jdk
export HBASE_CLASSPATH=/usr/local/hbase/conf

  

4.2.2 hbase-site.xml: HBase configuration

[root@hadoop01 conf]# vim hbase-site.xml
<configuration>
<property>
    <name>hbase.rootdir</name>
    <!-- where hbase stores its data -->
    <value>hdfs://hadoop01:9000/hbase/hbase_db</value>
    <!-- the port must match Hadoop's fs.defaultFS -->
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <!-- distributed deployment -->
    <value>true</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <!-- the nodes that run zookeeper; use an odd number of them -->
    <value>hadoop01,hadoop02,hadoop03</value>
</property>
<property>
    <!-- where zookeeper stores its configuration, logs, etc.; must already exist -->
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/data/hbase/zookeeper</value>
</property>
<property>
    <!-- hbase master -->
    <name>hbase.master</name>
    <value>hadoop01</value>
</property>
<property>
    <!-- hbase web UI port -->
    <name>hbase.master.info.port</name>
    <value>16666</value>
</property>
</configuration>

Note: ZooKeeper has the following property:

  • As long as more than half of the machines in the ensemble are healthy, the ensemble as a whole is available.
  • So with 2 zookeepers, if 1 dies, the remaining 1 is not more than half, and the ensemble is down: 2 zookeepers tolerate 0 failures.
  • Likewise, with 3 zookeepers, losing 1 leaves 2 healthy, which is more than half, so 3 zookeepers tolerate 1 failure.
  • A few more: 2->0, 3->1, 4->1, 5->2, 6->2. The pattern: 2n and 2n-1 nodes tolerate the same number of failures, n-1, so the extra even node buys nothing; use an odd ensemble size. (The pattern is computed in the sketch below.)
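A tiny shell sketch that computes this tolerance, floor((n-1)/2), for a range of ensemble sizes:

# zookeeper ensemble size -> failures tolerated
for n in 2 3 4 5 6 7; do
    echo "$n zookeepers tolerate $(( (n - 1) / 2 )) failure(s)"
done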

4.2.3 List the cluster nodes

[root@hadoop01 conf]# vim regionservers
hadoop01
hadoop02
hadoop03

5. Starting the HBase cluster

5.1 Add the hbase commands to the PATH

[root@hadoop01 ~]# vim /etc/profile.d/hbase.sh
export HBASE_HOME=/usr/local/hbase
PATH=$HBASE_HOME/bin:$PATH

  

5.2 Pre-start preparation

[root@hadoop01 ~]# mkdir -p /data/hbase/zookeeper
[root@hadoop01 ~]# vim /data/hbase/rsync.sh
# create the required directories on every machine in the cluster
for i in hadoop02 hadoop03
    do
         sudo rsync -a /data/hbase $i:/data/
         sudo scp -p /etc/profile.d/hbase.sh $i:/etc/profile.d/
done
# copy the hbase installation and configuration to the other machines
for i in hadoop02 hadoop03
    do
         sudo rsync -a  /usr/local/hbase-1.4.9 $i:/usr/local/
done
[root@hadoop01 conf]# chown -R along.along /data/hbase
[root@hadoop01 ~]# /data/hbase/rsync.sh
hbase.sh                                                        100%   62     0.1KB/s   00:00
hbase.sh                                                        100%   62     0.1KB/s   00:00

  

5.3 Start HBase

Note: this only needs to be done on hadoop01.

(1) Start it.

[along@hadoop01 ~]$ start-hbase.sh
hadoop03: running zookeeper, logging to /usr/local/hbase/logs/hbase-along-zookeeper-hadoop03.out
hadoop01: running zookeeper, logging to /usr/local/hbase/logs/hbase-along-zookeeper-hadoop01.out
hadoop02: running zookeeper, logging to /usr/local/hbase/logs/hbase-along-zookeeper-hadoop02.out
... ...

(2) Verify.

--- the master
[along@hadoop01 ~]$ jps
4480 DataNode
23411 HQuorumPeer       # zookeeper process
4950 NodeManager
24102 Jps
5447 NameNode
23544 HMaster           # hbase master process
4842 ResourceManager
23711 HRegionServer
--- the two slaves
[along@hadoop02 ~]$ jps
12948 HRegionServer     # hbase slave process
3958 SecondaryNameNode
13209 Jps
12794 HQuorumPeer       # zookeeper process
3789 DataNode
4367 NodeManager
[along@hadoop03 ~]$ jps
12226 NodeManager
19559 Jps
19336 HRegionServer     # hbase slave process
19178 HQuorumPeer       # zookeeper process
2475 DataNode

  

5.4 Check HBase status in the browser

Open http://hadoop01:16666 (the hbase.master.info.port set above).
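The HBase-managed ZooKeeper ensemble can also be checked from the shell; a sketch assuming the default ZooKeeper client port 2181 and that nc (netcat) is installed:

# each node should answer 'imok'
for h in hadoop01 hadoop02 hadoop03; do
    echo "$h: $(echo ruok | nc $h 2181)"
done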

6. Basic HBase operations

6.1 Basic hbase shell commands


Operation                              Command
Create a table                         create 'table','cf1','cf2',...
Add a record                           put 'table','row','cf:column','value'
Read a record                          get 'table','row'
Count records in a table               count 'table'
Delete a record                        delete 'table','row','cf:column'
Drop a table                           first disable 'table', then drop 'table'
Scan all records                       scan 'table'
Scan all data in one column family     scan 'table',['cf:']
Update a record                        put the new value again; it overwrites the old one

6.2 General operations

(1) Start the hbase shell client.

[along@hadoop01 ~]$ hbase shell    # takes a little while to start
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.4.9/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec  5 11:54:10 PST 2018
hbase(main):001:0>

  

(2) Query the cluster status.

hbase(main):001:0> status
1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667 average load

  

(3) Query the HBase version.

hbase(main):002:0> version
1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec  5 11:54:10 PST 2018

  

6.3 DDL operations

(1) Create a demo table with two column families, id and info.

hbase(main):001:0> create 'demo','id','info'
0 row(s) in 23.2010 seconds
=> Hbase::Table - demo

  

(2) Get the table description.

hbase(main):002:0> list
TABLE
demo
1 row(s) in 0.6380 seconds
=> ["demo"]
--- get the full description
hbase(main):003:0> describe 'demo'
Table demo is ENABLED
demo
COLUMN FAMILIES DESCRIPTION
{NAME => 'id', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.3500 seconds

  

(3) Delete a column family.

Note: any delete operation requires disabling the table first.

hbase(main):004:0> disable 'demo'
0 row(s) in 2.5930 seconds
hbase(main):006:0> alter 'demo',{NAME=>'info',METHOD=>'delete'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 4.3410 seconds
hbase(main):007:0> describe 'demo'
Table demo is DISABLED
demo
COLUMN FAMILIES DESCRIPTION
{NAME => 'id', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1510 seconds

  

(4) Drop a table.

Disable the table first, then drop it.

hbase(main):008:0> list
TABLE
demo
1 row(s) in 0.1010 seconds
=> ["demo"]
hbase(main):009:0> disable 'demo'
0 row(s) in 0.0480 seconds
hbase(main):010:0> is_disabled 'demo'   # check whether the table is disabled
true
0 row(s) in 0.0210 seconds
hbase(main):013:0> drop 'demo'
0 row(s) in 2.3270 seconds
hbase(main):014:0> list   # the table is gone
TABLE
0 row(s) in 0.0250 seconds
=> []
hbase(main):015:0> is_enabled 'demo'   # check whether the demo table still exists
ERROR: Unknown table demo!

  

6.4 DML operations

(1) Insert data.

hbase(main):024:0> create 'demo','id','info'
0 row(s) in 10.0720 seconds
=> Hbase::Table - demo
hbase(main):025:0> is_enabled 'demo'
true
0 row(s) in 0.1930 seconds
hbase(main):030:0> put 'demo','example','id:name','along'
0 row(s) in 0.0180 seconds
hbase(main):039:0> put 'demo','example','id:sex','male'
0 row(s) in 0.0860 seconds
hbase(main):040:0> put 'demo','example','id:age','24'
0 row(s) in 0.0120 seconds
hbase(main):041:0> put 'demo','example','id:company','taobao'
0 row(s) in 0.3840 seconds
hbase(main):042:0> put 'demo','taobao','info:addres','china'
0 row(s) in 0.1910 seconds
hbase(main):043:0> put 'demo','taobao','info:company','alibaba'
0 row(s) in 0.0300 seconds
hbase(main):044:0> put 'demo','taobao','info:boss','mayun'
0 row(s) in 0.1260 seconds

  

(2) Fetch data from the demo table.

hbase(main):045:0> get 'demo','example'
COLUMN                     CELL
 id:age                    timestamp=1552030411620, value=24
 id:company                timestamp=1552030467196, value=taobao
 id:name                   timestamp=1552030380723, value=along
 id:sex                    timestamp=1552030392249, value=male
1 row(s) in 0.8850 seconds
hbase(main):046:0> get 'demo','taobao'
COLUMN                     CELL
 info:addres               timestamp=1552030496973, value=china
 info:boss                 timestamp=1552030532254, value=mayun
 info:company              timestamp=1552030520028, value=alibaba
1 row(s) in 0.2500 seconds
hbase(main):047:0> get 'demo','example','id'
COLUMN                     CELL
 id:age                    timestamp=1552030411620, value=24
 id:company                timestamp=1552030467196, value=taobao
 id:name                   timestamp=1552030380723, value=along
 id:sex                    timestamp=1552030392249, value=male
1 row(s) in 0.3150 seconds
hbase(main):048:0> get 'demo','example','info'
COLUMN                     CELL
0 row(s) in 0.0200 seconds
hbase(main):049:0> get 'demo','taobao','id'
COLUMN                     CELL
0 row(s) in 0.0410 seconds
hbase(main):053:0> get 'demo','taobao','info'
COLUMN                     CELL
 info:addres               timestamp=1552030496973, value=china
 info:boss                 timestamp=1552030532254, value=mayun
 info:company              timestamp=1552030520028, value=alibaba
1 row(s) in 0.0240 seconds
hbase(main):055:0> get 'demo','taobao','info:boss'
COLUMN                     CELL
 info:boss                 timestamp=1552030532254, value=mayun
1 row(s) in 0.1810 seconds

  

(3) Update a record.

hbase(main):056:0> put 'demo','example','id:age','88'
0 row(s) in 0.1730 seconds
hbase(main):057:0> get 'demo','example','id:age'
COLUMN                     CELL
 id:age                    timestamp=1552030841823, value=88
1 row(s) in 0.1430 seconds

  

(4) Fetch data by timestamp.

Note the timestamp field shown in the output above.

hbase(main):059:0> get 'demo','example',{COLUMN=>'id:age',TIMESTAMP=>1552030841823}
COLUMN                     CELL
 id:age                    timestamp=1552030841823, value=88
1 row(s) in 0.0200 seconds
hbase(main):060:0> get 'demo','example',{COLUMN=>'id:age',TIMESTAMP=>1552030411620}
COLUMN                     CELL
 id:age                    timestamp=1552030411620, value=24
1 row(s) in 0.0930 seconds

  

(5) Scan the whole table.

hbase(main):061:0> scan 'demo'
ROW                        COLUMN+CELL
 example                   column=id:age, timestamp=1552030841823, value=88
 example                   column=id:company, timestamp=1552030467196, value=taobao
 example                   column=id:name, timestamp=1552030380723, value=along
 example                   column=id:sex, timestamp=1552030392249, value=male
 taobao                    column=info:addres, timestamp=1552030496973, value=china
 taobao                    column=info:boss, timestamp=1552030532254, value=mayun
 taobao                    column=info:company, timestamp=1552030520028, value=alibaba
2 row(s) in 0.3880 seconds

  

(6) Delete the 'id:age' cell of row example.

hbase(main):062:0> delete 'demo','example','id:age'
0 row(s) in 1.1360 seconds
hbase(main):063:0> get 'demo','example'
COLUMN                     CELL
 id:company                timestamp=1552030467196, value=taobao
 id:name                   timestamp=1552030380723, value=along
 id:sex                    timestamp=1552030392249, value=male

  

(7) Delete an entire row.

hbase(main):070:0> deleteall 'demo','taobao'
0 row(s) in 1.8140 seconds
hbase(main):071:0> get 'demo','taobao'
COLUMN                     CELL
0 row(s) in 0.2200 seconds

  

(8) Re-add the 'id:age' column for row example and use a counter to increment it.

hbase(main):072:0> incr 'demo','example','id:age'
COUNTER VALUE = 1
0 row(s) in 3.2200 seconds
hbase(main):073:0> get 'demo','example','id:age'
COLUMN                     CELL
 id:age                    timestamp=1552031388997, value=\x00\x00\x00\x00\x00\x00\x00\x01
1 row(s) in 0.0280 seconds
hbase(main):074:0> incr 'demo','example','id:age'
COUNTER VALUE = 2
0 row(s) in 0.0340 seconds
hbase(main):075:0> incr 'demo','example','id:age'
COUNTER VALUE = 3
0 row(s) in 0.0420 seconds
hbase(main):076:0> get 'demo','example','id:age'
COLUMN                     CELL
 id:age                    timestamp=1552031429912, value=\x00\x00\x00\x00\x00\x00\x00\x03
1 row(s) in 0.0690 seconds
hbase(main):077:0> get_counter 'demo','example','id:age'   # read the current counter value
COUNTER VALUE = 3

  

(9) Truncate the table.

hbase(main):078:0> truncate 'demo'
Truncating 'demo' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 33.0820 seconds

As you can see, HBase truncates a table by disabling it, dropping it, and then recreating it.

Original post: https://www.cnblogs.com/dengbingbing/p/12200281.html
