Hadoop2.3、 Hbase0.98、 Hive0.13架构中Hive的安装部署配置以及数据测试

简介:

Hive是基于Hadoop的一个数据仓库工具,可以将结构化的数据文件映射为一张数据库表,并提供简单的sql查询功能,可以将sql语句转换为MapReduce任务进行运行。 其优点是学习成本低,可以通过类SQL语句快速实现简单的MapReduce统计,不必开发专门的MapReduce应用,十分适合数据仓库的统计分析。

1, 适用场景

Hive 构建在基于静态批处理的Hadoop 之上,Hadoop 通常都有较高的延迟并且在作业提交和调度的时候需要大量的开销。因此,Hive 并不能够在大规模数据集上实现低延迟快速的查询,例如,Hive 在几百MB 的数据集上执行查询一般有分钟级的时间延迟。因此,

Hive 并不适合那些需要低延迟的应用,例如,联机事务处理(OLTP)。Hive 查询操作过程严格遵守Hadoop MapReduce 的作业执行模型,Hive 将用户的HiveQL语句通过解释器转换为MapReduce 作业提交到Hadoop 集群上,Hadoop 监控作业执行过程,然后返回作业执行结果给用户。Hive 并非为联机事务处理而设计,Hive 并不提供实时的查询和基于行级的数据更新操作。Hive 的最佳使用场合是大数据集的批处理作业,例如,网络日志分析。

 

2,下载安装
前期hadoop安装准备,参考:http://blog.itpub.net/26230597/viewspace-1257609/

下载地址

wget http://mirror.bit.edu.cn/apache/hive/hive-0.13.1/apache-hive-0.13.1-bin.tar.gz

解压安装

tar zxvf apache-hive-0.13.1-bin.tar.gz  -C /home/hadoop/src/

PS:Hive只需要在一个节点上安装即可,本例安装在name节点上面的虚拟机上面,与hadoop的name节点复用一台虚拟机器。

3,配置hive环境变量

vim hive-env.sh

export HIVE_HOME=/home/hadoop/src/hive-0.13.1

export PATH=$PATH:$HIVE_HOME/bin

4,配置hadoop以及hbase参数

vim hive-env.sh

# Set HADOOP_HOME to point to a specific hadoop install directory

HADOOP_HOME=/home/hadoop/src/hadoop-2.3.0/

# Hive Configuration Directory can be controlled by:

export HIVE_CONF_DIR=/home/hadoop/src/hive-0.13.1/conf

# Folder containing extra ibraries required for hive compilation/execution can be controlled by:

export HIVE_AUX_JARS_PATH=/home/hadoop/src/hive-0.13.1/lib

5,验证安装:

启动hive命令行模式,出现hive,说明安装成功了

[[email protected] lib]$ hive --service cli

15/01/09 00:20:32 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead

Logging initialized using configuration in jar:file:/home/hadoop/src/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties

创建表,执行create命令,出现OK,说明命令执行成功,也说明hive安装成功。

hive> create table test(key string);

OK

Time taken: 8.749 seconds

hive>

6,验证可用性

启动hive

[[email protected] root]$hive --service metastore &

查看后台hive运行进程

[[email protected] root]$ ps -eaf|grep hive

hadoop    4025  2460  1 22:52 pts/0    00:00:19 /usr/lib/jvm/jdk1.7.0_60/bin/java -Xmx256m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/src/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/src/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/src/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/hadoop/src/hive-0.13.1/lib/hive-service-0.13.1.jar org.apache.hadoop.hive.metastore.HiveMetaStore

hadoop    4575  4547  0 23:14 pts/1    00:00:00 grep hive

[[email protected] root]$

6.1在hive下执行命令,创建2个字段的表,字段间隔用’,’隔开:

hive> create table test(key string);

OK

Time taken: 8.749 seconds

hive> create table tim_test(id int,name string) row format delimited fields terminated by ‘,‘;

OK

Time taken: 0.145 seconds

hive>

6.2准备导入到数据库的txt文件,并输入值:

[[email protected] hive-0.13.1]$ more tim_hive_test.txt

123,xinhua

456,dingxilu

789,fanyulu

903,fahuazhengroad

[[email protected] hive-0.13.1]$

6.4 再打开一个xshell端口,进入服务器端启动hive:

[[email protected] root]$ hive --service metastore

Starting Hive Metastore Server

6.5 再打开一个xshell端口,进入hive客户端录入数据:

[[email protected] hive-0.13.1]$ hive

Logging initialized using configuration in jar:file:/home/hadoop/src/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties

hive> load data local inpath  ‘/home/hadoop/src/hive-0.13.1/tim_hive_test.txt‘   into table tim_test;

Copying data from file:/home/hadoop/src/hive-0.13.1/tim_hive_test.txt

Copying file: file:/home/hadoop/src/hive-0.13.1/tim_hive_test.txt

Loading data to table default.tim_test

[Warning] could not update stats.

OK

Time taken: 7.208 seconds

hive>

6.6 验证录入数据是否成功,看到dfs出来有tim_test

hive> dfs -ls /home/hadoop/hive/warehouse;

Found 2 items

drwxr-xr-x   - hadoop supergroup          0 2015-01-12 01:47 /home/hadoop/hive/warehouse/hive_hbase_mapping_table_1

drwxr-xr-x   - hadoop supergroup          0 2015-01-12 02:11 /home/hadoop/hive/warehouse/tim_test

hive>

7,安装部署中的报错记录:
报错
1:

[[email protected] conf]$ hive --service metastore

Starting Hive Metastore Server

javax.jdo.JDOFatalInternalException: Error creating transactional connection factory

Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

缺少mysql的jar包,copy到hive的lib目录下面,OK。

报错2:

[[email protected] conf]$ hive --service metastore

Starting Hive Metastore Server

javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://192.168.52.130:3306/hive_remote?createDatabaseIfNotExist=true, username = root. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------

java.sql.SQLException: null,  message from server: "Host ‘192.168.52.128‘ is not allowed to connect to this MySQL server"

将hadoop用户添加到mysql组:

[[email protected] mysql]# gpasswd -a hadoop mysql

Adding user hadoop to group mysql

[[email protected] mysql]#

^C[[email protected] conf]$ telnet 192.168.52.130 3306

Trying 192.168.52.130...

Connected to 192.168.52.130.

Escape character is ‘^]‘.

G


Host ‘192.168.52.128‘ is not allowed to connect to this MySQL serverConnection closed by foreign host.

[[email protected] conf]$

解决办法:修改mysql账号

mysql> update user set user = ‘hadoop‘ where user = ‘root‘ and host=‘%‘;

Query OK, 1 row affected (0.04 sec)

Rows matched: 1  Changed: 1  Warnings: 0

mysql> flush privileges;

Query OK, 0 rows affected (0.09 sec)

mysql>

报错3:

[[email protected] conf]$ hive --service metastore

Starting Hive Metastore Server

javax.jdo.JDOException: Exception thrown calling table.exists() for hive_remote.`SEQUENCE_TABLE`

at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)

at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)

at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)

……

NestedThrowablesStackTrace:

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

解决,去远程mysql库上修改字符集从utf8mb4修改成utf8

mysql> alter database hive_remote /*!40100 DEFAULT CHARACTER SET utf8 */;

Query OK, 1 row affected (0.03 sec)

mysql>

然后在data01上面配置hive client端

scp -r hive-0.13.1/ data01:/home/hadoop/src/

报错3:

继续启动,查看日志信息:

[[email protected] conf]$ hive --service metastore

Starting Hive Metastore Server

卡在这里不动,去看日志信息

[[email protected] hadoop]$ tail -f hive.log

2015-01-09 03:46:27,692 INFO  [main]: metastore.ObjectStore (ObjectStore.java:setConf(229)) - Initialized ObjectStore

2015-01-09 03:46:27,892 WARN  [main]: metastore.ObjectStore (ObjectStore.java:checkSchema(6295)) - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.0

2015-01-09 03:46:30,574 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(551)) - Added admin role in metastore

2015-01-09 03:46:30,582 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(560)) - Added public role in metastore

2015-01-09 03:46:31,168 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(588)) - No user is added in admin role, since config is empty

2015-01-09 03:46:31,473 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5178)) - Starting DB backed MetaStore Server

2015-01-09 03:46:31,481 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5190)) - Started the new metaserver on port [9083]...

2015-01-09 03:46:31,481 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5192)) - Options.minWorkerThreads = 200

2015-01-09 03:46:31,482 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5194)) - Options.maxWorkerThreads = 100000

2015-01-09 03:46:31,482 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5196)) - TCP keepalive = true

在hive-site.xml上添加如下:

<property>

<name>hive.metastore.uris</name>

<value>thrift://192.168.52.128:9083</value>

</property>

报错4:

2015-01-09 04:01:43,053 INFO  [main]: metastore.ObjectStore (ObjectStore.java:setConf(229)) - Initialized ObjectStore

2015-01-09 04:01:43,540 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(551)) - Added admin role in metastore

2015-01-09 04:01:43,546 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(560)) - Added public role in metastore

2015-01-09 04:01:43,684 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(588)) - No user is added in admin role, since config is empty

2015-01-09 04:01:44,041 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5178)) - Starting DB backed MetaStore Server

2015-01-09 04:01:44,054 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5190)) - Started the new metaserver on port [9083]...

2015-01-09 04:01:44,054 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5192)) - Options.minWorkerThreads = 200

2015-01-09 04:01:44,054 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5194)) - Options.maxWorkerThreads = 100000

2015-01-09 04:01:44,054 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5196)) - TCP keepalive = true

2015-01-09 04:24:13,917 INFO  [Thread-3]: metastore.HiveMetaStore (HiveMetaStore.java:run(5073)) - Shutting down hive metastore.

解决:

查了好久,No user is added in admin role, since config is empty没有查到问题所在,碰到此类情况的一起交流下,欢迎留言。

----------------------------------------------------------------------------------------------------------------
<版权所有,文章允许转载,但必须以链接方式注明源地址,否则追究法律责任!>
原博客地址:     http://blog.itpub.net/26230597/viewspace-1400379/
原作者:黄杉 (mchdba)
----------------------------------------------------------------------------------------------------------------

时间: 2024-10-28 16:13:42

Hadoop2.3、 Hbase0.98、 Hive0.13架构中Hive的安装部署配置以及数据测试的相关文章

Kubernetes二进制方式v1.13.2生产环境的安装与配置(HTTPS+RBAC) ?

Kubernetes二进制方式v1.13.2生产环境的安装与配置(HTTPS+RBAC) 一 背景 由于众所周知的原因,在国内无法直接访问Google的服务.二进制包由于其下载方便.灵活定制而深受广大kubernetes使用者喜爱,成为企业部署生产环境比较流行的方式之一,Kubernetes v1.13.2是目前的最新版本.安装部署过程可能比较复杂.繁琐,因此在安装过程中尽可能将操作步骤脚本话.文中涉及到的脚本已经通过本人测试. 二 环境及架构图 2.1 软件环境 OS(最小化安装版): cat

Ubuntu中Nginx的安装与配置

Ubuntu中Nginx的安装与配置 1.Nginx介绍 Nginx是一个非常轻量级的HTTP服务器,Nginx,它的发音为“engine X”, 是一个高性能的HTTP和 反向代理服务器,同时也是一个IMAP/POP3/SMTP 代理服务器. 2.对PHP支持 目前各种web 服务器对PHP的支持一共有三种: (1)通过web 服务器内置的模块来实现,例如Apache的mod_php5,类似的Apache内置的mod_perl 可以对perl支持. (2)通过CGI来实现,这个就好比之前per

OEMCC 13.2 集群版本安装部署

之前测试部署过OEMCC 13.2单机,具体可参考之前随笔: OEMCC 13.2 安装部署 当时环境:两台主机,系统RHEL 6.5,分别部署OMS和OMR: OMS,也就是OEMCC的服务端 IP:192.168.1.88 内存:12G+ 硬盘:100G+ OMR,也就是OEM底层的资料库 IP:192.168.1.89 内存:8G+ 硬盘:100G+ 相当于OMS和OMR都是单机版,然后有些客户对监控系统的要求也很高,这就需要集群来提升高可用性. 对于OMR来说,可以搭建对应版本的RAC来

CentOs中mysql的安装与配置(转)

在linux中安装数据库首选MySQL,Mysql数据库的第一个版本就是发行在Linux系统上,其他选择还可以有postgreSQL,oracle等 在Linux上安装mysql数据库,我们可以去其官网上下载mysql数据库的rpm包,http://dev.mysql.com/downloads/mysql/5.6.html#downloads,大家可以根据自己的操作系统去下载对应的数据库文件 这里我是通过yum来进行mysql数据库的安装的,通过这种方式进行安装,可以将跟mysql相关的一些服

linux中jdk的安装与配置

一.卸载系统已有的JDK 1.查看已安装的jdk rpm -qa|grep jdk 2.卸载jdk rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64 二.安装jdk a. 下载相应版本的jdk包:https://www.oracle.com(这里是我的jdk1.8:链接: https://pan.baidu.com/s/1TpM4vzXNd9hGHI6Rw-84cw密码: iucs) b. 将压缩包放在 /usr/

vs2012中opencv的安装与配置

配置:win10专业版 vs2012专业版的下载参考网站: http://www.nocang.com/visual-studio-professional-2012/ 开发环境配置参考网站: http://blog.csdn.net/dcrmg/article/details/51809614 配置完后,测试代码:(http://blog.csdn.net/panshun888/article/details/53039294) 1 //#include "stdafx.h" 2 /

Ubuntu中Nginx的安装与配置全过程

1. 在终端运行命令:$sudo apt-get install nginx ubuntu安装Nginx之后的文件结构大致为: 所有的配置文件都在/etc/nginx下,并且每个虚拟主机已经安排在了/etc/nginx/sites-available下 启动程序文件在/usr/sbin/nginx 日志放在了/var/log/nginx中,分别是access.log和error.log 并已经在/etc/init.d/下创建了启动脚本nginx 默认的虚拟主机的目录设置在了/usr/share/

Ubuntu14中supervisor的安装及配置

supervisor是一款很好用的进程管理工具,其命令也很简单,其安装过程如下: Ubuntu14: 首先保证本地的Python环境是OK的,并且已经安装supervisor包,如果没有安装可以用easy_install: easy_install supervisor 接下来安装supervisor: apt-get install supervisor 安装好之后,不出问题的话supervisor服务已经启动完成. supervisor管理进程的配置文件,这里就简单举例: [program:

Ubuntu中sendmail的安装、配置

因为项目需要一个邮件服务器功能,用已有的企业邮箱又有各种限制,就来捣鼓了下和这个相关的一些东西.一般是有好几个选择,比如Postfix,sendmail,qmail,第一个我之前用过,但是项目需求只有发邮件,也不知怎的就选择了sendmail,事实证明还是不要作,废话不说,结合自己的一些经验总结一下,希望能让大家少踩坑. 一.安装 必装: sudo apt-get install sendmail sudo apt-get install sendmail-cf sudo apt-get ins