[Original] Uncle's Experience Sharing (23): Deployment Modes of the Hive Metastore

Hive and other components (such as Spark and Impala) all depend on the Hive metastore; the settings that control this dependency live in hive-site.xml.

Key Hive metastore settings

hive.metastore.warehouse.dir
In Hive 2 and earlier the default is /user/hive/warehouse/; creating a database or table creates a corresponding directory under this path.

javax.jdo.option.ConnectionURL
javax.jdo.option.ConnectionDriverName
javax.jdo.option.ConnectionUserName
javax.jdo.option.ConnectionPassword
Default: an embedded Derby database.

hive.metastore.uris
Default: empty.

Hive metastore deployment modes

1 All defaults

Uses the embedded Derby database; the HDFS directory is /user/hive/warehouse/.

2 Configure only the javax.jdo.option.* settings

Uses the configured database (for example MySQL); the HDFS directory is /user/hive/warehouse/.
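As a minimal sketch of mode 2, the script below writes a hive-site.xml that sets only the javax.jdo.option.* properties. The host name `mysql-host`, the database name `hive`, the user name, the password, and the `CONF_DIR` location are all placeholders assumed for illustration, and the MySQL JDBC driver jar is assumed to be on the client's classpath.

```shell
#!/bin/sh
# Sketch only: every value below is a placeholder, not taken from the original post.
CONF_DIR="${CONF_DIR:-./conf}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Mode 2: the client embeds the metastore and talks to MySQL directly -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql-host:3306/hive</value>
  </property>
  <property>
    <!-- Connector/J 5.x driver class; newer drivers use com.mysql.cj.jdbc.Driver -->
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>change-me</value>
  </property>
</configuration>
EOF
echo "wrote $CONF_DIR/hive-site.xml"
```

Because hive.metastore.uris is left unset, a client reading this file would open the JDBC connection itself rather than go through a metastore service.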

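3 Configure only hive.metastore.uris

All metadata operations go through a remote metastore (note that a standalone Hive metastore process must be running in this case); the HDFS directory is /user/hive/warehouse/.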
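A client-side configuration for mode 3 might look like the sketch below. The host name `metastore-host` and the `CONF_DIR` location are placeholders assumed for illustration; 9083 is the metastore's usual default port. The metastore host itself still needs its own hive-site.xml with the javax.jdo.option.* settings so that the service can reach its database.

```shell
#!/bin/sh
# Sketch only: metastore-host and CONF_DIR are placeholders for illustration.
CONF_DIR="${CONF_DIR:-./conf}"
mkdir -p "$CONF_DIR"

# Client-side config: only the thrift endpoint of the remote metastore.
cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
EOF
echo "wrote $CONF_DIR/hive-site.xml"

# On the metastore host (with Hive installed), the standalone process is
# typically started with:
#   hive --service metastore
```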

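PS: modes 1 and 2 need no Hive process at all, because the metastore logic runs embedded in the client; all three modes, however, depend on HDFS.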

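Summary

In short, when you only need Impala or Spark and not Hive itself, a remote database (for example MySQL) is all that is required; there is no need to start a separate Hive metastore process.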

How do you initialize a Hive metastore database?

$ ls $HIVE_HOME/scripts/metastore/upgrade
derby mssql mysql oracle postgres

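The Hive directory contains initialization scripts for each supported database and each version. For example, Impala depends on Hive 1.2, so you only need to install the Hive 1.2 metastore schema; the corresponding SQL file is: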

$HIVE_HOME/scripts/metastore/upgrade/mysql/hive-schema-1.2.0.mysql.sql

which in turn depends on

$HIVE_HOME/scripts/metastore/upgrade/mysql/hive-txn-schema-0.13.0.mysql.sql

Once initialization is complete, configure javax.jdo.option.* in /etc/impala/conf/hive-site.xml to point at the MySQL database.
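The initialization step described above can be sketched roughly as follows. The host, user, and database names and the `HIVE_HOME` fallback are placeholders assumed for illustration, and the target database is assumed to exist already. The sketch also assumes that hive-schema-1.2.0.mysql.sql pulls in the txn schema file through a relative SOURCE statement, which is why it changes into the script directory first.

```shell
#!/bin/sh
# Sketch only: host, user, database name and the HIVE_HOME fallback are placeholders.
HIVE_HOME="${HIVE_HOME:-/opt/hive}"
MYSQL_HOST="${MYSQL_HOST:-mysql-host}"
MYSQL_USER="${MYSQL_USER:-hive}"
METASTORE_DB="${METASTORE_DB:-hive}"
SCHEMA_DIR="$HIVE_HOME/scripts/metastore/upgrade/mysql"
SCHEMA_FILE="hive-schema-1.2.0.mysql.sql"

init_metastore_schema() {
  # Assumption: the schema file loads hive-txn-schema-0.13.0.mysql.sql by
  # relative path, so mysql must be run from inside the script directory.
  # The subshell keeps the caller's working directory unchanged.
  (
    cd "$SCHEMA_DIR" || exit 1
    # -p prompts for the password; the database is assumed to exist already.
    mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p "$METASTORE_DB" < "$SCHEMA_FILE"
  )
}

# Only attempt the load when both the mysql client and the schema file exist.
if command -v mysql >/dev/null 2>&1 && [ -f "$SCHEMA_DIR/$SCHEMA_FILE" ]; then
  init_metastore_schema
else
  echo "skipping: need the mysql client and $SCHEMA_DIR/$SCHEMA_FILE" >&2
fi
```

Loading the file through stdin keeps the command short; running the same file from an interactive mysql session inside that directory would be an equivalent route.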

Minimal Impala installation: HDFS + MySQL (Hive metastore database) + Impala

Minimal Spark installation: HDFS + MySQL (Hive metastore database) + Spark

Reference: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration

Original post: https://www.cnblogs.com/barneywill/p/10300238.html

Time: 2024-11-05 20:42:37
