Spark SQL data warehouse error: Metastore contains multiple versions

Spark is version 2.1.0, Hadoop is version 2.7.1, and the metadata is stored in MySQL. The exception is as follows:

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
    ... 7 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
    ... 12 more
Caused by: MetaException(message:Metastore contains multiple versions)
    at org.apache.hadoop.hive.metastore.ObjectStore.getMSchemaVersion(ObjectStore.java:6368)
    at org.apache.hadoop.hive.metastore.ObjectStore.getMetaStoreSchemaVersion(ObjectStore.java:6330)
    at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6289)
    at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6277)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
    at com.sun.proxy.$Proxy9.verifySchema(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:476)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171)
    ... 17 more

The error says that the Hive metastore contains multiple versions. Querying the VERSION table in Hive's metadata database shows one extra record:

select * from VERSION;

1    1.1.0    Set by MetaStore [email protected]10.252.97.244
2    1.1.0    Set by MetaStore [email protected]10.252.97.244 # this is the extra row
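
Why a second row is fatal can be sketched as follows. This is a simplified model, not the Hive source: the class and method names (VersionLookupSketch, VersionRow, readSchemaVersion) are hypothetical, and the only behavior taken from the stack trace above is that the version lookup (ObjectStore.getMSchemaVersion) fails with "Metastore contains multiple versions". The sketch assumes the lookup expects exactly one row.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the version lookup; NOT the actual Hive source.
class VersionLookupSketch {

    /** One row of the VERSION table: id, schema version, comment. */
    static final class VersionRow {
        final long id;
        final String schemaVersion;
        final String comment;

        VersionRow(long id, String schemaVersion, String comment) {
            this.id = id;
            this.schemaVersion = schemaVersion;
            this.comment = comment;
        }
    }

    /**
     * Returns the schema version when exactly one row exists, null when
     * the table is empty, and fails when there is more than one row.
     */
    static String readSchemaVersion(List<VersionRow> rows) {
        if (rows.isEmpty()) {
            return null;
        }
        if (rows.size() > 1) {
            throw new IllegalStateException("Metastore contains multiple versions");
        }
        return rows.get(0).schemaVersion;
    }

    public static void main(String[] args) {
        List<VersionRow> rows = new ArrayList<>();
        rows.add(new VersionRow(1, "1.1.0", "Set by MetaStore"));
        System.out.println("one row: " + readSchemaVersion(rows));

        rows.add(new VersionRow(2, "1.1.0", "Set by MetaStore")); // the extra row
        try {
            readSchemaVersion(rows);
        } catch (IllegalStateException e) {
            System.out.println("two rows: " + e.getMessage());
        }
    }
}
```

Both rows carry the same version string, so the failure comes from the row count rather than from any version mismatch.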

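Troubleshooting

Research

A Google search shows that other people have reported the same problem, for example in HIVE-9543. The fix commonly suggested online is the following:

Set datanucleus.autoCreateSchema=false
The official documentation describes this property as follows: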
    Default Value: true
    Added In: Hive 0.7.0
    Removed In: Hive 2.0.0 with HIVE-6113, replaced by datanucleus.schema.autoCreateAll
    Creates necessary schema on a startup if one does not exist. Set this to false, after creating it once.
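# In other words, this parameter is only needed when the Hive metastore schema is first initialized; after that it can be set to false to disable it.

After setting this parameter to false and continuing to observe, the error still came back.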

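Checking the logs and the error

# The Hive runtime log contains the following message whenever the duplicate version appears:
Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version
# Meaning: no version information was found in the metastore, and because hive.metastore.schema.verification is not enabled, the version is recorded, i.e. a row is inserted into the VERSION table.
# Putting this together with the earlier error, the failing class is: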
Caused by: MetaException(message:Metastore contains multiple versions)
    at org.apache.hadoop.hive.metastore.ObjectStore.getMSchemaVersion(ObjectStore.java:6368)

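The exception occurs because the hive command checks the VERSION table in the Hive metadata at startup. If the version information cannot be read (possible causes include a metadata database failure, a network failure, or a burst of jobs in a short period), Hive looks at the hive.metastore.schema.verification property. When it is true, a MetaException is thrown immediately. When it is false, a warning is logged and a version row is inserted, which is how multiple version rows end up in the table and break later jobs. The relevant code is in the ObjectStore class of the hive-metastore package.

Reading the source code

The relevant source code is as follows: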

## Synchronized method checkSchema()
private synchronized void checkSchema() throws MetaException {
    // recheck if it got verified by another thread while we were waiting
    if (isSchemaVerified.get()) {
      return;
    }

    // read the Hive configuration, i.e. the value of hive.metastore.schema.verification
    boolean strictValidation =
      HiveConf.getBoolVar(getConf(), HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION);
    // read the schema version stored in metastore db
    String schemaVer = getMetaStoreSchemaVersion();
    if (schemaVer == null) {
      // version information not found: with strictValidation == true, throw immediately
      if (strictValidation) {
        throw new MetaException("Version information not found in metastore. ");
      } else {
        // otherwise insert the version information; this is the log message seen earlier
        LOG.warn("Version information not found in metastore. "
            + HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION.toString() +
            " is not enabled so recording the schema version " +
            MetaStoreSchemaInfo.getHiveSchemaVersion());
        setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
          "Set by MetaStore " + USER + "@" + HOSTNAME);
      }
    }
    // ... (remainder of the method omitted in this excerpt)
}

## The setMetaStoreSchemaVersion method is as follows
public void setMetaStoreSchemaVersion(String schemaVersion, String comment) throws MetaException {
    MVersionTable mSchemaVer;
    boolean commited = false;
    // this parameter controls whether version information is recorded
    boolean recordVersion =
      HiveConf.getBoolVar(getConf(), HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION_RECORD_VERSION);
    // if the parameter is false, return without recording; otherwise the version is inserted
    if (!recordVersion) {
      LOG.warn("setMetaStoreSchemaVersion called but recording version is disabled: " +
        "version = " + schemaVersion + ", comment = " + comment);
      return;
    }

    try {
      mSchemaVer = getMSchemaVersion();
    } catch (NoSuchObjectException e) {
      // if the version doesn't exist, then create it
      mSchemaVer = new MVersionTable();
    }
    // ... (remainder of the method omitted in this excerpt)
}

## Looking up METASTORE_SCHEMA_VERIFICATION_RECORD_VERSION in HiveConf shows that hive.metastore.schema.verification.record.version defaults to true, so recording the version is allowed.
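
The way the two settings interact can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the Hive implementation: SchemaCheckSketch, its fields and the lookupFails switch are hypothetical, and the sketch simply assumes that a version read can come back empty even though a row already exists (why that happens is not established here).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the checkSchema() flow; NOT the actual Hive source.
class SchemaCheckSketch {
    static final String HIVE_SCHEMA_VERSION = "1.1.0";

    final List<String> versionRows = new ArrayList<>();
    final boolean strictValidation; // models hive.metastore.schema.verification
    final boolean recordVersion;    // models hive.metastore.schema.verification.record.version

    SchemaCheckSketch(boolean strictValidation, boolean recordVersion) {
        this.strictValidation = strictValidation;
        this.recordVersion = recordVersion;
        versionRows.add(HIVE_SCHEMA_VERSION); // the row written at initialization
    }

    /** Returns the single stored version, null if none, fails on duplicates. */
    String readVersion() {
        if (versionRows.isEmpty()) {
            return null;
        }
        if (versionRows.size() > 1) {
            throw new IllegalStateException("Metastore contains multiple versions");
        }
        return versionRows.get(0);
    }

    /** lookupFails is a hypothetical switch: the read returns nothing although a row exists. */
    void checkSchema(boolean lookupFails) {
        String schemaVer = lookupFails ? null : readVersion();
        if (schemaVer != null) {
            return;
        }
        if (strictValidation) {
            throw new IllegalStateException("Version information not found in metastore. ");
        }
        if (!recordVersion) {
            return; // only a warning would be logged; nothing is written
        }
        versionRows.add(HIVE_SCHEMA_VERSION); // this is where the duplicate row appears
    }

    public static void main(String[] args) {
        SchemaCheckSketch defaults = new SchemaCheckSketch(false, true);
        defaults.checkSchema(true);
        System.out.println("rows with recording enabled: " + defaults.versionRows.size());
        try {
            defaults.checkSchema(false);
        } catch (IllegalStateException e) {
            System.out.println("next start: " + e.getMessage());
        }

        SchemaCheckSketch noRecord = new SchemaCheckSketch(false, false);
        noRecord.checkSchema(true);
        System.out.println("rows with recording disabled: " + noRecord.versionRows.size());
    }
}
```

In this model a single failed read with recording enabled leaves two rows behind, and every later start fails. With recording disabled, the same failed read leaves the table untouched.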

Solution

The source code above makes the solution clear. There are several possible approaches; the one I chose is to set hive.metastore.schema.verification.record.version to false.
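Alternatively, you can enable schema verification (hive.metastore.schema.verification=true). According to the checkSchema() code above, a failed version lookup then throws a MetaException instead of inserting a row.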
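
As a configuration fragment, the chosen setting might look like the following. The property name comes from the source code above; placing it in hive-site.xml (for Spark SQL, the copy that Spark reads from its conf directory) is an assumption about this deployment.

```xml
<!-- Assumed location: hive-site.xml used by the Hive/Spark SQL client -->
<property>
  <name>hive.metastore.schema.verification.record.version</name>
  <value>false</value>
</property>
```

This only stops new rows from being written. Any extra row that is already in the VERSION table still has to be dealt with separately.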

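Open questions

    1. Reports online suggest that the multiple-versions problem is triggered by concurrency or network issues. Why the code fails to read the metastore schema version in the first place is still an open question and needs further study of the source.
    2. Several parameters can stop the version from being recorded in this code path. Which of them is risk-free still needs further investigation.

Original article: https://www.cnblogs.com/itboys/p/10134288.html

Time: 2024-10-25 07:15:00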
