[Original] Troubleshooting Notes (18): beeline connections to Spark Thrift sometimes hang

spark 2.1.1

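After beeline connects to Spark Thrift, running use database sometimes hangs. On the server side, use database corresponds to setCurrentDatabase.

Investigation showed that Spark Thrift was executing an insert at the time the hang occurred: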

org.apache.spark.sql.hive.execution.InsertIntoHiveTable

  protected override def doExecute(): RDD[InternalRow] = {
    sqlContext.sparkContext.parallelize(sideEffectResult.asInstanceOf[Seq[InternalRow]], 1)
  }
...
  @transient private val externalCatalog = sqlContext.sharedState.externalCatalog

  protected[sql] lazy val sideEffectResult: Seq[InternalRow] = {
  ...
        externalCatalog.loadDynamicPartitions(
          externalCatalog.getPartitionOption(
          externalCatalog.loadPartition(
      externalCatalog.loadTable(

So an insert may call loadDynamicPartitions, getPartitionOption, loadPartition, loadTable and similar methods on externalCatalog, and these calls are delegated to the Hive client:

org.apache.spark.sql.hive.client.HiveClientImpl

  def loadTable(
      loadPath: String, // TODO URI
      tableName: String,
      replace: Boolean,
      holdDDLTime: Boolean): Unit = withHiveState {
...
  def loadPartition(
      loadPath: String,
      dbName: String,
      tableName: String,
      partSpec: java.util.LinkedHashMap[String, String],
      replace: Boolean,
      holdDDLTime: Boolean,
      inheritTableSpecs: Boolean): Unit = withHiveState {
...
  override def setCurrentDatabase(databaseName: String): Unit = withHiveState {

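Each of these HiveClientImpl methods runs inside withHiveState, and withHiveState is synchronized. As a result, parts of an insert (loadPartition, for example) and use database are serialized on the same lock, so when an insert is slow, every other operation that goes through withHiveState is blocked until it finishes.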
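The blocking described above can be reproduced with a small standalone sketch. It is not Spark code: FakeHiveClient, LockDemo and measureUseDatabase are hypothetical names, and the wrapper here simply locks on the client instance, which is an assumption that models only the shared lock and nothing else that the real withHiveState does.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical stand-in for HiveClientImpl: every public method goes through
// one synchronized wrapper, as the post describes for withHiveState.
class FakeHiveClient {
  @volatile private var currentDb = "default"

  // Assumption: locks on this instance and models only the mutual exclusion.
  private def withHiveState[A](body: => A): A = synchronized { body }

  // Simulates the slow work done near the end of an insert.
  def loadPartition(workMillis: Long, started: CountDownLatch): Unit = withHiveState {
    started.countDown()      // signal that the lock is now held
    Thread.sleep(workMillis) // pretend to move files / update the metastore
  }

  def setCurrentDatabase(db: String): Unit = withHiveState { currentDb = db }

  def currentDatabase: String = currentDb
}

object LockDemo {
  // Returns how long setCurrentDatabase had to wait (in ms) while a
  // simulated insert was holding the lock.
  def measureUseDatabase(insertMillis: Long): Long = {
    val client = new FakeHiveClient
    val started = new CountDownLatch(1)
    val insert = new Thread(new Runnable {
      def run(): Unit = client.loadPartition(insertMillis, started)
    })
    insert.start()
    started.await() // the insert thread is inside withHiveState from here on

    val t0 = System.nanoTime()
    client.setCurrentDatabase("db1") // blocks until loadPartition returns
    val waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0)

    insert.join()
    waited
  }

  def main(args: Array[String]): Unit =
    println(s"use database waited about ${measureUseDatabase(500)} ms")
}
```

In this sketch the wait grows with the simulated insert time, which matches the symptom seen from beeline: use database itself is cheap, but it cannot start until the insert releases the lock.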

For details on how Spark Thrift is implemented, see https://www.cnblogs.com/barneywill/p/10137672.html

Original post: https://www.cnblogs.com/barneywill/p/10145427.html

Time: 2024-08-19 22:03:23
