Implementing a Hive proxy, part 2: where the user appears when Hive operates on Hadoop

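Hive permissions involve two layers of checks: Hive's own and Hadoop's. The Hive-level changes needed for a custom proxy feature were already covered in
http://caiguangguang.blog.51cto.com/1652935/1587251

This post covers the places where the user shows up between Hive, Hadoop, and the local file system: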
1. The job's log file

The log file is set up when the session is initialized, mainly in the start method of SessionState:

    public static SessionState start(SessionState startSs) {
    setCurrentSessionState(startSs);
    if(startSs.hiveHist == null){
      if (startSs.getConf().getBoolVar(HiveConf.ConfVars.HIVE_SESSION_HISTORY_ENABLED)) { 
      // If hive.session.history.enabled is true, the log file is initialized; the default is false
        startSs.hiveHist = new HiveHistoryImpl(startSs);   // HiveHistoryImpl creates the log file
      }else {
        //Hive history is disabled, create a no-op proxy
        startSs.hiveHist = HiveHistoryProxyHandler.getNoOpHiveHistoryProxy();
      }
    }
    ...

Next, look at the constructor of the org.apache.hadoop.hive.ql.history.HiveHistoryImpl class. It defines the log path and creates the log directory if it does not exist:

public HiveHistoryImpl(SessionState ss) {
  try {
    console = new LogHelper(LOG);
    String conf_file_loc = ss.getConf().getVar(
        HiveConf.ConfVars.HIVEHISTORYFILELOC); 
    // HIVEHISTORYFILELOC("hive.querylog.location", System.getProperty("java.io.tmpdir") + File.separator + System.getProperty("user.name")),
    // so the default is the /tmp/${user.name}/ directory
    if ((conf_file_loc == null) || conf_file_loc.length() == 0) {
      console.printError("No history file location given");
      return;
    }
    // Create directory
    File histDir = new File(conf_file_loc);
    if (!histDir.exists()) { // create the log directory
      if (!histDir.mkdirs()) {
        console.printError("Unable to create log directory " + conf_file_loc);
        return;
      }
    }
    do {
      histFileName = conf_file_loc + File.separator + "hive_job_log_" + ss.getSessionId() + "_"
        + Math.abs(randGen.nextInt()) + ".txt"; 
      // Full path of the log file: /tmp/hdfs/hive_job_log_<sessionid>_<random number>.txt, for example
      // /tmp/hdfs/hive_job_log_4f96f470-a6c1-41ae-9d30-def308e5412f_564454280.txt
    } while (! new File(histFileName).createNewFile());
    console.printInfo("Hive history file=" + histFileName);
    histStream = new PrintWriter(histFileName);
    HashMap<String, String> hm = new HashMap<String, String>();
    hm.put(Keys.SESSION_ID.name(), ss.getSessionId());
    log(RecordTypes.SessionStart, hm);
  } catch (IOException e) {
    console.printError("FAILED: Failed to open Query Log : " + histFileName
        + " " + e.getMessage(), "\n"
        + org.apache.hadoop.util.StringUtils.stringifyException(e));
  }
}
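
The path logic in the constructor above can be isolated into a small sketch. The class and method names below (HistoryPathSketch, defaultHistoryDir, historyFileName) are hypothetical and not part of Hive; the sketch only mirrors the string composition shown above, to make visible that the default directory comes from the user.name of the JVM running Hive rather than from any proxied user.

```java
import java.io.File;

/** Hypothetical helper, not part of Hive: mirrors the path composition in HiveHistoryImpl. */
final class HistoryPathSketch {

    /** Same expression as the default of hive.querylog.location: <java.io.tmpdir>/<user.name>. */
    static String defaultHistoryDir() {
        return System.getProperty("java.io.tmpdir") + File.separator
                + System.getProperty("user.name");
    }

    /** Builds <dir>/hive_job_log_<sessionId>_<abs(random)>.txt, as the constructor does. */
    static String historyFileName(String dir, String sessionId, int randomInt) {
        return dir + File.separator + "hive_job_log_" + sessionId + "_"
                + Math.abs(randomInt) + ".txt";
    }

    public static void main(String[] args) {
        // The directory is derived from the OS user that started the JVM.
        String dir = defaultHistoryDir();
        System.out.println(historyFileName(dir, "example-session-id", 564454280));
    }
}
```

Because the default is computed from JVM system properties, a proxy setup that wants per-user log directories would presumably have to set hive.querylog.location explicitly; that is an inference from the code above, not something the original post states.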

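2. The job's intermediate files
Intermediate files produced during execution go into scratch directories: hive.exec.scratchdir sets the location on HDFS, and hive.exec.local.scratchdir sets the location on the local file system.
The scratch paths are resolved in the constructor of the org.apache.hadoop.hive.ql.Context class.
The relevant scratch directory settings: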

SCRATCHDIR("hive.exec.scratchdir", "/tmp/hive-" + System.getProperty("user.name")),  
//default is /tmp/hive-<current login user>, taken from the JVM's user.name
LOCALSCRATCHDIR("hive.exec.local.scratchdir", System.getProperty("java.io.tmpdir") + File.separator + System.getProperty("user.name")),
SCRATCHDIRPERMISSION("hive.scratch.dir.permission", "700"),

In the constructor of the org.apache.hadoop.hive.ql.Context class:

 // scratch path to use for all non-local (ie. hdfs) file system tmp folders
  private final Path nonLocalScratchPath;
  // scratch directory to use for local file system tmp folders
  private final String localScratchDir ;
  // the permission to scratch directory (local and hdfs )
  private final String scratchDirPermission ;
...
public Context(Configuration conf, String executionId)  {
  this.conf = conf;
  this.executionId = executionId;
  // local & non-local tmp location is configurable. however it is the same across
  // all external file systems
  nonLocalScratchPath =
    new Path(HiveConf.getVar(conf, HiveConf.ConfVars.SCRATCHDIR),
             executionId);
  localScratchDir = new Path(HiveConf.getVar(conf, HiveConf.ConfVars.LOCALSCRATCHDIR),
          executionId).toUri().getPath();
  scratchDirPermission= HiveConf.getVar(conf, HiveConf.ConfVars.SCRATCHDIRPERMISSION);
}

This object is initialized in Driver's compile method.
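
As a rough illustration of how the scratch locations end up tied to a user, the sketch below recomputes the default HDFS scratch directory and the per-execution subdirectory using plain strings. ScratchDirSketch, its methods, the user name "alice", and the execution id are all hypothetical; real Hive uses org.apache.hadoop.fs.Path, which this sketch replaces with simple "/" joining.

```java
/** Hypothetical helper, not part of Hive: approximates the scratch path composition in Context. */
final class ScratchDirSketch {

    /** Same shape as the default of hive.exec.scratchdir: /tmp/hive-<user>. */
    static String defaultScratchDir(String userName) {
        return "/tmp/hive-" + userName;
    }

    /** Context appends the execution id as a child of the scratch directory. */
    static String perExecutionDir(String scratchDir, String executionId) {
        // Simplification: Hadoop's Path(parent, child) also normalizes the result.
        return scratchDir + "/" + executionId;
    }

    public static void main(String[] args) {
        // Default: derived from the user running the Hive process.
        String processUser = System.getProperty("user.name");
        System.out.println(perExecutionDir(defaultScratchDir(processUser), "hive_example_exec_id"));

        // What the path would look like if it followed a proxied user instead ("alice" is made up).
        System.out.println(perExecutionDir(defaultScratchDir("alice"), "hive_example_exec_id"));
    }
}
```

The two printed paths differ only in the user segment, which is the part a proxy implementation would need to control through the hive.exec.scratchdir setting.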

3. The user who submits the job

The init method of JobClient:
  UserGroupInformation clientUgi;
....
  public void init( JobConf conf) throws IOException {
    setConf(conf);
    cluster = new Cluster(conf);
    clientUgi = UserGroupInformation.getCurrentUser();
  }

Adding a proxy here is fairly easy: the createRemoteUser method of UserGroupInformation is enough.
For example, change the init method to:

public void init(JobConf conf) throws IOException {
  setConf(conf);
  cluster = new Cluster(conf);
  if (conf.getBoolean("use.custom.proxy",false))
  {
      String proxyUser = conf.get("custom.proxy.user");
      clientUgi = UserGroupInformation.createRemoteUser(proxyUser);
  }else{
      clientUgi = UserGroupInformation.getCurrentUser();
  }
  LOG.warn("clientUgi is " + clientUgi);
}
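
The selection logic in the modified init method can be tried out without a Hadoop classpath. The sketch below uses java.util.Properties as a stand-in for JobConf and returns a user name instead of a UserGroupInformation; ProxyUserSketch and resolveUser are hypothetical names. The configuration keys use.custom.proxy and custom.proxy.user are the ones from the code above. Falling back to the current user when the proxy user is missing or blank is an added assumption about the desired behavior; the original init method has no such guard.

```java
import java.util.Properties;

/** Hypothetical stand-in for the user selection in the modified JobClient.init. */
final class ProxyUserSketch {

    /**
     * Returns the proxy user when use.custom.proxy is true and custom.proxy.user
     * is set to a non-blank value; otherwise returns the current user.
     */
    static String resolveUser(Properties conf, String currentUser) {
        boolean useProxy = Boolean.parseBoolean(conf.getProperty("use.custom.proxy", "false"));
        String proxyUser = conf.getProperty("custom.proxy.user");
        // Assumption: an unset or blank proxy user falls back to the current user.
        if (useProxy && proxyUser != null && !proxyUser.trim().isEmpty()) {
            return proxyUser.trim();
        }
        return currentUser;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("use.custom.proxy", "true");
        conf.setProperty("custom.proxy.user", "alice"); // made-up user name
        System.out.println("submitting as " + resolveUser(conf, System.getProperty("user.name")));
    }
}
```

In the real method the resolved name would be passed to UserGroupInformation.createRemoteUser, as in the code above.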

Time: 2024-10-05 01:00:22
