Preface
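Hive 0.13 introduced permanent functions. User-defined functions no longer need a create temporary function statement in the .hiverc file, which shortens Hive startup because no temporary-function creation commands have to run up front. The UDF jar can also be stored on HDFS, which simplifies management because the jar no longer has to be pushed to every Hive client. Permanent functions have one drawback, though: the function name must be prefixed with the database name, in the form [database].[function]. If the database is omitted, Hive qualifies the function name with the current database.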
Traditional relational databases usually have a default schema that is searched first when resolving a function, for example pg_catalog in PostgreSQL. That motivated adding a default database feature to Hive.
Design
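Add two parameters:
hive.function.default.function.enabled: defaults to false, which disables the feature. When set to true, the feature is enabled and function lookup searches the default database first.
hive.function.default.function: defaults to default. It takes effect only when hive.function.default.function.enabled=true, and names the default database used for function lookup.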
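As a rough illustration of how the two parameters and their defaults fit together, here is a minimal sketch that reads them from a plain java.util.Properties object. The class DefaultFunctionDbSettings is hypothetical and not part of Hive; a real patch would presumably read these values through HiveConf instead.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical holder for the two proposed settings; not part of Hive. */
final class DefaultFunctionDbSettings {
    // Parameter names exactly as listed in the design.
    static final String ENABLED_KEY = "hive.function.default.function.enabled";
    static final String DATABASE_KEY = "hive.function.default.function";

    final boolean enabled;
    final String database;

    private DefaultFunctionDbSettings(boolean enabled, String database) {
        this.enabled = enabled;
        this.database = database;
    }

    /** Reads both settings, falling back to the defaults "false" and "default". */
    static DefaultFunctionDbSettings from(Properties props) {
        boolean enabled = Boolean.parseBoolean(props.getProperty(ENABLED_KEY, "false"));
        // Lower-cased here because the Hive code shown later lower-cases database names.
        String database = props.getProperty(DATABASE_KEY, "default").toLowerCase(Locale.ROOT);
        return new DefaultFunctionDbSettings(enabled, database);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(ENABLED_KEY, "true");
        props.setProperty(DATABASE_KEY, "udf_lib"); // "udf_lib" is a made-up database name
        DefaultFunctionDbSettings settings = from(props);
        System.out.println(settings.enabled + " " + settings.database);
    }
}
```

With an empty Properties object the sketch yields enabled=false and database="default", matching the defaults stated in the design.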
Implementation
To add this feature, we first need to find where permanent functions use the current database to qualify the function name.
FunctionRegistry.java
private static FunctionInfo getFunctionInfoFromMetastore(String functionName) {
  FunctionInfo ret = null;

  try {
    String dbName;
    String fName;
    if (FunctionUtils.isQualifiedFunctionName(functionName)) {
      String[] parts = FunctionUtils.splitQualifiedFunctionName(functionName);
      dbName = parts[0];
      fName = parts[1];
    } else {
      // otherwise, qualify using current db
      dbName = SessionState.get().getCurrentDatabase().toLowerCase();
      fName = functionName;
    }

    // Try looking up function in the metastore
    HiveConf conf = SessionState.get().getConf();
    Function func = Hive.get(conf).getFunction(dbName, fName);
    if (func != null) {
      // Found UDF in metastore - now add it to the function registry
      // At this point we should add any relevant jars that would be needed for the UDf.
      try {
        FunctionTask.addFunctionResources(func.getResourceUris());
      } catch (Exception e) {
        LOG.error("Unable to load resources for " + dbName + "." + fName + ":" + e.getMessage(), e);
        return null;
      }

      Class<?> udfClass = Class.forName(func.getClassName(), true, Utilities.getSessionSpecifiedClassLoader());
      if (registerTemporaryFunction(functionName, udfClass)) {
        ret = mFunctions.get(functionName);
      } else {
        LOG.error(func.getClassName() + " is not a valid UDF class and was not registered.");
      }
    }
  } catch (HiveException e) {
    if (!((e.getCause() != null) && (e.getCause() instanceof MetaException)) &&
        (e.getCause().getCause() != null) && (e.getCause().getCause() instanceof NoSuchObjectException)) {
      LOG.info("Unable to lookup UDF in metastore: " + e);
    }
  } catch (ClassNotFoundException e) {
    // Lookup of UDf class failed
    LOG.error("Unable to load UDF class: " + e);
  }

  return ret;
}
public static String getNormalizedFunctionName(String fn) {
  // Does the same thing as getFunctionInfo, except for getting the function info.
  fn = fn.toLowerCase();
  return (FunctionUtils.isQualifiedFunctionName(fn) || mFunctions.get(fn) != null) ? fn
      : FunctionUtils.qualifyFunctionName(
          fn, SessionState.get().getCurrentDatabase().toLowerCase());
}

private static <T extends CommonFunctionInfo> T getFunctionInfo(
    Map<String, T> mFunctions, String functionName) {
  functionName = functionName.toLowerCase();
  T functionInfo = null;
  if (FunctionUtils.isQualifiedFunctionName(functionName)) {
    functionInfo = getQualifiedFunctionInfo(mFunctions, functionName);
  } else {
    // First try without qualifiers - would resolve builtin/temp functions.
    // Otherwise try qualifying with current db name.
    functionInfo = mFunctions.get(functionName);
    if (functionInfo == null && !FunctionUtils.isQualifiedFunctionName(functionName)) {
      String qualifiedName = FunctionUtils.qualifyFunctionName(functionName,
          SessionState.get().getCurrentDatabase().toLowerCase());
      functionInfo = getQualifiedFunctionInfo(mFunctions, qualifiedName);
    }
  }
  return functionInfo;
}
FunctionUtils.java
public static String[] getQualifiedFunctionNameParts(String name) throws HiveException {
  if (isQualifiedFunctionName(name)) {
    return splitQualifiedFunctionName(name);
  }
  String dbName = SessionState.get().getCurrentDatabase();
  return new String[] { dbName, name };
}
In each of these places, add a check on whether hive.function.default.function.enabled is true. If it is, use the value of hive.function.default.function as the default dbName instead of the current database.
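The change described above can be sketched in isolation as follows. This is a simplified, self-contained illustration and not the actual patch: FunctionNameResolver is a hypothetical class, the two settings are passed in as plain constructor arguments instead of being read from HiveConf, and a name is treated as qualified when it contains a dot (an assumption about what FunctionUtils.isQualifiedFunctionName checks). The registry lookup for builtin and temporary functions is left out.

```java
import java.util.Locale;

/** Hypothetical sketch of the database-selection rule; not part of Hive. */
final class FunctionNameResolver {
    private final boolean defaultDbEnabled; // hive.function.default.function.enabled
    private final String defaultDb;         // hive.function.default.function

    FunctionNameResolver(boolean defaultDbEnabled, String defaultDb) {
        this.defaultDbEnabled = defaultDbEnabled;
        this.defaultDb = defaultDb;
    }

    /** Picks the database used to qualify an unqualified function name. */
    String qualifyingDatabase(String currentDb) {
        String db = defaultDbEnabled ? defaultDb : currentDb;
        return db.toLowerCase(Locale.ROOT);
    }

    /** Returns db.function, leaving names that are already qualified untouched. */
    String qualify(String functionName, String currentDb) {
        String fn = functionName.toLowerCase(Locale.ROOT);
        if (fn.indexOf('.') >= 0) {
            return fn; // assumed to be qualified already
        }
        return qualifyingDatabase(currentDb) + "." + fn;
    }

    public static void main(String[] args) {
        // "udf_lib" and "sales" are made-up database names.
        FunctionNameResolver enabled = new FunctionNameResolver(true, "udf_lib");
        FunctionNameResolver disabled = new FunctionNameResolver(false, "udf_lib");
        System.out.println(enabled.qualify("my_udf", "sales"));  // udf_lib.my_udf
        System.out.println(disabled.qualify("my_udf", "sales")); // sales.my_udf
    }
}
```

Note that this sketch replaces the current database outright when the feature is enabled. Whether lookup should fall back to the current database when the function is missing from the default database is a separate design choice that the sketch does not cover.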