Doris: invoking a .sql script for Doris from shell and passing parameter values into the SQL script

1. Background

In most cases we want to execute SQL scripts against Doris routinely, e.g. scheduled with Azkaban to load data, and we want to pass parameters into the SQL script file. We can easily handle this situation in Hive.

1.1 Hive usage

Hive supports this via the -hiveconf and -hivevar command-line options.

shell file:

We want to pass two parameters, p_partition_d and p_partition_to_delete, into the Hive SQL script file using -hivevar {variable_name}={variable_value}.

#!/bin/bash
CURRENT_DIR=$(cd `dirname $0`; pwd)
echo "CURRENT_DIR:"${CURRENT_DIR}

APPLICATION_ROOT_DIR=$(cd ${CURRENT_DIR}/..;pwd)
echo "APPLICATION_ROOT_DIR:"${APPLICATION_ROOT_DIR}

source ${APPLICATION_ROOT_DIR}/globle_config.sh

if [ $# = 0 ]; then
    p_partition_d=$(date -d "0 days" +%Y%m%d)
    p_partition_to_delete=`date -d "-8 days" +%Y%m%d`

fi

if [ $# = 1 ]; then
    p_partition_d=$(date -d "$1" +%Y%m%d)
    p_partition_to_delete=`date -d "$1 -8 days" +%Y%m%d`
fi

echo  "p_partition_d: "${p_partition_d}
echo  "p_partition_to_delete: "${p_partition_to_delete}

$HIVE_HOME_BIN/hive -hivevar p_partition_d="${p_partition_d}" \
                    -hivevar p_partition_to_delete="${p_partition_to_delete}" \
                    -f ${CURRENT_DIR}/abc_incremental.sql

if [ $? != 0 ];then
    exit -1
fi

The corresponding Hive SQL script is as follows:

abc_incremental.sql

-- handles the inserted data.
-- points_core.tb_acc_rdm_rel is append-only, so no updates or deletes are involved.
INSERT OVERWRITE TABLE ods.abc_incremental PARTITION(pt_log_d = '${hivevar:p_partition_d}')
SELECT
    id,
    last_update_timestamp
FROM staging.staging_abc AS a
WHERE
     pt_log_d = '${hivevar:p_partition_d}';

---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
-- drop the hive table partition; the underlying file deletion is done in the shell script, since this is an external table.
ALTER TABLE staging.staging_abc DROP IF EXISTS PARTITION(pt_log_d='${hivevar:p_partition_to_delete}');

---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------

In the HQL file, we reference the passed-in parameter using ${hivevar:variable_name}.

Note: we can also use -hiveconf instead of -hivevar, but each parameter must be referenced in the same style in which it was passed in. The difference between hivevar and hiveconf is:

  • hivevar: contains only user-defined variables.
  • hiveconf: contains both Hive system variables and user-defined variables.
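
For example, the same script could be driven with -hiveconf instead; a minimal sketch (hypothetical partition value shown):

$HIVE_HOME_BIN/hive -hiveconf p_partition_d="20191101" -f abc_incremental.sql

and inside the .sql file the reference would then be '${hiveconf:p_partition_d}' rather than '${hivevar:p_partition_d}'.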

1.2 Doris requirements

Common requirements include:

  • loading data into a specified partition, where the partition parameter is passed in (a sketch follows this list).
  • loading data into a Doris table from an external system such as HDFS; in the LOAD statement, the load label parameter must be passed into the SQL script.
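
For the first requirement, the SQL script typically carries a textual placeholder that the shell wrapper (section 2) substitutes before execution; a minimal sketch with hypothetical table and placeholder names:

-- partition_place_holder is replaced by the invoking shell script before execution
INSERT INTO ods.fct_example PARTITION (p_partition_place_holder)
SELECT id, last_update_timestamp FROM staging.stg_example;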

2. Solution

2.1 Doris common script implementation

I have implemented a common shell script; by calling it, we can pass in parameter values in the same style we used with Hive.

The implementation is as follows:

globle_config.sh

#!/bin/bash

OP_HOME_BIN=/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/bin
HIVE_HOME_BIN=/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/bin
MYSQL_HOME_BIN=/usr/local/mysql/bin

DORIS_HOST=192.168.1.101
DORIS_PORT=9030
DORIS_USER_NAME=dev_readonly
DORIS_PASSWORD=dev_readonly#

# this function copies the provided file into the working directory
# input parameters:
# $1: the working directory
# $2: the absolute path of the file to be copied.
# result: the absolute path of the copied file, located in the working directory.
function func_copy_file() {
    if [ $# != 2 ]; then
        echo "missing parameter, type in like: repaceContent  /opt/  /opt/a.sql"
        exit -1
    fi

    working_dir=$1
    source_file=$2

    # check if the file to be copied exists.
    if [ ! -f $source_file ]; then
        echo "file : " $source_file " to be copied does not exist"
        exit -1
    fi

    # check if the working dir exists.
    if [ ! -d "$working_dir" ]; then
        echo "the working directory : " $working_dir " does not exist"
        exit -1
    fi

    # check that the target name does not already exist; $result holds the copied file name (absolute path)
    result=${working_dir}/$(generate_datatime_random)
    while [ -f $result ]; do
        result=${working_dir}/$(generate_datatime_random)
    done

    # copy file
    cp ${source_file} ${result}

    echo ${result}
}

# this function generates a random string based on the current system timestamp.
# input parameter:
# N/A
# result: a random string of the form {date}_{nanosecond timestamp},
#         e.g. 20191101_1572566400123456789; the nanosecond part avoids name collisions.
function generate_datatime_random() {
    echo $(date -d -0days +%Y%m%d)_$(date +%s%N)
}

# replace the specified string with the target string in the provided file
# $1: the absolute path of the file to be modified.
# $2: the source string for the replacement.
# $3: the target string for the replacement.
# result: none
function func_repace_content() {
    if [ $# != 3 ]; then
        echo "missing parameter, type in like: func_repace_content  /opt/a.sql  @name  'lenmom'"
        exit -1
    fi

    echo "begin replacement"

    file_path=$1
    # be careful: the source string is interpreted by sed as a regular expression.
    source_content=$2
    replace_content=$3

    if [ ! -f $file_path ]; then
        echo "file : " $file_path " to be replaced does not exist"
        exit -1
    fi

    echo "replace all ["${source_content}"] in file: "${file_path}" to ["${replace_content}"]"
    # note: values containing "/" would need escaping, since "/" is the sed delimiter here.
    sed -i "s/${source_content}/${replace_content}/g" $file_path
}

# this function executes a doris sql script file
# Input parameters:
# $1: the absolute path of the .sql file to be executed.
# other parameters are optional; if provided, they are placeholder/value pairs to substitute into the script file before execution.
# result: 0 if execution succeeds; otherwise, -1.
function func_execute_doris_sql_script() {
    echo "input parameters: $@"

    parameter_number=$#
    if [ $parameter_number -lt 1 ]; then
        echo "missing parameter, must contain the script file to be executed. other parameters are optional,such as"
        echo "func_execute_doris_sql_script /opt/a.sql @name ‘lenmom‘"
        exit -1
    fi

    # copy the file to be executed and wait for parameter replacement.
    working_dir="$(
        cd $(dirname $0)
        pwd
    )"
    file_to_execute=$(func_copy_file "${working_dir}" "$1")
    if [ $? != 0 ]; then
        exit -1
    fi

    if [ $parameter_number -gt 1 ]; then
        # walk the trailing arguments in (placeholder, value) pairs via bash indirect
        # expansion, so any number of placeholder/value pairs is supported.
        for ((i = 2; i < parameter_number; i += 2)); do
            j=$((i + 1))
            func_repace_content "$file_to_execute" "${!i}" "${!j}"
        done
    fi

    if [ $? != 0 ]; then
        exit -1
    fi

    echo "begin to execute script in doris, the content is:"
    cat $file_to_execute
    echo

    MYSQL_HOME="$MYSQL_HOME_BIN/mysql"
    if [ ! -f $MYSQL_HOME ]; then
        # `which is {app_name}` return code is 1, so we should ignore it.
        MYSQL_HOME=$(which is mysql)
        # print mysql location in order to override the globle shell return code to 0 ($?)
        echo "mysql location is: "$MYSQL_HOME
    fi

    $MYSQL_HOME -h $DORIS_HOST -P $DORIS_PORT -u$DORIS_USER_NAME -p$DORIS_PASSWORD <"$file_to_execute"

    if [ $? != 0 ]; then
        rm -f $file_to_execute
        echo execute failed
        exit -1
    else
        rm -f $file_to_execute
        echo execute success
        exit 0
    fi
}

# this function loads data into doris by executing the specified load sql script file.
# Input parameters:
# $1: the absolute path of the .sql file to be executed.
# $2: the label placeholder to be replaced.
# result: 0 if execution succeeds; otherwise, -1.
function doris_load_data() {
    if [ $# -lt 2 ]; then
        echo "missing parameter, type in like: doris_load_data  /opt/a.sql  label_place_holder"
        exit -1
    fi

    if [ ! -f $1 ]; then
        echo "file : " $1 " to execute does not exist"
        exit -1
    fi

    func_execute_doris_sql_script "$@" $(generate_datatime_random)
}
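
As a quick sanity check, the replacement helper can also be exercised on its own; a minimal sketch with a hypothetical file and placeholder name:

source globle_config.sh
echo "SELECT * FROM t WHERE dt = '@run_date';" > /tmp/demo.sql
func_repace_content /tmp/demo.sql "@run_date" "20191101"
cat /tmp/demo.sql    # SELECT * FROM t WHERE dt = '20191101';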

2.2 Usage

SQL script file with the name load_user_label_from_hdfs.sql:

LOAD LABEL user_label.fct_usr_label_label_place_holder
( DATA INFILE("hdfs://nameservice1/user/hive/warehouse/usr_label.db/usr_label/*")
INTO TABLE fct_usr_label
COLUMNS TERMINATED BY "\\x01"
FORMAT AS "parquet"
(member_id ,mobile ,corp ,province ,channel_name ,new_usr_type ,gender ,age_type ,last_login_type)
)
WITH BROKER 'doris-hadoop'
(
"dfs.nameservices"="nameservice1",
"dfs.ha.namenodes.nameservice1"="namenodexxx,namenodexxx1",
"dfs.namenode.rpc-address.nameservice1.namenodexxx"="hadoop-datanode06:8020",
"dfs.namenode.rpc-address.nameservice1.namenodexxx1"="hadoop-namenode01:8020",
"dfs.client.failover.proxy.provider"="org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)
PROPERTIES ( "timeout"="3600", "max_filter_ratio"="0");

In this file, the load label has a placeholder named label_place_holder, whose value is passed in by the invoking shell file.

shell file:
user_label_load.sh

#!/bin/bash
CURRENT_DIR=$(cd `dirname $0`; pwd)
echo "CURRENT_DIR:"${CURRENT_DIR}

APPLICATION_ROOT_DIR=$(cd ${CURRENT_DIR}/..;pwd)
echo "APPLICATION_ROOT_DIR:"${APPLICATION_ROOT_DIR}

source ${APPLICATION_ROOT_DIR}/globle_config.sh
#load doris data by calling common shell function
doris_load_data  $CURRENT_DIR/load_user_label_from_hdfs.sql  "label_place_holder" 

Alternatively, we can call the underlying function directly:

func_execute_doris_sql_script $CURRENT_DIR/load_user_label_from_hdfs.sql "label_place_holder" $(generate_datatime_random)

If you have multiple parameters to pass in, just use:

func_execute_doris_sql_script {full_path_of_sql_file} "{parameter0_name}" "{parameter0_value}" \
                              "{parameter1_name}" "{parameter1_value}" \
                              "{parameter2_name}" "{parameter2_value}" \
                              ......
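
For example, to replace both a label placeholder and a partition placeholder in one call (the partition placeholder name is hypothetical):

func_execute_doris_sql_script $CURRENT_DIR/load_user_label_from_hdfs.sql \
    "label_place_holder" "$(generate_datatime_random)" \
    "partition_place_holder" "20191101"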

2.3 Execute

In the shell terminal, just executing the shell file is enough:

sh user_label_load.sh 

The shell file includes the parameters passed in when invoking the SQL script in Doris.

Query the load result in Doris:
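
For instance, the state of the broker load can be checked with SHOW LOAD, filtering on the label prefix used above (a sketch; adjust the database name to your environment):

SHOW LOAD FROM user_label WHERE LABEL LIKE "fct_usr_label_%" ORDER BY CreateTime DESC LIMIT 1;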
