Running SQL from a shell script with Spark SQL

[Author]: kwu

This post shows how to run SQL from a shell script with Spark SQL. The spark-sql command line provides -e, -f, and -i options similar to those of hive.
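As a rough illustration of the three options, the sketch below wraps each one in a small shell function. The function names and the SPARK_SQL variable are hypothetical helpers invented for this example, not part of Spark; the default binary path is the one used later in this post.

```shell
#!/bin/sh
# Path to the spark-sql launcher (the path used in this post); override as needed.
SPARK_SQL="${SPARK_SQL:-/opt/modules/spark/bin/spark-sql}"

# -e: run a SQL string given on the command line.
run_sql_string() {
    "$SPARK_SQL" -e "$1"
}

# -f: run the SQL statements stored in a file.
run_sql_file() {
    "$SPARK_SQL" -f "$1"
}

# -i: run an initialization file first (for example UDF registration),
# then run the SQL string.
run_sql_with_init() {
    "$SPARK_SQL" -i "$1" -e "$2"
}
```

A call such as run_sql_with_init /opt/bin/spark_opt/init.sql "select 1" would then load the init file before running the query.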

1. The scheduled script

#!/bin/sh
# Refresh st.stock_realtime_analysis with spark-sql, then export the partition to MySQL with sqoop

yesterday=$(date --date='1 days ago' +%Y%m%d)

/opt/modules/spark/bin/spark-sql -i /opt/bin/spark_opt/init.sql \
  --master spark://10.130.2.20:7077 \
  --executor-memory 6g --total-executor-cores 45 \
  --conf spark.ui.port=4075 \
  -e "insert overwrite table st.stock_realtime_analysis PARTITION (DTYPE='01')
  select t1.stockId as stockId,
         t1.url as url,
         t1.clickcnt as clickcnt,
         0,
         round((t1.clickcnt / (case when t2.clickcntyesday is null then 0 else t2.clickcntyesday end) - 1) * 100, 2) as LPcnt,
         '01' as type,
         t1.analysis_date as analysis_date,
         t1.analysis_time as analysis_time
    from (select stock_code stockId,
                 concat('http://stockdata.stock.hexun.com/', stock_code, '.shtml') url,
                 count(1) clickcnt,
                 substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 1, 10) analysis_date,
                 substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 12, 8) analysis_time
            from dms.tracklog_5min
           where stock_type = 'STOCK'
             and day =
                 substr(from_unixtime(unix_timestamp(), 'yyyyMMdd'), 1, 8)
           group by stock_code
           order by clickcnt desc limit 20) t1
    left join (select stock_code stockId, count(1) clickcntyesday
                 from dms.tracklog_5min a
                where stock_type = 'STOCK'
                  and substr(datetime, 1, 10) = date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 1)
                  and substr(datetime, 12, 5) < substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 12, 5)
                  and day = '${yesterday}'
                group by stock_code) t2
      on t1.stockId = t2.stockId;
  "

sqoop export --connect jdbc:mysql://10.130.2.245:3306/charts \
  --username guojinlian --password Abcd1234 \
  --table stock_realtime_analysis \
  --fields-terminated-by '\001' \
  --columns "stockid,url,clickcnt,splycnt,lpcnt,type" \
  --export-dir /dw/st/stock_realtime_analysis/dtype=01
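The script above starts sqoop export whether or not the spark-sql step succeeded. Below is a minimal sketch of one way to stop early, assuming spark-sql returns a nonzero exit status when the statement fails. run_step is a hypothetical helper, and $sql and $export_dir in the usage comment stand in for the values from the script.

```shell
#!/bin/sh
# Hypothetical helper: run a command, log a message if it fails,
# and hand its exit status back to the caller.
run_step() {
    step_name=$1
    shift
    "$@"
    step_status=$?
    if [ "$step_status" -ne 0 ]; then
        echo "step '$step_name' failed with exit status $step_status" >&2
    fi
    return "$step_status"
}

# Usage sketch ($sql and $export_dir are placeholders):
#   run_step spark-sql /opt/modules/spark/bin/spark-sql -i /opt/bin/spark_opt/init.sql -e "$sql" || exit 1
#   run_step sqoop sqoop export --export-dir "$export_dir" || exit 1
```

With this pattern the export only runs on data that the current run actually refreshed, and a failed run shows up as a nonzero exit status to whatever scheduler calls the script.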

init.sql loads the UDFs:

add jar /opt/bin/UDF/hive-udf.jar;
create temporary function udtf_stockidxfund as 'com.hexun.hive.udf.stock.UDTFStockIdxFund';
create temporary function udf_getbfhourstime as 'com.hexun.hive.udf.time.UDFGetBfHoursTime';
create temporary function udf_getbfhourstime2 as 'com.hexun.hive.udf.time.UDFGetBfHoursTime2';
create temporary function udf_stockidxfund as 'com.hexun.hive.udf.stock.UDFStockIdxFund';
create temporary function udf_md5 as 'com.hexun.hive.udf.common.HashMD5UDF';
create temporary function udf_murhash as 'com.hexun.hive.udf.common.HashMurUDF';
create temporary function udf_url as 'com.hexun.hive.udf.url.UDFUrl';
create temporary function url_host as 'com.hexun.hive.udf.url.UDFHost';
create temporary function udf_ip as 'com.hexun.hive.udf.url.UDFIP';
create temporary function udf_site as 'com.hexun.hive.udf.url.UDFSite';
create temporary function udf_UrlDecode as 'com.hexun.hive.udf.url.UDFUrlDecode';
create temporary function udtf_url as 'com.hexun.hive.udf.url.UDTFUrl';
create temporary function udf_ua as 'com.hexun.hive.udf.useragent.UDFUA';
create temporary function udf_ssh as 'com.hexun.hive.udf.useragent.UDFSSH';
create temporary function udtf_ua as 'com.hexun.hive.udf.useragent.UDTFUA';
create temporary function udf_kw as 'com.hexun.hive.udf.url.UDFKW';
create temporary function udf_chdecode as 'com.hexun.hive.udf.url.UDFChDecode';
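Because every run depends on this file and on the jar it adds, a pre-flight check before launching spark-sql can help. The sketch below is only an illustration: list_init_jars and check_init_sql are hypothetical helpers, and the pattern assumes each jar is added with a lowercase "add jar <path>;" line, as in the file above.

```shell
#!/bin/sh
# Print the jar paths named by "add jar <path>;" lines in an init file.
list_init_jars() {
    sed -n 's/^[[:space:]]*add jar[[:space:]][[:space:]]*\([^;[:space:]]*\)[[:space:]]*;.*$/\1/p' "$1"
}

# Return 0 only if the init file is readable and every jar it adds exists.
# Paths containing whitespace are not supported by this simple loop.
check_init_sql() {
    init_file=$1
    [ -r "$init_file" ] || { echo "cannot read $init_file" >&2; return 1; }
    missing=0
    for jar in $(list_init_jars "$init_file"); do
        [ -f "$jar" ] || { echo "missing jar: $jar" >&2; missing=1; }
    done
    return "$missing"
}
```

A script could call check_init_sql /opt/bin/spark_opt/init.sql || exit 1 ahead of the spark-sql invocation.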

Setting the UI port

--conf spark.ui.port=4075

The default is 4040, which would conflict with other running jobs, so it is changed to 4075 here.

Setting the memory and CPU resources used by the job

--executor-memory 6g --total-executor-cores 45
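The UI port and resource flags can be kept in one place. The sketch below is a hypothetical wrapper: the function name and the variables SPARK_MASTER, EXECUTOR_MEMORY, TOTAL_EXECUTOR_CORES, and SPARK_UI_PORT are invented for illustration. It defaults to the values used in this post and lets an individual job override them through the environment.

```shell
#!/bin/sh
# Path to the spark-sql launcher (the path used in this post); override as needed.
SPARK_SQL="${SPARK_SQL:-/opt/modules/spark/bin/spark-sql}"

# Run a SQL string with the tuning flags from this post as defaults.
# Each ${VAR:-default} falls back to the value from the post when VAR is unset.
run_tuned_sql() {
    "$SPARK_SQL" \
        --master "${SPARK_MASTER:-spark://10.130.2.20:7077}" \
        --executor-memory "${EXECUTOR_MEMORY:-6g}" \
        --total-executor-cores "${TOTAL_EXECUTOR_CORES:-45}" \
        --conf "spark.ui.port=${SPARK_UI_PORT:-4075}" \
        -e "$1"
}
```

A second scheduled job could then export SPARK_UI_PORT with a different value before calling run_tuned_sql, so that two jobs running at the same time do not ask for the same port.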

The statement was originally run with hive -e; after switching to Spark it became much faster.

It used to take 15 min and now takes 45 s.
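To reproduce this kind of comparison, each command can be timed from the shell. The helpers below are a hypothetical sketch: time_cmd and speedup are not from the original script, and they use whole seconds from date +%s, which is coarse but adequate at this scale.

```shell
#!/bin/sh
# Run a command, print how many whole seconds it took to stderr,
# and return the command's own exit status.
time_cmd() {
    start_ts=$(date +%s)
    "$@"
    cmd_status=$?
    end_ts=$(date +%s)
    echo "elapsed: $((end_ts - start_ts))s" >&2
    return "$cmd_status"
}

# Integer speedup factor from two durations given in seconds.
speedup() {
    echo $(( $1 / $2 ))
}

# The figures quoted above: 15 min = 900 s against 45 s, prints 20.
speedup $((15 * 60)) 45
```

With the numbers quoted above, the move from hive -e to spark-sql corresponds to roughly a 20x reduction in run time.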

Date: 2024-11-08 19:08:25
