Some Ways to Customize the Configuration of Oozie Coordinator Jobs

What is the Oozie coordinator for?

The Oozie Coordinator system allows the user to define and execute recurrent and interdependent workflow jobs (data application pipelines).

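Put simply, it lets you organize individual workflow jobs. For example, when job A finishes it produces output, and that output triggers job B. The two workflow jobs A and B can then be tied together by a single coordinator job.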

What is a coordinator job?

Coordinator Job: A coordinator job is an executable instance of a coordination definition. A job submission is done by submitting a job configuration that resolves all parameters in the application definition.

This means a coordinator job also needs its parameters configured. Just as a workflow job is submitted with a workflow.xml, a coordinator job has a configuration file named coordinator.xml.

What is a coordinator action?

Coordinator Action: A coordinator action is a workflow job that is started when a set of conditions are met (input dataset instances are available).

A coordinator action is essentially a workflow job!
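
To make "input dataset instances are available" concrete, here is a minimal sketch of a coordinator that starts workflow B only once job A's daily output directory exists. The names job-b-coord, job-a-output and workflowBAppUri, as well as the /data/job-a path, are hypothetical and not from the original example. By default Oozie treats a dataset instance as available when a _SUCCESS file is present in its directory.

```xml
<!-- Sketch only: all names and paths below are assumptions for illustration. -->
<coordinator-app name="job-b-coord" frequency="${coord:days(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <datasets>
        <!-- One instance of job A's output per day, resolved from the template -->
        <dataset name="job-a-output" frequency="${coord:days(1)}"
                 initial-instance="${start}" timezone="UTC">
            <uri-template>${nameNode}/data/job-a/${YEAR}/${MONTH}/${DAY}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <!-- The action waits until the current day's instance is available -->
        <data-in name="input" dataset="job-a-output">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${workflowBAppUri}</app-path>
        </workflow>
    </action>
</coordinator-app>
```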

Coordinator Application: A coordinator application defines the conditions under which coordinator actions should be created (the frequency) and when the actions can be started. The coordinator application also defines a start and an end time. Normally, coordinator applications are parameterized. A Coordinator application is written in XML.

The coordinator application manages the coordinator actions. It has a start time and an end time, and it controls when the actions it defines are started and stopped.

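I had been stuck on this question for a while: how do you configure an Oozie coordinator job?
Here are my notes.
Oozie ships an official example configuration file for a scheduled job. Its content is as follows: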

<coordinator-app name="cron-coord" frequency="${coord:minutes(10)}" start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>

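As you can see, frequency is hard-coded here: the job runs every ten minutes. The start and end times are given as variables: start="${start}" end="${end}".
These two variables can be set in the job.properties file or passed as parameters when submitting from the command line. A job can also be submitted, together with its parameters, through an HTTP POST request (Oozie provides a Web Services API).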
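
As a sketch of how these variables might be supplied: the Oozie CLI's -config option accepts either a .properties or an .xml file, and the Web Services API takes an XML configuration as the body of the POST request. The user name, host names, paths and dates below are placeholder assumptions, not values from the original setup.

```xml
<!-- Hypothetical job configuration; every value is a placeholder. -->
<configuration>
    <property>
        <name>user.name</name>
        <value>someuser</value>
    </property>
    <!-- HDFS directory that contains coordinator.xml -->
    <property>
        <name>oozie.coord.application.path</name>
        <value>hdfs://namenode.example.com:8020/user/someuser/apps/cron-coord</value>
    </property>
    <!-- Variables referenced by the coordinator.xml shown above -->
    <property>
        <name>workflowAppUri</name>
        <value>hdfs://namenode.example.com:8020/user/someuser/apps/cron-wf</value>
    </property>
    <property>
        <name>nameNode</name>
        <value>hdfs://namenode.example.com:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker.example.com:8032</value>
    </property>
    <!-- UTC timestamps in the form yyyy-MM-ddTHH:mmZ -->
    <property>
        <name>start</name>
        <value>2024-10-19T13:00Z</value>
    </property>
    <property>
        <name>end</name>
        <value>2024-10-19T14:00Z</value>
    </property>
</configuration>
```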

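In fact, frequency can be replaced by a variable just like start and end. That solves the problem from my earlier post: submitting a job on a schedule and having it run only once.
Watch the format of frequency, though: it must be an integer or a valid cron expression. Otherwise you get the following error: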
Invalid coordinator application attributes, parameter [frequency] = [10 * ? ? ?] must be an integer or a cron syntax. Parsing error For input string: "10 * ? ? ?"
For cron expressions, see the Quartz documentation, since Oozie's scheduling is built on it.
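
Parameterizing frequency as described above might look like the following sketch. The variable name frequency is an assumption; it would be supplied at submission time in the same way as start and end, with a value such as 10 * * * * (minute 10 of every hour).

```xml
<!-- Sketch: the official example with frequency turned into a variable. -->
<coordinator-app name="cron-coord" frequency="${frequency}" start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```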
I also ran into this problem:
Coordinator job with frequency '10 * * * *' materializes no actions between start and end time.
The Oozie source code shows that the exception is thrown by the following code:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
// No next run time could be computed from the cron expression.
if (nextTime == null) {
    throw new IllegalArgumentException("Invalid coordinator cron frequency: " + coordJob.getFrequency());
}
// nextTime is not before the end time, so no action falls between start and end.
if (!nextTime.before(coordJob.getEndTime())) {
    throw new IllegalArgumentException("Coordinator job with frequency '" +
            coordJob.getFrequency() + "' materializes no actions between start and end time.");
}
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This happened because my start time and end time were set badly: the next run time was not before the end time, so the following error was raised.
Coordinator job with frequency '10 * * * *' materializes no actions between start and end time.
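
Based on the check quoted above, a window that avoids this error might look like the following sketch. The dates are illustrative assumptions. With frequency 10 * * * * and a start of 13:00Z, the next matching time is 13:10Z, which is before the 14:00Z end, so the check should pass.

```xml
<!-- Illustrative values only: 13:10Z lies after start and before end. -->
<coordinator-app name="cron-coord" frequency="10 * * * *"
                 start="2024-10-19T13:00Z" end="2024-10-19T14:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
        </workflow>
    </action>
</coordinator-app>
```

If end were 2024-10-19T13:05Z instead, 13:10Z would no longer be before the end time and the exception would be thrown.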

For the detailed rules, see the documentation on the official Oozie website.

Time: 2024-10-19 13:50:37
