Airflow: A Summary of Common Problems

Troubleshooting notes for common Airflow problems are collected below.


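The Airflow scheduler process hangs and appears dead after running a single task

This usually happens when the scheduler has generated a task but cannot publish it to the broker, and the logs contain no error message.

A likely cause is that the client library for the broker is not installed:

If Redis is the broker, run pip install apache-airflow[redis]

If RabbitMQ is the broker, run pip install apache-airflow[rabbitmq]

Also check that the scheduler node can actually reach RabbitMQ.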
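The connectivity check mentioned above can be sketched in a few lines of Python. This is a minimal sketch, not part of Airflow: the function name broker_reachable is hypothetical, and it only verifies that a TCP connection can be opened, not that AMQP authentication succeeds.

```python
import socket


def broker_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on the scheduler node, a call such as broker_reachable("rabbitmq-host", 5672) returns False when the broker port cannot be reached; the host name here is a placeholder, and 5672 is RabbitMQ's default AMQP port.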


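The scheduler node runs slowly when too many DAG files are defined

By default the Airflow scheduler starts two threads. This can be raised by editing the configuration file airflow.cfg (Airflow 2.0 renamed this option to parsing_processes):

[scheduler]
# The scheduler can run multiple threads in parallel to schedule dags.
# This defines how many threads will run.
# The default is 2; here it is raised to 100
max_threads = 100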
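To confirm which value a given airflow.cfg actually carries, the file can be read with Python's standard configparser. This is a rough sketch: the helper name scheduler_max_threads is hypothetical, and the fallback of 2 is taken from the default stated above.

```python
import configparser


def scheduler_max_threads(cfg_text, default=2):
    """Return [scheduler] max_threads from airflow.cfg text, or default if unset."""
    # Interpolation is disabled so that '%' characters elsewhere in the file
    # (for example in log format strings) do not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.getint("scheduler", "max_threads", fallback=default)


sample = """
[scheduler]
# The default is 2; here it is raised to 100
max_threads = 100
"""
print(scheduler_max_threads(sample))
```

For a real installation, pass the contents of the airflow.cfg file instead of the sample string.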

AirFlow: jinja2.exceptions.TemplateNotFound

This is a pitfall caused by Airflow using Jinja2 as its template engine: when bash_command calls a script directly, the command must end with a trailing space:

  • When the bash_command attribute of BashOperator is just the path of a bash script, add a space after the script name. Otherwise Airflow treats a value ending in .sh as a template file and tries to load it through Jinja, which fails.

from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

t2 = BashOperator(
    task_id='sleep',
    # bash_command="/home/batcher/test.sh" fails with a `Jinja template not found` error
    # The trailing space after the script name makes it work
    bash_command="/home/batcher/test.sh ",
    dag=dag)
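If many DAGs call scripts directly, the trailing space can be added in one place. The helper below is a hypothetical sketch, not an Airflow API; the extension tuple is an assumption meant to mirror the file extensions that BashOperator treats as templates.

```python
# Assumption: these are the extensions BashOperator resolves as Jinja template files.
TEMPLATE_EXTENSIONS = (".sh", ".bash")


def safe_bash_command(command):
    """Append a trailing space when the command ends in a template extension."""
    if command.endswith(TEMPLATE_EXTENSIONS):
        return command + " "
    return command
```

It would be used as bash_command=safe_bash_command("/home/batcher/test.sh"); commands that do not end in a script extension are returned unchanged.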

References:

https://stackoverflow.com/questions/42147514/templatenotfound-error-when-running-simple-airflow-bashoperator

https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls


AirFlow: Task is not able to be run

After running for a while, a task suddenly can no longer be executed, and the worker log shows the following message:

[2018-05-25 17:22:05,068] {jobs.py:2508} INFO - Task is not able to be run

Check the execution log of that task:

cat /home/py/airflow-home/logs/testBashOperator/print_date/2018-05-25T00:00:00/6.log
...
[2018-05-25 17:22:05,067] {models.py:1190} INFO - Dependencies not met for <TaskInstance: testBashOperator.print_date 2018-05-25 00:00:00 [success]>,
dependency 'Task Instance State' FAILED: Task is in the 'success' state which is not a valid state for execution. The task must be cleared in order to be run.

The message shows that the 'Task Instance State' dependency check failed: the task instance is already in the 'success' state, so it will not run again until it is cleared. There are two ways to handle this:

  • Tell airflow run to ignore the dependencies when running the task:

    $ airflow run -A dag_id task_id execution_date
  • Clear the task state with airflow clear dag_id:
    $ airflow clear -u testBashOperator
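When many task logs have to be checked, the blocked task can be pulled out of the log line with a regular expression. This is a sketch under assumptions: the pattern is written against the log line quoted above (with plain ASCII quotes), it assumes the dag_id contains no dots, and both function names are hypothetical.

```python
import re

# Matches the "Dependencies not met" line format quoted in this section.
PATTERN = re.compile(
    r"Dependencies not met for <TaskInstance: "
    r"(?P<dag_id>[^.\s]+)\.(?P<task_id>\S+) "
    r"(?P<execution_date>.+?) \[(?P<state>\w+)\]>"
)


def parse_blocked_task(log_line):
    """Return dag_id, task_id, execution_date and state, or None if no match."""
    match = PATTERN.search(log_line)
    return match.groupdict() if match else None


def clear_command(info):
    """Build the same 'airflow clear -u <dag_id>' command used above."""
    return ["airflow", "clear", "-u", info["dag_id"]]
```

The command is returned as an argument list so it can be reviewed before being passed to subprocess.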

CELERY: PRECONDITION_FAILED - inequivalent arg 'x-expires' for queue '[email protected]' in vhost ''

After upgrading to Celery 4.x, running tasks with RabbitMQ as the broker raises the following exception:

[2018-06-29 09:32:14,622: CRITICAL/MainProcess] Unrecoverable error: PreconditionFailed(406, "PRECONDITION_FAILED - inequivalent arg 'x-expires' for queue '[email protected]
SZ-L01395.celery.pidbox' in vhost '/': received the value '10000' of type 'signedint' but current is none", (50, 10), 'Queue.declare')
Traceback (most recent call last):
  File "c:\programdata\anaconda3\lib\site-packages\celery\worker\worker.py", line 205, in start
    self.blueprint.start(self)
.......
  File "c:\programdata\anaconda3\lib\site-packages\amqp\channel.py", line 277, in _on_close
    reply_code, reply_text, (class_id, method_id), ChannelError,
amqp.exceptions.PreconditionFailed: Queue.declare: (406) PRECONDITION_FAILED - inequivalent arg 'x-expires' for queue '[email protected]' in vhost '/'
: received the value '10000' of type 'signedint' but current is none

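This error generally means that the RabbitMQ client and server disagree on the queue arguments: the queue already exists on the server with one set of arguments and the client declares it again with different ones. Making the arguments consistent resolves it.

For example, the message here names x-expires, which corresponds to the Celery setting control_queue_expires. Adding control_queue_expires = None to the configuration file is therefore enough.

Celery 3.x does not have these two settings. In 4.x they must be kept consistent with the existing queue, otherwise the exception above is raised.

The two RabbitMQ arguments I ran into map to Celery settings as follows:

RabbitMQ argument    Celery 4.x setting
x-expires            control_queue_expires
x-message-ttl        control_queue_ttl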
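The mapping above can be turned into a small lookup that names the Celery setting to review for a given error message. This is an illustrative sketch: the function name is hypothetical, only the two arguments from the table are covered, and the message is assumed to use plain ASCII quotes.

```python
import re

# The two mappings listed in the table above.
QUEUE_ARG_TO_CELERY_SETTING = {
    "x-expires": "control_queue_expires",
    "x-message-ttl": "control_queue_ttl",
}


def setting_for_error(message):
    """Return the Celery setting that matches the argument named in the error."""
    match = re.search(r"inequivalent arg '([^']+)'", message)
    if match is None:
        return None
    # Arguments outside the table above yield None.
    return QUEUE_ARG_TO_CELERY_SETTING.get(match.group(1))
```

A result of None means the message either has a different format or names an argument that this table does not cover.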

CELERY: The AMQP result backend is scheduled for deprecation in version 4.0 and removal in version v5.0. Please use RPC backend or a persistent backend

After upgrading Celery to 4.x, running it emits the following warning:

/anaconda/anaconda3/lib/python3.6/site-packages/celery/backends/amqp.py:67: CPendingDeprecationWarning:
    The AMQP result backend is scheduled for deprecation in     version 4.0 and removal in version v5.0.     Please use RPC backend or a persistent backend.
  alternative='Please use RPC backend or a persistent backend.')

Cause:

In Celery 4.0 the way result_backend is configured for RabbitMQ changed.

Previously it was set the same way as the broker:

result_backend = 'amqp://guest:[email protected]:5672//'

Now the RPC backend is used instead:

result_backend = 'rpc://'
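The change can be expressed as a small rewrite over a settings dictionary. This is only a sketch: the function name is hypothetical, it assumes the settings are held in a plain dict, and it leaves any backend that is not an amqp:// URL untouched.

```python
def migrate_result_backend(settings):
    """Return a copy of settings with an amqp:// result_backend replaced by rpc://."""
    updated = dict(settings)
    backend = updated.get("result_backend", "")
    if backend.startswith("amqp://"):
        # The RPC backend sends results over the broker connection,
        # so the URL carries no host or credentials.
        updated["result_backend"] = "rpc://"
    return updated
```

The original dictionary is not modified, which makes it easy to compare the old and new configuration before applying it.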

References:

http://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-event_queue_prefix


CELERY: ValueError('not enough values to unpack (expected 3, got 0)',)

Running Celery 4.x on Windows raises the following error:

[2018-07-02 10:54:17,516: ERROR/MainProcess] Task handler raised error: ValueError('not enough values to unpack (expected 3, got 0)',)
Traceback (most recent call last):
    ......
    tasks, accept, hostname = _loc
ValueError: not enough values to unpack (expected 3, got 0)

Celery 4.x does not currently support Windows. For debugging purposes, it can still be run on Windows by swapping Celery's worker pool implementation for eventlet:

pip install eventlet
celery -A <module> worker -l info -P eventlet
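A launcher script can pick the pool by platform so the same script works on both a Windows debugging machine and a Linux worker. This is a sketch with a hypothetical function name; it assumes prefork is the pool wanted everywhere except Windows.

```python
import sys


def worker_command(module, platform=sys.platform):
    """Build the celery worker command, using eventlet only on Windows."""
    pool = "eventlet" if platform.startswith("win") else "prefork"
    return ["celery", "-A", module, "worker", "-l", "info", "-P", pool]
```

The returned list can be handed to subprocess.run; eventlet still has to be installed with pip on the Windows machine.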

References:

https://stackoverflow.com/questions/45744992/celery-raises-valueerror-not-enough-values-to-unpack

https://blog.csdn.net/qq_30242609/article/details/79047660


Airflow: ERROR - 'DisabledBackend' object has no attribute '_get_task_meta_for'

Airflow raises the following exception while running:

Traceback (most recent call last):
  File "/anaconda/anaconda3/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 83, in sync
......
    return self._maybe_set_cache(self.backend.get_task_meta(self.id))
  File "/anaconda/anaconda3/lib/python3.6/site-packages/celery/backends/base.py", line 307, in get_task_meta
    meta = self._get_task_meta_for(task_id)
AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for'
[2018-07-04 10:52:14,746] {celery_executor.py:101} ERROR - Error syncing the celery executor, ignoring it:
[2018-07-04 10:52:14,746] {celery_executor.py:102} ERROR - 'DisabledBackend' object has no attribute '_get_task_meta_for'

There are two possible causes for this error:

  1. The CELERY_RESULT_BACKEND setting is missing or misconfigured.
  2. The Celery version is too old. For example, Airflow 1.9.0 needs Celery 4.x, so check the installed Celery version and keep the two compatible.
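The second cause can be checked with a short version test. This is a sketch: the function names are hypothetical, and the only rule encoded is the one stated above, namely that Airflow 1.9.0 is paired with Celery 4.x.

```python
def celery_major_version(version):
    """Return the major component of a version string such as '4.1.1'."""
    return int(version.split(".")[0])


def compatible_with_airflow_1_9(version):
    """True when the Celery version is in the 4.x series."""
    return celery_major_version(version) == 4


# On Python 3.8+, the installed version string can be obtained with
# importlib.metadata.version("celery") and passed to the function above.
```

The check is kept separate from the lookup of the installed version so it can be run against any version string.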

airflow.exceptions.AirflowException dag_id could not be found xxxx. Either the dag did not exist or it failed to parse

Check the worker log airflow-worker.err:

airflow.exceptions.AirflowException: dag_id could not be found: bmhttp. Either the dag did not exist or it failed to parse.
[2018-07-31 17:37:34,191: ERROR/ForkPoolWorker-6] Task airflow.executors.celery_executor.execute_command[181c78d0-242c-4265-aabe-11d04887f44a] raised unexpected: AirflowException('Celery command failed',)
Traceback (most recent call last):
  File "/anaconda/anaconda3/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 52, in execute_command
    subprocess.check_call(command, shell=True)
  File "/anaconda/anaconda3/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'airflow run bmhttp get_op1 2018-07-26T06:28:00 --local -sd /home/ignite/airflow/dags/BenchMark01.py' returned non-zero exit status 1.

The Command in the exception log shows that when the scheduler generates a task message it also specifies the path of the script to execute (through the -sd option). The DAG file must therefore sit at the same path on the scheduler node and on every worker node; otherwise the error above occurs.
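To verify this on a worker, the DAG file path can be extracted from the failing command and checked on the local filesystem. This is a sketch with hypothetical function names; it assumes the command uses the -sd option (or its long form --subdir) exactly as in the log above.

```python
import os
import shlex


def dag_file_from_command(command):
    """Return the path that follows -sd/--subdir in an 'airflow run' command."""
    args = shlex.split(command)
    for flag in ("-sd", "--subdir"):
        if flag in args:
            index = args.index(flag)
            if index + 1 < len(args):
                return args[index + 1]
    return None


def dag_file_present(command):
    """True if the DAG file named in the command exists on this machine."""
    path = dag_file_from_command(command)
    return path is not None and os.path.isfile(path)
```

Running dag_file_present on the worker with the command copied from the log shows directly whether the worker can see the file at the path the scheduler sent.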

https://stackoverflow.com/questions/43235130/airflow-dag-id-could-not-be-found




Original article: https://www.cnblogs.com/cord/p/9397584.html

Time: 2024-11-08 23:58:43
