The best way to manage dependencies between DAGs in Airflow
Feb 11, 2019
Airflow provides a few different sensors and operators which enable you to coordinate scheduling between different DAGs, including:
Which one is the best to use?
I have previously written about how to use ExternalTaskSensor in Airflow but have since realized that this is not always the best tool for the job. Depending on your specific decision criteria, one of the other approaches may be more suitable to your problem.
I need the ability to sometimes run dag_B independent of dag_A, but I want to share state (history) between them.
SubDagOperator creates a tidy parent–child relationship between your DAGs. The sub-DAGs will not appear in the top-level UI of Airflow, but rather nested within the parent DAG, accessible via a Zoom into Sub DAG button. This is a nice feature if those DAGs are always run together. However, if you sometimes need to run the sub-DAG alone, you will need to initialize it as its own top-level DAG, and that top-level DAG will not share state with the sub-DAG.
In this scenario, you are better off using either
My local development or test environment uses SQLite rather than a Postgres DB.
SQLite does not support concurrent write operations, so it forces Airflow to use the SequentialExecutor, meaning only one task can be active at any given time. An ExternalTaskSensor occupies that single worker slot while it waits for the upstream task, so the upstream task never gets a chance to run and your Airflow will be deadlocked.
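To make the deadlock concrete, here is a toy model in plain Python. It is only a sketch and does not use Airflow: the `run_single_slot` helper, the task names, and the tick limit are all made up for illustration, and the real scheduler is considerably more involved.

```python
from collections import deque


def run_single_slot(queue, max_ticks=5):
    """Toy one-slot executor (not Airflow code).

    `queue` is a list of (task_name, waits_for) pairs, where `waits_for` is
    the name of a task that must finish first, or None.
    Returns the names of the tasks that finished, in order.
    """
    finished = []
    pending = deque(queue)
    ticks = 0
    while pending and ticks < max_ticks:
        name, waits_for = pending[0]
        ticks += 1
        if waits_for is None or waits_for in finished:
            # The task in the slot can complete and free the slot.
            finished.append(name)
            pending.popleft()
        # Otherwise the "sensor" keeps the only slot and pokes again next tick.
    return finished


# Sensor gets the slot first: nothing ever finishes within the tick limit.
stuck = run_single_slot([("wait_for_dag_A", "dag_A_task"), ("dag_A_task", None)])

# Upstream task gets the slot first: both finish.
ok = run_single_slot([("dag_A_task", None), ("wait_for_dag_A", "dag_A_task")])
```

In the first call the waiting task holds the only slot until the tick limit is reached, which mirrors a sensor blocking the executor until it times out.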
I want dag_B to sometimes run depending on some conditional logic
If you want to include conditional logic, you can pass a Python function to TriggerDagRunOperator which determines which DAG is actually triggered (if at all).
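As a rough sketch of what such a function can look like, assuming the Airflow 1.x interface in which `python_callable` receives `(context, dag_run_obj)` and returns the object to trigger the run or `None` to skip it: the function name `trigger_if_new_data`, the `new_data_available` flag, and the payload contents are hypothetical. The operator wiring is shown only as a comment because it needs an Airflow installation, and later Airflow versions changed this interface.

```python
from types import SimpleNamespace


def trigger_if_new_data(context, dag_run_obj):
    """Return dag_run_obj to trigger the target DAG, or None to skip it."""
    # 'new_data_available' is a hypothetical flag passed via the operator's params.
    if context["params"].get("new_data_available"):
        dag_run_obj.payload = {"triggered_by": "dag_A"}
        return dag_run_obj
    return None


# Wiring in Airflow 1.x (not executed here):
# from airflow.operators.dagrun_operator import TriggerDagRunOperator
# trigger_dag_B = TriggerDagRunOperator(
#     task_id="trigger_dag_B",
#     trigger_dag_id="dag_B",
#     python_callable=trigger_if_new_data,
#     params={"new_data_available": True},
#     dag=dag,
# )

# Stand-in for the object Airflow passes as dag_run_obj, for a quick local check.
fake_run = SimpleNamespace(run_id="manual_check", payload=None)
decision = trigger_if_new_data({"params": {"new_data_available": True}}, fake_run)
skipped = trigger_if_new_data({"params": {"new_data_available": False}}, fake_run)
```

Keeping the decision in a plain function like this means the condition can be checked locally, without a scheduler or database.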