nested operators airflow, good practice?
My goal is to pass a string of a file path from one @task into EmailOperator tasks, so I can apply logic to the dataset that I will read from the file path to build up the operators that send emails. My code looks like this:
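One common pattern for this (a sketch, since the actual DAG code isn't shown) is to have the @task read the file and return a list of per-email keyword dicts, then feed that into `EmailOperator.partial(...).expand_kwargs(...)` via dynamic task mapping. The file format and column names below are assumptions:

```python
# Hypothetical sketch: turn a dataset (here, CSV text with assumed columns
# "recipient" and "subject") into a list of kwargs dicts that dynamic task
# mapping can expand into one EmailOperator per row.
import csv
import io

def build_email_kwargs(csv_text):
    """Parse rows of (recipient, subject) into EmailOperator kwargs."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [{"to": r["recipient"], "subject": r["subject"]} for r in rows]

sample = "recipient,subject\na@example.com,Report A\nb@example.com,Report B\n"
kwargs_list = build_email_kwargs(sample)

# Inside the DAG, the @task would return kwargs_list and you would write
# something like:
#   EmailOperator.partial(task_id="send", html_content="...").expand_kwargs(read_task())
# instead of instantiating operators inside the task (nesting operators in a
# @task is generally discouraged; mapping keeps them visible to the scheduler).
```

The key design point is that the @task only produces plain data (the kwargs list); the operators themselves stay at the DAG level where Airflow can schedule and retry them individually.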
Airflow dag, wait for certain period if triggered at a certain time
I have an Airflow DAG, dag-A, that is triggered from another DAG. Sometimes dag-A is triggered at 4 PM UTC (midnight EST); when that happens, I want it to wait for 30 minutes and then start running at 16:30 UTC.
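One way to approach this (a sketch under the assumption that the 16:00-16:30 UTC window is the only blackout) is a small pure function that computes how long to wait given the trigger time; a first task in dag-A could then sleep for that duration, or the value could drive a `TimeDeltaSensor`:

```python
from datetime import datetime, timedelta, timezone

def seconds_to_wait(trigger_dt, blackout_start_hour=16, wait_minutes=30):
    """If triggered inside the 16:00-16:30 UTC window, return the seconds
    remaining until 16:30 UTC; otherwise return 0 (no delay)."""
    start = trigger_dt.replace(hour=blackout_start_hour, minute=0,
                               second=0, microsecond=0)
    end = start + timedelta(minutes=wait_minutes)
    if start <= trigger_dt < end:
        return (end - trigger_dt).total_seconds()
    return 0.0

# A run triggered exactly at 16:00 UTC waits the full 30 minutes:
delay = seconds_to_wait(datetime(2024, 1, 1, 16, 0, tzinfo=timezone.utc))
# In the DAG this could back a PythonOperator that calls time.sleep(delay),
# or a DateTimeSensor targeting 16:30 UTC on days the condition holds.
```

Keeping the wait logic in a plain function makes it easy to unit-test independently of Airflow's scheduler.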
Apache Airflow Unable To Run It
I have been trying to install Apache Airflow on my local Mac machine for weeks. No tutorial has been helpful; I have read the Airflow website docs through and through, copied the step-by-step processes, and used Claude and ChatGPT, but I still can't get it running.
TypeError: process_file() takes 1 positional argument but 5 were given in Airflow DAG
I’m working on an Airflow DAG where I want to process a list of files, allowing multiple files to be processed in parallel. I created the following DAG, but I encounter a TypeError when running it:
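The full DAG isn't shown, but this exact error usually comes from how `PythonOperator` unpacks `op_args`: it calls `python_callable(*op_args)`, so passing a list of 5 files as `op_args` turns each file into a separate positional argument. A minimal reproduction and fix, assuming the callable expects one list:

```python
# Assumed shape of the callable from the question: it takes a single list.
def process_file(files):
    """Process a list of file names (stand-in logic for illustration)."""
    return [f.upper() for f in files]

files = ["a.csv", "b.csv", "c.csv", "d.csv", "e.csv"]

# op_args=files is equivalent to process_file(*files): 5 positional arguments
# for a 1-parameter function -> "takes 1 positional argument but 5 were given".
# op_args=[files] is equivalent to process_file(*[files]): one list argument.
result = process_file(*[files])

# In the DAG:
#   PythonOperator(task_id="proc", python_callable=process_file, op_args=[files])
# or, for true per-file parallelism, map over the list instead:
#   process_one.expand(file=files)   # with process_one taking a single file
```

If the goal is parallel processing, dynamic task mapping (one mapped task instance per file) is usually cleaner than handing the whole list to one task.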
Delay evaluation of CLI arg for airflow DAG’s job to runtime?
I have a DAG defined via Airflow. For simplicity, let’s assume this DAG has two nodes, A and B, and the edge A->B.
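The usual way to defer a value to runtime in Airflow is to pass a Jinja template string (e.g. `"{{ dag_run.conf['input_path'] }}"`) into a templated operator field, so nothing is evaluated at DAG parse time. The deferral idea can be sketched without Airflow as a resolver that is only called when the task runs; the key name and default below are assumptions:

```python
# Sketch: build a callable at DAG-definition time, but resolve the CLI-provided
# value only when it is invoked (i.e., at task runtime). In real Airflow the
# same effect comes from templating, e.g. "{{ dag_run.conf.get('input_path') }}".
def make_arg_resolver(key, default=None):
    """Return a resolver that reads `key` from the trigger conf at call time."""
    def resolve(conf):
        return conf.get(key, default)
    return resolve

resolver = make_arg_resolver("input_path", default="/default/path")

# Nothing is evaluated yet. At "runtime", the conf from
# `airflow dags trigger --conf '{"input_path": "/data/today.csv"}'` arrives:
value = resolver({"input_path": "/data/today.csv"})
```

Templated fields are preferred when the operator supports them, since the rendered value also shows up in the task's "Rendered Template" view for debugging.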
Non-critical errors in logs cluttering output
I am working with an Airflow environment managed through AWS Managed Workflows for Apache Airflow (MWAA). In my log outputs, I have been noticing non-critical errors that are cluttering the logs, which I believe is a code quality issue. These errors are related to the SecretsManagerBackend, specifically when trying to retrieve variables from AWS Secrets Manager.
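If the messages truly are non-critical, one option (a sketch; the logger name matches the provider module path for `SecretsManagerBackend`, but the exact records you see may differ) is to attach a standard `logging.Filter` that drops sub-ERROR records from that logger:

```python
import logging

class DropSecretsBackendNoise(logging.Filter):
    """Suppress non-critical records emitted by the secrets backend.
    The logger-name match below is an assumption about where the noise
    originates; adjust it to the logger names seen in your MWAA logs."""
    def filter(self, record):
        noisy = (record.name.endswith("secrets_manager")
                 and record.levelno < logging.ERROR)
        return not noisy  # False -> record is dropped

logger = logging.getLogger(
    "airflow.providers.amazon.aws.secrets.secrets_manager")
logger.addFilter(DropSecretsBackendNoise())
```

In MWAA this would typically live in a custom logging config or a plugin, since you cannot edit the base image directly. Note that filtering hides the symptom; if the errors come from lookups for variables that don't exist in Secrets Manager, defining them (or narrowing the backend's `variables_prefix`) removes the noise at the source.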
How to make DAG running past midnight look at last 3 days of data?
I have a DAG running overnight where I need to run a python function on the past 3 days of data.
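The usual pitfall here is anchoring the window on the wall clock (`datetime.now()`), which shifts when a run crosses midnight. Anchoring on the run's data interval instead keeps the window stable; a sketch of the window computation, with the Airflow wiring noted in comments:

```python
from datetime import date, timedelta

def last_n_days(anchor, n=3):
    """Return the n dates ending at `anchor`, inclusive, oldest first."""
    return [anchor - timedelta(days=i) for i in range(n - 1, -1, -1)]

# In the DAG, pass the logical anchor in via a template rather than reading
# the clock inside the function, e.g.:
#   PythonOperator(..., op_kwargs={"anchor": "{{ data_interval_end | ds }}"})
# so a run that starts at 23:50 and finishes at 00:10 still sees the same
# 3-day window it was scheduled for.
window = last_n_days(date(2024, 1, 10))
```

Because the anchor comes from the scheduler, backfills and reruns also process the window they were originally scheduled for, not whatever "today" happens to be.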
Parallel execution of chosen list of tasks
I have a DAG that takes as a parameter a customizable list of tasks to execute, so that I can choose, for example, to execute only the tasks ['prod_1', 'prod_2', 'prod_5']. I do this via a BranchPythonOperator() that allows me to run only the tasks I set as input when triggering the DAG:
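Since the original branch callable isn't shown, here is a sketch of the common pattern: `BranchPythonOperator` lets the callable return a *list* of task_ids, and every downstream task not in that list is skipped, so the selected tasks can still run in parallel. The task ids are assumptions based on the example:

```python
# Sketch of the branch callable: intersect the requested task ids with the
# ones that actually exist, preserving DAG order and ignoring typos.
def choose_tasks(requested, all_tasks):
    """Return the subset of all_tasks whose ids appear in `requested`."""
    wanted = set(requested)
    return [t for t in all_tasks if t in wanted]

selected = choose_tasks(
    ["prod_1", "prod_2", "prod_5"],
    ["prod_1", "prod_2", "prod_3", "prod_4", "prod_5"],
)

# In the DAG (conf key name is an assumption):
#   BranchPythonOperator(
#       task_id="branch",
#       python_callable=lambda dag_run: choose_tasks(
#           dag_run.conf.get("tasks", []), ALL_PRODUCT_TASK_IDS),
#   )
# Every prod_* task is set downstream of "branch"; unselected ones are skipped.
```

Returning a list (rather than a single task_id) is what makes the chosen branches execute concurrently instead of one at a time.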
Set which nodes to execute in parallel
I have a list of products, let’s say “product_1”, “product_2”, etc., for which I want to run a node that, using a for loop, basically runs a Python Jupyter notebook via CustomPythonNotebookOperator(), with no more than 3 nodes running at the same time, as specified by the parameter parallel. This kind of node comes from my organization and cannot be changed.
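Since the operator itself can't be modified, the cap is best enforced from outside it. Two native Airflow options are an Airflow pool with 3 slots (assign `pool="notebook_pool"` to every product task) or, for a mapped task, `max_active_tis_per_dag=3`. The grouping the scheduler effectively enforces can be sketched as:

```python
# Sketch: what a 3-slot cap amounts to over 7 product tasks. In Airflow the
# scheduler does this implicitly via pool slots; no batching code is needed.
def batches(items, size):
    """Yield successive groups of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

products = [f"product_{i}" for i in range(1, 8)]
groups = list(batches(products, 3))

# Pool-based wiring (pool name is an assumption; create it with 3 slots in
# the Airflow UI or via `airflow pools set notebook_pool 3 "notebooks"`):
#   for p in products:
#       CustomPythonNotebookOperator(task_id=p, pool="notebook_pool", ...)
```

The pool approach is usually preferable to explicit batching because a slot frees up as soon as any one notebook finishes, rather than waiting for a whole group of 3 to complete.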