I have created a DAG with the schedule `0 3,6 * * *`. The Composer environment runs in UTC, and I have scheduled my DAG in US Eastern time (`America/New_York`) because of a business requirement. Around the daylight saving time change on 3 November, I can see multiple runs at 2 AM rather than 3 AM.
Why are multiple runs happening at 2 AM?
DAG:
```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime as dt
import pendulum

# Define timezone-aware start date
local_tz = pendulum.timezone("America/New_York")
start_date = dt(2024, 11, 1, tzinfo=local_tz)

# Define default arguments
default_args = {
    "start_date": start_date,
    "max_active_runs": 1,  # a DAG-level setting; operators ignore it in default_args
    "region": "us-east4",
    "retries": 0,
}

# Define the DAG
with DAG(
    dag_id="dst_check",
    default_args=default_args,
    schedule_interval="0 3,6 * * *",  # Scheduled for 3:00 AM and 6:00 AM daily
    catchup=True,
) as dag:

    # Define a Python function to print dates
    def print_dates(**kwargs):
        # Get instance date, data interval start, and data interval end from kwargs
        instance_date = kwargs["execution_date"]
        data_interval_start = kwargs["data_interval_start"]
        data_interval_end = kwargs["data_interval_end"]
        print(f"Instance Date (Execution Date): {instance_date}")
        print(f"Data Interval Start: {data_interval_start}")
        print(f"Data Interval End: {data_interval_end}")

    # Define the PythonOperator task to call print_dates
    print_dates_task = PythonOperator(
        task_id="print_dates_task",
        python_callable=print_dates,
        provide_context=True,
    )

    # Add the task to the DAG
    print_dates_task
```
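For comparison, the wall-clock times that `0 3,6 * * *` is meant to express in `America/New_York` can be converted to UTC with the standard-library `zoneinfo` module. This is a minimal sketch assuming Python 3.9+; `intended_runs_utc` is a hypothetical helper, not part of Airflow, and it does not reproduce Airflow's own cron/timetable logic. It only shows which UTC instants correspond to 03:00 and 06:00 local time on each day.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

LOCAL_TZ = ZoneInfo("America/New_York")
LOCAL_HOURS = (3, 6)  # hours taken from the cron expression "0 3,6 * * *"


def intended_runs_utc(first_day: date, days: int) -> list:
    """Return the UTC instants for 03:00 and 06:00 New York wall-clock time
    on each of `days` consecutive calendar days starting at `first_day`."""
    runs = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        for hour in LOCAL_HOURS:
            # Attaching the zone lets zoneinfo pick EDT or EST for that date.
            local = datetime(day.year, day.month, day.day, hour, tzinfo=LOCAL_TZ)
            runs.append(local.astimezone(timezone.utc))
    return runs


for run in intended_runs_utc(date(2024, 11, 2), 2):
    local_label = run.astimezone(LOCAL_TZ).strftime("%Y-%m-%d %H:%M %Z")
    print(run.isoformat(), "->", local_label)
```

Around the transition this gives 07:00 and 10:00 UTC on 2 November (EDT, UTC-4) and 08:00 and 11:00 UTC on 3 November (EST, UTC-5), which can be checked against the `data_interval_start` values Airflow actually logged.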
Multiple instances observed:
```
scheduled__2024-11-02T10:00:00+00:00
[2024-11-07, 10:02:40 EST] {logging_mixin.py:154} INFO - Instance Date (Execution Date): 2024-11-02T10:00:00+00:00
[2024-11-07, 10:02:40 EST] {logging_mixin.py:154} INFO - Data Interval Start: 2024-11-02T10:00:00+00:00
[2024-11-07, 10:02:40 EST] {logging_mixin.py:154} INFO - Data Interval End: 2024-11-03T07:00:00+00:00

scheduled__2024-11-02T10:00:00+00:00
[2024-11-07, 10:02:41 EST] {logging_mixin.py:154} INFO - Instance Date (Execution Date): 2024-11-02T11:00:00+00:00
[2024-11-07, 10:02:41 EST] {logging_mixin.py:154} INFO - Data Interval Start: 2024-11-02T11:00:00+00:00
[2024-11-07, 10:02:41 EST] {logging_mixin.py:154} INFO - Data Interval End: 2024-11-03T07:00:00+00:00

scheduled__2024-11-03T07:00:00+00:00
[2024-11-07, 10:02:41 EST] {logging_mixin.py:154} INFO - Instance Date (Execution Date): 2024-11-03T07:00:00+00:00
[2024-11-07, 10:02:41 EST] {logging_mixin.py:154} INFO - Data Interval Start: 2024-11-03T07:00:00+00:00
[2024-11-07, 10:02:41 EST] {logging_mixin.py:154} INFO - Data Interval End: 2024-11-03T08:00:00+00:00
```
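To read the logged UTC values in the DAG's own timezone, they can be converted with the standard-library `zoneinfo` module. This is a minimal sketch assuming Python 3.9+; the timestamps are copied from the logs above, and `to_local` is a hypothetical helper used only for illustration.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

NEW_YORK = ZoneInfo("America/New_York")

# (data_interval_start, data_interval_end) pairs copied from the task logs above
logged_intervals = [
    ("2024-11-02T10:00:00+00:00", "2024-11-03T07:00:00+00:00"),
    ("2024-11-02T11:00:00+00:00", "2024-11-03T07:00:00+00:00"),
    ("2024-11-03T07:00:00+00:00", "2024-11-03T08:00:00+00:00"),
]


def to_local(stamp: str) -> str:
    """Parse an ISO-8601 UTC timestamp and render it as New York wall-clock time."""
    return datetime.fromisoformat(stamp).astimezone(NEW_YORK).strftime("%Y-%m-%d %H:%M %Z")


for start, end in logged_intervals:
    # Elapsed real time covered by the data interval, in hours.
    hours = (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 3600
    print(f"{to_local(start)} -> {to_local(end)} ({hours:g} h)")
```

Rendered this way, the three intervals read `2024-11-02 06:00 EDT -> 2024-11-03 02:00 EST (21 h)`, `2024-11-02 07:00 EDT -> 2024-11-03 02:00 EST (20 h)` and `2024-11-03 02:00 EST -> 2024-11-03 03:00 EST (1 h)`, so the 2 AM values in question are 02:00 EST on 3 November.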