Scheduling a Simple Dagster Job to Run Daily

Answered using GPT 40 by GPT40

Question:

I want to schedule a simple Dagster job that needs to run every day.

Answer:

Scheduling a Dagster job to run every day is a straightforward process. Below are the steps to create and schedule a simple Dagster job:

  1. Define the Job: First, you need to define the job that you want to run. This involves creating the necessary ops (called solids in older Dagster versions) and then combining them into a job.

  2. Create a Schedule: Next, you create a schedule that specifies when the job should run.

  3. Add the Schedule to the Repository: Finally, you add the schedule to your Dagster repository so that it can be picked up by the Dagster scheduler.

Here is an example to illustrate these steps:

Step 1: Define the Job

from dagster import job, op

@op
def my_simple_op(context):
    context.log.info("Hello, Dagster!")

@job
def my_simple_job():
    my_simple_op()

Step 2: Create a Schedule

from dagster import schedule

@schedule(cron_schedule="0 0 * * *", job=my_simple_job, execution_timezone="UTC")
def daily_schedule(_context):
    return {}

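In this example, cron_schedule="0 0 * * *" together with execution_timezone="UTC" specifies that the job runs every day at midnight UTC. The function returns an empty dictionary because this job needs no run config.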
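To make the five cron fields concrete, here is a small standard-library sketch that labels each field and computes the next midnight UTC after a given moment. It is only an illustration of what this particular expression means, not how Dagster evaluates cron strings; the helper next_daily_midnight_utc and the sample date are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CRON = "0 0 * * *"
FIELD_NAMES = ["minute", "hour", "day_of_month", "month", "day_of_week"]

# Pair each of the five cron fields with its meaning.
fields = dict(zip(FIELD_NAMES, CRON.split()))


def next_daily_midnight_utc(now: datetime) -> datetime:
    """Return the first 00:00 UTC strictly after `now` (expects an aware datetime)."""
    now = now.astimezone(timezone.utc)
    midnight_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight_today + timedelta(days=1)


print(fields)
print(next_daily_midnight_utc(datetime(2024, 3, 10, 15, 30, tzinfo=timezone.utc)))
```

Minute 0 and hour 0 pin the time to 00:00, while the three asterisks mean every day of the month, every month, and every day of the week.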

Step 3: Add the Schedule to the Repository

from dagster import repository

@repository
def my_repository():
    return [my_simple_job, daily_schedule]

Putting It All Together

Here is the complete code in one file:

from dagster import job, op, schedule, repository

@op
def my_simple_op(context):
    context.log.info("Hello, Dagster!")

@job
def my_simple_job():
    my_simple_op()

@schedule(cron_schedule="0 0 * * *", job=my_simple_job, execution_timezone="UTC")
def daily_schedule(_context):
    return {}

@repository
def my_repository():
    return [my_simple_job, daily_schedule]

Running the Scheduler

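To run the scheduler, you need to start the Dagster daemon. The daemon requires the DAGSTER_HOME environment variable to be set so that it can find your Dagster instance. You can start it by running: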

dagster-daemon run

Keep the daemon running in the background: it is the process that picks up the schedules and launches the jobs at the specified times.

Additional Configuration

  • Execution Timezone: The execution_timezone parameter of the @schedule decorator sets the timezone in which the cron expression is interpreted. The example uses "UTC"; any IANA timezone name can be used instead so that the job runs at the correct local time.
  • Dagster Instance: Ensure that your Dagster instance is properly configured to use a persistent storage backend (like PostgreSQL) if you are running this in a production environment.
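To see why the execution timezone matters, the standard-library sketch below shows that local midnight maps to different UTC hours across daylight saving time. The zone America/New_York, the dates, and the helper midnight_local_as_utc are illustrative assumptions, not part of the original example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def midnight_local_as_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC time of local midnight in New York on the given date."""
    local_midnight = datetime(year, month, day, 0, 0, tzinfo=NEW_YORK)
    return local_midnight.astimezone(timezone.utc)


winter = midnight_local_as_utc(2024, 1, 15)  # standard time, UTC-5
summer = midnight_local_as_utc(2024, 7, 15)  # daylight time, UTC-4
print(winter.hour, summer.hour)
```

A cron expression fixed in UTC would therefore drift by an hour relative to local wall-clock time, while setting execution_timezone to the local zone keeps the run at local midnight all year.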

That's it! With the schedule turned on and the daemon running, your Dagster job will run every day at midnight UTC.