Retries with Quartz.NET

This is the first in a series of posts.

Whenever you have a task that might take a while to complete, it is usually a good idea to run it in the background rather than block your application.

What makes up “a while” and how far “in the background” you should run your tasks is usually up for discussion.

The scenario here is an ASP.NET web application that needs to perform tasks which might fail because some of the objects they need to modify are locked. If a task hits such a lock it should be run again later after waiting a little. And again after waiting some more. And again. And then again sometime in the middle of the night when it is rather improbable that the objects are still locked. And then eventually give up and send a notification to a human to resolve the problem.

For in-process retries Microsoft patterns & practices offers the Transient Fault Handling Application Block (information is available on MSDN and CodePlex). But I needed a persistent retry strategy that survives the application going down for one reason or another.

There are a lot of things to keep in mind when running recurring tasks inside ASP.NET and a lot of really good reasons why you don’t want to write your own scheduler. Luckily there are quite a few alternatives to that.

I decided to solve my problem using Quartz.NET. It is free. It is battle-hardened. It is widely used so the search engine of my choice will be able to find a substantial amount of blog posts and questions on popular Q&A sites. And last but not least: I had wanted to take a look at it for quite some time.

I don’t want to give an introduction to Quartz.NET here. Their website does a fairly good job (no pun intended) at that.

So I will jump right into the thick of things and show you my approach to implementing retries with Quartz.NET.

After some web search I decided to use this post as a starting point but make some adjustments along the way.

To implement custom tasks (or jobs as Quartz calls them) you have to implement the IJob interface. It doesn’t get any more straightforward. Implement your logic inside the single method of the interface and you are done. The only thing to keep in mind is that your job must only throw exceptions of one specific type, JobExecutionException. I outline my solution to that problem in another post of this series.

To be able to test my solution I created a job that would always fail. That turned out to be really simple.

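using System;
using Quartz;

// Test job that fails on every execution.
public class AlwaysFails : IJob
{
  private int counter;
  public AlwaysFails()
  {
    this.counter = 0;
  }
  // Number of times Execute() was called on this instance.
  public int Counter
  {
    get { return this.counter; }
  }
  public void Execute(IJobExecutionContext context)
  {
    this.counter++;
    // Deliberately not a JobExecutionException, see the text below.
    throw new NotImplementedException();
  }
}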

For every call to the Execute() method I increase a counter by one and throw an exception. Quartz requires that you only throw JobExecutionException inside your jobs. There is a simple solution to that problem that allows me to ignore this requirement here.

Now to the interesting part. As mentioned in the post linked above, a custom IJobListener is the best place to put your retry logic. But I don’t want to hard-wire the logic that calculates whether the job should run again, or when the next attempt should be made, into the listener itself.

The listener only looks at whether the job failed. Then it asks a retry strategy (we will come to that in a moment) if the job should run again. And if the answer is yes it asks the strategy when the next attempt should be made and reschedules the job accordingly.

using System;
using Quartz;
using Quartz.Listener;

public class RetryJobListener : JobListenerSupport
{
  private readonly IRetryStrategy retryStrategy;
  public RetryJobListener(IRetryStrategy retryStrategy)
  {
    this.retryStrategy = retryStrategy;
  }
  public override string Name { get { return "Retry"; } }
  // Called by Quartz after every execution; jobException is null on success.
  public override void JobWasExecuted(IJobExecutionContext context, JobExecutionException jobException)
  {
    if (JobFailed(jobException) && this.retryStrategy.ShouldRetry(context))
    {
      ITrigger trigger = this.retryStrategy.GetTrigger(context);
      // Remove the trigger that just fired, then schedule the job again with the new one.
      bool unscheduled = context.Scheduler.UnscheduleJob(context.Trigger.Key);
      DateTimeOffset nextRunAt = context.Scheduler.ScheduleJob(context.JobDetail, trigger);
    }
  }
  public override void JobToBeExecuted(IJobExecutionContext context)
  {
  }
  private static bool JobFailed(JobExecutionException jobException)
  {
    return jobException != null;
  }
}

Nothing fancy so far. The retry strategy that contains the logic to determine whether or not the job should run again consists of the simple interface IRetryStrategy

public interface IRetryStrategy
{
  bool ShouldRetry(IJobExecutionContext context);
  ITrigger GetTrigger(IJobExecutionContext context);
}

and (as of now) one implementation for an exponential back-off strategy.

using System;
using Quartz;

public class ExponentialBackoffRetryStrategy : IRetryStrategy
{
  // Key under which the retry counter is stored in the JobDataMap.
  private const string Retries = "Retries";
  private readonly IRetrySettings settings;
  public ExponentialBackoffRetryStrategy(IRetrySettings settings)
  {
    this.settings = settings;
  }
  public bool ShouldRetry(IJobExecutionContext context)
  {
    int retries = GetAlreadyPerformedRetries(context);
    return retries < this.settings.MaxRetries;
  }
  public ITrigger GetTrigger(IJobExecutionContext context)
  {
    int retries = GetAlreadyPerformedRetries(context);
    // The wait time doubles with every retry: base, 2 * base, 4 * base, ...
    long factor = (long)Math.Pow(2, retries);
    TimeSpan backoff = new TimeSpan(this.settings.BackoffBaseInterval.Ticks * factor);

    // One-shot trigger that reuses the key of the trigger that just fired.
    ITrigger trigger = TriggerBuilder.Create()
      .StartAt(DateTimeOffset.UtcNow + backoff)
      .WithSimpleSchedule(x => x.WithRepeatCount(0))
      .WithIdentity(context.Trigger.Key)
      .ForJob(context.JobDetail)
      .Build();

    // Side effect: remember that one more retry has been scheduled.
    context.JobDetail.JobDataMap[Retries] = retries + 1;
    return trigger;
  }
  // Reads the counter from the JobDataMap; a missing or non-int value counts as 0.
  private static int GetAlreadyPerformedRetries(IJobExecutionContext context)
  {
    int retries = 0;
    object o;
    if (context.JobDetail.JobDataMap.TryGetValue(Retries, out o) && o is int)
    {
      retries = (int)o;
    }
    return retries;
  }
}

As in the post I used as a starting point, we use the JobDataMap of the job to store the retry counter. That map is automatically persisted (if you configured a persistent IJobStore) between runs so we won’t lose that counter. We can configure how many retries should be performed using the IRetrySettings. There is an in-memory implementation and one that hooks up with my playground’s configuration system. The details don’t matter here so I will just show you the interface declaration.

public interface IRetrySettings
{
  int MaxRetries { get; }
  TimeSpan BackoffBaseInterval { get; }
}
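For readers who want something to plug in right away, an in-memory implementation could look roughly like the following. This is only a sketch and not my actual class: the name InMemoryRetrySettings and the argument checks are assumptions made for illustration. The interface is repeated so that the snippet compiles on its own.

```csharp
using System;

public interface IRetrySettings
{
  int MaxRetries { get; }
  TimeSpan BackoffBaseInterval { get; }
}

// Hypothetical in-memory settings; the name and the validation rules are assumptions.
public sealed class InMemoryRetrySettings : IRetrySettings
{
  public InMemoryRetrySettings(int maxRetries, TimeSpan backoffBaseInterval)
  {
    if (maxRetries < 0)
    {
      throw new ArgumentOutOfRangeException("maxRetries");
    }
    if (backoffBaseInterval <= TimeSpan.Zero)
    {
      throw new ArgumentOutOfRangeException("backoffBaseInterval");
    }
    MaxRetries = maxRetries;
    BackoffBaseInterval = backoffBaseInterval;
  }

  // Immutable after construction so that every retry sees the same values.
  public int MaxRetries { get; private set; }
  public TimeSpan BackoffBaseInterval { get; private set; }
}
```

Something like new InMemoryRetrySettings(5, TimeSpan.FromSeconds(30)) would then allow up to five retries with a base interval of 30 seconds.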

Until we hit the upper boundary of MaxRetries we calculate the time the job should wait and create a new trigger for the next run using Quartz’ fluent TriggerBuilder. Don’t forget to increase the counter for the retries!
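To get a feeling for the waiting times this produces, here is a small stand-alone sketch that mirrors the calculation in GetTrigger without touching Quartz at all. The class BackoffSchedule and its method names are made up for illustration and are not part of the code above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that reproduces the back-off arithmetic outside of Quartz.
public static class BackoffSchedule
{
  // Same formula as in GetTrigger: baseInterval * 2^retries.
  // Assumes a small, non-negative retry count so that the factor does not overflow.
  public static TimeSpan DelayFor(TimeSpan baseInterval, int retries)
  {
    long factor = (long)Math.Pow(2, retries);
    return new TimeSpan(baseInterval.Ticks * factor);
  }

  // Lists the delay before each retry, from the first up to maxRetries.
  public static IList<TimeSpan> Delays(TimeSpan baseInterval, int maxRetries)
  {
    var delays = new List<TimeSpan>();
    for (int retries = 0; retries < maxRetries; retries++)
    {
      delays.Add(DelayFor(baseInterval, retries));
    }
    return delays;
  }
}
```

With a base interval of 30 seconds and MaxRetries set to 4, Delays returns 30 seconds, 1 minute, 2 minutes and 4 minutes. Each delay counts from the moment the previous attempt failed, so the job gives up roughly seven and a half minutes after the first failure, plus the time the attempts themselves take.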

Well, that’s the implementation part. The next post will show you how to bring it all together for a test run.

5 Responses to Retries with Quartz.NET

  4. Dave says:

    Nice work! The only problem I see is that the retry logic deletes the previous trigger when it schedules a retry. The issue here is that if you have, say, a daily job which fails, it may retry today’s job on failure and eventually work, but the original schedule has been deleted so it won’t run tomorrow. To do this correctly is quite complicated. The easiest way is to create a separate one-off trigger for retries and not delete the original one.

    • weberse says:

      Hi @Dave,
      you are right, that might be a problem. It wasn’t in my case because I didn’t schedule recurring tasks in combination with retries but “backgrounded” workload intensive one-off tasks. But it shouldn’t be too hard to make the necessary adjustments you proposed.
