Of Cicadas and cron jobs

The cicada is a flying insect found worldwide. It's loud but not particularly threatening. Its most famous attribute, though, is that many species of cicada (particularly in North America) are periodic, emerging only every 13 or 17 years depending on the species. When a brood does emerge, huge numbers reach maturity all at once, mate, lay eggs, and then die. The eggs hatch and the offspring spend the next 13 or 17 years living deep underground before repeating the cycle.

But why 13 and 17 years? That's a rather odd set of numbers... And that's actually the point. Those lifespans are both prime numbers, that is, they are divisible only by themselves and one. Many cicada predators also have multi-year life cycles rather than emerging every year. So what are the odds of a large number of cicada predators emerging in the same year as a large number of cicadas?

Very low, in fact. Because a prime number is divisible only by 1 and itself, a shorter cycle lines up with it only at multiples of the two cycle lengths multiplied together. That is, a predator with a 4-year cycle and a cicada with a 13-year cycle will emerge at the same time only every 4 * 13 = 52 years. If the cicada emerged every 12 years, however, every one of its emergences would coincide with the 4-year predator's (12 is a multiple of 4), so the predator would have a veritable buffet every third generation and the cicadas would have a bad time every time.
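The arithmetic can be checked with a least common multiple. The following is a minimal sketch, assuming both species emerge together in year 0; the function name and the list of predator cycles are made up for illustration.

```python
from math import lcm  # available in Python 3.9+


def years_between_meetings(cicada_cycle: int, predator_cycle: int) -> int:
    """Years between emergences shared by both species.

    Assumes both species emerge together in year 0, so they meet again
    at every common multiple of the two cycle lengths.
    """
    return lcm(cicada_cycle, predator_cycle)


# Compare a 12-year cycle with the two prime cycles from the text.
for cicada in (12, 13, 17):
    for predator in (2, 3, 4, 6):  # hypothetical predator cycles
        gap = years_between_meetings(cicada, predator)
        print(f"cicada {cicada}y, predator {predator}y: meet every {gap} years")
```

For the 12-year cicada every line reports 12, meaning each predator meets every single brood. For the 13- and 17-year cicadas the gap is always the product of the two cycles.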

Over time, evolutionary pressure weeded out the periodic cicada species whose cycle lengths had many divisors, leaving only those that have overlapping generations every year and those that have a huge all-at-once generation on a prime-number schedule.

What can we learn from the little cicada? If you have two repeating events, and you want them to happen at the same time as rarely as possible, have them repeat on different prime-number intervals.

But what does that have to do with web development?

A website frequently has background tasks that it needs to run from time to time; sometimes every few minutes, sometimes every few hours, sometimes every few days. Most often these are run using a cron task.

Generally speaking it's a bad idea to run more than one cron job at once. Even if they don't interfere with each other they may use a lot of CPU, and you don't want them to slam the system all at once. In fact, on Platform.sh we don't allow that to happen: If a cron task tries to start but there's another already running, we force the new one to pause and wait for the first to complete.

That can sometimes cause issues if, say, a nightly backup process wants to start while a routine every-few-minutes cron task is running. The backup snapshot will start but then block until the other cron task finishes. If that is a long-running task, the result could be a brief site outage while the snapshot waits its turn.

Avoiding predatory cron jobs

So how do we make sure two cron jobs are scheduled for the same moment as rarely as possible? The same way cicadas avoid predators: prime numbers!

More specifically, say we have a cron task that runs normal system maintenance every 20 minutes. Then we have an import process that reads data from an external system every 10 minutes, and another task that runs every 5 minutes to send out pending emails.

The result will be that every 10 minutes we have two cron tasks competing to run at the same time, and every 20 minutes we have three cron tasks competing. That's no good at all!
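To see the pile-up concretely, here is a small simulation sketch. It assumes each task fires whenever the minute count since a shared start is a multiple of its interval; the task names and helper functions are illustrative only.

```python
def tasks_due(minute: int, intervals: dict[str, int]) -> list[str]:
    """Names of the tasks whose interval divides the given minute."""
    return [name for name, every in intervals.items() if minute % every == 0]


def collisions(intervals: dict[str, int], horizon: int) -> dict[int, list[str]]:
    """Map each minute in 1..horizon to its due tasks, keeping only
    the minutes where two or more tasks want to start together."""
    result = {}
    for minute in range(1, horizon + 1):
        due = tasks_due(minute, intervals)
        if len(due) > 1:
            result[minute] = due
    return result


# The round-number schedule from the text (intervals in minutes).
ROUND_SCHEDULE = {"maintenance": 20, "import": 10, "email": 5}

for minute, due in collisions(ROUND_SCHEDULE, 60).items():
    print(f"minute {minute:2d}: {', '.join(due)}")
```

Within the first hour this reports a collision at every multiple of 10, and all three tasks together at minutes 20, 40, and 60.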

Instead, let's set the system maintenance to run every 23 minutes, the import to run every 11 minutes, and the email runner every 7 minutes. It's almost the same schedule, but because the numbers are prime they will only very rarely overlap. (The closest pair, the 7- and 11-minute tasks, coincides only every 77 minutes.) That spreads the load out far better and makes it much less likely that one process blocks on another.
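The overlap figures for the prime schedule follow from the same least-common-multiple reasoning as the cicadas. This sketch assumes the three tasks run on true fixed intervals counted from a shared starting minute; the task names are illustrative.

```python
from itertools import combinations
from math import lcm  # available in Python 3.9+

# The prime-number schedule from the text (intervals in minutes).
PRIME_SCHEDULE = {"maintenance": 23, "import": 11, "email": 7}


def first_shared_minute(intervals: dict[str, int]) -> dict[tuple[str, str], int]:
    """Minutes until each pair of tasks first fires together,
    counting from a shared start."""
    return {
        (a, b): lcm(intervals[a], intervals[b])
        for a, b in combinations(intervals, 2)
    }


for (a, b), minutes in first_shared_minute(PRIME_SCHEDULE).items():
    print(f"{a} + {b}: every {minutes} minutes")

# All three tasks line up only at the least common multiple of all intervals.
print("all three:", lcm(*PRIME_SCHEDULE.values()), "minutes")
```

Under that assumption the pairs coincide every 253, 161, and 77 minutes, and all three together only every 1771 minutes, which is roughly once every 29.5 hours.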

Now if we want to add a nightly backup, we can have it run at, say, 17 minutes past 4:00 am. It will be extremely rare for the other cron tasks to hit at the 17 minute mark exactly, so our snapshot will almost never need to block on another cron task and our site won't freeze while it waits.
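How often the backup actually collides depends on how the intervals are expressed. The sketch below assumes the tasks are written with standard cron step syntax in the minute field (for example `*/7`), where the step restarts at minute 0 of every hour rather than counting continuously. The task names and the helper function are illustrative.

```python
def step_minutes(step: int) -> set[int]:
    """Minutes of the hour matched by a `*/step` cron minute field.

    Assumes standard cron step semantics: the field expands to
    0, step, 2*step, ... within 0..59 and restarts each hour.
    """
    return set(range(0, 60, step))


STEPS = {"maintenance": 23, "import": 11, "email": 7}
BACKUP_MINUTE = 17  # the backup runs at 4:17 am

# Tasks whose expanded minute list contains the backup minute.
clashes = [name for name, step in STEPS.items()
           if BACKUP_MINUTE in step_minutes(step)]
print("tasks sharing minute 17:", clashes)  # []

# Minutes of the hour at which all three stepped tasks fire together.
shared = set.intersection(*(step_minutes(s) for s in STEPS.values()))
print("minutes shared by all three:", sorted(shared))  # [0]
```

Under that assumption minute 17 is free of all three tasks, so the backup never starts alongside them. The same expansion shows that the three stepped tasks still share minute 0 of each hour, which is worth keeping in mind when writing the schedule as cron expressions.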

Isn't it nice when bugs end up helping your software run faster?
