Drupal Queues
Steven Jones
Queues are a wonderful way of separating different parts of a system. Once you have separated those parts you can do lots of interesting things, like be more fault tolerant or have a more responsive front end for your users.
For example, let's suppose that we have a website on which we can book a holiday. We can choose lots of different options, and at the end of the process, when we've booked the holiday, we'd like to send the customer a nice PDF detailing all the options they've chosen.
Typically, in a PHP application, you'd have to process this as part of a page request, so the final confirmation page might take 10 seconds to load, just because it's spending 9 of those seconds generating a PDF and emailing it out.
This wouldn't be a great user experience, especially as they might have just put lots of credit card information into the previous page and hit submit...and then waited for a long time.
We'd ideally keep the page request short and sweet and return it immediately and then send the PDF out later.
Queue it up
We can pop the work to generate the PDF and send out the email into a queue, and then return the confirmation page to the user as soon as it's rendered. Then we can process the queue in some way so that the user gets the PDF as soon as it's ready.
Let's assume that we have a function that takes a 'booking reference number' and an 'email' address and will build a PDF for that booking and email it to the given address:
function send_booking_pdf($booking_reference, $email_address) {
  // Expensive code here, takes about 10 seconds to run.
}
We could just call that function during our page callback for the booking confirmation page:
function booking_confirmation_page($booking_reference, $customer_email) {
  // Build up the rest of the page.
  $html_page = theme('booking_confirmation_page', $booking_reference);
  // Send the customer a booking PDF.
  send_booking_pdf($booking_reference, $customer_email);
  // Print the page to the visitor.
  return $html_page;
}
But this would be really slow for the visitor, so we want to use a queue instead.
Queues in Drupal
Queues in Drupal are relatively simple: you have a unit of work, called an 'item', and you have a function that accepts that item and does the work.
So the first step to getting a queue in Drupal is to pick a name for your queue. You probably want to pop this name in a constant or function, so if you decide to change it later you don't have to hunt down all the places you used it. It needs to be a Drupal machine name, so letters and underscores basically:
function booking_pdf_queue_name() {
  return 'booking_pdf_queue';
}
Then you need to declare to Drupal that such a queue exists.
/**
 * Implements hook_cron_queue_info().
 */
function booking_cron_queue_info() {
  return array(
    booking_pdf_queue_name() => array(
      'worker callback' => 'booking_pdf_queue_worker',
      'skip on cron' => FALSE,
      'time' => 30,
    ),
  );
}
We're just declaring the queue to Drupal, telling it the function that is going to do the work for the queue and that we do want items from our queue processed on cron, and that we'd like Drupal to spend a maximum of 30 seconds processing those items on cron. We'll come back to other ways to process the items later.
Adding an item
So, instead of calling our send_booking_pdf function directly, we'll add an item to our queue that will do so, so our page callback becomes:
function booking_confirmation_page($booking_reference, $customer_email) {
  // Build up the rest of the page.
  $html_page = theme('booking_confirmation_page', $booking_reference);
  // Send the customer a booking PDF.
  // Create an item for the queue.
  $item = array(
    'booking_reference' => $booking_reference,
    'email_address' => $customer_email,
  );
  // Add the item to the queue.
  DrupalQueue::get(booking_pdf_queue_name())->createItem($item);
  // Print the page to the visitor.
  return $html_page;
}
So we've got a bit more code there, but this code will run as quickly as the item can be pushed into the queue. By default, Drupal will store the items for the queue in the database, so it'll be pretty quick, but you can configure Drupal to store items in an external queue system like Redis or RabbitMQ if performance is an issue.
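As a rough sketch of what that configuration might look like in Drupal 7: the queue class is looked up from variables, so it can be overridden in settings.php. The class name BookingExternalQueue below is made up for illustration; in practice it would come from whichever contributed module provides your Redis or RabbitMQ backend, and it would need to implement DrupalQueueInterface.

```php
<?php
// In settings.php (Drupal 7).
// 'BookingExternalQueue' is a hypothetical class name used for illustration;
// substitute the class provided by the queue backend you actually install.

// Use the external backend for our booking PDF queue only.
$conf['queue_class_booking_pdf_queue'] = 'BookingExternalQueue';

// Or, alternatively, use it for every queue that has no specific override.
// $conf['queue_default_class'] = 'BookingExternalQueue';
```

The nice part is that the code that calls DrupalQueue::get() and createItem() shouldn't need to change at all when the backend is swapped.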
Processing an item
The next step is to actually process the queue item into our booking PDF and email. Again, this is quite simple:
function booking_pdf_queue_worker($item) {
  // Send the customer a booking PDF.
  send_booking_pdf($item['booking_reference'], $item['email_address']);
}
That's it!
The $item is the same item that we pushed into the queue, and it's arrived at our worker callback ready to go.
The key thing here is that the worker callback is called with the data in the $item later on. In fact, it could be hours later, and it could even be called on a completely different machine.
Processing the queue
Right, so, we're done, sort of.
We now have a booking confirmation page that will load up quickly, and then people will get their confirmation PDFs emailed to them on cron, 'later'.
However, later might be a bit too late, because maybe we only run the cron process once a day, and maybe we get 100 bookings a day. But we're only going to process the queue for 30 seconds a day, and it takes about 10 seconds to generate a PDF, so very quickly we're going to have a backlog! (As an aside, this is a nice kind of backlog, in that we can fix the scaling problem when we hit it, not before. Maybe this simple process is fine for your little site, and when you hit the big time you can easily scale up.)
So, in grown-up systems you'd just have some process running on your webserver the whole time, looking for new items in the queue and processing them as soon as they arrive. If you needed to, you'd have two processes doing this, and so on.
If you want to do this for a Drupal site, then I can highly recommend the Waiting Queue project, which provides a Drush command to process items in your queue.
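To put some rough numbers on that backlog, here's a quick back-of-the-envelope calculation in plain PHP. The figures are just the ones from the example above, and it assumes every PDF takes exactly 10 seconds, which real life won't quite honour.

```php
<?php
// Figures taken from the example above.
$bookings_per_day = 100;
$cron_budget_seconds = 30;
$seconds_per_pdf = 10;

// How many items fit into one daily cron run.
$processed_per_day = (int) floor($cron_budget_seconds / $seconds_per_pdf);

// Everything that doesn't fit stays in the queue.
$backlog_growth_per_day = $bookings_per_day - $processed_per_day;
$backlog_after_week = 7 * $backlog_growth_per_day;

printf(
  "Processed per day: %d, backlog grows by %d per day, %d after a week\n",
  $processed_per_day,
  $backlog_growth_per_day,
  $backlog_after_week
);
```

So with a single daily cron run we'd only send about three PDFs a day, and the other 97 customers would be left waiting.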
It'll sit there and then process the items as soon as they show up. If you can set this sort of thing up, then this is the best solution for an entirely Drupal based queue.
However, what if we're on a server where we don't have access to use Drush? Well, we can process the items in some other way. See, this is one of the cool things about a queue: you have to assume that items can be processed from some totally different context, so you can easily introduce a totally different context in which your items are processed.
So, in my example, we have a booking confirmation page that we want to load super fast. We've done that, but now the customer is waiting for hours for their booking PDF. A cheap and cheerful solution is to have a second Drupal page callback that processes items from the queue; then we can get our customer to visit that URL by placing a small, almost invisible iframe (or even an image) on the booking confirmation page.
What does that page callback look like? Well, something like this:
function booking_run_single_queue() {
  // Allow execution to continue even if the request gets canceled.
  @ignore_user_abort(TRUE);
  // Prevent session information from being saved while queue is running.
  $original_session_saving = drupal_save_session();
  drupal_save_session(FALSE);
  // Force the current user to anonymous to ensure consistent permissions on
  // queue runs.
  $original_user = $GLOBALS['user'];
  $GLOBALS['user'] = drupal_anonymous_user();
  // Try to allocate enough time to run all the queue implementations.
  drupal_set_time_limit(240);
  // Grab only our cron queue.
  $queues = booking_cron_queue_info();
  foreach ($queues as $queue_name => $info) {
    $function = $info['worker callback'];
    $end = time() + (isset($info['time']) ? $info['time'] : 15);
    $queue = DrupalQueue::get($queue_name);
    while (time() < $end && ($item = $queue->claimItem())) {
      try {
        $function($item->data);
        $queue->deleteItem($item);
      }
      catch (Exception $e) {
        // In case of exception log it and leave the item in the queue
        // to be processed again later.
        watchdog_exception('booking_queue', $e);
      }
    }
  }
  // Restore the user.
  $GLOBALS['user'] = $original_user;
  drupal_save_session($original_session_saving);
}
Basically, that's just a copy of the bits of drupal_cron_run that are important to us. We start by making sure that the environment is consistent and ready to go, and then we start processing items in our queue if there are any.
Now, when our customer hits the booking confirmation page, it'll add an item to the queue, the page will load in their browser, and it'll hit the iframe/picture URL and process an item in the queue. So although it'll look like the page is still loading to the user, the page will be visible and they'll be able to interact, but the queue will be being processed.
If, for some reason, they have blocked iframes or images, or something goes awry, then the work is still in the queue, so it'll get picked up either by cron or by the next customer that books.
Note however, that this does mean that you could tie up many webserver processes doing this complicated work, and so care must be taken to ensure that there's sufficient security on the page callback for processing queue items. You may want to pop a random token on the end of the URL that changes every minute so that an attacker would only be able to make your site process the queue for a minute before needing to make another booking to generate another URL with the token for example. If the queue is empty though, this will be a pretty lightweight page request.
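Here's a minimal sketch of that rotating token idea in plain PHP, so it can be tried outside of Drupal. The function names and the secret are made up for illustration. In a real Drupal 7 site you'd probably derive the secret from something like drupal_get_private_key() rather than hard-coding it. This sketch also accepts the previous minute's token, which is my own addition, so that a page rendered just before the minute ticks over still works.

```php
<?php
/**
 * Builds the token for the minute that contains $timestamp.
 *
 * Hypothetical helper: the name and the 'booking-queue:' prefix are
 * illustrative, not part of Drupal.
 */
function booking_queue_token($secret, $timestamp) {
  // All timestamps within the same minute share one bucket.
  $minute = (int) floor($timestamp / 60);
  return hash_hmac('sha256', 'booking-queue:' . $minute, $secret);
}

/**
 * Checks a token against the current and the previous minute.
 */
function booking_queue_token_is_valid($token, $secret, $now) {
  foreach (array(0, 60) as $age) {
    // hash_equals() compares in constant time (PHP 5.6 and later).
    if (hash_equals(booking_queue_token($secret, $now - $age), $token)) {
      return TRUE;
    }
  }
  return FALSE;
}

// Assumed values, for demonstration only.
$secret = 'replace-with-a-long-random-site-secret';
$issued_at = 1700000000;
$token = booking_queue_token($secret, $issued_at);

// The same token checked 30 seconds later and 10 minutes later.
$valid_soon = booking_queue_token_is_valid($token, $secret, $issued_at + 30);
$valid_later = booking_queue_token_is_valid($token, $secret, $issued_at + 600);
var_dump($valid_soon, $valid_later);
```

The confirmation page would append the token to the iframe URL, and the queue-processing page callback would check it before claiming any items, returning early if it doesn't match.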
Error capture and reprocessing
One of the best things about database backed Drupal queues is that if there's an error processing the item in the queue, and an exception is raised, then it'll be logged, and the queue item will be reprocessed in 24 hours.
This means that if your server has a random error, or you have an unexpected bug in your code, then so long as an exception is thrown, you can fix the bug, and then let your queue processing catch right up.
So, for example, if we couldn't load all the data for the PDF we wanted to generate, or there was some error sending the email, we could just raise an exception. When we notice and fix the bug, we can deploy our fix and those customers will get their booking PDF and be happy, and we don't need to work out which ones didn't get their PDF and need to be sent one manually.
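To see that retry behaviour without a Drupal site to hand, here's a small simulation in plain PHP. It isn't Drupal's queue code: the demo_ functions are stand-ins invented for this sketch, and the "queue" is just an array. It does follow the same rule as the processing loop, though: an item is only removed once the worker finishes without throwing.

```php
<?php
// Stand-in for send_booking_pdf(): throws while the "bug" is present.
function demo_send_booking_pdf(array $item, $bug_is_fixed) {
  if (!$bug_is_fixed) {
    throw new Exception('Could not load booking ' . $item['booking_reference']);
  }
  return 'sent to ' . $item['email_address'];
}

// One queue run: items that succeed are dropped, items that throw are kept.
function demo_run_queue(array $queue, $bug_is_fixed, array &$log) {
  $remaining = array();
  foreach ($queue as $item) {
    try {
      $log[] = demo_send_booking_pdf($item, $bug_is_fixed);
    }
    catch (Exception $e) {
      // Log the failure and leave the item for a later run.
      $log[] = 'error: ' . $e->getMessage();
      $remaining[] = $item;
    }
  }
  return $remaining;
}

$queue = array(
  array('booking_reference' => 'ABC123', 'email_address' => 'customer@example.com'),
);
$log = array();

// First run happens while the bug is still live, second run after the fix.
$after_first_run = demo_run_queue($queue, FALSE, $log);
$after_second_run = demo_run_queue($after_first_run, TRUE, $log);

print_r($log);
```

The first run logs an error and keeps the item; the second run, after the "fix", sends it and leaves the queue empty, with nobody having to track down which customer missed out.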
Other uses
Queues are an industry standard thing, and have wide applications. If you need two systems to talk to each other, then doing so via a queue can be a great way to do it. Say you need to run a Node.js script against some data: pop the data in a queue, and get Node.js to pop the results into another queue that Drupal can then process.
There are many ways that you could use a queue in Drupal, and once you have a Drush queue processor running on your server it's incredibly freeing. Many tasks that would otherwise take a long time to complete and slow down your page load can just be pushed into a queue and you know that they'll get processed 'later'.
Some ideas for things you could put in a queue:
- When adding some content, invalidating upstream Varnish caches.
- When adding a review, re-computing averages and updating content that contains those averages.
- Sending a large number of emails when updating content.
- Sending webhooks to third party services when a change to content is made.