12 month contrib challenge: XML sitemap

CM Drupal Contribution Challenge 2020
An article from ComputerMinds - Building with Drupal in the UK since 2005
5th May 2020

James Williams

Senior Developer

This year, a bunch of us at ComputerMinds are challenging ourselves to make a Drupal contribution every single month. We'll be making patches, helping solve problems, providing translations, discussing issues and more! In this article series, we'll be writing about our favourite and most-interesting contributions, and making some loud noises in support of the amazing Drupal community.

We've been busy recently, but that doesn't stop us at ComputerMinds from contributing back to the Drupal community! For our latest multilingual website, we needed an XML sitemap with alternate links and hreflang attributes. This site uses a separate domain for each language - for example, www.example.se (Swedish) and www.example.no (Norwegian). Search engines need these alternate links to understand how the translations of a page match up, since they are spread across these different domains. But this site is built on our existing Drupal 7 e-commerce platform, which uses the XML sitemap project, and that has no support for alternate links (nor entity translation).
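To make the goal concrete, here is a minimal sketch of the kind of output we were after, using the `xhtml:link` convention that search engines such as Google document for sitemaps. It is not code from the XML sitemap project or from the patches: the function name `example_render_sitemap()`, the language codes and the paths are all made up for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of the XML sitemap project): renders a
 * sitemap in which every URL lists all of its translations as alternates.
 *
 * @param array $pages
 *   One item per page; each item is an array of absolute URLs keyed by
 *   language code.
 *
 * @return string
 *   The sitemap XML.
 */
function example_render_sitemap(array $pages) {
  $xml = new XMLWriter();
  $xml->openMemory();
  $xml->setIndent(TRUE);
  $xml->startDocument('1.0', 'UTF-8');
  $xml->startElement('urlset');
  $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
  $xml->writeAttribute('xmlns:xhtml', 'http://www.w3.org/1999/xhtml');
  foreach ($pages as $translations) {
    // Each translation gets its own <url> entry...
    foreach ($translations as $loc) {
      $xml->startElement('url');
      $xml->writeElement('loc', $loc);
      // ...which lists every translation (itself included) as an alternate.
      foreach ($translations as $langcode => $href) {
        $xml->startElement('xhtml:link');
        $xml->writeAttribute('rel', 'alternate');
        $xml->writeAttribute('hreflang', $langcode);
        $xml->writeAttribute('href', $href);
        $xml->endElement();
      }
      $xml->endElement();
    }
  }
  $xml->endElement();
  $xml->endDocument();
  return $xml->outputMemory();
}

// Assumed example data: one page translated across two domains.
$sitemap = example_render_sitemap(array(
  array(
    'sv' => 'https://www.example.se/produkter',
    'nb' => 'https://www.example.no/produkter',
  ),
));
print $sitemap;
```

A real site might well serve a separate sitemap per domain rather than one combined file; the point here is only the shape of each entry.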

Sometimes contributing is glamorous (think of getting new features into Drupal core, or creating new modules for the community), but other times it just involves stepping back to gain some perspective, and doing some menial tasks. This was the latter! There have been two long-running issues on drupal.org about getting the functionality I needed, both created way back in 2012.

So my challenge was to wade into these two, and the 281 comments on them, to figure out how to make progress. It turned out that I'd actually dipped into the first one nearly 2 years ago, and my colleagues had used work from them before too. But a lot changes in that time! The patches needed updating ('re-rolling') to work with the most recent code of the XML sitemap project itself and to work with the latest versions of entity_translation. I particularly enjoyed spotting a comment from our very own Mike Dixon, who threw a spanner in the works with a patch that confused everyone!

Eventually I created updated patches, resolved some bugs, and incorporated some additional valuable work that hadn't yet been reviewed by anyone. These patches ensured our client would be satisfied, and hopefully someone else will come along to review & approve them some day too.

Perhaps the most interesting thing to have come out of the work was a snippet of PHP I relied on to process links that needed adding to the sitemap. The XML sitemap project provides a UI to rebuild things in a batch, but recent changes to respect access mean that even that does not quite build the sitemap entirely. Instead, the work is done via cron, including a queue. Queues are pretty brilliant for ensuring background processes happen at some point, without hitting timeouts, or having to write boilerplate code in hook_cron() implementations. But there's no way to limit what runs on cron (without additional modules). I didn't want other modules to go doing other things when I was working on this; I just wanted my sitemaps to be built! So I came up with this, which is otherwise almost entirely pinched from drupal_cron_run():

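/**
 * Runs cron for a limited set of modules and queues only.
 *
 * @param array $cron_modules
 *   Names of the modules whose hook_cron() implementations should run.
 * @param array $queue_keys
 *   Names of the cron queues to process. Defaults to none.
 */
function limited_cron($cron_modules, $queue_keys = array()) {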
  $queues = module_invoke_all('cron_queue_info');
  drupal_alter('cron_queue_info', $queues);

  // Limit to the queue(s) that we specifically want.
  $queues = array_intersect_key($queues, array_flip($queue_keys));

  foreach ($queues as $queue_name => $info) {
    DrupalQueue::get($queue_name)->createQueue();
  }

  $implementations = module_implements('cron');
  $implementations = array_intersect($implementations, $cron_modules);
  foreach ($implementations as $module) {
    // Do not let an exception thrown by one module disturb another.
    try {
      module_invoke($module, 'cron');
    }
    catch (Exception $e) {
      watchdog_exception('cron', $e);
    }
  }

  foreach ($queues as $queue_name => $info) {
    if (!empty($info['skip on cron'])) {
      // Do not run if queue wants to skip.
      continue;
    }
    $callback = $info['worker callback'];
    $end = time() + (isset($info['time']) ? $info['time'] : 15);
    $queue = DrupalQueue::get($queue_name);
    while (time() < $end && ($item = $queue->claimItem())) {
      try {
        call_user_func($callback, $item->data);
        $queue->deleteItem($item);
      }
      catch (Exception $e) {
        // In case of exception log it and leave the item in the queue
        // to be processed again later.
        watchdog_exception('cron', $e);
      }
    }
  }
}

// Only run the cron implementations and queues that the
// XML sitemap project provides.
limited_cron(array('xmlsitemap', 'xmlsitemap_node'), array('xmlsitemap_link_process'));

This allowed me to very easily get my sitemap built up quickly, by repeatedly calling limited_cron() with just the things that I needed to run. Note that it doesn't act exactly the same as the normal Drupal cron, which runs as the anonymous user and avoids updating the session. But I have found myself returning to use this on other projects for other modules' cron queues. Hopefully you might find it useful too :-)
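The "repeatedly calling" part can be sketched as a small loop. This is only an illustration under my own assumptions: `example_run_until_drained()` is a made-up name, and the demonstration uses a fake counter in place of a real queue so that it runs outside Drupal.

```php
<?php
/**
 * Hypothetical helper: repeats a cron pass until nothing is left to process.
 *
 * @param callable $run_pass
 *   Runs one pass, e.g. a call to limited_cron().
 * @param callable $remaining
 *   Returns the number of items still queued.
 * @param int $max_passes
 *   Safety cap, so items that keep failing cannot cause an endless loop.
 *
 * @return int
 *   The number of passes that were run.
 */
function example_run_until_drained($run_pass, $remaining, $max_passes = 50) {
  $passes = 0;
  while ($passes < $max_passes) {
    call_user_func($run_pass);
    $passes++;
    if (call_user_func($remaining) == 0) {
      break;
    }
  }
  return $passes;
}

// Stand-in demonstration: a fake queue of 25 items, processed 10 per pass.
$fake_queue = 25;
$passes = example_run_until_drained(
  function () use (&$fake_queue) {
    $fake_queue = max(0, $fake_queue - 10);
  },
  function () use (&$fake_queue) {
    return $fake_queue;
  }
);
print "Drained in $passes passes\n";
```

On a bootstrapped Drupal 7 site, the first callback would wrap the limited_cron() call shown above, and the second could return DrupalQueue::get('xmlsitemap_link_process')->numberOfItems(). The cap matters because items that throw an exception stay in the queue, so the count may never reach zero.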

Photo by Ian on Unsplash
