Scale static-content archives and tag pages with Nanoc

Nanoc is the static website generator that among other things powers this blog. Blogs often have taxonomy pages listing entries by topic and by publication date. Nanoc has some helper methods that can get these pages done quickly. But these methods don’t scale very well for large numbers of blog entries and taxonomies. Over time you’ll want to optimize beyond what Nanoc’s helper methods offer.

The problem with Nanoc’s helper methods is that they create a lot of unnecessary dependencies. Most of the helpers that filter items end up creating a dependency on every item in the site, including static assets like images, just to find the specific items you want. If any item changes, every item built with the default helpers needs to be rebuilt, even though no relevant items have changed.

Ideally, you want your pages to have as few dependencies as possible; otherwise, site compilation slows down significantly. For sites with many items, checking dependencies makes up the bulk of the processor time spent during compilation.

Related: Speed up your static site generator and other I/O blocked tasks by using a performant file system.

Let’s take a simplified look at what the sorted_articles helper method does to return a list of articles sorted by creation date:

@items
  .select { |item|
    item[:kind] == "article"
  }
  .sort_by { |item|
    item[:created_at]
  }
  .reverse

Using @items.select in a layout template would create a dependency on every item in the project. Other helpers that work with items have the exact same problem, so let’s get rid of the helper methods and optimize!

Assuming you know where your articles are located (for example, under /entry/), you can speed this up with the @items.find_all method. It accepts a single glob pattern and limits dependency tracking to just the items that match the pattern:

@items
  .find_all("/entry/*.html")
  .sort_by { |item|
    item[:created_at]
  }
  .reverse
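
To get a feel for which identifiers a pattern like this selects, here is a small sketch in plain Ruby that runs outside Nanoc. It assumes that Nanoc's glob matching behaves like File.fnmatch with the FNM_PATHNAME and FNM_EXTGLOB flags; the select_by_glob helper and the identifiers are made up for illustration.

```ruby
# Assumption: Nanoc's glob matching behaves like File.fnmatch with these flags.
GLOB_FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

# Hypothetical helper: returns the identifiers that a glob pattern selects.
def select_by_glob(identifiers, pattern)
  identifiers.select { |id| File.fnmatch(pattern, id, GLOB_FLAGS) }
end

# Made-up identifiers, for illustration only.
identifiers = [
  "/entry/hello-world.html",
  "/entry/2020/nested.html",
  "/entry/draft.md",
  "/images/logo.png"
]

puts select_by_glob(identifiers, "/entry/*.html")
```

Under these assumptions, only /entry/hello-world.html is selected. A single * does not cross a slash, so entries stored in subdirectories would need a recursive pattern such as /entry/**/*.html.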

For a blog, you’d probably use something like the above in at least three different templates to generate the front page, an XML syndication feed file, and an XML site map file. In other words, you’d be doing the same task three times.

You can speed things up again by moving the item filtering out of the template and into the pre-processor. We can build on the above to create yearly archives and tag pages while only doing a single pass through the list of entries. We’ve already got a list of entries sorted by date, so let’s build lists of items per year and per tag as well:

# SPDX-License-Identifier: CC0-1.0

# Entry identifiers grouped per year and per tag
year_arch = Hash.new
tags_arch = Hash.new

# Sort first (newest entry first) so that each list
# is filled in that order during the single pass.
@items
  .find_all("/entry/*.html")
  .sort_by { |item|
    item[:created_at]
  }
  .reverse
  .each { |item|
    # Group by publication year (UTC)
    year = item[:created_at].utc.year
    unless year_arch.include?(year)
      year_arch[year] = Array.new
    end
    year_arch[year]
      .push(item.identifier.to_s)

    # Skip entries without tags
    next if item[:tags].to_s.empty?

    item[:tags]
      .each do |tag|
        unless tags_arch.include?(tag)
          tags_arch[tag] = Array.new
        end
        tags_arch[tag]
          .push(item.identifier.to_s)
      end
  }
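
The grouping logic can also be tried outside Nanoc. The sketch below is a plain-Ruby approximation: the Entry struct is a stand-in for a Nanoc item, build_archives is a hypothetical helper name, and the sample entries are invented. It uses hashes with a default block instead of the explicit include? checks, which is a stylistic choice rather than a requirement.

```ruby
# Stand-in for a Nanoc item; only the fields used below are modelled.
Entry = Struct.new(:identifier, :created_at, :tags)

# Hypothetical helper: groups entry identifiers by year and by tag in one pass.
def build_archives(entries)
  # The default block creates the array the first time a key is assigned to.
  year_arch = Hash.new { |hash, key| hash[key] = [] }
  tags_arch = Hash.new { |hash, key| hash[key] = [] }

  entries.each do |entry|
    year_arch[entry.created_at.utc.year] << entry.identifier
    # Array() turns a missing (nil) tag list into an empty array.
    Array(entry.tags).each { |tag| tags_arch[tag] << entry.identifier }
  end

  [year_arch, tags_arch]
end

# Invented sample data, for illustration only.
entries = [
  Entry.new("/entry/a.html", Time.utc(2020, 5, 1), ["ruby", "nanoc"]),
  Entry.new("/entry/b.html", Time.utc(2021, 1, 9), ["nanoc"]),
  Entry.new("/entry/c.html", Time.utc(2021, 3, 2), nil)
]

year_arch, tags_arch = build_archives(entries)
p year_arch
p tags_arch
```

One thing to keep in mind with default blocks: merely reading a missing key also creates it, so prefer key? or fetch when you only want to check for a year or tag.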

You’ve then iterated through only the relevant items, just once, and built the hashes for each taxonomy along the way. The next step is to create the items and pass along the list of dependencies.

# SPDX-License-Identifier: CC0-1.0

year_arch
  .each do |year, year_items|
    identifier = "/archive/#{year}.html"
    attr = {
      archive_year: year,
      archive_items: year_items,
      identifier: identifier
    }
    @items.create("", attr, identifier)
  end

tags_arch
  .each do |tag, tag_items|
    identifier = "/tag/#{tag}.html"
    attr = {
      archive_tag: tag,
      archive_items: tag_items,
      identifier: identifier
    }
    @items.create("", attr, identifier)
  end
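
Tags are free-form text, so interpolating them directly into an identifier can give surprising results: a tag containing a slash adds an extra path segment, and characters such as * or { may interfere with glob matching later on. The sketch below shows one possible way to normalize a tag first. The tag_slug helper is hypothetical, not part of Nanoc, and the sample tags are invented.

```ruby
# Hypothetical helper: turns a free-form tag into a string that is safe to
# embed in an identifier such as "/tag/#{slug}.html".
def tag_slug(tag)
  tag.to_s
     .downcase
     .gsub(/[^a-z0-9]+/, "-") # collapse anything else into a single dash
     .gsub(/\A-+|-+\z/, "")   # trim dashes at either end
end

# Invented sample tags, for illustration only.
["Static Sites", "C/C++", "nanoc"].each do |tag|
  puts "/tag/#{tag_slug(tag)}.html"
end
```

This simple version only keeps ASCII letters and digits, so non-ASCII tags would need a different rule, and two distinct tags can end up with the same slug. Keep the original tag in the archive_tag attribute for display purposes.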

In the relevant layouts, you can then access the pre-processed list of dependencies by reading the archive_items attribute and passing it along as a glob list to @items.find_all:

pattern = "{#{@item[:archive_items].join(',')}}"
@items.find_all(pattern)
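
To illustrate how the brace pattern works, the following plain-Ruby sketch builds the same kind of pattern and tests it against a few identifiers. It assumes that Nanoc's glob matching behaves like File.fnmatch with the FNM_PATHNAME and FNM_EXTGLOB flags; brace_pattern and glob_match? are hypothetical helper names, and the identifiers are invented.

```ruby
# Hypothetical helper: builds one brace pattern from a list of identifiers.
def brace_pattern(identifiers)
  "{#{identifiers.join(',')}}"
end

# Assumption: matching behaves like File.fnmatch with FNM_PATHNAME and
# FNM_EXTGLOB; the latter enables the {a,b} alternation syntax.
def glob_match?(pattern, identifier)
  File.fnmatch(pattern, identifier, File::FNM_PATHNAME | File::FNM_EXTGLOB)
end

# Invented sample data, for illustration only.
archive_items = ["/entry/a.html", "/entry/b.html"]
pattern = brace_pattern(archive_items)

puts pattern
puts glob_match?(pattern, "/entry/a.html")
puts glob_match?(pattern, "/entry/c.html")
```

Because the pattern is assembled from raw identifiers, an identifier that itself contains a comma or a brace would likely split or break the alternation, so this approach works best with plain identifiers.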

I’ll leave it as an exercise for the reader to apply the same optimizations to their sitemaps, syndication feeds, front page, and other items that depend on other items.

Pro tip: Divide up your XML sitemap into multiple files, e.g. by creation year, and combine them again with an XML sitemap index. That way you won’t have one massive slow-to-calculate item that depends on all items.
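
As a rough sketch of what such a sitemap index could look like, the following plain Ruby renders one index entry per year. The sitemap_index helper, the example.com base URL, and the /sitemap/<year>.xml layout are all assumptions for illustration; adapt them to your own site.

```ruby
# Hypothetical helper: renders a sitemap index that points at one sitemap
# file per year. base_url and the /sitemap/<year>.xml layout are assumptions.
def sitemap_index(base_url, years)
  entries = years.sort.map do |year|
    "  <sitemap><loc>#{base_url}/sitemap/#{year}.xml</loc></sitemap>"
  end

  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    *entries,
    '</sitemapindex>'
  ].join("\n")
end

puts sitemap_index("https://example.com", [2021, 2020])
```

The list of years could come from the keys of the year_arch hash built in the pre-processor, so the index itself only needs to know which years exist rather than depending on every entry.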

Using the optimizations discussed in this article, I was able to reduce my compilation time by almost three quarters. More importantly, creating a new item no longer requires every item’s dependency to be recalculated. Also, adding a new tag or another year no longer adds a big performance penalty to the compilation time.

I’ve found plenty of references to Nanoc getting slow over time in ten-year-old blog posts. The thing is, not much has changed in terms of the default helpers over the years. The helpers may very well be responsible for Nanoc’s reputation for being fast in the beginning and then slowing down over time.