
How to label a webpage as being out-of-date

You might not want search engines to direct people to your outdated product pages or blog posts, especially when your site has other, more up-to-date webpages available. Here’s how to label webpages as being out-of-date for humans and machines.

I’ve previously outlined some technical and writing advice for when you’re linking to outdated or inaccurate information. In this article, I’ll take that concept one step further and explore how you can mark up your own webpages as being outdated.

The general advice for outdated pages is to update them with new information whenever possible. You can salvage an outdated but already popular webpage by bringing your readers the most up-to-date information on the topic. An alternate approach — often favored in the search engine optimization community — is to simply delete the pages. Updating a webpage takes a lot more time than deleting it. However, deleting webpages leads to link rot (dead internal and external links). Deletion throws away all the value a webpage has built for your website.

Sometimes, a webpage just can’t be brought forward without rewriting it entirely or its subject has lost all relevance. Some websites automatically label pages that haven’t been updated in a few weeks, months, or years as outdated. A few even manually label outdated articles. For this site, I’ve chosen to manually review old articles every few months to either update them or label them as outdated.

So how do you go about labeling a webpage as being out of date? For humans, it’s as simple as including a highly visible notice near the headline saying it’s outdated. You could go the extra mile and describe why and when the page became obsolete.

But how do you label a page as outdated for search engines and other machine-readers? To find the answer, I reviewed 110 pages on 110 different news websites. I selected pages where I could identify a human-readable notice saying the page was out-of-date. I also searched more than six million webpages for the metadata I’ll discuss later in this article. To my surprise, I couldn’t find a single website that had made any effort to label its own pages as outdated in a machine-readable format. I guess no one wants to invest one second in search-engine un-optimization?

There is an industry standard for metadata to label pages with their expiration date. The expiration date represents when the page’s content is no longer up-to-date. Expiration in this context must not be confused with HTTP cache expiration.

The two most relevant vocabularies for expressing this metadata are the Open Graph Protocol (used by social networks) and Schema.org (used by search engines). The example below shows HTML markup that labels a page that became outdated in October 2020. The metadata tags can be included anywhere in the document, but are typically either part of the <head> element or included immediately after the page’s headline.

<meta content="2020-10-01T00:00:00Z"
  property="http://schema.org/expires">
<meta content="2020-10-01T00:00:00Z"
  property="http://ogp.me/ns/article#expiration_time">

Unfortunately, you have to duplicate the tags instead of merging the property attributes (normally a space-separated list of values). I’ve previously covered how Facebook, Twitter (and Bing) are bad at parsing metadata properly.
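If your CMS stores an expiration time, the two tags can be generated from that single value. The following is a minimal sketch, not a reference implementation: the function name `expiration_meta_tags` is hypothetical, and it assumes the expiration time is a timezone-aware `datetime`. The property names are copied from the example above.

```python
from datetime import datetime, timezone
from html import escape

# Property names copied from the markup example in this article.
EXPIRY_PROPERTIES = (
    "http://schema.org/expires",
    "http://ogp.me/ns/article#expiration_time",
)


def expiration_meta_tags(expires: datetime) -> str:
    """Return one <meta> tag per vocabulary (hypothetical helper).

    Assumes `expires` is timezone-aware. One tag is emitted per property
    because the properties can't be merged into a single attribute.
    """
    # Normalize to UTC and format like the example: 2020-10-01T00:00:00Z
    stamp = expires.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return "\n".join(
        f'<meta content="{escape(stamp)}"\n  property="{escape(prop)}">'
        for prop in EXPIRY_PROPERTIES
    )
```

Calling `expiration_meta_tags(datetime(2020, 10, 1, tzinfo=timezone.utc))` should produce the same two tags as the handwritten example.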

I’ve found no indication that any of the leading social media networks or search engines use this metadata. As far as I can tell, it hasn’t seen any adoption in the industry at all. The incentives of the web are to drive ever-more clicks to your own website, and applying negative labels hasn’t been a priority. Search engines could use this information to reduce the rank of outdated pages (except maybe for searches limited to a given date range).

Update: There is also another meta option that Google has committed to support. However, this option explicitly tells Google to exclude the page from its index after a given date. Here’s an example of this directive:

<meta name="robots"
  content="unavailable_after:2020-10-01T00:00:00Z">

Recording expiration times in your content management system (CMS) does have other advantages, though. You can use the same metadata to conditionally apply the human-readable label. You can also reuse the metadata in other contexts such as lowering the page’s visibility in the site’s content recommendation system.
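As an illustration of the conditional label, here is a rough sketch that renders a notice only once the stored expiration time has passed. Everything in it is an assumption made for the example: the function names, the `outdated-notice` class name, the wording of the notice, and the use of timezone-aware `datetime` values.

```python
from datetime import datetime, timezone
from html import escape
from typing import Optional


def is_outdated(expires: Optional[datetime],
                now: Optional[datetime] = None) -> bool:
    """True when an expiration time is set and has already passed."""
    if expires is None:
        return False  # pages without an expiration time are never labeled
    now = now or datetime.now(timezone.utc)
    return expires <= now


def outdated_notice(expires: Optional[datetime], reason: str = "",
                    now: Optional[datetime] = None) -> str:
    """Return an HTML notice for outdated pages, or an empty string.

    The class name and wording are placeholders, not a convention.
    """
    if not is_outdated(expires, now):
        return ""
    text = f"This page has been out of date since {expires.date().isoformat()}."
    if reason:
        text += " " + reason  # optionally explain why the page is obsolete
    return f'<p class="outdated-notice">{escape(text)}</p>'
```

A page template could place the returned string near the headline and render nothing when it is empty.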

Something that will affect search engines directly is to lower the priority score of outdated webpages in your XML sitemap file. The sitemap file lists every webpage on your site. It can also be used to assign pages a priority score relative to other pages on your website. You can apply a lower score to outdated pages based on the expiration date in your CMS. Here’s an example of a URL entry in an XML sitemap with a lowered priority (the default priority is 0.5).

<url>
  <loc>https://example.com/outdated</loc>
  <priority>0.1</priority>
</url>
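Generating such entries from the CMS could look roughly like the sketch below. It assumes pages are available as `(url, expires)` pairs with timezone-aware expiration times; the `build_sitemap` name and that data shape are hypothetical. The 0.1 score mirrors the example above and is otherwise arbitrary.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: Iterable[Tuple[str, Optional[datetime]]],
                  now: Optional[datetime] = None) -> str:
    """Build a sitemap where outdated pages get a lowered priority.

    `pages` yields (url, expires) pairs; `expires` may be None.
    """
    now = now or datetime.now(timezone.utc)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, expires in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if expires is not None and expires <= now:
            # Outdated: rank below the default priority of 0.5.
            ET.SubElement(entry, "priority").text = "0.1"
        # Current pages omit <priority>, so the default applies.
    return ET.tostring(urlset, encoding="unicode")
```

Leaving out the priority element for current pages keeps the file small and relies on the default score.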

I’ve left it up to the search engines to decide whether to exclude an outdated page from search results or not. If you want to exclude outdated pages yourself, you can remove them from your XML sitemap and either add robots-exclusion metadata to the pages or disallow crawlers from accessing them. Don’t combine the last two: a crawler that isn’t allowed to fetch a page never sees the metadata on it. Excluding robots is one step removed from outright deleting an old page.

I recommend you keep your old pages generally available, however. Sometimes an out-of-date page is precisely the thing someone was looking for. For businesses, a visit to a page for an outdated product or discontinued service is an opportunity to promote your new products or services. For blogs and news sites, you can recommend other related and more up-to-date articles through a related-reading widget.

You should probably remove outdated pages from your content recommendation system. Recommending out-of-date articles hurts the reader experience. There may be an exception to this rule when it comes to recommending outdated-but-highly-related pages from pages that are themselves out-of-date. This is a corner case, however. You can reuse the page metadata that tracks expiration dates to exclude out-of-date pages.
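The filtering rule, including the corner case, can be sketched in a few lines. The `Page` class and the `filter_recommendations` function are hypothetical stand-ins for whatever your CMS and recommendation system provide, and expiration times are assumed to be timezone-aware.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for a CMS page record."""
    url: str
    expires: Optional[datetime] = None

    def is_outdated(self, now: datetime) -> bool:
        return self.expires is not None and self.expires <= now


def filter_recommendations(source: Page, candidates: List[Page],
                           now: Optional[datetime] = None) -> List[Page]:
    """Drop outdated candidates unless the source page is itself outdated."""
    now = now or datetime.now(timezone.utc)
    if source.is_outdated(now):
        # Corner case: an outdated page may still recommend outdated pages.
        return list(candidates)
    return [page for page in candidates if not page.is_outdated(now)]
```

Running the filter after the recommendation system has ranked its candidates keeps the ranking logic itself unchanged.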

All the technical points in this article have revolved around a single data field that most content management systems don’t currently store. As you might have gathered by now, the intended audience for this article is CMS developers. You should forward this article to your CMS provider and ask them to add fields for tracking page expiration dates.