Why are ‘what to read next’ article recommendations so terrible?

On a lot of websites, you’ll find a list of “Recommended articles” or “Related articles” below the main content. But why are so many websites recommending you read the article that you’ve just finished reading?

Just a quick preamble to note that I’ll be talking about actual article recommendations and not crap-content-from-around-other-websites or other camouflaged ad units. Those things are the worst.

You’ve just read an article and scroll down past the end of the main text. There you’ll find either a row of square images or a list of story headlines under the title “Recommended articles”. The first recommendation is the article you just read, and the second is the article that linked to the one you’ve just read. The remaining recommendations are vaguely related but not that interesting. Why are these recommendations so often so bad?

I see this again and again on so many websites! From the smallest blogs to the largest news organizations: they all give lousy article recommendations for what to read next. These lists are designed to catch your interest as you reach the end of an activity (reading) and start to wonder what to do next (continue reading!). However, despite these lists being a means to retain visitors, many websites do a terrible job of giving recommendations.

First off, there are a ton of websites that recommend/promote the same article as the current page. The current article is indeed the most related to the current article, but this is just lazy programming. Excluding the current article should be a ten-second coding job, yet oh so many don’t do it. You can see this problem on the official Google Blog website.
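To show how small that job is, here is a minimal sketch in Python. The function name `exclude_current` and the example paths are made up for illustration, and it assumes articles are identified by their URL path; a real site would more likely compare article IDs.

```python
def exclude_current(candidates, current_path):
    """Return the candidate paths with the page being read removed."""
    # Ignore trailing slashes so "/post/" and "/post" count as the same page.
    current = current_path.rstrip("/")
    return [path for path in candidates if path.rstrip("/") != current]


# Hypothetical candidate list produced by some recommendation step.
candidates = ["/why-recommendations-fail/", "/tracking-trends/", "/latest-news/"]
print(exclude_current(candidates, "/why-recommendations-fail"))
```

The filter runs after whatever logic picks the candidates, so it can be bolted on to an existing recommendation list without touching the rest.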

The same goes for recommending the page that linked you to what you’re currently reading. On the web, information about which page referred you to the current page is transmitted in the Referer (sic) HTTP header. As with excluding the current page from recommendations, excluding the referring page is another ten-second coding job. You can see this problem on the New York Times website.
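A rough sketch of the referrer check, again in Python. The helper names `referring_path` and `exclude_referrer` and the host `example.com` are assumptions for illustration. The sketch only trusts the header when it points at the same site, and it treats a missing header as "nothing to exclude", since browsers and privacy settings can omit it.

```python
from urllib.parse import urlsplit


def referring_path(referer_header, site_host):
    """Return the path of the referring page if it is on this site, else None."""
    if not referer_header:
        return None  # the header is optional and often absent
    parts = urlsplit(referer_header)
    if parts.hostname != site_host:
        return None  # the visitor arrived from another site
    return parts.path.rstrip("/") or "/"


def exclude_referrer(candidates, referer_header, site_host):
    """Drop the page that linked the visitor to the current article."""
    referrer = referring_path(referer_header, site_host)
    if referrer is None:
        return list(candidates)
    return [path for path in candidates if (path.rstrip("/") or "/") != referrer]


# Hypothetical request: the visitor clicked through from /front-story/.
print(exclude_referrer(
    ["/front-story/", "/other-story/"],
    "https://example.com/front-story/",
    "example.com",
))
```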

These two issues alone demonstrate how little thought and effort are put into these recommendation systems.

Did you see the front page?

Say you’re visiting the front page of any news site. You scan the headlines and then click through and read one of the articles. The “recommended articles” section below the article you read contains everything you just skimmed on the front page and didn’t find interesting enough to read. (More often than not including the article you just read!)

This is okay when it’s labeled as “latest articles” or something similar, but calling all the newest articles recommended doesn’t show great care in selection. Not everything that’s published by any website is pure gold and deserves to be labeled as recommended.

This is by far the laziest and one of the most common fillers for the “recommended articles” section. It’s what you’ll find on websites where a manager or designer said there had to be such a section, but no one took the time to specify how it should work.

Relevancy – to whom?

“Relevant articles” is an alternative heading to “Recommended articles”, but it’s meant to serve the same purpose. Instead of arbitrary “recommendations”, visitors are shown somehow-relevant articles. I’d say this is a distinction without a difference, as the two headings are often used interchangeably.

Lists of relevant/interlinked articles are normally generated by looking at similarities in the keywords found in the article text, or sometimes in tags/labels, to find similar stories.
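As an illustration of this kind of matching, the sketch below scores articles by how many tags they share. It uses Jaccard similarity (shared tags divided by all tags across both articles), which is only one of several possible measures and is an assumption here; the article IDs and tags are invented.

```python
def jaccard(a, b):
    """Share of tags two articles have in common, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(current_id, tags_by_article, limit=3):
    """Rank the other articles by tag overlap with the current one."""
    current_tags = tags_by_article[current_id]
    scored = [
        (jaccard(current_tags, tags), article_id)
        for article_id, tags in tags_by_article.items()
        if article_id != current_id  # never recommend the current article
    ]
    # Drop articles with nothing in common, then sort best match first.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for _, article_id in scored[:limit]]


# Hypothetical tag data.
tags_by_article = {
    "a": {"web", "analytics"},
    "b": {"web", "analytics", "privacy"},
    "c": {"cooking"},
    "d": {"web"},
}
print(most_similar("a", tags_by_article))
```

Note that the score only knows which words two articles share, not which part of the article a given reader cared about.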

The problem with trying to automatically produce a list of relevant articles is that everyone gets something different from reading the same article. Visitor A may be interested in one tidbit of information, whereas Visitor B might have been interested in some entirely different aspect of the same article. Blindly keyword-matching against other articles means that, unless the website has an absolutely enormous number of articles, every article will get poorly matched recommendations that fail to meet the individual’s expectation of what would qualify as “relevant” to them.

As this kind of article recommendation requires as many articles as possible to find something relevant, old and outdated articles will often be allowed to resurface.

Tracking trends

Many websites use at least one traffic tracking and analytics solution that keeps an eye on what content is popular and on visitors’ movement through a website. This kind of analytics is sometimes criticized for being overly intrusive and violating the privacy of a website’s visitors.

However, this data could be a great source for improving article recommendations. Web analytics solutions can easily report things like which article people spend time reading after finishing the current article. Yet this kind of data seemingly is only collected and never acted upon.

The constant tracking of everyone should at least benefit people by producing better article recommendations. Aggregating trends from traffic patterns and analytics data isn’t even a hard programming job.
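A minimal sketch of that aggregation in Python, assuming the analytics data can be exported as a chronological list of (visitor, page) pairs. That export format, the function names, and the sample log are all assumptions; real analytics tools expose this data in their own ways.

```python
from collections import Counter, defaultdict


def next_article_counts(page_views):
    """Count which page each visitor opened right after each page.

    page_views: iterable of (visitor_id, path) tuples in chronological order.
    """
    last_seen = {}
    transitions = defaultdict(Counter)
    for visitor_id, path in page_views:
        previous = last_seen.get(visitor_id)
        if previous is not None and previous != path:
            transitions[previous][path] += 1
        last_seen[visitor_id] = path
    return transitions


def popular_next(transitions, current_path, limit=3):
    """Pages most often read right after current_path, most common first."""
    counts = transitions.get(current_path, Counter())
    return [path for path, _ in counts.most_common(limit)]


# Hypothetical page-view log.
log = [
    ("v1", "/a"), ("v1", "/b"),
    ("v2", "/a"), ("v2", "/b"),
    ("v3", "/a"), ("v3", "/c"),
]
print(popular_next(next_article_counts(log), "/a"))
```

Because a transition is only counted when the next page differs from the previous one, the current article can never end up recommending itself.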

This was just a collection of thoughts I had about content recommendation while working on the new home page and article recommendation system for Slight Future. I made a custom-built solution that weights keyword relevancy and trending articles, and I’m experimenting with using the past click-through performance of individual recommendations. I’ll probably wrap it up as a new WordPress plugin before long.
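For a sense of what weighting several signals can look like, here is a generic sketch in Python. It is illustrative only and not the Slight Future implementation: the weights, the assumption that every signal is already scaled to the range 0 to 1, and the example paths are all made up.

```python
def combined_score(relevancy, trending, click_through, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of three signals, each assumed to be scaled to 0..1."""
    w_relevancy, w_trending, w_click = weights
    return w_relevancy * relevancy + w_trending * trending + w_click * click_through


def rank(candidates):
    """Sort candidate paths by combined score, best first.

    candidates: dict mapping path -> (relevancy, trending, click_through).
    """
    return sorted(
        candidates,
        key=lambda path: combined_score(*candidates[path]),
        reverse=True,
    )


# Hypothetical signals for two candidate articles.
candidates = {
    "/closely-related-but-old": (0.9, 0.1, 0.0),
    "/loosely-related-but-popular": (0.4, 0.9, 0.5),
}
print(rank(candidates))
```

With these made-up weights the popular article wins; shifting more weight to relevancy would flip the order, which is exactly the kind of tuning that needs experimenting.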

Getting article recommendations right can be a complicated task. However, I don’t believe that filtering out the current and referring page is asking too much. No one wants to read the article they just finished again straight away.