URLs don’t belong in <meta> elements

There’s more than one HTML element designed to hold metadata. Any HTML element can hold metadata with HTML+RDFa, but for the purposes of this article I’ll focus on the HTML-standard <link> element. It’s used to build link relations between documents and resources.

You’ll benefit from having read “Why are Facebook and Twitter so bad at parsing RDFa metadata?” for context before reading this article. However, it’s not a requirement to get something out of this one.

I’d like to set the stage and introduce this topic with a quick recent-history lesson before I get into the difference between <meta> and <link> elements and why URLs don’t belong in the former.

It was decided in the early days of the social web that sharing plain, dull-looking URLs wasn’t the most enticing customer experience. Good metadata would be key to delivering a better customer experience with detailed and image-rich previews of externally linked webpages. Existing metadata schemas had failed to see much adoption, as there hadn’t been many immediate benefits to properly labeling webpages until that time.

“This wasn’t invented here!” said the engineers of Facebook, Twitter, VK, Weibo, and other companies upon seeing existing metadata schemas like the Dublin Core Schema or even Microformats. Instead of urging the adoption of one of the existing metadata schemas, they all reinvented the wheel and introduced Open Graph Protocol by Facebook, Twitter Cards, VK for Publications, Weibo Meta Tags, and similar proprietary metadata schemas. They then asked web publishers to implement and include all of these mostly overlapping schemas in their webpages.

Web publishers were on board with the new schemas, as it meant they’d get higher click-through rates from the rapidly growing social media platforms. Half a decade later, we’ve seen some consolidation behind Facebook’s Open Graph Protocol, as the teams behind Twitter Cards and Weibo Meta Tags finally saw the benefits of using a single metadata schema instead of insisting on everyone else adapting their pages to suit each platform.

What all of these companies independently got wrong was their URL data type, used for properties like “url” and “image” to reference the canonical location of the current page and the location of a representative image. All of these schemas expressly stated that these should be given as string literals instead of URLs.

As an example, here’s the canonical page URL expressed using Open Graph Protocol:

<meta property="og:url" content="https://example.com/">

This seems simple enough and is often seen along with other tags like og:title and og:description. However, all the major search engines had agreed to use link-rel canonical to represent the exact same concept 14 months before Facebook’s introduction of the Open Graph Protocol.

Webpages now had to express their canonical URLs in two different ways:

<meta property="og:url" content="https://example.com/">
<link rel="canonical" href="https://example.com/">

So, what’s the difference between these two objects (other than the vocabulary used to label them)? It comes down to data types: the content of a <meta> element is just a string literal, whereas a <link> is a link. A link is aware of the base URL and any base URL overrides like the <base> element, handles relative paths, and normalizes the link. A string is literally just a string.
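To make that concrete, here’s a rough sketch using the standard URL constructor, which follows the same resolution and normalization rules a browser applies to a link. The base URL and paths are made up for illustration.

```javascript
// Hypothetical base URL, standing in for the document's address
// (or whatever a <base> element overrides it to).
const base = "https://example.com/blog/post/";

// What a <meta> element gives you: the raw string, untouched.
const raw = "../images/./cover.png";

// What a link gives you: the same string resolved against the base URL.
const resolved = new URL(raw, base).href;

// Normalization: the scheme and host are lowercased, the default
// port is dropped, and dot segments are removed from the path.
const normalized = new URL("HTTPS://EXAMPLE.com:443/a/../b", base).href;

console.log(raw);        // ../images/./cover.png
console.log(resolved);   // https://example.com/blog/images/cover.png
console.log(normalized); // https://example.com/b
```

None of this happens to the string in a content attribute unless the consumer does it by hand.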

The following two blocks of JavaScript demonstrate URL processing of a <link> element compared to a <meta> element:

// <link rel="og:url" href="…">
document.querySelector(
  'link[rel~="og:url"][href]'
).href

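// <meta property="og:url" content="…">
// The content is a plain string, so it has to be resolved by hand.
new URL(
  document.querySelector(
    'meta[property~="og:url"][content]'
  ).content,
  // Use the <base> element if there is one (querySelector returns
  // null otherwise); fall back to the current location.
  document.querySelector(
    'base[href]'
  )?.href || window.location.href
).toString()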

You’ll notice that the <meta> version has to expressly convert the string returned from the element to a URL object, manually specify the base URL (from the <base> element, or by falling back to the current window.location), and then convert the result back to a string. The <link> version simply reads the element’s href property.

It’s not just a semantic difference, as it’s clearly better to get link-type data from an element that natively understands the data type. The semantic difference between these two data types also becomes clear when parsing the document for Linked Data or as an HTML+RDFa document.

You also need perfect prior knowledge of the data to know that you should expect a URL from this one specific <meta> element, whereas the <link> element gives you the language-level assurance of always getting a URL.
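Here’s a minimal sketch of what that prior knowledge looks like in a parser. The element objects are simplified, hypothetical stand-ins (not real DOM nodes), and the allowlist is my own assumption about which properties a consumer might know about.

```javascript
// Hypothetical, simplified stand-ins for parsed elements.
const elements = [
  { tag: "meta", name: "og:title", value: "Example" },
  { tag: "meta", name: "og:url", value: "/" },
  { tag: "meta", name: "og:image", value: "cover.png" },
  { tag: "link", name: "canonical", value: "/" },
];

// Assumed allowlist: for <meta>, the parser must already know
// which properties hold URLs. Anything not listed stays a string.
const URL_PROPERTIES = new Set(["og:url", "og:image"]);

function collectUrls(elements, base) {
  return elements
    // Every <link> holds a URL; a <meta> only if it is allowlisted.
    .filter((el) => el.tag === "link" || URL_PROPERTIES.has(el.name))
    .map((el) => new URL(el.value, base).href);
}

const urls = collectUrls(elements, "https://example.com/blog/");
console.log(urls);
```

A new URL-valued <meta> property would be silently treated as text until someone updates the allowlist. A new link relation needs no such change.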

The following example demonstrates combining the two separate elements discussed in this article into one element as an HTML-native link. This is useful if you want to minimize the number of elements needed to express identical metadata labels or you want to minimize the document size.

<link rel="canonical og:url" href="https://example.com/">

Unfortunately, the above example isn’t supported by either Facebook or Twitter, which seem to use regex parsing instead of RDFa parsing despite both pushing an HTML+RDFa vocabulary standard. The above example is supported by Bing and Google Search, which parse RDFa as RDFa.
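I don’t know how the platforms’ parsers are actually implemented, so treat the following as an illustration only. The rel attribute holds a set of space-separated tokens, and a parser that compares the whole attribute value against a single expected string will miss a combined value. The helper names are hypothetical.

```javascript
// Split a rel attribute value into its space-separated tokens.
function relTokens(rel) {
  return rel.trim().split(/[ \t\n\f\r]+/).filter(Boolean);
}

// Token-aware check: true if the given token is one of the rel values.
function hasRel(rel, token) {
  return relTokens(rel).includes(token);
}

const rel = "canonical og:url";

// Naive whole-string comparison misses the combined value.
const naiveMatch = rel === "og:url";

// Token-aware matching finds both relations.
const tokenMatch = hasRel(rel, "og:url") && hasRel(rel, "canonical");

console.log(naiveMatch); // false
console.log(tokenMatch); // true
```

This is the same matching the ~= attribute selector performs in the JavaScript examples earlier in the article.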

The point of this article is to argue for using <link> elements to express URLs. I can’t recommend that anyone change their Open Graph implementations to use them instead of <meta> elements, as that would probably cost you any benefit from including the metadata in your documents (again, due to the social platforms’ incredibly poor document parsers). However, I hope to have convinced you not to design your metadata schemas or parsers to be as stupid as Facebook’s.

Ctrl.blog
