Camera crafts – Three cameras made out of cardboard.

Put your image metadata to use embedded in your image files

Your content management system (CMS) probably already has a lot of metadata associated with your website’s images, including information about each image’s creator, licensing and copyright notices, when and where you obtained it and under which terms, titles, and image descriptions or “alt” texts. Put this information to use inside your image files to protect your intellectual property rights and make the images more discoverable.

There’s often some information about an image when it’s embedded on a webpage, but it’s not so common to see the same information make it into the image files themselves. E.g. you’ll frequently see copyright notices and descriptions underneath an image on a webpage or find that an image has been embedded on a webpage along with an alternative textual representation of the image included for accessibility purposes.

Metadata that’s just displayed on a webpage along with the image, or that’s locked away in a CMS, is disassociated from the image file itself. Having it on a webpage or inside some database is of little help if you’re working on or looking for a copy of the file on your local computer, when the image appears in a web search result, when visitors “borrow” it from your website, or when you publish it on a content-addressable distributed web alternative like IPFS.

The solution is, of course, to embed the available metadata into the image file itself. In many cases this process can be automated by the content management system, allowing users to edit the metadata that goes into the file directly from their CMS.

There are a few different standards for embedding metadata in image files, depending on the file format. The Extensible Metadata Platform (XMP) is probably the best candidate as it’s widely supported and there are a number of metadata editing libraries and programs that can work with it.

Image metadata has gotten a bad reputation on the web because people associate it with file bloat caused by poor handling and prioritization of metadata, issues with the metadata specifications, and inefficient metadata embedding clients. For example, the XMP Specification recommends including 2–4 kilobytes worth of space characters for “padding” at the end of the metadata section of the file. This recommendation gives metadata editors some extra flexibility, but it also means everyone needs to download 2000+ bytes of literally nothing to view the image. Most webservers are not configured to apply on-the-fly compression to binary files, so this type of empty data and unnecessary white-space won’t be reduced by compression.
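To illustrate how little is needed to drop that padding, here is a minimal Python sketch. It assumes you have already extracted the raw XMP packet as bytes; the function name `strip_xmp_padding` is made up for this example. It only collapses the whitespace in front of the closing `<?xpacket end=…?>` instruction and doesn’t touch the container format, so any segment or chunk length fields in the surrounding image file would still need to be updated by whatever tool writes the packet back.

```python
import re


def strip_xmp_padding(packet: bytes) -> bytes:
    """Collapse the whitespace padding in front of the xpacket trailer.

    Hypothetical helper: operates on an already extracted XMP packet,
    not on a complete image file.
    """
    # The padding is a run of whitespace directly before the trailing
    # <?xpacket end="..."?> processing instruction. Replace the whole
    # run with a single newline and keep the trailer untouched.
    return re.sub(rb"\s+(<\?xpacket end=)", rb"\n\1", packet)


padded = (
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>'
    + b" " * 2048
    + b'<?xpacket end="w"?>'
)
slim = strip_xmp_padding(padded)
print(len(padded) - len(slim), "bytes of padding removed")
```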

There are also a number of poorly documented compatibility pitfalls one needs to stay clear of when working with XMP metadata. E.g. XMP has to appear before the image data inside PNG files to work with Adobe and Apple software products, and images are limited to only use the “x-default” language code to be displayed properly within Windows File Explorer. (Multilingual metadata support is well documented in the specification but support seems to be limited exclusively to macOS and other Apple products.)
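The PNG ordering pitfall can be sketched in a few lines of Python. This is an illustration under assumptions, not a complete tool: the helper names are made up, the XMP packet is assumed to be ready-made UTF-8 bytes, and the code does not check whether the file already contains an XMP chunk. It stores the packet in an uncompressed `iTXt` chunk with the keyword `XML:com.adobe.xmp` and places that chunk in front of the first `IDAT` (image data) chunk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def insert_xmp_before_idat(png: bytes, xmp_packet: bytes) -> bytes:
    """Return a copy of the PNG with the XMP packet placed before the image data."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # iTXt layout: keyword, NUL, compression flag (0 = uncompressed),
    # compression method, empty language tag + NUL, empty translated
    # keyword + NUL, then the UTF-8 text itself.
    itxt_data = b"XML:com.adobe.xmp\x00" + b"\x00\x00" + b"\x00" + b"\x00" + xmp_packet
    xmp_chunk = make_chunk(b"iTXt", itxt_data)

    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        if chunk_type == b"IDAT":
            # Splice the metadata chunk in right before the first IDAT.
            return png[:pos] + xmp_chunk + png[pos:]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    raise ValueError("no IDAT chunk found")
```

A real implementation would also want to replace an existing XMP chunk instead of adding a second one.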

Issues like these require attention, fairly deep knowledge of exactly how metadata is being embedded into images, and the understanding to apply targeted optimizations that remove the bloat.

Almost all cameras, including mobile cameras, embed a lot of technical information about the state and settings of the camera with every image by default. This information can be useful to an image editor and possibly to a camera enthusiast. However, you should probably strip away almost all of it when you’re publishing an image to the web. With the possible exception of geolocation coordinates and the camera make and model, this information will probably never be useful to anyone and should be removed unless you have a specific need for it.
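As a rough illustration of what stripping camera metadata involves for JPEG files, the following Python sketch walks the segments in front of the image data and drops the Exif block while leaving everything else, including an XMP segment, in place. The function name is made up, and the approach is deliberately blunt: it removes all Exif data, including the orientation tag and any geolocation you might have wanted to keep, so treat it as a starting point and not as a recommendation.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"


def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG file without its APP1 Exif segment(s)."""
    if jpeg[:2] != b"\xff\xd8":  # SOI marker
        raise ValueError("not a JPEG file")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker should start")
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # Start of scan: everything from here on is image data.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte length counts itself but not the 2-byte marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == EXIF_HEADER
        if not is_exif:
            out += segment
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```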

Human-authored metadata, on the other hand, like what you can create within your content management system or photo editing software, is more useful to both people and machines. This information can be extensive, but the most useful labels you can apply to an image file are probably a title, a description of the image, creator credits, and copyright and licensing information.
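A compact XMP packet carrying just those labels could look something like the output of the following Python sketch. The function name and the sample values are made up for illustration. The sketch uses the Dublin Core properties `dc:title`, `dc:description`, `dc:creator`, and `dc:rights`, tags every text value with the `x-default` language code, and closes the packet as read-only (`end="r"`) without any padding.

```python
from xml.sax.saxutils import escape

XMP_TEMPLATE = """<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">{title}</rdf:li></rdf:Alt></dc:title>
   <dc:description><rdf:Alt><rdf:li xml:lang="x-default">{description}</rdf:li></rdf:Alt></dc:description>
   <dc:creator><rdf:Seq><rdf:li>{creator}</rdf:li></rdf:Seq></dc:creator>
   <dc:rights><rdf:Alt><rdf:li xml:lang="x-default">{rights}</rdf:li></rdf:Alt></dc:rights>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="r"?>"""


def build_xmp_packet(title: str, description: str, creator: str, rights: str) -> bytes:
    """Build a small UTF-8 XMP packet from four human-authored fields."""
    # Escape &, < and > so that arbitrary CMS text stays well-formed XML.
    fields = {
        "title": escape(title),
        "description": escape(description),
        "creator": escape(creator),
        "rights": escape(rights),
    }
    return XMP_TEMPLATE.format(**fields).encode("utf-8")


packet = build_xmp_packet(
    title="Camera crafts",
    description="Three cameras made out of cardboard.",
    creator="Example Photographer",  # placeholder value
    rights="Copyright Example Photographer. All rights reserved.",  # placeholder value
)
print(len(packet), "bytes")
```

Writing the packet into an actual image file is a separate step that depends on the file format.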

It’s much easier to prove that someone stole your image or violated your license terms when your name and the license terms are quite literally written into the stolen file.

Google Image Search will display the creator’s name, credit line, and copyright statement next to images based on the attribution information embedded in XMP information inside image files. This can help people make better choices (e.g. not steal your image) or get information on how to license the image from you.

Bing Image Search and Google Image Search both support filtering image search results based on the usage permissions granted with an associated Creative Commons license. It doesn’t yet appear that either search engine sources licensing information directly from metadata embedded in the image files. However, Creative Commons has defined a namespace and vocabulary, as an XMP format extension, for embedding the license terms and the attribution details required to comply with the license directly into image files. It’s well worth looking into if you’re licensing your images permissively with the well-recognized Creative Commons licenses.
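As a hedged sketch of what such a license statement might look like, the Python function below generates an `rdf:Description` element in the Creative Commons namespace (`http://creativecommons.org/ns#`) with a `cc:license` link and a `cc:attributionName`. The function name and sample values are invented for this example, and the element would still need to be placed inside the `rdf:RDF` element of a complete XMP packet. Check the Creative Commons XMP documentation for the full set of recommended properties before relying on it.

```python
from xml.sax.saxutils import escape, quoteattr


def cc_license_description(license_url: str, attribution_name: str) -> str:
    """Build an rdf:Description fragment with Creative Commons license terms.

    Hypothetical helper: the result must be nested inside the rdf:RDF
    element of a full XMP packet, which declares the rdf: prefix.
    """
    return (
        '<rdf:Description rdf:about="" '
        'xmlns:cc="http://creativecommons.org/ns#">\n'
        # The license is expressed as a link to the license deed.
        f" <cc:license rdf:resource={quoteattr(license_url)}/>\n"
        # The name to credit when the image is reused.
        f" <cc:attributionName>{escape(attribution_name)}</cc:attributionName>\n"
        "</rdf:Description>"
    )


fragment = cc_license_description(
    "https://creativecommons.org/licenses/by/4.0/",
    "Example Photographer",  # placeholder value
)
print(fragment)
```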

The File Explorer in Windows, Image Viewer and to some extent Finder in macOS, and GNOME Files and GNOME Image Viewer on Linux will all display an image’s embedded title, description, creator, and copyright notice to the user when they look at the image’s properties. Desktop search on all three operating systems also supports indexing and searching for images by their XMP metadata. It can be very useful to be able to search for images on your computer by their title, description, and creator instead of being limited to their file names when you’re working with a static website generator or otherwise handling a large number of static image files.

So if you already have rich image metadata lying about, be sure to put it to use and embed it into your image files. Including some rights statements and descriptions of your images shouldn’t add more than a kilobyte or three to the size of the image. Guetzli processing of JPEGs and Zopfli processing of PNGs can more than make up for the added kilobytes and will probably even shave off a couple of kilobytes extra. The highly efficient WebP image format also supports XMP metadata if you’re concerned about keeping file sizes tiny.