
The HTML <video> element needs to go back to the drawing board

We’ve had the HTML <video> element for over a decade. Yet, everyone still defaults to embedding YouTube frames instead of hosting their own videos. The underlying problem is that the <video> element isn’t suitable for embedding short video files on webpages.

Sure, the <video> element works great for large streaming platforms and tube sites. However, video is nowhere near as simple to use as other adaptive embedded media, such as responsive images.

You need half an hour of learning to get started with responsive images. All you need for responsive images is to include a specially formatted list of an image in different sizes and file formats in your HTML document. The web browser uses the list to pick a format it supports at the right dimensions for the visitor’s device. Here’s a slightly simplified example:

<picture>
  <source type="image/avif"
    srcset="small.avif 400w, large.avif 800w"
    sizes="400px">
  <source type="image/jpeg"
    srcset="small.jpeg 400w, large.jpeg 800w"
    sizes="400px">
  <img src="fallback.jpeg" alt="">
</picture>

Neat, right? It’s a bit more complicated than that, but that’s the essentials. Given some familiarity with the basics of HTML, you can guess what it does and even learn that syntax. Using media queries, you can even respect user preferences like reducing motion, saving data, and dark mode.

The story isn’t the same when it comes to video. The HTML <video> element is similar to the <picture> element. It even uses the same <source> element to list videos in different codecs/formats, and you place fallback content inside it the same way.

However, the <source> element does not currently support any of the media, srcset, or sizes attributes for <video>. You can only set a single source (src), plus its container and codec information through the type attribute.

HTML doesn’t provide web authors any affordances to send a high-resolution video to a desktop or tablet, and a lower resolution to a mobile phone. You can send an oversized video to mobile devices, but at potentially high data and battery costs. Or you can send an undersized video and scale it up (with ugly upscaling artifacts) on desktops. A 720×405 px video suitable for desktops and tablets contains 2.25 times as many pixels (roughly 2.1 times as much data) as a 480×270 px video file for mobile.
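To make the arithmetic behind the pixel figure explicit, here’s a quick sketch. It only covers the pixel count; the data-size ratio depends on the encoder and its settings, so it can’t be derived from the dimensions alone. The function name is made up for illustration.

```javascript
// Number of pixels in a single video frame at the given dimensions.
function pixelCount(width, height) {
  return width * height;
}

const large = pixelCount(720, 405); // 291600 pixels
const small = pixelCount(480, 270); // 129600 pixels
const ratio = large / small;

console.log(ratio); // 2.25
```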

You can turn to JavaScript and have it pick the right video, but it’s a complicated problem. Choosing the right codec, handling full-screen mode switches, subtitles, adaptive quality changes, network conditions, pixel density, preloading, … it all adds up. It’s not a quick job to write the logic required to choose an appropriate video resolution, and handle changes on the fly.

The average JavaScript library for handling video resolutions and full-screen mode switching is about 600 KB, minified but uncompressed. That figure represents the code the browser has to parse and execute, not the data transfer size. It’s a small overhead for a video that runs 15 minutes or longer. However, it’s way too much for a short animation or a minute-long presentation.

You also have to spend time learning and integrating a complicated new library into your documents. Serving video is still relatively expensive, so you might also need a separate library to reduce the hosting costs (e.g. WebTorrent). If you’re planning on publishing many videos, it might be worth it. However, it’s too much overhead just to add a few minutes of video to a blog post every once in a while.

YouTube and other large video streaming services have long since adopted technologies like HTTP Live Streaming (HLS) or Dynamic Adaptive Streaming over HTTP (DASH). These technologies aren’t supported natively in many web browsers. Safari supports HLS. Chrome and Firefox have limited support on Android. Desktop support is nonexistent outside of Safari. Again, you have to rely on fairly heavy JavaScript libraries to implement the adaptive streaming technologies. You also have to cut your videos up into chunks, prepare playlist files, and attend to more overhead work.

Neither HLS nor DASH is suitable when you “just need to add a simple video to a webpage”. They’re too complicated and too powerful for such a simple use case. The HTML standard has just left this gap unfilled for a decade. It might help explain why everyone just falls back on a frame hosted by YouTube to get video onto their websites. HTML video is too much work even if you’re motivated to host it yourself.

Maybe I’m asking for a faster horse here, but I do believe the HTML standard needs to address this issue. There needs to be a simpler way to embed a video on a page and have the web browser pick a file with dimensions appropriate to the device. The default web browser multimedia player also needs to add a control to let the viewer override the quality picked by the browser.

Scott Jehl kickstarted the discussion about this in January 2021 with his call to add the media attribute to the video source element. It’s supported in Safari, and was part of the HTML standard a decade ago. It was removed from the standard, but … Safari isn’t known for keeping up with the times when it comes to web standards.

The proposal enables web authors to specify different video sources for different screen resolutions. It wouldn’t enable the user to override the choice, and it’s unclear how switching to full-screen mode would be handled. It’s currently being discussed in the Web Hypertext Application Technology Working Group (WHATWG), the organization that currently maintains the HTML standard.

Here’s an example using the capabilities proposed by Scott Jehl. In this example, viewports that are 700 px or wider get a large video file, and narrower ones get a small one instead. You can go more granular than this, but the below would already get 90 % of the job done.

<!-- proposed standard! -->
<video poster="enticing-placeholder.jpeg">
  <source type='video/webm; codecs="vp9, vorbis"'
    media="(min-width:700px)"
    src="large.webm">
  <source type='video/webm; codecs="vp9, vorbis"'
    src="small.webm">
  <source type='video/mp4; codecs="av01.0, opus"'
    media="(min-width:700px)"
    src="large.mp4">
  <source type='video/mp4; codecs="av01.0, opus"'
    src="small.mp4">
  Your browser doesn’t support video. You can
  <a href="large.mp4">download it</a> instead.
</video>

This would be an improvement over the status quo. The media query lets the browser pick a more appropriate source, but it’s left up to the document author to decide what’s best for different devices. The syntax doesn’t give the web browser any information about what actually differs between the video sources. The browser can’t make an informed decision about the best source without knowing more about them. Without this information, it would also be impossible for the browser to display a control that lets viewers choose their preferred video playback quality.
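To illustrate how little the browser has to go on, here’s a rough sketch of the selection the media-based proposal implies: walk the sources in document order and take the first one whose media query matches and whose type is playable. This is my approximation, not the standard’s algorithm. The function name, the plain-object sources (codecs omitted for brevity), and the two stand-in callbacks are all hypothetical.

```javascript
// Hypothetical helper: return the first source, in document order, whose
// media query matches and whose type is playable. Sources without a media
// or type value are treated as always matching.
function firstMatchingSource(sources, matchesMedia, canPlay) {
  for (const source of sources) {
    const mediaOk = !source.media || matchesMedia(source.media);
    const typeOk = !source.type || canPlay(source.type);
    if (mediaOk && typeOk) return source;
  }
  return null; // nothing matched; the fallback content would be shown
}

// The four sources from the example above, as plain objects.
const sources = [
  { src: "large.webm", type: "video/webm", media: "(min-width:700px)" },
  { src: "small.webm", type: "video/webm" },
  { src: "large.mp4", type: "video/mp4", media: "(min-width:700px)" },
  { src: "small.mp4", type: "video/mp4" },
];

// Stand-ins for the browser: a narrow viewport that can only play MP4.
const narrowViewport = () => false;
const mp4Only = (type) => type.startsWith("video/mp4");

console.log(firstMatchingSource(sources, narrowViewport, mp4Only).src); // small.mp4
```

In a real page, you could pass `(query) => window.matchMedia(query).matches` and `(type) => video.canPlayType(type) !== ""` as the two callbacks. Notice that nothing in the inputs says which file is bigger or better; the order and the media queries carry all the meaning.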

I believe that a better solution would be to use the sizes (and possibly even srcset) attributes instead of abusing media queries. Just for a minute, forget how the sizes attribute is used on source elements descending from a picture element. Instead, think of how it’s used for picking favicons. Web authors can include multiple favicon files and the browser looks at the sizes attribute to pick an appropriate size. Let me try to explain it with another example:

<!-- proposed standard! -->
<video poster="enticing-placeholder.jpeg">
  <source type='video/webm; codecs="vp9, vorbis"'
    sizes="720x405"
    src="large.webm">
  <source type='video/webm; codecs="vp9, vorbis"'
    sizes="480x270"
    src="small.webm">
  <!-- equivalent, compact (ordered) form -->
  <source type='video/mp4; codecs="av01.0, opus"'
    sizes="480x270 720x405"
    srcset="small.mp4, large.mp4">
  Your browser doesn’t support video. You can
  <a href="large.mp4">download it</a> instead.
</video>

The browser could then look at the intrinsic size of the video element, the screen resolution, network conditions, and pick the most appropriate source. It could even display a button to let users switch between the available video resolutions.
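Here’s a minimal sketch of what that decision could look like, assuming the favicon-style WIDTHxHEIGHT syntax from my example. It only considers the rendered width and the pixel density, and ignores network conditions and codec support entirely. The function names and the plain-object candidates are hypothetical, and the selection rule (smallest file that covers the rendered width) is just one reasonable choice.

```javascript
// Parse a "WIDTHxHEIGHT" token such as "720x405".
// Returns null for anything that doesn't fit that shape.
function parseSize(token) {
  if (typeof token !== "string") return null;
  const match = /^(\d+)x(\d+)$/i.exec(token.trim());
  return match ? { width: Number(match[1]), height: Number(match[2]) } : null;
}

// Pick the smallest candidate that covers the rendered width in device
// pixels, or the largest available one if none is big enough.
function pickBySize(candidates, cssWidth, devicePixelRatio) {
  const needed = cssWidth * devicePixelRatio;
  const sized = candidates
    .map((candidate) => ({ ...candidate, size: parseSize(candidate.sizes) }))
    .filter((candidate) => candidate.size !== null)
    .sort((a, b) => a.size.width - b.size.width);
  if (sized.length === 0) return null;
  return sized.find((c) => c.size.width >= needed) ?? sized[sized.length - 1];
}

const candidates = [
  { src: "large.webm", sizes: "720x405" },
  { src: "small.webm", sizes: "480x270" },
];

console.log(pickBySize(candidates, 360, 1).src);  // small.webm
console.log(pickBySize(candidates, 360, 2).src);  // large.webm
console.log(pickBySize(candidates, 1200, 1).src); // large.webm
```

Because every candidate carries its dimensions, the same list could also populate a quality menu, which the media-query approach can’t offer.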

There are still issues with this approach, but it would make responsive videos on the web just as simple as responsive images. For example, what happens if you’re watching small.mp4, and switch to full-screen mode? Surely, you expect it to switch to large.mp4 instead and continue playback at the same time position. What if the two video files are of different durations? There’s a hornet’s nest of potential issues, but I’d take the occasional stings over the status quo any day of the week.
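For a sense of what the source switch involves, here’s a rough sketch of doing it by hand today. It assumes an object that behaves like a standard video element (currentTime, duration, paused, src, play, and the loadedmetadata event). The function name is hypothetical, and clamping to the new duration is just one possible answer to the different-durations question.

```javascript
// Hypothetical helper: swap the video's source and resume at the same
// position, clamped to the new file's duration in case the files differ.
function switchSource(video, newSrc) {
  const resumeAt = video.currentTime;
  const wasPlaying = !video.paused;

  video.addEventListener(
    "loadedmetadata",
    () => {
      // The duration of the new file is known once its metadata has loaded.
      video.currentTime = Math.min(resumeAt, video.duration);
      // play() returns a promise in current browsers; ignore a rejection
      // (for example, when the browser blocks playback).
      if (wasPlaying) video.play()?.catch(() => {});
    },
    { once: true }
  );

  // Assigning src makes the element load the new resource.
  video.src = newSrc;
}
```

In a page, you might call `switchSource(video, "large.mp4")` from a `fullscreenchange` listener. Even this small helper glosses over buffering hiccups, subtitles, and browsers with their own full-screen quirks, which is exactly the work I’d rather have the browser do for me.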

Other questions remain open, too. For example, the poster attribute lets you specify a placeholder poster image. Do we need a new posterset attribute to provide a set of responsive images at different resolutions? Then what about image formats? Or using a keyframe from the video file? Should posters be moved inside the video element as another descendant source element with a kind="poster" attribute? I don’t know.

One thing’s for sure: either we need much cheaper and faster smartphones with virtually free data plans, or HTML video needs to be overhauled to allow for responsive videos. Or we could ignore the problem and continue outsourcing it to YouTube. It works pretty well if you don’t mind centralization and Google injecting ads into your videos.