
Improving the New York Times’ line wrap balancer

I looked into options to improve the breaking points for line wrapping on the web. I found a few “text balancer” programs that use different methods to even out the number of words per line across the fewest lines possible. I wasn’t happy with any of them, but ended up improving on the New York Times’ text balancer to get something usable. It wasn’t how I imagined spending my weekend.

Web browsers follow a simple algorithm for laying out text: one word after the other, and wrap onto a new line when there’s no more room on the current line. It’s fast and produces good enough results in most cases. However, it doesn’t guarantee an even distribution of words and you can end up with a single word on a line by itself (known as a “widow”).

An uneven distribution of words can make the design heavier on one side, making it unbalanced. It can be a small eyesore at the end of a large paragraph of text. However, it draws unwanted attention to itself when it appears in headlines and other large type.

The simplest solution is to rewrite the text until you get a better fit. However, you can’t rely on rewriting a text to get a perfect fit for every visitor. The fit depends on the screen size, and on the font and platform (or requires a webfont). You also end up doing more work, and possibly settling for awkward wording, for the sake of the design.

A text wrapping balancer is a program that tries to more evenly distribute words over multiple lines. There are at least two dozen algorithms used to achieve this. The most common one found on the web is the Adobe BalanceText project.

The Adobe text balancer analyzes the text, measures the length and text-breaking opportunities (such as hyphens) in each word, and inserts line break elements (<br>) at opportune points to achieve more balanced line breaks. It works, but it’s slow, RAM-intensive, and overly complicated.

In 2013, Adobe proposed a new CSS text-wrap: balance property to make the browser do (and optimize) the heavy lifting. BalanceText is a JavaScript implementation of this CSS property. The property has yet to be implemented in any web browser. (Come on, guys! It’s in CSS Text Level 4.)

In 2017, Harrison Liddiard, then an intern at The New York Times (NYTimes), came up with the idea for a new and simpler text balancer. Out of that idea grew the NYTimes text balancer. It’s smaller, faster, and cheekier than Adobe’s more complex text balancer.

The NYTimes text balancer measures the number of lines, and then reduces the width of the text container until the point where it breaks onto a new line. Then it widens the container a bit to avoid the extra line break. Voilà, a simple yet effective text balancer that works great on headlines.

The method is suitable for headlines and a single short paragraph of text. It’s not suitable for long paragraphs of text or multiple paragraphs. It yields no benefit over the browser’s default layout engine when the text exceeds a couple of lines. You’ll also end up with paragraphs of uneven widths, which breaks up the cohesion of the right-hand text edge of your design. It works great for two to four lines of text, though.

The NYTimes text balancer isn’t perfect, but it’s much faster than Adobe’s solution. Both companies have open-sourced their implementations. I took a look at the NYTimes implementation and found plenty of room to improve it. The rest of the article covers the changes and improvements I made to the balancer. To comply with the Apache License, version 2.0, I have to state what changes I made to it. If you’re only interested in the code, then this is your cue to leave:

Get source on Codeberg

The Codeberg project page contains a boilerplate.html file you can use as a template, as well as the main text-balancer.js file. I recommend you read the usage instructions on Codeberg and the remainder of this article to understand what’s going on with the template file.

The first issue with the NYTimes text balancer is that it’s triggered on every resize event. This is bad for performance reasons as the web browser can spew out hundreds of resize events in some situations. The text balancer algorithm even runs when there’s been no change to the size of the text container.

The NYTimes version changes the size of the text element itself, which is also used in future calculations to find the desired width. This means it can only shrink the text container and never expand it. I changed the implementation to use a ResizeObserver that only gets triggered when the text container’s parent changes size. It assumes changes to the dimensions of the parent container can be used as a proxy for the text container and its size constraints. This change allows it to expand the text container.
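As a rough illustration of that idea (a minimal sketch, not the code from the Codeberg project), the observer can be attached to the parent and the balancer re-run only when the parent’s width actually changes. The function name watchParentWidth, the rebalance callback, and the injectable Observer parameter are assumptions made for this example.

```javascript
// Hypothetical helper: re-run a balancing callback when the parent's width changes.
// `rebalance` is whatever function applies the balancing; it is not defined here.
function watchParentWidth(element, rebalance, Observer = globalThis.ResizeObserver) {
  const parent = element.parentElement;
  if (!parent || typeof Observer !== 'function') return null;

  let lastWidth = null;
  const observer = new Observer((entries) => {
    for (const entry of entries) {
      const width = entry.contentRect.width;
      // Skip notifications where the width is unchanged
      // (for example, a height-only change of the parent).
      if (width === lastWidth) continue;
      lastWidth = width;
      rebalance(element, width);
    }
  });

  // Observe the parent, not the text element whose max-width gets modified.
  observer.observe(parent);
  return observer;
}
```

Because the balancer only ever changes the text element, observing the parent avoids reacting to the balancer’s own size changes.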

The text balancer uses a binary search algorithm to determine the max-width to apply to the text container. The algorithm searches the space from 0 px to the full width of the text container. However, we know the answer can never be less than 50 % of the text container’s width (because text that fits at less than half the width would already fit onto fewer lines at the full width). I changed the text balancer to search in the range 50–100 % of the text container’s width. This change skips the first iteration of the binary search algorithm.
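The narrowed search can be sketched as follows. This is an illustration under assumptions, not the project’s implementation: lineCountAt is a hypothetical callback that applies a given max-width and reports how many lines the text then occupies, and widths are treated as whole pixels.

```javascript
// Find the narrowest width that keeps the text on the same number of lines
// as the full width does, searching only the upper half of the range.
function findBalancedWidth(fullWidth, lineCountAt) {
  const targetLines = lineCountAt(fullWidth);
  if (targetLines < 2) return fullWidth; // single line: nothing to balance

  let low = Math.floor(fullWidth / 2); // lower bound of the search range
  let high = Math.ceil(fullWidth);     // known to fit on targetLines
  while (high - low > 1) {
    const mid = Math.floor((low + high) / 2);
    if (lineCountAt(mid) > targetLines) {
      low = mid;  // too narrow: the text spilled onto an extra line
    } else {
      high = mid; // still fits: try narrower
    }
  }
  return high; // narrowest tested width that does not add a line
}
```

In a browser, lineCountAt would set the element’s max-width and then measure it, so every call forces a layout; keeping the search range small keeps the number of such measurements down.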

The text balancer needs to determine if a text is spread over multiple lines. The NYTimes’ multiline-checker is buggy and slow. It tries to isolate the first word of the text and then measures its height against the height of the text container. However, the implementation treats any HTML in the headline as plain text, causing it to calculate the wrong number of lines. It is also slow, RAM-hungry, and needlessly interacts with the Document Object Model (DOM).

I managed to get a 4500 % performance improvement over the original implementation using getComputedStyle to inspect the text container’s height compared to its line height. My approach is still prone to mistakes when the headline contains other inline-block elements (e.g. images) or styled small text (<small>). However, it works better with bold, emphasized, code, emoji, sub- and superscript text, links, and other phrasing content formatting.
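A minimal sketch of that kind of check might look like the following. It is not the project’s code: the function names and the injectable getStyle parameter are assumptions for the example, it assumes a block-level element whose computed height excludes padding and borders, and the 1.2 multiplier used for line-height: normal is only a rough guess.

```javascript
// Estimate the number of lines by dividing the computed height by the line height.
function countLines(element, getStyle = globalThis.getComputedStyle) {
  const style = getStyle(element);
  const height = parseFloat(style.height);
  let lineHeight = parseFloat(style.lineHeight);
  if (Number.isNaN(lineHeight)) {
    // `line-height: normal` has no numeric computed value.
    // Assumption: approximate it as 1.2 times the font size.
    lineHeight = 1.2 * parseFloat(style.fontSize);
  }
  return Math.max(1, Math.round(height / lineHeight));
}

function isMultiline(element, getStyle) {
  return countLines(element, getStyle) > 1;
}
```

Anything that makes one line taller than the others, such as an inline image, throws the division off, which matches the limitation described above.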

I’ve sampled a few dozen pages from the NYTimes website in the Internet Archive, and I can’t find any uses of the NYTimes text balancer on its website. I can find instances of the Adobe text balancer on the NYTimes website in 2015 and 2016. The NYTimes website currently doesn’t use any JavaScript-based text balancers.

Data from PublicWWW — a service for searching in website source code — shows that the Adobe text balancer is used on 3000 websites. The NYTimes text balancer is currently only used on The Texas Tribune and Nieman Lab.

So, why is it that the NYTimes and other websites don’t use JavaScript-based balancers? It’s probably because of the last remaining problem, and it’s a really tough one to fix: the flash of unstyled content (FOUC).

The FOUC is the intermediary state after the page has been laid out and before some extra resource has loaded. Typically, this is a blocking resource such as a web font or stylesheet. In this instance, I’m talking about the intermediary state between the initial page layout rendering and the text balancer script getting executed.

You can’t calculate and apply the text balancing to text before the page gets laid out. However, you don’t want the reader to see the text appearing laid out one way and then immediately disrupt their reading by reflowing the text. This problem is somewhat unsolvable. It’s why Adobe proposed that the web browsers take care of this natively with a CSS property.

The best approach is to embed the required JavaScript and CSS inline on the page, and then temporarily hide the text pending the execution of the text balancer. It’s not ideal, but it’s not costly either. In my tests, it delays the text rendering by 1–9 ms on a fast device and up to 110 ms on lower-end devices. However, under terrible network conditions on a slow device, it can delay rendering by more than a second.

You also need to ensure there’s a noscript alternative in place in case JavaScript has been disabled. There’s a neat self-contained CSS-only option in the form of the @media (scripting: none) media query. Unfortunately, it isn’t yet supported in any web browser, so I had to rely on the more traditional approach of embedding a style element inside a noscript element.

Despite all these precautions, there’s still a risk something might go wrong and the text balancer fails to execute properly. To address this, I added a separately scoped self-invoking JavaScript function that reveals the hidden text on a three-second timer.
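A fail-safe along those lines could be sketched like this. The class name balance-pending, the function name, and the injectable setTimer parameter are all hypothetical, and the sketch assumes the text is hidden by a CSS rule keyed on that class; the template on Codeberg may do it differently.

```javascript
// Hypothetical class name; a matching CSS rule that hides the text is assumed.
const PENDING_CLASS = 'balance-pending';

// Reveal the hidden text after a delay, whether or not the balancer ever ran.
function scheduleFailsafeReveal(root, delayMs = 3000, setTimer = setTimeout) {
  return setTimer(() => {
    // Removing a class that is already gone is a harmless no-op.
    root.classList.remove(PENDING_CLASS);
  }, delayMs);
}

// Browser usage, wrapped in its own function scope so that an error
// elsewhere in the balancer script cannot stop the timer from being set:
// (function () { scheduleFailsafeReveal(document.documentElement); })();
```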

I also wanted to future-proof the implementation by detecting support for CSS text balancing and deferring the job to CSS when it’s available. This is achieved using an @supports (text-wrap: balance) CSS query, paired with a CSS.supports('text-wrap', 'balance') test in JavaScript. It might be wishful thinking, but if a future browser supports it, it’ll bypass the text-hiding and wrap-balancing dance altogether.
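The JavaScript half of that check can be sketched as below. The wrapper name supportsNativeBalance and its injectable css parameter are assumptions for the example; only the CSS.supports call itself comes from the text above.

```javascript
// Returns true when the browser reports native support for text-wrap: balance.
function supportsNativeBalance(css = globalThis.CSS) {
  return Boolean(css)
    && typeof css.supports === 'function'
    && css.supports('text-wrap', 'balance');
}

// Browser usage: skip the script-based balancer when CSS can do the job.
// if (!supportsNativeBalance()) { /* run the JavaScript balancer here */ }
```

The guards around css.supports keep the check from throwing in environments where the CSS object is missing.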

The result won’t be perfect, but it’s definitely an improvement over the default layout engine. You can optimize the wrapping results by liberally sprinkling long words in your headlines with soft-hyphen characters (U+00AD). A soft-hyphen denotes a possible line-breaking point where the browser can wrap text onto a new line. It’s an invisible character except when it falls at the end of a line, where it’s rendered as a hyphen. You can find hyphenation libraries for all popular programming languages for dynamically adding soft-hyphens. (You should avoid CSS hyphenation (hyphens: auto), as its results vary greatly from language to language and browser to browser.)
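To show what adding soft-hyphens amounts to, here is a toy sketch that stands in for a real hyphenation library. The function name and the patterns map (lowercase words mapped to their syllables, separated by |) are made up for this example; a real library would supply the break points from language-specific rules.

```javascript
const SOFT_HYPHEN = '\u00AD';

// Insert soft-hyphens into words that have an entry in a hypothetical
// pattern map, e.g. 'implementation' => 'im|ple|men|ta|tion'.
function addSoftHyphens(text, patterns) {
  return text.replace(/\p{L}+/gu, (word) => {
    const pattern = patterns.get(word.toLowerCase());
    if (!pattern) return word; // unknown word: leave it untouched

    // Re-apply the break positions to the word as written, preserving its case.
    const parts = [];
    let index = 0;
    for (const syllable of pattern.split('|')) {
      parts.push(word.slice(index, index + syllable.length));
      index += syllable.length;
    }
    return parts.join(SOFT_HYPHEN);
  });
}
```

The output looks identical to the input when rendered on one line, since the inserted characters only show up as hyphens where the browser chooses to break.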