Your browser lies: the web isn’t written in your native language

Most webpages aren’t written in your native language, even though the entire web appears to be available to you in it. Many web surfers don’t seem to realize that they’re reading machine-translated pages instead of the originals.

Over the years, I’ve received hundreds of complaints, written in foreign languages, about the poor quality of my writing in those languages. Except I’ve never written a word in any of them. I often can’t even identify the language without running the message through a translation service!

I also see anecdotal evidence of this all over the web. In discussion forums, open-source projects, article comments, and social media threads, there’s always that one person who comments in a different language from everyone else. It’s nothing but a small faux pas, but one that might not be the commenter’s fault.

Both the Google Chrome and Microsoft Edge web browsers machine-translate visited pages on the fly. (Firefox and Safari are working on similar features with limited market roll-outs.) It’s not always clear to the reader that they’re reading a machine-translated version. There’s just a small icon in the address field to indicate that the page has been machine-translated.

I’m not arguing against machine translation. It’s a great tool that lowers language barriers and makes information available to more people. However, I believe that the designs of modern web browsers shift the language barrier from the sender to the receiver.

What am I supposed to do with a reader comment or an email in a language I don’t understand? I don’t know what it says, so I can’t know if it contains secrets or personal information. I can’t upload what people send me into an online translator without knowing what it contains.

On-device translation software isn’t really a thing. You can’t sell it as a software-as-a-service offering, so the market hasn’t had any interest in developing it. European Union funding has recently led Mozilla and partner universities to start work on an open-source on-device translator. It currently only supports translating between a few languages, and won’t solve my problem in the foreseeable future.

I wanted to display a notification message to readers of machine-translated versions of my articles to indicate that they weren’t reading them in their original language. Unfortunately, translated pages use an unstandardized rendering mode. There are no APIs for identifying when a page has been machine-translated.

I don’t quite know what I’d want those APIs to look like or do. Perhaps a CSS media query like (page-variant:machine-translated)? Such a media query could be used to show a message to readers viewing a translated version of a page. This could be especially useful near forms.

Speaking of forms, form input validation should also be extended to handle language requirements. A form could then indicate what languages it accepts, and the browser could assist users in filling out the form using the required language.

Two years ago, I looked for methods to detect when a webpage had been machine-translated. I only found implementation-specific detection methods, and those can slow down the browser on long or complex documents. They also weren’t reliable enough to detect when browsers or other translation software had modified my pages.
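To give an idea of what such an implementation-specific check looks like, here is a minimal sketch. It is not the method I tested, and it rests on two assumptions about observed, undocumented behavior: that Chrome’s built-in translator adds a translated-ltr or translated-rtl class to the root element, and that translators rewrite the root element’s lang attribute to the target language. The names ElementLike and looksMachineTranslated are made up for this example.

```typescript
// Minimal view of the root element: only what this heuristic needs.
// A real HTMLElement satisfies this interface structurally.
interface ElementLike {
  classList: { contains(name: string): boolean };
  getAttribute(name: string): string | null;
}

// authoredLang is the value the author put in <html lang="...">, e.g. "en".
function looksMachineTranslated(root: ElementLike, authoredLang: string): boolean {
  // Assumption: Chrome's translator marks the root element with one of these classes.
  const hasTranslateClass =
    root.classList.contains("translated-ltr") ||
    root.classList.contains("translated-rtl");

  // Assumption: the translator replaces the lang attribute with the target language.
  const currentLang = root.getAttribute("lang");
  const langChanged =
    currentLang !== null &&
    currentLang.toLowerCase() !== authoredLang.toLowerCase();

  return hasTranslateClass || langChanged;
}
```

In a page, this would be called as looksMachineTranslated(document.documentElement, "en"). Translation happens after the page has loaded, so the check has to be repeated, for example from a MutationObserver watching the root element’s attributes. A translator that leaves no such traces goes undetected, which is exactly the reliability problem.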

That investigation did lead to some improvements, though. I managed to cut roughly 40% of the complaints about poor-quality machine translations by improving the HTML semantics of the code snippets and machine instructions I publish. It required some time-consuming technical changes to let translators know which parts of a document not to translate. For example, they now know not to translate commands, functions, path names, and the like.
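One standard way to tell translators to leave something alone is the HTML translate attribute. The sketch below shows the idea rather than the exact changes I made: a small helper, with the made-up names escapeHtml and noTranslateCode, that wraps a snippet in a code element marked translate="no".

```typescript
// Characters that must be escaped before text is placed inside HTML.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
};

// Escape text so it can be safely embedded in HTML content.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"]/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Wrap a command, function name, or path in a <code> element
// that asks translation tools to leave its contents untouched.
function noTranslateCode(snippet: string): string {
  return `<code translate="no">${escapeHtml(snippet)}</code>`;
}
```

Calling noTranslateCode("ls ~/Documents") returns the snippet wrapped in a code element with translate="no". Whether a given translator honors the attribute is up to that translator.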

However, the remaining complaints are about the poor quality of the translated texts. I don’t know what to do about those complaints. English isn’t my first language, but I believe the articles I publish are fairly understandable and well-written. The machine translations might improve if I somehow further improved my writing. However, I’m not ready to accept the blame for the poor machine-translation services offered by my readers’ web browsers.

I believe that the underlying problem is that web browsers fail to properly communicate to their customers when they’re reading machine-translated pages. People get the wrong impression and expectations of a translated page when they don’t know it’s machine-translated.

Web browsers should offer to translate form submissions into the page’s original language. The browser already knows that it has translated the page, but it falls short of user expectations when it doesn’t translate form submissions back into the page’s language.

You wouldn’t expect a response from a French bakery if you wrote to it in Swedish. An Italian or Polish comment won’t be welcomed in the middle of a long English discussion thread. Yet, this is what we see all over the web today.

I don’t believe it’s intentional, though. People don’t understand that the web isn’t written in their native language. Their web browsers distort their world-view to a point where everyone speaks a mock version of their preferred language.