Well-Known URI vs DNS-SD for distributed web service-discovery

In this article, I want to discuss the use of Well-Known URIs and DNS-based service discovery (DNS-SD) for mapping domain names to resources on the distributed web. I’ll focus on how centralized each method is and how well it can route around internet censorship, and I’ll offer some suggestions for improving the current implementations used by distributed web projects.

There are two primary methods used to auto-discover services offered on a domain: you can either send a web request to a predetermined service-discovery address on the domain, or query the Domain Name System (DNS) for a predetermined service-discovery record.

The distributed web, like the regular web, relies on these two methods to map a domain name to the resource addresses used to retrieve its content from various distributed networks. I’ll discuss each of these methods in turn.

Well-Known URIs (RFC 5785) are simple to implement on any web server where you control the root of the domain and can expose files there. The Dat Protocol and the Beaker Browser use this method to discover websites that offer a distributed Dat archive by requesting the archive’s hash fingerprint from a file served at https://example.com/.well-known/dat.
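To make the lookup concrete, here is a minimal sketch of how a client might interpret the response body from that address. It assumes the file holds a dat:// address with a 64-character hex key on its first line and an optional TTL=<seconds> line after it; that layout, the default TTL, and the function names are my assumptions for illustration and not a copy of Dat’s own code.

```python
import re

# Assumed file layout (not taken from the article):
#   dat://<64 hex characters>
#   TTL=<seconds>            (optional)
_DAT_LINE = re.compile(r"^dat://([0-9a-f]{64})/?$", re.IGNORECASE)
_TTL_LINE = re.compile(r"^TTL=(\d+)$", re.IGNORECASE)
DEFAULT_TTL = 3600  # hypothetical default when no TTL line is present


def well_known_url(domain: str) -> str:
    """Build the Well-Known URI a client would request for a domain."""
    return f"https://{domain}/.well-known/dat"


def parse_well_known_dat(body: str) -> tuple[str, int]:
    """Return (archive key, ttl in seconds) from a .well-known/dat body."""
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty .well-known/dat response")
    match = _DAT_LINE.match(lines[0])
    if match is None:
        raise ValueError("first line is not a dat:// address")
    key = match.group(1).lower()
    ttl = DEFAULT_TTL
    if len(lines) > 1:
        ttl_match = _TTL_LINE.match(lines[1])
        if ttl_match:
            ttl = int(ttl_match.group(1))
    return key, ttl
```

Fetching the body is deliberately left out. Whatever HTTP client is used, that single request to the website is the step that fails when the site is blocked or offline.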

This method is relatively simple to explain and set up. However, it also has a single point of failure: the website serving the file. Depending on how redundant the website has been made, it can relatively easily be blocked or even knocked offline, taking the mapping between the domain and the distributed web version of the site with it.

Dat and the Beaker Browser also support an alternative same-domain lookup method to the Well-Known URI discovery scheme. I’ll get back to this later in the article.

I performed a survey of the top 2.4 million websites (as ranked by the Tranco list; see the article sources at the end) in January 2019 and found ten Dat-enabled websites. All of them used the Well-Known URI discovery mechanism, except the Beaker Browser’s own website, which also offered the alternative DNS-SD method.

The Beaker Browser focuses on allowing people to build and host distributed websites right from their browsers. The project also offers a paid hosting service called Hashbase that makes distributed websites available over HTTPS and acts as a Dat distribution node. Hashbase customers get a customer subdomain at Hashbase that gives their distributed websites a friendlier name than a random hash value.

Hashbase uses Well-Known URIs to map its roughly 2600 customers’ domain names to Dat. Hashbase is without a doubt the single largest distribution node in the Dat network. The service has seen multiple outages that rendered these distributed websites unavailable, even when other nodes on the network were hosting them.

The Domain Name System (DNS) is, unlike most parts of the web, designed to be decentralized. DNS is [mostly] fast and it can be secure. DNSSEC can be used to guarantee that a dweb content address response from an authoritative DNS provider gets to your devices unmodified. Communication between your devices and your recursive DNS provider can also be secured by encrypting it over TLS or HTTPS connections.

Unless you can cut off a domain’s authoritative DNS server from the internet at the source, recursive DNS servers can and will resolve the domain name. (A recursive DNS server is the DNS server provided by your ISP or a provider like OpenDNS, Quad9, Cloudflare, or Google.)

A domain can be configured with multiple authoritative DNS providers to create redundancy and make it harder to block the domain. DNS blocking is hard to do right and easy to route around, as demonstrated by Turkey when they attempted to block BunnyCDN (Ctrl blog’s CDN provider).

We’ve seen increasing centralization in the recursive DNS space in recent years. Google products, including Chromecast, Chromebooks, and the Chrome browser, are hardcoded to use Google’s DNS servers by default instead of the ones provided by your ISP. Fancy, easy-to-remember, free-to-use recursive providers like 1.1.1.1 and 9.9.9.9 have also helped centralize a once decentralized system. However, if your DNS provider blocks your access to anything, you can still get to it by using any other DNS provider.

The Dat project has opted to use Google’s public DNS resolver over HTTPS as its default recursive DNS resolver instead of using the system’s DNS settings, as most applications do. You can’t configure another DNS resolver in the Beaker Browser, but you can if you’re building your application on top of the Dat project’s libraries.

This again introduces a single point of failure that can be blocked (or suffer a temporary service outage) for Dat and Beaker, a pair of projects that oddly seem to prefer centralized single-vendor solutions over decentralized ones when building their version of the distributed web. I’ve put forward a patch that makes Dat randomly choose between Cloudflare DNS, Google DNS, and Quad9 instead of relying on a single provider.
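The idea of spreading lookups over several resolvers can be sketched in a few lines. This is my own illustration and not the patch itself: it shuffles the three providers so no single one is always asked first, and returns the full order so a caller could fall back to the next provider if one is blocked. The JSON-style endpoint URLs are assumptions to verify against each provider’s documentation, and the function names are hypothetical.

```python
import random
from urllib.parse import urlencode

# DNS-over-HTTPS JSON endpoints. These URLs are assumptions for the sketch;
# check each provider's documentation before relying on them.
DOH_ENDPOINTS = {
    "google": "https://dns.google/resolve",
    "cloudflare": "https://cloudflare-dns.com/dns-query",
    "quad9": "https://dns.quad9.net:5053/dns-query",
}


def resolver_order(rng=None):
    """Return all resolver names in random order (first choice, then fallbacks)."""
    rng = rng or random
    names = list(DOH_ENDPOINTS)
    rng.shuffle(names)
    return names


def txt_query_url(resolver: str, name: str) -> str:
    """Build the URL for a TXT lookup of `name` at the given resolver."""
    query = urlencode({"name": name, "type": "TXT"})
    return f"{DOH_ENDPOINTS[resolver]}?{query}"
```

A caller would walk through resolver_order() and request txt_query_url(resolver, "example.com") from each in turn until one answers, so that blocking any single provider no longer breaks discovery.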

The InterPlanetary File System (IPFS) uses DNS records for service discovery and doesn’t support Well-Known URIs. Both IPFS and Dat support discovering resource addresses in a TXT record stored on the bare domain name. IPFS also supports storing the record in a subdomain, an approach that has several advantages over keeping it on the bare domain.
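As a rough illustration of what a client does with those TXT records, the sketch below picks the distributed web addresses out of a list of TXT values and ignores everything else stored on the same name. The dnslink=/ipfs/... and datkey=<64 hex characters> value formats are my assumptions about what the two projects publish, and the function name is hypothetical.

```python
import re

# Assumed TXT value formats (not spelled out in the article):
#   dnslink=/ipfs/<content id>   or   dnslink=/ipns/<name>
#   datkey=<64 hex characters>
_DNSLINK = re.compile(r"^dnslink=(/(?:ipfs|ipns)/\S+)$")
_DATKEY = re.compile(r"^datkey=([0-9a-f]{64})$", re.IGNORECASE)


def content_addresses(txt_values):
    """Map network name to content address for the first matching record of each kind."""
    found = {}
    for value in txt_values:
        value = value.strip().strip('"')  # TXT data is often shown quoted
        match = _DNSLINK.match(value)
        if match:
            found.setdefault("ipfs", match.group(1))
            continue
        match = _DATKEY.match(value)
        if match:
            found.setdefault("dat", "dat://" + match.group(1).lower())
    return found
```

Unrelated TXT records, such as SPF policies, are skipped. On a bare domain these tend to pile up, which is one practical reason a dedicated subdomain is tidier.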

You notably can’t provide additional DNS records, like the ones used to discover content on the dweb, on a domain that’s resolved using a CNAME, as is the case for popular websites that rely on content delivery networks (CDNs).

A service-discovery specific subdomain can be delegated to another DNS server (using NS records) or ‘forwarded’ to another domain (CNAME records). Both approaches let domain owners segregate the service discovery mechanism from their main domain zone.

This can help protect the security of the DNS zone, as the DNS service discovery can be outsourced to a third party or handled by a dedicated server that can’t overwrite the main zone. It notably could allow website owners to purchase IPFS nodes and hosting from a service provider and delegate or forward the required part of their domain to that third party. This flexibility could help drive adoption of the distributed web.

IPFS’ service discovery subdomain is called _dnslink and I must admit that I hate that name. You’re querying DNS for information about IPFS and not the other way around! “DNSLink” has no defined specification, and I’ve offered some notes and suggested a few changes to help drive that specification forward. I’ve also proposed and argued that Dat should follow suit and shift to using a subdomain as well. The bare domain method should be deprecated to increase performance and reduce unnecessary DNS traffic.
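The extra traffic is easy to see when you list the names a client has to query for one domain. The sketch below is a hypothetical helper, and the order in which the two names are tried is my assumption rather than documented IPFS behavior.

```python
def lookup_names(domain: str, include_bare_domain: bool = True) -> list[str]:
    """List the DNS names to query for TXT records, in the order they are tried."""
    domain = domain.rstrip(".").lower()
    names = [f"_dnslink.{domain}"]  # dedicated service-discovery subdomain
    if include_bare_domain:
        names.append(domain)  # legacy bare-domain lookup: a second query
    return names
```

With the bare-domain method retired (include_bare_domain=False), every resolution needs exactly one query, and that query only ever returns service-discovery records.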

Conclusions

The distributed web is still evolving and may end up using alternatives to DNS, like the InterPlanetary Name System (IPNS), making this entire discussion moot. However, using a vastly decentralized ecosystem like DNS aligns quite well with the stated goals of many of the distributed web projects.

Well-designed and standardized DNS-based service discovery allows service discovery to be decentralized and offers deployment flexibility to domain owners compared to a file stored at a single centralized point in the network.

It also gives people a relatively easy way to route around internet censorship. As a bonus: DNS-based service discovery will in almost all cases be hundreds of times faster than anything requiring a full network connection and secure link negotiation with a web server.

Well-Known URIs can be quite useful, but their reliance on a centralized server, and the single point of failure that comes with it, makes them the lesser option for discovering resource addresses on the distributed web, especially when a better and more decentralized option is available.