How a Hypercore P2P innovation could bring more privacy to IPFS

The InterPlanetary File System (IPFS), like many other peer-to-peer (P2P) file-sharing networks, has a privacy problem. IPFS clients constantly broadcast what they want to download and what they have available to upload. Anyone who can observe your network traffic, such as your Internet Service Provider (ISP) or other snoops, can see what you’re sharing.

IPFS clients exchange data over encrypted connections. That’s currently a meaningless privacy precaution when it’s so trivial to determine what’s being transferred. This is a fundamental problem with how IPFS is implemented.

Every file on IPFS is described by a cryptographic hash: the output of a one-way operation that turns the file into a long string of numbers and letters. IPFS calls these hashes Content Identifiers (CIDs).

The same content always produces the same CID, and even the tiniest modification produces a completely different CID. It’s mindbogglingly unlikely that different content, whether by coincidence or by design, would result in the same CID. BitTorrent Info-Hashes (BTIH) work on the same principle. IPFS uses the newer SHA-256 hash function by default, while BTIH uses the older SHA-1 hash, which is vulnerable to deliberate hash collisions. BitTorrent v2 switches to SHA-256 but isn’t widely supported yet.
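The “same input, same hash; tiny change, totally different hash” property is easy to see with Python’s standard `hashlib`. This is a minimal sketch only: a real IPFS CID also wraps the digest with a version, codec, and multihash prefix, so the bare hex digest below is not an actual CID.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the data (not a full CID)."""
    return hashlib.sha256(data).hexdigest()


original = b"one funny picture of a cat"
same = b"one funny picture of a cat"
tweaked = b"one funny picture of a cat!"  # one extra byte

h1 = content_hash(original)
h2 = content_hash(same)
h3 = content_hash(tweaked)

print(h1 == h2)  # True: identical content always yields an identical hash
print(h1 == h3)  # False: a one-byte change yields a different hash

# Count how many of the 256 output bits differ between the two digests.
diff_bits = bin(int(h1, 16) ^ int(h3, 16)).count("1")
print(f"{diff_bits} of 256 bits differ")
```

Typically around half of the bits flip, which is why nothing about the original hash helps you predict the modified one.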

To download a file over IPFS, you can’t just vaguely ask an IPFS client to “send me that one funny picture of a cat”. You need to know its CID. So either you already have an exact copy of the data to generate the CID from, or someone must have shared the CID with you. Keep the CID secret, and no one but the people you share it with will know to request your files. Except IPFS doesn’t keep CIDs secret.
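To illustrate why the CID is the only handle on a file, here is a toy content-addressed store. `ContentStore` is a hypothetical class, not part of any IPFS library, and it uses a plain SHA-256 hex digest in place of a real CID. Data can only be fetched by its exact hash, and the store re-checks the hash on read so that the returned bytes must match the key that was asked for.

```python
import hashlib
from typing import Dict, Optional


class ContentStore:
    """Toy content-addressed store: each block is keyed by its own SHA-256 hash."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> Optional[bytes]:
        data = self._blocks.get(key)
        # Re-hash on read: the bytes returned must match the key asked for.
        if data is not None and hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored block does not match its hash")
        return data


store = ContentStore()
cid = store.put(b"one funny picture of a cat")
print(store.get(cid) is not None)  # True: we know the exact key
print(store.get("0" * 64))         # None: a guessed key finds nothing
```

There is no “search” operation here by design: without the key, the content is unreachable.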

To discover peers (other clients that already have or want a file), IPFS, like other P2P networks, turns to a Distributed Hash Table (DHT). The DHT is a massive database of who has what and where they are on the internet (their IP address). The database is self-organizing, and it’s distributed among all the participating clients. It’s the same technology used in BitTorrent and most other P2P systems.
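The core idea of a Kademlia-style DHT, the family used by IPFS and BitTorrent, can be sketched in a single process. This is a deliberate simplification with hypothetical names (`ToyDHT`, `make_id`): node IDs and content keys share one 256-bit ID space, and a key’s provider records are stored on the `k` nodes whose IDs are closest to it by XOR distance. A real DHT locates those nodes through iterative network lookups rather than a global node list.

```python
import hashlib
from typing import Dict, List, Set


def make_id(value: bytes) -> int:
    """Map node names and content alike into one 256-bit ID space."""
    return int.from_bytes(hashlib.sha256(value).digest(), "big")


class ToyDHT:
    """Single-process stand-in for a Kademlia-style DHT.

    Provider records for a key are stored on the k nodes whose IDs are
    closest to the key by XOR distance. A real DHT finds those nodes with
    iterative network lookups instead of one global node list.
    """

    def __init__(self, node_names: List[str], k: int = 3) -> None:
        self.k = k
        # node ID -> {key -> set of provider addresses}
        self.nodes: Dict[int, Dict[int, Set[str]]] = {
            make_id(name.encode()): {} for name in node_names
        }

    def _closest(self, key: int) -> List[int]:
        return sorted(self.nodes, key=lambda node: node ^ key)[: self.k]

    def announce(self, key: int, provider: str) -> None:
        for node in self._closest(key):
            self.nodes[node].setdefault(key, set()).add(provider)

    def find_providers(self, key: int) -> Set[str]:
        providers: Set[str] = set()
        for node in self._closest(key):
            providers |= self.nodes[node].get(key, set())
        return providers


dht = ToyDHT([f"node-{i}" for i in range(20)])
cat_key = make_id(b"one funny picture of a cat")
dht.announce(cat_key, "203.0.113.5")  # documentation-range IP address
print(dht.find_providers(cat_key))    # {'203.0.113.5'}
```

Because announcing and looking up both go to the same “closest” nodes, any client can find providers for a key without a central index. The flip side is that every announcement is a plain key-to-IP-address record that those nodes can read.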

Crucially, DHT network traffic is unencrypted, and the entire database is public anyway. Anyone monitoring your network traffic can observe the CIDs you’re downloading and the CIDs you’re sharing on the DHT. Anyone capable of monitoring the DHT at scale can get a complete picture of what CIDs are available and mass-download everything.
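Here is what that exposure looks like from an observer’s side. This is a simplified model, not a real wire format: the `announcement` helper and the observer’s `watch_list` are hypothetical. The point is that when the DHT key *is* the content hash, matching it against files the observer has already hashed is a single dictionary lookup, and the same key is all anyone needs to request the file.

```python
import hashlib
from typing import Dict


def announcement(content: bytes, provider: str) -> Dict[str, str]:
    # Simplified IPFS/BitTorrent style: the DHT key is the content hash itself.
    return {"key": hashlib.sha256(content).hexdigest(), "provider": provider}


# An observer has pre-hashed files it cares about (a hypothetical watch list).
watch_list = {
    hashlib.sha256(b"contents of report.pdf").hexdigest(): "report.pdf",
}

# A message seen on the wire or harvested from the public DHT.
seen = announcement(b"contents of report.pdf", "198.51.100.7")

# Match: the observer now knows who has which file...
print(watch_list.get(seen["key"]), "offered by", seen["provider"])
# ...and, since the key is exactly what a download request needs, could fetch it too.
```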

Publishing CIDs to the DHT is a fundamental requirement, and it means nothing you do on IPFS or BitTorrent can truly be kept private. However, the Hypercore Protocol (formerly known as the Dat Protocol) found a better way to use the DHT and introduced some sorely needed privacy protections. Hypercore is yet another alternative to IPFS and BitTorrent, and it uses many of the same technologies.

Hypercore’s innovation was to stop broadcasting its version of a CID publicly on the DHT. Instead, it runs the hash (the “content hash”) through the same one-way cryptographic hashing operation one more time and broadcasts the resulting hash-of-a-hash (the “discovery hash”) to the DHT instead of the original.
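The hash-of-a-hash step, as described above, is a one-liner. This sketch uses plain SHA-256 for both steps to mirror the description; Hypercore’s own implementation derives its discovery key with a keyed BLAKE2b hash rather than SHA-256, so treat the functions below as illustrative, not as Hypercore’s code.

```python
import hashlib


def content_hash(data: bytes) -> bytes:
    """The secret: whoever knows this can request the content."""
    return hashlib.sha256(data).digest()


def discovery_hash(c_hash: bytes) -> bytes:
    """Hash the content hash once more; only this goes to the DHT."""
    return hashlib.sha256(c_hash).digest()


c = content_hash(b"one funny picture of a cat")
d = discovery_hash(c)

print(c.hex())
print(d.hex())
# Going from c to d is one cheap call; going from d back to c would mean
# inverting SHA-256, which is not feasible.
```

Anyone holding `c` can compute `d` and find peers, but someone who only sees `d` on the DHT learns nothing usable about `c`.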

To download a file over Hypercore, you need to know the content hash. Without it, you can’t generate the discovery hash to find peers, nor can you ask them for the file. Once you’ve found some peers using the discovery hash, you establish an encrypted connection to them and request the content by its content hash. This clever trick adds a layer of privacy: only someone who knows the content hash can request a copy of the content.
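One way a peer can enforce “only someone who knows the content hash gets the data” is a challenge-response check. The sketch below is hypothetical and is not Hypercore’s wire protocol: the `Seeder` class and `prove` function are invented for illustration, and instead of sending the content hash itself, the requester returns an HMAC of a fresh, single-use nonce keyed with the content hash. A snoop who only knows the public discovery hash cannot produce a valid proof.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Set


def prove(c_hash: bytes, nonce: bytes) -> bytes:
    """Requester's proof that it knows the content hash, bound to a nonce."""
    return hmac.new(c_hash, nonce, hashlib.sha256).digest()


class Seeder:
    """Hypothetical peer that serves data only to requesters who know c_hash."""

    def __init__(self) -> None:
        self._by_discovery: Dict[bytes, bytes] = {}  # discovery hash -> content hash
        self._blocks: Dict[bytes, bytes] = {}        # content hash -> data
        self._issued: Set[bytes] = set()             # nonces not yet used

    def add(self, data: bytes) -> bytes:
        c_hash = hashlib.sha256(data).digest()
        self._by_discovery[hashlib.sha256(c_hash).digest()] = c_hash
        self._blocks[c_hash] = data
        return c_hash

    def challenge(self) -> bytes:
        nonce = secrets.token_bytes(32)
        self._issued.add(nonce)  # remember it so each nonce is accepted once
        return nonce

    def serve(self, d_hash: bytes, nonce: bytes, proof: bytes) -> Optional[bytes]:
        if nonce not in self._issued:
            return None  # unknown or already-used nonce: a replayed proof fails
        self._issued.discard(nonce)
        c_hash = self._by_discovery.get(d_hash)
        if c_hash is None:
            return None
        if not hmac.compare_digest(prove(c_hash, nonce), proof):
            return None  # found us via the DHT, but does not know the content hash
        return self._blocks[c_hash]


seeder = Seeder()
secret = seeder.add(b"scans of my passport")
public = hashlib.sha256(secret).digest()  # what the DHT shows

nonce = seeder.challenge()
print(seeder.serve(public, nonce, prove(secret, nonce)))  # b'scans of my passport'

nonce = seeder.challenge()
print(seeder.serve(public, nonce, prove(public, nonce)))  # None
```

The nonce is what makes this a proof rather than a password: each one is random and accepted only once, so a recorded proof cannot be reused.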

You fundamentally can’t hide from other peers who are already in on the secret, though. (Technically, you can, using slow and convoluted onion routing; see Tribler’s take on BitTorrent.) For a P2P network to function, peers must be willing and able to exchange data with each other. Hypercore does, however, effectively keep snoops out of the loop.

The discovery hash also strikes a nice balance between privacy and the needs of law enforcement. Investigators can monitor the DHT to find unlawful content, but only after the content has been identified through good old-fashioned police work. Seize one criminal’s computer or files, and you can find out who else is offering the offending content on the network.
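The investigative workflow follows directly from the hash-of-a-hash construction. In this sketch, the DHT snapshot is a hypothetical dictionary and the addresses are from a documentation IP range. Its keys are opaque on their own, but anyone holding a copy of a file can recompute its discovery hash and look up who is offering it.

```python
import hashlib
from typing import Dict, List


def discovery_hash_of(data: bytes) -> str:
    """Content hash first, then the discovery hash derived from it."""
    c_hash = hashlib.sha256(data).digest()
    return hashlib.sha256(c_hash).hexdigest()


# Hypothetical DHT snapshot: opaque discovery hashes -> peer addresses.
dht_snapshot: Dict[str, List[str]] = {
    discovery_hash_of(b"offending file"): ["192.0.2.10", "192.0.2.44"],
    discovery_hash_of(b"holiday photos"): ["192.0.2.99"],
}

# Without a copy of the file, the snapshot's keys reveal nothing.
# With a seized copy, investigators can compute the key and look it up.
seized = b"offending file"
print(dht_snapshot.get(discovery_hash_of(seized), []))  # ['192.0.2.10', '192.0.2.44']
```

The same lookup is useless for fishing: the entry for the holiday photos stays anonymous to anyone who doesn’t already have them.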

When combined with peer-connection encryption, which IPFS already has, this gives you fairly good privacy protection. It’s a small change with a big impact, and IPFS, BitTorrent, and other P2P networks should adopt it. It would make peer-to-peer file-sharing tools suitable for more use cases, such as private sharing swarms and direct transfer of private data between your own devices.

The added privacy protection is a must if IPFS is to succeed as a replacement for HTTP on the web. The Brave web browser supports IPFS, and Opera has some limited support for it. The discontinued Beaker Browser supported Hypercore, and the discontinued Maelstrom browser supported BitTorrent websites. The Agregore P2P concept browser supports all three protocols and more.