IPFS pinning services overcharge for duplicated objects

Pinning services for the InterPlanetary File System (IPFS) are commercial hosting services that offer to ‘pin’ (meaning ‘permanently distribute’ in IPFS jargon) your IPFS file objects for a fee.

I’ve tested two such services, Eternum (0.01 USD per GiB/day) and Pinata cloud (1 GiB free, then 0.30 USD per GiB/month), and discovered that they overcharge for duplicated IPFS objects in their storage accounting.

An IPFS object can consist of a single file or of a set of references to other IPFS objects. I’ll refer to the latter kind as a “directory object” in this article. As an example, say we have an IPFS directory object consisting of references to the unique content hashes for the files File_1 and File_2. You’d have three distinct IPFS objects: the directory object and the two files. When you pin such a directory object, you also indirectly pin the two file objects. If you then add another file, File_3, to the directory, you’ve got five IPFS objects: the original directory object referencing two files, the new directory object referencing three files, and the three individual files.
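The example above can be sketched with a toy content-addressed store. This is only an illustration of the idea: the `store` dictionary, the `put` and `put_directory` helpers, and the JSON directory encoding are all assumptions made for this sketch. Real IPFS nodes use their own identifiers and object formats rather than bare SHA-256 digests and JSON.

```python
import hashlib
import json

# Toy stand-in for a node's object store: content hash -> raw bytes.
store = {}

def put(data: bytes) -> str:
    """Store bytes under their SHA-256 digest and return the digest."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data  # identical content always lands on the same key
    return key

def put_directory(entries: dict) -> str:
    """Store a directory as a name -> hash mapping. It holds references, not file data."""
    return put(json.dumps(entries, sort_keys=True).encode())

file_1 = put(b"contents of File_1")
file_2 = put(b"contents of File_2")
dir_v1 = put_directory({"File_1": file_1, "File_2": file_2})

# Adding File_3 creates one new file object and one new directory object.
file_3 = put(b"contents of File_3")
dir_v2 = put_directory({"File_1": file_1, "File_2": file_2, "File_3": file_3})

print(len(store))  # 5: two directory objects and three file objects
```

Both directory versions point at the same hashes for File_1 and File_2, so adding those files again would not create any new entries in the store.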

Since the contents of File_1 and File_2 never change, their IPFS objects remain unchanged. You’re not storing separate copies of the files even though they’re referenced from two different directory objects. The directory objects only reference the file objects by their hashes. IPFS is supposed to be “the permanent web”, and you can just leave the old directory object in place to preserve the history of the object. The storage cost of keeping an old version of a directory around is only a few bytes on top of any new or modified file data.

An IPFS node doesn’t store multiple copies of the exact same IPFS object. It only needs to store one copy of each object, even if that object is referenced from multiple other objects. Our two directory objects and everything they reference are deduplicated at the storage layer, so each file is stored only once.

IPFS also splits large files into multiple chunks, each of which may also be deduplicated in storage. This means that the actual storage requirement with IPFS may be smaller than the sum of all of your files on a regular non-deduplicating file system.
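Chunk-level deduplication can be illustrated with a minimal sketch. It assumes simple fixed-size chunking and a deliberately tiny `CHUNK_SIZE`, far smaller than anything a real node would use; the `stored_bytes` helper is hypothetical and exists only for this example.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example stays readable (an assumption)

def stored_bytes(files, size=CHUNK_SIZE):
    """Return the bytes needed when every distinct chunk is stored only once."""
    unique = {}  # chunk hash -> chunk length
    for data in files:
        for offset in range(0, len(data), size):
            chunk = data[offset:offset + size]
            unique[hashlib.sha256(chunk).hexdigest()] = len(chunk)
    return sum(unique.values())

# Two files that share their first two chunks.
files = [b"AAAABBBBCCCC", b"AAAABBBBDDDD"]

logical = sum(len(f) for f in files)
physical = stored_bytes(files)

print(logical)   # 24 bytes on a non-deduplicating file system
print(physical)  # 16 bytes once the shared chunks are stored once
```

Note that fixed-size chunking only finds duplicates that happen to line up on chunk boundaries, so the savings depend heavily on the data.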

The problem with pinning services is that they always charge for the cumulative size of pinned IPFS objects: each pin is billed for its full size, including everything it references, even when those objects are shared with another pin. While the services take advantage of deduplicated IPFS objects and blocks for their own storage requirements, these savings are not passed on to their customers. This would be a non-issue in a traditional file hosting service. However, deduplication of files and blocks is baked into IPFS, and customers expect to reap the benefits.
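To get a feel for the difference, here is a rough calculation. The site size, the amount of data changed per deploy, and the number of pinned versions are hypothetical numbers chosen only for illustration; the price is the Eternum rate quoted earlier.

```python
MIB_PER_GIB = 1024
PRICE_PER_GIB_DAY = 0.01  # USD, the Eternum rate quoted earlier in this article

# Hypothetical numbers, chosen only for illustration:
site_mib = 500     # full size of one version of a website
changed_mib = 5    # new or modified data in each later version
versions = 30      # pinned versions kept around to preserve history

# Cumulative accounting bills every pinned version in full.
cumulative_mib = versions * site_mib

# Deduplicated accounting counts shared data once, plus what each version changed.
deduplicated_mib = site_mib + (versions - 1) * changed_mib

def daily_cost(mib):
    """Daily price in USD for the given number of MiB."""
    return mib / MIB_PER_GIB * PRICE_PER_GIB_DAY

print(f"cumulative:   {cumulative_mib} MiB, {daily_cost(cumulative_mib):.4f} USD/day")
print(f"deduplicated: {deduplicated_mib} MiB, {daily_cost(deduplicated_mib):.4f} USD/day")
```

Under these assumptions the billed size is 15000 MiB while the deduplicated size is 645 MiB. The gap grows with every additional version that is kept pinned.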

I don’t believe that the commercial pinning services are overcharging on purpose, however. I’ve discussed this with the co-founder of Eternum, as well as a developer for a yet-to-be-launched pinning service, and they were receptive to the idea of billing for the actual deduplicated storage space customers use. This would enable more blogs and websites to preserve their history on IPFS and ease deployment at the same time.

IPFS doesn’t natively support calculating an object’s actual storage size, although I’ve suggested adding it to go-ipfs. It’s possible to get accurate per-customer storage accounting by creating a new IPFS object that references all of the customer’s pinned objects, then recursively walking each uniquely referenced object and summing up their sizes.
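The accounting idea above can be sketched as a graph walk that counts each unique object once. This is a minimal sketch and not how any pinning service actually implements billing. The in-memory `dag` mapping, the object names, and the byte sizes are hypothetical stand-ins for what a node would report, and the list of pins plays the role of the new object that references all of a customer’s pins.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical in-memory graph: hash -> (own size in bytes, hashes of linked objects)
Dag = Dict[str, Tuple[int, List[str]]]

def cumulative_size(dag: Dag, root: str) -> int:
    """Size of an object plus everything it references, shared or not."""
    size, links = dag[root]
    return size + sum(cumulative_size(dag, link) for link in links)

def deduplicated_size(dag: Dag, pins: Iterable[str]) -> int:
    """Total size of every unique object reachable from the given pins."""
    seen: Set[str] = set()
    stack = list(pins)
    total = 0
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already counted through another pin or directory
        seen.add(current)
        size, links = dag[current]
        total += size
        stack.extend(links)
    return total

dag: Dag = {
    "file_1": (1000, []),
    "file_2": (2000, []),
    "file_3": (3000, []),
    "dir_v1": (100, ["file_1", "file_2"]),
    "dir_v2": (150, ["file_1", "file_2", "file_3"]),
}
pins = ["dir_v1", "dir_v2"]

print(sum(cumulative_size(dag, pin) for pin in pins))  # 9250 bytes billed per pin
print(deduplicated_size(dag, pins))                    # 6250 bytes actually stored
```

The `seen` set is what makes the difference: File_1 and File_2 are reachable from both directory versions but only contribute to the total once.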