A look at IPFS’ garbage collector and how long resources remain cached

Nodes on the InterPlanetary File System (IPFS) cache resources that are downloaded through them and make those resources available for upload to other nodes, creating a distributed content distribution network in the process. This system depends on nodes being willing and able to cache and share resources with the network. Storage isn’t infinite, however, so nodes need to clear out some of their cache to make room for new resources.

This article discusses the cache garbage collection implementation in go-ipfs version 0.4.18 and the default behavior specific to that implementation.

A common myth, perpetuated by blog posts describing IPFS, is that an IPFS node’s entire resource cache is garbage collected and deleted every hour. In reality, the garbage collector isn’t even enabled by default, and caches can grow unrestrained unless garbage collection is run manually or enabled to run on a schedule.
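To illustrate, assuming a standard go-ipfs install, garbage collection is either triggered by hand or opted into when starting the daemon:

```shell
# Run the garbage collector once, manually.
ipfs repo gc

# Start the daemon with periodic garbage collection enabled;
# without this flag the cache is never collected automatically.
ipfs daemon --enable-gc
```
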

The repository garbage collector does run every hour (configured by the GCPeriod option) when enabled (--enable-gc). However, it doesn’t delete anything from the cache unless the cache exceeds 90 % (configured by the StorageGCWatermark option) of the 10 GB default maximum cache storage space (configured by the StorageMax option).
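These three options live under the Datastore section of the node’s configuration and can be read or changed with the ipfs config command; the values shown in the comments are the documented 0.4.x defaults:

```shell
# Inspect the relevant settings.
ipfs config Datastore.GCPeriod            # default: "1h"
ipfs config Datastore.StorageMax          # default: "10GB"
ipfs config Datastore.StorageGCWatermark  # default: 90

# Example: lower the cap to 5 GB and collect every 30 minutes instead.
ipfs config Datastore.StorageMax "5GB"
ipfs config Datastore.GCPeriod "30m"
```

The daemon must be restarted for the new values to take effect.
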

The entire cache is deleted in one go when the garbage collector runs; it doesn’t delete just enough data to bring the total size down to 90 % of the available space. Pinned resources are never deleted by the garbage collector.
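Pinning is the mechanism for exempting a resource from collection; the CID below is only a placeholder for whatever content you want to keep:

```shell
# Pin a resource so it survives garbage collection
# (replace the CID with the content you want to keep).
ipfs pin add QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG

# List what is currently pinned.
ipfs pin ls --type=recursive

# Everything that is cached but not pinned is removed by:
ipfs repo gc
```
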

A node used for web browsing by a single user is unlikely to exceed the maximum storage every hour, and even with garbage collection enabled, cached resources can remain for hours, days, or weeks depending on usage. A popular public gateway node will likely have its cache purged more often than a node used by a single person. Public gateways may use their own garbage collection handling, however.

go-ipfs version 0.4.4 and older had an arbitrary limiter that allowed the garbage collector to run for only one minute per gigabyte of remaining cache space once the cache exceeded 90 % of the allowed storage. This would delete an indeterminate amount of data in the allotted time, which may not have been enough to bring the cache below the StorageGCWatermark. This limiter was removed in version 0.4.5, and no other attempt at limiting the cache-purging behavior has appeared in go-ipfs master since.

There is definitely room for improvement in IPFS’ cache handling. Nodes shouldn’t delete their entire cache the moment they exceed the configured storage allowance by a single byte, but rather start purging content more intelligently. For example, a node could delete the oldest and least frequently accessed content in the cache first. It could potentially query the network and delete the most widely distributed resources first, as determined by the number of nodes sharing each resource. Even randomly deleting resources until sufficient storage space has been freed would be an improvement.
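The “oldest first” idea above can be sketched in a few lines of Go (the language go-ipfs is written in). This is not code from go-ipfs; cacheEntry, its fields, and evictOldest are hypothetical names used only to illustrate a least-recently-used policy that stops once the cache is back under the watermark:

```go
package main

import (
	"fmt"
	"sort"
)

// cacheEntry is a hypothetical record of one cached block:
// its content identifier, size, and last access time.
type cacheEntry struct {
	cid        string
	size       int64
	lastAccess int64 // unix timestamp
}

// evictOldest sorts entries by last access time and deletes the
// least recently used ones until total storage is at or below the
// watermark, instead of wiping the whole cache in one go.
func evictOldest(entries []cacheEntry, total, watermark int64) []string {
	sort.Slice(entries, func(i, j int) bool {
		return entries[i].lastAccess < entries[j].lastAccess
	})
	var evicted []string
	for _, e := range entries {
		if total <= watermark {
			break
		}
		evicted = append(evicted, e.cid)
		total -= e.size
	}
	return evicted
}

func main() {
	// Toy cache: 12 units used, watermark at 9 units.
	entries := []cacheEntry{
		{cid: "QmA", size: 4, lastAccess: 100},
		{cid: "QmB", size: 3, lastAccess: 50},
		{cid: "QmC", size: 5, lastAccess: 200},
	}
	// Only the least recently used entry (QmB) needs to go.
	fmt.Println("evicted:", evictOldest(entries, 12, 9))
}
```

A real policy would also weigh access frequency or network-wide replication, as suggested above, but even this simple partial eviction beats deleting everything.
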