
Syncthing: The data deduplication master

Syncthing is an open-source, encrypted, peer-to-peer folder synchronization program. It uses deduplication techniques to reduce the amount of data it needs to transfer over the network, saving you bandwidth costs, energy, and time. You can now optionally also let it deduplicate data storage to reduce your storage costs.

You can think of Syncthing as a cloud storage service like Dropbox or OneDrive, but without any intermediary servers or costs. Your files are transferred directly among your devices. It lets you safely synchronize folders without having to trust a cloud storage provider with your data.

Syncthing chunks large files into blocks, similar to how a file system works. It compares which blocks have changed and will only send changed blocks over the network. The blocks can be reused between files and even between different synced folders. This saves transfer time, network bandwidth, and energy.
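The block comparison described above can be sketched in a few lines of Python. This is only an illustration, not Syncthing's actual code: the fixed 128 KiB block size, the use of SHA-256 digests as block identifiers, and the function names block_hashes and blocks_to_transfer are all assumptions made for this example.

```python
import hashlib
from typing import List

BLOCK_SIZE = 128 * 1024  # assumed fixed block size for this sketch


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def blocks_to_transfer(new_data: bytes, local_files: List[bytes],
                       block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the indexes of blocks in new_data that none of the local files contain."""
    known = set()
    for content in local_files:
        # Blocks from any local file, in any folder, count as reusable.
        known.update(block_hashes(content, block_size))
    return [
        index
        for index, digest in enumerate(block_hashes(new_data, block_size))
        if digest not in known
    ]
```

Only the block indexes returned by blocks_to_transfer would have to travel over the network in this model; every other block can be copied from a file that is already on the local disk.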

Syncthing collects usage data from its users. Among the collected data, it records how much data transfer is reduced because of this feature. As of the time of publishing, 56.48 % of synced data was reused from local files. The potential savings of Syncthing’s storage deduplication feature can be assumed to be near this figure.

The storage deduplication feature re-purposes this bandwidth-saving feature to save local disk space. The same mechanism is used to reuse identical on-disk blocks between files.

The feature can significantly reduce the disk space consumed by Syncthing’s file versioning system. File versioning preserves copies of modified files. This is likely why the deduplication rate in Syncthing’s network is so high. Depending on your usage and files, you may only make slight changes to your files. Syncthing might only need to store two different blocks of changed data instead of storing the entire file twice. This approach can be incredibly efficient with virtual machine disk files, log files and archive files where you append data at the end, and other append-only data structures.
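To get a feel for the savings with append-only files, here is a small Python sketch that counts how many blocks two versions of a file occupy with and without block sharing. It is a simplified model under stated assumptions: blocks are fixed-size, identified by their SHA-256 digest, and the helper name count_blocks is made up for this example.

```python
import hashlib


def count_blocks(versions, block_size):
    """Return (blocks stored without sharing, blocks stored with sharing)."""
    total = 0
    unique = set()
    for content in versions:
        for offset in range(0, len(content), block_size):
            total += 1
            unique.add(hashlib.sha256(content[offset:offset + block_size]).digest())
    return total, len(unique)


# A toy "log file": eight distinct 4-byte blocks, then one appended block.
version_1 = b"".join(bytes([i]) * 4 for i in range(8))
version_2 = version_1 + bytes([8]) * 4

total, shared = count_blocks([version_1, version_2], block_size=4)
print(total, shared)  # 17 blocks without sharing, 9 with sharing
```

The example appends data that lines up exactly with a block boundary. If the old version ended in a partially filled block, that last block would also change and would have to be stored again.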

The feature is supported on most file systems that support cloning, including Btrfs, XFS, and EXT4 on Linux and Solaris, and ReFS on Windows. NTFS, the default file system on Windows, isn’t among them. ReFS is only available in Windows Workstation and Server editions.

Neither OpenZFS nor ZFS is supported, as they don’t expose any appropriate system calls (syscalls). macOS’s APFS isn’t supported either, despite its clonefile syscall. This syscall can only clone whole files, unlike the block-level cloning available on Linux and Solaris.

The feature has the same limitations as normal file cloning. Clones must be created on the same disk/partition and cannot cross file system/partition boundaries. This limitation can lead to syncing errors as Syncthing can reuse blocks across synced folders and isn’t aware of file system boundaries.
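If you want to check up front whether two synced folders can share blocks at all, you can compare the device IDs of their paths. The following Python sketch does that. The function name on_same_file_system is made up for this example, and the check is only a heuristic: some setups (Btrfs subvolumes, for example) can report different device IDs for paths that live on the same file system.

```python
import os


def on_same_file_system(path_a: str, path_b: str) -> bool:
    """Return True if both existing paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

You would call it with the paths of two synced folders, for example on_same_file_system("/home/me/Sync", "/mnt/data/Sync") (both paths are placeholders). A result of False suggests that clones between the two folders will fail.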

Storage deduplication isn’t enabled by default, as of Syncthing version 1.19.1. You can enable it per folder or set it as the default behavior from the Advanced settings dialog. Look for the Copy Range Method option and set it to a supported value.

I recommend explicitly setting the option to ioctl (uses the FICLONERANGE ioctl request) on Linux, as the copy_file_range option/syscall has reduced system stability for me. The special all option first tries ioctl, then other methods, before eventually falling back to a regular copy. This option can reduce the syncing errors caused by file system boundaries (described above) but may cause system instability by calling copy_file_range.
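The clone-then-fall-back idea can be sketched in Python for Linux. This is not how Syncthing implements it, and it skips the intermediate methods that the all option tries. The function name clone_or_copy_range is made up for this example. The request number and the argument layout follow struct file_clone_range from the Linux header linux/fs.h.

```python
import fcntl
import os
import struct

# _IOW(0x94, 13, struct file_clone_range) from <linux/fs.h>
FICLONERANGE = 0x4020940D


def clone_or_copy_range(src_fd, dst_fd, src_offset, length, dst_offset):
    """Try to clone a byte range between two open files, else copy it.

    Returns "cloned" if the file system shared the blocks, "copied" otherwise.
    """
    # struct file_clone_range { s64 src_fd; u64 src_offset, src_length, dest_offset; }
    request = struct.pack("=qQQQ", src_fd, src_offset, length, dst_offset)
    try:
        fcntl.ioctl(dst_fd, FICLONERANGE, request)
        return "cloned"
    except OSError:
        # Raised, for example, when the file system cannot clone, the range
        # is not block-aligned, or the files are on different file systems.
        copied = 0
        while copied < length:
            chunk = os.pread(src_fd, min(length - copied, 1 << 20), src_offset + copied)
            if not chunk:
                break  # reached the end of the source file
            copied += os.pwrite(dst_fd, chunk, dst_offset + copied)
        return "copied"
```

Both file descriptors come from os.open, with the destination opened for writing. Only the "cloned" path saves disk space; the fallback produces an ordinary second copy of the data.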

The deduplication feature was added in Syncthing version 1.8.0 () and is still considered experimental. It’ll probably remain that way for some time, as there’s a chance the functionality might introduce a bug that results in data loss.