File cloning is a feature of copy-on-write (CoW) file systems. It’s an immediate way to make duplicate copies of files without requiring a second copy to be stored. Here’s a quick overview of cloning-capable file systems and the system calls and commands required to take advantage of them on Linux, macOS, Windows, and other operating systems.
Most file systems developed in the last decade, proprietary and open-source alike, are CoW file systems. Simply put, CoW file systems always create a new copy when you modify a file instead of overwriting data in-place. This helps protect your data against a slew of different problems that can occur while writing files.
Copy-on-write file systems also enable use cases like data integrity checking, file system snapshots, and data deduplication. In this article, I’ll focus exclusively on the last of these.
On traditional file systems, storing extra copies of your files takes up more space on your storage drives. File cloning enables you to make multiple copies of the same file without storing more than one copy of the actual data. If the data in either the original file or one of its clones is changed, the new data is written to another place on the drive. The other copies remain unchanged, and they can still share any chunks of data that are identical to those in the modified file.
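To make the sharing behavior concrete, here is a toy in-memory model of it. This is only a sketch: the `BlockStore` and `ToyFile` classes are hypothetical names I made up for illustration, and no real file system is laid out this way. It shows a clone starting out with zero extra blocks, and a write to the clone allocating just one new block while everything else stays shared.

```python
class BlockStore:
    """Toy reference-counted block store (illustrative only)."""

    def __init__(self):
        self.blocks = {}    # block id -> bytes
        self.refcount = {}  # block id -> number of files pointing at the block
        self.next_id = 0

    def put(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        return block_id

    def share(self, block_id):
        self.refcount[block_id] += 1
        return block_id

    def release(self, block_id):
        self.refcount[block_id] -= 1
        if self.refcount[block_id] == 0:
            del self.blocks[block_id], self.refcount[block_id]


class ToyFile:
    """A file is just an ordered list of block ids in this model."""

    def __init__(self, store, chunks=()):
        self.store = store
        self.block_ids = [store.put(chunk) for chunk in chunks]

    def clone(self):
        # Cloning copies the list of block ids, never the data itself.
        twin = ToyFile(self.store)
        twin.block_ids = [self.store.share(b) for b in self.block_ids]
        return twin

    def write_block(self, index, data):
        # Copy-on-write: new data goes into a new block; the old block
        # stays in place for any other file that still references it.
        old = self.block_ids[index]
        self.block_ids[index] = self.store.put(data)
        self.store.release(old)

    def read(self):
        return b"".join(self.store.blocks[b] for b in self.block_ids)


store = BlockStore()
original = ToyFile(store, [b"AAAA", b"BBBB", b"CCCC"])
duplicate = original.clone()
blocks_after_clone = len(store.blocks)

duplicate.write_block(1, b"XXXX")
blocks_after_write = len(store.blocks)
```

After the write, the two files still share their first and third blocks, and only the modified block exists twice.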
Cloning is a safer alternative to file system hard links. Hard links are supported in most file systems, and work more like a shortcut. The hard-linked shortcut points to the exact same location on your storage drive as the original file, so a change made through one name shows up under every other name. Almost no programs are hard-link aware, so they won’t warn you that you’re about to modify more than one file.
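The hard link pitfall is easy to reproduce. The following is a small sketch that assumes a POSIX-like system where the temporary directory supports hard links; the file names are made up for the example. It writes through the linked name and then reads the original name back.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "report.txt")
    linked = os.path.join(workdir, "report-link.txt")

    with open(original, "w") as f:
        f.write("first draft")

    # Create a second name that points at the same data on disk.
    os.link(original, linked)

    # Overwrite the file in place through the linked name.
    with open(linked, "w") as f:
        f.write("overwritten via the link")

    # Read the data back through the original name.
    with open(original) as f:
        original_text = f.read()

    same_inode = os.stat(original).st_ino == os.stat(linked).st_ino
    link_count = os.stat(original).st_nlink
```

Both names report the same inode, and the text read through the original name is the text written through the link. Programs that save by writing a new temporary file and renaming it over the old name behave differently: they silently break the link instead.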
Support for file cloning must be baked into the operating system kernel and the file system driver. Programs must also be made aware of file cloning capabilities before they can take advantage of them. The following table shows the system calls (syscalls) and cloning-aware copy commands for popular operating systems and file systems.
|Unsupported; non-CoW FS.
As shown above, the cp (copy) command on some operating systems can make use of syscalls to instruct the kernel to clone a file instead of making a complete copy. The most notable example, after the copy command, is Finder on macOS. It’ll clone files automatically when you copy them on an APFS volume. File Explorer on Windows, and Nautilus and Dolphin on Linux, will always make complete copies.
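A program that wants to behave like a cloning-aware copy command can try to clone first and fall back to a regular copy. Below is a minimal sketch of that pattern for Linux, using the `FICLONE` ioctl from `<linux/fs.h>`. The helper name `clone_or_copy` is my own, and the hard-coded request number assumes a common architecture such as x86 or ARM; treat it as an illustration, not a portable implementation.

```python
import fcntl
import os
import shutil
import tempfile

# FICLONE request number as defined in <linux/fs.h> on x86 and ARM (assumption:
# other architectures may encode ioctl numbers differently).
FICLONE = 0x40049409


def clone_or_copy(src, dst):
    """Try to clone src to dst; fall back to a full copy.

    Returns "clone" or "copy" so the caller can tell which path was taken.
    """
    try:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            # The ioctl is issued on the destination, with the source fd as argument.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        return "clone"
    except OSError:
        # Not a cloning-capable file system, different volumes, or not Linux.
        shutil.copyfile(src, dst)
        return "copy"


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "source.bin")
    dst = os.path.join(workdir, "duplicate.bin")
    with open(src, "wb") as f:
        f.write(b"example payload" * 1024)

    method = clone_or_copy(src, dst)

    with open(src, "rb") as a, open(dst, "rb") as b:
        contents_match = a.read() == b.read()
```

Either way the destination ends up with identical contents. The only difference is whether the data blocks are shared or stored twice.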
It’s unfortunate that the different operating systems have settled on different argument extensions to the POSIX cp command. It would have been better for developers if there had been a little more cooperation between the implementors. I had trouble finding any examples of cross-platform implementations. Syncthing stands out with support for all but Solaris’
Each of the syscalls mentioned in the table clones entire files. Linux also supports partial cloning of files using the fideduperange syscalls. These syscalls only allow you to address parts of a file that are aligned to the block size of the underlying file system.
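The alignment requirement means a program usually has to trim the range it wants to share before asking the kernel. Here is a rough sketch of both steps, assuming Linux and the `FICLONERANGE` ioctl with its `struct file_clone_range` argument from `<linux/fs.h>`. The helper names are hypothetical, the request number assumes x86 or ARM, and the block size is passed in explicitly instead of being queried from the file system.

```python
import fcntl
import struct

# FICLONERANGE request number as defined in <linux/fs.h> on x86 and ARM
# (assumption: other architectures may encode ioctl numbers differently).
FICLONERANGE = 0x4020940D


def largest_aligned_range(offset, length, block_size):
    """Shrink [offset, offset + length) to the largest block-aligned range inside it.

    Returns (start, length); the length is 0 when no whole block fits.
    """
    start = -(-offset // block_size) * block_size       # round the start up
    end = (offset + length) // block_size * block_size  # round the end down
    return (start, end - start) if end > start else (start, 0)


def clone_range(src_fd, dst_fd, src_offset, length, dst_offset):
    """Ask the kernel to share one range between two open files.

    Raises OSError when the file system does not support it or when the
    offsets and length are not suitably aligned.
    """
    # struct file_clone_range { __s64 src_fd; __u64 src_offset, src_length, dest_offset; }
    arg = struct.pack("qQQQ", src_fd, src_offset, length, dst_offset)
    fcntl.ioctl(dst_fd, FICLONERANGE, arg)


# With 4096-byte blocks, a request for bytes 1000..10999 shrinks to one whole block.
start, length = largest_aligned_range(1000, 10000, 4096)
```

The destination offset has to satisfy the same alignment rule, so the leftover unaligned head and tail of a range still need to be copied the ordinary way.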
OpenZFS isn’t part of the Linux kernel because of licensing issues, and that is unlikely to change. OpenZFS doesn’t support reflinks, nor any of the relevant Linux syscalls for cloning files or blocks. It supports file system-level cloning but not
Bcachefs isn’t in the kernel yet either, but it’s developed under a Linux-kernel compatible license with the ultimate goal of being merged into the kernel. It supports all the relevant Linux-specific syscalls for file cloning.
Over the last three years, Apple has switched all of its products to its new CoW-based Apple File System (APFS). Microsoft has decided to go in the opposite direction, and removed its copy-on-write file system, ReFS, from Windows 10 Professional in . ReFS is now only available on Workstation and Server editions. ReFS was not suitable for use on Windows desktops anyway. This does leave Windows as the only computer operating system without a CoW file system.
I find file cloning fascinating, and I’ll explore several potential use cases for it in the coming weeks. Next up will be how you can identify a cloned file, something that is surprisingly difficult because the file system doesn’t keep track of it.
Update (): An earlier version of this article incorrectly stated that OpenZFS and ZFS supported reflink internally, but didn’t expose any syscalls for it. These file systems do not support reflinks. Thanks to Richard Yao of the OpenZFS project for the correction.