Copyright and the distributed peer-to-peer web

The distributed web overturns many of the assumptions of the traditional centralized web, especially when it comes to how content is distributed. Copyright laws protect a creator’s exclusive rights to distribute their creative works. So, how does all of this work when you as a creator publish something on a distributed website? And how does it work for the end users?

When you publish a creative work on the traditional centralized web, you’re implicitly licensing everyone in the world to come and view whatever you’ve published on the website where you published it. If this arrangement isn’t enough for you, then you can erect paywalls and charge a fee to see it; you can license it permissively, extending your viewers’ rights to redistribute and repurpose it freely; or you can exercise your exclusive distribution right and pull it off the web. These rights are protected by copyright laws and apply almost everywhere in the world thanks to the Berne Convention for the Protection of Literary and Artistic Works.

However, things work a little differently on the distributed web. Anyone who views your articles, videos, or other copyrighted creative works also caches them on their device and contributes to their distribution to other viewers. Unless expressly licensed to do so, by redistributing copyrighted works they’re violating the creator’s exclusive distribution rights. Copyright infringement, even unintended copyright infringement, is no laughing matter.

A few weeks ago I looked into how Accelerated Mobile Pages (AMP) Caches work by leveraging a safe-harbor provision for caching services found in the copyright laws of many jurisdictions, including Australia, Canada, the European Union, and the USA. Could the same caching exemption be used for the distributed web?

Are distributed web protocol clients a caching service?

You can certainly think of a distributed web client like a caching service. Copyright law includes safe-harbor provisions for caching services, although they don’t do a good job of defining them.

To qualify, a cache must serve the purpose of improving network efficiency, it must select content for caching using automated means, it can’t modify or alter the cached content, and it must respect industry-standard methods that enable rightsholders to update or remove the cached files.

The Dat Protocol and compliant clients meet the first three requirements, but the last one is a bit trickier. The protocol allows for files to be updated and even removed, but old versions and even deleted files can still be accessed through the protocol’s built-in file history and versioning system.
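
To see why removal is tricky in a versioned archive, here is a toy model in Python. It is not the Dat Protocol’s actual data structure or API; the `VersionedArchive` class and its methods are hypothetical names made up for this illustration. It only shows how an append-only history keeps deleted files reachable.

```python
from typing import Optional


class VersionedArchive:
    """Toy append-only archive (hypothetical, not the real Dat format)."""

    def __init__(self) -> None:
        # Each entry is (path, content); content None marks a deletion.
        self._log: list[tuple[str, Optional[bytes]]] = []

    @property
    def version(self) -> int:
        return len(self._log)

    def write(self, path: str, content: bytes) -> int:
        self._log.append((path, content))
        return self.version

    def delete(self, path: str) -> int:
        # Deleting only appends a marker; earlier entries stay in the log.
        self._log.append((path, None))
        return self.version

    def read(self, path: str, at_version: Optional[int] = None) -> Optional[bytes]:
        """Return the file as of at_version (the latest version if omitted)."""
        upto = self.version if at_version is None else at_version
        for entry_path, content in reversed(self._log[:upto]):
            if entry_path == path:
                return content
        return None


archive = VersionedArchive()
v1 = archive.write("essay.txt", b"first draft")
v2 = archive.write("essay.txt", b"second draft")
v3 = archive.delete("essay.txt")

latest = archive.read("essay.txt")                    # deleted in the latest version
old_copy = archive.read("essay.txt", at_version=v1)   # history still has the first draft
```

In this model the rightsholder’s deletion is honored only for readers who ask for the latest version; anyone who asks for an earlier version still gets the old bytes.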

This isn’t a new problem as the same situation exists for other public version control systems like the popular Git development tool. Individual open-source development projects solve this problem using irrevocable permissive licenses that apply to the entire project and older versions in perpetuity. In other words, this is handled by an expressly written license.

Files on the InterPlanetary File System (IPFS) are immutable and can’t be updated, and there isn’t a method for removing them either. IPFS and IPFS-to-web gateways (acting more like traditional caching services) thus don’t qualify for the safe-harbor provisions afforded to caching services either.
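
The immutability follows from content addressing, where a file’s address is derived from its bytes. The sketch below uses a plain SHA-256 hex digest as a stand-in. Real IPFS content identifiers involve chunking and a multihash encoding, so `toy_address`, `publish`, and `store` are hypothetical simplifications and not the IPFS algorithm or API.

```python
import hashlib


def toy_address(content: bytes) -> str:
    """Derive an address from the content itself (simplified stand-in for a CID)."""
    return hashlib.sha256(content).hexdigest()


# A toy content-addressed store: the key is always the hash of the value.
store: dict[str, bytes] = {}


def publish(content: bytes) -> str:
    address = toy_address(content)
    store[address] = content
    return address


original = publish(b"My article, version 1")
revised = publish(b"My article, version 2")

# "Updating" produced a second address; the first still resolves to the old bytes.
addresses_differ = original != revised
old_still_served = store[original] == b"My article, version 1"
```

Changing a single byte yields a different address, so there is nothing to update in place, and any peer holding the old bytes can keep serving them under the old address.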

A Dat Protocol client is close, but in my understanding doesn’t quite meet the requirements to qualify as a caching service. IPFS clients are nowhere near qualifying as caching services.

What about implied licenses?

I’ve mentioned implied licenses once in the article already, so let’s explore whether they’re suited for use on the distributed web. An implied license is an unwritten license granting limited rights to use copyrighted material as a result of how the rightsholder themselves distributed the material.

You can argue that for anything legally published on the distributed web by its rightsholder, there’s an implied unwritten license agreement that grants you the right to redistribute it in the form it was provided to you.

For the distributed web, this means that you have an implied license to cache it on your device and redistribute it through the same means that you accessed it. This doesn’t grant you other rights, such as redistributing it by other means, repurposing it outside its original form, transforming it, or offering it for sale.

However, an implied license is by its very nature fragile and can come with unintended consequences. The same line of reasoning that grants users the right to redistribute content on the Dat Protocol also enables other uses that the rightsholder may not want.

For instance, both the Beaker Browser and Cliqz’s implementation of the protocol feature prominent buttons for making an editable copy of the original website that’s immediately made available to others. That right can also be argued to have been implied by the original rightsholder, even though they might not have intended to allow for derivative works.

Peer-to-peer networks that compensate users financially, e.g. with a virtual currency token (like Tribler, Filecoin, and BitTorrent Speed), further complicate the matter by introducing economic exploitation of the rightsholder’s works.

The ambiguities around implied licenses are many, especially for a relatively new type of media without established rules like the distributed web. You can mitigate these problems by expressly providing a written license for your content.

Choosing a license for your work

Before I get going, I’d like to remind the reader that I’m not a lawyer, nor am I providing legal advice on how to write your own contract. That being said, you probably don’t need to write your own contract.

Creative Commons is a set of permissive licenses that have been around and evolved since 2001. They’re similar to the free-software and open-source licenses used in software development projects, but focus on other creative endeavors like filmmaking, photography, and writing. Creative Commons licenses are widely recognized and easy to read, with a standard set of symbols representing the rights you’ve licensed your work under.

Creative Commons has several different licenses to choose from, depending on what you’re comfortable with allowing. All of their licenses allow for the free distribution of licensed works, given that certain requirements set by the rightsholder are met. All their licenses require attribution in a way specified by the rightsholder (such as including your name and a link to a webpage of your choosing).

The rights holder can then opt to either allow or disallow commercial exploitation of their work, and to either allow or disallow using their works in remixes and other derivative works. Optionally, derivative works can be allowed only if the new work is distributed under the same Creative Commons license.

The interactive Creative Commons license chooser will help you decide on and understand the differences between the different licenses. The tool helps you select what you want others to be able to do with your work and what rights you want to retain.
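
The choices described above map onto the familiar license codes. Here is a minimal sketch of that mapping in Python. It is not the code behind the Creative Commons chooser; the function name `choose_cc_license` and its parameters are hypothetical, and it leaves out license version numbers.

```python
def choose_cc_license(allow_commercial: bool, derivatives: str) -> str:
    """Map two choices to a Creative Commons license code.

    derivatives must be "yes", "no", or "share-alike".
    """
    if derivatives not in {"yes", "no", "share-alike"}:
        raise ValueError(f"unknown derivatives option: {derivatives!r}")

    parts = ["BY"]  # attribution is always required
    if not allow_commercial:
        parts.append("NC")  # NonCommercial
    if derivatives == "no":
        parts.append("ND")  # NoDerivatives
    elif derivatives == "share-alike":
        parts.append("SA")  # ShareAlike
    return "CC " + "-".join(parts)


most_permissive = choose_cc_license(allow_commercial=True, derivatives="yes")
most_restrictive = choose_cc_license(allow_commercial=False, derivatives="no")
```

Two yes/no-style questions are enough to cover all six combinations, from CC BY at the permissive end to CC BY-NC-ND at the restrictive end.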

Note that you must read the Creative Commons license you chose in full before applying it to your work. The licenses are irrevocable: you can’t remove the license from the copy you distribute and then pursue people who obtained copies of your work while it was Creative Commons licensed. Nor can you stop them from redistributing it under the same terms after you’ve changed your mind about the license.

There are details in the license you should be aware of, such as the right to transform the medium (e.g. convert from a PDF to HTML, or transcribe an audio podcast to text), which is exempt from the non-derivative license requirement.

If you’re unhappy with the Creative Commons licenses, despite their brand recognition and the public’s familiarity with them, then you can, of course, write your own license terms. Good luck with that.

Applying the license

The one super important thing about applying a license is that it’s made available to the user. For content-addressable distribution models like IPFS, it’s important to embed a copyright notice and the license into images and other files individually. There’s no guarantee a separate license file will “stick” with a collection of files you’ve shared.

This isn’t as important, but still recommended, for platforms like the Dat Protocol and BitTorrent which revolve around bundles of files instead of individual files. Most file formats have machine-readable standards for embedding a copyright and license statement (or at the very least key-value pairs for metadata that can be used for this purpose), and there’s software that can embed these into the files for you.
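
As one concrete case of such key-value metadata, the PNG format has `tEXt` chunks, and `Copyright` is one of the keywords the PNG specification predefines. The sketch below adds such a chunk using only the Python standard library. The helper names (`make_chunk`, `iter_chunks`, `embed_copyright`, `read_text_chunks`) and the sample notice are hypothetical, and a dedicated metadata tool is likely the better choice for real files.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, then a CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk in a PNG byte string."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        yield chunk_type, data
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def embed_copyright(png: bytes, notice: str) -> bytes:
    """Return a copy of the PNG with a tEXt 'Copyright' chunk placed before IEND."""
    text = b"Copyright\x00" + notice.encode("latin-1")  # tEXt only holds Latin-1
    out = [PNG_SIGNATURE]
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"IEND":
            out.append(make_chunk(b"tEXt", text))
        # Existing chunks are re-emitted with a recomputed (not verified) CRC.
        out.append(make_chunk(chunk_type, data))
    return b"".join(out)


def read_text_chunks(png: bytes) -> dict[str, str]:
    """Collect keyword/text pairs from all tEXt chunks."""
    found = {}
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            keyword, _, value = data.partition(b"\x00")
            found[keyword.decode("latin-1")] = value.decode("latin-1")
    return found


# A minimal 1x1 grayscale PNG, built here so the example is self-contained.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # one filter byte plus one gray pixel
tiny_png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

sample_notice = "(c) Jane Example, licensed under CC BY-SA 4.0"  # made-up notice
tagged = embed_copyright(tiny_png, sample_notice)
embedded = read_text_chunks(tagged)
```

Because the notice travels inside the file’s own bytes, it stays attached even when the image is shared on its own. A notice containing characters outside Latin-1 would need the `iTXt` chunk type instead, which this sketch does not handle.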

Keep in mind that individual components of something like a webpage may have different license terms. For example, while your blog posts may be licensed as Creative Commons, the JavaScript you use on your page might be licensed under a free-software license; whereas the illustrations to your blog posts may have been sublicensed using yet another license. Be careful to not imply that a license applies to the whole website or even all the contents of a page, but try instead to be as specific as possible.

There should also be a human-readable version of the license. This can be included at the bottom of a webpage, in the credits of a movie, or superimposed as a watermark on top of an image. I’d personally not use the last option, as I find that it often ruins the photo or illustration. However, I don’t see any real alternatives for images distributed individually, e.g. over IPFS.
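
One way to keep a human-readable notice specific is to generate it from a list of the separately licensed parts of a page. The following Python sketch assumes a made-up page with three parts. `LicensedPart`, `render_notice`, the names, and the license assignments are all hypothetical examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicensedPart:
    description: str    # what the notice covers, stated as specifically as possible
    rights_holder: str
    license_name: str


def render_notice(parts: list[LicensedPart]) -> str:
    """Build a plain-text notice with one line per separately licensed part."""
    lines = [
        f"{p.description}: (c) {p.rights_holder}, licensed under {p.license_name}."
        for p in parts
    ]
    return "\n".join(lines)


page_parts = [
    LicensedPart("Article text", "Jane Example", "CC BY-SA 4.0"),
    LicensedPart("Page scripts", "Jane Example", "MIT License"),
    LicensedPart("Illustrations", "Example Studio", "CC BY-NC-ND 4.0"),
]
notice = render_notice(page_parts)
```

Listing each part on its own line avoids implying that one license covers the whole page, and the same text can go in a page footer or a credits screen.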

This article could be completely wrong! If you spot any mistakes or something blatantly obvious that I’ve missed, please leave a comment! I’ll happily update the article with new information.