Patch origin trust vs GitHub’s URL hierarchy

Attentive readers may have noticed something a bit weird about the GitHub patch links in my last article. I shared links to two patches for Ruby’s Rake build system, which I also said hadn’t yet been accepted into Rake. Yet the patches looked like they came directly from the Rake project’s official code repository at https://github.com/ruby/rake/. So how did I get patch URLs that are indistinguishable from the commits and patches that are part of the project?

Every code repository on GitHub resides first under a username or organization name, and then under the project’s name. For instance, https://github.com/ruby/rake/ is the rake project under the ruby organization. Code, issue tracking, and proposed code changes all live under that same URL hierarchy.
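To make the two-level hierarchy concrete, here’s a minimal Python sketch that splits a github.com URL into its owner, repository, and remaining path. It’s plain string parsing that assumes every path starts with /owner/repo/; it doesn’t contact GitHub, and the split_github_url helper and RepoPath type are hypothetical names for illustration.

```python
from typing import NamedTuple, Tuple
from urllib.parse import urlparse


class RepoPath(NamedTuple):
    """The two levels of GitHub's URL hierarchy, plus whatever follows them."""
    owner: str             # username or organization name
    repo: str              # project (repository) name
    rest: Tuple[str, ...]  # e.g. ("commit", "f8afda2b22.patch") or ("pulls",)


def split_github_url(url: str) -> RepoPath:
    """Split a github.com URL into owner, repository, and the remaining path."""
    parsed = urlparse(url)
    if parsed.netloc.lower() != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    # Drop empty segments produced by leading/trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"URL has no owner/repo path: {url}")
    return RepoPath(parts[0], parts[1], tuple(parts[2:]))


# The project page, its pull requests, and a commit patch share one prefix.
for url in (
    "https://github.com/ruby/rake/",
    "https://github.com/ruby/rake/pulls/",
    "https://github.com/ruby/rake/commit/f8afda2b22.patch",
):
    print(split_github_url(url))
```

The sketch only reflects the URL’s structure; nothing in the parsed path says where a commit actually lives.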

Proposed code changes, or “pull requests”, are suggested changes contributed by project collaborators or just anyone with something to contribute. These changes haven’t yet been approved and made part of the project. You can find them under the pulls/ section of a project’s page, e.g. https://github.com/ruby/rake/pulls/.

In my article, I included the following links to my patches in the Rake project. The patches hadn’t yet been accepted into the project at the time of publication.

https://github.com/ruby/rake/commit/f8afda2b22.patch
https://github.com/ruby/rake/commit/abf5e26464.patch

There’s nothing to indicate that my patches aren’t part of the Rake project. They’re published under the same URL scheme as any other commit or patch in the project. You can’t tell from either the URL or the patches that the code isn’t part of the project they appear to originate from. These links shouldn’t have worked!

You could use social engineering to trick someone into deploying a malicious patch that appears to have legitimately originated from a target project. All it would take to get a legitimate-looking URL is to open and close a pull request in the project. It’s not unheard of for large deployments to receive an early heads-up about critical security patches, so a patch link arriving ahead of public disclosure needn’t look suspicious. The malicious source code and intent would then be public, but a quick “oops, that was stupid, honest mistake” comment on the pull request could be enough to defuse suspicions.

I first noticed this issue while troubleshooting a build problem with libjxl, the JPEG XL codec library, on FreeBSD. The port’s build instructions, as listed on FreshPorts, appeared to fetch a patch from the libjxl project on GitHub:

https://github.com/libjxl/libjxl/commit/adb32f3f8f.patch

To troubleshoot my issue, I wanted to know when the patch had been accepted into the project. However, I couldn’t find it anywhere in the project’s git log. After some further digging, I realized that the port’s maintainer, Jan Beich, had proposed and then withdrawn the change request after a disagreement with a Google bot about the need to sign a Contributor License Agreement (CLA).

Since his fix wasn’t accepted upstream, Beich included the patch in the port’s build instructions instead. Based on the URL, the patch still appears to originate from the upstream project. Technically, it’s an orphaned commit in the project, as it doesn’t belong to any of its branches. Beich has also deleted his fork of the project, so the commit can’t be shown under his username in the URL hierarchy.

To be clear, I’m not saying that Jan Beich did anything malicious or wrong. It’s just another example of how GitHub enables you to publish a patch under a URL hierarchy you shouldn’t have access to.

GitHub will even go out of its way to do the wrong thing. For instance, it redirects https://github.com/libjxl/libjxl/pull/193/commits/adb32f3f8f.patch to https://github.com/libjxl/libjxl/commit/adb32f3f8f.patch. The redirect strips the pull request number, the one part of the URL that hinted the commit came from a proposed change rather than from the project itself. GitHub should instead have enforced the redirect in the other direction. Truly orphaned commits that don’t belong to any branch would still need some special handling, though.

So, how do you know whether a patch is part of a project or not? Just remove the .patch suffix from the URL, and GitHub will show you a webpage with more information. This page may say “This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.” GitHub may instead show you which branches and [release] tags contain the commit.
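If you want to script this check, here’s a minimal Python sketch. It strips the .patch suffix to get the commit page, and, as an alternative to reading that page, asks GitHub’s REST “compare” endpoint whether a commit is reachable from one named branch. Assumptions worth stating loudly: you must supply the branch name yourself (“main” is only a guess for any given project), unauthenticated API requests are rate-limited, the request fails if GitHub can’t resolve the commit, and unlike the web page it checks a single branch and ignores tags. The helper names are hypothetical.

```python
import json
import urllib.request


def commit_page_url(patch_url: str) -> str:
    """Drop the .patch suffix to get GitHub's human-readable commit page."""
    if patch_url.endswith(".patch"):
        return patch_url[: -len(".patch")]
    return patch_url


def is_reachable(compare_status: str) -> bool:
    """Interpret the "status" field of a base...head comparison.

    "identical" or "behind" means head has no commits that base lacks,
    i.e. the commit is already part of the base branch's history.
    "ahead" or "diverged" means it is not.
    """
    return compare_status in ("identical", "behind")


def commit_in_branch(owner: str, repo: str, sha: str, branch: str) -> bool:
    """Ask the GitHub REST API whether `sha` is reachable from `branch`.

    Makes a network request; raises urllib.error.HTTPError on failure
    (for example, if the commit can't be resolved or you're rate-limited).
    """
    api = f"https://api.github.com/repos/{owner}/{repo}/compare/{branch}...{sha}"
    req = urllib.request.Request(api, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return is_reachable(data["status"])


# Example (performs a network request, so it is left commented out;
# "main" is an assumed branch name):
# commit_in_branch("libjxl", "libjxl", "adb32f3f8f", "main")
```

Reading the commit page remains the more complete check, since it covers every branch and tag at once.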

GitHub’s .patch files don’t carry any commentary beyond the git commit itself. The format would allow it: free-form notes placed between the --- separator line and the diff are left out of the commit message when the patch is applied with git am. This could have been a neat way for GitHub to include a warning message for orphaned or otherwise untrusted commits. GitHub tracks this information, as evidenced by what it shows on its website; it just doesn’t communicate it to users in the patch files themselves. That leaves the URL hierarchy, where GitHub fails to adequately communicate that a patch isn’t part of the project, despite what the URL suggests.
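To see what a patch file actually tells you, here’s a minimal Python sketch that reads the mail-style header produced by git format-patch, the layout GitHub’s .patch view also uses: the commit SHA on the leading “From” line, followed by author, date, and subject. The sample patch text and the patch_metadata helper are made up for illustration, and the parser deliberately ignores folded (multi-line) headers.

```python
from typing import Dict

# A made-up patch header in git format-patch layout.
SAMPLE_PATCH = """\
From 0123456789abcdef0123456789abcdef01234567 Mon Sep 17 00:00:00 2001
From: Example Author <author@example.com>
Date: Mon, 1 Jan 2024 12:00:00 +0000
Subject: [PATCH] Fix a hypothetical build error

---
 Rakefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
"""


def patch_metadata(patch_text: str) -> Dict[str, str]:
    """Collect the commit SHA and mail-style headers from a format-patch file."""
    lines = patch_text.splitlines()
    if not lines or not lines[0].startswith("From "):
        raise ValueError("missing the leading 'From <sha>' line")
    meta = {"sha": lines[0].split()[1]}
    for line in lines[1:]:
        if not line:  # a blank line ends the header block
            break
        # Skip continuation lines of folded headers; keep "Key: value" lines.
        if ":" in line and not line[0].isspace():
            key, _, value = line.partition(":")
            meta[key.strip().lower()] = value.strip()
    return meta


print(patch_metadata(SAMPLE_PATCH))
```

None of the extracted fields names a repository or branch, so the only origin signal a reader gets is the URL the patch was fetched from.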