Software vendors are no longer content with tracking people as they move around a website and make up their minds about whether to download the software. A new ‘super-cookie’ technique inserts a unique tracking code into the software you download, which is then installed on your computer as a persistent identifier along with the software.
Just about every company keeps an eye on what visitors do on its website. It may run experiments, such as making the download button red instead of green or changing the page’s background image, to see which variant convinces more people to download its software or buy its products. People can to some extent opt out of this type of monitoring by making changes to their web browsers.
How unique data is embedded in downloaded executables
Code signing is a way of cryptographically verifying that the software you download hasn’t been modified or tampered with. Unsigned or modified software triggers a scary warning from your operating system, which either warns against or outright blocks you from using the software.
However, you’ve likely downloaded software from the official website of the software vendor that has been tampered with after it was signed but without breaking the signature seal. This defies the myth that says code-signed programs can’t be modified after the fact.
The code-signature verification systems in macOS and Windows have a few loopholes that are actively used for tracking and fingerprinting downloaded software installers. Developers can embed arbitrary data before or after the code signature certificates on Windows, or embed data in extended file system attributes on macOS.
Think of it as someone opening the sanitation seal on a hygiene or grocery product, inserting something into the packet, and then sealing it back up without damaging the reassuring seal.
There is little to no transparency regarding this type of tracking. You have to carefully investigate each bit of software you download to identify whether the software vendor has embedded unique tracking identifiers in the software you’ve downloaded.
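As a starting point for that kind of investigation on Windows, here is a minimal Python sketch that locates the certificate table in a signed executable and returns any bytes that follow the signature structure inside the first entry. It covers only one of the possible hiding places (data appended after the signature) and will not notice data placed inside the signature structure itself. The function names are my own, and a few trailing zero bytes are normal alignment padding rather than tracking data.

```python
import struct

def find_certificate_table(data: bytes):
    """Return (file_offset, size) of the PE certificate table, or None if absent."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ header)")
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file (missing PE signature)")
    opt_offset = pe_offset + 24  # 4-byte signature + 20-byte COFF header
    magic = struct.unpack_from("<H", data, opt_offset)[0]
    if magic == 0x10B:    # PE32
        dirs_offset = opt_offset + 96
    elif magic == 0x20B:  # PE32+
        dirs_offset = opt_offset + 112
    else:
        raise ValueError("unknown optional header magic")
    dir_count = struct.unpack_from("<I", data, dirs_offset - 4)[0]
    if dir_count < 5:
        return None
    # Data directory index 4 is the certificate table. Unlike the other
    # entries, its first field is a plain file offset, not an RVA.
    offset, size = struct.unpack_from("<II", data, dirs_offset + 4 * 8)
    if offset == 0 or size == 0:
        return None
    return offset, size

def der_total_length(blob: bytes) -> int:
    """Length of the DER element starting at blob[0], header included."""
    first = blob[1]
    if first < 0x80:  # short form: the byte is the content length
        return 2 + first
    n = first & 0x7F  # long form: the next n bytes hold the content length
    return 2 + n + int.from_bytes(blob[2:2 + n], "big")

def extra_bytes_in_certificate(data: bytes) -> bytes:
    """Bytes in the first certificate entry that follow the signature structure."""
    table = find_certificate_table(data)
    if table is None:
        return b""
    offset, _size = table
    # WIN_CERTIFICATE header: dwLength, wRevision, wCertificateType
    length, _revision, _cert_type = struct.unpack_from("<IHH", data, offset)
    blob = data[offset + 8:offset + length]
    return blob[der_total_length(blob):]
```

You could call `extra_bytes_in_certificate(open(path, "rb").read()).rstrip(b"\0")` on a downloaded installer. Anything left over after stripping the zero padding deserves a closer look.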
The types of data that are being embedded
When you visit a software vendor’s website, they’ll assign your web browser a unique identifier. This identifier is included with every request from your web browser to the server, including the software download server. The identifier is either embedded verbatim in the download, or you may be assigned another unique identifier (the download server is then able to map the two identifiers to each other).
Other data includes the web browser’s name, make, and version (the User-Agent string); marketing campaign data (Google Analytics/UTM parameters); and the user’s IP address. The latter appears most common with trialware and shareware. However, I’d not be surprised to see IP addresses, and maybe even customer numbers or similar, embedded as a fingerprint inside downloads of more expensive premium software.
In most cases the data is embedded as plain text, but some vendors try to hide and obfuscate the nature of the data. Changing the User-Agent, request cookies, URL parameters, or the IP address used to download the software does, however, reveal that the additional data changes based on these parameters.
Alternative approach: unique file name
Embedding tracking codes inside downloads requires quite a bit of highly specialized knowledge and a purpose-built download delivery infrastructure.
Some software vendors have instead settled on serving people files where the downloaded file name itself contains the tracking information. The software installer can then read the desired data out of its own file name, and report it back to the software company or store it persistently on people’s computers.
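To illustrate how little work this takes on the installer’s side, here is a minimal Python sketch that pulls key/value tags out of a file name. The naming scheme (`__` between tags, `-` between key and value) and the function name are made up for this example; real vendors each use their own formats.

```python
from pathlib import Path

def tags_from_file_name(path: str) -> dict:
    """Extract key/value tags from a file name such as
    'ExampleInstaller__src-newsletter__id-48213.exe' (hypothetical scheme)."""
    stem = Path(path).stem          # file name without the extension
    tags = {}
    for part in stem.split("__")[1:]:  # skip the product name itself
        key, sep, value = part.partition("-")
        if sep:                     # ignore parts that are not key-value pairs
            tags[key] = value
    return tags

# An installer would pass the path of its own executable here.
print(tags_from_file_name("ExampleInstaller__src-newsletter__id-48213.exe"))
```

Renaming the file before running it is enough to defeat this sketch, which is also true of the real thing as long as the installer has no other copy of the data.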
Which applications are doing this?
You can detect this type of tracking by downloading the same software from different web browsers from different IP addresses. The same version of the same software should be identical. If they’re not, then you can look at the differences between the two downloads.
It can be tricky to detect, though, as some vendors only embed this data if you arrived at their website by clicking an advertisement before downloading their software.
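The comparison of two downloads can be sketched in a few lines of Python. This is a simple illustration with function names of my own choosing: it checks the SHA-256 hashes first, then lists the byte ranges that differ. It compares byte by byte at equal offsets, so it is only useful when the embedded data has the same length in both files; otherwise everything after the first difference shows up as changed.

```python
import hashlib

def differing_ranges(a: bytes, b: bytes) -> list:
    """Return (start, end) byte ranges where a and b differ at equal offsets."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i            # a new differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # the tail that exists in only one of the two files
        ranges.append((common, max(len(a), len(b))))
    return ranges

def compare_downloads(path_a: str, path_b: str) -> list:
    """Compare two downloaded files; an empty list means they are identical."""
    with open(path_a, "rb") as f:
        a = f.read()
    with open(path_b, "rb") as f:
        b = f.read()
    if hashlib.sha256(a).digest() == hashlib.sha256(b).digest():
        return []
    return differing_ranges(a, b)
```

Printing the bytes in each reported range from both files usually makes it obvious whether you are looking at an identifier, a timestamp, or campaign data.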
Here are some examples of popular applications and what data they include with their installers:
- Avast and AVG antivirus include marketing campaign data in the downloaded file name, but no unique identifiers.
- Avira antivirus includes marketing campaign data and a unique identifier (an incremental counter) in the downloaded file name.
- Backblaze backup includes partner referral codes in the downloaded file names to accredit sale leads.
- Brave browser includes partner referral codes in the downloaded file names to accredit sale leads.
- Google Chrome embeds a unique identifier, the name of the browser you used to download Chrome, your device language, whether you’ve opted out of data collection using a checkbox on their website, and marketing data including which Google website or advertisement you clicked to download Chrome.
- Mozilla Firefox doesn’t embed unique identifiers, but it does record whether you downloaded Firefox from Firefox.com, Mozilla.org, or elsewhere. Mozilla lets you opt out by enabling the Do-Not-Track preference in your web browser.
- Opera is believed to embed similar data to Chrome and Yandex, but this is hard to verify exactly as they obfuscate the data.
- WinZip doesn’t embed unique identifiers, but they do embed marketing campaign data. Notably, WinZip appears to have prepared a few different variants of their software and code-signed each of them individually.
- Yandex.browser embeds two different unique identifiers, the name of the browser you used to download their browser (which they use to copy bookmarks and data from your preferred browser on install), and whether you’ve opted out of data collection using a checkbox on their website (the preference appears to be non-functioning at the time of publication).
You can tell from the above list that web browsers track the most data about their users. You can also see that all of them, excluding WinZip, are freeware or freemium software that either relies on advertising revenue or has a large advertising budget of its own.
Continue reading Part 2: How to limit tracking in software downloads.