Ubuntu has begun tracking more non-personalized and anonymous usage data to enable better data-driven decisions for the future development of the popular Linux distribution. As a Fedora user myself, I want Fedora to begin collecting the same type of data, as I believe it would benefit Fedora in the long term.
Debian-derived Linux distributions, including Ubuntu, have had an optional tool called popularity-contest, or “popcon”, for years. Popcon collects data about installed packages and programs from participating systems and sends it back to the distribution, so that the distribution knows which software packages are the most frequently installed and used.
Participation in popcon has always been something people had to opt in to through the command line, which has skewed reporting towards more technical users. Ubuntu 18.04 now asks everyone to participate in popcon when they log in to their computer for the first time after installing Ubuntu. This should greatly improve popcon participation and increase the accuracy and usefulness of the dataset.
Ubuntu 18.04 has also begun collecting some generic hardware information such as the processor architecture, the amount of installed memory, the screen resolution, and a few other parameters.
Popcon allows distribution maintainers and the whole open-source community to get a sense of the popularity of different packages so that they may prioritize and focus their resources accordingly. The data can also be useful to end users, who can get software recommendations based on what other users of the distribution install.
Debian and Ubuntu publish aggregated usage data to the public on popcon.debian.org and popcon.ubuntu.com respectively. Notably, popcon only works for packages installed via the default Debian package system (dpkg/apt) and excludes programs installed via alternative package systems such as Flatpak, Snap, Python pip, and others.
Fedora, on the other hand, doesn’t collect any usage data whatsoever, and the project operates on gut feelings and intuition. Fedora is my preferred Linux distribution, and I think the project is doing a superb job even so.
(Despite reports to the contrary, Fedora’s internet connectivity testing is only used to check whether your network connection is working, not for data collection.)
However, I believe that Fedora could benefit from some limited out-of-the-box data collection, such as the Fedora version and edition, which packages are installed, and how often they’re used. Not only would this help establish, for the first time, how many computers are actually running Fedora, but it would also help identify areas that Fedora should work on improving.
Here are some examples off the top of my head for how some basic usage data collection would help answer important questions for the Fedora project:
- Major version updates. Fedora releases major versions two times per year. However, how long do people normally stay on one version before updating to the next?
- Security updates. How quickly are critical security updates installed? Is the Fedora community for the most part patched and immunized within a few days? Does it take weeks? Or aren’t critical updates being installed at all?
- Broken packages. Rapid changes in the number of installs or in the time since last use of popular software can help identify updates that caused problems.
- Retention rate. Do people keep using Fedora for a long time or do they stop after a week? Do people quickly abandon Cloud edition but stick with Server edition for years and years? Are people more likely to keep using Fedora with the Plasma desktop compared to the GNOME desktop? Is the retention rate lower among people who use NVIDIA drivers/hardware than among those who use other graphics cards?
- Default software selection. Do people uninstall or not use software installed by default? Which alternative do they install and use instead?
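To make the “broken packages” idea a little more concrete, here is a minimal sketch of how a sudden drop in installs could be flagged from two aggregated snapshots. Everything in it is an assumption for illustration: the function name `flag_sudden_drops`, the 25% threshold, the package names, and the counts are all made up, and no such Fedora dataset exists today.

```python
def flag_sudden_drops(previous, current, max_drop=0.25):
    """Return (package, drop) pairs where installs fell by more than max_drop.

    `previous` and `current` map package names to install counts from two
    consecutive reporting periods. `max_drop` is a fraction (0.25 = 25%).
    """
    flagged = []
    for package, before in previous.items():
        if before <= 0:
            continue  # nothing to compare against
        after = current.get(package, 0)  # a missing package counts as zero installs
        drop = (before - after) / before
        if drop > max_drop:
            flagged.append((package, round(drop, 2)))
    # Largest drops first, so the most suspicious packages are at the top.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical weekly install counts; the numbers are invented.
last_week = {"firefox": 1000, "libreoffice": 800, "example-editor": 400}
this_week = {"firefox": 990, "libreoffice": 790, "example-editor": 200}

drops = flag_sudden_drops(last_week, this_week)
print(drops)
```

A real system would need to account for seasonal variation and for packages that are intentionally replaced or retired, so a flag like this would only be a hint that something deserves a closer look.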
Getting answers to the above questions could help guide Fedora development and decisions to benefit the greatest number of users and enable more people to get a great user experience with Fedora. I’m not advocating letting the data do all the thinking, but I do believe it could be a useful factor in determining priorities in many areas.
I’d also expect to see an uptick in interest in Fedora from upstream projects once they can see that there are actually people using their software on Fedora. Knowing that people are actually using your product can work as a sort of ego boost and help encourage support, development, and testing on Fedora.
Fedora could either build on popcon or, better yet, develop an alternative with a stronger guarantee of privacy to help reassure the more paranoid among its user base. A very small number of software packages, such as health-related packages, virtual private networking (VPN) software, and the anonymizing Tor proxy network, might be considered sensitive in some countries or contexts. These packages could be excluded to remove any fears about leaking potentially compromising or personal data.
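As a rough sketch of what excluding sensitive packages could look like, the filtering can happen on the user’s own computer before anything is sent. The names below (`SENSITIVE_PACKAGES`, `build_report`), the contents of the exclusion list, and the report layout are all hypothetical; this is not how popcon or any existing Fedora tool works.

```python
# Hypothetical exclusion list; a real one would be maintained by the project.
SENSITIVE_PACKAGES = frozenset({"tor", "openvpn"})


def build_report(installed, release, edition, excluded=SENSITIVE_PACKAGES):
    """Build an anonymous report with sensitive packages removed.

    Only the release, the edition, and a sorted list of non-sensitive
    package names are included; nothing identifies the user or machine.
    """
    packages = sorted(name for name in installed if name not in excluded)
    return {"release": release, "edition": edition, "packages": packages}


# Example input; package names and release number are illustrative only.
report = build_report(
    ["tor", "firefox", "openvpn", "gimp"],
    release="28",
    edition="Workstation",
)
print(report)
```

Filtering on the client means the sensitive package names never leave the machine, which is a stronger guarantee than trusting the server to discard them afterwards.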
Ubuntu will soon have a much better understanding of the overall software trends on Linux than other Linux distributions do. I believe Fedora and other distributions should also begin collecting some limited and anonymous information to at least count the total number of users and which version of the distribution they’re using.
This type of data isn’t about associating data with people, creating an online profile of individuals, or collecting personal data like Google Android and Windows 10 do. It’s about strengthening open source by building a better understanding of what software is actually being installed and used. It also gives individuals an effortless way to contribute something back to their preferred Linux distribution.