Ubuntu has begun tracking more non-personalized and anonymous usage data to enable better data-driven decisions for the future development of the popular Linux distribution. As a Fedora Linux user myself, I want Fedora Linux to begin collecting the same type of data, as I believe it would benefit Fedora Linux in the long term.
Debian-derived Linux distributions, including Ubuntu, have had an optional tool called popularity-contest, or “popcon”, for years. Popcon collects data about the most frequently installed and used packages and programs from participating systems and sends it back to the distribution so that its maintainers know which software packages are the most frequently installed and used.
Participation in popcon has always been something people had to opt in to through the command line, which has skewed reporting towards more technical users. Ubuntu 18.04 now asks everyone to participate in popcon when they log in to their computer for the first time after installing Ubuntu. This will greatly improve popcon participation and increase the accuracy and usefulness of the dataset.
Ubuntu 18.04 has also begun collecting some generic hardware information such as the processor architecture, the amount of installed memory, the screen resolution, and a few other parameters.
Popcon allows distribution maintainers and the whole open-source community to get a sense of the popularity of different packages so that they may prioritize and focus their resources accordingly. The data can also be useful to end-users who can get software recommendations from other distribution users.
Debian and Ubuntu publish aggregated usage data to the public on popcon.debian.org and popcon.ubuntu.com respectively. Notably, popcon only works for packages installed via the default Debian package system (apt) and excludes programs installed via alternative package systems such as Flatpak, Snap, Python pip, and others.
Update (): Canonical has announced it will remove popcon from the default Ubuntu installation image. The system had fallen into disrepair and didn’t see much use.
Fedora Linux, on the other hand, doesn’t collect any data whatsoever, and the project operates on gut feelings and intuition. Fedora Linux is my preferred Linux distribution and I think they’re doing a superb job without that data.
(Despite reports to the contrary, internet connectivity testing is only used to check if your network connection is working and not for data collection.)
However, I believe that Fedora Linux could benefit from some limited data collection out of the box, such as the Fedora Linux version and edition, which packages are installed, and how often they’re used. Not only would this help establish, for the first time, how many computers are running Fedora Linux, but it would also help identify areas that Fedora Linux should work on improving.
Here are some examples off the top of my head for how some basic usage data collection would help answer important questions for the Fedora Linux project:
- Major version updates. Fedora Linux releases major versions two times per year. However, how long do people normally stay on one version before updating to the next?
- Security updates. How quickly are critical security updates installed? Is the Fedora Linux community for the most part patched and immunized within a few days? Does it take weeks? Or aren’t critical updates being installed at all?
- Broken packages. Rapid changes in the number of installs or in the time since last use of popular software can help identify updates that caused problems.
- Retention rate. Do people keep using Fedora Linux for a long time or do they stop after a week? Do people quickly abandon the Cloud edition but stick with the Server edition for years and years? Are people more likely to keep using Fedora Linux with the Plasma desktop compared to the GNOME desktop? Is the retention rate lower among people who use NVIDIA drivers/hardware than among those with other graphics cards?
- Default software selection. Do people uninstall or not use software installed by default? Which alternative do they install and use instead?
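To make the first of these questions a little more concrete, here is a minimal Python sketch of how the share of systems on each release could be worked out from anonymous reports. The report format (one release number per reporting system) and the `release_share` function are assumptions made up for this illustration; Fedora Linux collects no such data today.

```python
from collections import Counter


def release_share(reports):
    """Map each release to its fraction of all reporting systems."""
    counts = Counter(reports)
    total = sum(counts.values())
    return {release: count / total for release, count in counts.items()}


# Made-up sample: four systems each reporting only their release number.
shares = release_share(["27", "28", "28", "28"])
print(shares)
```

Tracking how these shares shift in the weeks after a new release would show how long people typically stay on an older version.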
Getting answers to the above questions could help guide Fedora Linux development and decisions to benefit the greatest number of users and enable more people to get a great user experience with Fedora Linux. I’m not advocating letting the data do all the thinking, but I do believe it could be a useful factor in determining priorities in many areas.
I’d also expect to see an uptick in interest in Fedora Linux from upstream projects once they can see that there are people using their software on Fedora Linux. Knowing that there are people using your product can work as a sort of ego boost and help encourage support, development, and testing on Fedora Linux.
Fedora Linux could either build on popcon or, better yet, develop a better alternative with a stronger guarantee of privacy to help alleviate the concerns of the more paranoid among its user base. A very small number of software packages, such as health-related packages, virtual private networking (VPN) software, and the anonymizing Tor proxy network, might be considered sensitive in some countries or contexts. These packages could be excluded to remove any fears about leaking potentially compromising or personal data.
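As a rough illustration of the exclusion idea, here is a minimal Python sketch. Everything in it is hypothetical: the `SENSITIVE_PACKAGES` set, the `build_report` function, and the report layout are invented for this example and do not describe popcon or any existing Fedora Linux tool.

```python
# Hypothetical sketch: drop sensitive packages before a usage report leaves the machine.
# The package list and the report layout are illustrative assumptions only.

SENSITIVE_PACKAGES = {"tor", "openvpn", "wireguard-tools"}


def build_report(installed, release, edition):
    """Return an anonymous report containing only non-sensitive package names."""
    reported = sorted(name for name in installed if name not in SENSITIVE_PACKAGES)
    return {"release": release, "edition": edition, "packages": reported}


report = build_report(["firefox", "tor", "vim-enhanced"], release="28", edition="Workstation")
print(report["packages"])
```

Filtering on the user’s own machine, before anything is sent, means the sensitive package names never reach the distribution’s servers in the first place.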
Ubuntu will soon have a much better understanding about the overall software trends on Linux than other Linux distributions do. I believe Fedora Linux and other distributions also should begin collecting some limited and anonymous information to at least count the total number of users and what version of the distribution they’re using.
This type of data isn’t about associating data with people, creating an online profile about individuals, or collecting personal data like Google Android and Windows 10 do. It’s about strengthening open-source by building a better understanding of what software is being installed and used. It also gives individuals an effortless way to contribute something back to their preferred Linux distribution.