I’ve occasionally needed a list of the web’s top websites and have always defaulted to the Alexa Top 1 Million list. However, I recently stumbled upon a better alternative, the Tranco list, which combines multiple data sources to produce a more accurate picture of the web’s most popular websites.
The Tranco ranking service grew out of a research paper that examined how easily the Alexa Internet rankings can be manipulated and how that could be used to influence research that relies on them.
Two findings in the paper stood out to me. First, almost half of the Alexa top one million list changes every single day. Second, over a nine-month period, the average daily intersection of the four popularity lists the paper compares was just shy of 2.5%, while the highest intersection of three of the lists was just above 2%. This makes it clear that the different data collection methods give vastly different views of which websites are popular on any given day.
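To make those two measurements concrete, here is a minimal sketch of how list overlap and daily churn could be computed. The function names, the toy domains, and the choice of denominator (the size of the longest list) are my own assumptions for illustration; the paper may define its percentages differently.

```python
def intersection_share(*ranked_lists):
    """Share of domains that appear in every list, relative to the longest list."""
    sets = [set(lst) for lst in ranked_lists]
    common = set.intersection(*sets)
    return len(common) / max(len(s) for s in sets)


def daily_churn(yesterday, today):
    """Fraction of today's list that was not on yesterday's list."""
    today_set = set(today)
    return len(today_set - set(yesterday)) / len(today_set)


# Toy data standing in for four top lists on the same day (hypothetical domains).
list_a = ["a.example", "b.example", "c.example", "d.example"]
list_b = ["a.example", "e.example", "f.example", "b.example"]
list_c = ["a.example", "g.example", "h.example", "c.example"]
list_d = ["a.example", "b.example", "i.example", "j.example"]

print(intersection_share(list_a, list_b, list_c, list_d))  # all four lists
print(intersection_share(list_a, list_b, list_d))          # three of the lists
print(daily_churn(list_a, list_b))                         # day-over-day change
```

Run against real snapshots, the same two functions would show how little the lists agree and how quickly a single list turns over.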
The Tranco list sources data from multiple providers and averages rankings over a thirty-day period instead of relying on a single day’s snapshot (rankings change daily).
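A rough sketch of what combining several ranked snapshots could look like follows. I am using a Dowdall-style score here (each appearance contributes 1/rank), which is one common way to merge rankings; treat it as an assumption for illustration rather than a description of Tranco's exact implementation. The function name and sample domains are hypothetical.

```python
from collections import defaultdict


def combine_rankings(snapshots, top_n=None):
    """Merge ranked lists (best first) into a single ranking.

    `snapshots` is any iterable of lists, e.g. one list per provider per day.
    Each appearance of a domain adds 1/rank to its score (Dowdall-style).
    """
    scores = defaultdict(float)
    for ranked in snapshots:
        for position, domain in enumerate(ranked, start=1):
            scores[domain] += 1.0 / position
    # Highest score first; ties are broken alphabetically for stable output.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered if top_n is None else ordered[:top_n]


# Three toy snapshots, e.g. different providers or different days.
snapshots = [
    ["a.example", "b.example", "c.example"],
    ["b.example", "a.example", "d.example"],
    ["a.example", "c.example", "b.example"],
]
print(combine_rankings(snapshots))
```

A domain that ranks consistently well across providers and days ends up ahead of one that spikes on a single list, which is the point of averaging over a longer window.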
Here is a quick overview of the sources used by Tranco:
- Alexa Internet Top 1 Million
Collects data from users who’ve installed the Alexa browser extension. The extension is only available for desktop browsers, with approximately 536,000 installs on Chrome and an unknown number of installations on Firefox.
- Cisco Umbrella Popularity List
Collects data from Cisco DNS services (including Umbrella for enterprise and internet service providers and OpenDNS). Includes popular services that aren’t websites and tracks all networked devices, not just desktop computers.
- Majestic Million
Domain in-link popularity ranking based on the number of unique IP subnets that link to a domain (as opposed to the number of unique linking domain names).
- Quantcast Top Sites
Collects data from end-users through tracking pixels embedded on websites; estimates data for other popular websites. Their tracking pixel is a paid service skewed towards websites in the United States.
The default list from Tranco is capped at one million domains, but you can optionally download their entire dataset of all domains seen on any of the lists in the last month (almost 7.5 million domains at the time of writing).
Tranco also offers daily average sorted versions of each of the lists they source. You can generate a ranked list with custom settings and optionally include subdomains (sourced from Cisco Umbrella), or filter out known malware and phishing websites using Google Safe Browsing.
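Once you have a list downloaded, working with it takes only a few lines. The sketch below assumes the file is a headerless CSV with one `rank,domain` pair per line, and it uses a small in-memory sample instead of a real download. The helper names and the local blocklist are hypothetical; the blocklist only stands in for whatever filtering you apply yourself and is not the Google Safe Browsing integration.

```python
import csv
import io


def load_ranked_csv(handle, limit=None):
    """Read (rank, domain) tuples from a headerless 'rank,domain' CSV."""
    entries = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        entries.append((int(row[0]), row[1].strip().lower()))
        if limit is not None and len(entries) >= limit:
            break
    return entries


def exclude_domains(entries, blocklist):
    """Drop blocked domains and renumber the remaining ranks from 1."""
    kept = [domain for _, domain in entries if domain not in blocklist]
    return list(enumerate(kept, start=1))


# In-memory sample; with a real file, pass open(path, newline="") instead.
sample = io.StringIO("1,example.com\n2,example.org\n3,bad.example\n4,example.net\n")
ranked = load_ranked_csv(sample)
cleaned = exclude_domains(ranked, blocklist={"bad.example"})
print(cleaned)
```

Passing `limit=1000` to `load_ranked_csv` stops reading early, which is handy when you only need the head of a million-line file.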