How I archived 26 500 old Ask Fedora community Q&As

Ask Fedora is the Fedora Linux community’s questions-and-answers portal, and it recently transitioned from forum software called Askbot to Discourse. Changing the underlying forum software doesn’t have to be destructive, but Ask Fedora went with a nuke-and-pave migration strategy: they decided to start from scratch instead of copying user accounts and user-contributed content to the new software.

I first learned of the migration a few days after it had happened. I’d run into an issue with my Fedora installation and went online looking for solutions. Every useful search result was from the old Ask Fedora site, and every link returned an HTTP 404 Not Found error because those answers hadn’t been migrated to the new Ask Fedora website. None of the pages I was looking for were available in the Internet Archive either.

The Fedora Linux community has lost 26 500 questions, and the community discussions surrounding those questions, with Ask Fedora’s software migration. I don’t have any numbers for how many answers and comments each question had received, but I’d guesstimate an average of more than one answer per question. 25 000 of the questions were in English, 680 in Spanish, and the rest were split between Portuguese, Russian, and a couple of other languages.

Askbot had its issues, especially when it came to performance, and I’m sure there were plenty of good reasons to migrate to Discourse. That still doesn’t excuse not preserving all the work that people had put into answering questions on the old Ask Fedora platform over the years.

Ask Fedora is run and maintained by volunteers, and it was other volunteers still who answered the questions posted by all the novice and experienced Fedora users who needed help over the years. Many top contributors had amassed thousands of “internet points” and recognition on the old Askbot platform. I’m sure deleting all their hard work and asking them to start from scratch must have pushed at least some of them away from contributing on the new Ask Fedora website.

Whenever you upload or write anything on someone else’s platform, the fate of that work is left up to the operator of the platform. This shouldn’t come as a surprise to anyone, least of all me, but I’m still disappointed to see all the time, effort, and answers people had put into the site just vanish overnight. As you might be able to tell, I’m not happy about seeing my 140 answers and 1400 karma points being discarded.

However, the old Ask Fedora website does still exist! There’s a read-only archived clone of it on askbot.fedoraproject.org. This doesn’t help much, as you have to know to go to that domain manually (more on that later) instead of ask.fedoraproject.org to find the questions and answers from the old site. The old URLs will probably remain in search results for months to come, and references to them will remain throughout the web and in people’s bookmarks forever.

The old Ask Fedora website explicitly licensed all user-submitted content under the permissive Creative Commons Attribution-ShareAlike 3.0 License. That licensing scheme allows anyone to copy, distribute, and preserve every discussion on the platform in its entirety.

I jumped on the opportunity and submitted all 26 500 questions and answers from the old Ask Fedora website (using the askbot.fedoraproject.org domain) for archival in the Internet Archive’s Wayback Machine. The whole process only took my computer a couple of hours and the archival process is easy to set up.
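For anyone wanting to try something similar, here is a minimal Python sketch of how such a bulk submission could look. It is not the exact script I used. It assumes that questions live at `/<lang>/question/<id>/` on the archive domain, and it relies on the Wayback Machine’s Save Page Now endpoint at `https://web.archive.org/save/`. The function names, the User-Agent string, and the delay are made up for illustration.

```python
import time
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"
ARCHIVE_HOST = "https://askbot.fedoraproject.org"  # read-only clone of the old site


def question_urls(ids, lang="en"):
    """Build question URLs. Assumption: the /<lang>/question/<id>/ layout."""
    return [f"{ARCHIVE_HOST}/{lang}/question/{qid}/" for qid in ids]


def save_url(page_url):
    """Return the Save Page Now request URL for a single page."""
    return SAVE_ENDPOINT + page_url


def submit_all(urls, delay_seconds=5.0):
    """Ask the Wayback Machine to capture each URL, pausing between requests."""
    results = {}
    for url in urls:
        request = urllib.request.Request(
            save_url(url),
            headers={"User-Agent": "ask-fedora-archiver-sketch"},  # hypothetical
        )
        try:
            with urllib.request.urlopen(request, timeout=120) as response:
                results[url] = response.status
        except OSError as error:  # covers HTTP errors, URL errors, and timeouts
            results[url] = str(error)
        time.sleep(delay_seconds)  # be polite to the archive
    return results
```

Calling `submit_all(question_urls(range(1, 101)))` would request captures of the first hundred question IDs. The archive may throttle or reject requests that arrive too quickly, so a generous delay is the safer choice.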

You can access the archived version in the Wayback Machine. Please note that retrieval can take a minute or two after the page has loaded, and you don’t get a lot of feedback while it’s loading.

I wish someone had done that before Askbot was moved to a new domain so that the old URLs could have been preserved. At least it should now be possible to dig up all the old questions and answers, provided you know that they can be accessed from a new location. (Which is a big ask without any redirects.)

This archival trick isn’t going to work again in the future. Discourse pages are too dependent on JavaScript for the Internet Archive to properly archive their contents. Many other JavaScript-heavy websites manage to produce HTML versions of their pages for bots and archiving, but this has been a known issue with Discourse for years. The new Ask Fedora portal also doesn’t explicitly license new user contributions under a permissive Creative Commons license.

I understand that migrating content from one platform to another can be difficult and time-consuming. However, caring for the old URL design and redirecting them to an off-site archive isn’t a huge task. I couldn’t find any URL collisions between the old and new URL designs, so setting up redirects should have been a trivial task.
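To illustrate why the redirects look simple, here is a rough Python sketch of the mapping logic. It assumes old question paths look like `/en/question/12345/some-slug/` (with an optional language prefix) and that Discourse topics live under `/t/`. Both patterns and all names below are illustrative assumptions, not the configuration of either site.

```python
import re
from typing import Optional

# Assumption: old Askbot question paths, with an optional language prefix.
OLD_QUESTION = re.compile(r"^/(?:[a-z]{2}(?:-[a-z]{2})?/)?question/(\d+)(?:/.*)?$")
ARCHIVE_BASE = "https://askbot.fedoraproject.org"


def redirect_target(path: str) -> Optional[str]:
    """Return the archive URL for an old-style question path, else None."""
    if OLD_QUESTION.match(path):
        return ARCHIVE_BASE + path
    return None  # not an old question URL; let the new site handle it
```

Under these assumptions, `redirect_target("/en/question/12345/some-slug/")` points at the same path on the archive domain, while a Discourse-style path such as `/t/some-topic/678` is left alone. Because the two path shapes never overlap, a single rewrite rule on the web server would cover every old question.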

Unfortunately, Ask Fedora isn’t hosted on Fedora’s own infrastructure and is instead hosted and operated by Discourse. Discourse doesn’t offer good enough tools to forward the old URLs to either the archived version at askbot.fedoraproject.org or the Internet Archive’s copy. Losing the ability to deploy something as routine as a couple of redirects is a huge argument against outsourcing in my book.