Pressing Out the Words

I’ve been using WordPress on my blog for so long, I don’t even really remember when I converted to using it from my homegrown system. My post archives suggest it happened some time in 2010, so that’s a long time to be on a platform I ultimately disliked. Heck, I hated the Gutenberg block writing system so much I followed a guide to disable it. And then I installed an actual plugin to disable it permanently.

Upgrade Treadmill

The recent purchase of Linode by Akamai had me reevaluating my entire constellation of remotely hosted software. Just in case they start significantly altering the excellent service Linode provides, it’s good to have a backup option. What probably annoyed me the most about maintaining a VPS, though, was the upgrade treadmill.

I was already getting annoyed with constantly updating WordPress and all of my plugins, and fixing SuperCache every time the upgrade broke caching for some reason. But that was on top of updating the operating system and all of the other software I was using. Ubuntu 18.04 was getting too old for the version of PHP WordPress wanted to use, so I had to upgrade it. That broke my Django site because Django is the master of deprecating literally everything. It also caused my Postfix + Dovecot combo to fall over. It was just an endless litany of shattered dreams.

My Django site is still inoperable. The entire ordeal was just hosting my serial novel that I finished 10 years ago, so there’s no rush. Still, Django and their constant shenanigans caused me to strongly consider replacing it. So that made two site engines I wanted to ditch. What to use instead? I’d recently joined a community called THAT with the goal to “foster a community of geeks who want to help one another be awesome.” After entering their Slack server, I asked what someone might suggest in my situation.

Meet the New Guy

This is how I learned about Hugo. Rather than relying on PHP, Python, Ruby, or some other language to produce a page when viewed by a user, it just generates every page all at once. The end product is an entirely static site that has no overhead, no need for caches to reduce load times, and nothing else a dynamic system requires. Since it uses Markdown for the content, it already matched my writing format. All that remained was finding some system to extract everything from WordPress in bulk.

I know enough about databases and templates to write one of these myself, but I figured there was at least one floating around out there already. The Hugo documentation lists four of them, and I tried them all. None were quite what I wanted. The wordpress-to-hugo-exporter WordPress plugin was the most convenient, but it forced everything through the rendering system, so rather than my Markdown, it was getting the HTML output WordPress would display. It did its best to convert everything back to Markdown, but it’s like using Google Translate on the same text several times; some of the text or code blocks became unrecognizable gibberish.

Next came blog2md. It uses an XML export from WordPress itself, which worried me initially. But I examined the export and it actually preserves the Markdown source, meaning this would likely come closer than wordpress-to-hugo-exporter. The problem with this tool is that it wasn’t expecting Markdown, so it escaped everything with a backslash. There’s no combination of regular expressions to fix that, so I just threw it out.

My last option was exitwp. It also leverages the WordPress export, and got almost everything right. The primary mistake was that it uses post_date_gmt rather than post_date, and this particular field is frequently blank or filled with nonsense like 0000-00-00. A quick tweak of the Python code fixed that, leaving me with a single directory filled with several hundred Markdown files with headings suitable for Hugo.
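For illustration, the date fix could look something like the Python sketch below. This is not the actual exitwp code: the function name pick_post_date, the set of placeholder values, and the assumption that dates arrive as "YYYY-MM-DD HH:MM:SS" strings are all mine. It prefers post_date and only falls back to post_date_gmt when post_date is unusable.

```python
from datetime import datetime

# Values treated as "no usable date" (an assumed list; extend as needed).
BLANK_DATES = {"", "0000-00-00", "0000-00-00 00:00:00"}


def pick_post_date(post_date, post_date_gmt):
    """Return a datetime from post_date, falling back to post_date_gmt.

    Returns None when neither field holds a parseable date.
    """
    for candidate in (post_date, post_date_gmt):
        value = (candidate or "").strip()
        if value in BLANK_DATES:
            continue
        try:
            # Assumed export format: "2010-05-01 12:34:56"
            return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            # Anything else that fails to parse counts as unusable.
            continue
    return None
```

The None return leaves the decision about undated posts to the caller, rather than quietly stamping them with the current date.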

Going to TweakTown

Now I was in possession of several hundred accumulated posts going back to 2001 dumped unceremoniously into a single directory. My first task was to reorganize everything, and I chose the lazy route: splitting everything up by year. I don’t post often enough to need monthly folders, so that made the most sense anyway. Next came fixing the little things exitwp got slightly wrong or had no way of understanding:

  • Replacing “Admin” with “Shaun” as the author of the posts.
  • Stripping the post date off of each filename.
  • Removing “https://bonesmoses.org” from all links so they’re relative to the site itself.
  • Replacing all of my internal links ([intlink id="slug"]text[/intlink]) with Hugo {{< relref ... >}} equivalents.
  • Moving images from /wp-content/uploads to /static/img and updating links.
  • Rewriting images with captions as Hugo {{< figure ... >}} equivalents.
  • Replacing url: with aliases:, because I wanted to shorten the links. I never really liked the way WordPress forces the full date of a post into the path, so I used the opportunity to shorten everything to /year/slug instead. The alias exists purely as a redirect to the new shortened format.
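Several of the fixes above are mechanical enough to script. The following Python sketch shows roughly how that might look; it is not the exact set of commands used for this migration. The function names are made up, and it assumes that exported files are named like YYYY-MM-DD-slug.md, that the front matter has a single-line url: entry, and that the intlink id matches a page Hugo's relref can resolve.

```python
import re

# Assumed filename layout from the export: 2010-05-01-some-slug.md
DATED_NAME = re.compile(r"^(\d{4})-\d{2}-\d{2}-(.+)$")
# [intlink id="slug"]text[/intlink]
INTLINK = re.compile(r'\[intlink id="([^"]+)"\](.*?)\[/intlink\]', re.DOTALL)
# Markdown link targets that point back at the site itself.
SITE_LINK = re.compile(r"\]\(https://bonesmoses\.org(?=/)")
# A single-line "url: /path/" entry in the front matter (assumed shape).
URL_KEY = re.compile(r"^url:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def year_and_name(filename):
    """Split a dated filename into (year folder, filename without the date)."""
    match = DATED_NAME.match(filename)
    if match is None:
        return None, filename
    return match.group(1), match.group(2)


def rewrite_front_matter(front_matter):
    """Turn the old permalink into a one-item aliases list."""
    return URL_KEY.sub(r"aliases:\n  - \1", front_matter)


def rewrite_body(text):
    """Convert intlink shortcodes and make site links relative."""
    text = INTLINK.sub(r'[\2]({{< relref "\1" >}})', text)
    return SITE_LINK.sub("](", text)
```

For example, year_and_name("2010-05-01-hello-world.md") gives ("2010", "hello-world.md"), which covers both the per-year folders and the date stripping in one pass. The author swap and the image and figure rewrites are left out here, since they depend on details of the export not shown in this post.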

I also did a bit of misc. reworking, like moving some content out of the mix of general posts. Some of my content, such as short stories, reviews, and things that aren’t just generic updates, really should be elsewhere. Now they are! Currently they’re only reachable in the Category sidebar, but I’ll probably be fiddling with this until I’m mostly satisfied.

Finding “The Look”

The hardest part of the migration was actually choosing a theme and hammering it into what I actually wanted. I initially tried Clean White, but it was extremely fragile. It uses categories for the main menu, and I have so many it broke the entire layout. I could have probably fixed it, but there was a chance I’d like another theme, so I continued the search.

This brought me to Mainroad. This theme was unmistakably similar to the Esteem theme I’d already deployed, so I decided to try it out. The only thing it seemed to be missing was the masthead image below the menu. I also wanted to reformat a few things, like how it displayed social links in the sidebar.

Adding the masthead was easily the hardest part. It took me several hours and I still don’t think I have everything quite right. Digging through all of the templates and fiddling with CSS isn’t what I’d call fun, especially given how rusty I am after not really leveraging that skill for over a decade. But it’s close enough for now. I did swipe one idea from the Clean White theme though: post banners. While I was in the code, I added the ability to specify an image: attribute in post front-matter, and if it’s set, it’ll replace the default banner while viewing that post.

There’s still some work to be done, but the bulk of it is complete. Now I can just drop a text file in my git repo, run a single command to generate the site, and I’m done. Once I set up the git hooks, even that will be done automatically. No more security-exploit-prone, continuously deprecated, dynamic site generators.

Future Steps

My next step is probably to acquire a Disqus account so I can get comments back, and to replace the search box. The search box currently redirects to DuckDuckGo, which works, but is extremely rudimentary. I’d rather have something that can search the site content directly and show results. I’ve seen Hugo themes that do this, so I may have to yank that functionality and plug it into my now heavily modified Mainroad theme.

Either way, it’s been a journey and I’m now better leveraged for a potential future host migration. As I learn more, I’ll probably also convert my serial novel site to Hugo rather than fix whatever the Django devs broke this time. Maybe I’ll even start posting on it after a 10-year hiatus…

Until Tomorrow