Project: Kubed Chaos
I’ve had something of a “busy” week thus far. Unfortunately, my sleep has suffered for it, but things will settle down eventually. See, when I get an idea in my head, it essentially consumes me. I go to bed yearning to work on it, and if I wake up at night to use the restroom, it’s all I can do to fall back asleep. Sometimes, I simply can’t.
It was that kind of week, the kind where my urge to tinker absolutely devours my faculties. And with good reason!
Shopping for Servers
Sometimes I browse Lab Gopher for kicks. If I see something interesting, I may go directly to eBay and try a few search variants to see if it missed anything. While pulling that trick a little over a week ago, I stumbled across a ridiculously well-specced Dell R730XD for $250.
You simply can’t normally get an R730XD that cheap, even a small-form-factor version. This one even had iDRAC Enterprise, and some eBay vendors would rather pull their own teeth than sell a system with that installed. Not a single other R730XD listing came anywhere near that price.
I couldn’t believe my eyes, so I went back to Lab Gopher to see if I’d somehow missed it in the results, and I hadn’t. It simply wasn’t there. But in all of my back-and-forth comparisons, I actually found a better system. It had 128GB of RAM instead of 32GB, included a 10G Ethernet card, and was a large-form-factor chassis (3.5" drive bays), which suits the SAN I’ve long been thinking of putting together.
Even though it cost $100 more, the RAM alone made up the difference. And again, no other listings came even close, especially if I wanted iDRAC Enterprise so I could open a remote console from the management interface. I hate to say it, but that find inspired me to start searching for other components, and then we were off to the races.
All in all, I ended up settling on these components:
- 8x 6TB 3.5" SAS HDDs (New! Only $30 each!)
- 8x LFF tray caddies ($60)
- 2x Xeon E5-2690v4 CPUs ($50 - Same as I have in my current R730)
- 2x Optane P1600X cache drives ($60)
- 2x M.2 NVMe PCIe adapters ($15)
- Broadcom 57800s 10G Ethernet card ($15)
One of the things that helped my search was that I didn’t care which processor the system came with, because I was going to replace it anyway. It’s almost impossible to find an exact system match, and old Xeon CPUs are dirt cheap. The pair I chose is close to the best price/performance of the v4 chips. I considered the E5-2695v4 instead to get 4 more cores per chip, but those run at a lower base clock (2.1GHz versus 2.6GHz) and cost almost twice as much. So I’d get more cores, but they’d be enough slower that the aggregate performance would come out essentially the same.
Unfortunately, I noticed too late that the included 10G Ethernet card was the “wrong” version. I’m not too familiar with networking, but apparently 10G over RJ45 (10GBASE-T) requires more expensive gear than 10G over SFP+. When the system arrived (the next day!), I saw that all of its ports were RJ45, whereas my existing server has SFP+ ports. That meant if I wanted to set up a cheap 10G network between the two systems, I’d have to mix the technologies or just buy a new card. At the price I paid, the new card was the easy winner.
The idea is to use the Optane drives as a ZFS SLOG mirror, just like I do with my existing R730. I can use the two rear drive bays as a mirrored boot device, which leaves the eight data drives as a single pool split into two RAIDZ1 vdevs. I could lose one drive from each vdev and still be OK. It also lets me expand with another 4-drive vdev in the future if I somehow decide 48TB of raw capacity (roughly 36TB after parity) isn’t enough space.
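For my own reference, here’s a rough sketch of that layout in zpool terms. The pool name and device paths are placeholders, since the real ones will depend on how everything enumerates once the drives are installed.

```sh
# Rough sketch of the planned pool; pool name and device paths are
# placeholders. The mirrored boot pair on the rear bays is handled by
# the OS installer, so only the data pool is shown here.

# One pool built from two 4-drive RAIDZ1 vdevs (the 8x 6TB SAS drives).
zpool create tank \
  raidz1 /dev/sdb /dev/sdc /dev/sdd /dev/sde \
  raidz1 /dev/sdf /dev/sdg /dev/sdh /dev/sdi

# Mirrored SLOG on the two Optane P1600X drives.
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1

# Future expansion would just be another 4-drive vdev:
# zpool add tank raidz1 /dev/sdj /dev/sdk /dev/sdl /dev/sdm
```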
Despite all of this zaniness, that was only half of my weekend.
Playing with K0s
During the week after I set up my K3s cluster, I noticed every node was sitting at about 20-30% CPU while essentially idle. That’s not a lot in the grand scheme of things, but think about the poor people trying to run this thing on Raspberry Pis or some other kind of SBC. Maybe it didn’t start that way, and compared to a full upstream Kubernetes stack it’s probably tiny, but some might accuse K3s of being bloated.
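For the curious, this is roughly how I was eyeballing it. K3s bundles metrics-server, so kubectl top works out of the box; the node name below is made up.

```sh
# Two quick ways to eyeball idle CPU on the cluster nodes.

# K3s ships with metrics-server, so this works without extra setup;
# the CPU% column is the number that caught my eye.
kubectl top nodes

# Or look directly on a node to see which processes are busy
# (mostly the k3s server/agent process and containerd).
ssh k3s-node-1 'top -b -n 1 | head -n 20'
```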
Since I have plenty of automation in my playground, I just decided to wipe my Kubernetes VMs and start over. But what to put on them? I noticed a lot of chatter about K0s as a competitor in the “tiny Kubernetes” space, so I decided to give it a try.
I adapted my existing K3s deployment script to use K0s instead, at the cost of about 2-3 hours of scripting, and it just never worked. Something about the stack utterly baffled me, and I couldn’t determine the underlying cause. Only after that wasted effort did I discover that they provide their own quick deployment tool, k0sctl.
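For anyone who hasn’t seen it, the k0sctl workflow itself is short: you describe your hosts in a YAML file and k0sctl installs everything over SSH. A minimal sketch, using the default file name:

```sh
# Minimal k0sctl flow; the config file name is just the default.

# Generate a starter k0sctl.yaml, then edit the host list to match
# your controllers and workers.
k0sctl init > k0sctl.yaml

# Install k0s on every host over SSH and bring the cluster up.
k0sctl apply --config k0sctl.yaml

# Fetch a kubeconfig for the new cluster.
k0sctl kubeconfig --config k0sctl.yaml > kubeconfig
```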
I shrugged and tried to use that instead, only to be met with the same failure. At first I thought the problem was because I had been using Debian Unstable (Trixie) as my VM base. So I switched to Stable (Bookworm) and it still didn’t work. What was I missing?! I worked on it until past 11pm before I managed to tear myself away and go to bed, defeated.
Remember how I said I can’t stop thinking about projects I’m working on? Something was nagging at me, and the next morning I found the answer in the official documentation. The root of it is that Mirantis has a fundamentally different design philosophy driving K0s than SUSE does for K3s.
The issue is that my whole background is in High Availability, so I refuse to run a single node as my control plane. Yet all of the examples in the K0s documentation demonstrate a single-node control plane! I eventually learned that every worker node expects to reach the control plane through a single, stable address. It’s very common to use Kube-VIP with K3s for this purpose, but the design of K0s explicitly excludes that solution (its controllers don’t run a kubelet or container runtime by default, so there’s nowhere to run the Kube-VIP pods). Instead, they expect you to use an experimental feature they call Node-local load balancing.
That was the problem the entire time. So rather than letting the cluster act as its own VIP, they recommend relying on an external load balancer: HAProxy, a cloud provider’s LB, physical hardware, or maybe round-robin DNS. I personally feel that a Kubernetes stack that isn’t entirely self-reliant is a broken design, but it’s clear the devs disagree. As a result, I’m running a small HAProxy node (like some kind of vagrant) as a temporary solution until I can come up with something more elegant.
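For anyone wanting to replicate the stopgap, here’s a sketch of the HAProxy side and the matching k0s setting. Every address here is a placeholder, and the layout is my reading of the k0s docs rather than a battle-tested config, so treat it as a starting point.

```sh
# Stopgap API load balancer: plain TCP passthrough on 6443 to each
# controller. All addresses are placeholders.
cat > /etc/haproxy/haproxy.cfg <<'EOF'
defaults
    mode tcp
    timeout connect 5s
    timeout client  5m
    timeout server  5m

frontend k0s_api
    bind *:6443
    default_backend k0s_controllers

backend k0s_controllers
    balance roundrobin
    option tcp-check
    server controller1 10.0.0.11:6443 check
    server controller2 10.0.0.12:6443 check
    server controller3 10.0.0.13:6443 check
EOF
systemctl restart haproxy

# The k0s docs have you pass ports 8132 (konnectivity) and 9443
# (controller join API) through the same balancer as well.

# On the cluster side, k0s needs to know about that single address so
# certificates and workers point at it. In k0sctl terms, that means
# something like this under spec.k0s.config (10.0.0.10 = the HAProxy box):
#
#   k0s:
#     config:
#       spec:
#         api:
#           externalAddress: 10.0.0.10
#           sans:
#             - 10.0.0.10
```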
Regardless, I finally got it to work. I even managed to get the same test workload running on the cluster, and I can confirm that the mysterious CPU overhead I witnessed in K3s is not evident while using K0s. My main regret is that it took so long.
But hey, at least I learned a little something along the way, and that’s the whole point of this exercise, no?
Until Tomorrow