Homelab 2022 Part 1 - Hardware and Hypervisors

I think it's been 10 years or so since I last wrote anything about networking or computers, and a fair bit has changed since then.

Back in the day, if you wanted a homelab that was more than a drawer full of Raspberry Pis, you were looking at a 19" rack and an absurd amount of cabling and clutter (plus the constant whining from housemates and fans alike). But improvements in compute density have now reached a point where you can have both low power consumption and enough compute to actually do something interesting.

I figure it's worth showing what I believe to be a good setup going forward, or at least sharing some ideas for those still on the fence about having a home server.

Hardware

This time around the 19" rack does not make a comeback. Instead I decided to go with a consumer aesthetic without skimping on reliability.

The system is based around a chassis by Chenbro with the catchy name "SR30169T3". It was quite the effort to track down during the pandemic, and since Italy was the only country with stock, it managed to get caught up in the Suez canal fiasco for a few weeks. The chassis is very well built and has modern I/O, although the part number has been around for over a decade. The only real issue is an oversight in the drive bay design: the backplane board has slots cut into the PCB, so hot air rising off the CPU tends to get caught in an endless loop. This is easily fixed with some tape.

The second major component, the motherboard, is an interesting one. For many years ASRock has been dumping cheap boards onto the market with not a lot going for them other than LEDs and the price point. It seems they have started to put on a more mature face and get into the world of server motherboards. The "X570D4I-2T" is an AMD platform with a couple of twists. Firstly, you won't find your standard AM4 cooler mount: since these sockets generally aren't used in servers, no front-to-back cooling solutions exist for them, so an Intel cooler is used instead. Then we have the SO-DIMMs: depending on your CPU, it will take a full loadout of up to 64GB of DDR4 ECC, which is still a decent amount of memory, or at least good enough for me.

The board also boasts dual 10GBASE-T Ethernet and IPMI management with one of the nicest looking web UIs (as far as IPMI goes anyway) I've seen.

For a CPU, the ubiquitous Ryzen 5 3600 was chosen. Being an older generation chip it's dirt cheap for what it can do.

Cooling is handled by 3 Noctua fans: a tiny 60x25mm fan on the Dynatron K666R1 CPU cooler (in case you decide to build one yourself, there isn't much that fits in this chassis) and a pair of 120x25mm fans in the chassis. The last of these is installed inside the PSU, since the fan built into it never actually turned on. Its power cable is routed through the same gland the PSU power cables exit from and plugs into the motherboard, allowing novel BMC control of the PSU fan's speed.

Thermals overall are very reasonable: the system can sustain full utilisation indefinitely and, most importantly, storage devices are kept just above ambient.

Storage

The primary use for this system is bulk storage for both archival purposes and some 4k video editing.

For this, 4 x 10TB Seagate HDDs are configured in a ZFS raidz1 array. Single redundancy is perfectly fine for home use, provided you have a backup.

For running applications and other high throughput uses, I decided to take advantage of the AMD platform and its ability to do PCIe bifurcation. That is, we can split a single x16 slot into 4 discrete x4 links and use commodity expansion cards to convert the PCIe slot to M.2 interfaces. The adapter in question is a "414-BBBK" made by Dell. It's a spare part for one of their workstation lines, but it will do the job just fine and is very affordable compared to other options.

Pictured are 4 Seagate NVMe drives, but these were returned due to firmware incompatibility with Illumos. They were replaced with Samsung 980 PRO drives, which have been operating without issue. They are arranged in a ZFS raid10 array (striped mirrors) to form a 2TB pool.
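
As a rough sanity check on the two pool layouts, the sketch below estimates usable capacity before ZFS metadata and padding overhead. The function names are made up for illustration, and the 1TB size per NVMe drive is an assumption inferred from the 2TB pool size, not something stated above.

```python
def raidz_usable_tb(drives: int, drive_tb: float, parity: int = 1) -> float:
    """Approximate usable space of a single raidz vdev (ignores ZFS overhead)."""
    if drives <= parity:
        raise ValueError("need more drives than parity devices")
    return (drives - parity) * drive_tb


def striped_mirror_usable_tb(drives: int, drive_tb: float, way: int = 2) -> float:
    """Approximate usable space of striped mirrors (the "raid10" layout)."""
    if drives % way != 0:
        raise ValueError("drive count must be a multiple of the mirror width")
    return (drives // way) * drive_tb


# 4 x 10TB HDDs in raidz1 (from the text above)
bulk = raidz_usable_tb(drives=4, drive_tb=10)
# 4 NVMe drives as two striped 2-way mirrors; 1TB per drive is an assumption
fast = striped_mirror_usable_tb(drives=4, drive_tb=1)

print(f"bulk pool: ~{bulk:g} TB usable, fast pool: ~{fast:g} TB usable")
```

The trade-off is the usual one: raidz1 gives up one drive's worth of space to parity, while striped mirrors give up half the raw space in exchange for better random I/O.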

Hypervisor

For the hypervisor I needed something reasonably lightweight, so things like ESXi were binned immediately. I also needed something with a good focus on storage, so I definitely considered TrueNAS and the like, but these BSD projects never seem to be finished/stable. Ultimately, while this system is by definition a toy, I did not want to deal with a pet operating system by trying to set up some KVM/libvirt cluster fuck on a bare bones Linux install.

Don't get me wrong, Linux is good, it's just not good at being simple.

Given the number of options out there, it would seem that no matter what you pick there is always a catch. Knowing this, I decided to take a gamble and go with something outside the norm, and eventually settled on SmartOS, which is an Illumos-based distro and a distant descendant of Solaris. We have used this platform at work for many years to great success, so I'm not unfamiliar with it, but I had never considered running it on my own hardware until recently.

Illumos???

Illumos is quite the interesting choice in today's landscape of operating systems. Back in the early 2000s, Solaris, Linux, Windows and even some now clearly esoteric systems like BeOS all held (somewhat) equal footing in the race to become the de facto choice of OS for business. While microshits Windows was being pushed on everything with an x86-capable chip, Sun Microsystems had an interesting advantage in that it offered both a standard UNIX userspace for running real workloads and a competent graphical user interface suited to general office use.

As development continued throughout the 2000s, many innovative features were added, setting Solaris apart from anything that came before it. Unfortunately, the cruel arrow of time eventually saw Windows become the de facto standard, and bureaucracy, mismanagement and temper tantrums sowed the seeds of the demise of a once great company and technology stack... Or did they?

Illumos Chapter 2

Well, the good news is it isn't dead, not by a long shot. The kernel developers are alive and well; they just fall into the age bracket where they aren't constantly shilling their work on Reddit and don't run a Slack/Twitch/Discord server so end users can endlessly request features completely unrelated to the scope of the project. The SmartOS team and the overall Illumos community have been hard at work maintaining and improving the platform for the past decade, so we are now left with a very mature and stable piece of software.

In fact, what makes SmartOS so appealing is its stability. Unlike most operating systems it does not need a boot disk. It's loaded off a USB drive and runs entirely out of memory, while the persistent data is retained on the designated zones pool along with your VMs and... zones. This means upgrading, or trying different builds if you do have issues, is painless and instantly reversible.

Another pro is the configuration-as-code approach to zone creation. Everything is JSON, and even modifying existing machines is treated like a REST PUT operation. Alternatively, there is a set of CLI helper tools with excellent documentation in the man pages.
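
To give a flavour of the JSON-driven workflow, here is a small sketch that builds a zone manifest in Python and serialises it to JSON. The property names follow my understanding of the vmadm manifest format; the alias, image UUID, addresses and sizes are all placeholders, not values from this system.

```python
import json

# Hypothetical manifest for a small native zone. Every value below is a
# placeholder; image_uuid must be replaced with an image imported on the host.
manifest = {
    "brand": "joyent",
    "alias": "example-zone",
    "image_uuid": "00000000-0000-0000-0000-000000000000",
    "max_physical_memory": 1024,  # MiB
    "quota": 20,                  # GiB
    "nics": [
        {
            "nic_tag": "admin",
            "ip": "192.0.2.10",
            "netmask": "255.255.255.0",
            "gateway": "192.0.2.1",
        }
    ],
}

# Serialise to the JSON document that would be saved to a file for vmadm.
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Saved to a file, a manifest like this would be passed to `vmadm create -f <file>`. A smaller JSON document containing only the changed properties can be given to `vmadm update <uuid> -f <file>`, which is where the PUT-like feel comes from.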

Finally, it's worth mentioning OmniOS. It's also a very mature and feature-rich distro, but it takes a more traditional approach in that it is installed to a boot disk and allows for complete customisation of the base userland. It has the zadm tool, which is very similar to SmartOS's vmadm approach, but a little more refined and user friendly in many ways.

VMs

Finally, here are some of the core services/systems being run. I won't list out absolutely every process that's running since it changes weekly depending on what I'm playing with at the time.

IO (native zone)

Io is the most volcanically active moon in the solar system. It's also the hostname for a small, yet very useful zone used for statistics, monitoring and file shares.

Here we run Prometheus, Grafana and an SNMP agent to keep tabs on the state of the other services and the hardware itself. There are several alerts configured in Grafana for environmental monitoring, any errors arising from automated zpool scrubs, and general limits on disk, memory and CPU usage.

In order to monitor SmartOS there is a Prometheus exporter available here. Coupling this with the IPMI data the BMC generates gives a complete overview of the system's health.

Amalthea

VyOS router operating system. This is the core router for the VM network as well as the means of connecting clients to the internet.

It has full oversight of all vnics on the hypervisor and runs OSPF to distribute routes for these subnets elsewhere. Performance under bhyve is very good; with jumbo frames, multi-gigabit routing is no big deal.

Much more detail on this in a later post.

Ganymede

An Arch Linux system running Nomad and Docker (podman), which allows easy deployment of containers and other software. It's actually what's hosting this website right now (with the assets cached by Cloudflare). Nomad is a pretty neat tool in its own right and definitely worth looking into if you don't like the insane complexity of Kubernetes.

Europa

Europa is a media server and seedbox. It runs the standard setup, with Jellyfin serving content over LAN and a tmux session to keep rtorrent idling along.

It's also a good place to fiddle with things.

Conclusion?

This system has been rock solid for the past 12 months, as have the disk arrays, with no checksum errors across the roughly 12TB of data between them. ZFS scrubs on the storage pool are surprisingly quick (currently only taking 4 hours), owing to the very decent sequential performance of around 250MB/s per drive that these IronWolf drives have.
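
The 4 hour figure lines up with a back-of-envelope estimate. The sketch below assumes the best case, where all four drives stream at the quoted 250MB/s in parallel, and the worst case for data volume, where all 12TB sits on the raidz1 pool. Both are simplifications, and the function name is made up for illustration.

```python
def scrub_hours(data_tb: float, drives: int, parity: int, mb_per_s: float) -> float:
    """Estimate scrub time assuming every drive streams at mb_per_s in parallel."""
    # On raidz, data plus parity must be read: scale by drives / data drives.
    raw_tb = data_tb * drives / (drives - parity)
    total_mb_per_s = drives * mb_per_s
    seconds = raw_tb * 1_000_000 / total_mb_per_s  # 1 TB = 1,000,000 MB
    return seconds / 3600


estimate = scrub_hours(data_tb=12, drives=4, parity=1, mb_per_s=250)
print(f"estimated scrub time: {estimate:.1f} hours")
```

A real scrub is rarely perfectly sequential, so treat this as a lower bound for a given amount of data rather than a prediction.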

Power consumption comes in at around 1.8 MWh per year, so "not great, not terrible". I don't own a dB meter, but thanks to the Noctua fans you can't really notice its existence unless you're right next to it.
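
For those who think in watts rather than megawatt-hours, the yearly figure converts to an average draw as sketched below. This is only the arithmetic; it assumes the machine runs all year and says nothing about idle versus peak draw.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def average_watts(mwh_per_year: float) -> float:
    """Convert yearly energy use in MWh to an average power draw in watts."""
    return mwh_per_year * 1_000_000 / HOURS_PER_YEAR


avg = average_watts(1.8)
print(f"average draw: ~{avg:.0f} W")
```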

For a home setup this is definitely on the premium side of pricing. The entire system cost around $3.2k USD, which you could reduce by a large factor by omitting some niceties like IPMI and 10GbE and by using a basic PC chassis. And of course, dropping the total storage amount would drastically lower the cost.

Overall the utility and performance in such a small and quiet package seems well justified.