Setting up a ZFS Home Server on Arch Linux

Since moving to a laptop as my primary device, I have had an underutilized desktop sitting around. That, plus a box full of old hard drives, made me want to build a home NAS to store my ever-growing collection of data. The machine already had Arch Linux installed on it. I would have installed FreeBSD so ZFS would be better supported, but I wanted to use this machine as a Docker host as well, and Docker is still pretty experimental on FreeBSD. In this post I will go over the process of setting up my new ZFS-based file server.

Intro to ZFS

ZFS was originally developed by Sun Microsystems as a proprietary file system for Solaris.[1] Since then it has been open-sourced, then forked back to closed source (boo Oracle). The current open-source version is known as OpenZFS, and the Linux port is called ZFS on Linux (ZoL).[2] All references to ZFS in this post are references to ZoL. Because ZFS is licensed under the CDDL, a free but non-GPL-compatible license, it cannot be shipped as part of the Linux kernel and is instead distributed as an out-of-tree kernel module.

There are quite a few interesting features of ZFS. One fairly unique one is that it combines the volume manager and the file system.[3] This makes the storage easier to manage. ZFS is a copy-on-write (CoW) file system: new data is written to a new block instead of overwriting the old data, so if an error occurs during a write, the old data is still intact. ZFS also places a large emphasis on data integrity. By storing checksums of data blocks in its metadata, it can detect errors and automatically repair them.
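
As a quick illustration of the self-healing: a scrub walks every block in a pool, verifies it against its checksum, and repairs whatever it can from redundancy. The pool name tank below is just a placeholder:

$ zpool scrub tank
$ zpool status -v tank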

Terminology

A few ZFS terms that come up throughout this post:

pool (zpool): the top-level storage container, built from one or more vdevs.
vdev: a virtual device, i.e. a single disk, a mirror, or a RAIDZ group that a pool is built from.
dataset: a file system within a pool that can have its own properties (compression, quotas, sharing, etc.).
RAIDZ: ZFS’s take on parity RAID, roughly comparable to RAID 5.
scrub: a scan of every block in a pool that verifies checksums and repairs any errors it can.
snapshot: a read-only, point-in-time copy of a dataset.

Preparing the Hard Drives

The first step was to prepare the hard drives I was going to use. Much of this process is documented in some form on the Arch Wiki.[4]

All the drives I have are pretty old, and I didn’t want to put a faulty drive into a new pool, so I tested each one using the built-in SMART tools. Note that many of the following commands will require you to run as a privileged user.
$ smartctl -t long /dev/whatever
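
Once the long test finishes (it can take several hours), the results can be read back from the drive’s self-test log:

$ smartctl -l selftest /dev/whatever
$ smartctl -H /dev/whatever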

I also tested with hdparm to check the speeds of each disk, as one slow disk can throw off the speed of the whole array.
$ hdparm -Tt /dev/whatever

After testing, I cleared any old RAID metadata from the disks, so there wouldn’t be the possibility of errors from stale superblock or partition information floating around on the drives.
$ mdadm --misc --zero-superblock /dev/whatever
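
The mdadm command only clears old RAID superblocks; if the drives also carry leftover partition tables or filesystem signatures, wipefs can clear those too:
$ wipefs --all /dev/whatever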

Everything on the hardware side is now ready to go. After redoing my cable management, it was time to install the software.

Installing Software

My machine runs Arch Linux, and someone[5] is kind enough to maintain a custom repository[6] for ZFS packages.

To add the unofficial repository to the pacman.conf file (single quotes keep $repo literal so that pacman, not the shell, expands it):
$ printf '[archzfs]\nServer = http://archzfs.com/$repo/x86_64\n' >> /etc/pacman.conf

To receive, verify the fingerprint of, and locally sign the repository’s signing key:

$ pacman-key -r <keyid>
$ pacman-key -f <keyid>
$ pacman-key --lsign-key <keyid>
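
With the repository and key in place, refresh the package databases so pacman picks up the archzfs repo:
$ pacman -Syy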

There is a lag between kernel updates and ZFS driver updates, so it was necessary to downgrade my kernel for this. If you need to downgrade yours, the Arch Wiki has a good resource.[7]

From there I installed ZFS:
$ pacman -S zfs-linux

This will pull in any other dependencies as necessary.

Setting up the Storage Pool

I had room for four disks in my machine, and I wanted more storage than a 2x2 mirror would allow, so I went with RAIDZ. For a detailed discussion of the different RAID levels ZFS supports, see [8]. It is also important to note that growing the size of your array requires either adding an entirely new vdev (good luck if you used all your SATA ports) or replacing the drives one by one and resilvering after each swap, as sketched below. This is something to be aware of if you ever plan on upgrading your storage capacity.[9]
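
For reference, the drive-by-drive replacement looks roughly like this once the pool exists (zstore is created below; the disk IDs are placeholders). After the last resilver finishes, the extra capacity becomes available if autoexpand is enabled:

$ zpool set autoexpand=on zstore
$ zpool replace zstore old-disk-id new-disk-id
$ zpool status zstore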

In the following example, I will be creating a pool named zstore.

Load the kernel module:

$ modprobe zfs
$ echo zfs >> /etc/modules-load.d/zfs.conf

Enable ZFS system services:

$ systemctl enable --now zfs.target
$ systemctl enable --now zfs-import-cache
$ systemctl enable --now zfs-mount

Don’t use the standard /dev/sdX names, as these can change between reboots, causing ZFS to fail when it tries to use the wrong disks. To get the list of disk IDs:
$ ls -lh /dev/disk/by-id

Creating the zpool:

$ mkdir /mnt/zstore
$ zpool create -m /mnt/zstore zstore raidz all-your-disk-ids

This will create your main storage pool. If you run zpool status you should get output displaying stats on your newly created pool. And that is pretty much it. You can now add files to your pool, snapshot it, scrub it, etc.
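
For example, thanks to copy-on-write, snapshots are nearly instant and initially take no extra space. A minimal sketch (the snapshot name is arbitrary):

$ zfs snapshot zstore@first-snapshot
$ zfs list -t snapshot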

Instead of having one big pool of data, I recommend creating separate datasets depending on what you are storing, e.g. a separate dataset for your media files:
$ zfs create zstore/media

That way you can granularly control settings such as compression depending on the file type and intended use case.
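
For example, a cheap algorithm such as lz4 is worth enabling for general data, while already-compressed media gains little from it. These particular settings are my suggestion rather than a requirement:

$ zfs set compression=lz4 zstore
$ zfs set compression=off zstore/media
$ zfs get compression zstore/media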

Sharing the Data

The next step for me was being able to access the files easily from my laptop or other devices.

I decided to go with NFS.[10]

Software Installation:
$ pacman -S nfs-utils

Create a bind mount from the ZFS dataset to the directory that will be exported (note the path matches the export below):

$ echo "/mnt/zstore/media  /srv/nfs/media  none  bind,defaults,nofail,x-systemd.requires=zfs-mount.service  0 0" >> /etc/fstab
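
The bind target has to exist before it can be mounted, so create the directory and mount it (the options come from the fstab entry above):

$ mkdir -p /srv/nfs/media
$ mount /srv/nfs/media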

Add desired shares to /etc/exports:
$ echo "/srv/nfs/media *(rw)" >> /etc/exports

Note that this can also be done natively with ZFS:
$ zfs set sharenfs=on zstore

Enable the service:
$ systemctl enable --now nfs-server
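
To sanity-check that the share is actually being exported:

$ exportfs -v
$ showmount -e localhost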

Install nfs-utils on your client machine as well, and you should now be able to access your ZFS shares over your network.
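
As an illustration, a manual mount on the client would look something like this (nas.local is a placeholder for your server’s hostname or IP):

$ mkdir -p /mnt/media
$ mount -t nfs nas.local:/srv/nfs/media /mnt/media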

Conclusion

It has been a couple of weeks since I set all of this up, and the storage pool has been working great so far. I find that I am mostly limited on the network side of things (I need to upgrade to gigabit Ethernet).

Something I wish were supported is native encryption. There is an experimental patch, but I am waiting until it is more fleshed out. For now I am just encrypting any private files with GPG before storing them.
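
For the curious, the GPG step is nothing fancy, just symmetric encryption of individual files before they land on the pool, along the lines of:
$ gpg --symmetric --cipher-algo AES256 some-private-file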

Footnotes