Homelab 2022 Part 2 - Samba on SmartOS using delegated datasets

Overview

SmartOS is an ephemeral type 1 hypervisor: anything we do in the global zone will either be lost on reboot or require hacks and workarounds to make it persistent and behave like a traditional UNIX system. Instead, this guide provides a way to use a native zone to host shares from a large zpool without tying that data to the zone forever. It would be entirely unacceptable to have to copy dozens of TB of data every time the zone needs to be re-provisioned.

The official documentation would lead you to believe that the only step required is setting the delegate_dataset flag. But as cautioned in the guide, anything we put into this zone will share its life cycle with the zone itself.

We won't be using this flag at all. Instead, I will delegate an existing dataset in such a way that the zone can be deleted at any time while retaining our content for future zones, or for importing straight into any other ZFS-capable operating system.

Implementation

In order to set this up you'll need a dataset somewhere. It can contain existing data, or you can create an empty one for testing and swap it out for something different later on. In my case I'm going to expose a raidz pool in its entirety.

First we'll create a joyent-branded zone.

{
        "brand": "joyent",
        "alias": "shares",
        "resolvers": ["1.1.1.1", "1.0.0.1"],
        "image_uuid": "c8715b60-7e98-11ec-82d1-03d16599f529",
        "nics": [
                {
                        "interface": "net0",
                        "mtu": "1500",
                        "nic_tag": "admin",
                        "ip": "dhcp",
                        "allow_ip_spoofing": true
                }
        ],
        "quota": 20
}
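Assuming the manifest above has been saved as shares.json on the global zone, provisioning looks something like the sketch below. The validate step is optional but catches schema mistakes before anything is created; the guard makes the block a harmless no-op on machines without vmadm.

```shell
# Provision the zone from the manifest above (assumed saved as shares.json).
# Guarded so this is a no-op anywhere SmartOS's vmadm isn't available.
if command -v vmadm >/dev/null 2>&1; then
    # Check the manifest against vmadm's schema first.
    vmadm validate create -f shares.json
    # Prints the new zone's UUID on success.
    vmadm create -f shares.json
else
    echo "vmadm not found: run this from the SmartOS global zone" >&2
fi
```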

Once the zone has booted, zlogin to it, update your packages, and install Samba. Then edit the smb.conf file to our liking.

pkgin update
pkgin upgrade
pkgin install nano samba
nano /opt/local/etc/samba/smb.conf

Your config should look something like this. By default, our delegated dataset will have the same mount point inside the zone as it did in the global zone.

[global]
security = share
load printers = no
guest account = nobody
log file = /var/log/samba/log.%m

[tank]
path = /tank
valid users = thetooth
public = no
writable = yes
printable = no
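Before going further, you can sanity-check the config with testparm, which ships with Samba and flags unknown or deprecated parameters. The path assumes the pkgsrc install location used above.

```shell
# Parse smb.conf and dump the effective settings; exits non-zero on errors.
# Guarded so the check is skipped where Samba isn't installed.
if command -v testparm >/dev/null 2>&1; then
    testparm -s /opt/local/etc/samba/smb.conf
fi
```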

Now we will set up a user to log in. Samba accounts are separate from system accounts, but I like to set up both at the same time so we don't run into permission problems later on.

groupadd -g 1000 thetooth
useradd -m -u 1000 -g 1000 thetooth
passwd thetooth
smbpasswd -a thetooth
chown thetooth:thetooth /tank

We now have everything ready in our zone (easy, right?), except for the dataset itself. Next we'll enable the SMB-related daemons, exit the shell, and shut down the zone.

[root@41b68ac0-fcaf-4128-c030-e822487e6ee0 ~]# svcadm enable samba:smbd
[root@41b68ac0-fcaf-4128-c030-e822487e6ee0 ~]# svcadm enable samba:nmbd
[root@41b68ac0-fcaf-4128-c030-e822487e6ee0 ~]# svcadm enable dns/multicast
[root@41b68ac0-fcaf-4128-c030-e822487e6ee0 ~]# exit
[root@jupiter ~]# vmadm stop 41b68ac0-fcaf-4128-c030-e822487e6ee0

Now use zonecfg to hand the dataset off to the zone and restart it. Note that the moment we do this, the filesystem will be unmounted from the global zone.

[root@jupiter ~]# zonecfg -z 41b68ac0-fcaf-4128-c030-e822487e6ee0
zonecfg:41b68ac0-fcaf-4128-c030-e822487e6ee0> add dataset
zonecfg:41b68ac0-fcaf-4128-c030-e822487e6ee0:dataset> set name=tank
zonecfg:41b68ac0-fcaf-4128-c030-e822487e6ee0:dataset> end
zonecfg:41b68ac0-fcaf-4128-c030-e822487e6ee0> exit
[root@jupiter ~]# vmadm start 41b68ac0-fcaf-4128-c030-e822487e6ee0

And that should be about it. Log in to the zone again and you should see your dataset and mount point by running zfs list; connecting with an SMB client should also work without a hitch.

Zoned property

You are probably wondering how to get your data back. It's actually quite easy: stop the zone, revert the zoned property on the dataset with zfs inherit zoned {{dataset}}, and issue zfs mount -a. Below is a quote from the old Oracle docs that explains how to work with this flag effectively:
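As a concrete sketch, using the example zone UUID and dataset from the walkthrough, the whole hand-back could look like this. The zonecfg one-liner form is an assumption for brevity; you can equally run the same subcommands interactively as shown earlier.

```shell
# Reclaim a delegated dataset in the global zone.
# UUID and dataset name are the ones used in the walkthrough above.
UUID=41b68ac0-fcaf-4128-c030-e822487e6ee0
# Guarded so this is a no-op outside a SmartOS global zone.
if command -v vmadm >/dev/null 2>&1; then
    vmadm stop "$UUID"
    # Drop the delegation from the zone's configuration.
    zonecfg -z "$UUID" "remove dataset name=tank"
    # Clear the zoned flag so the global zone trusts mountpoint again,
    # then remount everything.
    zfs inherit zoned tank
    zfs mount -a
fi
```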

When a dataset is delegated to a non-global zone, the dataset must be specially marked so that certain properties are not interpreted within the context of the global zone. After a dataset has been delegated to a non-global zone and is under the control of a zone administrator, its contents can no longer be trusted. As with any file system, there might be setuid binaries, symbolic links, or otherwise questionable contents that might adversely affect the security of the global zone. In addition, the mountpoint property cannot be interpreted in the context of the global zone. Otherwise, the zone administrator could affect the global zone's namespace. To address the latter, ZFS uses the zoned property to indicate that a dataset has been delegated to a non-global zone at one point in time.

When a dataset is removed from a zone or a zone is destroyed, the zoned property is not automatically cleared. This behavior is due to the inherent security risks associated with these tasks. Because an untrusted user has had complete access to the dataset and its descendents, the mountpoint property might be set to bad values, or setuid binaries might exist on the file systems.

To prevent accidental security risks, the zoned property must be manually cleared by the global zone administrator if you want to reuse the dataset in any way. Before setting the zoned property to off, ensure that the mountpoint property for the dataset and all its descendents are set to reasonable values and that no setuid binaries exist, or turn off the setuid property.

After you have verified that no security vulnerabilities are left, the zoned property can be turned off by using the zfs set or zfs inherit command. If the zoned property is turned off while a dataset is in use within a zone, the system might behave in unpredictable ways. Only change the property if you are sure the dataset is no longer in use by a non-global zone.

Performance tuning

Samba on Illumos does not support multi-channel transfers despite being SMB 3 capable. If you're on 10GbE you will easily hit wire speed without it, though.

The most important thing is to ensure you're using the largest frame size possible. In the global zone, edit the /usbkey/config file and add yournicalias_mtu=9000 for your secondary interface (the one that isn't your admin NIC, whose MTU can't be changed), then reboot the host.
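For example, if your secondary NIC were tagged storage0 (a hypothetical alias; substitute whatever name already appears in your /usbkey/config, along with your NIC's real MAC address), the relevant fragment would look like:

```shell
# /usbkey/config fragment (storage0 and the MAC address are placeholders).
storage0_nic=aa:bb:cc:dd:ee:ff
storage0_mtu=9000
```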

Once the machine has booted confirm your NIC is using the desired MTU with dladm show-linkprop {{nic}}.

Now update the zone to also use the same MTU as the physical interface and reboot it.

echo '{"update_nics": [{"mac": "42:e4:a:67:4a:84", "mtu": 9000}]}' | vmadm update 41b68ac0-fcaf-4128-c030-e822487e6ee0
vmadm reboot 41b68ac0-fcaf-4128-c030-e822487e6ee0

Of course, set the same value on your client(s) and you should see a very decent speed-up. In my case, a purely sequential read from the server yields just over 1GB/s (10GbE copper), and writes are about 600-700MB/s, matching the capabilities of the array.

Anything beyond this starts hitting the single-thread limit of most CPUs, though. Hopefully, now that 25 and 100GbE are becoming affordable for mere mortals, support for multi-channel and RSS will come to Illumos.