
Encrypted GlusterFS storage cluster

Partition the disks

Configure the partitions to be used for the storage cluster. For disks that will be used entirely for the storage cluster I recommend a GPT partition table with a single partition spanning the whole disk. A GPT partition table allows the partition to be named and given a type, making it easier to recognize the disk's purpose. I generally go with names like "glusterfs01b001", "glusterfs01b002", et cetera, to indicate they are GlusterFS bricks belonging to GlusterFS cluster 1. If you expect to have multiple volumes per cluster you might want to add that to the name. The outermost "layer" of the data will be encrypted with a LUKS header, so I find partition type 8309 (Linux LUKS) the most appropriate type. Both the gdisk and cgdisk command line programs do a great job here. When re-using a disk I first create a fresh GPT partition table with fdisk -w always -W always /dev/..., then the g and w commands, before running gdisk/cgdisk, for a clean result.
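
For a non-interactive alternative, sgdisk (from the same gdisk package) can create, type and name the partition in one go. A minimal sketch, assuming the disk is /dev/sdX and the first brick name from above:

sgdisk --new=1:0:0 --typecode=1:8309 --change-name=1:glusterfs01b001 /dev/sdX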

If you are getting partition alignment issues, disabling UAS might help.

Encrypt and open the disks

cryptsetup has a lot of options to tune the encryption. Which options are the right choice depends on the hardware and goals. For the sake of simplicity I will not specify any options here. The Arch Linux documentation is a good read to find out more.

cryptsetup luksFormat /dev/disk/by-partlabel/$BRICKNAME
cryptsetup open /dev/disk/by-partlabel/$BRICKNAME $BRICKNAME

If your hardware/goals require certain options they can be added to the first command, e.g.:

cryptsetup luksFormat --iter-time 5000 /dev/disk/by-partlabel/$BRICKNAME

There should now be "virtual" block devices available at /dev/mapper/$BRICKNAME. These block devices can be used normally. dm-crypt will make sure the data written to them ends up encrypted on /dev/disk/by-partlabel/$BRICKNAME.

The "virtual" blockdevices will be available until cryptsetup close $BRICKNAME is executed or the system reboots. The encrypted blockdevices can be opened automatically during system boot by configuring them in /etc/crypttab, more on that later.

LVM

GlusterFS snapshots rely on LVM thin provisioning.

Create the LVM Physical Volumes and LVM Volume Groups first.

pvcreate /dev/mapper/$BRICKNAME
vgcreate $BRICKNAME /dev/mapper/$BRICKNAME

Followed by an LVM Logical Volume Thin Pool. Search man lvcreate and man lvmthin for "chunk" for pointers on an appropriate $CHUNKSIZE. Add -Zn when using large chunk sizes.

lvcreate -c $CHUNKSIZE -Zn -l 100%FREE --thinpool ${BRICKNAME}_tp $BRICKNAME

Finally the thinly provisioned LVM Logical Volume. Use lvs --units b to get the value for $LVSIZE.

lvcreate -V $LVSIZE --thin -n $BRICKNAME $BRICKNAME/${BRICKNAME}_tp

There should now be "virtual" block devices available at /dev/mapper/$BRICKNAME-$BRICKNAME. LVM manages the data written to them and hands it off to dm-crypt at /dev/mapper/$BRICKNAME, which encrypts it and writes it to disk at /dev/disk/by-partlabel/$BRICKNAME.
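
The whole stack (partition, LUKS mapping, thin pool, thin volume) can be inspected with lsblk as a quick sanity check:

lsblk /dev/disk/by-partlabel/$BRICKNAME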

Filesystem

GlusterFS recommends the XFS filesystem, with an inode size large enough to hold the extended attributes GlusterFS relies on.

mkfs.xfs -i size=512 /dev/mapper/$BRICKNAME-$BRICKNAME
mkdir -p /mnt/glusterfs/$BRICKNAME
mount -t xfs -o rw,inode64,noatime,nouuid /dev/mapper/$BRICKNAME-$BRICKNAME /mnt/glusterfs/$BRICKNAME
mkdir /mnt/glusterfs/$BRICKNAME/brick
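
To verify the inode size took effect, xfs_info can be run against the mount point; look for isize=512 in its output:

xfs_info /mnt/glusterfs/$BRICKNAME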

Hosts

Open and mount the partitions

The disks should now be moved to the hosts that will be serving the bricks.

To ensure they are cryptsetup open-ed at boot, edit /etc/crypttab:

TODO: /etc/crypttab
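
A typical entry for one brick might look roughly like the following; the key file path is a placeholder, and the third field can be none to be prompted for the passphrase at boot instead:

$BRICKNAME /dev/disk/by-partlabel/$BRICKNAME /etc/keys/$BRICKNAME.key luks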

Append the following to /etc/fstab:

/dev/mapper/$BRICKNAME-$BRICKNAME /mnt/glusterfs/$BRICKNAME xfs rw,inode64,noatime,nouuid 1 2

Make sure the host has the mount point:

mkdir -p /mnt/glusterfs/$BRICKNAME

Reboot to verify the changes.
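
After the reboot, the mounts can be checked with, for example:

findmnt /mnt/glusterfs/$BRICKNAME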

Networking

The hosts serving the bricks need to be able to find each other. Even if all your bricks are served by the same host, it will need to have a name and resolve itself using that name. Simple setups can be managed by editing /etc/hosts. When testing on a single host, any 127.x.x.x (e.g. 127.0.1.1) address line containing the hostname is sufficient. For multiple hosts with static IP addresses, ensure /etc/hosts on every host has entries for itself and all other hosts (a sketch of such a file follows the peer probe step below). Hostname suggestion: "glusterfs01h001".

Make sure the glusterd GlusterFS daemon is running. This can be as simple as executing glusterd from the command line, but it can also run as a systemd service or under another process manager.

Ensure all of the above has been set up on all the hosts before continuing. SSH to one of the hosts to set up the Trusted Storage Pool by executing the following for all other hosts:

gluster peer probe $OTHER_HOSTNAME

There is no need to run this on any of the other hosts; GlusterFS takes care of that.
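
For reference, /etc/hosts on every host of a two-host pool might look roughly like the following; the 192.0.2.x addresses and the second hostname are placeholders:

192.0.2.11 glusterfs01h001
192.0.2.12 glusterfs01h002

Once the probes succeed, gluster peer status on any host should list all the other hosts as connected peers.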

GlusterFS volume

Execute a variation of the gluster volume create command; examples below:

1 host, 1 brick, no replication, no distribution

This is an okay start; more bricks can be added later for replication, distribution, or both.

gluster volume create $VOLUME_NAME $HOSTNAME:/mnt/glusterfs/$BRICKNAME/brick

1 host, 2 bricks, no replication, distributed

This results in a volume size equal to the size of the 2 bricks combined.

gluster volume create $VOLUME_NAME $HOSTNAME:/mnt/glusterfs/$BRICKNAME1/brick $HOSTNAME:/mnt/glusterfs/$BRICKNAME2/brick

1 host, 2 bricks, replica 2, no distribution

This results in a volume size equal to the smallest of the 2 bricks.

gluster volume create $VOLUME_NAME replica 2 $HOSTNAME:/mnt/glusterfs/$BRICKNAME1/brick $HOSTNAME:/mnt/glusterfs/$BRICKNAME2/brick
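
Whichever layout is chosen, the volume has to be started before clients can mount it. A minimal sketch, assuming the GlusterFS native (FUSE) client is installed on the machine doing the mounting and /mnt/$VOLUME_NAME is just an example mount point:

gluster volume start $VOLUME_NAME
mkdir -p /mnt/$VOLUME_NAME
mount -t glusterfs $HOSTNAME:/$VOLUME_NAME /mnt/$VOLUME_NAME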

Etcetera

Setting up GlusterFS Volumes Documentation

Spindown

The disks themselves need to be configured to spin down. This can be done using the hdparm utility: -B 127 sets the highest APM level that still permits spin-down, and -S 12 sets the standby (spindown) timeout to 60 seconds (12 × 5 seconds). These settings are not persisted across reboots; for that, check /etc/hdparm.conf.

hdparm -B 127 /dev/disk/by-partlabel/$BRICKNAME
hdparm -S 12 /dev/disk/by-partlabel/$BRICKNAME
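
On systems that ship /etc/hdparm.conf (e.g. Debian-based distributions), the equivalent persistent configuration might look roughly like the sketch below; the option names follow the commented examples in that file, and some setups may prefer a /dev/disk/by-id path here, so verify against your distribution's copy:

/dev/disk/by-partlabel/$BRICKNAME {
    apm = 127
    spindown_time = 12
}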

The GlusterFS volume must also be configured to not perform filesystem health checks. This setting seems to only take effect after the volume is restarted.

gluster volume set $VOLUME_NAME storage.health-check-interval 0
gluster volume stop $VOLUME_NAME
gluster volume start $VOLUME_NAME
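
The active value can be verified afterwards with:

gluster volume get $VOLUME_NAME storage.health-check-interval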