Encrypted GlusterFS storage cluster
- encryption at rest
- snapshots
- growable
- replicated
- distributed
- spindown disks when not in use
Partition the disks
Configure the partitions to be used for the storage cluster.
For disks that are to be used completely for the storage cluster I recommend a GPT partition table with a single partition spanning the whole disk.
A GPT partition table allows the partition to be named and given a type, making it easier to recognize the disk's purpose.
I generally go with names like "glusterfs01b001", "glusterfs01b002", etcetera, to indicate they are GlusterFS bricks belonging to GlusterFS cluster 1.
If you expect to have multiple volumes per cluster you might want to add that to the name.
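The numbering scheme is easy to generate with a short shell loop; the cluster number and brick count below are hypothetical examples:

```shell
# Print brick names glusterfs01b001..glusterfs01b003
# (cluster "01" and a count of 3 are example values)
for i in 1 2 3; do
  printf 'glusterfs01b%03d\n' "$i"
done
```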
The outermost "layer" of the data will be LUKS-encrypted, and therefore I find partition type 8309 (Linux LUKS) the most appropriate.
Both the gdisk and cgdisk command line programs do a great job here. When re-using a disk I first wipe the old partition table for a clean result: run fdisk -w always -W always /dev/..., then the g (new GPT table) and w (write) commands, before moving on to gdisk/cgdisk.
If you run into partition alignment issues, disabling UAS (USB Attached SCSI) might help.
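If you prefer a non-interactive approach over gdisk/cgdisk, sgdisk (from the same gptfdisk package) can create the table, name the partition, and set its type in one go. A sketch, where /dev/sdX and the brick name are assumptions to adapt:

```shell
# Scriptable sketch with sgdisk; /dev/sdX and the name are placeholders.
sgdisk --zap-all /dev/sdX                                  # wipe any old partition tables
sgdisk -n 1:0:0 -t 1:8309 -c 1:glusterfs01b001 /dev/sdX    # one whole-disk partition, type Linux LUKS, named
```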
Encrypt and open the disks
cryptsetup has a lot of options to tune the encryption. Which options are the right choice depends on your hardware and goals. For the sake of simplicity I will not specify any options here; the Arch Linux documentation is a good read to find out more.
cryptsetup luksFormat /dev/disk/by-partlabel/$BRICKNAME
cryptsetup open /dev/disk/by-partlabel/$BRICKNAME $BRICKNAME
If your hardware/goals require certain options they can be added to the first command, e.g.:
cryptsetup luksFormat --iter-time 5000 /dev/disk/by-partlabel/$BRICKNAME
There should now be "virtual" block devices available at /dev/mapper/$BRICKNAME. These block devices can be used normally: dm-crypt will make sure the data written to them ends up encrypted on /dev/disk/by-partlabel/$BRICKNAME. The "virtual" block devices remain available until cryptsetup close $BRICKNAME is executed or the system reboots. The encrypted block devices can be opened automatically during boot by configuring them in /etc/crypttab; more on that later.
LVM
GlusterFS snapshots rely on LVM thin provisioning.
Create the LVM Physical Volumes and LVM Volume Groups first.
pvcreate /dev/mapper/$BRICKNAME
vgcreate $BRICKNAME /dev/mapper/$BRICKNAME
Next, create an LVM Logical Volume thin pool. Search man lvcreate and man lvmthin for "chunk" for pointers on choosing an appropriate $CHUNKSIZE. Add -Zn when using large chunk sizes.
lvcreate -c $CHUNKSIZE -Zn -l 100%FREE --thinpool ${BRICKNAME}_tp $BRICKNAME
Finally, create the thinly provisioned LVM Logical Volume. Use lvs --units b to get the value for $LVSIZE.
lvcreate -V $LVSIZE --thin -n $BRICKNAME $BRICKNAME/${BRICKNAME}_tp
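To avoid copying the size by hand, the pool size can be captured in a variable. The lvs flags used here (--noheadings, --units b, -o lv_size) are standard, but the sample output line in this sketch is made up for demonstration:

```shell
# On a real system you would run something like:
#   LVSIZE=$(lvs --noheadings --units b -o lv_size $BRICKNAME/${BRICKNAME}_tp | tr -d ' ')
# Demonstrated here on a hypothetical lvs output line:
sample='  10737418240B'
LVSIZE=$(echo "$sample" | tr -d ' ')
echo "$LVSIZE"   # keeps the trailing B so lvcreate -V treats the value as bytes
```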
There should now be "virtual" block devices available at /dev/mapper/$BRICKNAME-$BRICKNAME. LVM manages the data written to them and hands it off to dm-crypt at /dev/mapper/$BRICKNAME, which encrypts it and writes it to disk at /dev/disk/by-partlabel/$BRICKNAME.
Filesystem
GlusterFS recommends the XFS filesystem, with an inode size large enough to hold the extended attributes GlusterFS relies on.
mkfs.xfs -i size=512 /dev/mapper/$BRICKNAME-$BRICKNAME
mkdir -p /mnt/glusterfs/$BRICKNAME
mount -t xfs -o rw,inode64,noatime,nouuid /dev/mapper/$BRICKNAME-$BRICKNAME /mnt/glusterfs/$BRICKNAME
mkdir /mnt/glusterfs/$BRICKNAME/brick
Hosts
Open and mount the partitions
The disks should now be moved to the hosts that will be serving the bricks.
To ensure they are opened with cryptsetup open at boot, edit /etc/crypttab:
TODO: /etc/crypttab
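As a sketch of what an entry could look like (the keyfile path is hypothetical; use none as the third field to be prompted for the passphrase at boot instead):

```
# /etc/crypttab — <name>  <device>  <keyfile>  <options>
glusterfs01b001  /dev/disk/by-partlabel/glusterfs01b001  /etc/cryptsetup-keys.d/glusterfs01b001.key  luks
```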
Append the following to /etc/fstab
:
/dev/mapper/$BRICKNAME-$BRICKNAME /mnt/glusterfs/$BRICKNAME xfs rw,inode64,noatime,nouuid 1 2
Make sure the host has the mount point:
mkdir -p /mnt/glusterfs/$BRICKNAME
Reboot to verify the changes.
Networking
The hosts serving the bricks need to be able to find each other. Even if all your bricks are served by a single host, it will need a name and must be able to resolve itself using that name.
Simple setups can be managed by editing /etc/hosts. When testing on a single host, any 127.x.x.x address line (e.g. 127.0.1.1) containing the hostname is sufficient. For multiple hosts with static IP addresses, ensure /etc/hosts on every host has entries for itself and all other hosts. Hostname suggestion: "glusterfs01h001".
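For example, on a two-host setup (the addresses below are assumptions; use your own static IPs):

```
# /etc/hosts — identical entries on both hosts
192.168.1.11  glusterfs01h001
192.168.1.12  glusterfs01h002
```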
Make sure the glusterd GlusterFS daemon is running. This can be as simple as executing glusterd from the command line, but it can also run as a systemd service or under another process manager.
Ensure all of the above has been set up for all the hosts before continuing.
SSH to one of the hosts and set up the Trusted Storage Pool by executing the following for each of the other hosts:
gluster peer probe $OTHER_HOSTNAME
There is no need to run this on any of the other hosts; GlusterFS takes care of that.
GlusterFS volume
Execute a variation of the gluster volume create
command, examples below:
1 host, 1 brick, no replication, no distribution
This is an okay start, more bricks can be added later for replication, distribution, or both.
gluster volume create $VOLUME_NAME $HOSTNAME:/mnt/glusterfs/$BRICKNAME/brick
1 host, 2 bricks, no replication, distributed
This results in a volume size equal to the size of the 2 bricks combined.
gluster volume create $VOLUME_NAME $HOSTNAME:/mnt/glusterfs/$BRICKNAME1/brick $HOSTNAME:/mnt/glusterfs/$BRICKNAME2/brick
1 host, 2 bricks, replica 2, no distribution
This results in a volume size equal to the smallest of the 2 bricks.
gluster volume create $VOLUME_NAME replica 2 $HOSTNAME:/mnt/glusterfs/$BRICKNAME1/brick $HOSTNAME:/mnt/glusterfs/$BRICKNAME2/brick
Etcetera
See the Setting up GlusterFS Volumes documentation for more variations.
Spindown
The disks themselves need to be configured to spin down. This can be done using the hdparm utility. These settings are not persisted across reboots; to persist them, see /etc/hdparm.conf.
hdparm -B 127 /dev/disk/by-partlabel/$BRICKNAME
hdparm -S 12 /dev/disk/by-partlabel/$BRICKNAME
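The -S value encodes the idle timeout: values 1-240 are multiples of 5 seconds, so -S 12 spins the disk down after 60 seconds of inactivity, while -B values of 127 or lower permit spindown at all. A small hypothetical helper for timeouts up to 20 minutes:

```shell
# Convert an idle timeout in seconds (5..1200) to an hdparm -S value.
# hdparm -S values 1-240 each represent 5 seconds; longer timeouts use
# a different encoding and are not handled by this sketch.
secs_to_standby() {
  echo $(( $1 / 5 ))
}
secs_to_standby 60   # prints 12, matching the -S 12 above
```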
The GlusterFS volume must also be configured not to perform filesystem health checks, as these periodically touch the bricks and keep the disks awake. This setting seems to only take effect after the volume is restarted.
gluster volume set $VOLUME_NAME storage.health-check-interval 0
gluster volume stop $VOLUME_NAME
gluster volume start $VOLUME_NAME