Boot Process

This section describes the boot process of Qlustar cluster-nodes.

Compute-node booting

The boot process of the compute-nodes follows a fixed sequence of six steps:

  1. The PXE boot ROM of the network card sends a DHCP request. If the node is already registered in QluMan, the request is answered by the DHCP server running on the head-node(s), allowing the adapter to configure its basic IP settings.

  2. The boot ROM requests a PXE loader program (PXELinux) from the TFTP server on the head-node, which then sends it to the compute-node via TFTP. The TFTP server specified by DHCP could also reside on another node, but this is not the default.

  3. PXELinux downloads the Qlustar Linux kernel and the assigned initial RAM-disk image and boots the kernel. This image does not hold the final OS; it only has enough functionality to download the real OS image in the next step.

  4. A Qlustar-specific script, /init, is executed as the initial init process. It sets up basic networking for the boot NIC and starts the Qlustar multicast client ql-mcast-client to download the real OS image assigned to the node. To do so, the client connects to the Qlustar multicast image server ql-mcastd on the head-node and requests a multicast IP address and port. The OS image is then streamed to all nodes that requested the same image at that time. Should multicast fail, a slower unicast fallback is used to download the OS image.

    After the real OS image has been downloaded and its checksum verified, a unionfs filesystem structure is set up under /union. The image is unpacked into /union/image as the first component of the union. A tmpfs that stores runtime changes made to the image filesystem is created as the second component, and a third, empty component is reserved for an optional chroot to be added later.

    Finally, the system moves to the next boot stage by changing the root filesystem to the newly created unionfs. Control is then passed to the second Qlustar init script, /sbin/init.qlustar.

  5. /sbin/init.qlustar first executes systemd-udevd to trigger the auto-loading of the full set of kernel drivers and then starts QluMan execd in a one-shot configure mode. In this mode, execd a) receives all node-specific options from the head-node’s qlumand and b) executes the corresponding scripts to process the options received.

    This dynamic customization/configuration of the node must be done before systemd starts. The tasks performed at this stage include: setup of systemd units for QluMan-defined Network FS mounts, Root FS Customization, synchronization of the system time with the head-node(s), pam/sssd customization, and the optional enabling of NIS and configuration of OpenSM. Finally, if configured and present, local disks are initialized/mounted before any Root FS Customization scripts transferred to /lib/qlustar/init.d are run.

  6. When all the above is finished, control is finally passed to systemd as the final init process. From here on, the boot procedure continues in the standard Linux fashion.

Log files concerning the Qlustar-specific boot phase are located under /var/log/qlustar. Adding the parameter early-shell to the kernel command line interrupts the boot process when /sbin/init.qlustar is entered, giving you the opportunity to debug possible problems. The kernel parameter debug produces verbose debug output during this phase.
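To confirm on a booted node that a parameter such as early-shell or debug actually reached the kernel, you can inspect /proc/cmdline. The helper below is a hypothetical sketch (has_kernel_param is not part of Qlustar); it matches whole whitespace-separated tokens, either bare or in NAME=value form, so early-shell does not accidentally match a longer parameter name.

```shell
#!/usr/bin/env bash
# has_kernel_param NAME [CMDLINE]
# Hypothetical helper (not shipped with Qlustar): succeeds if NAME appears as a
# whole token, bare ("debug") or with a value ("debug=1"), in CMDLINE.
# CMDLINE defaults to the running kernel's command line from /proc/cmdline.
has_kernel_param() {
  local name=$1 tok
  local cmdline=${2-$(cat /proc/cmdline)}
  local -a toks
  # Split on whitespace without glob expansion of the tokens.
  read -r -a toks <<< "$cmdline"
  for tok in "${toks[@]}"; do
    case $tok in
      "$name" | "$name"=*) return 0 ;;
    esac
  done
  return 1
}
```

For example, `has_kernel_param early-shell && echo "early-shell is set"` run on the node checks the live command line.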

TFTP Boot Server

The TFTP server component of dnsmasq transfers the boot files (PXE loader, kernel and initial RAM-disk) to the compute-nodes. All files to be served via TFTP must reside in the directory /var/lib/tftpboot. On a Qlustar installation, it contains three symbolic links:

pxelinux.0 -> /usr/lib/syslinux/pxelinux.0
pxelinux.cfg -> /etc/qlustar/pxelinux.cfg
qlustar -> /var/lib/qlustar
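The three links can be checked, or recreated after an accidental deletion, with standard tools. The function below is a sketch; the name ensure_tftp_links is hypothetical, and the TFTP root is taken as an argument so the function can be tried on a scratch directory before touching /var/lib/tftpboot.

```shell
#!/usr/bin/env bash
# ensure_tftp_links [TFTP_ROOT]
# Hypothetical helper: makes sure the three Qlustar symlinks exist in the
# TFTP root (default /var/lib/tftpboot) and point to the documented targets.
# Prints one line per link it had to fix; prints nothing if all were correct.
ensure_tftp_links() {
  local root=${1:-/var/lib/tftpboot}
  local -A targets=(
    [pxelinux.0]=/usr/lib/syslinux/pxelinux.0
    [pxelinux.cfg]=/etc/qlustar/pxelinux.cfg
    [qlustar]=/var/lib/qlustar
  )
  local name current
  for name in "${!targets[@]}"; do
    current=$(readlink "$root/$name" 2>/dev/null)
    if [ "$current" != "${targets[$name]}" ]; then
      # -n: replace an existing link itself instead of descending into its target
      ln -sfn "${targets[$name]}" "$root/$name" || return 1
      echo "fixed: $name -> ${targets[$name]}"
    fi
  done
}
```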

The directory /etc/qlustar/pxelinux.cfg contains the PXE boot configuration files for the compute-nodes. A default configuration applies to any node without a custom boot configuration assigned in QluMan. For every host with a custom boot configuration, QluMan adds a symbolic link pointing to the actual configuration file. The links are named after the node’s Hostid (its IP address in hexadecimal notation, as expected by PXELinux), which you can find out with the gethostip command. For details about how to define boot configurations, see the corresponding section of the QluMan Guide.
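The Hostid is the node's IPv4 address written as eight upper-case hexadecimal digits. PXELinux looks for a file of that name in pxelinux.cfg, then for names shortened by one digit at a time, and finally falls back to default (it tries UUID- and MAC-based names before these, which the sketch omits). The two functions below are hypothetical helpers: the first reproduces what gethostip -x prints for an address, the second lists the IP-derived names in the order PXELinux tries them.

```shell
#!/usr/bin/env bash
# ip_to_hostid A.B.C.D
# Prints the IPv4 address as 8 upper-case hex digits, the same form that
# `gethostip -x` outputs (helper name is hypothetical).
ip_to_hostid() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  [ "${#o[@]}" -eq 4 ] || return 1
  printf '%02X%02X%02X%02X\n' "${o[0]}" "${o[1]}" "${o[2]}" "${o[3]}"
}

# pxelinux_candidates HOSTID
# Lists the IP-derived config names PXELinux tries in pxelinux.cfg, in order:
# the full Hostid, then one hex digit shorter each time, then "default".
pxelinux_candidates() {
  local id=$1
  while [ -n "$id" ]; do
    echo "$id"
    id=${id%?}   # drop the last hex digit
  done
  echo default
}
```

For instance, ip_to_hostid 192.168.52.10 prints C0A8340A, so a custom boot configuration for that node would be linked as /etc/qlustar/pxelinux.cfg/C0A8340A.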

RAM-disk OS images

The squashfs-based RAM-disk image is the filesystem holding the node OS that is mounted as the root filesystem of the compute-nodes. It is assembled on the head-node(s) from the image modules selected in QluMan, and every RAM-disk image contains at least the core module. All available image modules are displayed and selectable in QluMan, and the configuration and assembly of images is done automatically from within QluMan. See the corresponding section of the QluMan Guide for more details.

By default, the root password of a Qlustar OS image, and hence of the nodes booting it, is taken from the /etc/shadow file of the head-node(s) and is therefore the same as on the head-node(s). To change this, run qlustar-image-reconfigure <image-name>, replacing <image-name> with the actual name of the image. You can then specify a different root password for the node OS image.

Changelogs

Every Qlustar node OS image contains the changelogs of the image modules it is composed of. They are located in the directory /usr/share/doc/qlustar-image. The main changelog file is core.changelog.gz; the other files are generated automatically. The files ending in .packages.version.gz list the packages each module is made of, the .contents.changelog*.gz files list the files that changed between versions, and the .packages.changelog.gz files list differences in the package lists and versions. Hence, you always have detailed information about what has changed in new images as well as about the package sources of their content.
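Since all these files are gzip-compressed, they can be read with zcat or zless. The sketch below lists the modules an image is built from by looking for the per-module files ending in .packages.version.gz; the function name image_modules is hypothetical, and the directory can be passed as an argument to try it elsewhere.

```shell
#!/usr/bin/env bash
# image_modules [DOC_DIR]
# Hypothetical helper: prints the module name for every file ending in
# .packages.version.gz, i.e. the modules the image is composed of.
image_modules() {
  local dir=${1:-/usr/share/doc/qlustar-image}
  local f
  for f in "$dir"/*.packages.version.gz; do
    [ -e "$f" ] || continue          # no match: the glob stays literal
    f=${f##*/}                       # strip the directory part
    echo "${f%.packages.version.gz}"
  done
}
```

On a node, `image_modules` shows the module list, and `zless /usr/share/doc/qlustar-image/core.changelog.gz` opens the main changelog.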

Static node OS image customization/modification

Node OS images are regenerated automatically whenever the image module packages they are based on are updated. This means that files cannot simply be modified in or added to a generated image, as the changes would be lost on the next update.

Qlustar therefore provides a mechanism to add extra files to images every time they are rebuilt, making such changes permanent. Using the qlustar-image-edit tool, files can be added to all images or to one specific image only. All commands in this section must be executed as root on the head-node.

To modify or add the file /some/path/filename to all images execute:

0 root@cl-head ~ #
qlustar-image-edit -e /some/path/filename

To modify/add the file to a specific image <img>:

0 root@cl-head ~ #
qlustar-image-edit -e img /some/path/filename

To edit the file again later, simply run the same command again.

Files created this way will be located underneath the path /etc/qlustar/images on the head-node, either in the sub-directory common (for files entering all images) or in the sub-directory img (for files entering just the image img).

The file's whole directory structure is recreated there, so the full paths for the above examples would be /etc/qlustar/images/common/copy/some/path/filename and /etc/qlustar/images/img/copy/some/path/filename respectively. To stop adding such files to the images, simply remove them.
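The mapping from a file path to its location on the head-node can be written down as a small helper. image_copy_path is a hypothetical name; it only encodes the /etc/qlustar/images/<common|img>/copy/... layout described above and does not call qlustar-image-edit.

```shell
#!/usr/bin/env bash
# image_copy_path TARGET FILE
# Hypothetical helper: prints where a file added with `qlustar-image-edit -e`
# is kept on the head-node. TARGET is "common" (all images) or an image name.
image_copy_path() {
  local target=$1 file=$2
  printf '/etc/qlustar/images/%s/copy/%s\n' "$target" "${file#/}"
}
```

To undo an addition for the image img, for example: `rm "$(image_copy_path img /some/path/filename)"`.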

A second mode of qlustar-image-edit is to directly edit the generated images. Such changes are always temporary, meaning they will be overwritten by the next image module update. This method is suitable for testing or for applying a quick fix for a problem that is known to be solved in subsequent image module versions.

To edit the initial RAM-disk of the image img execute:

0 root@cl-head ~ #
qlustar-image-edit -i img

To edit the squashfs OS image do:

0 root@cl-head ~ #
qlustar-image-edit -s img

In both cases, you will be placed into the root directory of the corresponding initrd/image. You can then modify any file in the initrd/image or add new ones. When done, enter exit and the initrd/image will be regenerated. Alternatively, enter exit 1 to abort and discard all modifications.
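The commit-or-abort behavior keyed on the shell's exit status can be illustrated with a generic sketch. edit_tree is a hypothetical helper that works on a plain directory tree: it runs a command (normally an interactive shell) in a scratch copy and keeps the changes only if that command exits with status 0. It does not unpack or rebuild initrd/squashfs images and is not how qlustar-image-edit is implemented.

```shell
#!/usr/bin/env bash
# edit_tree DIR COMMAND [ARGS...]
# Hypothetical sketch of the "edit, then exit to commit / exit 1 to abort"
# pattern on a plain directory. NOT the implementation of qlustar-image-edit.
edit_tree() {
  local tree=$1; shift
  local work
  work=$(mktemp -d) || return 2
  cp -a "$tree/." "$work/" || { rm -rf "$work"; return 2; }
  # Run the editing command inside the scratch copy.
  if (cd "$work" && "$@"); then
    # Exit status 0: keep the modifications (the real tool regenerates the image here).
    rm -rf "$tree" && mv "$work" "$tree" || return 2
    echo committed
  else
    # Non-zero status, e.g. "exit 1": throw the scratch copy away.
    rm -rf "$work"
    echo aborted
  fi
}
```

Interactively, `edit_tree ./some-tree bash` mirrors the workflow: change files, then enter exit to keep them or exit 1 to discard them.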

By manipulating the image in this way, you can easily break things and, in the worst case, make the OS unbootable. Please be aware that you are on your own if you choose to experiment with the above methods. In other words, the Qlustar team cannot provide support for problems arising from a modified OS initrd/image. You can reset the initramfs and squashfs images to their original content using qlustar-image-reconfigure <name of image>.

Dynamic node OS image customization/modification

Rebuilding images is time-consuming, and changes made to an image apply to all nodes using that image. This makes customizations somewhat inflexible. To improve on this, Qlustar provides another mechanism to customize/modify node OS images. It is applied during the pre-systemd boot phase and allows node-specific customizations that are assigned via QluMan.

This is implemented by the Root FS Customization config class in qluman-qt. For details about how to create such a config class and how to assign it to nodes see the corresponding section of the QluMan Guide.

The files and directory structure for the Root FS Customization is stored below /var/lib/qlustar/root-fs/<custom> on the head-node(s) where <custom> is the name of the config as defined in QluMan. The qlustar-image-edit tool provides shortcuts to create/edit or delete files. More complex operations like changing file ownership or permissions must be done from the shell directly using the full path.

To create or edit the file /some/path/filename for a Root FS Customization config named custom execute:

0 root@cl-head ~ #
qlustar-image-edit -r -e custom /some/path/filename

This opens the file via sensible-editor, which honors your VISUAL and EDITOR settings and otherwise falls back to the system default editor.
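The order in which these settings are consulted matters when both are set. The function below is a rough, hypothetical approximation of the choice Debian's sensible-editor makes: VISUAL first, then EDITOR, otherwise the system default editor command. The real script has further fallbacks not modelled here.

```shell
#!/usr/bin/env bash
# pick_editor
# Hypothetical approximation of sensible-editor's choice: $VISUAL first,
# then $EDITOR, otherwise the system default "editor" command.
pick_editor() {
  if [ -n "${VISUAL:-}" ]; then
    echo "$VISUAL"
  elif [ -n "${EDITOR:-}" ]; then
    echo "$EDITOR"
  else
    echo editor
  fi
}
```

Assuming qlustar-image-edit passes its environment on to sensible-editor, as child processes normally inherit it, a one-off choice is possible with `VISUAL=nano qlustar-image-edit -r -e custom /some/path/filename`.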

To delete the file execute:

0 root@cl-head ~ #
qlustar-image-edit -r -d custom /some/path/filename
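Ownership and permissions have to be set on the full path below /var/lib/qlustar/root-fs/<custom>, and since these attributes are preserved when the files are sent to the node, what you set here is what the node sees. set_rootfs_perms is a hypothetical wrapper around chmod/chown; the ROOTFS_BASE override exists only so the function can be tried outside the real directory.

```shell
#!/usr/bin/env bash
# set_rootfs_perms CONFIG FILE MODE [OWNER]
# Hypothetical helper: applies MODE (and optionally OWNER, e.g. root:root) to
# FILE inside the Root FS Customization CONFIG, using its full path on the
# head-node. ROOTFS_BASE defaults to /var/lib/qlustar/root-fs.
set_rootfs_perms() {
  local config=$1 file=$2 mode=$3 owner=${4:-}
  local path="${ROOTFS_BASE:-/var/lib/qlustar/root-fs}/$config/${file#/}"
  [ -e "$path" ] || { echo "no such file: $path" >&2; return 1; }
  chmod "$mode" "$path" || return 1
  if [ -n "$owner" ]; then
    chown "$owner" "$path" || return 1
  fi
}
```

The direct equivalent for the example above is `chmod 0755 /var/lib/qlustar/root-fs/custom/some/path/filename`.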

Run-time node customization/configuration via QluMan execd

The QluMan execution server qluman-execd runs on every node of a Qlustar cluster. It is one of Qlustar’s main components and is responsible for executing remote commands, writing configurations to disk, and monitoring.

When a compute-node boots, qluman-execd initially runs in one-shot mode (it starts and exits once its configuration tasks are done) during the pre-systemd boot phase (see Compute-node booting for details). At this stage, it performs a number of initialization/configuration tasks depending on the node’s configuration settings defined in QluMan. Generated option files are written under /etc/qlustar/options.d. These tasks are:

Network configuration

Writing of all network parameters to the corresponding configuration files, so that they can be activated later by systemd. The information is written to /etc/network/interfaces.d/qluman (Ubuntu nodes) or to adapter-specific files under /etc/sysconfig/network-scripts (CentOS nodes).
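When checking what execd generated on a node, the location to look at depends on the distribution. The hypothetical helper below picks it from the ID field of /etc/os-release; only the two distributions named above are handled.

```shell
#!/usr/bin/env bash
# net_config_location [OS_RELEASE_FILE]
# Hypothetical helper: prints where the QluMan-generated network configuration
# is expected on a node, based on the distribution ID in os-release.
net_config_location() {
  local file=${1:-/etc/os-release} id
  # os-release is shell-compatible; source it in a subshell to read ID.
  id=$(. "$file" && echo "${ID:-}")
  case $id in
    ubuntu) echo /etc/network/interfaces.d/qluman ;;
    centos) echo /etc/sysconfig/network-scripts ;;
    *) echo "unsupported distribution: ${id:-unknown}" >&2; return 1 ;;
  esac
}
```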

Disk configuration

Writing of the host’s QluMan defined disk configuration into the file /etc/qlustar/options.d/disk-config for later use by the disk initialization script.

Setup of Network FS mounts

Writing of systemd (auto)mount unit files according to the Network FS mounts config assigned to the host in QluMan.
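systemd requires a mount or automount unit to be named after its mount point, so the units for a mount point such as /apps/data are apps-data.mount and apps-data.automount (the path is a made-up example). The canonical way to derive the name is `systemd-escape --path --suffix=mount /apps/data`. The pure-bash function below is a simplified, hypothetical stand-in that only accepts paths made of letters, digits and underscores, which need no escaping.

```shell
#!/usr/bin/env bash
# mount_unit_name PATH [SUFFIX]
# Hypothetical, simplified version of `systemd-escape --path --suffix=SUFFIX`.
# Only handles paths whose components consist of [A-Za-z0-9_]; anything
# else (e.g. "-" or ".") needs real escaping, so the function refuses it.
mount_unit_name() {
  local path=$1 suffix=${2:-mount}
  local re='^[A-Za-z0-9_]+(/[A-Za-z0-9_]+)*$'
  path=${path%/}
  path=${path#/}
  if [ -z "$path" ]; then
    echo "-.$suffix"          # the root directory maps to "-"
    return 0
  fi
  if [[ ! $path =~ $re ]]; then
    echo "needs escaping, use systemd-escape: $1" >&2
    return 1
  fi
  echo "${path//\//-}.$suffix"   # remaining slashes become dashes
}
```

On a booted node, `systemctl list-units --type=mount,automount` shows the units that are actually active.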

Infiniband OpenSM activation

Activation of OpenSM in case the node is configured to run it.

IPMI IP configuration

Reconfiguration of the node’s IPMI address, if activated for the node in QluMan.

UnionFS chroot

An optionally assigned custom unionFS chroot will be set up instead of the one defined in the Qlustar image.

SSH authorized_keys

The SSH keys configured in QluMan to allow password-less login to the node as root are copied into /root/.ssh/authorized_keys.

Root FS Customization

If one or more Root FS Customization configs are assigned to a node, the corresponding directory structure(s) under /var/lib/qlustar/root-fs/<custom> are sent to the node, preserving the user, group and permissions of each file or directory. Any files ending up in /lib/qlustar/init.d are executed in alphanumeric order at the end of the second-stage boot script /sbin/init.qlustar, just before control is handed over to systemd.
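The ordering rule means that numeric prefixes such as 10- and 20- control the sequence of custom scripts. The sketch below is a hypothetical illustration of running a directory of scripts in that order; it is not the code of /sbin/init.qlustar. It assumes byte-wise (C locale) sorting and, as a further assumption, skips files that are not executable, so giving your scripts mode 0755 is the safe choice.

```shell
#!/usr/bin/env bash
# run_init_scripts [DIR]
# Hypothetical sketch: runs the regular, executable files in DIR
# (default /lib/qlustar/init.d) in byte-wise sorted order.
# Not the actual implementation used by /sbin/init.qlustar.
run_init_scripts() {
  local dir=${1:-/lib/qlustar/init.d} f
  while IFS= read -r -d '' f; do
    [ -f "$f" ] || continue          # also skips the literal pattern of an empty DIR
    if [ -x "$f" ]; then
      "$f" || echo "warning: $f exited with status $?" >&2
    else
      echo "skipping non-executable file $f" >&2
    fi
  done < <(printf '%s\0' "$dir"/* | LC_ALL=C sort -z)
}
```

On the head-node, such a script could be created with `qlustar-image-edit -r -e custom /lib/qlustar/init.d/50-example` (the name 50-example is made up) and then made executable with `chmod 0755 /var/lib/qlustar/root-fs/custom/lib/qlustar/init.d/50-example`.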

Miscellaneous

If configured in QluMan, mail transport is activated on the node, and SSH access for normal users is restricted to those with a running Slurm job on the node (via changes to the PAM configuration).

For details about the configuration of the above components, see the corresponding sections of the QluMan Guide.

All files being written to a node by qluman-execd in the pre-systemd boot phase can be previewed in the QluMan GUI.