
Boot Process

This section describes the boot process of Qlustar cluster-nodes.

Compute-node booting

The boot process of the compute-nodes follows precise rules. It takes place in six steps:

The PXE boot ROM of the network card sends a DHCP request. If the node is already registered in QluMan, the request is answered by the DHCP server running on the head-node(s), allowing the adapter to configure its basic IP settings.
The boot ROM requests a PXE loader program from the TFTP server on the head-node (the TFTP server specified by DHCP could also be on another node, but this is not the default). The PXE loader is then sent to the compute-node via TFTP.
PXELinux downloads the Qlustar Linux kernel and the assigned initial RAM-disk image and boots the kernel. This image doesn’t hold the final OS. It has just enough functionality to download the real OS image in the next step.

A Qlustar specific script /init is executed as the initial init process. This script sets up the unionFS filesystem structure as well as basic networking functionality and finally starts the Qlustar multicast client to download the real OS image assigned to the node and delivered by the Qlustar multicast image server ql-mcastd. This image is unpacked into the path /union/image.

The system then moves to the next boot stage by changing the root filesystem to the newly unpacked OS image. Control is now passed to the second Qlustar init script /sbin/init.qlustar.

/sbin/init.qlustar first executes systemd-udevd to trigger the auto-loading of kernel drivers and then starts QluMan execd in a one-shot configure mode. In this mode, QluMan execd a) receives all the node-specific options from qlumand running on the head-node and b) executes the corresponding scripts to process the options received.

This dynamic customization/configuration of the node must be done before systemd starts. Among others, the following tasks are performed at this stage: initialization of local disks, setup of systemd units for QluMan-defined Network FS mounts, pam customization, sssd customization, OpenSM configuration (if desired) and a check whether NTP can synchronize the time from the head-node(s).

Log files concerning this boot phase are located under /var/log/qlustar. Adding the parameter early-shell to the kernel command line interrupts the boot process when entering /sbin/init.qlustar and gives you the opportunity to debug possible problems. The kernel parameter debug produces lots of debug messages during this phase.

When all of the above is finished, control is finally passed to systemd as the final init process. From here on, the boot procedure proceeds in the standard Linux fashion.
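To check on a running node whether it was booted with one of the debugging parameters mentioned above, you can inspect the kernel command line. The following is a minimal sketch, not part of Qlustar: the function name has_kernel_param is hypothetical, and it only assumes that the parameters appear as plain words in /proc/cmdline.

```shell
# Return success if the kernel parameter NAME (optionally NAME=value)
# appears on the kernel command line.
# Usage: has_kernel_param NAME [CMDLINE]
# CMDLINE defaults to the contents of /proc/cmdline.
has_kernel_param() {
    name=$1
    cmdline=${2-$(cat /proc/cmdline)}
    # Pad with spaces so that only whole words match.
    case " $cmdline " in
        *" $name "*|*" $name="*) return 0 ;;
    esac
    return 1
}
```

For example, has_kernel_param early-shell succeeds only on a node whose boot configuration added early-shell to the kernel command line.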

The details of the node boot process have changed substantially while going from Qlustar 9.2 to 10.0, then 10.1 and finally 11.0. So make sure you check the documentation of the release you’re running to obtain correct information.

TFTP Boot Server

The TFTP server component of dnsmasq transfers the boot image to the compute-nodes. All files that should be served via TFTP must reside in the directory /var/lib/tftpboot. On a Qlustar installation, it contains three symbolic links:

pxelinux.0 -> /usr/lib/syslinux/pxelinux.0
pxelinux.cfg -> /etc/qlustar/pxelinux.cfg
qlustar -> /var/lib/qlustar

The directory /etc/qlustar/pxelinux.cfg contains the PXE boot configuration files for the compute-nodes. There is a default configuration that applies to any node without an assigned custom boot configuration in QluMan. For every host with a custom boot configuration, QluMan adds a symbolic link pointing to the actual configuration file. The links are named after the node’s Hostid, which you can find out with the gethostip command. For more details about how to define boot configurations see the corresponding section of the QluMan Guide.
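As an illustration of the link naming, the sketch below converts an IPv4 address into eight uppercase hexadecimal digits. It assumes that the Hostid is this hexadecimal form of the node's IP address, as printed by gethostip -x; the function name ip_to_hex is hypothetical and only serves as an example.

```shell
# Convert a dotted-quad IPv4 address to 8 uppercase hex digits.
# Usage: ip_to_hex A.B.C.D
ip_to_hex() {
    old_ifs=$IFS
    IFS=.
    # Split the address into its four octets.
    set -- $1
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}
```

Under this assumption, a node with the address 192.168.52.10 would correspond to the name C0A8340A.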

RAM-disk OS images

The squashfs-based RAM-disk image is the file-system holding the node OS that is mounted as the root filesystem of the compute-nodes. It is assembled on the head-node(s) from the image modules you select in QluMan. Every RAM-disk image contains at least the core module. See the corresponding section of the QluMan Guide for more details. All available image modules are displayed and selectable in QluMan, and the configuration and assembly of images is done automatically from within QluMan.

By default, the root password of a Qlustar OS image, and hence of the node booting it, is taken from the /etc/shadow file of the head-node(s) and is therefore the same as on the head-node(s). To change this, call qlustar-image-reconfigure <image-name> (replacing <image-name> with the actual name of the image). You can then specify a different root password for the node OS image.

Changelogs: Every Qlustar node OS image contains changelogs of the image modules it is composed of. They are located in the directory /usr/share/doc/qlustar-image. The main changelog file is core.changelog.gz. The other files are automatically generated. The files .packages.version.gz list the packages each module is made of. The files .contents.changelog*.gz list the files that were changed between each version, and the files .packages.changelog.gz list differences in the package list and versions. Hence, you always have detailed information about what has been changed in new images as well as the package sources of their content.
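Since the changelogs are gzip-compressed, they need to be decompressed for reading. A minimal sketch follows; the helper name show_changelog is hypothetical, and it assumes only the standard gzip and head tools.

```shell
# Print the first lines of a gzip-compressed changelog.
# Usage: show_changelog FILE [LINES]   (LINES defaults to 20)
show_changelog() {
    gzip -dc -- "$1" | head -n "${2:-20}"
}
```

On a node, show_changelog /usr/share/doc/qlustar-image/core.changelog.gz would then display the beginning of the main changelog.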

Static node OS image customization/modification

Node OS images are regenerated automatically when the image module packages they are based on are updated. This means that files can’t simply be modified in or added to a generated image, as the changes would be lost on the next update.

Qlustar therefore provides a mechanism to add extra files to images every time they are rebuilt and hence make changes permanent. Files can be added to all images or only to one specific image using the qlustar-image-edit tool. All the commands in this section must be executed as root on the head-node.

To modify or add the file /some/path/filename to all images execute:

0 root@cl-head ~ # qlustar-image-edit -e /some/path/filename

To modify/add the file to a specific image <img>:

0 root@cl-head ~ # qlustar-image-edit -e img /some/path/filename

To edit the file again later, simply run the same command again.

Files created this way will be located underneath the path /etc/qlustar/images on the head-node, either in the sub-directory common (for files entering all images) or in the sub-directory img (for files entering just the image img).

The whole directory structure of a file is created there so the full path of the above examples would be /etc/qlustar/images/common/copy/some/path/filename or /etc/qlustar/images/img/copy/some/path/filename respectively. To undo adding such files to the images, simply remove these files.
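The path mapping described above can be sketched as a small helper. The function name image_overlay_path is hypothetical; the directory layout is taken from the example paths in this section.

```shell
# Print where a persistent file is stored on the head-node.
# Usage: image_overlay_path TARGET /absolute/path
# TARGET is either "common" (all images) or the name of one image.
image_overlay_path() {
    target=$1
    file=$2
    case $file in
        /*) ;;
        *) echo "path must be absolute: $file" >&2; return 1 ;;
    esac
    printf '%s\n' "/etc/qlustar/images/$target/copy$file"
}
```

For example, image_overlay_path common /some/path/filename prints /etc/qlustar/images/common/copy/some/path/filename, which is the file to remove if the change should be undone.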

A second mode of qlustar-image-edit is to directly edit the generated images. Such changes are always temporary, meaning they will be overwritten by image module updates. This method is suitable for testing or for applying a quick fix for a problem that is known to be solved in subsequent image module versions.

To edit the initial RAM-disk of the image img execute:

0 root@cl-head ~ # qlustar-image-edit -i img

To edit the squashfs OS image do:

0 root@cl-head ~ # qlustar-image-edit -s img

In both cases, you will be placed into the root directory of the corresponding initrd/image. You can then manipulate any file in the initrd/image or add new ones to it. When done, enter exit and the initrd/image will be regenerated.

By manipulating the image in this way, you can easily break things and in the worst case make the OS unbootable. Please be aware that you’re on your own if you choose to experiment with the above methods. In other words: The Qlustar team won’t be able to give support for problems arising from a modified OS initrd/image.

QluMan Remote Execution Server

The QluMan execution server (qluman-execd) runs on every head-node and compute-node of a Qlustar cluster. It is one of Qlustar’s main components, responsible for executing remote commands as well as writing configurations to disk.

Dynamic Node Customization/Configuration

When a compute-node boots, qluman-execd initially starts in a one-shot fashion (starts and exits when done with its configuration tasks) during the pre-systemd boot phase (see Compute-node booting for details). At this stage, it performs a number of initialization/configuration tasks depending on the node’s QluMan configuration options. Generated option files are written under /etc/qlustar/options.d. The following is a list of these tasks:

Network configuration: Configuration of all network parameters in the corresponding configuration files, so that they can be activated later on by systemd. If the node boots Ubuntu or Debian, the information is written to /etc/network/interfaces.d/qluman. For CentOS, it is written to adapter-specific files under /etc/sysconfig/network-scripts.
Disk configuration: Writing of the host’s QluMan defined disk configuration into the file /etc/qlustar/options.d/disk-config for later use by the disk initialization script.
Setup of Network FS mounts: Writing of systemd (auto)mount unit files according to the Network FS mounts config the host is assigned to in QluMan.
Infiniband OpenSM activation: Activation of OpenSM in case the node is configured to run it.
IPMI IP configuration: Reconfiguration of the node’s IPMI address, if a node is configured correspondingly within QluMan.
UnionFS chroot: If configured in QluMan, a custom unionFS chroot will be set up rather than the one that is defined in the Qlustar image.
SSH authorized_keys: The ssh keys that are configured in QluMan to allow password-less login to the node as root are copied into /root/.ssh/authorized_keys.
Miscellaneous: If configured in QluMan, mail transport will be activated on the node and ssh access for normal users will be limited to those users having a running slurm or torque job on the node by making changes to the pam config.

For details about the configuration of the above components, see the corresponding section of the QluMan Guide.
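To see which of the files named in the list above were actually written on a node, a simple existence check is often enough. The following sketch is only an illustration: the function name list_generated_files is hypothetical, and the selection of paths is limited to those mentioned in this section (the interfaces file applies to Ubuntu/Debian nodes only).

```shell
# Report which of the generated files exist below ROOT.
# Usage: list_generated_files [ROOT]
# ROOT defaults to the real filesystem root; another prefix is handy for testing.
list_generated_files() {
    root=${1:-}
    for f in \
        /etc/qlustar/options.d/disk-config \
        /etc/network/interfaces.d/qluman \
        /root/.ssh/authorized_keys
    do
        if [ -e "$root$f" ]; then
            printf 'present  %s\n' "$f"
        else
            printf 'missing  %s\n' "$f"
        fi
    done
}
```

Whether a file is expected to be present depends on the node's QluMan configuration, so a missing entry is not necessarily an error.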