Release Notes

1. Basic Info

The Qlustar 11.0 release is based on Ubuntu 18.04. It includes all security fixes and other package updates published before June 13th 2019. Security updates relevant to Qlustar 11.0 that appear after this date will be announced on the Qlustar website and in the Qlustar security newsletter. Supported edge-platforms are Ubuntu 18.04 (Bionic) and CentOS 7 with integration of OpenHPC 1.3.8.

2. New features

2.1. dnsmasq

dnsmasq is now a central Qlustar component. It consolidates three network services that were previously provided by three independent daemons, which significantly reduces complexity on the head-node. In addition, it provides DNS proxy services by default, which had to be configured manually in earlier Qlustar releases.

DHCP

dnsmasq now provides cluster-internal DHCP services replacing the previously used ISC DHCP server software.

TFTP

dnsmasq also acts as a TFTP/PXE boot server making the previously used atftpd obsolete.

Cluster-internal DNS

dnsmasq resolves the names of cluster-internal nodes as well as those of external machines, the latter via its proxy functionality. As a consequence, the NIS host map is no longer needed and has been removed for new installations.

Legacy support

When updating from an earlier release, you have the option to keep the previous DHCP/TFTP setup through a legacy option for now (see below).

2.2. QluMan 11.0

Network Mount Management

QluMan has a new Config Class that allows you to configure network mounts and assign them to nodes. Initially, NFS mounts (including their RDMA variants) are supported. They are set up on the nodes as systemd automount units. QluMan automatically activates NFSoRDMA on clients that support it, with an option to switch back to TCP if needed. It also lets you define priorities for the available networks, so that the optimal network is chosen for a mount when a node has multiple paths to the corresponding NFS server.
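To check which transport an NFS mount actually uses on a node, one can look at the proto= option the kernel reports in /proc/mounts. The helper below is a hypothetical sketch, not a Qlustar tool; it assumes the standard /proc/mounts layout (device, mount point, type, options, ...). Since the mounts are systemd automount units, the NFS entry should only appear after the mount point has been accessed once; before that, only an autofs entry is listed.

```shell
# Hypothetical helper (not shipped with Qlustar): print "<mount point> <transport>"
# for every NFS mount found in /proc/mounts-style input read from stdin.
nfs_transports() {
  awk '$3 == "nfs" || $3 == "nfs4" {
         proto = "unknown"
         n = split($4, opts, ",")
         for (i = 1; i <= n; i++)
           if (opts[i] ~ /^proto=/) proto = substr(opts[i], 7)   # strip "proto="
         print $2, proto
       }'
}

# Usage on a node (the mount point /apps is only an example):
# ls /apps >/dev/null && nfs_transports < /proc/mounts
```

A mount that fell back to TCP would show tcp instead of rdma in the second column; the anchored ^proto= match deliberately ignores the unrelated mountproto= option of NFSv3 mounts.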

DNSmasq Management

The newly introduced dnsmasq service is fully manageable via QluMan. A new dialog lets you add external DNS servers and search domains, names/IPs of machines outside the cluster, and other global network-related settings.

Preview node-specific configs

The context menu of a node in the Enclosure View now includes an entry that allows a preview of all host-specific files/configs that are assigned and sent to a node when booting.

3. Major component updates

Kernel 4.19

Qlustar 11.0 is based on the 4.19 LTS kernel series (Ubuntu only). The new kernel option mitigations=off has been added to QluMan, making it easy to run compute nodes without the performance penalty of CPU security bug mitigations (Spectre etc.).
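Whether a booted node actually picked up the option can be checked on its kernel command line, and the kernel reports the per-vulnerability mitigation status under /sys/devices/system/cpu/vulnerabilities/. A minimal sketch; the function name cmdline_has is made up for illustration:

```shell
# Hypothetical helper: succeed if OPTION appears as a separate word on the kernel
# command line. FILE defaults to /proc/cmdline (overridable for testing only).
cmdline_has() {
  opt=$1
  file=${2:-/proc/cmdline}
  tr ' ' '\n' < "$file" | grep -qxF -- "$opt"
}

# Usage on a compute node: confirm the option, then list the mitigation status.
# cmdline_has mitigations=off && grep -H . /sys/devices/system/cpu/vulnerabilities/*
```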

Slurm

Qlustar 11.0 introduces the Slurm 18.08 series with the current version being 18.08.7.

OpenMPI

Qlustar 11.0 upgrades to OpenMPI 4.0.1, which now uses the UCX transport by default. Furthermore, we added support for multiple gcc versions: a gcc7 flavor based on the Ubuntu default compiler (gcc 7.4.0) and a gcc8 flavor based on gcc 8.3.

Nvidia CUDA

Qlustar 11.0 supports Nvidia GPU hardware with pre-compiled, up-to-date kernel drivers as well as CUDA 10.1.

BeeGFS

Qlustar 11.0 has integrated the most recent BeeGFS release 7.1.3 for clients and servers with ready-to-use image modules.
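After the update, the installed versions can be compared against the ones listed above. The sketch below relies on GNU sort's version ordering (sort -V); the name version_ge is hypothetical, and the parsing in the usage lines assumes the usual first-line formats "slurm 18.08.7" (sinfo --version) and "mpirun (Open MPI) 4.0.1" (mpirun --version).

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Uses GNU sort -V: if $2 sorts first (or both are equal), then $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example checks on an updated node:
# version_ge "$(sinfo --version | awk '{print $2}')" 18.08.7 && echo "Slurm OK"
# version_ge "$(mpirun --version | awk 'NR == 1 {print $NF}')" 4.0.1 && echo "OpenMPI OK"
```

Version ordering matters here because a plain string comparison would rank 18.08.10 below 18.08.7.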

4. Other notable package version updates

  • rdma-core: 21.0 (Ubuntu only; on CentOS, the original RHEL OFED stack is used).

  • Intel/PGI compiler support: The Qlustar wrapper packages have been updated to support Intel Parallel Studio 2019 and PGI Community Edition 2018.4/10 (package qlustar-pgi-dev-tools). Corresponding OpenMPI package variants for these compilers are also provided (both Ubuntu only).

  • Lustre: 2.12.4

  • ZFS: 0.7.13

  • singularity: 3.2.0

  • openblas: 0.3.5

  • hwloc: 2.0.3

  • ucx: 1.5.1

  • libpsm2: 11.2.68

  • OmniPath (OPA) stack: 10.8.0.0.201/2

5. General changes/improvements

The automount daemon and the corresponding NIS maps have been replaced by the new network mount config class provided by QluMan (see above). Legacy setups based on automount/NIS will continue to work and remain supported.
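To see which name service databases on a node or in a chroot still rely on NIS (for example automount or hosts), the active entries of nsswitch.conf can be scanned. This is a hypothetical helper for illustration, not a Qlustar tool:

```shell
# Hypothetical helper: list the nsswitch.conf databases whose source list
# includes "nis". FILE defaults to /etc/nsswitch.conf; comment lines are skipped.
nis_databases() {
  awk '/^[ \t]*#/ { next }
       $1 ~ /:$/ {
         for (i = 2; i <= NF; i++)
           if ($i == "nis") { db = $1; sub(/:$/, "", db); print db; break }
       }' "${1:-/etc/nsswitch.conf}"
}

# Usage: nis_databases                 # checks /etc/nsswitch.conf
```

An empty result means no database consults NIS any more; a legacy setup would typically still list automount here.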

6. Update instructions

  1. Preparations

    Upgrading to Qlustar 11.0 is only supported from a 10.1.x release.

    Make sure that you have no unwritten changes in the QluMan database. If you do, write them to disk as described in the QluMan Guide before proceeding with the update.

  2. Optionally clone chroots

    Clone the existing Ubuntu 16.04 and CentOS7 chroots based on 10.1, then upgrade the clones to 11.0. This allows for easy rollback.

  3. Update to 11.0 package sources list

    The Qlustar apt sources list needs to be changed as follows both on the head-node(s) and in all existing Ubuntu based chroot(s) that should be updated.

    0 root@cl-head ~ #
    apt update
    0 root@cl-head ~ #
    apt install qlustar-sources-list-11.0

    To prepare a CentOS7 based chroot for the upgrade, change into it and execute the following:

    (centos7-11.0) 0 root@cl-head ~ #
    sed -i -e 's/10\.1/11.0/g' /etc/yum.repos.d/qlustar-10.1-centos7.repo

  4. Update packages

    On the head-node execute

    0 root@cl-head ~ #
    apt update
    0 root@cl-head ~ #
    apt dist-upgrade

    When asked whether you want to update the configuration file of a package, answer 'N' (keep the old version) unless you have a specific reason to change it.

    Change into each Ubuntu-based chroot you want to update, e.g.:

    0 root@cl-head ~ #
    chroot-bionic

    and also execute:

    (bionic) 0 root@cl-head ~ #
    apt update
    (bionic) 0 root@cl-head ~ #
    apt dist-upgrade

    Change into each CentOS-based chroot you want to update, e.g.:

    0 root@cl-head ~ #
    chroot-centos7

    and run the same command twice:

    (centos7-11.0) 0 root@cl-head ~ #
    yum update
    (centos7-11.0) 0 root@cl-head ~ #
    yum update

    Confirm the import of the new Qlustar GPG key.

  5. Reboot head-node(s)

    Initially only reboot the head-node(s).

  6. Regenerating Qlustar images

    Regenerate your Qlustar images with the 11.0 image modules. To accomplish this, you’ll have to select Version 11.0 in the QluMan Qlustar Images dialog. If you have new cloned chroots, select those as well.

    If your images include image modules that have a version in their name (e.g. beegfs-6-server), make sure that you change to the corresponding module with the most recent version (e.g. beegfs-7-server).

  7. Migration to GRUB PXE booting

    To enable support for UEFI PXE booting, GRUB is now used by default as the boot loader for both EFI and PC BIOS netboot nodes. To make this work, the new package qlustar-headnode must be installed at this time.

    This step is mandatory; otherwise PXE booting will no longer work.

    If, for some reason, you want to keep booting PC BIOS nodes via the legacy PXELINUX mechanism, you have to install the package qlustar-netboot-compat.

  8. Migration to dnsmasq

    Migration to dnsmasq is not strictly required during this upgrade, but it is highly recommended, and it is mandatory for PXE booting of UEFI nodes. If you want to postpone the migration, you can do so by checking the legacy checkbox in QluMan. In this case, you still have to disable systemd-timesyncd so that ntp works:

    0 root@cl-head ~ #
    service systemd-timesyncd stop
    systemctl disable systemd-timesyncd

    You also have to install the package qlustar-netboot-compat in this case. Afterwards, reboot the head-node once more and proceed with step 9.

    Support for the legacy setup with separate DHCP and TFTP servers will be discontinued starting with the 12.0 release, so you’ll have to do the migration at some point before updating to 12.0. To do so, just follow the remainder of this step at any time.

    To start the migration, first install dnsmasq and disable the old DHCP and TFTP servers as well as some systemd services:

    0 root@cl-head ~ #
    apt install dnsmasq
    0 root@cl-head ~ #
    for s in isc-dhcp-server atftpd systemd-resolved systemd-timesyncd; do
      service $s stop
      systemctl disable $s
    done

    Now add at least one of your external DNS servers in the corresponding QluMan dialog and afterwards write the dnsmasq config.

    This write step is essential. If you forget it, qlumand won’t be able to start after a reboot and you’ll be left with a system in an inconsistent state that needs manual repair.

    Also remove nis from the hosts entry in /etc/nsswitch.conf (the command below resets the entry to files dns):

    0 root@cl-head ~ #
    sed -i -e 's/^hosts:.*/hosts:\t\tfiles dns/g' /etc/nsswitch.conf

    and change the dns-nameservers entry in /etc/network/interfaces to localhost, so that the head-node itself uses dnsmasq for DNS lookups:

    0 root@cl-head ~ #
    sed -i -e 's/  dns-nameservers.*/  dns-nameservers localhost/g' /etc/network/interfaces
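    Before applying in-place edits like these, the effect of a sed expression can be previewed as a unified diff without touching the file. A small hypothetical helper (sed_preview is not a Qlustar command), shown with the interfaces edit:

```shell
# Hypothetical helper: show the changes sed expression EXPR would make to FILE
# as a unified diff, leaving FILE untouched. Like diff, it exits 1 if there
# are changes and 0 if the expression would change nothing.
sed_preview() {
  expr=$1
  file=$2
  sed -e "$expr" "$file" | diff -u "$file" -
}

# Usage:
# sed_preview 's/  dns-nameservers.*/  dns-nameservers localhost/g' /etc/network/interfaces
```

    An empty diff indicates that the pattern did not match, e.g. because the file uses a different indentation than the two spaces assumed by the expression.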

    Finally reboot the head-node once more. Once it is up again, check that dnsmasq is running:

    0 root@cl-head ~ #
    service dnsmasq status
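    Since dnsmasq now serves DNS, DHCP and TFTP, one way to confirm the consolidation is to check that it owns the corresponding UDP ports (53, 67 and 69). The helper below is a hypothetical sketch that filters the output of ss -lunp (run it as root so that process names are shown):

```shell
# Hypothetical helper (not part of Qlustar): read `ss -lunp` output on stdin and
# print the local UDP port numbers of all sockets owned by a dnsmasq process.
dnsmasq_udp_ports() {
  awk '/"dnsmasq"/ {
         # The first field ending in ":<digits>" is the local address:port;
         # the peer column (e.g. 0.0.0.0:*) never matches.
         for (i = 1; i <= NF; i++)
           if ($i ~ /:[0-9]+$/) { sub(/.*:/, "", $i); print $i; break }
       }' | sort -un
}

# Usage on the head-node (as root); after the migration the output is
# expected to include 53, 67 and 69:
# ss -lunp | dnsmasq_udp_ports
```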

    If everything is working as expected, you can remove the now obsolete packages:

    0 root@cl-head ~ #
    apt remove atftpd isc-dhcp-server

  9. Write config file changes

    To activate all remaining changes in the QluMan database that were introduced by the update, they need to be written to disk now. Check the QluMan Guide about how to write such changes.

  10. Reboot all netboot nodes

    After the regeneration of the images is complete, and all the above steps have been done, you can reboot all other nodes in the cluster, including virtual FE nodes. This completes the update procedure.

  11. Adjust root bash shell initialization

    Starting with this release, the Qlustar-supplied bash shell functions and some aliases for root on the head-node have moved to the qlustar-base package and now live under /etc/qlustar. Previously, they were generated on the fly during installation and hence were not easily upgradeable.

    To move to the new scheme, the following modifications have to be made manually with a text editor on the cluster head-node(s). In the file /root/.bashrc replace the content at the end from

    # Source other files
    source_list="/root/.bash_aliases /root/.bash_functions"
    shopt -oq posix || source_list="$source_list /etc/bash_completion"
    for c in $source_list ; do
      [ -r $c ] && . $c ; done; unset c

    to

    # Source misc bash stuff:
    # First global Qlustar aliases/functions (from qlustar-base package)
    source_list="/etc/qlustar/bash.aliases /etc/qlustar/bash.functions"
    # ... then the local equivalents
    source_list="$source_list /root/.bash_aliases /root/.bash_functions"
    # Finally activate additional completions
    shopt -oq posix || source_list="$source_list /etc/bash_completion"
    for c in $source_list ; do
      [ -r $c ] && . $c ; done; unset c

    In the file /root/.bash_aliases delete or comment out the following two lines:

    alias console-fe-vm="connect_screen login-vm"
    alias console-demo-vms="connect_screen demo-system"

    These two aliases are now provided globally and the old ones will not work anymore.
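    To verify the manual edits, a small check can look for the new source line in /root/.bashrc and for leftover definitions of the two aliases. This is a hypothetical sketch (check_shell_init is not part of Qlustar); it only recognizes the edits described above:

```shell
# Hypothetical helper: report leftovers of the pre-11.0 root shell setup.
# HOME_DIR defaults to /root; returns 1 if anything still needs to be changed.
check_shell_init() {
  home=${1:-/root}
  rc=0
  if ! grep -qF '/etc/qlustar/bash.functions' "$home/.bashrc" 2>/dev/null; then
    echo "$home/.bashrc does not source /etc/qlustar/bash.functions"
    rc=1
  fi
  # Commented-out alias lines start with '#' and therefore do not match.
  if grep -Eq '^[[:space:]]*alias (console-fe-vm|console-demo-vms)=' \
       "$home/.bash_aliases" 2>/dev/null; then
    echo "$home/.bash_aliases still defines console-fe-vm/console-demo-vms"
    rc=1
  fi
  return "$rc"
}

# Usage on the head-node: check_shell_init && echo "root shell setup migrated"
```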

7. Changelogs

A detailed log of changes in the image modules can be found in the directories /usr/share/doc/qlustar-module-<module-name>-*-amd64-11.0.0. For example, the directory /usr/share/doc/qlustar-module-core-bionic-amd64-11.0.0 contains a summary changelog in core.changelog, a complete list of packages with version numbers entering the current core module in core.packages.version.gz, a complete changelog of the core module's package versions in core.packages.version.gz and, finally, a complete log of changed files in core.contents.changelog.gz.
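To find out whether a particular file changed in a module, the compressed log of changed files can be searched directly. A hypothetical helper, assuming the <module>.contents.changelog.gz naming described above and a doc directory glob that matches exactly one directory:

```shell
# Hypothetical helper: search the compressed file-change log(s) in one image
# module doc directory DIR for PATTERN (e.g. a path such as /etc/ssh).
module_file_changes() {
  pattern=$1
  dir=$2
  gzip -dc "$dir"/*.contents.changelog.gz | grep -- "$pattern"
}

# Usage (the * stands for the Ubuntu release part of the directory name):
# module_file_changes /etc/ssh /usr/share/doc/qlustar-module-core-*-amd64-11.0.0
```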