Qlustar Cluster OS 9.1

Release Notes

Abstract

Qlustar 9.1 is a major feature release of the 9.x series. It ships QluMan 2.1 with many new features and improvements. Highlights among the numerous component updates and bug fixes are Slurm 15.08.5, OpenMPI 1.8.8, ZFS 0.6.5, OFED 3.18-1, Lustre 2.8 and CUDA 7.5.
1. Basic Info
2. New features
2.1. QluMan 2.1
3. Major component updates
4. Other package version updates
5. General changes/improvements
6. Important bug fixes
7. Changelogs
8. Update instructions
9. Release Notes for the 9.1.1 Maintenance Release
9.1. Basic Info
9.2. New features / tools / packages

1. Basic Info

The Qlustar 9.1 core platform is based on Ubuntu 14.04.3 and the additional Debian edge platform on Debian 7.9. They include all security fixes and other package updates published before December 30th 2015. Security updates relevant to Qlustar 9.1 that appear after this date will be announced on the Qlustar website and in the Qlustar security newsletter.

2. New features

2.1. QluMan 2.1

Certificate-based authentication
QluMan users are now authenticated using certificates. The elimination of passwords, combined with the new initial user registration based on one-time tokens, provides the high level of security needed for confidential connections over public networks. More details …
Certificate safe
The certificates used to authenticate a QluMan user are stored in an encrypted, password-protected certificate safe. This safe may contain an unlimited number of connection certificates for different clusters and can easily be managed using the new Connection Manager. More details …
Hardware Wizard
The powerful new HW-Wizard auto-detects hosts that are partially configured, wrongly configured or not configured at all, and guides you through the config steps needed to set them up correctly and optimally. A new type of icon in the Enclosure View marks hosts that need reconfiguration. More details …
New node property display
Assigned host properties and settings are now shown in a transparent tree structure within the Enclosure View. This helps to clearly identify what settings are active for a given host and where in the config hierarchy they come from.
Direct node property assignment
Any property, setting or set thereof can now be directly assigned to nodes. This hugely increases flexibility in node setups.
Copy&Paste host configs
All assignments of a particular host can now be used to set up a group of other nodes. This is the simplest and fastest way to get new nodes up and running.
Improved use of filters in RXEngine
The RXEngine now allows the use of host filters for all command types. This allows for extremely speedy selection of the nodes a command should be executed on.

3. Major component updates

Slurm
Qlustar 9.1 introduces the Slurm 15.08 series with the current version being 15.08.5.
OFED Infiniband stack
Qlustar 9.1 ships the recently released OFED 3.18-1 Infiniband stack including all necessary components.
OpenMPI
Qlustar 9.1 updates the OpenMPI 1.8 series to the most recent version 1.8.8.
Package versions (openmpi-icc) compiled with the most recent Intel compiler suite were also added (this requires a manual installation of the Intel compilers using the base install path /apps/prod/intel/ps_xe_2016).
Lustre
Qlustar 9.1 adds the brand-new Lustre 2.8 release with ZFS as the back-end filesystem.
ZFS Filesystem
Qlustar 9.1 updates ZFS to version 0.6.5.x bringing many improvements and bug-fixes.
Nvidia CUDA
Qlustar 9.1 provides optimal support for Nvidia GPU hardware by supplying pre-compiled and up-to-date kernel drivers as well as CUDA 7.5.
HA stack
Qlustar 9.1 ships corosync 2.3.4, pacemaker 1.1.12 and crmsh 2.1.1 for reliable High-Availability configurations.
Improved Qlustar resource scripts dramatically reduce failover and startup times of HA resources by more than 80%.

4. Other package version updates

  • openblas: 0.2.15
  • openafs: 1.6.11.1

5. General changes/improvements

  • Slurm: The slurmd cgroup RAM limit on compute nodes was increased from 64MB to 128MB. Previously, slurmd sometimes required more than 64MB and consequently crashed.
  • Disk auto-setup: Added a config option to limit the size of the ZFS ARC.
  • Root shell: New fancy prompt.
  • Netboot nodes: Set timezone from head-node's config.
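
These notes do not spell out the Disk auto-setup option for limiting the ZFS ARC. As a rough illustration of what such a limit amounts to on ZFS on Linux in general, the sketch below converts a size in GiB into the byte value expected by the zfs_arc_max kernel module parameter. The helper name arc_max_bytes and the 4 GiB value are hypothetical, and the printed line is a generic modprobe option, not the Qlustar config option itself.

```shell
#!/bin/sh
# Hypothetical helper: convert GiB to bytes for the ZFS on Linux
# module parameter zfs_arc_max (not the Qlustar config option itself).
arc_max_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print a modprobe option line limiting the ARC to 4 GiB (example value).
echo "options zfs zfs_arc_max=$(arc_max_bytes 4)"
```

On a head-node, consult the Qlustar documentation for the actual option name before applying any limit.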

6. Important bug fixes

The boot loader grub used to print the error message "Error: diskfilter writes are not supported" and then hang for a couple of seconds. This is fixed with the upgrade to grub version 2.02~beta2-9ubuntu1.7.

7. Changelogs

A detailed log of changes in the image modules can be found in the directories /usr/share/doc/qlustar-module-<module-name>-*-amd64-9.1.0. For example, the directory /usr/share/doc/qlustar-module-core-precise-amd64-9.1.0 contains a summary changelog in core.changelog, a complete list of the packages and version numbers that make up the current core module in core.packages.version.gz, a complete changelog of the core module's package versions in core.packages.version.gz and a complete log of changed files in core.contents.changelog.gz.
As usual, individual changelogs of packages can be found on an installed head-node in the directories /usr/share/doc/<package name>.
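
The directories described above can be browsed with a few lines of shell. The following is a minimal sketch, assuming the directory layout and the *.packages.version.gz file name given in this section. The function name show_module_docs, its two optional arguments and the five-line preview are illustrative choices, not part of Qlustar.

```shell
#!/bin/sh
# List the doc directories of all image modules for a release and print
# the first lines of each compressed package list.
# Arguments (both optional, hypothetical): doc root and release version.
show_module_docs() {
    doc_root=${1:-/usr/share/doc}
    release=${2:-9.1.0}
    for dir in "$doc_root"/qlustar-module-*-amd64-"$release"; do
        [ -d "$dir" ] || continue
        echo "== $dir"
        for f in "$dir"/*.packages.version.gz; do
            if [ -f "$f" ]; then
                gzip -dc "$f" | head -n 5
            fi
        done
    done
}

show_module_docs
```

Called without arguments on a head-node, it looks under /usr/share/doc for release 9.1.0. On a system without Qlustar modules it prints nothing.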

8. Update instructions

  1. Preparations

    Upgrading to Qlustar 9.1 is only supported from a 9.0.x or a 8.1.x release.

    Note

    When updating from Qlustar 8.1 you'll have to do some further preparation:
    Backup Slurm Config
    Back up your Slurm config file (/etc/qlustar/common/slurm-llnl/slurm.conf) to a safe place before starting. In Qlustar 9.x, the Slurm config is managed by QluMan and needs to be set up from scratch (see the corresponding section in the QluMan guide). Use the settings in your backed-up config file to accomplish this.
    Remove conflicting packages
    Execute the following command as root on the head-node:
    0 root@cl-head ~ # dpkg -l | grep -q kvm-ipxe && dpkg --purge kvm-ipxe
    
    Fix munge startup
    Execute the following commands as root on the head-node:
    0 root@cl-head ~ # cfg=/etc/default/munge
    0 root@cl-head ~ # [ -w "$cfg" ] && echo "OPTIONS=--force" >> "$cfg"
    

    Note

    Make sure that you have no unwritten changes in the QluMan database. If you do, write them to disk as described in the QluMan Guide before proceeding with the update.
  2. Update to 9.1 package sources list

    The Qlustar apt sources list needs to be changed as follows both on the head-node(s) and in all chroot(s) that should be updated.
    0 root@cl-head ~ # apt-get update
    0 root@cl-head ~ # apt-get install qlustar-sources-list-9.1
    
  3. Update packages

    Now proceed as explained in the Qlustar Administration Manual.
  4. Reboot head-node(s)

    Initially only reboot the head-node(s).
  5. Generate a one-time token

    After updating and rebooting the head-node, you'll have to generate a one-time token with which you can authenticate with QluMan.
    0 root@cl-head ~ # qluman-cli --gencert
    
    You'll be prompted for a pin when generating the token. This step is explained elsewhere in more detail.
    With this one-time token, you can proceed to register yourself using the QluMan GUI as explained in the QluMan Guide.
  6. Write config file changes

    The update might have introduced changes in the QluMan database. These need to be written to disk. Check the QluMan Guide about how to write such changes.
  7. Regenerate Qlustar images

    Most likely you will also have to regenerate your Qlustar images with the 9.1 image modules. To accomplish this, you'll have to select Version 9.1 in the QluMan Qlustar Images dialog.
  8. Reboot all netboot node(s)

    After the regeneration of the images is complete, you can reboot all other nodes in the cluster, including virtual FE nodes. This completes the update procedure.
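
The preparation steps for updating from 8.1.x (backing up the Slurm config and fixing the munge startup) can be wrapped into two small functions. This is only a sketch. The function names and the backup directory /root/qlustar-update-backup are hypothetical. Unlike the plain echo command in step 1, the munge helper appends the option only once, so it is safe to run repeatedly.

```shell
#!/bin/sh
# Sketch of the 8.1.x preparation steps with two safety additions:
# a timestamped backup and an idempotent munge option.
# Function names and the default backup directory are hypothetical.

backup_slurm_conf() {
    src=${1:-/etc/qlustar/common/slurm-llnl/slurm.conf}
    dest_dir=${2:-/root/qlustar-update-backup}
    mkdir -p "$dest_dir" || return 1
    # Keep permissions and timestamps; the suffix records the backup time.
    cp -p "$src" "$dest_dir/slurm.conf.$(date +%Y%m%d-%H%M%S)"
}

fix_munge_options() {
    cfg=${1:-/etc/default/munge}
    # Same guard as the original command: do nothing if not writable.
    [ -w "$cfg" ] || return 0
    # Append the option only if this exact line is not already present.
    grep -qxF 'OPTIONS=--force' "$cfg" || echo 'OPTIONS=--force' >> "$cfg"
}
```

Run as root on the head-node, for example with backup_slurm_conf && fix_munge_options, before changing the package sources list.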

9. Release Notes for the 9.1.1 Maintenance Release

Qlustar 9.1.1 is the first maintenance release within the 9.1 series. It includes quite a number of new features and minor component updates. Updating from 9.1.0 works as usual.

9.1. Basic Info

The Qlustar 9.1.1 core platform is based on Ubuntu 14.04.4 and the additional Debian edge platform on Debian 7.9. They include all security fixes and other package updates published before July 15th 2016. Security updates relevant to Qlustar 9.1.1 that appear after this date will be announced on the Qlustar website and in the Qlustar security newsletter.

9.2. New features / tools / packages

9.2.1. QluMan 2.1.1

Slurm GRES Configurator
Slurm generic resources (GRES) are mostly used to assign additional hardware features (e.g. GPUs) to a running job. QluMan now has a configurator that lets you assign and manage GRES on nodes. More details …
GPU Wizard
To make GPU management for use with Slurm really simple, a powerful GPU wizard has been added to QluMan. Auto-detected GPUs on running nodes are correctly set up with just a few mouse clicks. More details …
Message Log Viewer
QluMan now also includes a powerful Log Viewer that lets you inspect important events in the cluster. Messages are categorized by the type of event, when it occurred, which component(s) it involved and how important it was. With the convenient filter editor, the types of messages shown are easily customized. More details …
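
For orientation, this is roughly what GPU GRES definitions look like in plain Slurm. In Qlustar the GRES Configurator and the GPU Wizard write the real configuration, so the sketch below only illustrates the underlying gres.conf format. The function name gres_conf_lines is hypothetical, and the /dev/nvidia<N> device files are assumed to follow the usual Nvidia driver naming.

```shell
#!/bin/sh
# Print gres.conf entries for a node with a given number of Nvidia GPUs,
# assuming the usual /dev/nvidia<N> device files. Illustration only:
# in Qlustar the GRES Configurator writes the real configuration.
gres_conf_lines() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "Name=gpu File=/dev/nvidia$i"
        i=$((i + 1))
    done
}

gres_conf_lines 2
```

In a hand-written Slurm setup, slurm.conf would additionally declare GresTypes=gpu and a Gres=gpu:2 entry on the node definition. A job then requests a GPU with sbatch --gres=gpu:1.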

9.2.2. Commercial compiler support

Two new packages have been created that allow for straightforward installation and integration of the two most popular commercial compilers on Linux clusters:
  • Intel Parallel Studio XE 2016
  • PGI compilers / Tools
Install either compiler on the head-node's NFS-exported filesystem using its standard installation routine. Afterwards, install and configure the corresponding Qlustar package in the UnionFS chroot used for your compute nodes:
  • qlustar-ips-2016 - for Intel Parallel Studio
  • qlustar-pgi-2015 - for PGI compilers / Tools
Integrated OpenMPI packages for the compilers are also available:
  • openmpi-icc - for Intel Parallel Studio
  • openmpi-pgi - for PGI compilers / Tools
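
The package pairs listed above can be captured in a small lookup. The sketch below only prints the install command instead of running it. The function name compiler_packages is hypothetical, /path/to/chroot is a placeholder for your actual UnionFS chroot, and running apt-get through chroot is assumed here as the generic Debian approach, so check the Qlustar Administration Manual for the recommended way to enter a chroot.

```shell
#!/bin/sh
# Map a compiler suite to the Qlustar packages named in this section and
# print the install command for a chroot. The chroot path is a placeholder.
compiler_packages() {
    case "$1" in
        intel) echo "qlustar-ips-2016 openmpi-icc" ;;
        pgi)   echo "qlustar-pgi-2015 openmpi-pgi" ;;
        *)     echo "unknown compiler suite: $1" >&2; return 1 ;;
    esac
}

CHROOT=${CHROOT:-/path/to/chroot}   # hypothetical: set to your UnionFS chroot
pkgs=$(compiler_packages intel) && echo "chroot $CHROOT apt-get install $pkgs"
```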

9.2.3. New utilities

Configurable automatic ZFS snapshot backup
The new package qluman-fs-tools includes the configurable ZFS snapshot backup tool zfs-snap-bak.py with a corresponding configuration file at /etc/qlustar/fs-tools/zfs-snap-bak.conf.
Easy resetting of NIS passwords
The new utility reset-nis-passwd.sh allows for easy resetting of user passwords in NIS, including generating a new random password.

9.2.4. I/O benchmark tools

mdtest HPC benchmark
The new mdtest package, compiled against OpenMPI, provides the parallel metadata benchmark mdtest (1.9.3), which performs open/stat/close operations on files and directories and then reports the performance. It's mostly used to benchmark parallel filesystem performance.
IOR MPI parallel I/O benchmark
The new ior package, compiled against OpenMPI, provides the popular parallel I/O benchmark IOR (3.0.1), which leverages the scalability of MPI to easily and accurately calculate the aggregate bandwidth of a large number of client machines. IOR can utilize the POSIX, MPI-IO, and HDF5 I/O interfaces. It's mostly used to benchmark parallel filesystem performance.
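
As a starting point for both benchmarks, the sketch below builds (but does not run) typical command lines. The function names, the process count, the file counts and sizes, and the directory /lustre/scratch/bench are illustrative assumptions. Adjust them to your cluster and launch the resulting commands through your usual MPI or Slurm workflow.

```shell
#!/bin/sh
# Build (but do not run) example command lines for the two benchmarks.
# Process count, sizes and the target directory are illustrative values.
mdtest_cmd() {
    # $1 = MPI processes, $2 = directory on the filesystem under test
    # -n items per process, -i iterations, -d working directory
    echo "mpirun -np $1 mdtest -n 1000 -i 3 -d $2"
}

ior_cmd() {
    # -a I/O API, -b block size per process, -t transfer size,
    # -F one file per process, -o test file name
    echo "mpirun -np $1 ior -a POSIX -b 1g -t 1m -F -o $2/ior.dat"
}

mdtest_cmd 16 /lustre/scratch/bench
ior_cmd 16 /lustre/scratch/bench
```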

9.2.5. Major component updates

  • Slurm 15.08.12
  • OFED 3.18-2 (includes new libfabric package)
  • Lustre 2.8 final
  • ZFS 0.6.5.7

9.2.6. Changelogs

A detailed log of changes in the image modules can be found in the directories /usr/share/doc/qlustar-module-<module-name>-*-amd64-9.1.1. For example, the directory /usr/share/doc/qlustar-module-core-precise-amd64-9.1.1 contains a summary changelog in core.changelog, a complete list of the packages and version numbers that make up the current core module in core.packages.version.gz, a complete changelog of the core module's package versions in core.packages.version.gz and a complete log of changed files in core.contents.changelog.gz.