
Qlustar Cluster OS 9.2

Release Notes

Abstract

Qlustar 9.2 is a major feature release of the 9.x series. It ships QluMan 3.0, which introduces a powerful new management and operating interface for Slurm. Under the hood, a completely new messaging architecture, QluNet, has been implemented; it serves as the communication backbone for all QluMan components.
We also introduce the Qlustar BioStack, delivering a large selection of easily deployable, cluster-optimized Bioinformatics software packages based on the DebianMed project. These packages are grouped into tasks and can conveniently be selected when setting up a chroot for the cluster nodes.
The highlights among the numerous component updates and bug fixes are: Slurm 16.05.8, OpenMPI 2.0.2, CUDA 8.0 and BeeGFS 6. New in this release is support for Singularity containers.
1. Basic Info
2. New features
2.1. QluMan 3.0
2.2. Container Support
3. Major component updates
4. Other package version updates
5. General changes/improvements
6. Update instructions
7. Changelogs

1. Basic Info

The Qlustar 9.2 core platform is based on Ubuntu 14.04.5 and the additional Debian edge platform on Debian 7.9. Both include all security fixes and other package updates published before April 19th, 2017. Security updates relevant to Qlustar 9.2 that appear after this date will be announced on the Qlustar website and in the Qlustar security newsletter.

2. New features

2.1. QluMan 3.0

Slurm management interface
The all-new management interface for Slurm provides an overview of cluster and job usage, a job management interface with customizable job filters, a node state management tool to display and manipulate the Slurm state of compute nodes, a powerful dialog to view, create, and modify reservations, as well as sub-windows presenting cluster usage, fairshare, and job priority data.
Additionally, it includes an interface to Slurm accounting that allows you to view, create, and modify Slurm accounts and Slurm users. Configured QOS are also viewable in a separate dialog. See the corresponding section in the QluMan guide for a full description of the new Slurm component's capabilities.
All frequently changing Slurm state data, such as job/node information and cluster usage, is automatically updated in real time in the corresponding GUI clients. Update intervals can be customized individually for different data types, allowing an optimal cluster-specific compromise between generated network traffic/server load and the desired (high) update frequency.
QluNet Messaging Architecture
QluMan's messaging architecture has been completely redesigned and extended to allow for flexible additions of future QluMan server components and is now available as a separately packaged component called QluNet.
A new router daemon (qluman-router) is the central hub for all messages exchanged between QluMan components. Its main task is routing messages from a sending QluMan component to the correct destination; in addition, it manages the registration of new QluMan network nodes (such as qlumand or a GUI client) and monitors the availability of existing ones.
For example, the new Slurm component includes a separate daemon (qluman-slurmd) that GUI clients can talk to directly via the router, without qlumand being involved. Subscription channels provide real-time updates of the necessary information from the server components to the connected GUI clients.
Miscellaneous
  • The RXEngine dialog has a new editor for the history of custom commands. This allows convenient book-keeping and rearrangement of recently executed command lines.
  • New LEDs at the bottom of the main QluMan window indicate the connection status of a GUI session to the QluMan server components.
  • It is now possible to control (enable/disable) hyper-threading of compute node CPUs from within QluMan by setting the corresponding generic property.
  • QluMan's server components provide better control of logging levels in their configuration files. Each component now has its own config file.
  • The QluMan GUI qluman-qt is no longer installed on the cluster head-node or in a chroot during installation. If you really want to run it as a remote application from the cluster, you'll have to install it manually (apt-get install qluman-qt). However, it is recommended to run it locally from your Linux workstation, either by installing the package directly (on Debian/Ubuntu) or via the provided container-based packages (Singularity or Docker), as sketched below.
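For illustration, here is a minimal sketch of launching the GUI from the Singularity image on a Linux workstation. The image file name qluman-qt.simg is an assumption; substitute the actual image file provided by Qlustar.

    # Hypothetical image file name; use the file provided by Qlustar.
    singularity exec qluman-qt.simg qluman-qt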

2.2. Container Support

Qlustar now fully supports the two container technologies Docker and Singularity.
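As a minimal sketch, this is how a containerized application could be run inside a Slurm batch job; the image name analysis.simg and the program path are assumptions for the example.

    #!/bin/bash
    #SBATCH --job-name=container-test
    #SBATCH --ntasks=1
    # Run a program from inside a Singularity image (names are examples).
    singularity exec analysis.simg /opt/myapp/bin/analyze input.dat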

3. Major component updates

Slurm
Qlustar 9.2 introduces the Slurm 16.05 series, with 16.05.8 being the current version.
OFED InfiniBand stack
Qlustar 9.2 ships the recently released OFED 3.18-2 InfiniBand stack, including all necessary components.
OpenMPI
Qlustar 9.2 introduces the new OpenMPI 2.0 series with the most recent version 2.0.2.
Nvidia CUDA
Qlustar 9.2 provides optimal support for Nvidia GPU hardware by supplying pre-compiled and up-to-date kernel drivers as well as CUDA 8.0.
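On a GPU node, the installed driver and toolkit can be verified with the standard tools (assuming nvcc is in the PATH):

    # Show the loaded driver version and the visible GPUs.
    nvidia-smi
    # Show the installed CUDA compiler version (should report release 8.0).
    nvcc --version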
BeeGFS
Qlustar 9.2 ships the latest BeeGFS version 6.7 and provides ready-to-use image modules based on it, allowing easy deployment of this ultra-fast parallel filesystem, backed by ZFS, on your storage cluster.
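Once the BeeGFS services are running, a quick sanity check is possible with the standard BeeGFS tools, e.g.:

    # List the storage servers registered with the management daemon.
    beegfs-ctl --listnodes --nodetype=storage
    # Show capacity and usage of all BeeGFS targets.
    beegfs-df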
ZFS Filesystem
Qlustar 9.2 updates ZFS to version 0.6.5.8 including a number of bug-fixes.
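After the update, pool health on the storage nodes can be checked with the usual ZFS commands:

    # Report pools with errors (prints "all pools are healthy" otherwise).
    zpool status -x
    # List all datasets with their space usage.
    zfs list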

4. Other package version updates

  • Intel/PGI Compiler support: The Qlustar wrapper packages have been updated to support Intel Parallel Studio 2017 and PGI 2016. Corresponding OpenMPI package variants for these compilers are also provided.
  • openblas: 0.2.19
  • lapack: 3.7.0
  • HPL Linpack: 2.2
  • freeipmi: 1.5.5

5. General changes/improvements

  • Slurm
    • Changes in the Qlustar Slurm default config (shown below as a diff)
      -TaskPlugin=task/cgroup
      -TaskPluginParam=Cpusets,Verbose
      +TaskPlugin=task/affinity,task/cgroup
      +TaskPluginParam=Sched
      
      -#DefMemPerCPU=0
      +DefMemPerCPU=1024
      -SelectTypeParameters=CR_CPU
      +SelectTypeParameters=CR_CPU_Memory
      
      +# Preemption
      +PreemptType=preempt/qos
      +PreemptMode=requeue
      
      +PriorityFlags=FAIR_TREE
      +FairShareDampeningFactor=5
      -PriorityWeightFairshare=1000
      +PriorityWeightFairshare=5000
      
      -AccountingStorageEnforce=0
      +AccountingStorageEnforce=limits,qos,nosteps
      
    • The new job submission parameter --gres-flags=enforce-binding allows, for example, binding the tasks of a submitted job to the CPUs configured in a GRES definition (see the sketch after this list).
    • A base setup for Slurm accounting is now automatically initialized, provided that Slurm was chosen as an option during installation. Furthermore, new cluster users are automatically added to the default Slurm account users; this is done in the add-user-hook /etc/qlustar/common/add-user-hook of the adduser.sh script (also illustrated after this list).
  • Parallelized Online Update: The Qlustar online update mechanism for net-booted cluster nodes now works in parallel and auto-detects the image type needed to update a node's OS, making it very simple to use. Processing speed has improved dramatically, allowing up to a thousand nodes to be updated in less than five minutes.
  • NFS path for applications: The default NFS path to be used by an admin for custom applications installed outside of the packaging system has changed to /apps/local. This affects only new installations.
  • Infiniband: A new kernel parameter tune_ib_qib was added to activate auto-tuning of Intel Truescale adapters.
  • Disk auto-setup: The use of ZFS volumes for swap has been made safe.
  • Lustre: Added the config option ko2iblnd_conf to allow for inhomogeneous IB clusters with a mix of IB cards.
  • SSH known_hosts file: On the head-nodes, this is now a real file instead of a link, which simplifies the handling of HA head-nodes.
  • Bash completion for users: The Bash setup for users now activates full support for program dependent completion provided by many Debian software packages.
  • Installer: Supports NVMe boot disks.
  • Netboot nodes: The timezone is now set from the head-node's configuration.
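To illustrate the two Slurm-related items above, here is a minimal sketch of a job submission with enforced GRES binding, followed by a check of the automatically initialized accounting setup. The GRES name gpu and the script name job.sh are assumptions for the example.

    # Request one GPU and bind the job's tasks to the CPUs
    # configured in the corresponding GRES definition (names are examples).
    sbatch --gres=gpu:1 --gres-flags=enforce-binding job.sh
    # List Slurm users with their account associations; newly added
    # cluster users should appear under the default account 'users'.
    sacctmgr show user withassoc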

6. Update instructions

  1. Preparations

    Upgrading to Qlustar 9.2 is only supported from a 9.1.x release.

    Note

    Make sure that you have no unwritten changes in the QluMan database. If you do, write them to disk as described in the QluMan Guide before proceeding with the update.
  2. Update to 9.2 package sources list

    The Qlustar apt sources list needs to be changed as follows, both on the head-node(s) and in all chroot(s) that should be updated.
    0 root@cl-head ~ # apt-get update
    0 root@cl-head ~ # apt-get install qlustar-sources-list-9.2
    
  3. Update packages

    Now proceed as explained in the Qlustar Administration Manual.
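    As a rough sketch (the Administration Manual remains the authoritative reference, in particular for updating the chroots), the package update on the head-node typically follows the standard Debian pattern:
    0 root@cl-head ~ # apt-get update
    0 root@cl-head ~ # apt-get dist-upgrade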
  4. Install new QluMan packages

    QluMan 3.0 has two new server components and corresponding packages that need to be installed and activated. To do so, execute the following on the head-node:
    0 root@cl-head ~ # apt-get install qluman-router qluman-slurmd

    If your cluster doesn't have Slurm installed, you can skip qluman-slurmd. Once installed, activate the daemons for automatic startup at boot time by setting RUN=yes in the packages' default files /etc/default/qluman-router and /etc/default/qluman-slurmd, as sketched below.
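    For example, after this step /etc/default/qluman-router would contain:
    # /etc/default/qluman-router
    # Enable automatic startup of the daemon at boot time.
    RUN=yes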
  5. Reboot head-node(s)

    Initially only reboot the head-node(s).
  6. Regenerating Qlustar images

    Most likely you will also have to regenerate your Qlustar images with the 9.2 image modules. To accomplish this, you'll have to select Version 9.2 in the QluMan Qlustar Images dialog.
  7. Write config file changes

    To activate all changes in the QluMan database that were introduced by the update, they need to be written to disk now. Check the QluMan Guide about how to write such changes.
  8. Reboot all netboot nodes

    After the regeneration of the images is complete, you can reboot all other nodes in the cluster, including virtual FE nodes. This completes the update procedure.

7. Changelogs

A detailed log of changes in the image modules can be found in the directories /usr/share/doc/qlustar-module-<module-name>-*-amd64-9.2.0. As an example, in the directory /usr/share/doc/qlustar-module-core-trusty-amd64-9.2.0 you will find a summary changelog in core.changelog, a complete list of the packages, with version numbers, entering the current core module in core.packages.version.gz, a complete changelog of the core module's package versions in core.packages.changelog.gz, and finally a complete log of changed files in core.contents.changelog.gz.
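These files can be inspected directly on the head-node, e.g.:

    # Read the summary changelog of the core module.
    less /usr/share/doc/qlustar-module-core-trusty-amd64-9.2.0/core.changelog
    # Compressed logs can be viewed without unpacking them first.
    zless /usr/share/doc/qlustar-module-core-trusty-amd64-9.2.0/core.contents.changelog.gz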