HPC Admin Manual

1. The Qlustar HPC Core Stack

The HPC core stack is the Qlustar component that underpins the choice of Spack as the package manager for HPC applications on Qlustar clusters.

1.1. Motivation

Rather than aiming to provide as much HPC functionality as possible via OS packages (debs/rpms), the HPC core stack draws a clear line between Qlustar-related system software (provided as OS packages) and HPC application software (provided by Spack). In practice, the line is drawn at the MPI level: all MPIs are provided by Spack, whereas the system-related dependencies of the MPIs are provided by Qlustar packages integrated into Qlustar image modules.

This separation is necessary to ensure that system-side components of a Qlustar cluster, such as the workload manager and the InfiniBand stack, work correctly with the application software. Leaving these packages under Spack control would often lead to installed software malfunctioning or running inefficiently, because of version and configuration incompatibilities between e.g. the slurm or RDMA packages provided by Spack and the ones running on the Qlustar head-nodes/images.

1.2. Components

Currently, the following packages are part of the Qlustar HPC core stack:

  • hwloc

  • rdma-core

  • slurm

  • pmix

  • ucx

  • Spack

These packages are provided with the same version on all supported Qlustar edge-platforms. This ensures that at the HPC application level, cluster users will notice no difference when working on different edge-platforms. Admins can therefore switch between edge-platforms without disrupting user workflows if the need arises.

The packages of the core stack are declared as so-called external packages in the Qlustar Spack deb/rpm package. This way we make sure that packages built by Qlustar Spack work flawlessly on Qlustar clusters.

1.3. Release cycles

The HPC core stack has a shorter release cycle than Qlustar itself. It is coupled to the release cycle of slurm and is therefore currently 9 months. This guarantees that Qlustar always provides a slurm version that is still supported upstream. The aim is to update to the most recent major slurm version once it has reached minor release level .5 or .6, e.g. to switch to the 23.02.x series as soon as 23.02.5/6 has been released. At that point, all other packages of the core stack are also updated to current versions.

2. Spack on Qlustar

Qlustar 13 introduced Spack as the new package manager for HPC-related software beyond the HPC core stack. Spack supports a huge list of software packages and provides hardware-optimized builds of them by design. Another big advantage of Spack is that multiple versions of the same software can easily coexist on the same cluster.

The Qlustar version of Spack is provided as an OS package (deb/rpm) and defines the packages of the Qlustar HPC core stack as so-called external packages, thus ensuring flawless integration. The same versions of these packages are provided for all Qlustar edge-platforms, guaranteeing that Spack-built HPC applications work the same on all of them.

2.1. Particularities of the Qlustar Spack setup

The Qlustar Spack setup has a few particularities that need explanation.

Provided as an OS package (deb/rpm)

If you follow the Spack upstream documentation, it will tell you to clone the Spack git repository for installation. This step is unnecessary on Qlustar, since the Spack installation is provided as an OS package for all Qlustar edge-platforms. This package is automatically installed when you create a chroot via QluMan. Using this package ensures fully functional integration of Spack packages into Qlustar, which would not be the case with a git-cloned version of Spack.
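
To check which Spack OS package is installed, you can query the package database of the edge-platform, e.g. on a Debian/Ubuntu based platform as shown below (use rpm -qa instead of dpkg -l on RPM based platforms):

 0  cl-fe:~ $
dpkg -l | grep -i spack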

Definition of external packages

As already mentioned, the packages of the Qlustar HPC core stack are defined as external packages, together with a number of other packages, mainly tools that are neither performance-critical nor HPC-specific. The definitions are in /etc/spack/defaults/packages.yaml, which is part of the Qlustar Spack package.
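
The exact entries in this file depend on the Qlustar release, but the general structure of an external package definition looks like the following sketch (the package versions and prefixes shown here are illustrative only):

packages:
  slurm:
    externals:
    - spec: slurm@23.02.6
      prefix: /usr
    buildable: false
  ucx:
    externals:
    - spec: ucx@1.14.1
      prefix: /usr
    buildable: false

The buildable: false entry tells Spack to always use the system-provided package instead of building its own version.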

Important directory paths / configs

The base directory of the Qlustar Spack instance on a cluster is always at /apps/local/spack. This directory is automatically created at install time and must be used as the container for the Spack root. If for some reason you want to have this base directory at a different location, you may copy the original /apps/local/spack directory to the new location and then turn /apps/local/spack into a symbolic link pointing there.
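
As a rough sketch (assuming /srv/spack as a hypothetical new location, sufficient privileges, and no Spack operations running at the time), the relocation could look like this:

 0  cl-fe:~ $
cp -a /apps/local/spack /srv/spack
 0  cl-fe:~ $
mv /apps/local/spack /apps/local/spack.orig
 0  cl-fe:~ $
ln -s /srv/spack /apps/local/spack

Once everything works as expected, the backup copy /apps/local/spack.orig can be removed.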

  • Root of Qlustar Spack instance

    The root of the Qlustar Spack instance, denoted as $spack, is at /usr/share/spack/root and is part of the provided deb/rpm. In many places in Spack configuration files you can use $spack to refer to this path.

    In some places of the upstream Spack documentation (e.g. regarding configuration scopes), $(prefix) is used instead of $spack to denote this root. The Qlustar docs only use $spack.

    To allow for write access by non-root admins, many sub-directories of $spack are symbolic links to directories underneath the base directory /apps/local/spack.

  • Root of the Spack install tree

    The root of the Spack install tree $SPACK_ROOT is at $spack/opt/spack, which is a link to /apps/local/spack/spack. Installed packages are located in sub-directories of $SPACK_ROOT.

  • Custom repos.yaml

    If you need to define local custom Spack repositories, you can do so in /etc/spack/repos.yaml. This file doesn't exist by default and needs to be created in the corresponding chroot (a minimal sketch is shown after this list). Consult the upstream Spack documentation about the structure of repos.yaml.

  • Spack compiler definitions

    Spack compiler definitions are in the file /usr/share/spack/root/opt/.user/config/linux/compilers.yaml.
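
Referring to the custom repos.yaml item above: a local repository can first be created with spack repo create (the path /apps/local/spack/site-repo and the namespace site are hypothetical examples)

 0  cl-fe:~ $
spack repo create /apps/local/spack/site-repo site

and then registered by listing its path in /etc/spack/repos.yaml:

repos:
  - /apps/local/spack/site-repo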

Multi-admin setup

To allow a group of non-root admins to work on the Qlustar Spack instance, correct access permissions must be given to the Spack root $spack. For this purpose, a special user and group named softadm are created at Qlustar installation time. The base directory of the Qlustar Spack instance, /apps/local/spack, is owned by this user, and access permissions are set up such that any member of the group softadm can manage Spack packages.
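
For example, to give an existing admin account access (here the hypothetical user jane, assuming locally managed user accounts; the procedure differs for LDAP-managed accounts), add it to the softadm group and verify the membership:

 0  cl-fe:~ $
sudo usermod -aG softadm jane
 0  cl-fe:~ $
id -nG jane

The new group membership takes effect at the next login of that user.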

Compile all packages

Rather than installing pre-compiled Spack packages, Qlustar Spack is configured such that all packages will be configured and compiled from scratch on the cluster. This adds an additional layer of security and at the same time serves as a test-bed for the whole Spack development stack.

2.2. Basic setup/packages

All commands shown below must be executed as a user who is a member of the softadm group, on the cluster FE node or any other node (but not the head-nodes themselves) running a unionFS chroot that contains Spack. Since all Spack packages need to be compiled, the first thing to do on a new Spack instance is to add a system compiler.

2.2.1. Compilers

To add the system gcc compiler to the instance, execute (example for Qlustar 13/jammy):

 0  cl-fe:~ $
spack compiler find
==> Added 1 new compiler to /usr/share/spack/root/opt/.user/config/linux/compilers.yaml
    gcc@11.3.0
==> Compilers are defined in the following files:
    /usr/share/spack/root/opt/.user/config/linux/compilers.yaml

After this we can compile/install the newest gcc compiler (version 12.2.0 in this example) with all its dependencies using the system gcc as follows:

 0  cl-fe:~ $
spack install gcc@12.2.0 target=x86_64
[+] /usr (external diffutils-3.8-ptwf25tneglryigainabw5n3newdmp6e)
[+] /usr (external gawk-5.1.0-hzdvttiw75b4jpecd2ipfz7sxx34qk7a)
[+] /usr (external m4-1.4.18-buskmfvwfb5tiadj6koxcwcvjac7elmm)
[+] /usr (external perl-5.34.0-xbzlntwxjvembfgz72yf6ugmo72jshqw).....

The installation can take more than an hour depending on your hardware.

We added the target=x86_64 option here to make sure that this gcc version will work on any node in the cluster. Using this target also makes sense for binary packages, e.g. CUDA or the Intel oneAPI compilers.
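
Since installations like this are lengthy, it can be useful to let Spack print the fully concretized spec it would install before starting the build, e.g. for the gcc example above:

 0  cl-fe:~ $
spack spec gcc@12.2.0 target=x86_64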

Finally, add the new gcc to the compilers available to Spack:

 0  cl-fe:~ $
spack compiler add $(spack location -i gcc@12.2.0)
==> Added 1 new compiler to /usr/share/spack/root/opt/.user/config/linux/compilers.yaml
    gcc@12.2.0
==> Compilers are defined in the following files:
    /usr/share/spack/root/opt/.user/config/linux/compilers.yaml

Now we’re ready to install packages using the new compiler. If you also want to use the Intel oneAPI compiler family, proceed as follows:

 0  cl-fe:~ $
spack install intel-oneapi-compilers %gcc@11.3.0 target=x86_64
 0  cl-fe:~ $
spack compiler add $(spack location -i intel-oneapi-compilers)/compiler/latest/linux/bin/intel64
 0  cl-fe:~ $
spack compiler add $(spack location -i intel-oneapi-compilers)/compiler/latest/linux/bin

We explicitly specified the system gcc@11.3.0 compiler here, which we always advise for binary packages like this one. If you don’t, the newest Spack-installed gcc (here gcc@12.2.0) will most likely be used instead, which can cause problems when using the Intel compilers later, e.g. when searching for system header files. Also note that two separate commands are needed to add the classic and the new oneAPI variants of the compiler package.

A number of other compilers are available in Spack. You can install any of them and build a tool-chain on top, if the need arises.
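
At this point you can verify which compilers are now registered with the instance:

 0  cl-fe:~ $
spack compiler list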

2.2.2. MPIs

The most common MPI variant is OpenMPI, which we can now install using the new gcc compiler:

 0  cl-fe:~ $
spack install openmpi +pmi+legacylaunchers schedulers=slurm fabrics=ucx %gcc@12.2.0

If needed, an OpenMPI variant based on the Intel oneAPI compilers may be created as follows:

 0  cl-fe:~ $
spack install openmpi +pmi+legacylaunchers schedulers=slurm fabrics=ucx %oneapi

For the classic Intel compilers (icc, ifort, etc.), do:

 0  cl-fe:~ $
spack install openmpi +pmi+legacylaunchers schedulers=slurm fabrics=ucx %intel

If you need special features, add variants like +cuda or +lustre. Sometimes ucx causes problems; in that case, you can add libfabric support by using fabrics=ucx,libfabric. For OPA networks, you should use fabrics=psm2.
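
The full set of available variants and their allowed values (including the fabrics and schedulers options used above) can be inspected with:

 0  cl-fe:~ $
spack info openmpi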

A number of other MPI variants are available in Spack. You can install any of them and build a tool-chain on top, if the need arises.

2.2.3. Application example: Linpack

As an example of an application, let’s build Linpack. Linpack needs to be compiled against an implementation of a BLAS library. The standard open-source high-performance BLAS library is openblas, which we first build using the new gcc (specified by %gcc@12.2.0):

 0  cl-fe:~ $
spack install openblas threads=openmp %gcc@12.2.0
 0  cl-fe:~ $
spack install hpl %gcc@12.2.0

Since %gcc@12.2.0 is specified for the hpl installation, Spack automatically knows that the gcc-based OpenMPI we built previously should be used. After a successful installation, you can proceed as described in the First Steps Guide and start a Linpack run on the cluster.
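
To quickly check the installation from the shell before starting a run, you can load the package into your environment with spack load (the HPL benchmark executable is conventionally named xhpl):

 0  cl-fe:~ $
spack load hpl %gcc@12.2.0
 0  cl-fe:~ $
which xhpl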

If you need to squeeze the maximum performance out of your hardware, a Linpack version based on the Intel MKL will be the best choice in most circumstances. To build it, install the MKL and use the Intel-based MPI you installed above:

 0  cl-fe:~ $
spack install intel-oneapi-mkl target=x86_64 %oneapi
 0  cl-fe:~ $
spack install hpl %oneapi

This should have given you a rough idea of how Spack may be used on Qlustar. For more details, consult the official Spack documentation.