Qlustar Cluster OS 9.2

QluMan Guide

This is the operation manual for QluMan, the Qlustar cluster management framework.

Qlustar Documentation Team

Q-Leap Networks GmbH

qlustar-docs@q-leap.com

Legal Notice

This material may only be copied or distributed with explicit permission from Q-Leap Networks GmbH. The Qlustar license can be found at /usr/share/qlustar/LICENSE.html on an installed Qlustar head-node.

Preface

1. Qlustar Document Conventions

1.1. Typographic Conventions
1.2. Pull-quote Conventions
1.3. Notes and Warnings

2. Feedback requested

1. Introduction

1.1. Qlustar Clusters
1.2. Overview of basic Setup Principles

2. Cluster Connections

2.1. Connecting to a Cluster

2.1.1. Connection Status

2.2. Managing Clusters

2.2.1. Adding a new Cluster
2.2.2. Sorting multiple clusters
2.2.3. Changing the certificate safe password

3. Global Cluster Settings

3.1. License Installation
3.2. Configuring Network Parameters

4. Enclosures

4.1. Enclosure View

4.2. Managing Enclosures

4.2.1. Populating Enclosures

5. Adding/Configuring Hosts

5.1. Adding Hosts
5.2. Configuring Hosts

6. Hardware Wizard

6.1. Purpose

6.2. Selecting Hosts

6.3. Configuring the Host Template

6.4. Selecting a Hardware Property Set

6.5. Resolving Hardware Conflicts

6.5.1. Resolving by per-host Hardware Property Sets

6.6. Selecting a Generic Property Set / Config Set

6.7. Summary Page

7. Common Config Classes

7.1. Overview
7.2. Writing Config Files
7.3. Boot Configs
7.4. DHCP Config
7.5. Disk Configs

8. Other Configs

8.1. Qlustar OS Images

8.1.1. Image Versioning
8.1.2. Image Properties

8.2. NIS hosts

8.3. SSH host files

8.4. UnionFS Chroots

8.5. Infiniband Network

8.5.1. Activating/configuring OpenSM

8.6. IPMI settings

9. RXEngine / Remote Execution Engine

9.1. RXEngine Overview

9.2. Executing a pre-defined command

9.3. Executing a custom command

9.3.1. Custom Command Editor

9.4. Analysis of Command Status/Output

9.4.1. Command Status
9.4.2. Host Grouping by Status and Output
9.4.3. Filtering by stdout and stderr

9.5. Command Editor

9.5.1. Sorting commands
9.5.2. Defining or editing a command

10. Host Filters

10.1. Overview

10.2. Host Filter Editor

10.2.1. Editing a Filter
10.2.2. Types of Filters
10.2.3. Inverting a Filter
10.2.4. Additive versus subtractive

11. QluMan User and Rights Management

11.1. Overview

11.2. Managing QluMan Users

11.2.1. Adding a User
11.2.2. Generating the Auth Token
11.2.3. Removing a User

11.3. Managing User Roles/Permissions

12. Log Viewer

12.1. Purpose

12.2. Messages indicator button

12.3. Log Viewer window

12.4. Message Filter

12.4.1. Default Action
12.4.2. Adding a Filter
12.4.3. Filtering Seen Messages
12.4.4. Filtering by Priority
12.4.5. Filtering by Origin
12.4.6. Filtering by Category
12.4.7. Filtering by Short text
12.4.8. A Filtering Example

13. Optional Components

13.1. Slurm Configuration and Management

13.1.1. Slurm Configuration
13.1.2. Slurm Management

14. Customizing the Look&Feel

14.1. Overview

14.2. QluMan Preferences

14.3. Customizing general Properties

14.3.1. Customizing general Fonts
14.3.2. Customizing general Colors
14.3.3. Cloning KDE Settings
14.3.4. Customizing the Widget Style

A. Revision History

Index

⁠Preface

⁠1. Qlustar Document Conventions

Qlustar manuals use the following conventions to highlight certain words and phrases and draw attention to specific pieces of information.

⁠1.1. Typographic Conventions

Four typographic conventions are used to call attention to specific words and phrases. These conventions, and the circumstances they apply to, are as follows.

Mono-spaced Bold

Used to highlight system input, including shell commands, file names and paths. Also used to highlight keys and key combinations. For example:

To see the contents of the file my_next_bestselling_novel in your current working directory, enter the cat my_next_bestselling_novel command at the shell prompt and press Enter to execute the command.

The above includes a file name, a shell command and a key, all presented in mono-spaced bold and all distinguishable thanks to context.

Key combinations can be distinguished from an individual key by the plus sign that connects each part of a key combination. For example:

Press Enter to execute the command.

Press Ctrl+Alt+F2 to switch to a virtual terminal.

The first example highlights a particular key to press. The second example highlights a key combination: a set of three keys pressed simultaneously.

If source code is discussed, class names, methods, functions, variable names and returned values mentioned within a paragraph will be presented as above, in mono-spaced bold. For example:

File-related classes include filesystem for file systems, file for files, and dir for directories. Each class has its own associated set of permissions.

Proportional Bold

This denotes words or phrases encountered on a system, including application names; dialog-box text; labeled buttons; check-box and radio-button labels; menu titles and submenu titles. For example:

Choose System → Preferences → Mouse from the main menu bar to launch Mouse Preferences. In the Buttons tab, select the Left-handed mouse check box and click Close to switch the primary mouse button from the left to the right (making the mouse suitable for use in the left hand).

To insert a special character into a gedit file, choose Applications → Accessories → Character Map from the main menu bar. Next, choose Search → Find… from the Character Map menu bar, type the name of the character in the Search field and click Next. The character you sought will be highlighted in the Character Table. Double-click this highlighted character to place it in the Text to copy field and then click the Copy button. Now switch back to your document and choose Edit → Paste from the gedit menu bar.

The above text includes application names; system-wide menu names and items; application-specific menu names; and buttons and text found within a GUI interface, all presented in proportional bold and all distinguishable by context.

Mono-spaced Bold Italic or Proportional Bold Italic

Whether mono-spaced bold or proportional bold, the addition of italics indicates replaceable or variable text. Italics denotes text you do not input literally or displayed text that changes depending on circumstance. For example:

To connect to a remote machine using ssh, type ssh username@domain.name at a shell prompt. If the remote machine is example.com and your username on that machine is john, type ssh john@example.com.

The mount -o remount file-system command remounts the named file system. For example, to remount the /home file system, the command is mount -o remount /home.

To see the version of a currently installed package, use the rpm -q package command. It will return a result as follows: package-version-release.

Note the words in bold italics above: username, domain.name, file-system, package, version and release. Each word is a placeholder, either for text you enter when issuing a command or for text displayed by the system.

Aside from standard usage for presenting the title of a work, italics denotes the first use of a new and important term. For example:

Publican is a DocBook publishing system.

⁠1.2. Pull-quote Conventions

Terminal output and source code listings are set off visually from the surrounding text.

Output sent to a terminal is set in mono-spaced roman and presented thus:

books        Desktop   documentation  drafts  mss    photos   stuff  svn
books_tests  Desktop1  downloads      images  notes  scripts  svgs

Commands to be executed on certain nodes of a cluster or the admin's workstation are indicated by using descriptive shell prompts including user and hostname. Note that by default, the shell prompt on Qlustar nodes always ends in the newline character, thus commands are typed on the line following the prompt. As mentioned above, the command itself is shown in mono-spaced bold and the output of a command in mono-spaced roman. Examples:

0 root@cl-head ~ #
echo "I'm executed by root on a head-node"
I'm executed by root on a head-node

0 root@beo-01 ~ #
echo "I'm executed by root on a compute node"
I'm executed by root on a compute node

0 root@sn-1 ~ #
echo "I'm executed by root on a storage node"
I'm executed by root on a storage node

0 user@workstation ~ $ 
echo "I'm executed by user admin on the admins workstation"
I'm executed by user admin on the admins workstation

Source-code listings are also set in mono-spaced roman but add syntax highlighting as follows:

package org.jboss.book.jca.ex1;

import javax.naming.InitialContext;

public class ExClient
{
   public static void main(String args[]) 
       throws Exception
   {
      InitialContext iniCtx = new InitialContext();
      Object         ref    = iniCtx.lookup("EchoBean");
      EchoHome       home   = (EchoHome) ref;
      Echo           echo   = home.create();

      System.out.println("Created Echo");

      System.out.println("Echo.echo('Hello') = " + echo.echo("Hello"));
   }
}

⁠1.3. Notes and Warnings

Finally, we use three visual styles to draw attention to information that might otherwise be overlooked.

Note

Notes are tips, shortcuts or alternative approaches to the task at hand. Ignoring a note should have no negative consequences, but you might miss out on a trick that makes your life easier.

Important

Important boxes detail things that are easily missed: configuration changes that only apply to the current session, or services that need restarting before an update will apply. Ignoring a box labeled “Important” will not cause data loss but may cause irritation and frustration.

Warning

Warnings should not be ignored. Ignoring warnings will most likely cause data loss.

⁠2. Feedback requested

Contact qlustar-docs@qlustar.com to report errors or missing pieces in this documentation.

⁠Chapter 1. Introduction

⁠1.1. Qlustar Clusters

A Qlustar cluster is designed to boot and manage compute and/or storage nodes (hosts) over the network and make them run a minimal OS (Operating System) image in RAM. Local disks (if present) are only used to preserve log files across boots and for temporary storage (e.g. for compute jobs). Hence all Qlustar cluster nodes apart from head-nodes are always state-less.

One or more head-nodes deliver the OS boot images to the nodes. Additionally, a small NFS share containing part of the configuration space for the nodes is exported from one of the head-nodes. Optionally, the RAM-based root FS (file-system) can be supplemented by a global UnionFS chroot to support software not already contained in the boot images themselves. The head-node(s) of the cluster typically provide TFTP/PXE boot services, DHCP service, NIS service, torque or slurm resource management etc. to the cluster.

The management of these and all cluster-related components of a Qlustar installation in general can easily be accomplished through a single administration interface: QluMan, the Qlustar Management interface. The QluMan GUI is multi-user as well as multi-cluster capable: Different users are allowed to work simultaneously with the GUI. Changes made by one user are updated and visible in real-time in the windows opened by all the other users. On the other hand, it is possible to manage a virtually unlimited number of clusters within a single instance of the QluMan GUI at the same time. Each cluster is shown in a tab or in a separate main window.

⁠1.2. Overview of basic Setup Principles

A central part of Qlustar is its set of pre-configured modular OS images. Different nodes may have different hardware or need to provide specific and varying functionality/services. Therefore, to optimize the use of hardware resources and increase stability/security, Qlustar does not come with just one boot image that covers every use-case. Instead, a number of image modules with different software components are provided from which individual custom OS images can be created as needed. A Qlustar OS image contains just what is actually required to accomplish the tasks of a node, nothing more. See below for more details about configuring OS images.

But providing different OS images is still not enough for a flexible yet easily manageable cluster: A node booting a generated image also receives extra configuration options via DHCP and via NFS at boot time, which makes it possible to fine-tune the OS configuration at run-time. E.g. it is possible to determine how the local disks are to be used (if any are present), whether additional services like OpenSM or samba should be enabled/disabled and a lot more. Four different configuration/property categories exist in QluMan:

Generic Properties are simple on/off options or key+value pairs applicable to groups of nodes, e.g. to flag the reformatting of the local disks at the next boot, add torque node properties, etc.
Config Classes handle more complex configurations like boot/disk configs, DHCP, etc.
Hardware Properties are not used to configure the nodes themselves but describe their hardware configuration and are of importance e.g. for the slurm or torque resource managers and/or inventory management.
Finally, Specific Properties are properties that are unique to a particular node, like its serial number. Therefore, these properties can only be assigned to individual nodes.

Of course, one can configure every host in a cluster individually. But in most clusters, there are large groups of hosts that should be configured identically. Even if there are several groups, they might share some properties/configurations but not all of them. To simplify the handling of such scenarios, QluMan allows combining generic properties, hardware properties and config classes each into sets. Moreover, it is possible to combine exactly one generic property set, one hardware property set and one config set into a Host Template. Assigning a Host Template to a group of hosts makes it possible to specify their complete properties and configuration settings with a single click. Nonetheless, for maximum flexibility, the settings a host receives from its template can be overridden or extended by assigning any of the sets and/or individual properties/config classes directly to the host. In case of conflicts, values from individual properties/config classes have highest priority, followed by set values and finally the Host Template values. For more details on this see Section 5.2, “Configuring Hosts”.
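The priority order described above can be illustrated with a minimal sketch. It is not QluMan code: the setting names and values are hypothetical, and it only models settings that hold a single value per host.

```python
from collections import ChainMap

# Hypothetical settings for one host; names and values are illustrative only.
host_template = {"boot_config": "default", "disk_config": "single-disk", "format_disks": "off"}
assigned_set = {"disk_config": "raid1"}     # a set assigned directly to the host
individual = {"format_disks": "on"}         # an individual property assigned to the host

# ChainMap searches its mappings from left to right, so the first match wins:
# individual assignments, then directly assigned sets, then the Host Template.
effective = ChainMap(individual, assigned_set, host_template)

for key in sorted(effective):
    print(key, "=", effective[key])
```

Each lookup falls through to the Host Template only if neither an individual assignment nor a directly assigned set provides a value.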

⁠Chapter 2. Cluster Connections

⁠2.1. Connecting to a Cluster

When qluman-qt is started, it requests the password for your certificate safe. This safe holds the login information for your clusters together with the private keys for the corresponding QluMan user account. The password for the certificate safe is required on every start and whenever changes to the safe need to be written. You can have the client remember the password for the duration it is running by checking the Remember password check-box. Without enabling this, you will have to enter the password again whenever changes to the safe need to be written. If you are starting qluman-qt for the first time and therefore have no certificate safe yet, this dialog is skipped and an empty Connect Cluster dialog opens directly. See Section 2.2.1, “Adding a new Cluster” below about how to add a new cluster.

Once the correct password for the certificate safe has been entered, the Connect Cluster dialog opens. The last cluster used will be pre-selected but a different cluster can be selected from the drop-down menu. Click the Connect button to connect to the selected cluster. If this is the first time you connect to this cluster, the client generates a random public/private key pair. These keys will eventually be used for permanent authentication of the chosen user with this cluster. Following this, a connection to the server is made with an attempt to authenticate the client using the one-time token. On success, the server stores the public key of the client for future logins and the client stores both the private and public keys in the certificate safe. This finalizes the initial handshake.

Note

The GUI client asks for the password of the certificate safe to store the generated public/private key pair. It will only do so when you initially connect with a one-time token. For future connections, it will use the stored key pair to connect and authenticate. The safe contents will then not be changed again.
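The sequence of the initial handshake can be sketched as a toy model. This is only an illustration of the order of steps described above, not the actual QluMan protocol: the class and function names are hypothetical, random strings stand in for a real key pair, and no real cryptography or network traffic is involved.

```python
import secrets


class Server:
    """Toy stand-in for the server side of the first-time handshake."""

    def __init__(self):
        self.pending_tokens = {}   # user -> one-time token that was issued but not yet used
        self.public_keys = {}      # user -> public key stored after a successful first login

    def issue_token(self, user):
        token = secrets.token_hex(16)
        self.pending_tokens[user] = token
        return token

    def first_login(self, user, token, public_key):
        expected = self.pending_tokens.get(user)
        if expected is None or not secrets.compare_digest(expected, token):
            return False
        del self.pending_tokens[user]          # a one-time token is only valid once
        self.public_keys[user] = public_key    # kept for future logins
        return True


def generate_key_pair():
    # Placeholder only: random strings stand in for a real public/private key pair.
    return secrets.token_hex(32), secrets.token_hex(32)


server = Server()
token = server.issue_token("admin")

certificate_safe = {}                          # stand-in for the client's certificate safe
private_key, public_key = generate_key_pair()
if server.first_login("admin", token, public_key):
    certificate_safe["my-cluster"] = {"private": private_key, "public": public_key}

# Presenting the same token a second time is rejected.
print(server.first_login("admin", token, public_key))
```

After the first successful login, only the stored key pair is needed, which matches the statement that the safe contents are not changed again.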

⁠2.1.1. Connection Status

The status of the network connection between a GUI session and the relevant QluMan server components (qlumand, qluman-slurmd, etc.) is displayed by LEDs in the status bar of the main QluMan window. The QluNet internal network client ID is also listed there.

⁠2.2. Managing Clusters

The Manage Clusters dialog manages all your accounts on different clusters or as different users on the same cluster. It allows adding new cluster connections, editing existing and removing obsolete ones as well as changing the password for the certificate safe. It can be opened by clicking Edit in the Connect Cluster dialog.

⁠2.2.1. Adding a new Cluster

To add a new cluster click the New button and select New Cluster (paste) or New Cluster (file) from the menu depending on whether you want to paste the one-time token or load it from a file. If you don't have a one-time token for the cluster see Section 11.2.2, “Generating the Auth Token”.

Paste the one-time token data into the dialog and click Decrypt or select the file containing the token. When asked for the password, enter the PIN that was used when creating the token (in case you didn't generate the token yourself, you should have been told the PIN by your main cluster administrator). The dialog should then show the cluster/head-node info that was packed into the one-time token. If you started qluman-qt on your workstation, then you might have to change the Local Hostname to use the external hostname of the head-node. Similarly, if you changed the port for qlumand or if you're connecting via port forwarding, you have to adjust that too. The Alias is the name under which this cluster will be shown in the drop-down menu of the Connect Cluster dialog. Click Ok to add the cluster connection.

After adding the new cluster select Save to save the changes. If this is your first cluster then it will create the certificate safe and ask you to enter and confirm a password. Otherwise it will ask for the existing password unless the Remember password check-box was enabled.

⁠2.2.2. Sorting multiple clusters

If multiple cluster connections are registered, the corresponding entries can be reordered using drag&drop, which allows moving them to the desired location in the list. Clusters can also be grouped in sub-menus by first creating a new group (sub_menu) and then dragging cluster entries into it. The tree structure of the Manage Clusters dialog will be reflected in the drop-down menu of the Connect Cluster dialog. This allows a nicely structured layout when dealing with a larger number of clusters as e.g. in the case of service providers. Standard cluster admins will most likely not need this feature.

⁠2.2.3. Changing the certificate safe password

The Manage Clusters dialog allows changing the password for the certificate safe. This requires entering the old password for the safe as well as the new password and a confirmation of the new password. The Ok button will only be selectable if the new password and its confirmation match.

⁠Chapter 3. Global Cluster Settings

⁠3.1. License Installation

To be able to configure your cluster with QluMan, a license key needs to be installed. Without a valid key, an error will be displayed upon start-up and the QluMan daemon will only allow read-only operations. In read-only mode, you can have a look at all the settings, but you can't change anything.

After closing the error dialog, the License Key window can be opened by selecting Manage Cluster->License Key in the main menu. If you have never installed a license key before, the window will come up mostly empty.

To request a license key, click the Request Key button and a Key Info Dialog will open asking you for your name and address info. If you intend to use Qlustar Basic Edition, check Free / non-profit license at the bottom of the dialog. Select OK to continue and save the key request in a file: Enter a file-name and save. You will now have to send an e-mail to order@q-leap.com, specifying the required number of nodes and features (see our order page), with this key file attached.

After a while, you will receive a license key by mail that you can import using the Import Key button. If the import was successful, the license terms will be displayed. After accepting the license, you're ready to start working with QluMan.

After a successful license installation, you can move on to the interesting stuff. Upon start-up, the QluMan client automatically connects to the local QluMan daemon and opens the Enclosure View window.

⁠3.2. Configuring Network Parameters

During the installation of Qlustar, the configuration parameters for the cluster network had to be entered and normally you won't need to change them. For the rare circumstances where they do need to be changed, or in case you just want to verify the settings, you can select Manage Cluster->Network Config from the main window's menu.

Note

Note that changing anything in this dialog involves a fundamental reconfiguration of the cluster setup, including rebooting the whole cluster. For this reason, changes in the Global Cluster Network Config do not take effect immediately after being changed in the dialog. They require a further confirmation by clicking the Save button, which then actually commits the changes.

⁠ Configuring Ethernet

Changing the Ethernet network configuration is the most involved and requires several steps. In the Global Cluster Network Config there are 3 settings relevant for Ethernet: The Cluster Network IP address, the netmask and the cluster internal Head IP address. When changing the Cluster Network IP address or netmask, the IP addresses of all hosts configured in QluMan will be remapped to reflect their new values. This requires that the new netmask is large enough, so that the resulting network range can include all existing hosts in the cluster. Therefore, the GUI won't let you pick anything too small. If there are unused address ranges in your existing network and you need a smaller netmask than currently selectable, you will first have to change some host addresses so that all of them combined occupy a small enough subset of the current network.

Changing the Cluster Network IP address will automatically remap the cluster internal Head IP address as well, while changing the netmask will not. Note that the Qlustar convention of using the second last IP of the cluster network as the Head IP is not a requirement. Hence, the Head IP is not adjusted automatically when changing the netmask. Furthermore, changing the Head IP involves some additional steps without which the nodes in the cluster won't function or even boot: The Head IP also appears in the Global DHCP Template, NIS Host header and SSH Hosts / Known Hosts. These templates are simple, freely editable text blobs. A change of the Head IP in the Global Cluster Network Config will not change them, so you need to check and adjust each of them manually.
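The effect of changing the cluster network can be illustrated with a short sketch based on Python's standard ipaddress module. It rests on two assumptions that are not confirmed by this manual: that remapping keeps each host's offset from the network address, and that "second last IP" means the address directly below the broadcast address. The IP addresses are made up for the example.

```python
import ipaddress


def remap_hosts(hosts, old_net, new_net):
    """Move each host to new_net, keeping its offset from the network address."""
    old = ipaddress.ip_network(old_net)
    new = ipaddress.ip_network(new_net)
    remapped = {}
    for name, ip in hosts.items():
        offset = int(ipaddress.ip_address(ip)) - int(old.network_address)
        # The offset must land on a usable address: not the network or broadcast address.
        if not 0 < offset < new.num_addresses - 1:
            raise ValueError(f"{name} ({ip}) does not fit into {new}")
        remapped[name] = str(new.network_address + offset)
    return remapped


def conventional_head_ip(net):
    """Second-last address of the network (the last one is the broadcast address)."""
    return str(ipaddress.ip_network(net)[-2])


hosts = {"beo-01": "192.168.52.1", "beo-02": "192.168.52.2", "cl-head": "192.168.52.254"}
print(remap_hosts(hosts, "192.168.52.0/24", "10.0.0.0/24"))
print(conventional_head_ip("10.0.0.0/24"))
```

In this model, a target network that is too small for an existing host raises an error, which mirrors the GUI refusing to offer a netmask that cannot hold all hosts.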

If the head-node does not have direct access to the Internet, an HTTP proxy must be configured. QluMan uses this proxy to download packages from the Qlustar repository when creating a new chroot. Click the check-box before Http Proxy to enable proxy support and enter the hostname together with the port of the proxy. If the proxy requires authentication, click the check-box before Authenticate and enter a username and password. As usual, the new settings will only take effect when you click the Save button.

⁠Chapter 4. Enclosures

⁠4.1. Enclosure View

The Enclosure View shows an overview of the cluster in a tree structure. The tree is designed to reflect the physical structure of the cluster. At the lowest level are the hosts. A host can be a head, storage or compute node but also, e.g., a switch. In general, anything in the cluster that has a name, IP and MAC address is a host.

A host is represented by its bare board and should be placed into a host enclosure. 1U, 2U, 3U or 4U enclosures contain exactly one board, while others like a Twin or Blade chassis can have multiple boards. Once defined, host enclosures can be placed into racks, racks grouped into rows, rows into rooms and so on. The tree has a simple drag&drop interface. E.g. you can select a number of nodes (by holding the CTRL key and clicking or holding the shift key and dragging the mouse) and drag&drop them into a Blade enclosure.

Selecting a node in the tree displays its configuration info on the right hand side. Hovering over a host entry in the tree view brings up a tool-tip with additional info about the host. The hostname, IP and MAC can be edited and saved by pressing return. The check-box to the right of each field shows the state of the change. If the entered value is not valid, the check-box is cleared. The check-box's tool-tip (you'll see it when the mouse moves on top of it) gives the reason why the entered value is invalid. If the entered value is valid but has not yet been saved, the check-box is checked but ghosted. Finally, after the value has been saved in the database, the check-box shows a solid check.

For nodes that are not part of a multi-host enclosure (like a Blade or Twin chassis) the enclosure type can be changed to one of the single-slot host enclosures (1U, 2U, etc.). A new enclosure of the chosen type will then be created if the node is not already part of one. If a node is included in a multi-host enclosure, this field will be ghosted.

The template field allows selecting a so-called Host Template for the node. Usually, large groups of nodes have an identical hardware and software configuration and will use the same template. Deviations from the properties coming from the template can be set for individual hosts by assigning either a property/config set or individual properties/configs directly to the host through its context menu. In case of unique properties, direct assignments override settings from the template (or property set); for non-unique properties, they are added to them.
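The difference between unique and non-unique properties can be sketched as follows. This is an illustration only: the property names, their values and which of them count as unique are assumptions made for the example, not taken from QluMan.

```python
def effective_properties(template, direct, unique_keys):
    """Merge direct assignments into the template values of a host.

    Unique properties are replaced by the direct assignment,
    non-unique properties accumulate the values of both sources.
    """
    result = {key: list(values) for key, values in template.items()}
    for key, values in direct.items():
        if key in unique_keys:
            result[key] = list(values)
        else:
            merged = result.setdefault(key, [])
            for value in values:
                if value not in merged:
                    merged.append(value)
    return result


# Hypothetical property names and values.
template = {"format_disks": ["off"], "node_property": ["bigmem"]}
direct = {"format_disks": ["on"], "node_property": ["gpu"]}

print(effective_properties(template, direct, unique_keys={"format_disks"}))
```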

Note

Any changes made in the configuration only affect the active node (as indicated by the hostname in the info part of the enclosure view), and not all selected nodes. Configurations for all selected nodes can be made by using the context menu (right click) in the tree view.

⁠4.2. Managing Enclosures

Similar to host nodes, selecting an enclosure entry displays the physical layout of the corresponding enclosure on the right. Controls to select the visibility level and special slots are available at the top of the display. See below for more details about these. The name of the enclosure and its type (in brackets) are shown in the title. In the above case, both name and type are "Twin²". Below the title you have a representation of the physical layout of the enclosure. For this example, you see the 2x2 slots that are characteristic of a Twin² enclosure. Two slots are filled with beo-01 and beo-02 and two slots remain empty, showing only the number of each slot in brackets.

Selecting a rack shows a more complex picture. The current example rack holds ten enclosures in its central 19 inch slots: A FatTwin, a Twin, a Twin², a Blade 1, 3 Blade 2, another Twin² and two 1U enclosures containing beo-11 and beo-12. The special top, left, right and bottom (not visible) slots are empty. In future versions, a network switch or power controller that is mounted at some special position of the rack can be placed into these special slots.

Now let's explain the effect of the two controls at the top in more detail: The Show special slots check-box controls the visibility of the top, left, right and bottom special slots. Especially if these slots are empty, this will provide a more compact view of the interesting central slots. The other control, the visibility level, controls how many levels of the enclosure hierarchy are shown: Selecting a depth of 2 shows not only the selected rack with its slots but also the contents of the enclosures in each slot.

Since the current version of QluMan only supports host enclosures (Twin, Blade, ...) and racks, a depth larger than 2 has no effect yet. In future versions, it will be possible to group racks into rows, rows into rooms, rooms into buildings and so on. This will allow you to reflect the physical layout of your cluster in as much detail as you like.

⁠4.2.1. Populating Enclosures

New enclosures can be added through the context menu. The new enclosure must be given a name and its type can be selected. Currently, enclosure types cannot be manipulated yet. This will change in a future version.

A host selected in the enclosure view can be placed into a single-slot host enclosure directly by selecting the correct type in the host info part of the window (see Section 4.1, “Enclosure View”), which is suitable for ordinary servers. For host enclosures that can hold more than one server/node (twin servers, blades etc.), drag&drop may be used to move hosts into them. Moreover, it's also possible to create larger (non-host) enclosures (like racks) and move host enclosures into them, also by using drag&drop. Note that a bare host cannot be placed directly into a non-host enclosure; it must already be inside a host enclosure.

Another option to place hosts into enclosures is by selecting a number of them and then choosing a host enclosure from the context menu. This way, a new enclosure of the selected type is automatically created and all selected hosts are moved into it. If more hosts than can fit into a single enclosure of the chosen type are selected, additional enclosures of the same type will be created such that all hosts can be placed into one of them. This makes it easy to position large numbers of identical hosts into their enclosures. If the selected hosts were in an enclosure before and that enclosure becomes empty and is not itself part of a larger enclosure then the empty enclosure is automatically removed.

Relocating hosts by selecting a different host enclosure is supported not only on directly selected hosts but also on hosts inside selected enclosures. This allows changing the type of enclosure a group of hosts is in by selecting the old enclosure(s) and choosing a new one from the context menu. Note that this procedure does not change the type of the old enclosure but rather creates a new one, moves all the hosts to it and then deletes the now empty old enclosure(s).

Try it out: Place a number of hosts into a large enclosure (like a blade), then select the enclosure and choose a small enclosure (like 1U) to relocate them. In general, such an operation will create one enclosure of the new type and fill all its slots before creating a second one. Hosts having been in different enclosures before, can end up in the same enclosure and hosts that were in the same enclosure before can end up in different enclosures after this operation.
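The way selected hosts are distributed over newly created enclosures can be sketched like this. The function and the naming scheme for the new enclosures are hypothetical; the sketch only models the rule that one enclosure is filled completely before the next one is created.

```python
def pack_into_enclosures(hosts, enclosure_type, slots):
    """Fill each new enclosure completely before creating the next one."""
    enclosures = []
    for start in range(0, len(hosts), slots):
        name = f"{enclosure_type}-{len(enclosures) + 1}"   # hypothetical naming scheme
        enclosures.append((name, hosts[start:start + slots]))
    return enclosures


hosts = [f"beo-{n:02d}" for n in range(1, 7)]   # beo-01 .. beo-06

# A Twin² enclosure has 2x2 = 4 slots, so six hosts need two enclosures.
for name, members in pack_into_enclosures(hosts, "Twin2", slots=4):
    print(name, members)
```

Because the hosts are simply taken in order, hosts from different source enclosures can end up together and hosts from the same source enclosure can be split, as noted above.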

When using drag&drop for the relocation, the host or enclosure is always placed into the lowest suitable slot of the target enclosure. This reflects our experience, that usually enclosures are simply filled from left to right and bottom to top.

But sometimes this is not the case and a host or enclosure should be in a different slot as compared to the automatic placement. In this case, the host or enclosure can be moved through the context menu. The latter shows all the free slots the host or enclosure can be relocated to and a checked mark indicates the current location. Of course the relocation is only allowed into free slots. Hence, it may require removing (drag&drop out of the enclosure) a host or enclosure temporarily to free space for moving things around.
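The two placement modes can be modeled with a simple list of slots. This is a sketch under assumptions: slots are numbered from 1, an empty slot is represented by None, and every free slot counts as suitable. The function names are hypothetical.

```python
def free_slots(slots):
    """Return the 1-based numbers of all empty slots (None marks an empty slot)."""
    return [number for number, occupant in enumerate(slots, start=1) if occupant is None]


def place_automatically(slots, item):
    """Put item into the lowest free slot and return that slot's number."""
    candidates = free_slots(slots)
    if not candidates:
        raise ValueError("enclosure is full")
    slots[candidates[0] - 1] = item
    return candidates[0]


def move_to_slot(slots, item, target):
    """Move item to a specific slot; only free slots are valid targets."""
    if target not in free_slots(slots):
        raise ValueError(f"slot {target} is not free")
    slots[slots.index(item)] = None
    slots[target - 1] = item


twin2 = ["beo-01", "beo-02", None, None]      # a 4-slot enclosure holding two hosts
print(place_automatically(twin2, "beo-03"))   # drag&drop: lowest free slot
move_to_slot(twin2, "beo-03", 4)              # context menu: explicit free slot
print(twin2)
```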

⁠4.2.1.1. Host Selections

There are situations where one wants to change a property or config of a whole set of hosts. For example, you may want to change all nodes located in a particular blade to no longer format their disk on boot. This can be achieved by selecting a set of hosts in the enclosure view with the mouse. A range of hosts can be selected by clicking on the first host and then clicking on the last host while pressing the shift key. Hosts can also be added to or removed from the selection by clicking on a host while pressing the CTRL key. Once a set of hosts is selected, changes can be made to all selected hosts through the context menu. For instance, this allows changing the Host Template or adding/altering a generic property of a set of hosts.

Note

When a host is part of an enclosure, selecting the enclosure will also select the host(s) inside of the enclosure, provided it is collapsed. However, hosts inside of expanded enclosures must be selected individually.

An alternative and more powerful way to select a set of hosts is available via the Selection button at the bottom of the Enclosure View. When it is pressed, the top of the appearing selection menu shows three items: one to select all hosts, one to clear the selection and one to invert the selection.

Below these items is a list of filters, each of which defines a subset of hosts according to specific criteria. For more details on how to construct such Host Filters see Chapter 10, Host Filters. When pressing Select, the selection is set to the hosts defined by the corresponding filter, dropping any previously selected hosts. Add to adds the hosts defined by the filter to the current selection, while Remove from removes them from it. Intersection sets the selection to only those hosts in the current selection that are also part of the set defined by the filter.
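The four filter actions behave like ordinary set operations. The following is a minimal sketch in Python, assuming the current selection and the result of a filter are plain sets of hostnames. The function name, the action keywords and the hostnames are hypothetical and not part of QluMan.

```python
def apply_filter(selection, filter_hosts, action):
    """Return the new selection after applying one filter action."""
    if action == "select":        # replace the current selection
        return set(filter_hosts)
    if action == "add":           # union with the filter result
        return selection | filter_hosts
    if action == "remove":        # drop hosts matched by the filter
        return selection - filter_hosts
    if action == "intersection":  # keep only hosts present in both sets
        return selection & filter_hosts
    raise ValueError(f"unknown action: {action}")


current = {"node-01", "node-02", "node-03"}
matched = {"node-02", "node-03", "node-04"}  # result of a hypothetical filter

for action in ("select", "add", "remove", "intersection"):
    print(action, sorted(apply_filter(current, matched, action)))
```

Inverting the selection would be the difference between the set of all hosts and the current selection.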

⁠Chapter 5. Adding/Configuring Hosts

⁠5.1. Adding Hosts

To add new hosts to the cluster you can either select "New Hosts" from the context menu in the Enclosure View tree or from the "Manage Hosts" menu. This opens the "Hosts Window".

Adding a new host requires the specification of an IP address, hostname and MAC in the corresponding three text fields of the dialog. The entered values are checked for validity. If one of them is not valid, the check-box to its right remains cleared. The tool-tip of the check-box will then show why it is invalid. If all the values are valid, all check-boxes will show a solid check and the Add Host button will become selectable.
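The guide does not spell out the exact rules behind these validity checks. As an illustration only, the sketch below assumes an IPv4 address, a colon-separated MAC and a single-label hostname. All function names and patterns are hypothetical and may differ from what QluMan actually accepts.

```python
import ipaddress
import re

# Assumed formats; the real checks performed by QluMan may differ.
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")
HOSTNAME_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def valid_ip(text):
    """True if text parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


def valid_mac(text):
    """True if text looks like aa:bb:cc:dd:ee:ff."""
    return MAC_RE.match(text) is not None


def valid_hostname(text):
    """True if text is a single DNS-style label."""
    return HOSTNAME_RE.match(text) is not None


def can_add_host(ip, hostname, mac):
    """Mirror the dialog: adding is only possible if all three fields are valid."""
    return valid_ip(ip) and valid_hostname(hostname) and valid_mac(mac)


print(can_add_host("192.168.52.10", "beo-01", "00:25:90:aa:bb:cc"))
print(can_add_host("192.168.52.300", "beo-01", "00:25:90:aa:bb:cc"))
```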

For convenience and if it makes sense, the IP address and the numeric part of the hostname (if there is one) will automatically be incremented by one, after a host was added. So in most cases, these fields will not have to be changed manually to add the next host. Only the new MAC will need to be entered.
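The auto-increment can be pictured as follows. This is a sketch under two assumptions that the guide does not confirm: the numeric part of the hostname is taken to be its last run of digits, and zero padding is preserved. The function names are hypothetical.

```python
import ipaddress
import re


def next_ip(ip):
    """Return the IPv4 address following ip."""
    return str(ipaddress.IPv4Address(ip) + 1)


def next_hostname(name):
    """Increment the last run of digits in name, keeping its zero padding.

    Names without digits are returned unchanged.
    """
    match = re.search(r"(\d+)(?!.*\d)", name)
    if match is None:
        return name
    digits = match.group(1)
    incremented = str(int(digits) + 1).zfill(len(digits))
    return name[:match.start(1)] + incremented + name[match.end(1):]


ip, host = "192.168.52.10", "beo-09"
for _ in range(3):
    print(ip, host)
    ip, host = next_ip(ip), next_hostname(host)
```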

To help with adding new hosts, qlumand scans the DHCP log file for unknown hosts that have requested an IP address. For each unknown host found in the logs, the table at the top of the window shows the time of the first and last appearance in the log, its MAC address as well as the hardware vendor this MAC is assigned to (if known). Selecting a MAC in the table copies it into the MAC text field at the bottom and a double-click adds the host with the selected MAC. One can also select multiple lines (by holding the CTRL key and clicking or holding the shift key and dragging the mouse) and then click the Add Selected button at the bottom to add them all using the auto-increment feature for the IP address and hostname. If unsure, try adding a single host first and check that the auto-increment does the right thing before adding a group of hosts.

One easy way to add groups of hosts is to power them on one at a time with a short delay (say 30 seconds). The hosts will then appear in the Unknown MACs table in the order they were powered on and can be added as a group with the click of a single button.

At the bottom of the window a Host Template can be selected that will be used as the default for new hosts. Most of the time, no additional configuration is needed for a new host. As an alternative way to make settings for the new hosts, one can select an existing properly configured host and choose to copy its settings to the new ones.

⁠5.2. Configuring Hosts

The configuration of a host consists of the definition and assignment of different types of properties and config classes. Properties are always a key + value pair and are further split into generic, hardware and specific properties.

⁠Hardware Properties

Hardware properties are used to describe the hardware of a host. Among others, hardware properties like the amount of RAM or number of CPU cores are used to configure resource management systems like slurm or torque, so jobs can be assigned to the desired hosts. Others, such as the server model or BIOS version, are purely informational and might be used for inventory management.

⁠Generic Properties

A property that is not hardware related and not specific to a host is called generic. Generic properties can be configuration options, like OpenSM Host, or purely informational, like Paid by. While hardware properties are meant to be more rigid, typically with a configurable set of fixed values, generic properties are more flexible and can be defined at will. This will become more apparent in future versions of QluMan. Generic properties are also not necessarily unique, making it possible to assign multiple values for a single generic property. This is useful e.g. to add multiple torque host tags or to put hosts in multiple groups for dsh/pdsh (via the 'Host tag').

⁠Specific Properties

As the name suggests, specific properties are specific to a single host. The best example for such a property is the serial number of the host.

⁠Property/Config Sets

Individual generic properties, hardware properties and config classes can be used to define generic property sets, hardware property sets and config sets. This is simply a means of grouping them together so they can be used as a single entity. A Host Template can then be created by choosing one generic property set, one hardware property set and one config set.

⁠Host Templates

When a correct Host Template exists, a host can be configured by selecting the desired template in the Enclosure View window. For a single host, this can be done by selecting it in the tree view. This brings up the host information on the right and a template can be selected from the drop-down menu. To configure multiple hosts, you would select them in the tree view and choose a Host Template from the context menu. The check-marks in the sub-menu indicate which Host Templates are currently assigned (if any) for the selected nodes. This action will override the previous assignment.

Generic properties can also be assigned to a host individually. Such assigned properties take precedence over ones of the same type selected through the Host Template. This is useful when a property should be changed temporarily for some hosts or when a property should not be changeable globally through the Host Template. Note that by default, every new host has the generic property Schedule Format: always, which is required to format the disk on the first boot. This should be removed after the first successful boot of the host, so that log files will be preserved across boots in the future.
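The precedence rule for generic properties can be sketched as a simple merge. The example below assumes that properties are stored as a mapping from key to a list of values (since generic properties may carry several values) and that a per-host assignment replaces all template values of the same key. The function name and the sample values are hypothetical.

```python
def effective_properties(template_props, host_props):
    """Merge generic properties; keys assigned directly to the host win."""
    merged = {key: list(values) for key, values in template_props.items()}
    for key, values in host_props.items():
        merged[key] = list(values)  # direct assignment overrides the template
    return merged


# Values inherited from the Host Template (sample data).
template = {"Paid by": ["Physics"], "Host tag": ["compute", "gpu"]}
# Property assigned directly to one host (sample data).
host = {"Paid by": ["Chemistry"]}

print(effective_properties(template, host))
```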

⁠Chapter 6. Hardware Wizard

⁠6.1. Purpose

When setting up new hosts, there are a number of configuration or other settings to be made. They are used to specify their hardware configuration, to determine what OS they should boot and to fine-tune the behavior of applications running on them. All the necessary steps for the desired configuration of the nodes can be done manually and also be changed later through the various dialogs from the main window.

As a convenient alternative, the Hardware Wizard guides you through the necessary configuration steps with a special emphasis on the hardware configuration. It uses the auto-detected hardware properties of hosts to suggest their optimal configuration options. Furthermore, it tries to keep a balance between the available configuration strategies: Using templates, property/config sets or individual properties/config classes.

⁠6.2. Selecting Hosts

The first step is to select the hosts that should be configured. Initially, the list of hosts is empty. One or more of the four buttons at the bottom have to be pressed to pre-select hosts that should be considered. The Unconfigured button adds all hosts that do not have any hardware configured at all. A freshly added host without an assigned Host Template will fall into this category. The Partially Configured button adds hosts that already have some hardware configured correctly but not all of it. The Wrongly Configured button adds hosts where the configured hardware properties do not match the hardware detected at boot, e.g. when nodes have been upgraded with more RAM. Finally, the Selected button adds hosts that have been selected in the enclosure view, including hosts that are already configured correctly.
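The three automatic categories can be thought of as the result of comparing configured with detected hardware properties. The sketch below is one possible reading of the descriptions above, not QluMan's actual logic. It assumes both sides are mappings from property name to value and ignores configured properties that have no detected counterpart. All names and values are hypothetical.

```python
def classify_host(configured, detected):
    """Return the wizard category for one host.

    Both arguments map hardware property names to values.
    """
    if not configured:
        return "unconfigured"
    # A configured value that contradicts a detected one is a mismatch.
    for key, value in configured.items():
        if key in detected and detected[key] != value:
            return "wrongly configured"
    # Detected properties that are not configured leave the host incomplete.
    if any(key not in configured for key in detected):
        return "partially configured"
    return "configured"


detected = {"RAM": "64 GB", "CPU cores": "16"}

print(classify_host({}, detected))
print(classify_host({"RAM": "64 GB"}, detected))
print(classify_host({"RAM": "32 GB", "CPU cores": "16"}, detected))
print(classify_host({"RAM": "64 GB", "CPU cores": "16"}, detected))
```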

Once one or more of the buttons are pressed, the affected hosts will show up in the table. To keep things compact, hosts with identically detected hardware are grouped together and shown in hostlist syntax. By default, all shown groups are selected and will be configured using a single Host Template and therefore single Hardware Property, Generic Property and Config Set. The possible differences in hardware configurations within the list of selected hosts will be handled by the wizard with the per host settings. In case all the groups shouldn't use the same Host Template, groups may be selected or deselected individually and the remaining ones can be configured by running the wizard again later. Groups of hosts with identical hardware can't be split up though. If this is required, select the hosts individually in the Enclosure View and use only the Selected button. Once the desired groups of hosts have been selected click Next to continue configuring them.

⁠6.3. Configuring the Host Template

As explained in Section 5.2, “Configuring Hosts”, the major part of a host's configuration is derived from a Host Template. One of the wizard's goals is to find an existing Host Template with a Hardware Property Set that optimally matches the detected hardware for at least some of the selected hosts. If such a Host Template is found, it will be pre-selected and the Use existing Host Template choice will be active.

The settings inherited from this template are shown underneath in tree format. Below the property tree, a list of hosts that currently use the selected template is shown for informational purposes.

The individual properties belonging to the Hardware Property Set of the selected Host Template are color-coded, to show how well they fit the detected values of the host groups being configured. Hovering over a hardware property brings up a helpful tool-tip explaining the coloring. A green bulb indicates, that the property matches the detected value for all hosts. A yellow one, that it matches some but not all hosts. This happens, when some of the selected hosts have different hardware and means that the selected template is still a good fit. A red bulb indicates that the value matches none of the hosts and is a bad fit. Such a property value may be changed later in the follow-up pages or a different Host Template can be selected right-away.
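The coloring rule for a single hardware property can be written down compactly. This sketch assumes that each selected host contributes one detected value for the property and that at least one host is selected. The function name and the sample values are hypothetical.

```python
def property_bulb(template_value, detected_values):
    """Return the bulb color for one hardware property.

    detected_values holds the detected value of this property for each host.
    """
    matches = sum(1 for value in detected_values if value == template_value)
    if matches == len(detected_values):
        return "green"   # matches all hosts
    if matches > 0:
        return "yellow"  # matches some but not all hosts
    return "red"         # matches none of the hosts


print(property_bulb("64 GB", ["64 GB", "64 GB"]))
print(property_bulb("64 GB", ["64 GB", "128 GB"]))
print(property_bulb("64 GB", ["32 GB", "128 GB"]))
```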

In case the pre-selected Host Template is not the desired choice, a different one can be selected from the drop-down menu. The choices are again color-coded to indicate how well they match the detected properties of the selected hosts. Here a green bulb means that the Host Template matches the detected hardware of at least one host perfectly. A yellow one means that not all hardware properties for the hosts are part of the template, but at least nothing is configured wrongly.

Finally, a red bulb indicates that the Host Template includes a hardware property that matches none of the hosts and would be a bad fit. Nonetheless, such a template might still be the right choice, since it can be modified for an optimal fit in the follow-up page. Alternatively, the correct hardware properties can be set on a per-host basis by the wizard at a later stage.

If none of the existing Host Templates are suitable, a new one can be created in one of two ways: Either an existing template can be cloned or a completely new one can be created. In both cases, a name for the new template must be given.

⁠6.4. Selecting a Hardware Property Set

This page selects the HW Property Set to be used in the selected Host Template. It is the main source for the node's hardware configuration. Like in the previous page an existing HW Property Set can be used/cloned or a new one may be created. Most likely an existing set will be suggested by the wizard. Alternatives are selectable from the drop-down menu. The available choices are again color-coded indicating how well they match the detected host properties.

Changing the HW Property Set at this stage, will affect the selected Host Template. If an existing Host Template was chosen in the previous page, changing it might affect hosts other than the ones being configured in the wizard. In such a case, the wizard will ask for confirmation that such a change is desired.

A selected existing HW Property Set may be modified for a better fit by using the auto-detected HW Properties displayed at the bottom right. If multiple groups of hosts are being configured at the same time, the properties, where hosts differ, will have a drop-down menu to select the most suitable value. Once the desired selection is made, the properties can be copied over the existing HW Property Set by clicking the << button. The wizard will ask for confirmation, in case this would impact hosts not currently being configured. Finally, it will set the HW Property Set displayed at the bottom left into edit-mode.

The described behavior is analogous when cloning or creating a new set. The difference between the two cases lies merely in the HW Properties that will be pre-selected: While cloning will start with the properties of the cloned set, creating a new one initially will have none.

In all three cases, the HW Property Set can be further edited by selecting different values for properties, adding new ones or by removing some of them (both from the context-menu). Once the desired HW Properties are selected, click Next to continue.

⁠6.5. Resolving Hardware Conflicts

If more than one group of hosts is being configured at the same time or if the selected HW Property Set doesn't match all properties of the hosts to be configured, then the Resolve Hardware Conflict page will appear next. At the bottom of it, the conflicting or missing HW Properties are listed showing the detected value for each group of hosts. If only a single property is missing, the wizard will suggest to add this property individually per host.

On the other hand, if multiple properties are missing, adding a directly assigned HW Property Set per host might be preferable and will be the pre-selected choice. There is not really a wrong choice here. To some extent, the chosen option is a matter of taste.

Note

One can also choose configure manually later to tell the wizard to ignore the conflict. Be aware, that this will result in hosts that are only partially or wrongly configured and hence will need to be corrected later.

⁠6.5.1. Resolving by per-host Hardware Property Sets

If per-host HW Property Sets was chosen in the previous page, the set to be used for each group must be picked here. The Wizard will try to find an existing HW Property Set that already contains the correct Hardware Properties for each group. If such a set is found, it will be pre-selected. Otherwise, the only option is to generate a new set, for which a name must be entered, before it's possible to continue.

⁠6.6. Selecting a Generic Property Set / Config Set

To complete the setup of the Host Template, a Generic Property Set and a Config Set must be selected. The two wizard pages handling this are very much alike, and similar to the one for selecting the HW Property Set. Again, there are three main options: Using/cloning an existing set, or creating a new empty one. Since there is no auto-detection for the values in these two types of sets, there is no color-coding of the choices in this case.

An existing set can not be modified in the current QluMan version, but if clone existing set or new empty set is chosen, the properties and configs can be added to or removed from the new set. If the hosts have IPMI, the IPMI properties might need to be set in the Select Generic Property Set page. On the other hand, in the Select Config Set page, the Boot, Disk, and Slurm configs, are the most likely candidates for settings that need to be selected and fine-tuned.

⁠6.7. Summary Page

This is the concluding page of the wizard. It asks for the final confirmation of all the choices made, before the corresponding settings will actually be stored in the database. At the top of the page, the configurations derived from the Host Template (hence common to all hosts) are shown in tree-form. At the bottom, the additional Hardware Properties and/or Hardware Property Sets, that will be set for each group of hosts on a per-host basis, are listed. In case of conflicts, they potentially override the values of the Host Template. Host groups with no per-host overrides are not shown here.

Note

If an existing Host Template or Hardware Property Set was modified during the wizard procedure, then this is the last chance to drop out. By clicking Finish, all the modifications are made permanent, and the configuration settings will be assigned to the selected hosts. The latter should then be fully configured, and after writing the changes (see Section 7.2, “Writing Config Files”), will be ready to (re)boot.

⁠Chapter 7. Common Config Classes

⁠7.1. Overview

Config Classes manage configurations that are too complex to fit into the key + value scheme used by properties. Therefore, there is no common interface to configure all classes. Instead, each class has its own configuration dialog, presenting the specific options it provides. Furthermore, some classes depend on sub-classes (e.g. Boot Configs depend on Qlustar Images). Only the top-level Config Classes are directly assignable to a Config Set or a host. Sub-classes are assigned indirectly via their parent class. Most of the functional subsystems of Qlustar have a dedicated Config Class. Currently, there are four of them: Boot, DHCP, Disk, and Slurm Configs (Slurm is optional) complemented by a single sub-class, Qlustar Images.

⁠7.2. Writing Config Files

The configurations managed in the QluMan GUI via Config Classes and sub-classes are translated into automatically generated configuration files on the head-node(s). While QluMan configuration options are usually saved in the QluMan database immediately after they have been entered in the GUI, writing the real configuration files to disk is a separate step that needs to be specifically initiated and confirmed.

Each configuration dialog of a config class has a Preview and a Write button for its own config files. Additionally, there is a dedicated dialog for writing and previewing all configuration files. You can access the latter from Manage Cluster->Write Files or via the Write Files button at the bottom right of the main window. The button is an indicator for the presence of pending changes: It is greyed out if there aren't any, and fully visible otherwise.

If a config class has no pending changes, the Preview button becomes a View button and the Write button becomes ghosted. The Preview window shows both the new version of the config file that will be written and a context diff of the changes compared to the current file on disk (if there are any differences). If a Config Class changes only one file, that file will be shown directly. If multiple files are involved, there will be one tab for each file.
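For readers unfamiliar with the term, a context diff marks changed lines with "!" and shows a few unchanged lines around them. The following sketch produces one with Python's standard difflib module. It only illustrates the diff format and says nothing about how QluMan generates its preview. The function name, file labels and sample contents are hypothetical.

```python
import difflib


def preview_diff(on_disk, pending, filename="dhcpd.conf"):
    """Return a context diff between the file on disk and the pending version.

    An empty string means there are no differences.
    """
    diff = difflib.context_diff(
        on_disk.splitlines(keepends=True),
        pending.splitlines(keepends=True),
        fromfile=f"{filename} (on disk)",
        tofile=f"{filename} (pending)",
    )
    return "".join(diff)


old = "default-lease-time 600;\nmax-lease-time 7200;\n"
new = "default-lease-time 600;\nmax-lease-time 3600;\n"

print(preview_diff(old, new))
```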

Note

Checking the optional Force button will initiate a rewrite of all config files, even if they haven't changed.

Note

The actual write command is performed via the Qlustar RXEngine. This allows for consistent management of multiple head-nodes e.g. in a high-availability configuration.

⁠7.3. Boot Configs

The Boot Config dialog allows defining settings for the PXE/tftp boot server. A boot configuration determines which Qlustar OS image is delivered to a node, and optionally permits the specification of PXELinux commands and/or Linux kernel parameters. When opened, the Boot Config window shows a collapsed tree-list of all boot configs currently defined, sorted by their names.

Note

Note that the default config is special: It applies to any node without a specifically assigned (either through a template or directly) Boot Config. This means, that in the simplest configuration, where all nodes should boot identically, having just the default config will be sufficient.

By expanding a Boot Config item, the configured Qlustar image, PXELinux command, and kernel parameters become visible. You can change any of the values, by simply selecting a different option from the drop-down menus. In case of kernel parameters, you can also directly edit the entry and save the result by pressing return. Furthermore, it is possible to add more kernel parameters or remove them through the context menu.

The context menu also lets you create new Boot Configs and edit or delete an existing one. Alternatively, a new Boot Config can be created by clicking the New button at the bottom of the dialog. Both the context menu and the button bring up the New Boot Config dialog. Simply enter the name and description for the new config, select a Qlustar image and (optionally) a PXELinux command. Finally, press OK to create it. The new config will then appear in the Boot Config window and will be ready for use.

Pressing the Boot Parameter Editor button at the bottom of the dialog will bring up a small edit dialog, where kernel parameters can be created, edited, or deleted.

⁠7.4. DHCP Config

The DHCP config dialog allows the configuration of the DHCP server and is accessible from the main menu via Manage Configs->DHCP Configs. The final DHCP server configuration file on disk is assembled from the header, which defines global settings, and the host part, which contains the MAC/IP address and hostname of all the hosts registered with QluMan. The header can be freely edited in the Global DHCP Template part of the dialog. An initial version of it is created during installation and in most cases doesn't need to be changed. It contains many important parameters required for the cluster to function properly. Please consult the documentation of the DHCP server and the dhcpd.conf man page for the syntax of this file.
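The assembly of header and host part can be pictured with the following sketch. The host declaration uses standard ISC dhcpd.conf statements (hardware ethernet, fixed-address, option host-name), but whether QluMan writes exactly this layout is an assumption. The function names, the header content and the host data are hypothetical.

```python
def host_entry(hostname, mac, ip):
    """Render one dhcpd.conf host declaration (layout is an assumption)."""
    return (
        f"host {hostname} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f'  option host-name "{hostname}";\n'
        f"}}\n"
    )


def assemble_dhcpd_conf(header, hosts):
    """Concatenate the freely editable header and the generated host part."""
    parts = [header.rstrip("\n") + "\n"]
    for hostname, mac, ip in hosts:
        parts.append(host_entry(hostname, mac, ip))
    return "\n".join(parts)


header = "default-lease-time 600;\nmax-lease-time 7200;\n"  # sample header
hosts = [
    ("beo-01", "00:25:90:aa:bb:01", "192.168.52.10"),
    ("beo-02", "00:25:90:aa:bb:02", "192.168.52.11"),
]

print(assemble_dhcpd_conf(header, hosts))
```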

To prevent multiple persons from editing at the same time and accidentally overwriting each other's changes, you must acquire a lock for the template by clicking the Edit button. If another user is already editing the file, the button will be ghosted and the tool tip will show which user is holding a lock for the template.

After having finished editing a template, don't forget to save your changes by clicking the Save button. It will be ghosted if there is nothing to save. You can undo all your changes up to the last time the template was saved by clicking the Undo button. In case another admin has made changes to a template while you are viewing or editing it, the Refresh button will become enabled. By clicking it, the updated template is shown and you lose any unsaved changes you have already made in your own edit field. To delete a template, click the Delete button.

Note

Note that the "Global Template" can not be deleted, since it is needed for the DHCP server to function correctly.

The template lock expires automatically after some time without activity, so that the template is not deadlocked if someone forgets to release the lock. In such a case, a dialog will be shown notifying you about it. By selecting OK, a new lock will be requested. If another user is editing the template at that time, though, the request will fail and an error dialog will inform you of the failure.
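The lock behavior described above resembles an advisory lock with an inactivity timeout. The sketch below models it with explicit timestamps. The class, its methods and the timeout value are hypothetical, since the guide does not state how long "some time" is or how the lock is implemented.

```python
class TemplateLock:
    """Advisory edit lock that expires after a period without activity."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds of allowed inactivity (assumed value)
        self.holder = None
        self.last_activity = 0.0

    def _expired(self, now):
        return self.holder is not None and now - self.last_activity > self.timeout

    def acquire(self, user, now):
        """Try to take the lock; succeeds if it is free, expired or already ours."""
        if self.holder in (None, user) or self._expired(now):
            self.holder = user
            self.last_activity = now
            return True
        return False

    def touch(self, user, now):
        """Record editing activity; returns False if the lock was lost."""
        if self.holder != user or self._expired(now):
            return False
        self.last_activity = now
        return True


lock = TemplateLock(timeout=600)
print(lock.acquire("alice", now=0))    # alice starts editing
print(lock.acquire("bob", now=100))    # refused, alice holds the lock
print(lock.touch("alice", now=800))    # alice was idle too long
print(lock.acquire("bob", now=801))    # the expired lock can be taken over
```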

DHCP options can also be set in separate group templates and targeted to specific hosts. For simple clusters, this is hardly ever needed, but for large clusters you might, for example, want to have more than one boot server to speed up cluster boot time. In this case, you could assign different groups of hosts to different boot servers using this method. The defined group templates are available as configs to be added to config sets or hosts directly. You can select a group template from the drop-down menu at the bottom to view or edit it. As an example, two templates specifying different boot servers are included.

The drop-down menu also lets you create new templates by selecting the new DHCP group entry. Enter the name of the template in the text field and fill in the contents and description of the template. Pressing return after entering the name will automatically acquire a lock for the new template and go into edit mode. You can then enter the contents of the new template. Don't forget to click the Save button at the end.

When you are satisfied with your changes, you can preview the resulting dhcpd.conf file together with a diff to the old version on disk by clicking the Preview button. The changes will only take full effect when you click the Write button. This will also tell the DHCP server to reload its configuration. The same can also be done through the main menu's Manage Cluster->Write Files entry or the Write Files button at the bottom of the cluster window, and then selecting the Preview or Write button in the DHCP Configs row.

⁠7.5. Disk Configs

Qlustar has a powerful mechanism to manage the configuration of disks on a node. This mechanism is partly based on the setup_storage module of FAI. It basically allows for any automatic setup of your hard drives including kernel software RAID (md) and LVM setups. A detailed description of the syntax for disk configurations is available. Since the OS of a Qlustar node is always running from RAM, a disk-less configuration is obviously also possible. Note that for flawless operation this requires some extra configuration (handling of log messages and in/output of batch jobs) that will be explained in the Qlustar admin guide. Valid configurations require definitions for two filesystems, /var and /scratch, as well as swap space (see examples). To permit the initial formatting of a new disk configuration on a node, the node must have the Schedule Format: always generic property assigned during the initial boot (see the discussion under Property/Config Sets).

Disk configurations can be managed using the Disk Configs dialog accessible from the main menu Manage Configs->Disk Configs. You can select the config to be viewed/edited from the drop-down menu at the bottom left. A couple of example configurations are created during the installation. Note that there are two special configs: (a) "disk-less" (not editable or deletable) and (b) "default" (editable but not deletable). The default config is used for any node that doesn't have a specific assignment to a disk config (via a Host Template, config set). The configuration itself can be edited in the text field at the top of the dialog and should conform to setup_storage syntax (see above). New configs can be created by choosing new disk config from the drop-down menu. As usual, enter the name of the new config in the text field and fill in the contents and description.

⁠Chapter 8. Other Configs

⁠8.1. Qlustar OS Images

Qlustar OS images can be defined and configured in the Qlustar Images dialog accessible via Manage Configs->Qlustar Images. Each image has a unique name, a flavor (e.g. trusty), a version, an optional chroot and one or more image modules.

⁠8.1.1. Image Versioning

Currently available image versions are 9, 9.1, 9.2 (all meta-versions), 9.1.1 and 9.2.0. Note that selecting meta-versions (like e.g. 9) has implications for the update process. They allow tracking the newest x.y (x.y.z) releases automatically. Example: If you have installed version 9 of the modules, you will currently get the 9.2 (most recent 9.y) versions, but if a 9.3 became available, apt-get dist-upgrade would update to 9.3 versions automatically. So with this choice, updates will usually include larger changes, since new feature releases (like 9.3) will automatically be installed.

Similarly, if you have selected the 9.2 version (currently the default after a fresh installation), you will currently get 9.2.0 (most recent 9.2.z version) and apt-get dist-upgrade will update the modules/images to 9.2.1 automatically once available. So this choice will update to new maintenance releases automatically. The most conservative choice would be to explicitly select an x.y.z version (currently 9.2.0), since images will then receive only bug-fix updates unless the version is explicitly changed in Qlustar. See also the discussion in the general Qlustar Update Guide.
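The effect of choosing a meta-version can be modeled as a prefix match against the available x.y.z releases. The sketch below only illustrates the selection logic. In practice, the tracking is handled by the package system during apt-get dist-upgrade. The function names are hypothetical, and the releases 9.2.1 and 9.3.0 are invented for the example.

```python
def parse(version):
    """Turn '9.2.0' into the tuple (9, 2, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def resolve(selected, releases):
    """Return the newest x.y.z release matching the selected (meta-)version."""
    prefix = parse(selected)
    candidates = [parse(r) for r in releases if parse(r)[:len(prefix)] == prefix]
    if not candidates:
        raise ValueError(f"no release matches version {selected}")
    return ".".join(str(part) for part in max(candidates))


today = ["9.1.1", "9.2.0"]               # x.y.z releases named in the text
later = today + ["9.2.1", "9.3.0"]       # hypothetical future releases

for choice in ("9", "9.2", "9.2.0"):
    print(choice, "->", resolve(choice, today), "now,", resolve(choice, later), "later")
```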

⁠8.1.2. Image Properties

A couple of images are pre-defined during the installation process. The dialog shows the images sorted by their names. Expanding an entry shows its configuration and allows selecting a UnionFS chroot via the drop-down menu. Each image contains at least the core module. Additional modules can be added or removed using the context menu when hovering over an entry. Only modules that are not already chosen are available for selection.

New images can be added through the context menu or by pressing the New button at the bottom of the dialog. Like before, you should then enter the name for the new config, choose a UnionFS chroot and optionally provide a description for the new image. Existing images can be removed via the context menu.

⁠8.2. NIS hosts

NIS (Network Information System) is used to manage hostname resolution within a Qlustar cluster. For all hosts that are managed within QluMan itself, a corresponding NIS entry is created automatically. However, administrators might want to add other hosts that are not part of the cluster to the NIS database as well. To allow this, the creation of the NIS hosts database is split into a header part that can be freely edited by the admin, and an auto-created part with the hosts managed by QluMan.

To edit the header part, choose Manage Configs->NIS Host Header from the main menu and press Edit. The top part of the window popping up can then be freely edited. When done, press Save. Note that entries for the head-node are automatically created upon installation and should remain unchanged unless one of the head-node's IP addresses changes. The resulting NIS hosts file can then be previewed and written to disk by pressing the corresponding buttons at the bottom of the dialog. Upon writing the file, the NIS database is automatically rebuilt on the NIS master server.

⁠8.3. SSH host files

To simplify ssh remote logins to cluster nodes, three ssh configuration files are provided and managed by QluMan: (a) ssh_known_hosts (holds ssh host keys of cluster nodes), (b) shosts.equiv (enables login without password between machines within the cluster) and (c) authorized_keys (used to allow password-less root login to nodes with the specified ssh public keys).

The first two config files consist of a configurable header part, where additional hosts can freely be entered and an auto-generated part for the hosts managed by QluMan. The authorized_keys one just has the configurable part. Ssh host info for the head-node and a possibly configured frontend-node are automatically inserted during the installation process.

Management of the three configs is similar to the NIS hosts dialog: To edit the header part of either config, select Manage Configs->SSH Configs from the main menu. Then choose the config to work on by using the drop-down menu at the bottom left and press Edit. The top part of the window popping up can then freely be edited. When done press Save. Finally, the resulting ssh host files can be previewed and written to disk by pressing the corresponding buttons at the bottom of the dialog.

Note

There is no preview of the authorized_keys file, as it is automatically written to /root/.ssh during the boot phase on hosts that are not head-nodes.

⁠8.4. UnionFS Chroots

In most practical cases, a Qlustar image should be configured with an associated UnionFS chroot. Exceptions are single purpose images e.g. for Lustre servers. By design, images are stripped down to the functionality (programs) that is most often needed on a compute/storage node. This keeps them small while still providing fast, network-independent access to programs/files typically used.

To complement the image and provide the full richness of the packages/programs available in the chosen Linux distribution, the UnionFS chroot (holding a full installation of e.g. Ubuntu) is exported via NFS by one of the head-nodes and technically merged below the content of the Qlustar OS image. In practice, this means that all files belonging to the chroot will be available on the nodes configured to use the chroot, but if a file/program is also in the node's image, that version will be used. Hence, this method combines the compactness and speed of the imaging approach with the completeness of a full OS installation to give you the best of all worlds.
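The lookup rule described above (the image's copy of a file wins, the chroot supplies everything else) can be modeled in a few lines. This is only a conceptual sketch: the paths are made up and two dictionaries stand in for the two layers. It does not reflect how the UnionFS mount itself is implemented.

```python
from collections import ChainMap

# Hypothetical file tables: path -> layer that provides the file.
image_layer = {"/bin/bash": "image", "/usr/bin/ssh": "image"}
chroot_layer = {"/bin/bash": "chroot", "/usr/bin/gcc": "chroot"}

# ChainMap searches its maps from left to right, so the image takes precedence.
merged = ChainMap(image_layer, chroot_layer)

print(merged["/bin/bash"])     # present in both layers
print(merged["/usr/bin/gcc"])  # present only in the chroot
```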

As explained before (see Section 8.1, “Qlustar OS Images”), the chroot associated with an image is easily selectable via the Qlustar Images dialog. The management of the chroots themselves is possible via the Manage Chroots dialog. It is accessible via the main menu (Manage Cluster->Manage Chroots) and provides a number of actions related to chroots. Manipulation of the contents of chroots is explained elsewhere.

To specify a chroot to operate on, select it via the corresponding pull-down menu. This will show its description, as well as its properties like the NFS server that serves it, the filesystem path on the server, the flavor (edge platform, trusty/wheezy/...) and the version of the Qlustar feature release (always of the form x.y, e.g. 9.2).

When generating a new chroot, a name for the chroot must be specified and optionally a description of its purpose. Furthermore, you can select an NFS server where the chroot will be located (currently only one option), a flavor (aka edge platform) and a Qlustar version. Finally, you have the possibility to select Qlustar tasks. These are topic package bundles, each consisting of a collection of packages relevant to a certain field of HPC applications. Pressing the OK button then starts the generation of the chroot. You can follow the rather lengthy process (expect a couple of minutes) in its own window.

Cloning an existing chroot is mostly useful when you want to test an upgrade to a new release or for other tests. Pressing the Clone button opens a sub-window in which you can specify the name of the new cloned chroot and optionally a description of its purpose. Pressing the OK button then starts the cloning process. You can again watch this in its own window. Editing a chroot allows you to modify its description.

Removal of a chroot, by pressing the Remove button, first asks you for a final confirmation. If you then press the Delete button, the chroot will be removed provided it is not still in use by a Qlustar image. If it is, a list of images that are associated with the chroot is displayed. You would then first have to reconfigure these images to use another chroot before trying to remove again. Renaming of a chroot is not supported directly. To rename, you'd have to clone the original chroot, giving the clone the new desired name and afterwards remove the old chroot.

⁠8.5. Infiniband Network

For most practical purposes, Infiniband (IB) adapters need to be configured with an IP address (IPoIB) just like Ethernet adapters. If you have chosen to configure an IB network during installation, this section is mostly about how to review or change the initial settings. If not, IB has to be activated first in the Network Configuration dialog. An IB Network IP address and netmask can then be chosen there. The Infiniband network must not collide with the Cluster (Ethernet) or IPMI network. This is prevented automatically in the settings dialog. The Infiniband IP of each host is computed by mapping the host part of its Cluster Network IP to the IB Network. Example: IP Cluster Network 192.168.52.100 - IP IB network 192.168.53.100.

Note

This mechanism requires the IB netmask to be at least as large as the Cluster Network netmask. Hence, smaller values won't be selectable.
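The address mapping can be reproduced with Python's standard ipaddress module. The following is a sketch under assumptions: the function name is hypothetical, the /24 prefixes are chosen only to match the example addresses above, and the netmask restriction is interpreted as "the target network must offer at least as many host bits as the Cluster Network".

```python
import ipaddress


def map_to_network(cluster_ip, cluster_net, target_net):
    """Carry the host part of `cluster_ip` over into `target_net`."""
    cluster = ipaddress.ip_network(cluster_net)
    target = ipaddress.ip_network(target_net)
    # Assumption: the target network needs at least as many host bits as
    # the Cluster Network, otherwise the host part would not fit.
    if target.prefixlen > cluster.prefixlen:
        raise ValueError("target network is too small for the host part")
    # Keep only the host bits of the cluster address ...
    host_part = int(ipaddress.ip_address(cluster_ip)) & int(cluster.hostmask)
    # ... and combine them with the network bits of the target network.
    return str(ipaddress.ip_address(int(target.network_address) | host_part))


print(map_to_network("192.168.52.100", "192.168.52.0/24", "192.168.53.0/24"))
```

With the values from the example, the host part 100 is kept and only the network part changes, giving 192.168.53.100.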

In order to have a configured IB adapter during the boot process of a node, additional steps are necessary. It is not uncommon for a cluster to consist of hosts with IB and hosts without. Therefore, the pre-defined hardware property IB Adapter with a value of true must be assigned to a host to explicitly enable IB for it. This is done most conveniently by adding this property to the Hardware Property Set(s) used in the Host Template(s) for nodes with IB. If this assignment exists, Infiniband modules will be loaded and IP-over-IB will be configured during the boot process of the corresponding nodes with the IP mapping described above.

Note

The Hardware Wizard auto-detects the presence of IB adapters and allows you to conveniently set the IB hardware property.

⁠8.5.1. Activating/configuring OpenSM

In an IB fabric, at least one node (or switch) has to run a subnet manager to manage the IB routing tables. Qlustar provides OpenSM for this task. If the head-node is also part of the IB network, it's usually best to configure it to run OpenSM. This might have been chosen during installation, in which case there is nothing more to be done. If not, you have the option to run OpenSM on ordinary nodes too.

In this case, it is advisable to run OpenSM on two or three nodes (not more) for redundancy. It is therefore best to configure this directly for the chosen hosts, rather than using a Host Template or generic property set. After selecting the host(s) where OpenSM should run in the Enclosure View, open the context menu and select Set Generic Property->OpenSM Ports->ALL. The next time the host(s) boot, the OpenSM daemon will be started on all of their Infiniband ports.

If a host has more than one IB port, OpenSM can also be configured to run only on a specific one rather than on all of them. The port can be specified by its number or by its unique ID. As this is an uncommon configuration and the unique ID is unknown beforehand, there is no preset value for this. To create a new value, first select an existing value, e.g. ALL, for the generic property OpenSM Ports. You can then edit the value in the Generic Properties box of a host. Editing the line and pressing return will create the new value. Beware that this only affects the one host shown. To assign the new value to other hosts, select them and then change the OpenSM Ports property through the context menu.

In some circumstances, it might be necessary to run OpenSM with extra options. This can also be configured via Generic Properties. The only preset value is the empty string, so you need to create a new value for the options you require. First add the empty value of the generic property OpenSM Options to one host. Then edit the value to your requirements and press return to create it. Finally add/change the OpenSM Options generic property for all relevant hosts.

⁠8.6. IPMI settings

Configuring IPMI is similar to Infiniband and also involves multiple steps, because there are a number of options to set. If you have chosen to configure an IPMI network during installation, a large part of this section is about how to review or change the initial settings. If not, IPMI first has to be activated in the Network Configuration dialog. There you can set the IPMI Network IP address and netmask. The IPMI address of a host is then determined with the same mapping as used when configuring IB, and the same restrictions for the choice of netmask apply.

Sometimes, not all nodes in a cluster have IPMI. Therefore, by default, no host is configured to set up IPMI in QluMan unless it is assigned the hardware property IPMI Adapter with a value of true. The easiest way to achieve this is to add the IPMI Adapter property to the Hardware Property Set(s) used in the Host Template(s) for the nodes with IPMI. With this assignment, a node is ready for monitoring its temperature and fan speeds.

Enabling IPMI nodes for remote control involves two more settings. The first one is the generic property Initialize IPMI. By default, the settings of the IPMI cards are not touched by Qlustar. However, if the Initialize IPMI generic property is assigned and set to true, the IPMI card network settings of the corresponding host will be set every time it boots. Changing the value of this property to true and, after booting, back to false allows a one-time setup of the card's network properties.

The second generic property is the IPMI Channel to use. By default, channel 1 is used and this is the only preset value for the property. If you need to use a different channel, first add the generic property IPMI Channel to the Generic Property Set (or to a host directly) and then edit the value.

⁠Chapter 9. RXEngine / Remote Execution Engine

⁠9.1. RXEngine Overview

QluMan provides a powerful remote command execution engine that allows you to run shell commands on any number of hosts in parallel and analyze their output/status in real-time. Commands fall into two categories: pre-defined and custom commands. The RXEngine has the following capabilities:

The command can be a single command or a series of commands in bash shell syntax.
The hosts are specified in Hostlist format or through a Host Filter, so that even large groups can be represented by a short string.
The commands run in parallel on all hosts.
The network connection used for remote execution is both encrypted and authenticated. It employs the same high-speed/high-security elliptic-curve cryptography that is used for the connection between the QluMan server and the QluMan GUI.
The output is analyzed and updated in short intervals during the execution phase.
Hosts with equal output are grouped together to display a compact view of the command's messages.
The output can further be filtered by the return code of the command and by (de)selecting stdout and/or stderr.

⁠9.2. Executing a pre-defined command

Pre-defined commands can be created using the Command Editor (see Section 9.5, “Command Editor” for details). To execute a pre-defined command, open the pull-down menu of the Execute button at the bottom of the Enclosure View and select an entry. This opens a new Command Execution window for pre-defined commands. At the very top of it, the selected pre-defined command is shown. It can be changed if required. Below that is a list of the arguments the selected command accepts. Execute on is always present, showing where the command will be executed. If defined, additional arguments of the command are displayed underneath. Further below, the final command is shown, with its arguments inserted at the right places. The command will be executed upon clicking the Execute button.

Arguments to a pre-defined command can be set fixed to a Host Filter, in which case the filter and its resulting hostlist are shown as plain text and cannot be edited. Optionally, specification of arguments in Hostlist format may also be left up to the user. In that case, a combo-box is shown, followed by the evaluation of the specified input shown as plain text. When hosts were selected in the Enclosure View, the combo-box will contain the hostlist corresponding to the selection as default. The text can be edited directly or a filter can be chosen from the drop-down menu. Any argument starting with "%" is assumed to be a filter. If this is not intended, the "%" must be escaped by another "%", but only at the start of an argument. For more details about specifying arguments in pre-defined commands see Section 9.5, “Command Editor”.
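The "%" rule for arguments can be summarized in a small sketch. The function name and the returned tuples are hypothetical, and the assumption that the filter name is simply the text following the "%" is made for illustration; the sketch only mirrors the rule stated above, not the actual parser used by QluMan.

```python
def classify_argument(arg):
    """Return ("filter", name) or ("text", literal) for one argument."""
    if arg.startswith("%%"):
        # Escaped percent sign: drop one "%" and treat the rest as plain text.
        return ("text", arg[1:])
    if arg.startswith("%"):
        # A single leading "%" marks the argument as a filter.
        return ("filter", arg[1:])
    # A "%" anywhere else in the argument needs no escaping.
    return ("text", arg)


for example in ["%ONLINE", "%%done", "beo-[01-04]", "50%"]:
    print(example, "->", classify_argument(example))
```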

⁠9.3. Executing a custom command

To execute a custom command, open the pull-down menu of the Execute button at the bottom of the Enclosure View and select custom command from the menu. This opens a new blank Command Execution window.

Note

The initial hostlist is empty in the screenshot examples, since no hosts were selected in the Enclosure View.

In case hosts were selected in the Enclosure View before clicking the Execute button, a hostlist representing these hosts will be present in the Command Execution window. This allows easy selection of hosts to run a command on by selecting them in the Enclosure View. The hostlist can also be updated at a later time from the currently selected hosts in the Enclosure View by clicking the Update button. This makes it simple to run the same command on different sets of hosts. When a command is executed, it is added to the history and can be accessed again later through the pull-down menu. This allows rerunning recently used commands without having to retype them every time.

Note

Note that the history is stored in the user's home directory, hence every user has their own. The preferred way to manage frequently used commands is by pre-defining them (explained in Section 9.5, “Command Editor”).

⁠ Passing input to a command

Sometimes it is necessary to pass some input to a command. This can be done by clicking the Input button near the top. Another text box will then be added to the window, allowing you to specify what should be passed to stdin of the command on each host.

⁠ Command Syntax

Commands will be interpreted/executed by the BASH shell on every host matching the hostlist. The full bash syntax is supported. Redirection of output to files, as in the last example, and working with variables works as expected. Please refer to the bash documentation (e.g. man bash) for more details.

⁠9.3.1. Custom Command Editor

To edit your custom commands open the Custom Command Editor by clicking the tool button next to the command. This dialog allows you to rearrange the order of how the commands appear in the custom command selectbox, delete and even edit them.

⁠Editing a custom command

To edit a command just select it. Now the Edit button gets enabled. After you click on it, a new dialog appears that lets you edit the command. Click Apply to save your changes.

⁠Deleting a custom command

To delete a custom command, open its context menu and select Delete.

⁠Changing the order of custom commands

The order of the custom commands can be changed via Drag&Drop. You don't have to explicitly save the changes, since this is done automatically after each rearrangement.

⁠9.4. Analysis of Command Status/Output

Once the hostlist is added, a command can simply be run by entering it in the command box and hitting the Execute button. It will then start in parallel on all listed hosts and the command output will be collected. Periodically, in short but increasing intervals, the output will be sorted and displayed. Hence, for short running programs you will see it immediately. Due to the increasing display intervals, long running and noisy commands won't cause constant flickering of the output, allowing you to more easily follow it.

⁠9.4.1. Command Status

After the Execute button has been pressed, all hosts will start in the Pending state. Once a host confirms that it has started its command, it will change to the Running state. When the command concludes, the state becomes one of Failed, Errors or Success. If the command exited with a return code other than 0, the host will enter the Failed state. If the command exited with a return code of 0, but produced output on stderr, it will enter the Errors state. Otherwise, it enters the Success state.
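The rule for the final state of a host can be written as a short decision function. The function name and signature are hypothetical; only the three state names and the order of the checks are taken from the description above.

```python
def final_state(return_code, stderr_output):
    """Map the result of a finished command to its final state."""
    if return_code != 0:
        # A non-zero return code always means Failed, whatever the output.
        return "Failed"
    if stderr_output:
        # Return code 0, but something was written to stderr.
        return "Errors"
    return "Success"


print(final_state(1, ""))
print(final_state(0, "warning: disk almost full\n"))
print(final_state(0, ""))
```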

In the screenshot example, the hosts sn-1 and sn-2 were down, so they remained in the Pending state. By clicking the Pending button, a hostlist of the pending hosts is displayed. The QluMan server will start the command on those hosts when they come online again. If you do not want that to happen, or if the command does not terminate on its own, then the Kill button allows you to stop the command. A killed command counts as failed, so sn-1 and sn-2 now enter that state. The command output also reflects that the command was killed.

⁠9.4.2. Host Grouping by Status and Output

Hosts executing a command are not only grouped by their execution state, the command output produced by the different hosts is also analyzed and compared to each other. Hosts with identical output are put into a group. Their output is only displayed once, prefixed with the hostlist representing the hosts in each group. For a quick overview, the number of hosts and groups is also displayed below each state button.
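Grouping by identical output amounts to using the output text as a dictionary key. The sketch below illustrates the idea with made-up host names and output strings; the function name is hypothetical and the real RXEngine additionally compresses each group into the compact hostlist format.

```python
from collections import defaultdict


def group_by_output(outputs):
    """Map each distinct output to the sorted list of hosts that produced it."""
    groups = defaultdict(list)
    for host, text in outputs.items():
        groups[text].append(host)
    return {text: sorted(hosts) for text, hosts in groups.items()}


# Hypothetical outputs collected from three hosts.
outputs = {"beo-01": "ok\n", "beo-02": "ok\n", "sn-1": "killed\n"}
for text, hosts in group_by_output(outputs).items():
    # Print each distinct output once, prefixed with the hosts that share it.
    print(",".join(hosts) + ": " + text, end="")
```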

In the screenshot example, two hosts (sn-1 and sn-2) have failed, because they were offline and the command was killed before starting. The output of both was identical, so they form one group. Similarly, one host (ql-head-pr-t) completed the command successfully and forms its own group.

The S buttons next to the numbers add or remove the hosts in each state to form a new hostlist. Press the button to include the corresponding hosts and press it once more to exclude them again. This is convenient, e.g. to quickly select only the hosts for which a command failed: Analyze the errors and later relaunch with an adjusted command. Another example: Select only the successful hosts to run a follow-up command etc.

⁠9.4.3. Filtering by stdout and stderr

Commands usually output regular text to stdout and warnings as well as errors to stderr. In the latter case, the command ends up in the Errors state, because this is usually something that needs further inspection. The screenshot example prints two lines, one to stderr and one to stdout. Unfortunately, Unix does not enforce any order between output to stdout and stderr. Therefore, as in this example, it can happen that a small delay between the command output and reading from the file descriptors causes the order to change slightly.

Some commands produce a lot of output. Error messages are then easily overlooked in between the lines. Similarly, a command might report a lot of harmless errors that hide the interesting output going to stdout. To simplify the analysis of the command output in such cases, the two buttons stdout and stderr at the bottom of the window allow toggling the visibility of stdout and stderr output selectively on and off.

⁠9.5. Command Editor

The command editor shows all the pre-defined commands in a tree view on the left. A number of useful commands are already defined by default. Selecting a command shows its definition on the right-hand side, where it can also be edited. Every command has a unique name/alias under which it appears in the tree view on the left, the execute menu in the Enclosure View and in the drop down menu of the pre-defined commands execution window. In the future, it will also be possible to limit commands to specific user roles, but for now all commands are unrestricted. A user either has rights to execute any pre-defined commands or none. Below the role selector, the command itself is defined.

⁠9.5.1. Sorting commands

Commands are kept in a tree structure, grouping similar commands together. They can be sorted freely using the context menu to make frequently used commands easier to select. The Move Up and Move Down menu items move a command or group up or down within the group respectively. The Move to group sub-menu allows moving a command up or down in the tree hierarchy. Groups can be created, renamed and deleted to achieve any desired hierarchy of commands.

⁠9.5.2. Defining or editing a command

To define a new command, select New Command from the context menu and set its name. The new command will be created in the group where the context menu was opened, or in the root if the mouse is outside of any group. Initially, the command will have no definitions.

To edit a command, it needs to be selected first. Then its definitions will be shown on the right-hand side. The name/alias of a command can be edited by clicking in the text box at the top and entering the new name. The check-box to the right of the name indicates whether your name is valid. Press return to save the new name and the check-box will become fully checked again. To undo editing, simply reselect the command in the tree view.

A command can be executed on any host or set of hosts in the cluster. The Execute on field governs how that host or set of hosts is constructed. The default is User input. This means the user will have to choose the hostlist on which the command will run at the time it is executed. Alternatively, the hostlist of the command can be preset by selecting one of the filters from the drop-down menu. If a filter is selected, the hostlist it currently evaluates to is displayed below it.

Editing the command itself may take a while. To avoid conflicts from concurrent editing attempts by different QluMan users, only one person can edit a command at a time. To start the editing process, click the Edit button at the bottom. After that, changes to the command can be entered. Commands will be interpreted/executed by the BASH shell on every host matching the hostlist. The full bash syntax is supported. Redirection of output to files and working with variables works as expected. Please refer to the bash documentation (e.g. man bash) for more details. There is one exception to this: A "%" character followed by a number specifies additional arguments for the command, as explained in more detail below.

Sometimes it is necessary to pass some input to a pre-defined command. This can be done by clicking the Input check-box. It will bring up an input text-box, where the desired input text can be entered.

To finish editing the command, click the Save button at the bottom. This actually saves the command text and input, if any, in the database and releases the lock on the command. This also scans the command text for argument placeholders and updates the entries in the Arguments box.

The definition of command arguments uses the same mechanism as detailed for the Execute on definition. They can either be left up to the user, to be filled in when the command is executed, or be specified by a filter selectable from the drop-down menu. When executed, the numbered placeholders in the command text (a "%" character followed by a number) are replaced by the user-specified arguments or the resulting hostlist of the filter. There are always as many arguments as there are placeholders in the command. To add an argument, edit the command text and add a placeholder there. To remove an argument, edit the command text and remove the placeholder.
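Scanning the command text for placeholders and substituting arguments can be sketched with a regular expression. The function names, the example command and the way arguments are passed (a mapping from placeholder number to text) are assumptions for illustration, not the QluMan implementation.

```python
import re

# A placeholder is a "%" character followed by a number.
PLACEHOLDER = re.compile(r"%(\d+)")


def find_placeholders(command):
    """Return the distinct placeholder numbers used in the command text."""
    return sorted({int(n) for n in PLACEHOLDER.findall(command)})


def substitute(command, arguments):
    """Replace each placeholder with the argument stored under its number."""
    return PLACEHOLDER.sub(lambda m: arguments[int(m.group(1))], command)


command = "echo %1; cat; echo %2"  # hypothetical command text
print(find_placeholders(command))
print(substitute(command, {1: "beo-[01-04]", 2: "sn-[1-2]"}))
```

Because the list of arguments is derived from the command text itself, adding or removing a placeholder is all it takes to add or remove an argument.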

In the screenshot example, the test command is defined to execute on all head-nodes (qlu-dev is the only head node in the cluster). It has some input and two extra arguments. The first one is fixed to a Hostname filter that evaluates to any host starting with beo. The second one is left for the user to be specified, hence, when executing the command, only the second argument is editable. In the screenshot, the ONLINE filter was chosen for this argument, but any other text would have been possible too. For easy verification, the command text, with all the arguments substituted, is shown together with the command input (if defined).

Note

In the example, the specified input is simply output by the cat command, so in the output shown, it appears between the two echo commands.

⁠Chapter 10. Host Filters

⁠10.1. Overview

Host filters define a set of hosts by specifying any number of criteria. The set of hosts defined by a filter is dynamic: Changes made to the properties of hosts are automatically reflected in the hostlist a filter evaluates to. Every time a filter is used, the criteria defining it are evaluated from scratch. Hence, host filters provide a powerful tool to classify hosts into groups, in a way that will dynamically take into account changes made to the cluster. They can be used in various ways within QluMan:

In pre-defined commands, either to specify the set of hosts on which a command should be executed or to supply the resulting hostlist as an argument to the command.
As user input for pre-defined or custom commands.
In the Enclosure View to modify the selection.

⁠10.2. Host Filter Editor

The filter editor window is split into two areas. At the top, the definition of the currently selected filter is shown. You can select the filter to be displayed from the drop-down menu. At the bottom, the hosts that currently pass all the filters are displayed in the compact hostlist format. This format is used by a number of other programs including pdsh and SLURM (the pdsh Wiki has a detailed discussion on the syntax).
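To give an impression of the compact hostlist format, the following sketch expands a simple expression with one bracket group into individual host names. It covers only a small subset of the syntax (a single bracket group with comma-separated numbers and ranges); the function name and the host names are made up, and the pdsh documentation remains the reference for the full format.

```python
import re


def expand_hostlist(expr):
    """Expand e.g. 'beo-[01-03,07]' into a list of host names."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if not match:
        # No bracket group: the expression is a single host name.
        return [expr]
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            width = len(low)  # preserve zero padding such as "01"
            for number in range(int(low), int(high) + 1):
                hosts.append(f"{prefix}{number:0{width}d}{suffix}")
        else:
            hosts.append(prefix + part + suffix)
    return hosts


print(expand_hostlist("beo-[01-03,07]"))
```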

Select new filter from the drop-down menu to start defining a new filter. Then add specific sub-filters from the context menu, until the desired subset of hosts is displayed in the bottom half of the window. Using their context-menu, filters can be edited or removed and sub-filters be added.

The Reset filter menu item clears the filter, so one can start from scratch. To finally create (save) the new filter click Save as and enter a name for it.

⁠10.2.1. Editing a Filter

Editing a filter is similar to creating a new one. First select the filter from the drop-down menu to display its current definition. Then add, edit or remove individual filters as desired. Finally, click Save as to save the altered filter. Using an existing name will replace the old filter. Using a different name will create a new filter.

⁠10.2.2. Types of Filters

A filter can be added from the context menu (right mouse click) in the top area. For a host to show up in the filtered list (bottom part), it must pass all the filters added. Each filter may narrow down the list. Any number of filters can be added and they do not have to be unique. For example you can add a Hostname filter that selects all hosts that begin with beo and a Host Template filter that selects all Demo VM nodes. A host has to pass all top-level filters to show up. Currently, QluMan provides six top-level filters: Hostname, HostTemplate, Enclosure, HEADNODE, HEADNODES and ONLINE. Additional ones will be added in the future.

⁠10.2.2.1. Hostname Filter

Adding a Hostname filter opens up a pop-up dialog asking for the hostname or a regular expression to filter for. The input must be a regular expression in python syntax and is matched against the beginning of the hostname. If a match against the full hostname is desired then "$" should be added at the end. A ".*" can be added to the front, to match anywhere in the hostname instead of matching against the beginning.

Note

Multiple hostname patterns can be added to a Hostname filter through the context menu. This is additive: If a host matches at least one pattern, it will be included in the resulting list.
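The matching rules for Hostname patterns correspond to what re.match does in Python: the pattern is anchored at the start of the name but not at its end. The sketch below assumes this equivalence and uses made-up host names; the function name is hypothetical.

```python
import re


def hostname_filter(hosts, patterns):
    """Keep hosts whose name matches at least one pattern at its start."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [h for h in hosts if any(c.match(h) for c in compiled)]


hosts = ["beo-01", "beo-11", "sn-1", "ql-head"]
print(hostname_filter(hosts, ["beo"]))       # names beginning with beo
print(hostname_filter(hosts, ["beo-01$"]))   # match against the full name
print(hostname_filter(hosts, [".*1$"]))      # match anywhere, must end in 1
print(hostname_filter(hosts, ["sn", "ql"]))  # several patterns are additive
```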

⁠10.2.2.2. Host Template Filter

Adding a Host Template filter does not pop up a dialog. Instead it adds an empty Host Template filter. This simply selects all hosts with an assigned Host Template. Hosts that do not have a Host Template will not pass this filter. The filter can be made more specific by adding Host Template patterns to it through the context menu. This opens up a pop-up dialog, from where an existing Host Template name can be selected.

The result is a list of hosts, for which the associated Host Template matches the given pattern. Adding multiple Host Template names is again additive, just like with Hostname patterns.

⁠10.2.2.3. Enclosure Filter

Adding an Enclosure filter does not bring up a dialog either. Like a Host Template filter, it selects all hosts that are part of an enclosure. Unlike the Hostname and Host Template filters though, an Enclosure filter allows for two different specifications: The name and/or the type of an enclosure can be matched. Just like Hostname and Host Template filters, the Enclosure filter is additive. Adding sub-filters for both the Enclosure name and the Enclosure type will filter hosts that match at least one of those criteria. To filter for hosts that match both an Enclosure name and an Enclosure type, two separate Enclosure filters have to be used to get the intersection of both filters: the first one to filter the name and the second one to filter the type.

⁠10.2.3. Inverting a Filter

Every filter, sub-filter and pattern can be inverted through the context menu. The context menu for a pattern contains menu entries for both, the pattern and the enclosing filter separated by a line. The first Invert entry will invert the specific pattern that was selected, while the second Invert will invert the whole filter.

Besides the obvious, this can also be useful in finding hosts that are not configured correctly. For example, adding an empty Host Template filter and inverting it will show all hosts without a Host Template. Adding a second filter that selects all switches, power controllers and other special devices (they usually don't need a Host Template) and also inverting that results in a list of all hosts that are neither properly configured nodes (missing Host Template) nor special devices.

⁠10.2.4. Additive versus subtractive

When constructing a filter, it is important to remember that all top-level filters are subtractive. A host must pass all top-level filters to show up in the result. On the other hand, all patterns and sub-filters are additive. Matching any one of them within a top-level filter adds the host to the result of that filter. Hence, when subtractive behavior is desired for patterns or sub-filters, each pattern or sub-filter must be added to its own top-level filter. For example, to select all hosts that start with beo as well as end in "1", two Hostname filters have to be added.
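In logical terms, top-level filters are combined with AND while the patterns inside one filter are combined with OR. The following sketch models this for Hostname filters only, representing each top-level filter as a list of patterns; the function name, the data layout and the host names are assumptions for illustration.

```python
import re


def passes(host, top_level_filters):
    """True if every top-level filter has at least one matching pattern."""
    return all(
        any(re.match(pattern, host) for pattern in patterns)
        for patterns in top_level_filters
    )


hosts = ["beo-01", "beo-02", "sn-1"]

# Two Hostname filters: starts with "beo" AND ends in "1".
both = [["beo"], [".*1$"]]
# One Hostname filter with two patterns: starts with "beo" OR ends in "1".
either = [["beo", ".*1$"]]

print([h for h in hosts if passes(h, both)])
print([h for h in hosts if passes(h, either)])
```

The first selection contains only beo-01, whereas the second contains all three hosts, which is why the two patterns must go into separate top-level filters to get the intersection.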

⁠Chapter 11. QluMan User and Rights Management

⁠11.1. Overview

QluMan is multi-user capable and provides an interface to configure and control users as well as their permissions when they work with QluMan. The QluMan users are not connected to system users in any way. To simplify permission management, the concept of user roles can be used. User roles allow you to pre-define a collection of permissions for QluMan operations. Once defined, they can be assigned to a user.

⁠11.2. Managing QluMan Users

The admin user is pre-defined and has the admin role, meaning all possible rights. Roles for the admin user cannot be changed, just like the root user in a Linux system always has all rights. When running QluMan for the first time, you should set the correct email address for the admin user.

⁠11.2.1. Adding a User

To create a new user, click New User and enter the name for the new user to create it. Then select the user from the drop-down menu and fill out the remaining fields. The changes will be saved automatically when return is pressed or the input field loses the focus. New users have no roles assigned to them and will have no rights to change anything. They can only inspect the cluster config (read-only mode). See Section 11.3, “Managing User Roles/Permissions” for how to create new roles and assign them to the user by checking the respective check-boxes. If the New User button is not selectable, then the user lacks sufficient rights to create new users. The Roles buttons will then also be disabled, preventing an unauthorized user from giving himself or others extra roles.

⁠11.2.2. Generating the Auth Token

A new user also lacks login credentials, so initially, he can't connect to QluMan. Hence, the next step is to generate a one-time token for the user, by clicking New Auth Token. Generating the one-time token may take a little time to finish and happens before the New Auth Token dialog opens. The dialog shows a certificate containing the generated one-time token, as well as the other login information required to connect to the server. The certificate is protected by an auto-generated 8-digit pin, so that it can be transferred over unencrypted communication channels like e-mail or chat programs. In such a case, the pin should be sent over a second, different communication channel, e.g. by reading it over the phone.

Note

If a new cluster has been set up, an initial auth token for the admin user needs to be generated on the command line of the cluster head-node. This is explained in detail in the Qlustar First Steps Guide.

As a special case, when a user clicks New Auth Token for himself, the generated token is imported into his running client and replaces the current login credentials. A reconnect of the GUI client is then triggered automatically. It forces the client to generate a new random public/private key pair and use the new one-time token to authenticate itself to the server. This procedure should be used to invalidate the old keys and replace them with fresh ones, in case a user suspects the certificate safe might have been compromised by an attacker.

The New Auth Token dialog also has three useful buttons at the bottom right corner. The Import button allows adding the certificate directly to the running client. The use case for this is creating a user account for oneself when working as admin. For clusters with multiple users having the admin role, it is recommended that every user has his own user account and that the admin user is only used to initially create the new users.

The Save button allows saving the certificate into a file and the Mail button sends the certificate to the email address configured for the user. In both cases, only the certificate is saved or mailed and the PIN needs to be sent separately.

For optimal security, it is recommended to leave a new user without roles until he has logged in using the one-time token. That way, if the certificate was intercepted, it will be useless to an attacker, since he won't be able to perform any actions within QluMan. Also, if the attacker manages to intercept and use the certificate before the intended user does, the intended user won't be able to use it anymore. He will then notice that something is wrong and most likely report it to the main cluster administrator.

Note

The certificate contains the connection information of the cluster and the public key of the qlumand server. The latter ensures that the client will only talk to the desired server and that the connection can't be eavesdropped on. The certificate also contains a one-time token, allowing any client to log in exactly once within the next 48 hours.

On the first login with a correct one-time token, the client's public key (generated randomly and uniquely for the cluster/user pair) is stored by the server and used to authenticate the user in the future. When establishing a connection, the client's and server's public and private keys are used to safely exchange session keys, enabling encryption with perfect forward secrecy.

⁠11.2.3. Removing a User

A user other than admin can be deleted by clicking the Delete User button. Just like the New User button, it is only enabled if the current user has sufficient rights.

⁠11.3. Managing User Roles/Permissions

The QluMan server performs many individual rights checks before it performs an operation. Many of those correspond directly to a specific window in the GUI, giving the user the right to alter settings in that window. For example, the right to configure Qlustar images corresponds directly to the operations available from the Qlustar Images window opened from Manage Configs->Qlustar Images. Others govern the right to perform specific actions or to alter specific properties. For example, the right to configure OpenSM on hosts enables the user to add, alter or delete the OpenSM Ports and OpenSM Options properties of hosts in the Enclosure View.

The rights are grouped into 4 categories: Admin rights covers rights with global impact and root access to nodes, Booting covers all settings that affect how nodes will boot, Services covers the configuration of daemons and Host Config covers the general configuration of hosts.

Creating and editing roles is simple: Click New to create a new role, fill in a name and description for it and click OK. To change the rights associated with a role, first select it using the drop-down menu at the top. Next, click the check-boxes to the left of the rights you want to grant to or remove from the role. Click Save to save the changes, or Undo to reset the rights to the last saved settings.
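The relation between users, roles and rights can be pictured with a small Python sketch. It is only an illustration under assumptions: the role names and right names below are hypothetical, and the sketch assumes that the rights of several roles simply add up, which this guide does not state explicitly.

```python
from typing import Dict, Iterable, Set

# Hypothetical roles, each a named collection of rights.
ROLES: Dict[str, Set[str]] = {
    "image-admin": {"configure Qlustar images"},
    "ib-admin": {"configure OpenSM on hosts"},
}

def effective_rights(user_roles: Iterable[str]) -> Set[str]:
    """Collect the rights of all roles assigned to a user (assumed: union)."""
    rights: Set[str] = set()
    for role in user_roles:
        rights |= ROLES[role]
    return rights

def is_allowed(user_roles: Iterable[str], right: str) -> bool:
    """A user without roles has no rights and can only inspect the config."""
    return right in effective_rights(user_roles)

print(is_allowed([], "configure Qlustar images"))               # False
print(is_allowed(["image-admin"], "configure Qlustar images"))  # True
```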

⁠Chapter 12. Log Viewer

⁠12.1. Purpose

QluMan comes with a Log Viewer that allows you to inspect important events in the cluster. Messages are categorized depending on the type of event, when it occurred, which component(s) it involved and how important it was.

⁠12.2. Messages indicator button

At the bottom right of the main window, the QluMan GUI displays a Messages indicator. The button shows the highest priority of uninspected messages, as well as their number. Clicking the button opens the Messages window. The Messages window can also be opened through the Manage Cluster -> Messages menu item.

⁠12.3. Log Viewer window

Opening the Messages window shows a list of messages sorted by time, the oldest message displayed at the top. The messages can be sorted ascending and descending by clicking on any of the column headers. Only the short text of each message is shown to keep the window compact. Hovering over a row will show the long text for that row as a tool-tip. The long text can also be seen in a separate window by clicking the Details button. The extra window makes it easier to read multi-line messages and allows copy+paste.

⁠12.4. Message Filter

Not every message is of interest to a user, especially messages that have already been seen. Therefore, each user can create his own filter for messages by clicking on the Edit Filter button. A filter consists of a number of matches shown as rows, each with an action, as well as a default action. The filtering process goes through the rows one by one. If all fields set in a row match a message, then the action set for that row is executed: Either the message will be hidden or included in the Messages window. If none of the rows match a message, the default action applies to it.

There is one message filter per cluster connection tab. It can be freely edited. The message filter remains in effect till the tab for the cluster is closed. The filter can also be saved as a user-specific setting, so it is reloaded the next time a connection to the cluster is opened again. Alternatively, the filter can be reset to the last saved config or cleared so that the viewer starts without any filtering.

⁠12.4.1. Default Action

A filter can be constructed as a positive or negative filter. This means it can hide all messages that are not specifically matched or show all messages that are not specifically chosen as hidden. The default action can be chosen at the bottom left corner of the message filter window.

⁠12.4.2. Adding a Filter

A new filter row can be added by selecting Add filter from the context menu. The new filter has an action of hide and ignores all fields. It therefore hides all messages. To be useful, at least one column should be changed through the context menu, to match only some messages. The context menu in each column contains the possible values the filter can match against in that column. The Origin and Short columns can also be edited freely by double clicking them. The action for the row can be changed between Hide and Show.

⁠12.4.3. Filtering Seen Messages

The most common filter is to hide messages with the Seen flag. It is recommended to always start a new filter by adding a row with action Hide and the Seen column set to Seen. If none of the filter rows match against the Seen flag, then it will have no effect in the Messages window. The Seen filter can also be toggled between Seen and Unseen by clicking the checkmark. The column can only be disabled by selecting Ignore from the context menu.

⁠12.4.4. Filtering by Priority

Messages can be purely informational, warnings or errors. Informational messages include information about nodes coming online or the server being restarted. There are usually a lot of informational messages and they can be safely ignored. On the other hand, warnings and errors should be inspected more carefully. In the Log Viewer, the priority of a message is color-coded for quicker visual recognition. Informational messages are green, warnings yellow and errors red. The highest priority of any shown message is also shown in the Messages button in the lower right corner of the main window. This indicates at a single glance, if anything important happened recently.

⁠12.4.5. Filtering by Origin

The origin of a message shows the node or service that generated the message. When configuring the filter, the origin can also be expressed as a hostlist to match multiple hosts.

⁠12.4.6. Filtering by Category

Messages fall into different categories, pooling similar messages for easier filtering. General information is categorized under Misc, while messages about nodes coming online or going offline fall under the category Online. The Licensing category includes all messages concerning changes in the license status. This could be something as simple as a reminder that the license key expires soon or, more importantly, a warning or error that the cluster, as currently configured, exceeds the available license count. The last category is Exception. It usually signals a software error that should be reported.

⁠12.4.7. Filtering by Short text

Messages may also be filtered by their short description. Like Origin, this column can be edited by double clicking. Short descriptions are matched using standard regular expressions. To match only part of a short description, prefix and/or suffix the text with ".*" to match any remaining characters.
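As a rough illustration of this matching rule, the following Python sketch assumes that a pattern has to match the whole short text, which is why ".*" is needed for partial matches. Python's re module merely stands in for the regular expression engine used by QluMan, and the function name and sample text are hypothetical.

```python
import re

def short_text_matches(pattern: str, short_text: str) -> bool:
    """Return True if the pattern matches the entire short text.

    Assumption: the filter anchors the pattern at both ends, so a partial
    match needs ".*" before and/or after the literal text.
    """
    return re.fullmatch(pattern, short_text) is not None

# Hypothetical short text, for illustration only.
text = "License expires soon"
print(short_text_matches("License", text))      # False
print(short_text_matches("License.*", text))    # True
print(short_text_matches(".*expires.*", text))  # True
```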

⁠12.4.8. A Filtering Example

The example filter shows a more involved setup: It contains five rows showing how rows can be combined to achieve the desired filtering result. The default action for this filter is set to show messages. Hence, only messages that are explicitly filtered as not wanted will be hidden.

Row 1 excludes messages with the seen flag set. Rows number 2 and 3 might look odd at first, because their action is the same as the default action: Show. But these two rows prevent any of the later rows from hiding messages with priority error or warning. In other words, warnings and errors will always be shown, no matter what additional filter rows follow. Row number 4 hides messages in the category online and row 5 hides messages that originate from hosts matching the hostlist "vm-[0-9]".
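The row-by-row evaluation of this example can be sketched in Python as follows. This is a minimal model, not QluMan's implementation: the Message and FilterRow types are hypothetical, the hostlist vm-[0-9] is expanded by hand, and the assignment of error to row 2 and warning to row 3 is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Message:
    seen: bool
    priority: str   # "info", "warning" or "error"
    origin: str
    category: str

@dataclass
class FilterRow:
    matches: Callable[[Message], bool]
    action: str     # "show" or "hide"

def apply_filter(rows: List[FilterRow], default_action: str, msg: Message) -> str:
    """Return the action of the first matching row, else the default action."""
    for row in rows:
        if row.matches(msg):
            return row.action
    return default_action

# Hosts matched by the hostlist vm-[0-9], expanded by hand.
vm_hosts = {f"vm-{i}" for i in range(10)}

rows = [
    FilterRow(lambda m: m.seen, "hide"),                   # row 1
    FilterRow(lambda m: m.priority == "error", "show"),    # row 2
    FilterRow(lambda m: m.priority == "warning", "show"),  # row 3
    FilterRow(lambda m: m.category == "Online", "hide"),   # row 4
    FilterRow(lambda m: m.origin in vm_hosts, "hide"),     # row 5
]

# An unseen warning from vm-3 in category Online: row 3 matches first,
# so rows 4 and 5 never get a chance to hide it.
msg = Message(seen=False, priority="warning", origin="vm-3", category="Online")
print(apply_filter(rows, "show", msg))  # show
```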

⁠Chapter 13. Optional Components

The fact that Qlustar is a modular Cluster OS with standard core functionality and many optional add-on components is also reflected in QluMan. Depending on the Qlustar modules installed and activated for a cluster, the QluMan GUI will have optional functionality accessible via its Components submenu. These optional components are documented below.

⁠13.1. Slurm Configuration and Management

⁠13.1.1. Slurm Configuration

⁠13.1.1.1. Overview

The slurm configuration module comes in four parts:

The overall slurm configuration, controlled via two templates in the Config Header tab.
The configuration of slurm nodes, done via the Node Groups tab.
The configuration of partitions, achieved by using the Partitions tab.
The configuration of GRES (generic resources) groups, settable using the Gres Groups tab.

Assignment of hosts to node groups and/or partitions is possible by adding the latter to the relevant Config Sets and Host Templates or by direct assignment through the config (set) context menu in the enclosure view.

⁠13.1.1.2. Slurm Config Header

The overall slurm configuration is split into two templates, the slurm config and cgroup.conf. On write, QluMan adds the NodeName and PartitionName lines at the end of the slurm config template to generate the slurm.conf file, while the cgroup.conf file gets written as is. For the syntax of both templates, please refer to the slurm documentation (e.g. man slurm.conf). To edit one of the templates, select it, click the Edit button and start making changes. Click Save to save the changes or Undo to discard them. Use the Preview button to check changes before writing them.
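The way the slurm.conf file is assembled can be pictured with a short Python sketch. It only mirrors the description above (template first, generated NodeName and PartitionName lines appended); the function name, the template content and the host and partition names are hypothetical.

```python
from typing import List

def render_slurm_conf(template: str, node_lines: List[str], partition_lines: List[str]) -> str:
    """Append generated NodeName/PartitionName lines to the template text."""
    parts = [template.rstrip("\n"), *node_lines, *partition_lines]
    return "\n".join(parts) + "\n"

# Hypothetical template content and generated lines.
template = "ClusterName=example\nSlurmctldPort=6817\n"
node_lines = ["NodeName=beo-[01-04] Sockets=2 CoresPerSocket=8 RealMemory=64000"]
partition_lines = ["PartitionName=batch Nodes=beo-[01-04] Default=YES State=UP"]

print(render_slurm_conf(template, node_lines, partition_lines))
```

The Preview button mentioned above serves the same purpose as printing the result here: inspecting the final file before it is written.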

⁠13.1.1.3. Slurm Node Groups

Slurm node properties are configured from two sources:

The Hardware Properties assigned to a host. The number of CPUs, sockets and cores as well as the size of its main memory are derived from there.
The slurm node groups. Every host can belong to at most one such group. The membership is assigned (see Section 13.1.1.7, “Assigning Hosts to Slurm Node Groups, Partitions and Gres Groups”) by adding the desired node group to the Config Set that is assigned to the node via its Host Template or via the alternative ways to assign config classes.

Each Node Group is a collection of slurm node properties that will be set for the members of the group. By default, only the MemSpecLimit property is defined, but other properties like Feature or Weight can be added by using the Slurm Property Editor.

A new node group can be created by clicking the New Node Group button or selecting New Node Group from the context menu. This opens a dialog asking for the name of the new group. An existing node group can be renamed or deleted from the context menu.

The context menu also allows you to add properties to a group. Note that some properties are unique, i.e. only one value can be selected for the property. Adding a second value of the same property will automatically replace the old value in that case. Other properties are not unique. Adding multiple values to such properties results in a comma-separated list of values in the slurm.conf file. An example for this is the Feature property. Properties can also be changed directly using the pull-down menu. If a change would cause a duplicate value, the previous (duplicate) value is automatically removed.
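The difference between unique and non-unique properties can be sketched in a few lines of Python. The sketch assumes Weight and MemSpecLimit are unique and Feature is not; the function names and values are hypothetical and only model the behavior described above.

```python
from typing import Dict, List

# Assumption for illustration: these properties accept a single value only.
UNIQUE = {"Weight", "MemSpecLimit"}

def add_value(props: Dict[str, List[str]], name: str, value: str) -> None:
    """Add a value; a unique property keeps only the newest value."""
    if name in UNIQUE:
        props[name] = [value]
    else:
        values = props.setdefault(name, [])
        if value not in values:  # a duplicate value is not stored twice
            values.append(value)

def render(props: Dict[str, List[str]]) -> str:
    """Render the properties as they might appear on a NodeName line."""
    return " ".join(f"{name}={','.join(values)}" for name, values in props.items())

props: Dict[str, List[str]] = {}
add_value(props, "Weight", "10")
add_value(props, "Weight", "20")       # replaces the old value 10
add_value(props, "Feature", "gpu")
add_value(props, "Feature", "bigmem")  # appended to the list
print(render(props))  # Weight=20 Feature=gpu,bigmem
```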

⁠13.1.1.4. Slurm Partitions

The management of Slurm partitions works exactly the same way as that of slurm node groups. Please see Section 13.1.1.3, “Slurm Node Groups” for how to create, rename and change partitions.

⁠13.1.1.5. Slurm Property Editor

The Slurm property editor for node or partition properties can be opened by clicking the Properties button at the bottom of the Slurm main dialog. If the Node Groups tab is selected, the editor for node properties will be opened. If the Partitions tab is selected, the editor for partition properties will be opened.

To add a new property, enter the name of the property in the name field. If the name does not already exist, the New Property button will be enabled. Click on it to create the property. QluMan has a white-list of known valid properties (e.g. Weight) and allows adding such a property without further questions. In this case, QluMan will also set the unique flag and add all known property values automatically.

When a property is created that is not part of the white-list (Gres in the screenshot) a dialog opens up, asking for confirmation. Note that adding an unknown property can lead to a failure when trying to restart slurm. Therefore make sure to only add properties you are certain slurm will know about. A property without values can be deleted by clicking the Delete button.

To add values to a property, first select the desired property using the pull-down menu of the name field. Then enter the new value using Add Value at the bottom and finally press return to add it. To delete a value, select Delete value from the context menu.

⁠13.1.1.6. Slurm Gres Groups

Currently, Slurm Gres Groups are used in QluMan mainly to handle the setup of GPUs for slurm. The GPU Wizard is the most convenient and accurate way to create such resource groups. Supplementing the wizard, the Gres Groups tab allows creating and managing any type of resource group, as well as binding GPUs to specific CPU sets, which is not possible via the wizard. To view or modify a Gres Group, select the group from the drop-down menu. Use the Preview button to check the resulting config file changes before writing them.

A new Gres Group can be created by clicking the New Gres Group button. This opens a dialog asking for the type, name and description of the new group. An existing type can be selected from the drop-down menu or a new type can be entered directly. After entering a new unique group name, the OK button becomes selectable. A group that is not in use can be deleted by clicking Delete Group.

A Gres Group can have multiple entries. A new entry may be added to a group by clicking on New Entry. Initially, the entry is blank and at least the type column must be filled in. For resources that can be allocated in multiple pieces, a count can be set, indicating the number of resource pieces available. For resources that have a device file associated with them, its path can be set in the file column.

Note

For resources that have an associated file, the count is not applicable, since there is always only exactly one file.

Optionally, an entry can also be associated with a set of CPUs. The CPUs to be used can be entered as a comma-separated list or, for recurring sets, selected from the drop-down menu. An entry can be deleted from the group by selecting Delete Entry. A group that is no longer in use can be deleted by selecting Delete Group.
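To make the entry fields more concrete, the following Python sketch renders entries as gres.conf style lines. It is an illustration under assumptions: the mapping of the GUI columns to the gres.conf keys Name, Type, File, Count and CPUs is a guess, and the GresEntry class, device paths and type names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class GresEntry:
    name: str                    # resource name, e.g. "gpu"
    type: str                    # the type column
    count: Optional[int] = None  # only for resources without a device file
    file: Optional[str] = None   # path of the associated device file
    cpus: Optional[str] = None   # optional set of CPUs

    def render(self) -> str:
        """Return one gres.conf style line for this entry."""
        if self.file is not None and self.count is not None:
            raise ValueError("count is not applicable when a file is set")
        fields: List[str] = [f"Name={self.name}", f"Type={self.type}"]
        if self.file is not None:
            fields.append(f"File={self.file}")
        if self.count is not None:
            fields.append(f"Count={self.count}")
        if self.cpus is not None:
            fields.append(f"CPUs={self.cpus}")
        return " ".join(fields)

# Two GPUs, each bound to its own CPU set (hypothetical values).
entries = [
    GresEntry("gpu", "tesla", file="/dev/nvidia0", cpus="0-7"),
    GresEntry("gpu", "tesla", file="/dev/nvidia1", cpus="8-15"),
]
for entry in entries:
    print(entry.render())
```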

⁠13.1.1.7. Assigning Hosts to Slurm Node Groups, Partitions and Gres Groups

Hosts are assigned to Slurm Node/Gres Groups and Partitions by use of a Host Template and its corresponding Config Set or by direct assignment. A Config Set may contain at most one Node Group but any number of Gres Groups or Partitions, since a host can be a member of an arbitrary number of slurm partitions. They can all be assigned by selecting them via Add Config in the context menu of the Config Set or via the Enclosure View context menu of hosts.

⁠13.1.1.8. GPU Wizard

⁠13.1.1.8.1. Purpose

When setting up Slurm, the basic node config is derived from the host's Hardware Properties. However, configuring GPUs is more complex: This is done through the Slurm Gres Groups as part of the slurm config class. Gres Groups are used to specify the type and number of GPUs of a host. When submitting jobs that require GPUs, this information is then used to determine the nodes that satisfy the job requirements. All the necessary settings for the desired configuration of the nodes may also be done manually and can be changed later through the slurm config dialog from the main window.

As a convenient alternative, the GPU Wizard guides you through the necessary configuration steps. It uses the auto-detected GPUs of hosts to suggest their optimal configuration options. Furthermore, it attempts to establish a balance between the available configuration strategies: Using templates or individually assigned config sets and/or config classes.

Note

For Nvidia GPUs to be detected on a host, it must have booted a Qlustar image that includes the nvidia module. Otherwise GPUs will be missed. Only nodes on which GPUs have been detected can be configured through the GPU Wizard.

⁠13.1.1.8.2. Selecting Hosts

The first step in the wizard is to select the hosts that should be configured. Initially, the list of hosts is empty. One or more of the four buttons at the bottom have to be pressed to pre-select hosts that should be considered.

The Unconfigured button adds all hosts that do not have any GPU configured at all. The Partially Configured button adds hosts that already have some GPUs configured correctly, but not all of them. The Wrongly Configured button adds hosts where the configured GPUs do not match the GPUs detected at boot, e.g. when the GPU cards have been swapped for a newer model on the hosts. Finally, the Selected button adds hosts that have been selected in the enclosure view, including hosts that are already configured correctly.

Note

Only hosts with auto-detected GPUs will be shown, even if others are selected.

Once one or more of the buttons are pressed, the affected hosts will show up in the table. To keep things compact, hosts with identically detected GPUs are grouped together and shown in hostlist syntax. Select one of the shown groups by clicking on the corresponding row and then press Next to start the configuration.
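The host categories behind the buttons and the grouping of hosts with identical GPUs can be modeled roughly as follows. This Python sketch is based on assumptions: it compares GPUs by model name only, the exact rules used by the wizard are not documented here, and all host and model names are sample data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def classify(detected: List[str], configured: List[str]) -> str:
    """Classify a host by comparing detected and configured GPU models."""
    det, conf = Counter(detected), Counter(configured)
    if not conf:
        return "unconfigured"
    if conf == det:
        return "configured"
    if not conf - det:  # all configured GPUs were detected, but some are missing
        return "partially configured"
    return "wrongly configured"

def group_by_gpus(detected: Dict[str, List[str]]) -> Dict[Tuple[str, ...], List[str]]:
    """Group hosts that have identical sets of detected GPUs."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for host in sorted(detected):
        groups[tuple(sorted(detected[host]))].append(host)
    return dict(groups)

# Sample data: detected GPU models per host.
detected = {
    "gpu-01": ["K80", "K80"],
    "gpu-02": ["K80", "K80"],
    "gpu-03": ["P100"],
}
print(group_by_gpus(detected))
print(classify(["K80", "K80"], ["K80"]))  # partially configured
```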

⁠13.1.1.8.3. Choosing the assignment option

There are three different ways to achieve the GPU configuration: On the wizard's Config Set Page you have the option to a) add (modify) the GPU config to the Config Set of the currently assigned Host Template, b) clone the Config Set currently active or c) assign Gres Groups directly to the group of selected hosts. Select the desired method and press Next to continue to the next step.

In case the clone Config Set option is selected, the Host Template Page will appear and offer the choice to either modify the currently used Host Template or to create a clone of it for further modification.

Note

For the options that would modify an existing entity (Config set or Host template), the wizard dialogs always show other non-selected hosts, that would also be affected by the modifications.

⁠13.1.1.8.4. Creating/assigning Gres groups

The next step is to create Gres Groups, if necessary, and finally assign them to the list of selected hosts. The corresponding wizard page shows the unconfigured GPUs, each in a separate column. If a Gres Group already exists that includes all or a subset of the unconfigured GPUs, the context menu allows you to select it. This would conclude the assignment process.

Alternatively, when one or more GPUs are selected, a new Gres Group can be created that the GPUs will be a member of. The new group will have to be given a name and optionally a description. Once all GPUs are assigned to a Gres Group, you can finish the process by pressing Finish.

In case direct assignment has been selected, one more wizard page allows you to fine-tune the assignment. An additional Action column appears that allows you to a) use and assign an existing Config Set, b) create and assign a new one or c) directly assign the Gres Groups to the selected hosts. When choosing option b), the blank field of the New Config Set column becomes editable by double-clicking.

As with other properties, the optimal way of configuring (via template or different direct assignment variations) is often a matter of taste and a trade-off between simplicity, clarity and precision concerning your individual configuration policy.

⁠13.1.2. Slurm Management

The QluMan Slurm Component provides extensive functionality to manage and operate more or less all aspects and features of the Slurm workload manager. In Qlustar 9.2, QluMan supports the included Slurm version 16.05. All QluMan Slurm functionality is accessible underneath the Components->Slurm top-level menu entry.

The following management and operation sub-components are available:

⁠13.1.2.1. Slurm Overview

The Slurm Overview window provides a summary of the utilization of the cluster. It is split into 2 parts: The Cluster Usage Overview tab and the Job Overview tab.

⁠Cluster Usage Overview

The Cluster Usage Overview provides continuously updated information and charts about Node, CPU Core and Memory utilization by Slurm jobs. Every information field in the tables has a tool-tip that supplies more detailed information about it.

Note

The colors used in the Cluster Usage Overview can be changed in the Preferences Dialog.

⁠Job Overview

The Job Overview display consists of two tables and four charts being continuously updated. The Running table provides summary information about running jobs of users. It shows the color representing the user (if his share is displayed in one of the charts), his username, the count of utilized CPU cores, the number of used nodes and the number of running jobs. The Pending table provides the total number of requested CPU cores and the number of pending jobs for the same user.

The job statistics are graphically displayed in the four pie-charts Allocated CPU Cores by User, Used Nodes by User, Pending CPU Cores by User and Pending Jobs by User. Every slice of a pie-chart has a tool-tip showing the name of the user it corresponds to together with his share in percent of the corresponding resource. The colors used change randomly with every new invocation of the window.

Note

Only the users with the highest percentage of jobs are shown in the pie-charts (a maximum of 10 users being displayed).

⁠13.1.2.2. Job Management

The Job Management window shows a continuously updated table with all current jobs of the cluster. Since a single job has about 100 properties, every QluMan user is able to customize the job properties he wants to be displayed and which ones should be hidden in the table (see Customize Columns for more detailed information).

To sort the job table entries, one just has to click on the title of the property one wants to sort by (for example Job Id). Clicking the title again changes the sort order. You can also move a column with drag and drop and change its width.

These settings can be stored in layouts. Just modify the Job Management window the way you want it and hit the Save button. You can restore a layout by selecting it in the Layout combo box and pressing Load. When the Job Management window is opened, it always uses the last state as layout, i.e. the layout that was set when you closed it the last time.

If you want to change the state of a job you just have to open its context-menu and select one of the following actions:

Kill Job: This kills a job and sets its state to CANCELLED.
Suspend Job: This suspends a job and sets its state to SUSPENDED.
Resume Job: This resumes a suspended job and sets its state to RUNNING.
Requeue Job: This kills a job and puts it back into the queue with state PENDING.
Requeue and Hold Job: This kills a job, puts it back in the queue with state PENDING and places a hold on it.
Hold Job: This prevents a pending job from getting started.
Release Job: This releases a job that was in the HOLD state.
Set Priority: This allows setting the priority of a job manually.

Depending on the state of a selected job some actions might be disabled (e.g. a job cannot be released if it wasn't on hold before). As long as there is no conflict concerning their job states, it is possible to collectively manipulate either a list of jobs selected with the mouse or all jobs of the user of the currently selected job. If you want to get more information about a job, open the context-menu and select More Information (see More Job Information for details).
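For readers familiar with the Slurm command line, the context-menu actions listed above have the following approximate equivalents. The Python sketch below only builds the argument lists for scancel and scontrol; whether QluMan uses these commands internally is not stated in this guide, and the function name is hypothetical.

```python
from typing import List, Optional

def job_action_command(action: str, job_id: int, priority: Optional[int] = None) -> List[str]:
    """Return a Slurm CLI call with the same effect as a context-menu action."""
    if action == "Set Priority":
        if priority is None:
            raise ValueError("Set Priority needs a priority value")
        return ["scontrol", "update", f"JobId={job_id}", f"Priority={priority}"]
    commands = {
        "Kill Job": ["scancel", str(job_id)],
        "Suspend Job": ["scontrol", "suspend", str(job_id)],
        "Resume Job": ["scontrol", "resume", str(job_id)],
        "Requeue Job": ["scontrol", "requeue", str(job_id)],
        "Requeue and Hold Job": ["scontrol", "requeuehold", str(job_id)],
        "Hold Job": ["scontrol", "hold", str(job_id)],
        "Release Job": ["scontrol", "release", str(job_id)],
    }
    return commands[action]

print(" ".join(job_action_command("Hold Job", 4711)))
print(" ".join(job_action_command("Set Priority", 4711, priority=1000)))
```

The argument lists could be passed to subprocess.run on a host with the Slurm client tools installed; the sketch deliberately stops at building them.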

Clicking on Activate Filter at the bottom of the window allows you to activate one or more custom filters (created using the Job Filter Editor) by checking the corresponding entry. This can be useful to restrict the list of displayed jobs according to some criteria (e.g. a certain user). All currently active filters are shown in the bottom left corner of the Job Management window. They can be deactivated again by unchecking their entry in the Activate Filter sub-window.

Note

The column height of the job table is customizable in the Preferences Dialog.

⁠13.1.2.3. Customize Columns

The Customize Columns dialog displays all known columns (properties of a job) in two lists. The columns in the left list will be shown in the jobs table, the ones in the right list won't. To show or hide columns just select them and drag them either into the left or right list. Confirm your changes with OK.

Note

The order of the columns in the left list is not important, because it does not determine the order in which they are shown in the Job Management table.

⁠13.1.2.4. More Information

This dialog opens after you select Get more information in the context-menu of a job. It shows the properties of the selected job and their corresponding values in a table. There are two filters that may be applied: One hides all properties with a value of 0, None, False or empty, the other one hides exotic properties that one is rarely interested in. By default, both filters are enabled. To disable them, you have to check the corresponding entry at the bottom of the dialog.

Note

The column height of the table is editable in the Preferences Dialog.

⁠13.1.2.5. Activate Filter

If you created some custom filters, they will be listed here (For information about creating custom filters see Job Filter Editor). Select one or more filters to be applied to the current job table. All active filters are shown as a comma-separated list in the bottom-left corner of the Job Management window.

⁠13.1.2.6. Job Filter Editor

As mentioned before, in the Job Filter Editor dialog it is possible to create custom filters for the Job Management table. After it has been opened, a new filter may be created by clicking New Filter and then entering a name for the filter. After confirming with OK, the filter is created and a new window comes up, where properties can be assigned to it. To add properties, right-click for the context-menu and select the property you want to filter by.

In the current example, we chose to filter by Job Id. A new dialog pops up. Now one can select a range of job ids to be displayed. Since a job id is always an integer, one has the option to select among the filter types between x and y, bigger than x and less than x.

Choose the filter type you want, set the values and confirm with OK. Consequently, the property is now part of the new filter. One can combine multiple properties in one custom filter. Each additional property narrows down the possible jobs to be displayed. After adding all desired properties, hit the Save button. Now the new filter can be applied in the Job Management window.
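How several properties narrow down the displayed jobs can be sketched in Python. The sketch treats a custom filter as a list of conditions that all have to hold; the property keys, the sample jobs and the inclusive bounds of the between type are assumptions made for illustration.

```python
from typing import Callable, Dict, List

Job = Dict[str, int]
Condition = Callable[[Job], bool]

def between(prop: str, low: int, high: int) -> Condition:
    """Filter type 'between x and y' (bounds assumed to be inclusive)."""
    return lambda job: low <= job[prop] <= high

def bigger_than(prop: str, x: int) -> Condition:
    """Filter type 'bigger than x'."""
    return lambda job: job[prop] > x

def less_than(prop: str, x: int) -> Condition:
    """Filter type 'less than x'."""
    return lambda job: job[prop] < x

def apply_job_filter(jobs: List[Job], conditions: List[Condition]) -> List[Job]:
    """Keep only the jobs that satisfy every condition of the filter."""
    return [job for job in jobs if all(cond(job) for cond in conditions)]

# Hypothetical jobs with two integer properties.
jobs = [
    {"job_id": 100, "num_cpus": 4},
    {"job_id": 150, "num_cpus": 64},
    {"job_id": 300, "num_cpus": 128},
]
custom_filter = [between("job_id", 100, 200), bigger_than("num_cpus", 16)]
print(apply_job_filter(jobs, custom_filter))  # [{'job_id': 150, 'num_cpus': 64}]
```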

⁠13.1.2.7. Node State Management

The Node State Management dialog lists all hosts that are registered with Slurm. There are three different kinds of views showing the existing hosts. The color of the LED in front of the hostname indicates the Slurm state a node is in. When hovering over a particular node, a tool-tip describing the state appears.

Partition View: This tree shows all Slurm partitions and, when expanded, their assigned compute nodes. This can be used to act on all nodes found in one or more partitions.
Enclosure View: This tree has the same structure as the Enclosure View dialog. It is useful when acting on a group of nodes located in specific enclosures (e.g. to drain all nodes in a certain rack, because of a planned maintenance for that rack).
NodeState View: This tree shows all current node states in the cluster and, when expanded, their corresponding nodes. It can be used to conveniently act on all nodes in a specific state (e.g. to undrain all previously drained nodes).

To manage one or more nodes, they have to be selected first. Use the preferred view and move the node(s) to the right list via drag&drop. One can also move a whole group of nodes, for example all nodes from a rack by dragging the name of the rack to the right tree. All nodes in this list are available for later actions. You can also select multiple nodes for drag&drop or enter a hostlist in the Hostlist field (e.g. beo-[01-04]). The nodes will appear in the right list, if the hostlist is valid.
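What a hostlist such as beo-[01-04] stands for can be shown with a small Python sketch. It handles only the simplest case, a single bracket group with one numeric range; real hostlist syntax is richer, and the function name is hypothetical.

```python
import re
from typing import List

def expand_hostlist(hostlist: str) -> List[str]:
    """Expand a simple hostlist such as 'beo-[01-04]' into host names.

    Only one bracket group containing a single 'start-end' range is
    handled; anything else is returned unchanged as a single name.
    """
    m = re.fullmatch(r"([^\[\]]*)\[(\d+)-(\d+)\]([^\[\]]*)", hostlist)
    if m is None:
        return [hostlist]
    prefix, start, end, suffix = m.groups()
    width = len(start)  # keep the zero padding of the range start
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]

print(expand_hostlist("beo-[01-04]"))  # ['beo-01', 'beo-02', 'beo-03', 'beo-04']
```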

There are seven possible actions that may be applied to the selected nodes:

Drain: The node may still be executing jobs, but will not be allocated additional ones. The node state will be changed to DRAINED when the last job on it completes.
Undrain: This will undrain all selected nodes.
Set to POWER SAVE: The nodes will be put into power save mode. Power management mode needs to be configured in the slurm config for this to work.
Power up: The nodes will be powered up. Power management mode needs to be configured in the slurm config for this to work.
Start Slurmd: This starts the Slurmd on the selected nodes.
Stop Slurmd: This stops the Slurmd on the selected nodes.
Restart Slurmd: This restarts the Slurmd on the selected nodes.

Once the desired nodes are selected, an action can be chosen and then executed by clicking the Execute button. In case the action operates on the nodes' slurmd, an RXEngine window comes up, in which one can track the success of the remote slurmd operation. To clear the complete list of selected nodes, one can click the Clear button. To remove only a subset of nodes, one can select them in the right list and remove them via the context-menu.
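The state-changing actions correspond roughly to scontrol update calls on the command line. The following Python sketch builds such calls for a hostlist; it is a sketch under assumptions, since this guide does not say which commands QluMan issues, and the default reason text is made up. The slurmd start, stop and restart actions are left out because they run through the RXEngine.

```python
from typing import List

# Node states as accepted by "scontrol update NodeName=... State=...".
STATES = {
    "Drain": "DRAIN",
    "Undrain": "RESUME",
    "Set to POWER SAVE": "POWER_DOWN",
    "Power up": "POWER_UP",
}

def node_state_command(action: str, hostlist: str, reason: str = "maintenance") -> List[str]:
    """Build an scontrol call that changes the state of the given nodes."""
    cmd = ["scontrol", "update", f"NodeName={hostlist}", f"State={STATES[action]}"]
    if action == "Drain":
        cmd.append(f"Reason={reason}")  # scontrol expects a reason when draining
    return cmd

print(" ".join(node_state_command("Drain", "beo-[01-04]")))
print(" ".join(node_state_command("Undrain", "beo-[01-04]")))
```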

⁠13.1.2.8. Slurm Reservations

The Slurm Reservations window shows a table of all active reservations and their most important properties. Furthermore, it allows you to modify existing reservations and to create new ones.

⁠Creating a new Reservation

To create a new reservation, click the Add reservation button. A new dialog pops up. The following parameters can be specified:

Name

Here a custom name can be specified for the reservation. If no custom name is given, Slurm automatically creates one based on the first user or account name chosen for the reservation and a numeric suffix.

Account(s)

To create a reservation, one has to select one or more accounts and/or one or more users who will be allowed to use it. Select one or more accounts by checking their entries in the pop-up. All users of the selected accounts may utilize the reservation.

User(s)

To create a reservation, one has to select one or more accounts and/or one or more users who will be allowed to use it. Select one or more users by checking their entries in the pop-up. In case accounts are also set, the Select User dialog shows only the users belonging to the selected accounts.

Partition

The partition the reservation applies to.

Start Time

The start time of the reservation. The default value is now. By changing the Start Time, Duration or End Time all timing values will be recalculated.

Duration

The duration of the reservation. Set a count of days and/or hours and minutes. By changing the Start Time, Duration or End Time, all timing values will be recalculated.

End Time

The End Time of the reservation. By changing the Start Time, Duration or End Time, all timing values will be recalculated.
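The three timing values are tied together by the relation Start Time + Duration = End Time, so any two of them determine the third. The following minimal sketch shows that relation. The function name reservation_times is hypothetical, and which value the dialog actually recalculates after a change is not specified here.

```python
from datetime import datetime, timedelta


def reservation_times(start=None, duration=None, end=None):
    """Derive the missing one of start, duration and end (sketch)."""
    given = sum(value is not None for value in (start, duration, end))
    if given != 2:
        raise ValueError("exactly two of start, duration, end are required")
    if end is None:
        end = start + duration
    elif duration is None:
        duration = end - start
    else:
        start = end - duration
    if duration <= timedelta(0):
        raise ValueError("the end time must be after the start time")
    return start, duration, end


start = datetime(2017, 4, 18, 8, 0)
_, _, end = reservation_times(start=start,
                              duration=timedelta(days=1, hours=2, minutes=30))
print(end)
```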

Nodes and Cores

One may choose to set either a Node Count and Core Count or a fixed Node List and Cores per Node. In the former case, Slurm itself selects the nodes and cores for the reservation. With the second variant, one can explicitly select the nodes for the reservation and the number of cores on every node.

Node Count / Core Count: Number of nodes and cores to be reserved.
Node List / Cores per Node: Identify the node(s) to be reserved. For every node you can set the number of cores.

Flags

Flags associated with the reservation. The following flags can be set:

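ANY_NODES: Use any compute nodes
DAILY: Repeat the reservation at the same time every day
FIRST_CORES: Use only the lowest-numbered cores on each node
IGNORE_JOBS: Ignore currently running jobs when creating the reservation
MAINT: Maintenance mode; the reservation may use resources that are already part of another reservation
OVERLAP: Permit the reservation to overlap other reservations
PART_NODES: Use partition nodes only
STATIC_ALLOC: Static node allocation; the nodes selected for the reservation are not replaced later
TIME_FLOAT: The start time is relative to the current time
WEEKLY: Repeat the reservation at the same time every week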

Confirm by clicking the Add reservation button.
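The parameters of the dialog map closely onto the options of the scontrol create reservation command. The sketch below assembles such a command line from the same inputs without executing it. The function create_reservation_command and its parameter names are hypothetical, and it is an assumption that QluMan issues an equivalent request; the option names (ReservationName, StartTime, Duration, Accounts, Users, PartitionName, Nodes, NodeCnt, CoreCnt, Flags) are those documented for scontrol.

```python
def create_reservation_command(start, duration_minutes, name=None,
                               accounts=(), users=(), partition=None,
                               node_count=None, core_count=None,
                               nodes=None, flags=()):
    """Build an 'scontrol create reservation' command line (sketch)."""
    if not accounts and not users:
        raise ValueError("select at least one account or user")
    if (node_count is None) == (nodes is None):
        raise ValueError("give either a node count or a node list")
    args = ["scontrol", "create", "reservation"]
    if name:  # without a name, Slurm generates one
        args.append(f"ReservationName={name}")
    args += [f"StartTime={start}", f"Duration={duration_minutes}"]
    if accounts:
        args.append("Accounts=" + ",".join(accounts))
    if users:
        args.append("Users=" + ",".join(users))
    if partition:
        args.append(f"PartitionName={partition}")
    if nodes is not None:
        args.append(f"Nodes={nodes}")        # fixed node list
    else:
        args.append(f"NodeCnt={node_count}")  # Slurm picks the nodes
    if core_count is not None:
        args.append(f"CoreCnt={core_count}")
    if flags:
        args.append("Flags=" + ",".join(flags))
    return args


print(" ".join(create_reservation_command(
    "now", 120, users=["alice"], node_count=2,
    flags=["MAINT", "IGNORE_JOBS"])))
```

The user name alice is a placeholder for illustration only.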

⁠Updating a Reservation

To update a reservation, select it and choose Update Reservation from its context-menu. A window pops up with all the properties set to the values of the existing reservation. To modify the reservation, just make the desired changes and click the Update Reservation button.

Note

Not all properties are changeable. To edit the Start Time of a reservation, both the current and the new Start Time have to be in the future. In case a value for Cores per Node was set, the reservation cannot be updated anymore.

⁠Deleting a Reservation

To delete a reservation, choose Delete Reservation from its context-menu.
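On the Slurm command line, the corresponding operations are scontrol update and scontrol delete with the name of the reservation. The following sketch builds both command lines without executing them. The function names and the reservation name maint_1 are hypothetical.

```python
def update_reservation_command(name: str, **changes) -> list:
    """Build an 'scontrol update' command for a reservation (sketch)."""
    if not changes:
        raise ValueError("nothing to update")
    # Keyword arguments are passed through as option=value pairs,
    # e.g. Duration=180 or Flags="MAINT".
    return (["scontrol", "update", f"ReservationName={name}"]
            + [f"{option}={value}" for option, value in changes.items()])


def delete_reservation_command(name: str) -> list:
    """Build an 'scontrol delete' command for a reservation (sketch)."""
    return ["scontrol", "delete", f"ReservationName={name}"]


print(" ".join(update_reservation_command("maint_1", Duration=180)))
print(" ".join(delete_reservation_command("maint_1")))
```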

⁠13.1.2.9. Slurm Accounting

⁠13.1.2.9.1. Manage Slurm Accounts

To open the Manage Slurm Accounts dialog, go to Components->Slurm->Manage->Accounting->Manage Accounts. There will be a tab for every cluster known to the Slurm accounting database. Each tab contains a tree with the accounts and users that are registered in the corresponding Slurm instance. To better distinguish between accounts and users, they are identified by pre-defined color codes (see Preferences Dialog for changing the corresponding colors). At the bottom of the dialog you can see a legend for the color codes.

⁠Adding an Account

Clicking the Add Account button will open a new dialog. Here you have to specify a name for the new account. Optionally, you can also specify a parent account and a description. If an account was selected before the Add Account button was clicked, this account will be pre-filled as the parent account. When you are finished, confirm with the OK button.

Note

Account names have to be unique!

⁠Deleting an Account

Before an account can be deleted, one has to make sure that it contains no more users (see below to learn how to remove users from an account). Alternatively, one can remove users from an account in the Manage Slurm Users dialog.

After all users are removed from the account, one can delete it via its context-menu by selecting Delete Account.

⁠Deleting a user from an Account

To delete a user from an account use its context-menu and select Delete User.

Note

You can't remove a user from his default account. First change the default account of the user and then remove him from the old one.
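The account operations above have counterparts in the Slurm utility sacctmgr. The sketch below builds such command lines, including the order required for deleting an account: first remove its users, then the account itself. The function names and the example names physics, alice and bob are hypothetical, the commands are not executed, and it is an assumption that QluMan performs equivalent requests.

```python
def add_account_command(name, parent=None, description=None) -> list:
    """Build an 'sacctmgr add account' command line (sketch)."""
    args = ["sacctmgr", "-i", "add", "account", f"name={name}"]
    if parent:
        args.append(f"parent={parent}")
    if description:
        args.append(f"description={description}")
    return args


def delete_account_commands(account, members) -> list:
    """Commands to empty and then delete an account (sketch).

    Users whose default account is 'account' must get a new default
    account first, otherwise their removal is refused.
    """
    commands = [["sacctmgr", "-i", "delete", "user",
                 f"name={user}", f"account={account}"]
                for user in members]
    commands.append(["sacctmgr", "-i", "delete", "account",
                     f"name={account}"])
    return commands


print(" ".join(add_account_command("physics", parent="root")))
for command in delete_account_commands("physics", ["alice", "bob"]):
    print(" ".join(command))
```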

⁠Show Account/User Properties

To show the properties of an account or user, bring up its context-menu and select Show Account Properties or Show User Properties, depending on what was selected. Two filters are available in this dialog: one for hiding all properties with a value of 0 or empty, and one for hiding exotic properties which are not of interest in most cases. By default, both filters are enabled. To disable them, their corresponding entry has to be checked at the bottom of the dialog.

⁠13.1.2.9.2. Manage Slurm Users

The Manage Users dialog allows you to assign accounts to a user, set and change a user's default account, register new users and delete users. When a user is selected, the accounts he is a member of are checked in the Accounts list displayed at the right. His default account is highlighted with the specific color set for default accounts in the Preferences Dialog. By default, system users are hidden. To show them, just check the Show system users (UID < 1000) checkbox.

⁠Registering a User with Slurm

To register a user with Slurm, expand the Unregistered Users tree and select the desired user. Every user needs a default account, so this has to be defined first. To do so, select Set as Default Account in the context-menu of the account you want to be the default. By doing this, the user will be registered with this default account. If you just select some accounts for an unregistered user by checking them and then pressing the Create button, the user will be registered with a default account chosen randomly among the selected ones.

⁠Deleting a User

To delete a user, bring up his context-menu and select Remove User.

Note

Be sure that the user has no active jobs.

⁠Assigning a User to Accounts

Selecting a registered Slurm user displays the accounts he is a member of in the Accounts list to the right. To add/remove him to/from an account, (un)check it and hit the Activate Changes button.

⁠Changing the Default Account of a User

To change the default account of a user, select him in the Registered Slurm Users tree and bring up the context-menu of the account you want to set as the new default. Then select Set as Default Account.
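For comparison, registering a user and changing his default account can be expressed with sacctmgr as sketched below. Unlike the random choice described above for the Create button, this sketch always names the default account explicitly. The function names and the example names alice, physics and chemistry are hypothetical; the commands are only built, not executed.

```python
def register_user_command(user, accounts, default_account=None) -> list:
    """Build an 'sacctmgr add user' command line (sketch)."""
    if not accounts:
        raise ValueError("a user needs at least one account")
    default_account = default_account or accounts[0]
    if default_account not in accounts:
        raise ValueError("the default account must be one of the accounts")
    return ["sacctmgr", "-i", "add", "user", f"name={user}",
            "account=" + ",".join(accounts),
            f"defaultaccount={default_account}"]


def set_default_account_command(user, account) -> list:
    """Build an 'sacctmgr modify user' command line (sketch)."""
    return ["sacctmgr", "-i", "modify", "user", "where", f"name={user}",
            "set", f"defaultaccount={account}"]


print(" ".join(register_user_command("alice", ["physics", "chemistry"])))
print(" ".join(set_default_account_command("alice", "chemistry")))
```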

⁠13.1.2.9.3. Cluster Usage

The Cluster Usage display uses the Slurm utility sreport to generate reports of job usage and cluster utilization. For detailed information about the type of reports and options, read the sreport manpage. Select your report type (for example cluster) in the left combo box and then the report options from the combo box to the right of it. By default, the time period used for the report is the past day. You can change this by modifying the start and the end time. The colors used in the window are customizable in the Preferences Dialog.

Note

sreport will only be able to show utilization data if Slurm Accounting is activated. This is the default on Qlustar clusters.
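A report of this kind can also be requested directly from sreport by giving the report type, the report and a time period. The sketch below builds such a command line with a default period of the 24 hours before a given point in time. The function name sreport_command is hypothetical, and the exact period boundaries used by the GUI are an assumption.

```python
from datetime import datetime, timedelta


def sreport_command(report_type, report, start=None, end=None,
                    now=None) -> list:
    """Build an sreport command line for a time period (sketch)."""
    end = end or now or datetime.now()
    start = start or end - timedelta(days=1)  # default: the past day
    if start >= end:
        raise ValueError("the start time must be before the end time")
    time_format = "%Y-%m-%dT%H:%M:%S"
    return ["sreport", report_type, report,
            f"start={start.strftime(time_format)}",
            f"end={end.strftime(time_format)}"]


print(" ".join(sreport_command("cluster", "utilization",
                               now=datetime(2017, 4, 18, 12, 0))))
```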

⁠13.1.2.9.4. Fair Share

The Fair Share view uses the Slurm utility sshare to display Slurm fair-share information. We provide two versions of views, a basic and a long one. The long version shows additional information that is needed less often. By default we show the basic view, but you can easily switch by checking the long checkbox at the bottom right of the window.

⁠Account View

The Account View shows the Slurm fair-share information for all registered Slurm accounts. The colors used are customizable in the Preferences Dialog.

⁠Detailed Account View

The Detailed Account View shows the Slurm fair-share information for all registered Slurm accounts, including the information for the individual users that are members of the accounts. The colors used are customizable in the Preferences Dialog.

For more information about sshare and the meaning of the displayed quantities, read the sshare manpage.

Note

sshare will only be able to show fair-share data if the fair-share option is activated in the Slurm config. This is the default on Qlustar clusters.

⁠13.1.2.9.5. Job Priorities

The Job Priorities dialog uses the Slurm utility sprio to display the values of the individual factors that are used to calculate a job's scheduling priority when the multi-factor priority plugin is in use. This information is needed when analyzing why certain pending jobs run earlier than others.

We provide two versions of the view, a basic and a long one. The long version shows additional information that is needed less often. By default we show the basic view, but you can easily switch by checking the long checkbox at the bottom right of the window. For more information about sprio read the sprio manpage.
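With the multi-factor priority plugin, the priority of a job is essentially a weighted sum of factors, each between 0 and 1, minus the nice value of the job. The following simplified sketch shows how such factors combine. All weights and factor values are made-up examples, the function name job_priority is hypothetical, and additional terms of the real calculation (for example site or TRES factors) are left out.

```python
def job_priority(factors: dict, weights: dict, nice: int = 0) -> int:
    """Weighted sum of priority factors minus the nice value (sketch)."""
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name!r} must be between 0 and 1")
    total = sum(weights[name] * value for name, value in factors.items())
    return int(round(total)) - nice


# Example weights, as they might be set in the Slurm config (made up).
weights = {"age": 1000, "fairshare": 10000, "jobsize": 1000,
           "partition": 1000, "qos": 2000}

job_a = {"age": 0.5, "fairshare": 0.25, "jobsize": 0.0,
         "partition": 1.0, "qos": 0.0}
job_b = {"age": 1.0, "fairshare": 0.0, "jobsize": 0.0,
         "partition": 1.0, "qos": 0.0}

print(job_priority(job_a, weights), job_priority(job_b, weights))
```

In this example, job_a ends up with the higher priority although job_b has waited longer, because the fair-share weight dominates the age weight.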

⁠13.1.2.9.6. QluMan Slurm Settings

The QluMan Slurm Settings dialog allows you to customize the update intervals for information about jobs, nodes, partitions and Slurm accounting. This information flow is provided by the QluMan Slurm daemon running on the cluster, and the QluMan GUI automatically subscribes to it. Shorter update intervals mean more server load and more network traffic. In most cases, the default values should be adequate.

Note

Whenever you modify some property/value in the QluMan GUI, for example for a job, the GUI will always get an immediate update for that. The update intervals only affect changes that are not the consequence of an explicit action by a QluMan user.

⁠Chapter 14. Customizing the Look&Feel

⁠14.1. Overview

There are a number of aspects of QluMan's appearance that can be customized: specific component-dependent customization is possible, as well as choosing general fonts, colors and the widget style.

⁠14.2. QluMan Preferences

In the QluMan Preferences dialog, one is able to customize specific parts of the QluMan GUI Look&Feel. The tree on the right shows all the settings available for customization. Each QluMan component may have its specific settings, so the options available there depend on the components installed on a particular cluster.

To change a setting, select the component to be customized, e.g. Slurm->Accounting->Colors. In this example, one can set the colors that are used to indicate Slurm accounts, users, users in their default accounts and the root user. To change a color, select the property in question and hit the Edit button. A color-picker dialog will then come up. Select the new color and click OK. Among others, one is also able to customize the column height of the Job Management and More Information tables here.

⁠14.3. Customizing general Properties

Since QluMan is a Qt application, its general Look&Feel can be controlled with KDE tools. Select the Manage Cluster->Preferences menu entry to bring up the KDE System Settings dialog. Now click on the Application Appearance icon and you'll have the options to modify fonts, colors and style.

⁠14.3.1. Customizing general Fonts

When you click on the Fonts icon, you'll see a list of different font identifiers, for which you can change the font settings. The relevant identifiers affecting QluMan are: General, Menu and Window Title. Changing one of the values and clicking the Apply button changes the corresponding font on the fly.

⁠14.3.2. Customizing general Colors

Click on the Colors icon and choose the Colors tab. There you can adjust the color of the different elements of the QluMan GUI. You can narrow down the color identifiers to the ones affecting particular GUI elements by choosing a specific color set with the corresponding pull-down menu. Changing one of the values and clicking the Apply button changes the corresponding color on the fly.

⁠14.3.3. Cloning KDE Settings

If you're using KDE4 on your desktop, instead of configuring using the System Settings dialog, you can also move /root/.kde/share/config to /root/.kde/share/config.bak and copy your personally configured .kde/share/config directory to /root/.kde/share. As long as you're not using any non-standard KDE themes, this should just apply the favorite desktop settings you're familiar with to QluMan when running it on a remote machine like the cluster head- or FE-node (restart of QluMan GUI required).
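The move and copy steps can be scripted, for example as in the following sketch. The function name clone_kde_config is hypothetical; both arguments are .kde directories, so the same code can be tried on test directories before it is applied to /root/.kde. The sketch refuses to overwrite an existing config.bak.

```python
import shutil
from pathlib import Path


def clone_kde_config(personal_kde: Path, root_kde: Path) -> None:
    """Back up root's KDE config and copy the personal one (sketch)."""
    source = personal_kde / "share" / "config"
    target = root_kde / "share" / "config"
    backup = root_kde / "share" / "config.bak"
    if not source.is_dir():
        raise FileNotFoundError(f"no KDE config found in {source}")
    if backup.exists():
        raise FileExistsError(f"{backup} already exists, not overwriting")
    if target.exists():
        target.rename(backup)        # keep the old settings as config.bak
    shutil.copytree(source, target)  # install the personal settings


# Example call with assumed paths (run as root on the head- or FE-node):
# clone_kde_config(Path("/home/user/.kde"), Path("/root/.kde"))
```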

⁠14.3.4. Customizing the Widget Style

Changing the widget style can be a little more involved. First you need to start the Qt configurator qtconfig and choose a GUI style (default is QtCurve). The following assumes that you're running qluman-qt on the head- or FE-node. In case you have it installed on your workstation, just execute qtconfig there.

	  0 user@workstation ~ $ ssh -X root@servername qtconfig

When you're done, select File->Save and you'll already see the changes. After this, you can exit qtconfig. If you want further customization of the widget style (note that only some styles are configurable, among them QtCurve), you can now go back to the Application Appearance dialog (see above), click on the Style icon, choose the style you've selected in qtconfig as Widget style and press the Configure... button. You'll then see a large number of options for customization. When you're satisfied with your modifications, press the OK button and finally the Apply button of the Style - System Settings window. Note that you will see the resulting changes only after performing some actions (pressing a button, etc.) in the QluMan GUI.

For additional widget style variants apart from the default of QtCurve, you can install additional kde-style packages (e.g. kde-style-oxygen) on the machine where you're executing the QluMan GUI.

⁠Appendix A. Revision History

Revision History

Revision 3.0.0-0

April 18 2017

Qlustar Doc Team

Updates for QluMan 3.0.0 / Qlustar 9.2.0

Revision 2.1.1-0

July 22 2016

Qlustar Doc Team

Updates for QluMan 2.1.1 / Qlustar 9.1.1

Revision 2.1-0

Aug 27 2015

Qlustar Doc Team

Initial version 2.1.0

⁠Index

F

feedback

contact information for Qlustar, Feedback requested