Adding/Configuring Hosts

Adding Hosts

The New Hosts Window

To add new hosts to the cluster, select New Hosts either from the context menu in the Enclosure View tree or from the Manage Hosts menu. This opens the 'New Hosts' window.

Adding a new host requires an IP address, a hostname and a MAC address, entered in the corresponding three text fields of the dialog. The entered values are checked for validity. If one of them is not valid, the check-box to its right remains cleared and its tool-tip shows why the value is invalid. Once all values are valid, all check-boxes show a solid check and the Add Host button becomes selectable.

Lacking a boot network definition

The validity of the IP is determined by two factors:

  1. The IP must not be used by another host.

  2. It must be part of the boot network to be used for the host.

When the New Hosts window is first opened, the initial configuration for a new host is empty, and only the properties and configs from the Global Template affect it. Selecting a Host Template, or an existing host to copy from, as the initial configuration adds further configuration settings.

The chosen initial configuration must include a Network Config. Otherwise no IP will be valid and no host can be added.
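The two validity conditions for the IP, together with the requirement of a Network Config, can be sketched in Python as follows. This is only an illustration and not QluMan code: the function name `check_ip`, its parameters and the reason strings are hypothetical.

```python
import ipaddress


def check_ip(candidate, boot_network, used_ips):
    """Return (is_valid, reason) for the candidate IP of a new host.

    Hypothetical helper. boot_network is a CIDR string, or None when the
    chosen initial configuration contains no Network Config.
    """
    if boot_network is None:
        # Without a Network Config no IP can be valid.
        return False, "no boot network defined"
    try:
        ip = ipaddress.ip_address(candidate)
    except ValueError:
        return False, "not a well-formed IP address"
    # Factor 1: the IP must not be used by another host.
    if ip in {ipaddress.ip_address(used) for used in used_ips}:
        return False, "already used by another host"
    # Factor 2: the IP must be part of the boot network.
    if ip not in ipaddress.ip_network(boot_network):
        return False, "not part of the boot network"
    return True, ""
```

The returned reason string plays the role of the check-box tool-tip described above.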

For convenience, the IP address and the numeric part of the hostname (if there is one) are automatically incremented by one after a host has been added, where this makes sense. In most cases, these fields therefore do not have to be changed manually to add the next host. Only the new MAC needs to be entered.
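A minimal Python sketch of such an auto-increment is shown here. It assumes that the "numeric part" is the last run of digits in the hostname and that zero padding is preserved; the actual rules used by QluMan are not specified in this chapter. The function name `next_host` is hypothetical.

```python
import ipaddress
import re


def next_host(ip, hostname):
    """Return the (ip, hostname) pair suggested for the next host.

    Hypothetical helper. Assumption: the last run of digits in the
    hostname is the numeric part and keeps its zero padding.
    """
    next_ip = str(ipaddress.ip_address(ip) + 1)
    # Find the last run of digits: digits followed only by non-digits.
    match = re.search(r"\d+(?=\D*$)", hostname)
    if match is None:
        # No numeric part: the hostname is left unchanged.
        return next_ip, hostname
    digits = match.group()
    bumped = str(int(digits) + 1).zfill(len(digits))
    return next_ip, hostname[:match.start()] + bumped + hostname[match.end():]
```

For example, `next_host("10.0.0.9", "node-09")` yields the pair `("10.0.0.10", "node-10")`, so only the MAC would remain to be entered.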

Table of unknown MACs

To help with adding new hosts, qlumand scans the DHCP log file for unknown hosts that have requested an IP address. For each unknown host found in the logs, the table at the top of the window shows the time of its first and last appearance in the log, its MAC address and the hardware vendor this MAC is assigned to (if known). Selecting a MAC in the table copies it into the MAC text field at the bottom, and a double-click adds the host with the selected MAC. One can also select multiple lines (by holding the Ctrl key and clicking, or by holding the Shift key and dragging the mouse) and then click the Add Selected button at the bottom to add them all, using the auto-increment feature for the IP address and hostname. If unsure, try adding a single host first and check that the auto-increment does the right thing before adding a group of hosts.

Selecting a template for new hosts

Select host to copy settings from

Copy settings to new hosts

One easy way to add groups of hosts is to power them on one at a time with a short delay (say 30 seconds). The hosts will then appear in the Unknown MACs table in the order they were powered on and can be added as a group with the click of a single button.

Import MACs from file

Import MACs file dialog

Imported MACs

Another option is to import a list of MAC addresses from a file by clicking Import MACs. Network switches with a management interface often have an option to list the MAC addresses for each port, so you could capture this list and save it in a file. The file might need some editing to conform to the syntax qluman-qt expects, which is as follows: Lines starting with a '#' and empty lines are treated as comments. Every other line must start with a MAC address in the standard hexadecimal notation using ':' as separator. Any text following the MAC address is displayed in the comment column after importing. Example (see also the corresponding screenshot):

# Rack 1
02:00:00:00:01:01 # node 1
02:00:00:00:01:02 # node 2
# Rack 2

Failure to import MACs

If the file cannot be parsed, an error is shown with the line number at which parsing failed. Otherwise, the imported MAC addresses are shown in place of the unassigned MAC addresses detected by the DHCP server. Adding single hosts or groups of hosts from the list works the same way as described above for the detected MACs. Clicking the Clear MACs button clears the imported MACs and returns to the list of MACs detected by the DHCP server.
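Before importing, a file can be checked against the syntax described above with a small script. The following Python sketch is not part of QluMan: the function name `parse_mac_file` is hypothetical, and details such as ignoring leading whitespace or requiring whitespace between the MAC and the trailing text are assumptions.

```python
import re

# A MAC in hexadecimal notation with ':' as separator, optionally
# followed by whitespace and free text (shown in the comment column).
MAC_LINE = re.compile(r"^((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})(?:\s+(.*))?$")


def parse_mac_file(text):
    """Return a list of (mac, comment) tuples.

    Hypothetical helper. Raises ValueError naming the first line that
    does not start with a MAC address.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()  # assumption: surrounding whitespace is ignored
        if not line or line.startswith("#"):
            continue  # empty lines and '#' lines are comments
        match = MAC_LINE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: expected a MAC address")
        entries.append((match.group(1).lower(), match.group(2) or ""))
    return entries
```

Applied to the example file shown earlier, this returns two entries, one per MAC line, with "# node 1" and "# node 2" as the comment text.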

At the bottom of the window, a Host Template can be selected that will be used as the default for new hosts. Most of the time, no additional configuration is needed for a new host. Alternatively, one can select an existing, properly configured host and choose to copy its settings to the new ones.

Configuring Hosts

Four Levels of configuration

The configuration of a host results from the assignment of different types of properties and Config Classes to it. A property is always a key/value pair. Properties are further divided into Generic Properties and Hardware Properties. Generic/hardware properties and config classes can be individually assigned to a host. This is the configuration level with the highest priority, meaning that such an assignment will always be effective.

Properties and config classes can also be grouped into Generic Property Sets, Hardware Property Sets and Config Sets. This is simply a means of grouping them together, so they can be used as a single entity. These sets may also be individually assigned to a host. This is the configuration level with the second-highest priority.

The third level of configuration is the Host Template. A Host Template consists of exactly one Config Set, one Generic Property Set and one Hardware Property Set.

The fourth and most generic level of configuration is the Global Template. It applies to all hosts in the cluster and consists of the Global Generic Property Set, the Global Hardware Property Set and the Global Config Set. In principle, these are just like any other set, with the one difference that they always apply to all hosts. This is useful when defining a base configuration for a cluster.

Overwritten properties/configs.

If a generic/hardware property or config is defined and assigned to a host in a particular hierarchy level, it overwrites the corresponding assignment(s)/value(s) from all lower priority levels. This introduces a lot of flexibility while retaining consistency.
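The override rule across the four levels can be illustrated with a short Python sketch. It is a simplified model, not QluMan's implementation: the level names, the function `effective_config` and the sample keys and values are hypothetical, and every key is treated as single-valued.

```python
# Configuration levels, from highest to lowest priority (hypothetical names).
LEVELS = ["direct", "direct_sets", "host_template", "global_template"]


def effective_config(assignments):
    """Resolve per-level assignments into the effective configuration.

    assignments maps a level name to a dict of key -> value.
    Returns (effective, overwritten): effective maps each key to
    (value, level); overwritten lists the (level, key, value) entries
    that lost against a higher-priority level.
    """
    effective = {}
    overwritten = []
    for level in LEVELS:
        for key, value in assignments.get(level, {}).items():
            if key in effective:
                # A higher-priority level has already set this key.
                overwritten.append((level, key, value))
            else:
                effective[key] = (value, level)
    return effective, overwritten


example = {
    "global_template": {"Slurm Partition": "batch"},
    "host_template": {"Slurm Partition": "gpu", "CPU cores": "16"},
    "direct": {"CPU cores": "32"},
}
effective, overwritten = effective_config(example)
```

In this example the Host Template replaces the Slurm Partition of the Global Template, and the directly assigned number of CPU cores replaces the value from the Host Template. The entries in `overwritten` correspond to the assignments that lose against a higher-priority level.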

For example, it allows setting a property in the Global Template that is right for most hosts and then replacing it in a particular Host Template used for a few exceptional hosts. The screenshot shows two examples of this strategy: The assigned Host Template of the chosen host replaces the configs for Net Config, Slurm Partition and Slurm Node Group of the Global Template (red box), and the number of CPU cores is directly assigned, replacing the value from the Host Template (green box).

The tree representation of a host’s properties/configs in the Enclosure View clearly indicates overwritten assignments by displaying them in light gray.

Generic and Hardware Properties

Hardware Properties

Hardware Properties are used to describe the hardware of a host. Some of them, such as the amount of RAM or the number of CPU cores, are used to configure the Slurm workload manager, so that jobs can be assigned to the desired hosts. Others, such as the HW type, are purely informational and might be used for inventory management.

Hosts with no/wrong configuration

Detected Hardware Properties

Hardware Properties usually don’t have to be entered manually. When a host boots, its basic hardware properties are detected and reported to the head-node. They are shown as a tool-tip in the Enclosure View when hovering over a host. Hosts with no or only partially configured Hardware Properties show a blue wrench, while hosts whose configured Hardware Properties do not match the detected ones show a red wrench. The Hardware Wizard uses the reported values to generate the correct configuration for a set of hosts and is the best way to create their initial configuration. It can also be used to correct the configuration of hosts whose hardware has changed.
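The wrench logic can be summarized with the following Python sketch. It is a simplified model under stated assumptions: the function name `wrench_status` and the property names are hypothetical, values are compared for exact equality, and a mismatch is assumed to take precedence over a missing property, which this chapter does not specify.

```python
def wrench_status(configured, detected):
    """Return "red", "blue" or None for a host.

    Hypothetical helper. configured and detected map hardware property
    names to values.
    """
    mismatched = [
        key for key, value in detected.items()
        if key in configured and configured[key] != value
    ]
    missing = [key for key in detected if key not in configured]
    if mismatched:
        return "red"   # configured values do not match the detected ones
    if missing:
        return "blue"  # no or only partial hardware configuration
    return None        # configuration matches, no wrench
```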

Details of Hosts with wrong configuration

Selecting a node in the Enclosure View tree displays its configuration info on the right hand side. This includes the red or blue wrench if there is a problem with the configuration of Hardware Properties or GPUs. In the latter case, a few more details about the GPUs are included. Individual Hardware Properties that don’t match will be shown in red in the Properties and Configs tree and their tool-tip will show the detected value.

Problems Report

Using a host entry’s tool-tip or selecting the host to get further details about a configuration problem works only for one host at a time. With many misconfigured hosts, an easier way to get an overview of problems is by opening the Problems Report from the Manage Cluster menu. This will show the hardware and GPU configuration problems of all nodes in a list, displaying both the detected and configured values for each problematic GPU or hardware property.

At the bottom of the dialog, the overall number of hosts with problems is shown, together with two check-boxes that toggle the display of problems with Hardware Properties or GPUs selectively.

Generic Properties

A property that is not hardware-related is called generic. Generic Properties can be configuration options, like OpenSM Host, or purely informational, like Paid by. While hardware properties are meant to be more rigid, typically with a configurable set of fixed values, generic properties are more flexible and can be defined at will. Generic Properties are also not necessarily unique, making it possible to assign multiple values for a single generic property. This is useful, for example, to put hosts in multiple groups for dsh/pdsh (via the 'Host tag').

Editing Properties

Property Editor

Creating a new Property

Generic/Hardware properties and their values are managed by the corresponding Property Editor. It is reachable from the Generic/Hardware Property Sets windows. A new Generic/Hardware property can be created by clicking the New button. Each property has a name and a description. The name must be unique and the Ok button will only be enabled if that is the case. The description is optional and will be shown as a tool-tip when hovering over the property in other windows.

Select Property

Add property value

Once a property exists, values can be added to it. To accomplish this, the property must first be selected from the drop-down menu at the top of the Property Editor. Then values may be added or deleted using the context-menu.

Editing name of property

With the exception of essential pre-defined properties, a property can be edited by clicking the Edit button. Examples of properties that cannot be edited (or deleted) are the ones used to generate the slurm.conf. Changing their name would lead to errors in the generated file. Similarly, some property values cannot be deleted and will be shown grayed out.

Property/Config Sets

Hardware Property Sets

Generic Property Sets

Config Sets

Generic/hardware sets and config sets simplify the configuration of hosts substantially. They are a means of grouping generic/hardware properties or configs, so they can be used as a single entity and be assigned to individual hosts directly or via a Host Template. There is also a global set of each type which is always assigned to every host. The generic/hardware and config sets can be managed by opening the corresponding window from the Manage Sets menu.

Hardware/Generic Property Sets

New Property Set

New Property Set dialog

Property sets are shown in tree form with the name of each set as the top level item and the key/value pairs of assigned properties as children. The first entry is always the global property set. A new property set can be created by clicking the New button or from the context menu. An existing property set can be renamed, or its description changed, by selecting Edit Property Set from the context menu. Deletion of an unused property set is also possible via the context menu.

Add Property to Set

New properties can be added to a property set by opening the Add Property sub-menu in the context menu: Select the desired property and its value. Only properties and values that don’t conflict with already included properties or values are shown in the sub-menu. Already included properties may be changed in two ways: Either a different value is selected from the drop-down menu or the value is edited directly, in which case the change must be confirmed by pressing Enter. If the entered value for the property doesn’t exist yet, a new value is automatically created. Another way to create new values and the only way to delete them is through the property editor by clicking Edit Properties.

Config Sets

Add Config Class to Set

Config sets are managed the same way as property sets. They are displayed as a tree with the global set at the top and the key/value pairs of assigned config classes as children. Creating a new config set can be done by clicking the New button or from the context menu just like for property sets. Adding and removing config classes to/from a set also works the same way.

Unlike property sets, though, the list of usable config classes is fixed and values cannot be edited directly from this window. Config classes require a more complex and individual configuration, so for each config class there is a menu item in the Manage Configs menu. Depending on the type of config class, one or more values for that class can be assigned to a Config Set (Slurm partitions are an example where multiple values are assignable).

The sub-menu only shows config classes and values that can be added without conflicting with already assigned entries.

Assigning a configuration to hosts

As mentioned at the beginning of this chapter, there are four levels of configuration. Going from lowest to highest priority, these are: Global Template, Host Template, directly assigned sets, and directly assigned properties and configs.

Global Template

The Global Template is the most generic way to assign configurations to hosts. It applies to all hosts in the cluster and consists of the Global Generic Property Set, the Global Hardware Property Set and the Global Config Set.

Host Templates

Selecting a Host Template

Selecting a Host Template for multiple hosts

The next level of configuration is the Host Template. When a correct Host Template exists, a host can be configured by selecting the desired template in the Enclosure View window. For a single host, this can be done by selecting it in the tree view. This brings up the host information on the right and a template can be selected from the drop-down menu. To configure multiple hosts, you would select them in the tree view and choose a Host Template from the context menu. The check-marks in the sub-menu indicate which Host Templates are currently assigned (if any) for the selected nodes. This action will override the previous assignment for all selected hosts.

Alternatively, especially when no correct Host Template exists yet, the Hardware Wizard can be used to create a new Host Template or modify an existing one and assign it to hosts.

Directly assigned properties, configs and sets

Selecting a Generic Property

Generic/hardware properties, configs and their corresponding sets can also be individually assigned to a host. Such assigned properties take precedence over ones of the same type selected through the Host or Global Template. This is useful when one or a few nodes require a special property, config or set while everything else should be set the same as for other hosts with the same template.

By default, every new host has the generic property Schedule Format: always, which is required to format the disk on the first boot. This property should be removed (or changed to never) after the first successful boot of the host, so that log files will be preserved across boots in the future.

Host Overview

Selecting a Network

The Host Overview dialog

The Host Overview window provides a tabular overview of all the hosts in a cluster, split by networks. Tabs at the top allow selecting the network of interest. In each tab, the IP ranges allocated in that network are shown at the top and a list of hosts with their most important config parameters below.

The Allocated IPs box at the top always shows all used IPs in the network, regardless of the active filter (see below). The addresses are shown as start and end IP for each contiguous range of allocated IPs. When planning to add a new set of hosts, this can help to find unused IPs and to use them to obtain a more contiguous block of addresses.
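How a list of used IPs collapses into start/end pairs can be sketched in Python as follows. This is an illustration only: the function name `allocated_ranges` is hypothetical, and all addresses are assumed to belong to the same address family.

```python
import ipaddress


def allocated_ranges(ips):
    """Collapse used IPs into (start, end) pairs of contiguous ranges.

    Hypothetical helper; duplicates are ignored.
    """
    addresses = sorted({ipaddress.ip_address(ip) for ip in ips})
    ranges = []
    for address in addresses:
        if ranges and int(address) == int(ranges[-1][1]) + 1:
            # Directly follows the current range: extend its end.
            ranges[-1][1] = address
        else:
            # Gap found: start a new range.
            ranges.append([address, address])
    return [(str(start), str(end)) for start, end in ranges]
```

The gaps between consecutive pairs in the result are the unused addresses that could be assigned to new hosts.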

At the bottom is the list of hosts with a NIC in the selected network. For each host, its QluMan name, hostname (if overwritten), IP, MAC, CPU Model, size of RAM, GPU (if any), Boot Config and Host Template are shown. Empty columns are omitted, e.g. the Hostname column in the Boot network.

Filtering hosts

Sorting by a different column

Since the list of hosts for a network can be long, there are two ways to help navigate it: a) The list can be sorted by any column simply by clicking on the respective column header. b) More powerfully, an on-the-fly filter can be applied to the lists. By entering a string or Python regular expression into the Filter text-box, the lists will be reduced to lines that contain the entered string or match the regex.
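Since the filter accepts Python regular expressions, its effect can be tried out with a few lines of Python. The sketch below is an approximation, not the actual implementation: the function name `filter_rows` and the sample rows are hypothetical, and both matching against the whole line and falling back to a literal search for invalid patterns are assumptions.

```python
import re


def filter_rows(rows, pattern):
    """Keep the rows whose text contains the string or matches the regex.

    Hypothetical helper. rows is a list of tuples of cell strings.
    """
    try:
        regex = re.compile(pattern)
    except re.error:
        # Not a valid regex: treat the input as a plain string.
        regex = re.compile(re.escape(pattern))
    return [row for row in rows if regex.search(" ".join(row))]


rows = [
    ("node-01", "10.0.0.1", "EPYC 7543"),
    ("node-02", "10.0.0.2", "Xeon 6248"),
    ("gpu-01", "10.0.0.9", "EPYC 7543"),
]
epyc_hosts = filter_rows(rows, "EPYC")
compute_nodes = filter_rows(rows, r"^node-0[12]")
```

Here `epyc_hosts` contains the first and third row, while `compute_nodes` contains the first two.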

Clicking on the header of a new column will retain the previous sorting of rows for which the entries in the newly selected column are identical. This way, a list can be sorted by multiple columns. For example, selecting the CPU Model column followed by the Host Template column will show hosts sorted by the Host Template column, and hosts with the same Host Template will remain sorted by CPU Model.
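This behaviour corresponds to a stable sort, which can be reproduced in Python because the built-in `sorted` is guaranteed to be stable. The sketch below uses hypothetical sample data and a hypothetical helper `click_header` standing in for a click on a column header.

```python
def click_header(rows, column):
    """Sort rows by one column, keeping the previous order for ties."""
    return sorted(rows, key=lambda row: row[column])


hosts = [
    {"host": "n1", "CPU Model": "Xeon", "Host Template": "compute"},
    {"host": "n2", "CPU Model": "EPYC", "Host Template": "gpu"},
    {"host": "n3", "CPU Model": "EPYC", "Host Template": "compute"},
    {"host": "n4", "CPU Model": "Xeon", "Host Template": "gpu"},
]

# First click on CPU Model, then on Host Template.
by_cpu = click_header(hosts, "CPU Model")
by_template = click_header(by_cpu, "Host Template")
order = [row["host"] for row in by_template]
```

The resulting order is n3, n1, n2, n4: the hosts are grouped by Host Template, and within each template they remain sorted by CPU Model.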