NixOS configuration for HPC cluster https://docs.hpc.informatik.hs-fulda.de/

Infrastructure Deployment

The whole cluster infrastructure is built using NixOS. The configuration repository is hosted at {{ config.repo_url }} and is deployed using colmena.

Building the configuration

To build the configuration, a system with Nix installed is required.

To activate the environment, run nix develop inside the configuration folder. This will fetch all required build dependencies and make them available in the environment.

Building the whole configuration is as easy as running:

colmena build --verbose --show-trace

Go grab a coffee, this can take a while.
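For scripted or one-off builds, the two steps above can be combined, since nix develop accepts a --command argument. The following is a minimal sketch, assuming it is run from the configuration folder; the function name build_cmd is purely illustrative and not part of the repository.

```shell
# Command that runs the build inside the dev shell without entering it.
build_cmd() {
  echo "nix develop --command colmena build --verbose --show-trace"
}

# Check the prerequisite (Nix installed) before suggesting the build.
if command -v nix >/dev/null 2>&1; then
  echo "Nix found. Build with: $(build_cmd)"
else
  echo "Nix not found. Install Nix first, then run: $(build_cmd)"
fi
```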

Deploying

Note: Deployment requires SSH access as the root user to all machines.

To deploy a configuration change or updates to the cluster, run the following command:

colmena apply switch
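When only a few machines are affected by a change, the deployment can likely be limited to them. The sketch below assumes colmena's --on node filter, which takes a comma-separated list of node names. The names node01 and node02 are hypothetical placeholders, and the helper only prints the command so it can be reviewed before running it.

```shell
# Build a `colmena apply` command line restricted to the given nodes.
# The node names passed in are placeholders; use the names from the repository.
apply_cmd() {
  if [ "$#" -eq 0 ]; then
    # No filter given: fall back to the full deployment.
    echo "colmena apply switch"
    return
  fi
  # Join the node names with commas, as expected by --on.
  nodes=$(IFS=,; echo "$*")
  echo "colmena apply --on $nodes switch"
}

apply_cmd node01 node02
```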

Using the manager as an SSH jump host

SSH access to the nodes is limited. Therefore, the manager system can be used as a jump host. To do so, add the following lines to your local ~/.ssh/config file (before the Host * entry):

Host 10.32.47.1??
  IdentitiesOnly yes
  ProxyJump root@10.32.47.10
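To see which addresses the Host pattern covers, the same matching can be reproduced with a shell glob, because "?" matches exactly one character in both ssh_config patterns and shell patterns. This is a small sketch; the addresses in the loop are illustrative examples, not a list of real machines.

```shell
# Return success if the address matches the Host pattern 10.32.47.1??
uses_jump_host() {
  case "$1" in
    10.32.47.1??) return 0 ;;
    *) return 1 ;;
  esac
}

for addr in 10.32.47.101 10.32.47.150 10.32.47.10 10.32.47.20; do
  if uses_jump_host "$addr"; then
    echo "$addr: via ProxyJump root@10.32.47.10"
  else
    echo "$addr: direct connection"
  fi
done
```

The manager address 10.32.47.10 itself does not match the pattern, so the connection to the jump host stays direct. With the config entry in place, ssh root@10.32.47.101 is routed through the manager; without it, ssh -J root@10.32.47.10 root@10.32.47.101 should have the same effect for a single connection.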

Updating

Updating all systems can be done by running the following command in the configuration repository:

nix flake update

This will update all dependencies, including the NixOS operating system.

After doing the update, the changed config (with the updated dependencies) must be deployed.
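Since nix flake update records the new dependency versions in flake.lock, a deployment is only needed when that file actually changed. The helper below is a sketch of such a check; update_changed_lock is a hypothetical name, not something provided by the repository.

```shell
# Run an update command and report whether the lock file changed.
# $1: lock file; remaining arguments: the command to run.
# Returns 0 if the file changed, 1 if it did not, 2 on error.
update_changed_lock() {
  lock=$1
  shift
  [ -f "$lock" ] || return 2
  before=$(cksum < "$lock")
  "$@" || return 2
  after=$(cksum < "$lock")
  [ "$before" != "$after" ]
}

# Intended use, from the configuration repository:
#   update_changed_lock flake.lock nix flake update && colmena apply switch
```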

Gather node information

The configuration repository relies on some information gathered from the machines themselves. After bootstrapping a machine, this information needs to be collected from the machine into the configuration repository.

To gather this data, run the following command:

./gather.sh

Secret management

The config repository contains several secrets, which are secured by sops and the corresponding Nix integration.

To edit a secrets file, run the following command:

sops <path/to/secrets/file>

This requires the PGP key fingerprint of the person editing the file to be part of the adminKeys list in sops.nix.

After the list has been altered, one of the existing members must update the keys.

Update keys

Whenever a key, either the SSH key of a machine or the PGP key of an administrator, changes, the secret files need updating. To do so, run the following command:

find . -type f \( -name "secrets.yaml" -o -path "*/secrets/*" \) -exec sops updatekeys {} \;
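Before re-encrypting, it can help to preview which files would be touched. The sketch below uses the same selection criteria (regular files named secrets.yaml or located below a secrets/ directory) but only lists the matches; list_secret_files is an illustrative name.

```shell
# List the secret files that would be re-encrypted, without modifying them.
# $1: repository root (defaults to the current directory).
list_secret_files() {
  find "${1:-.}" -type f \( -name "secrets.yaml" -o -path "*/secrets/*" \) | sort
}

# Intended use, from the configuration repository:
#   list_secret_files .
```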

Bootstrapping a node

Compute nodes can be bootstrapped using PXE boot. The manager will provide a touchless boot image which will install the node with the current deployment automatically. Booting the node from PXE (network boot) is enough to activate the bootstrapping process.

After bootstrapping a node, make sure to gather the node data and update the secret keys.