4. Scheduling Processes
Objective
The Scheduler
The Slurm Workload Manager, or simply Slurm, allocates resources to users for a duration of time, provides a framework for starting, executing, and monitoring work, and manages a queue of pending jobs. Slurm is essentially the de facto job scheduler for Linux and is used by most of the world’s supercomputers. This guide will show you how to install it on your own cluster.
Head Node Installation
Install MariaDB
Resources: mariadb
Slurm works without MariaDB, but it is common to pair the two (through slurmdbd), because a database makes it easy to archive accounting records and query them later.
rpm --install --verbose /apps/pkgs/mariadb-server/*.rpm
Note: the install may report failures the first time you run it; running it a second time will usually succeed.
Install Slurm
The head node will be responsible for accepting jobs from users, scheduling jobs on the cluster, and keeping a record of every job that ran. The packages installed are slurm, slurmdbd, and slurmctld, along with their dependencies.
rpm --install --verbose /apps/pkgs/slurm-head/*.rpm
Head Node Configuration
Configure Munge
I gotta be honest. I pulled a sneaky on ya’: when you installed the Slurm packages, I snuck munge in as well. Munge is a cryptographic authentication suite that uses a shared “key” and the current time, so every node needs the same key and reasonably synchronized clocks. We need to create a key before Slurm will start. Thankfully, the folks who make munge were kind enough to include a script. As root, run the following:
create-munge-key
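Munge is picky about its key file: it expects /etc/munge/munge.key to be owned by the munge user and not readable by group or others. As a quick sanity check, here is a minimal sketch of a helper that verifies the key's mode and, optionally, its owner. key_perms_ok is a hypothetical name (it is not part of munge), and the sketch assumes GNU coreutils stat.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of munge): succeed only if the key file exists,
# its permission bits are 400 or 600, and (if a second argument is given) it is
# owned by that user. Assumes GNU coreutils stat (-c format strings).
key_perms_ok() {
    local key="${1:-/etc/munge/munge.key}"
    local want_owner="${2:-}"
    local mode owner

    if [ ! -f "$key" ]; then
        echo "missing key file: $key" >&2
        return 1
    fi

    # %a prints the permission bits in octal, e.g. 600.
    mode=$(stat -c '%a' "$key")
    case "$mode" in
        400|600) ;;
        *)
            echo "insecure mode $mode on $key (expected 400 or 600)" >&2
            return 1
            ;;
    esac

    # Only check ownership when the caller asks for it.
    if [ -n "$want_owner" ]; then
        owner=$(stat -c '%U' "$key")
        if [ "$owner" != "$want_owner" ]; then
            echo "$key is owned by $owner, expected $want_owner" >&2
            return 1
        fi
    fi
}

# Usage (as root on the head node):
#   key_perms_ok /etc/munge/munge.key munge && echo "munge key looks fine"
```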
Configure Slurm
Resources: slurmd, Slurm control group, slurm.conf
Now edit /etc/slurm/slurm.conf to reflect your configuration (only changed and added lines are shown):
## General
SlurmctldHost=boxocluster-node-1
MpiDefault=pmi2
# ^- this will need to be updated to pmix
ProctrackType=proctrack/linuxproc
## Logging and Accounting
AccountingStorageHost=boxocluster-node-1
AccountingStoragePort=6819
AccountingStorageType=accounting_storage/slurmdbd
ClusterName=<your_cluster_name>
JobAcctGatherType=jobacct_gather/linux
## Compute Nodes
NodeName=boxocluster-node-[1-4] CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=<partition_name> Nodes=boxocluster-node-[1-4] Default=YES MaxTime=INFINITE State=UP
We won’t go into much detail about what the options mean just yet; the goal is to get the cluster working. You may choose what to call your cluster and its default partition.
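One thing worth double-checking in the NodeName line is that the CPU topology adds up: CPUs should equal Sockets × CoresPerSocket × ThreadsPerCore (here 1 × 4 × 1 = 4). Running slurmd -C on a compute node prints the hardware slurmd detects, in slurm.conf format, which is a handy reference. The sketch below is a hypothetical helper (node_cpus_consistent is not a Slurm command) that checks a single NodeName line, assuming the simple space-separated Key=Value layout used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Slurm command): check that
# CPUs = Sockets * CoresPerSocket * ThreadsPerCore on one slurm.conf NodeName line.
# Returns 0 if consistent, 1 if the numbers disagree, 2 if a field is missing.
node_cpus_consistent() {
    local line="$1" word
    local cpus="" sockets="" cores="" threads=""
    local -a words

    # Split on whitespace without glob expansion (NodeName may contain [1-4]).
    read -r -a words <<< "$line"
    for word in "${words[@]}"; do
        case "$word" in
            CPUs=*)           cpus="${word#CPUs=}" ;;
            Sockets=*)        sockets="${word#Sockets=}" ;;
            CoresPerSocket=*) cores="${word#CoresPerSocket=}" ;;
            ThreadsPerCore=*) threads="${word#ThreadsPerCore=}" ;;
        esac
    done

    if [ -z "$cpus" ] || [ -z "$sockets" ] || [ -z "$cores" ] || [ -z "$threads" ]; then
        echo "missing CPUs, Sockets, CoresPerSocket or ThreadsPerCore" >&2
        return 2
    fi

    local product=$(( sockets * cores * threads ))
    if [ "$cpus" -ne "$product" ]; then
        echo "CPUs=$cpus but Sockets*CoresPerSocket*ThreadsPerCore=$product" >&2
        return 1
    fi
}

# Usage (checks the first NodeName line in the config):
#   node_cpus_consistent "$(grep -m1 '^NodeName=' /etc/slurm/slurm.conf)"
```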
Finally, we need to create the /etc/slurm/slurmdbd.conf file for the Slurm database:
AuthType=auth/munge
DbdHost=boxocluster-node-1
DbdPort=6819
SlurmUser=slurm
DebugLevel=verbose
LogFile=/var/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StoragePass=<slurmdb_password>
StorageUser=slurm
StorageLoc=slurm_acct_db
You can set StoragePass to any password you want; just remember it, because you will need it again when setting up the database. Now, create the slurm user and group, and change the permissions of slurmdbd.conf so that it is readable and writable only by the slurm user:
groupadd -g 990 slurm
useradd -u 990 -g 990 slurm
chown slurm:slurm /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf
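The port slurmdbd listens on (DbdPort in slurmdbd.conf) and the port slurmctld uses to reach it (AccountingStoragePort in slurm.conf) have to agree; both are 6819 in this guide. Here is a minimal sketch of a consistency check, assuming plain Key=Value lines like the ones above and requiring both keys to be set explicitly. conf_value and ports_match are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a Slurm-style "Key=Value"
# config file. Comment lines never match because their first field starts with '#'.
conf_value() {
    local file="$1" key="$2"
    # Match lines whose key is exactly $key; print everything after the first '='.
    awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$file"
}

# Hypothetical helper: succeed if slurm.conf's AccountingStoragePort equals
# slurmdbd.conf's DbdPort. Both keys must be set explicitly for this to pass.
ports_match() {
    local slurm_conf="${1:-/etc/slurm/slurm.conf}"
    local dbd_conf="${2:-/etc/slurm/slurmdbd.conf}"
    local acct_port dbd_port
    acct_port=$(conf_value "$slurm_conf" AccountingStoragePort)
    dbd_port=$(conf_value "$dbd_conf" DbdPort)
    if [ -z "$acct_port" ] || [ "$acct_port" != "$dbd_port" ]; then
        echo "AccountingStoragePort='$acct_port' but DbdPort='$dbd_port'" >&2
        return 1
    fi
}

# Usage (as root, since slurmdbd.conf is now readable only by slurm and root):
#   ports_match && echo "ports agree"
```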
Now copy the Slurm config and the munge key to /shared, where all of the nodes can see them:
cp /etc/slurm/slurm.conf /shared/slurm.conf
cp /etc/munge/munge.key /shared/munge.key
Compute Node Installation
Install Slurm
Use the following command to enter the compute node container chroot with the right binds:
sudo su
wwctl container exec --bind /shared:/shared base-rocky9 /bin/bash
Inside the Warewulf container chroot, install the required packages:
rpm --install --verbose --force /apps/pkgs/slurm-compute/*.rpm
Now, copy the Slurm config file and the munge key to their respective places:
cp /shared/slurm.conf /etc/slurm/slurm.conf
cp /shared/munge.key /etc/munge/munge.key
chown munge:munge /etc/munge/munge.key
chmod 600 /etc/munge/munge.key
Enable the services so that they start on boot (slurmd needs munge running to authenticate):
systemctl enable munge slurmd
Finally, run exit to leave the chroot; Warewulf rebuilds the container when you exit.
Compute Node Configuration
Copy Munge Key to Nodes
Resources: munge
To provide the necessary authentication between the head node and compute nodes, all nodes need the same munge.key. If the key is not already in the compute image, copy it to each node, then restart munge on all the nodes.
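An easy way to confirm that every node really has the same key is to compare checksums. The sketch below assumes the pdsh group nodes used later in this guide and pdsh's default "host: output" line prefix; all_keys_identical is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "host: <sha256>  <path>" lines (pdsh-style output of
# sha256sum) on stdin and succeed only if every line reports the same hash.
all_keys_identical() {
    local unique
    # Count distinct values in the second whitespace-separated field (the hash).
    unique=$(awk '{ seen[$2] = 1 } END { n = 0; for (h in seen) n++; print n }')
    if [ "$unique" -ne 1 ]; then
        echo "found $unique distinct munge.key hashes (expected 1)" >&2
        return 1
    fi
}

# Usage (from the head node, including its own key in the comparison):
#   { echo "$(hostname -s): $(sudo sha256sum /etc/munge/munge.key)"
#     sudo pdsh -g nodes sha256sum /etc/munge/munge.key; } | all_keys_identical
```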
Set Up Slurm Database
First, on the head node, make sure MariaDB is running (for example, systemctl enable --now mariadb), then log into it as root:
mysql
The prompt should now show MariaDB [(none)]>. Create the slurm_acct_db database:
create database slurm_acct_db;
Confirm that it was created:
show databases;
Now create the slurm database user and set its password (use the StoragePass value from /etc/slurm/slurmdbd.conf):
create user 'slurm'@'localhost';
set password for 'slurm'@'localhost' = password('<slurmdb_password>');
Grant this user privileges for slurm_acct_db:
grant all privileges on slurm_acct_db.* to 'slurm'@'localhost';
You can exit out using exit (semi-colon is not needed). Now, check to see if the slurm user is able to log in and see the database.
mysql -u slurm -p
Type in your password. If you get the MariaDB [(none)]> prompt, show the databases again:
show databases;
Your output should be:
+--------------------+
| Database           |
+--------------------+
| information_schema |
| slurm_acct_db      |
+--------------------+
2 rows in set (0.002 sec)
And finally:
exit;
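The same check can be scripted from the shell without the interactive prompt. This is a minimal sketch assuming the mysql client's -e option; db_listed is a hypothetical helper name. When mysql's output goes to a pipe it normally prints plain lines instead of the boxed table, and the helper matches the name as a whole word, so either layout works.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the database name given as $1 appears as a
# whole word (fixed string, not a regex) in "show databases" output on stdin.
db_listed() {
    local name="$1"
    [ -n "$name" ] || return 2
    grep -qwF -- "$name"
}

# Usage (prompts for the StoragePass password):
#   mysql -u slurm -p -e 'show databases;' | db_listed slurm_acct_db && echo "slurm can see slurm_acct_db"
```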
Start Slurm
On the Head Node
If everything is configured correctly, the following commands enable and start the head node services:
systemctl enable munge slurmdbd slurmctld
systemctl start munge slurmdbd slurmctld
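If you encounter errors, check systemctl status [service] (or journalctl -u [service]), where [service] is one of slurmd, slurmctld, or slurmdbd. Also check the daemon log files: slurmdbd logs to /var/log/slurmdbd.log (the LogFile set in slurmdbd.conf), and slurmctld and slurmd log to the paths configured in slurm.conf.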
Finally, set up accounting by registering the cluster with sacctmgr. The cluster may already exist, in which case the command reports an error; this is fine.
sacctmgr -i add cluster <your_cluster_name>
On the Compute Nodes
Resources: srun
Reboot the nodes with the updated image; Slurm should start automatically. If you need to start it manually, run the following:
sudo pdsh -g nodes systemctl start slurmd
If all is good, the output of sinfo -Nl should look like the following:
NODELIST           NODES PARTITION    STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
boxocluster-node-1     1 boxocluster* idle     4 1:4:1   3794        0      1   (null) none
boxocluster-node-2     1 boxocluster* idle     4 1:4:1   3794        0      1   (null) none
boxocluster-node-3     1 boxocluster* idle     4 1:4:1   3794        0      1   (null) none
boxocluster-node-4     1 boxocluster* idle     4 1:4:1   3794        0      1   (null) none
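On a larger cluster, scanning that table by eye gets tedious. Here is a minimal sketch, assuming sinfo's -N (one line per node), -h (no header), and -o format options with %N (node name) and %T (state). nodes_not_in_state is a hypothetical helper name; it prints every node whose state differs from the expected one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<node> <state>" lines on stdin, print each node whose
# state is not the expected one (default: idle), and return 1 if any were found.
nodes_not_in_state() {
    local want="${1:-idle}"
    awk -v want="$want" '$2 != want { print $1; bad = 1 } END { exit (bad ? 1 : 0) }'
}

# Usage:
#   sinfo -N -h -o '%N %T' | nodes_not_in_state idle || echo "some nodes are not idle"
```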
Finally, you should be able to run commands on the compute nodes without using pdsh:
srun --nodes=4 hostname
Output should look like this:
boxocluster-node-2
boxocluster-node-1
boxocluster-node-3
boxocluster-node-4
Note
The hostnames will likely appear out of order because the tasks run in parallel.
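If you just want readable output, piping through sort is enough (srun --nodes=4 hostname | sort). To check that every expected node actually responded, regardless of order, here is a minimal sketch. hosts_match is a hypothetical helper name, and scontrol show hostnames is used to expand the boxocluster-node-[1-4] hostlist into individual names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the hostnames read on stdin (one per line,
# in any order) are exactly the set of hostnames given as arguments.
hosts_match() {
    [ "$#" -gt 0 ] || return 2
    local got want
    got=$(sort -u)
    want=$(printf '%s\n' "$@" | sort -u)
    [ "$got" = "$want" ]
}

# Usage:
#   srun --nodes=4 hostname | hosts_match $(scontrol show hostnames 'boxocluster-node-[1-4]') \
#       && echo "all nodes responded"
```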