[HOW TO] Install Slurm Workload Manager on Zorin OS Core 16

Hi,

You can find these instructions on the Ubuntu forum as well, but since I recently switched from Ubuntu to Zorin OS and they turned out to also work on Zorin OS 16, I am posting them here (slightly updated) for reference.

Install munge and slurm

$ sudo apt install munge slurm-wlm
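
Optionally, check that the munge authentication service works before continuing: encoding and then decoding a credential locally should report something like "STATUS: Success":

$ munge -n | unmunge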

Open /usr/share/doc/slurmctld/slurm-wlm-configurator.easy.html in a browser and generate the configuration file there.
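
For example, straight from a terminal (this assumes a desktop session with a default browser set):

$ xdg-open /usr/share/doc/slurmctld/slurm-wlm-configurator.easy.html
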
I am using just one node, so I used the host name for SlurmctldHost, NodeName and ClusterName. Use Pgid for Process Tracking; Cgroup requires more configuration. I used Cons_res with CR_Core for Resource Selection and None for Task Launch.
The unit for RealMemory seems to be MB, so use 65536 if the node has 64 GB, for example.
However, on my first attempt the queue got stuck in the Draining state due to "Low Real Memory", so I did not specify RealMemory on my second attempt.
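
To find sensible values for the COMPUTE NODES line, and to see what Slurm thinks RealMemory is (in MB), you can ask the daemon itself. The output below is a rough sketch from my machine; your numbers will differ:

$ slurmd -C
NodeName=<YOUR-HOST-NAME> CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=64332
$ free -m   # cross-check the total memory in MB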

$ sudo vi /etc/slurm-llnl/slurm.conf   # copy/paste the configuration generated in the browser

A copy of my generated configuration file for reference

# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=<YOUR-HOST-NAME>
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/builtin
SelectType=select/cons_res
SelectTypeParameters=CR_Core
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=<YOUR-HOST-NAME>
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
#SlurmdDebug=info
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
#
# COMPUTE NODES
NodeName=<YOUR-HOST-NAME> CPUs=16 Sockets=1 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
PartitionName=long Nodes=<YOUR-HOST-NAME> Default=YES MaxTime=INFINITE State=UP
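
On my system the packages had already created the spool, state and log directories referenced in the configuration, but if slurmctld or slurmd later refuses to start, it is worth checking that these directories exist and are writable by SlurmUser. A sketch, assuming the paths from the configuration above:

$ sudo mkdir -p /var/lib/slurm-llnl/slurmctld /var/lib/slurm-llnl/slurmd /var/log/slurm-llnl
$ sudo chown slurm: /var/lib/slurm-llnl/slurmctld /var/log/slurm-llnl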

Enable and start the manager slurmctld and the agent slurmd

$ sudo systemctl enable slurmctld
$ sudo systemctl start slurmctld
$ sudo systemctl enable slurmd
$ sudo systemctl start slurmd
# Check the status of the manager and the agent:
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
long*        up   infinite      1   idle <YOUR-HOST-NAME>
$ scontrol show node
NodeName=<YOUR-HOST-NAME> Arch=x86_64 CoresPerSocket=16
..
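
If sinfo shows the node as down, drained or draining instead of idle, the daemon logs usually say why (the log paths match the slurm.conf above):

$ systemctl status slurmctld slurmd
$ journalctl -u slurmd -e
$ tail /var/log/slurm-llnl/slurmctld.log /var/log/slurm-llnl/slurmd.log

Once the cause is fixed, a drained node can be returned to service without restarting anything:

$ sudo scontrol update NodeName=<YOUR-HOST-NAME> State=RESUME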

Submit a job

# Create a shell script and make it executable:
$ vi submit.sh
#!/bin/bash
sleep 30
env
$ chmod +x submit.sh
# Submit the shell script:
$ sbatch submit.sh
# Check that the job is running
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 3      long submit.s    <YOUR-USER-NAME>  R       0:04      1 <YOUR-HOST-NAME>
# Check the output of the job after 30 seconds
$ cat slurm-<JOBID>.out
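
Resource requests do not have to rely on the defaults; they can be embedded in the script as #SBATCH directives. A minimal sketch with placeholder values (the job name, CPU count and time limit are just examples):

#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00
sleep 30
env

A queued or running job can be cancelled with scancel <JOBID>.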

Regards,
GW
