Job management with SLURM

In order to submit jobs on the cluster, you must describe the resources (cores, time) that your job needs to the Slurm task scheduler. Slurm will execute the job on remote compute node(s) as soon as the requested resources are available.

Important

You should not run compute code on the login (frontend) server cholesky-login, which is not suited for computations.

There are two ways to run compute code on Cholesky:

  • using an interactive SLURM job: open a terminal on a compute node where you can execute your code. This method is well-suited for light tests and environment configuration (especially for GPU-accelerated codes). See the section Interactive jobs.

  • using a Slurm script: submit your script to the scheduler, which will run it when the resources are available. This method is well-suited for production runs.

Slurm is configured with a fairshare policy among users: the more resources you have requested in the past days, the lower the priority of your jobs will be when the scheduler has several jobs to handle at the same time.

SLURM script

Using a submission script is the typical way of creating jobs. In a SLURM script, you have to describe:

  • the resources needed by your job (partition, number of nodes and tasks, walltime, ...); these resources must be specified with #SBATCH directives;
  • the batch environment;
  • the command(s) to execute.

The batch environment is set by loading modules (see Module command) and setting the proper bash variables (PATH, OMP_NUM_THREADS, etc.).
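
As an illustration, here is a minimal sketch of such a submission script; the partition, account, module and program names are placeholders to adapt to your case:

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=<PartitionName>
#SBATCH --account=<MyProject>
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=01:00:00

# set the batch environment
module purge
module load <myModule>
export OMP_NUM_THREADS=1

# run the code
srun ./my_program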

SLURM partitions

SLURM directives

You describe the resources in the submission script using sbatch directives (script lines beginning with #SBATCH). These options can also be used directly on the sbatch command line instead of being listed in the script.

Important

The #SBATCH directives must appear at the top of the submission file, before any other line except the very first one, which should be the shebang (e.g. #!/bin/bash). See SLURM examples.

SBATCH directives to define resources

partition

You must specify the partition name:

#SBATCH --partition=<PartitionName>

with <PartitionName> taken from the list of partition names (see SLURM partitions).

nodes

Number of nodes:

#SBATCH --nodes=<nnodes>

ntasks

Number of tasks (MPI processes):

#SBATCH --ntasks=<ntasks>

ntasks-per-node

Number of tasks (MPI processes) per node:

#SBATCH --ntasks-per-node=<ntpn>

cpus-per-task

Number of threads per task (e.g. OpenMP threads per MPI process):

#SBATCH --cpus-per-task=<ntpt>

gres=gpu

Number of GPUs per node:

#SBATCH --gres=gpu:<ngpus>

mem

Memory per node:

#SBATCH --mem=<memory>

Default memory is 4 GB per core.

time

You must specify the walltime for your job. If your job is still running after this duration, it will be killed:

#SBATCH --time=<hh:mm:ss>

account

You must specify the account (or project) name for your job:

#SBATCH --account=<name>
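
As an example, the following directives (with placeholder partition and account names) would request 2 nodes with 4 MPI tasks per node and 8 OpenMP threads per task, for a walltime of 2 hours:

#SBATCH --partition=<PartitionName>
#SBATCH --account=<MyProject>
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00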

SBATCH additional directives

job-name

Specify the job's name:

#SBATCH --job-name=<jobName>

output

Specify the standard output (stdout) for your job:

#SBATCH --output=<outputFile>

By default, a slurm-<jobid>.out file is created, where jobid is the unique identifier assigned to the job by Slurm.

error

Specify the error output (stderr) for your job:

#SBATCH --error=<errorFile>

By default (when this option is not set), the error output is written to the same slurm-<jobid>.out file as the standard output.

mail-user

Set an email address:

#SBATCH --mail-user=<emailAddress>

mail-type

To be notified by email when a given event occurs:

#SBATCH --mail-type=<arguments>

Arguments for the --mail-type option are:

  • BEGIN: send an email when the job starts
  • END: send an email when the job ends
  • FAIL: send an email if the job fails
  • ALL: equivalent to BEGIN, END, FAIL.

export

Control the export of user environment variables from the submission environment to the batch environment:

  • By default, all user environment variables are exported (--export=ALL).
  • To avoid dependencies and inconsistencies between the submission environment and the batch execution environment, it is highly recommended to disable this behaviour. To avoid exporting the environment variables present at submission time to the job's environment:

    #SBATCH --export=NONE
    
  • To explicitly select which variables from the caller's environment are exported to the job environment:

    #SBATCH --export=VAR1,VAR2
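
As an illustration, these additional directives can be combined as follows (the job name, file names and email address are placeholders; %j is expanded by Slurm to the jobid):

#SBATCH --job-name=my_job
#SBATCH --output=my_job_%j.out
#SBATCH --error=my_job_%j.err
#SBATCH --mail-user=<emailAddress>
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --export=NONE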
    

Submit and monitor jobs

submit job

You have to submit your script (e.g. slurm_job.sh) with the sbatch command:

$ sbatch slurm_job.sh 
Submitted batch job 755

sbatch responds with the jobid assigned to the job; in this example, the jobid is 755. The jobid is a unique identifier used by many Slurm commands.
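
If you need the jobid in a shell script (for instance to chain jobs), the --parsable option makes sbatch print only the jobid:

$ jobid=$(sbatch --parsable slurm_job.sh)
$ echo $jobid
755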

monitor job

The squeue command shows the list of jobs:

$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               756  cpu_dist singular username PD       0:00      4 (None)
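
To list only your own jobs, you can filter by user name:

$ squeue -u $USER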

cancel job

The scancel command cancels a job.

To cancel the job job0 with jobid 757 (obtained through squeue), you would use:

$ scancel 757
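
scancel can also select jobs by name or by user, for instance:

$ scancel --name=job0
$ scancel -u $USER

The first command cancels the jobs named job0, the second cancels all of your jobs.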

interactive jobs

  • Example 1: access one node interactively for 30 minutes.
$ srun --nodes=1 --time=00:30:00 -p cpu_seq --account=YourAccountProject --pty /bin/bash
[user@node001 ~]$ hostname
node001
  • Example 2: access a node with one GPU for 30 minutes.
$ srun --nodes=1 --time=00:30:00 -p gpu --gres=gpu:1 --account=YourAccountProject --pty /bin/bash
[user@cholesky-gpu01 ~]$ hostname
cholesky-gpu01

job arrays

TODO
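
Pending the full documentation, here is a minimal sketch of a job array running 10 independent instances of the same program (partition, account and program names are placeholders):

#!/bin/bash
#SBATCH --job-name=my_array
#SBATCH --partition=<PartitionName>
#SBATCH --account=<MyProject>
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --array=1-10

# SLURM_ARRAY_TASK_ID takes the values 1 to 10, one per array task
./my_program input_${SLURM_ARRAY_TASK_ID}.dat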

chain jobs

TODO
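
Pending the full documentation, jobs can be chained with the --dependency option of sbatch; a minimal sketch (step1.sh and step2.sh are placeholder scripts):

$ jobid1=$(sbatch --parsable step1.sh)
$ sbatch --dependency=afterok:${jobid1} step2.sh

Here step2.sh will only start after step1.sh has completed successfully (afterok).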

Accounting

Use the sacct command to get information on your finished jobs.
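
For example, to display the elapsed time and final state of job 755:

$ sacct -j 755 --format=JobID,JobName,Partition,Elapsed,State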

Note

On Cholesky, the accounting information is restricted to your jobs only.