Scheduling Batch Scripts (Example)

sbatch scripts are the conventional way to schedule work on the supercomputer.

Below is an example sbatch script, which should be saved as the file myjob.sh.

This script performs the simple task of generating a file of sorted, uniformly distributed random numbers with the shell and plotting it with python. The script owner is notified by e-mail when the job starts, stops, or fails.

#!/bin/bash

#SBATCH -N 1            # number of nodes
#SBATCH -c 1            # number of cores
#SBATCH -t 0-01:00:00   # time in d-hh:mm:ss
#SBATCH -p general      # partition
#SBATCH -q public       # QOS
#SBATCH -o slurm.%j.out # file to save job's STDOUT (%j = JobId)
#SBATCH -e slurm.%j.err # file to save job's STDERR (%j = JobId)
#SBATCH --mail-type=ALL # Send an e-mail when a job starts, stops, or fails
#SBATCH --export=NONE   # Purge the job-submitting shell environment

# Load required modules for job's environment
module load mamba/latest
# Using python, so source activate an appropriate environment
source activate scicomp

##
# Generate 1,000,000 random numbers in bash,
#   then store sorted results
#   in `Distribution.txt`
##
for i in $(seq 1 1e6); do
  printf "%d\n" $RANDOM
done | sort -n > Distribution.txt

# Plot Histogram using python and a heredoc
python << EOF
import pandas as pd, seaborn as sns
sns.mpl.use('Agg')
sns.set(color_codes=True)
df = pd.read_csv('Distribution.txt',header=None,names=['rand'])
sns.distplot(df,kde=False,rug=True)
sns.mpl.pyplot.xlim(0,df['rand'].max())
sns.mpl.pyplot.xlabel('Integer Result')
sns.mpl.pyplot.ylabel('Count')
sns.mpl.pyplot.title('Sampled Uniform Distribution')
sns.mpl.pyplot.savefig('Histogram.png')
EOF

Submit this script from the supercomputer’s command line with: sbatch myjob.sh.

This script has two main sections: a header of lines prefixed with the #SBATCH pattern (which specifies default scheduler options) and the main script.

Every option that follows the pattern #SBATCH is read by the sbatch program and treated as a default request for the job when the script is passed to the scheduler via sbatch myjob.sh.

Any options passed to sbatch at execution time will override the defaults specified in the script. For example, sbatch -c 2 -t 5 -q debug myjob.sh would request two cores for five minutes in the debug QOS.
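As a rough illustration of how header defaults and command-line overrides relate, the sketch below writes a tiny stand-in script (demo_job.sh, a hypothetical name used only here), lists its #SBATCH directives with grep, and then issues the override command from the text. The sbatch call is guarded so the snippet can also be tried on a machine without Slurm.

```shell
#!/bin/bash
# Write a minimal stand-in batch script (demo_job.sh is a hypothetical name).
cat > demo_job.sh << 'EOF'
#!/bin/bash
#SBATCH -c 1            # default: one core
#SBATCH -t 0-01:00:00   # default: one hour
#SBATCH -q public       # default QOS
echo "hello from the job"
EOF

# List the defaults that the script header requests.
defaults=$(grep '^#SBATCH' demo_job.sh)
echo "$defaults"

# Command-line flags take precedence over the header, so this submission
# requests two cores for five minutes in the debug QOS instead.
if command -v sbatch > /dev/null 2>&1; then
  sbatch -c 2 -t 5 -q debug demo_job.sh
else
  echo "sbatch not found; run this on the supercomputer to submit"
fi
```

The header in demo_job.sh is left untouched by the override; the flags apply only to that one submission.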

The actual script contents (interpreted by the program named on the shebang line, #!/bin/bash) then follow (lines 13-38). Some of the lines are highlighted here:

  • Line 14: module load mamba/latest

    • This loads a python environment manager, mamba, which is used to create application-specific environments or activate existing ones. In this case, we’re going to activate an admin-provided environment (Line 16).

  • Line 16: source activate scicomp

    • This activates a base scientific computing python environment.

  • Lines 23-25

    • This loop generates one million uniformly-distributed random numbers and saves them to a file.

  • Lines 27-38

    • This section of the code runs a python script (provided via a heredoc for in-line visibility on this documentation page).
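To check the number-generation step before submitting a job, a scaled-down version of the loop can be run directly in a shell. This sketch assumes bash (for $RANDOM) and, purely for illustration, uses ten values and a hypothetical file name, Sample.txt, in place of one million values in Distribution.txt.

```shell
#!/bin/bash
# Generate 10 random integers (bash's $RANDOM yields values from 0 to 32767)
# and store them sorted numerically, one per line.
for i in $(seq 1 10); do
  printf "%d\n" "$RANDOM"
done | sort -n > Sample.txt

# Count the lines and confirm the file is in numeric order.
count=$(wc -l < Sample.txt | tr -d ' ')
if sort -n -c Sample.txt; then
  sorted=yes
else
  sorted=no
fi
echo "lines: $count, sorted: $sorted"
```

Piping the whole loop into sort, as the job script does, means the file is written once, after all values have been generated.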

Scheduling the script for execution (sbatch myjob.sh) results in the scheduler assigning a unique integer, a job id, to the work. The status of the job can be viewed from the command line via myjobs. Once the job starts, its status will change from PENDING to RUNNING, and when the job completes it will no longer be visible in the queue. Instead, the filesystem will contain slurm.%j.out and slurm.%j.err (with %j replaced by the job id, not the literal characters) and the python-generated plot Histogram.png (shown below).
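Because %j is replaced by the job id, the output file names can be derived from the message sbatch prints on submission. The sketch below parses a sample message rather than a live submission: the job id 12345 is made up, and the message format (Submitted batch job followed by the id) is assumed to be the usual single-cluster sbatch output.

```shell
#!/bin/bash
# Sample of the line sbatch prints on success (12345 is a made-up job id).
# On the supercomputer this would be: submit_msg=$(sbatch myjob.sh)
submit_msg="Submitted batch job 12345"

# The job id is the last whitespace-separated field of the message.
jobid=${submit_msg##* }

# Substitute the job id for %j to get the file names from the script header.
out_file="slurm.${jobid}.out"
err_file="slurm.${jobid}.err"
echo "STDOUT will be saved to $out_file"
echo "STDERR will be saved to $err_file"
```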

 


How do I run this script?

Assuming that the above has been saved as the file myjob.sh, and you intend to run it on the conventional compute nodes, you can submit this script with the following command:

sbatch myjob.sh

The job will be run on the first available node. N.B.: flags passed to sbatch will override the defaults in the header of myjob.sh.


Problems with batch files created under Windows

If you create an sbatch script on a Windows system and then move it to a Linux-based supercomputer, you may see the following error when you try to run it:

sbatch: error: Batch script contains DOS line breaks (\r\n)
sbatch: error: instead of expected UNIX line breaks (\n).

This is because of a difference between the way that Windows and Unix systems define the end of a line in a text file.

If you see this error, the dos2unix utility can convert the script to the proper format.
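A minimal sketch of detecting and fixing the line endings follows. It first creates a small file with DOS line breaks (win_job.sh, a hypothetical name) so that it can be tried anywhere. The fallback to tr when dos2unix is not installed is an assumption of this example, not something the error message suggests.

```shell
#!/bin/bash
# Create a small script with DOS (\r\n) line breaks to experiment on.
printf '#!/bin/bash\r\necho hello\r\n' > win_job.sh

# Count the lines that end in a carriage return before converting.
cr=$(printf '\r')
before=$(grep -c "${cr}\$" win_job.sh || true)

# Convert in place: prefer dos2unix, otherwise delete every \r with tr.
if command -v dos2unix > /dev/null 2>&1; then
  dos2unix win_job.sh
else
  tr -d '\r' < win_job.sh > win_job.tmp && mv win_job.tmp win_job.sh
fi

# Count again; zero means the file now uses UNIX line breaks.
after=$(grep -c "${cr}\$" win_job.sh || true)
echo "lines with DOS line breaks: before=$before after=$after"
```

For a real batch script, running dos2unix myjob.sh once before sbatch myjob.sh should be enough.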


Instructions for adding files to the supercomputer can be found here: https://asurc.atlassian.net/wiki/spaces/RC/pages/1643839509

Additional SBATCH options

The manual page for sbatch documents many flags. Below are some commonly used advanced flags:

  1. To schedule a job array: -a, --array=<indices>

  2. To request all cores and GPUs on a node: --exclusive

  3. To request all memory (RAM) on a node: --mem=0 (or a specific amount, e.g., --mem=4G for 4 GiB)

  4. To constrain work to nodes with specific features: -C, --constraint=<list>

  5. To request a GPU: -G, --gpus=[type:]<number> (e.g., --gpus=a100:1)
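As an illustration of the first flag, a job array runs the same script once per index, and Slurm sets the SLURM_ARRAY_TASK_ID environment variable in each task. The sketch below rests on several assumptions: the index range 0-3, the ten-minute time limit, and the input file naming (input_<index>.txt) are hypothetical, while the partition and QOS are copied from the example script above. The fallback to 0 lets the script be tried outside of Slurm.

```shell
#!/bin/bash
#SBATCH -N 1                 # number of nodes
#SBATCH -c 1                 # number of cores
#SBATCH -t 0-00:10:00        # time in d-hh:mm:ss
#SBATCH -p general           # partition
#SBATCH -q public            # QOS
#SBATCH -a 0-3               # job array with indices 0, 1, 2, 3
#SBATCH -o slurm.%A_%a.out   # %A = array job id, %a = array index

# Slurm sets SLURM_ARRAY_TASK_ID per task; default to 0 when run by hand.
task_id=${SLURM_ARRAY_TASK_ID:-0}

# Each task picks its own (hypothetical) input file based on its index.
input_file="input_${task_id}.txt"
echo "Task $task_id would process $input_file"
```

Submitted with sbatch, this would be scheduled as four independent tasks, each writing to its own output file.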