Scheduling Batch Scripts (Example)
sbatch scripts are the conventional way to schedule work on the supercomputer.
Below is an example of an sbatch script, which should be saved as the file myjob.sh.
This script performs the simple task of generating a file of sorted, uniformly distributed random numbers with the shell and plotting it with python. The scheduler also e-mails the script owner when the job starts, ends, or fails.
#!/bin/bash

#SBATCH -N 1 # number of nodes
#SBATCH -c 1 # number of cores
#SBATCH -t 0-01:00:00 # time in d-hh:mm:ss
#SBATCH -p general # partition
#SBATCH -q public # QOS
#SBATCH -o slurm.%j.out # file to save job's STDOUT (%j = JobId)
#SBATCH -e slurm.%j.err # file to save job's STDERR (%j = JobId)
#SBATCH --mail-type=ALL # Send an e-mail when a job starts, stops, or fails
#SBATCH --export=NONE # Purge the job-submitting shell environment

# Load required modules for job's environment
module load mamba/latest
# Using python, so source activate an appropriate environment
source activate scicomp

##
# Generate 1,000,000 random numbers in bash,
# then store sorted results
# in `Distribution.txt`
##
for i in $(seq 1 1e6); do
  printf "%d\n" $RANDOM
done | sort -n > Distribution.txt

# Plot Histogram using python and a heredoc
python << EOF
import pandas as pd, seaborn as sns
sns.mpl.use('Agg')
sns.set(color_codes=True)
df = pd.read_csv('Distribution.txt',header=None,names=['rand'])
sns.distplot(df,kde=False,rug=True)
sns.mpl.pyplot.xlim(0,df['rand'].max())
sns.mpl.pyplot.xlabel('Integer Result')
sns.mpl.pyplot.ylabel('Count')
sns.mpl.pyplot.title('Sampled Uniform Distribution')
sns.mpl.pyplot.savefig('Histogram.png')
EOF
Submit this script from the supercomputer’s command line with: sbatch myjob.sh
This script has two main sections: a header of lines prefixed with #SBATCH (which specify default scheduler options) and the main script.
All options that follow the #SBATCH prefix are interpreted by the sbatch program. They act as default requests for the job when the script is passed to the scheduler via sbatch myjob.sh.
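The dual role of the header lines can be checked without submitting anything. The sketch below is illustrative only: it writes a small stand-in script (its contents are hypothetical, not the example above) to a temporary file, runs it directly with bash to show that the #SBATCH lines are treated as ordinary comments, and then counts the directives that appear before the first command. According to the sbatch manual page, sbatch stops processing #SBATCH directives once it reaches the first non-comment, non-blank line, which is why the header belongs at the top of the script.

```shell
# Write a tiny stand-in batch script (hypothetical contents) to a temporary file.
demo_script=$(mktemp)
cat > "$demo_script" << 'EOF'
#!/bin/bash
#SBATCH -c 1            # read by sbatch, ignored by bash
#SBATCH -t 0-00:05:00   # read by sbatch, ignored by bash
echo "script body ran"
#SBATCH -p general      # too late: this follows the first command
EOF

# bash treats every #SBATCH line as a comment, so only the echo executes.
body_output=$(bash "$demo_script")
echo "$body_output"

# Count the directives that appear before the first command (the first two here).
header_count=$(awk '/^#SBATCH/ {n++} !/^#/ && NF {exit} END {print n+0}' "$demo_script")
echo "header directives: $header_count"

rm -f "$demo_script"
```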
Any options passed to sbatch on the command line at submission time override the defaults specified in the script. For example, sbatch -c 2 -t 5 -q debug myjob.sh would request two cores for five minutes in the debug QOS.
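As a rough sketch of the override behavior, the snippet below assembles the same command line and, only when sbatch and myjob.sh are actually present, passes it through sbatch's --test-only flag, which validates the request and reports an estimated start time without submitting a job. The variable names are illustrative, and whether a debug QOS exists depends on the site.

```shell
# Command-line flags take precedence over the #SBATCH header defaults.
overrides="-c 2 -t 5 -q debug"    # two cores, five minutes, debug QOS
command_line="sbatch $overrides myjob.sh"
echo "$command_line"

# Only contact the scheduler when Slurm and the script are available.
if command -v sbatch >/dev/null 2>&1 && [ -f myjob.sh ]; then
    # --test-only validates the request without submitting a job.
    # $overrides is left unquoted on purpose so it splits into separate flags.
    sbatch --test-only $overrides myjob.sh
fi
```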
The actual script contents (interpreted by the program named on the shebang line, #!/bin/bash) then follow (lines 13-38). Some of the lines are highlighted here:
Line 14:
module load mamba/latest
This loads a python environment manager, mamba, which is used to create application-specific environments or activate existing ones. In this case, we’re going to activate an admin-provided environment (Line 16).
Line 16:
source activate scicomp
This activates a base scientific computing python environment.
Lines 23-25:
This loop generates one million uniformly distributed random integers (bash's $RANDOM yields values from 0 to 32767), sorts them, and saves them to Distribution.txt.
Lines 27-38:
This section of the code runs a python script (provided via a heredoc for in-line visibility on this documentation page) that plots a histogram of the numbers and saves it as Histogram.png.
Scheduling the script for execution (sbatch myjob.sh) results in the scheduler assigning a unique integer, a job id, to the work. The status of the job can be viewed on the command line via myjobs. Once the job is running, the status will change from PENDING to RUNNING, and when the job completes it will no longer be visible in the queue. Instead, the filesystem will contain slurm.%j.out and slurm.%j.err (not literally %j, which is a placeholder for the job id) and the python-generated plot Histogram.png (shown below).
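The relationship between the job id and the output file names can be sketched as follows. This is a minimal illustration, not a site-specific recipe: it uses sbatch's --parsable flag to capture the job id when Slurm is available, falls back to a made-up id (123456) otherwise, and uses the standard Slurm squeue command rather than myjobs to inspect the queue.

```shell
# Submit and capture the job id when Slurm is available; otherwise use a
# placeholder id (123456 is made up) so the rest of the sketch still runs.
if command -v sbatch >/dev/null 2>&1 && [ -f myjob.sh ]; then
    job_id=$(sbatch --parsable myjob.sh)
    job_id=${job_id%%;*}    # --parsable may append ";cluster" on multi-cluster setups
else
    job_id=123456
fi

# %j in the -o/-e header options expands to the job id.
stdout_file="slurm.${job_id}.out"
stderr_file="slurm.${job_id}.err"
echo "expect $stdout_file and $stderr_file once the job finishes"

# squeue is Slurm's standard queue viewer; a finished job no longer appears in it.
if command -v squeue >/dev/null 2>&1; then
    squeue -j "$job_id"
fi
```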
How do I run this script?
Assuming that the above has been saved as the file myjob.sh, and you intend to run it on the conventional compute nodes, you can submit this script with the following command:
sbatch myjob.sh
The job will be run on the first available node. N.B.: flags passed to sbatch will override the defaults in the header of myjob.sh.
Problems with batch files created under Windows
If you are creating an sbatch script on a Windows system which you then move to a Linux-based supercomputer, you may see the following error when you try to run it:
sbatch: error: Batch script contains DOS line breaks (\r\n)
sbatch: error: instead of expected UNIX line breaks (\n).
This is because of a difference between the way that Windows and Unix systems define the end of a line in a text file.
If you see this error, the dos2unix utility can convert the script to the proper format.
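A minimal sketch of detecting and fixing the problem is shown below. It builds a throwaway script with Windows (CRLF) line endings, counts the affected lines, and converts the file with dos2unix when that utility is installed. The tr fallback is an assumption added for systems without dos2unix and is not part of the original instructions; note that it strips every carriage return in the file.

```shell
# Create a stand-in script with Windows (CRLF) line endings in a temporary file.
win_script=$(mktemp)
printf '#!/bin/bash\r\n#SBATCH -c 1\r\necho hello\r\n' > "$win_script"

# Count lines that end in a carriage return.
cr=$(printf '\r')
crlf_before=$(grep -c "${cr}\$" "$win_script")

# Convert in place with dos2unix when present, otherwise strip the CRs with tr.
if command -v dos2unix >/dev/null 2>&1; then
    dos2unix "$win_script" 2>/dev/null
else
    tr -d '\r' < "$win_script" > "$win_script.tmp" && mv "$win_script.tmp" "$win_script"
fi

# grep -c prints 0 when no line matches.
crlf_after=$(grep -c "${cr}\$" "$win_script")
echo "CRLF lines before: $crlf_before, after: $crlf_after"

rm -f "$win_script"
```

For a real script the conversion step alone is enough, for example dos2unix myjob.sh.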
Instructions for adding files to the supercomputer can be found here: https://asurc.atlassian.net/wiki/spaces/RC/pages/1643839509
Additional SBATCH options
The manual page for sbatch lists many flags. Below are some commonly used advanced flags:
To schedule a job array:
-a, --array=<indices>
To request all cores and GPUs on a node:
--exclusive
To request all memory (RAM) on a node:
--mem=0
(or a specific amount, e.g., --mem=4G for 4 GiB)
To constrain work to nodes with specific features:
-C, --constraint=<list>
To request a GPU:
-G, --gpus=[type:]<number>
(e.g., --gpus=a100:1)
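As an illustration of the job array flag, the sketch below shows one possible array script. The index range, time limit, and the input_N.txt naming scheme are hypothetical, and partition and QOS options are omitted. Slurm sets SLURM_ARRAY_TASK_ID for each array task, and the %A and %a filename patterns expand to the array's job id and the task index; the default of 1 is only there so the script can be dry-run outside the scheduler.

```shell
#!/bin/bash
#SBATCH -c 1                     # one core per array task
#SBATCH -t 0-00:10:00            # ten minutes per array task
#SBATCH --array=1-4              # four array tasks, indices 1..4
#SBATCH -o slurm.%A_%a.out       # %A = array job id, %a = task index

# Slurm sets SLURM_ARRAY_TASK_ID in each task; default to 1 so the script
# can also be dry-run outside the scheduler.
task_id=${SLURM_ARRAY_TASK_ID:-1}

# Hypothetical per-task input naming scheme.
input_file="input_${task_id}.txt"
echo "task $task_id would process $input_file"
```

Each task runs the same script with a different index, so the index is the natural key for selecting that task's input or parameters.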