sbatch scripts are the conventional way to submit non-interactive jobs to the supercomputer's scheduler.

Below is an example of an sbatch script, which should be saved as the file myjob.sh.

This script performs the simple task of generating a file of sorted, uniformly distributed random numbers with the shell, plotting it with python, and then e-mailing the plot to the script owner.

Code Block
#!/bin/bash
 
#SBATCH -N 1            # number of nodes
#SBATCH -c 1            # number of cores
#SBATCH -t 0-01:00:00   # time in d-hh:mm:ss
#SBATCH -p general      # partition 
#SBATCH -q public       # QOS
#SBATCH -o slurm.%j.out # file to save job's STDOUT (%j = JobId)
#SBATCH -e slurm.%j.err # file to save job's STDERR (%j = JobId)
#SBATCH --mail-type=ALL # Send an e-mail when a job starts, stops, or fails
#SBATCH --mail-user=%u@asu.edu # Mail-to address
#SBATCH --export=NONE   # Purge the job-submitting shell environment
# Load required modules for job's environment
module load mamba/latest
# Using python, so source activate an appropriate environment
source activate scicomp

##
# Generate 1,000,000 random numbers in bash,
#   then store sorted results 
#   in `Distribution.txt`
##
for i in $(seq 1 1e6); do
  printf "%d\n" $RANDOM
done | sort -n > Distribution.txt
# Plot Histogram using python and a heredoc
python << EOF
import pandas as pd, seaborn as sns
sns.mpl.use('Agg')
sns.set(color_codes=True)
df = pd.read_csv('Distribution.txt',header=None,names=['rand'])
sns.distplot(df,kde=False,rug=True)
sns.mpl.pyplot.xlim(0,df['rand'].max())
sns.mpl.pyplot.xlabel('Integer Result')
sns.mpl.pyplot.ylabel('Count')
sns.mpl.pyplot.title('Sampled Uniform Distribution')
sns.mpl.pyplot.savefig('Histogram.png')
EOF
# E-mail diagnostic results to yourself using mailserver and a heredoc
mail -a 'Histogram.png' -s "Histogram Plot" ${USER}@asu.edu << EOF
Hello me,
See the attached Histogram.png for a visualization of the computed results.
EOF

This script uses the #SBATCH flag to specify a few key options:

  • The number of cores the job will use:

    • #SBATCH -c 1

  • The runtime of the job in Days-Hours:Minutes:Seconds:

    • #SBATCH -t 0-01:00:00

  • A file based on the jobid %j where the normal output of the program (STDOUT) should be saved:

    • #SBATCH -o slurm.%j.out

  • A file based on the jobid %j where the error output of the program (STDERR) should be saved:

    • #SBATCH -e slurm.%j.err

  • That email notifications should be sent out when the job starts, ends, or when it fails:

    • #SBATCH --mail-type=ALL

  • The address where email should be sent (%u is automatically replaced with the submitter’s username, which is by default the user’s ASURITE):

    • #SBATCH --mail-user=%u@asu.edu

  • A purge of the shell environment otherwise inherited from the job-submission shell

    • #SBATCH --export=NONE

  • The module that provides the python environment manager used later in the script:

    • module load mamba/latest

  • The command that activates the admin-provided scicomp python environment:

    • source activate scicomp

...

Tip

Submit this script from the supercomputer’s command-line with: sbatch myjob.sh.

This script has two main sections: a header of #SBATCH directives, which specify default scheduler options, followed by the main script.

All options that follow the pattern #SBATCH are interpretable by the sbatch program. The options that are specified are interpreted as default requests for the job when passed to the scheduler via sbatch myjob.sh.

Any options passed to sbatch at execution time will override the defaults specified in the script. For example, sbatch -c 2 -t 5 -q debug myjob.sh would request two cores for five minutes in the debug QOS.
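
For instance, to reuse the same script but request more cores and a longer runtime without editing the header, the overriding flags can be given directly on the command line (the values shown are only illustrative):

Code Block
sbatch -c 4 -t 0-04:00:00 myjob.sh   # command-line flags take precedence over the #SBATCH defaults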

The actual script contents (interpreted by the path provided on the shebang line, #!/bin/bash) then follow (lines 13-43). Some of the lines are highlighted here:

  • Line 14: module load mamba/latest

    • This loads a python environment manager, mamba, which is used to create application-specific environments or activate existing ones. In this case, we’re going to activate an admin-provided environment (Line 16); a sketch of creating a personal environment follows this list.

  • Line 16: source activate scicomp

    • This activates an admin-provided base scientific computing python environment.

  • Lines 23-25

    • This loop generates one million uniformly-distributed random numbers and saves them to a file.

  • Lines 27-38

    • This section of the code runs a python script (provided via a heredoc for in-line visibility on this documentation page).
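
If the admin-provided scicomp environment lacks a package your job needs, a rough sketch of building and activating a personal environment with mamba might look like the following (the environment name myenv and the package list are only examples):

Code Block
module load mamba/latest
# Create a one-off environment; the name and packages here are illustrative
mamba create -y -n myenv python pandas seaborn
# Activate it in the job script in place of scicomp
source activate myenv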

Scheduling the script for execution (sbatch myjob.sh) results in the scheduler assigning a unique integer, a job id, to the work. The status of the job can be viewed from the command line via myjobs. Once the job is running, the status will change from PENDING to RUNNING, and when the job completes it will no longer be visible in the queue. Instead, the filesystem will contain slurm.%j.out and slurm.%j.err (not literally %j; this is a placeholder for the job id) and the python-generated plot Histogram.png (shown below).
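
For reference, a typical submit-and-monitor sequence might look like the following (myjobs is the helper mentioned above; squeue is the standard Slurm equivalent):

Code Block
sbatch myjob.sh     # prints the assigned job id
myjobs              # show the status (PENDING, RUNNING, ...) of your jobs
squeue -u $USER     # the equivalent stock Slurm command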

...

...

How do I run this script?

Assuming that the above has been saved as the file myjob.sh, and you intend to run it on the conventional compute nodes, you can submit this script with the following command:

Code Block
sbatch myjob.sh

The job will be run on the first available node. N.B.: flags passed to sbatch will override the defaults in the header of myjob.sh.

...

Problems with batch files created under Windows

If you are creating an sbatch script on a Windows system and then move it to a Linux-based supercomputer, you may see the following error when you try to run it:

...

If you see this error, the dos2unix utility can convert the script to the proper format.

Code Block
dos2unix myjob.sh
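
If you are unsure whether a script still has Windows-style line endings, the file utility will typically report them before (and confirm their removal after) conversion:

Code Block
file myjob.sh   # "with CRLF line terminators" in the output indicates a Windows-format file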


Instructions for adding files to the supercomputer can be found here: Transferring Files to and from the Supercomputer

Additional SBATCH options

The below options are advanced and not needed in the majority of cases. They are provided here for reference.

--exclusive

This requests from slurm that no other jobs be running on the same node as your job allocation. This is generally not recommended unless you have a specific need; very few HPC workloads benefit from --exclusive.

Even though it sounds unfavorable to be “sharing” a node with another user, other users' jobs will not affect the performance of your job. At the start of each job, the resources for that job (on every node assigned to the job) are “boxed in”. Your job is assigned specific memory ranges and cores; it cannot use any resources outside of what it was assigned, and other users cannot use the resources assigned to your job.

For example, if a user requested a single core and then spins up 1,000 threads, those 1,000 threads will all be fighting over that single core, leaving your resources unaffected.

Compute nodes are not over-provisioned. Unlike a virtual environment, where you can create more compute cores than you physically have, we assign actual physical resources and do not exceed them. If a system has 28 cores and 128 GiB of RAM, once the jobs assigned to it total either of those two numbers, no more jobs will be assigned to that node. By doing so we ensure we don't overload any nodes and provide users with the full capabilities of the resources.

If the --exclusive option is used, all cores and memory on that node will be assigned to the user and be counted in fairshare for the full amount.
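
If a job genuinely does need a whole node to itself, the flag is added to the header like any other option, for example:

Code Block
#SBATCH --exclusive   # request the entire node; all of its cores and memory count against fairshare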

-C

This option constrains your resource allocation to nodes that have particular features, as designated by the Research Computing team. It is useful when your job has a rigid hardware requirement, such as a specific GPU or CPU.

Please note that not all partitions have all features available. Specifying a partition and requesting a feature not available in that partition will cause your job to fail.
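
As a sketch, constraining a job to nodes with a particular feature looks like the following; the feature name a100 is only an example, and the features actually available are site- and partition-specific:

Code Block
#SBATCH -C a100   # only run on nodes tagged with the (example) feature "a100"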

Forcing specific hardware types may also delay your job from starting by a considerable amount, as the pool of resources is much smaller when requesting specific features.

The manual page for sbatch contains many more flags. Below are some that are commonly utilized but are advanced:

  1. To schedule a job array: -a, --array=<indices> (a minimal array-job sketch follows this list)

  2. To request all cores and GPUs on a node: --exclusive

  3. To request all memory (RAM) on a node: --mem=0 (or a specific amount, e.g., --mem=4G for 4 GiB)

  4. To constrain work to nodes with specific features: -C, --constraint=<list>

  5. To request a GPU: -G, --gpus=[type:]<number> (e.g., --gpus=a100:1)
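
As an illustration of the array flag above, a minimal array-job header might look like the following; the index range and the use of $SLURM_ARRAY_TASK_ID are only examples:

Code Block
#!/bin/bash
#SBATCH -c 1
#SBATCH -t 0-00:10:00
#SBATCH -a 1-10              # run ten independent tasks, indices 1..10
#SBATCH -o slurm.%A_%a.out   # %A = array job id, %a = array task index

# Each task works on its own (hypothetical) input file, e.g. input_1.txt ... input_10.txt
echo "Processing input_${SLURM_ARRAY_TASK_ID}.txt"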