Creating a job array SBATCH script

This document provides a minimal, viable SBATCH script that uses job arrays to dispatch numerous sub-jobs from a single SBATCH script and input file. Job arrays can help parallelize a wide variety of job formats: this is merely one method of parallelizing work whose input is highly tabular.

This specific job uses a mapping file, input_data, which correlates each sub-job to its respective input parameters.

This tutorial will use three (3) separate files in your $HOME directory:

  1. jobarray.sh (the SBATCH script, which defines the resource allocation and invokes the sub-job script)

  2. input_data (a simple, text-only input file in which each line translates to one sub-job)

  3. the_work.sh (a bash shell script which defines all the actual work needing to be done)

jobarray.sh

This is the script that is submitted via sbatch jobarray.sh. Its successful execution will spawn 20 separate instances of the_work.sh, with at most 10 running concurrently.

#!/bin/bash
#SBATCH --job-name=array
#SBATCH --array=1-20%10      #spawn 20 sub-jobs total, at most 10 running at once
#SBATCH --time=0-00:00:10    #upper bound time limit for each sub-job to finish
#SBATCH --partition=general
#SBATCH --qos=public
#SBATCH --ntasks=1           #each sub-job runs a single task

srun -n 1 ./the_work.sh $SLURM_ARRAY_TASK_ID
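Once submitted, Slurm reports the whole array under one job ID with a per-task suffix. A few commands for inspecting it (here <jobid> is a placeholder for whatever ID sbatch returns):

sbatch jobarray.sh     # submit the array; prints the job ID
squeue -u $USER        # array tasks appear as <jobid>_1, <jobid>_2, ...
scancel <jobid>_7      # cancel only task 7, leaving the rest running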

Note: job arrays do not, and are not meant to, guarantee that work is completed in a reliable order. The key assumption behind job arrays is that all sub-jobs are fully independent of each other, so the order in which they complete does not matter.

If subsequent processing steps do need to occur in order, consider job dependencies.
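For example, a follow-up job can be made to wait until every array task has finished successfully; collect_results.sh below is a hypothetical post-processing script:

# Submit the array and capture its job ID
array_id=$(sbatch --parsable jobarray.sh)

# This job starts only after all array tasks exit successfully
sbatch --dependency=afterok:${array_id} collect_results.sh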

With --array=1-20%10 (the %10 suffix only limits how many run at once), we expect the compute work to be invoked as follows behind the scenes:

./the_work.sh 1
./the_work.sh 2
./the_work.sh 3
./the_work.sh 4
...
./the_work.sh 20

input_data

The following sample data shows latitude and longitude pairs, one per line. Based on this example, only lines 1-20 will be computed; lines 21-24 are skipped because they fall outside the --array= range.

25.78642, 20.40898
-74.72462, -160.2885
54.37842, 146.27341
-11.2249, -4.36196
30.70403, 97.13326
-54.3891, -88.66462
-63.55478, 136.06104
36.98927, -25.04504
42.28435, 125.48806
-39.07869, -58.36421
-39.10534, 162.02482
-12.08924, -117.58317
87.46025, 171.04233
-63.30626, 120.42349
62.53605, -92.54694
-83.28753, 117.9856
32.16439, -22.9092
81.5248, 151.89049
77.69849, 112.53154
-67.29104, -18.73863
61.78938, -120.69627
-37.00708, -140.53009
45.08802, 66.40128
-24.61395, 28.29682
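To preview which line a given task will receive, you can pull that line out of input_data yourself, for example line 3:

sed -n '3p' input_data    # prints the third line: 54.37842, 146.27341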

the_work.sh

 

This is the processing script that does the computational work of this job.

As we saw in jobarray.sh, this script receives a single parameter, $SLURM_ARRAY_TASK_ID, which the_work.sh interprets as $1.

Consider input_data, the comma-separated latitude/longitude coordinates. This script will (a minimal sketch follows the list):

  1. retrieve bash argument $1, which corresponds to $SLURM_ARRAY_TASK_ID (1-20).

  2. open input_data and extract exactly one line from that file, corresponding to $1.

  3. clean up the data, which in this example means simply removing the comma.

  4. create an array, coordinates, splitting the two values on a whitespace character.

  5. return the processed results, appending one line at a time to a new file, my.out

  6. finally, my.out will be produced, showing you the results of each job's work.
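Putting those steps together, a minimal sketch of the_work.sh could look like the following (the array name coordinates comes from the steps above; the exact sed call and output format are illustrative assumptions):

#!/bin/bash
# the_work.sh (sketch only); assumes input_data sits in the submission directory

TASK_ID=$1                                # 1. the $SLURM_ARRAY_TASK_ID passed in by jobarray.sh

line=$(sed -n "${TASK_ID}p" input_data)   # 2. extract exactly one line from input_data

line=${line//,/}                          # 3. clean up: remove the comma

coordinates=($line)                       # 4. split the two values on whitespace into an array

# 5. append this task's processed result to my.out
echo "${TASK_ID}: latitude=${coordinates[0]} longitude=${coordinates[1]}" >> my.out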

Note that the lines in my.out will not appear in sequential order, but they can be reordered in post-processing if necessary.
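If each line is prefixed with its task ID, as in the sketch above, a numeric sort is enough to restore sequential order:

sort -n my.out > my_sorted.out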

Remember, job arrays are in no way limited to having every sub-job's input squeezed onto a single line of one file.

It is also possible for you to have 20 consistently-named files, such as input1, input2, … input20, and to avoid parsing with sed altogether. It is also possible to have your input data map files and numbers together. The possibilities for correlating the incrementing number ($SLURM_ARRAY_TASK_ID) to arguments are limitless. Another such approach that could go inside the_work.sh:
python guess_country.py ${arr[0]} ${arr[1]} >> my.out
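And as a sketch of the per-file variant mentioned above (guess_country.py is the same example program; the file-naming scheme is the assumption), the_work.sh could instead read:

# Pick the input file matching this task: input1, input2, ... input20
INPUT_FILE="input${1}"

# Strip the comma and split the pair into an array, no sed needed
arr=($(tr -d ',' < "${INPUT_FILE}"))

python guess_country.py "${arr[0]}" "${arr[1]}" >> my.out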