Here are some real-world use cases of Slurm job arrays.

Case one: Bioinformatics Essential - Bulk BLAST Query

In bioinformatics research, “BLAST” is a tool for searching input DNA or RNA sequences (here, .fasta files, which are formatted text files) against genomic databases. This example runs blastx on a folder containing 100 fasta files and searches each file against four existing databases, so the job array launches 100 × 4 = 400 sub-jobs in total. The example covers all the steps needed, assuming that the mamba environment, the BLAST software suite, and the BLAST databases are already set up correctly.

  1. Design the workflow

The workflow has three parts: a manifest file that feeds the inputs (the location of each fasta file and the name or location of each database), an sbatch script that calls blastx and defines the job array, and a single command that submits this sbatch script.
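To make the design concrete, here is a minimal sketch of such an sbatch script, not a definitive implementation. It assumes the manifest holds one "<db> <fasta>" line per sub-job, as produced in the manifest step of this guide. The script name blast_array.sh, the resource requests, the output file naming, the tabular output format (-outfmt 6), and the helper names manifest_line and run_task are all illustrative assumptions; activating the mamba environment is left as a comment because it is site-specific.

```shell
#!/bin/bash
#SBATCH --job-name=blastx-array
#SBATCH --array=1-400               # one task per manifest line (100 fasta files x 4 databases)
#SBATCH --cpus-per-task=4           # assumption: tune for your data and cluster
#SBATCH --time=02:00:00             # placeholder: estimate it with benchmarking jobs
#SBATCH --output=blastx_%A_%a.out   # %A = array job ID, %a = array task index

set -euo pipefail

# Hypothetical default: the manifest file, located in the submit directory.
MANIFEST=${MANIFEST:-manifest}

# Print line N of a manifest file ("<db> <fasta path>").
# Usage: manifest_line FILE N
manifest_line() {
  sed -n "${2}p" "$1"
}

# Run blastx for one manifest line; array task IDs start at 1, like sed line numbers.
# If the line does not exist, read fails and set -e stops the task.
run_task() {
  local db query out
  read -r db query < <(manifest_line "$MANIFEST" "$1")
  out="$(basename "$query" .fasta).${db}.blastx.tsv"
  # Activate the BLAST environment here if needed (site-specific), e.g.
  # mamba activate <your-blast-env>
  blastx -query "$query" -db "$db" -outfmt 6 \
         -num_threads "${SLURM_CPUS_PER_TASK:-1}" -out "$out"
}

# Slurm sets SLURM_ARRAY_TASK_ID in every array task; outside an array job
# the functions are only defined, so nothing runs by accident.
if [[ -n "${SLURM_ARRAY_TASK_ID:-}" ]]; then
  run_task "$SLURM_ARRAY_TASK_ID"
fi
```

Submitting it is then a one-liner, `sbatch blast_array.sh`. Options given on the sbatch command line override the #SBATCH directives, so the array size can also be taken from the manifest itself, e.g. `sbatch --array=1-"$(wc -l < manifest)" blast_array.sh`; appending `%N` to the range (for example `--array=1-400%50`) caps how many tasks run at the same time.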

  2. Generate the manifest file

Code Block
parallel -k echo {} ::: db1 db2 db3 db4 ::: /dir/to/all/the/*.fasta > manifest

There are other ways to generate a manifest file with one or more columns; using parallel is one of the easiest. This command pairs every database name in {db1, db2, db3, db4} with every fasta file in the given directory and writes the combinations, one per line, to a text file called manifest. The -k (--keep-order) flag makes parallel print the lines in the same order in which the input combinations are generated, rather than in the order in which the individual echo jobs happen to finish. Below is an example of the output.

Code Block
# parallel -k echo {} ::: db1 db2 db3 db4 ::: /scratch/spock/dataset/*.fasta > manifest

db1 /scratch/spock/dataset/sample1.fasta
db1 /scratch/spock/dataset/sample2.fasta
db2 /scratch/spock/dataset/sample1.fasta
db2 /scratch/spock/dataset/sample2.fasta
db3 /scratch/spock/dataset/sample1.fasta
db3 /scratch/spock/dataset/sample2.fasta
db4 /scratch/spock/dataset/sample1.fasta
db4 /scratch/spock/dataset/sample2.fasta
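As one of the other ways mentioned above, a plain bash nested loop can write the same kind of two-column manifest when GNU parallel is not available. With the databases in the outer loop and the fasta files in the inner loop, it should give the same line order as the parallel command, because parallel varies the last ::: input source fastest. This is a sketch; the function name make_manifest is hypothetical.

```shell
#!/bin/bash
# Write "<db> <fasta>" lines to OUT for every database x fasta combination.
# Usage: make_manifest DIR OUT DB...   (make_manifest is a hypothetical helper)
make_manifest() {
  local dir=$1 out=$2
  shift 2
  local db f
  : > "$out"                         # create or truncate the output file
  for db in "$@"; do                 # outer loop: databases
    for f in "$dir"/*.fasta; do      # inner loop: fasta files, in glob order
      [[ -e $f ]] || continue        # skip the unexpanded pattern if no file matches
      printf '%s %s\n' "$db" "$f" >> "$out"
    done
  done
}
```

For the example above this would be called as `make_manifest /dir/to/all/the manifest db1 db2 db3 db4`, and `wc -l < manifest` should then report 400 lines for 100 fasta files, one line per array task.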

  3. Create the sbatch script

  4. Run simple benchmarking jobs to help estimate wall time