...

In bioinformatics research, “BLAST” is a tool for searching input DNA or RNA sequence files (.fasta files in this example, which are formatted text files) against genomic databases. This example runs blastx on a folder containing 100 fasta files and searches each file against four existing databases, so the job array will launch 100*4=400 sub-jobs in total. The example covers all the steps needed, but assumes that the mamba env, the BLAST software suite, and the BLAST databases are already set up correctly.

Info

Check out the /data/datasets/community/ directory for already downloaded BLAST databases.

  1. Design the workflow

We will use a manifest file to feed the inputs: the locations of the fasta files and the names/locations of the databases. We then need an sbatch script that calls blastx and generates the job array, and a single command line to submit this sbatch script.
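As a rough illustration of the manifest idea, the sketch below writes one line per (fasta file, database) pair, so that each line corresponds to one sub-job. The function names `build_manifest` and `manifest_line`, the tab-separated layout, and all file and database names are assumptions made for this sketch, not part of the original workflow.

```shell
#!/usr/bin/env bash
# Sketch only: function names, file names, and database names are hypothetical.

# Write one "fasta<TAB>database" line per (file, database) pair.
# Usage: build_manifest FASTA_DIR MANIFEST DB [DB ...]
build_manifest() {
    local fasta_dir="$1"
    local manifest="$2"
    shift 2
    : > "$manifest"                     # start with an empty manifest
    local fasta db
    for fasta in "$fasta_dir"/*.fasta; do
        [ -e "$fasta" ] || continue     # skip if the glob matched nothing
        for db in "$@"; do
            printf '%s\t%s\n' "$fasta" "$db" >> "$manifest"
        done
    done
}

# Print the manifest line for a 1-based task index, as a sub-job could do
# with the SLURM_ARRAY_TASK_ID value that Slurm sets for array jobs.
# Usage: manifest_line MANIFEST INDEX
manifest_line() {
    sed -n "${2}p" "$1"
}
```

With 100 fasta files and four databases, a call such as `build_manifest ./fasta manifest.tsv db1 db2 db3 db4` would produce 400 lines, which matches an array range of 1-400 when submitting with `sbatch --array`. Inside the sbatch script, `manifest_line manifest.tsv "$SLURM_ARRAY_TASK_ID"` would then select the inputs for that sub-job.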

...