...

Code Block
module avail parallel  # to find the correct module name
module load parallel-20220522-gcc-12.1.0  # for Sol
module load parallel-20220522-ie          # for Phoenix

...

This command pairs each of the attributes {db1, db2, db3, db4} with every fasta file in the given directory and writes one combination per line to a text file called manifest. Below is an example of the output. The -k flag keeps the output lines in the same order as the given attributes.

...
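The pairing described above can also be sketched without GNU parallel, which may help when checking what the manifest should contain. This is only an illustration, not the parallel command used on this page: the function name `make_manifest` and the example directory are hypothetical, and the column order (database first, fasta path second) is assumed from how the sbatch script below reads the manifest.

```shell
# Write one "<db> <fasta>" pair per line, grouped by database.
# Usage: make_manifest <fasta_dir> <db>...
# Assumption: paths contain no spaces, since the manifest is space-separated.
make_manifest() {
    fasta_dir=$1
    shift
    for db in "$@"; do
        for fasta in "$fasta_dir"/*.fasta; do
            [ -e "$fasta" ] || continue   # skip the literal glob when nothing matches
            printf '%s %s\n' "$db" "$fasta"
        done
    done
}

# Example (hypothetical directory):
# make_manifest /scratch/name/fasta db1 db2 db3 db4 > manifest
```

With four databases and two fasta files this yields eight rows, one per sub-job.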

Code Block
#!/bin/bash
#SBATCH -c 8            # number of "cores"
#SBATCH -t 0-04:00:00   # time in d-hh:mm:ss
#SBATCH -p serial       # partition
#SBATCH -q normal       # QOS
#SBATCH -e slurm.%A_%a.err # file to save STDERR for each sub-job
#SBATCH --export=NONE   # Purge the job-submitting shell environment
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=spock@asu.edu

# process the manifest file: column 1 is the database name, column 2 the fasta path
manifest="${1:?ERROR -- must pass a manifest file}"
taskid=$SLURM_ARRAY_TASK_ID
case=$(getline "$taskid" "$manifest" | cut -f 1 -d ' ')
fasta=$(getline "$taskid" "$manifest" | cut -f 2 -d ' ')

# set up the software environment
module purge
module load anaconda/py3
source activate bat

# set up input and output file names
base=$(basename -s .fasta "$fasta")
out1=/scratch/name/"$base"_${case}_fmt11.txt
out2=/scratch/name/"$base"_${case}_fmt6.txt

# collect the blastx flags in an array for readability
args=(
 -query "$fasta"
 -db "/scratch/name/path/db/$case/$case"
 -evalue 1e-3
 -num_threads "$(nproc)"
 -max_target_seqs 5
 -max_hsps 1
 -outfmt 11
 -out "$out1"
)
blastx "${args[@]}"

# convert the archive into a three-column table
blast_formatter -archive "$out1" -outfmt "6 qseqid sseqid evalue" -out "$out2"

In this step, blastx first writes its results in the archive format (-outfmt 11), which is not human-readable but contains all of the query results. blast_formatter then extracts the desired columns from the archive file into a human-readable table (-outfmt 6). The fmt11 file can be kept for future reference, so that other formats or columns can be produced later without rerunning the search. This is a recommended practice for BLAST, and more formatting info can be found here: https://www.ncbi.nlm.nih.gov/books/NBK279684/table/appendices.T.options_common_to_all_blast/
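The script above reads its manifest row through `getline`, which is not a standard shell command and is presumably a helper provided on the cluster. Where it is unavailable, the same lookup could be sketched with `sed`, as below. The function names `manifest_line` and `read_manifest_row` and the variable names `db_name` and `fasta_path` are hypothetical stand-ins for this illustration; they correspond to `$case` and `$fasta` in the script.

```shell
# Hypothetical stand-in for `getline`: print line number $1 of file $2.
manifest_line() {
    sed -n "${1}p" "$2"
}

# Split one manifest row into its two space-separated fields.
# Sets db_name (the script's $case) and fasta_path (the script's $fasta).
read_manifest_row() {
    row=$(manifest_line "$1" "$2")
    [ -n "$row" ] || return 1      # task ID is beyond the end of the manifest
    db_name=${row%% *}             # everything before the first space
    fasta_path=${row#* }           # everything after the first space
}

# Example inside a sub-job:
# read_manifest_row "$SLURM_ARRAY_TASK_ID" "$manifest" || exit 1
```

Reading the row once and splitting it in the shell avoids calling the lookup twice, and the non-zero return makes a mismatched array range fail early instead of running blastx with empty arguments.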

  1. Benchmarking

This step is not part of setting up the job array, but it is important for estimating a suitable wall time and core count. Although many sbatch parameters are set in the sbatch script above, they can be overridden directly on the command line. In the commands below, -a selects which sub-job of the job array to run, and -c overrides the core count requested in the script.

...
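As a rough sketch of how such a benchmark could be organized, the helper below prints one sbatch command per core count, each restricted to sub-job 1, so the commands can be reviewed before being run. The function name `benchmark_cmds` and the script name `blastx_array.sh` are hypothetical; only the `-a` and `-c` options come from the text above.

```shell
# Print (do not run) one sbatch command per core count, for sub-job 1 only.
# Usage: benchmark_cmds <sbatch_script> <manifest> <cores>...
benchmark_cmds() {
    script=$1
    manifest=$2
    shift 2
    for cores in "$@"; do
        printf 'sbatch -a 1 -c %s %s %s\n' "$cores" "$script" "$manifest"
    done
}

# Example with a hypothetical script name:
# benchmark_cmds blastx_array.sh manifest 4 8 16
```

Comparing the elapsed time of the same sub-job at each core count gives a basis for choosing -c and -t for the full array.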

First, count the rows in the manifest file; this is the total number of sub-jobs. In this example there are 8 sub-jobs in total, so the command to submit the job array is:

...
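The row count can also be turned into the array range automatically, which avoids a mismatch when the manifest changes. The following is a minimal sketch under that assumption; `array_range` and the script name `blastx_array.sh` are hypothetical names used only for illustration.

```shell
# Print the job-array range "1-N", where N is the number of manifest rows.
# Usage: array_range <manifest>
array_range() {
    n=$(awk 'END { print NR }' "$1") || return 1
    if [ "$n" -lt 1 ]; then
        echo "empty manifest: $1" >&2
        return 1
    fi
    printf '1-%s\n' "$n"
}

# Example with a hypothetical script name:
# sbatch -a "$(array_range manifest)" blastx_array.sh manifest
```

For a manifest with 8 rows the function prints `1-8`.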

The sbatch script runs from the directory it is submitted from, so the manifest file should be in that same directory. After the run, two output files are generated for each sub-job (one fasta file searched against one database): the fmt11 archive file and the readable fmt6 file. The paths used in the code must be changed to reflect the actual directory structure.
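After the array finishes, it can be useful to confirm that both output files exist for every manifest row. The sketch below assumes the naming scheme used in the sbatch script (`<base>_<db>_fmt11.txt` and `<base>_<db>_fmt6.txt`); the function name `missing_outputs` is hypothetical, and the output directory is passed as an argument rather than hard-coded.

```shell
# List expected output files that are missing or empty.
# Usage: missing_outputs <manifest> <output_dir>
missing_outputs() {
    manifest=$1
    out_dir=$2
    while read -r db fasta; do
        [ -n "$db" ] || continue                  # skip blank lines
        base=$(basename "$fasta" .fasta)
        for fmt in fmt11 fmt6; do
            out="$out_dir/${base}_${db}_${fmt}.txt"
            [ -s "$out" ] || printf '%s\n' "$out"
        done
    done < "$manifest"
}

# Example:
# missing_outputs manifest /scratch/name
```

An empty result means every sub-job produced both files; any listed path points to a sub-job worth inspecting through its slurm .err file.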

...