...
Expand |
---|
Code Block |
---|
#!/bin/bash
#SBATCH -c 8 # number of "cores"
#SBATCH -t 0-04:00:00 # time in d-hh:mm:ss
#SBATCH -p serial # partition
#SBATCH -q normal # QOS
#SBATCH -e slurm.%A_%a.err # file to save STDERR for each sub-job
#SBATCH --export=NONE # Purge the job-submitting shell environment
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=spock@asu.edu
# process the manifest file
manifest="${1:?ERROR -- must pass a manifest file}"
taskid=$SLURM_ARRAY_TASK_ID
case=$(getline "$taskid" "$manifest" | cut -f 1 -d ' ')
fasta=$(getline "$taskid" "$manifest" | cut -f 2 -d ' ')
# set up the software environment
module purge
module load mamba/latest
source activate batexample_env
# set up input and output file names
base=$(basename -s .fasta "$fasta")
out1="/scratch/name/${base}_${case}_fmt11.txt"
out2="/scratch/name/${base}_${case}_fmt6.txt"
# put all the blastx flags here just for the sake of formatting
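args=(
-query "$fasta"
-db "/scratch/name/path/db/$case/$case"
-evalue 1e-3
-num_threads "$(nproc)"
-max_target_seqs 5
-max_hsps 1
-outfmt 11
-out "$out1"
)
# run the search and save the full results in the archive format (fmt11)
blastx "${args[@]}"
# extract the desired columns from the archive into a tabular file (fmt6)
blast_formatter -archive "$out1" -outfmt "6 qseqid sseqid evalue" -out "$out2" |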
|
Info |
---|
The “getline” command is a custom helper, not a standard Linux command. It prints one line of a file by line number and can be replaced with “sed”, for example: sed -n "${taskid}p" "$manifest" |
In this step, blastx first writes its results in the archive format (fmt11), which is not human-readable but contains all of the query results. blast_formatter then extracts the desired columns from the archive file into a human-readable table (fmt6). The fmt11 file can be kept for future reference, because it can be converted to other output formats later without rerunning the search. This is one of the best practices for using BLAST, and more formatting information can be found here: https://www.ncbi.nlm.nih.gov/books/NBK279684/table/appendices.T.options_common_to_all_blast/
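Because “getline” is specific to this cluster, the following is a minimal sketch of how the manifest lookup could be done with “sed” alone. The function body, the two-line manifest, and the database names (nr, swissprot) are assumptions made up for illustration; only the “case fasta” column layout and the cut calls come from the script above.

```shell
#!/bin/bash
# Hypothetical stand-in for the site-specific "getline" helper:
# print line number N of FILE, then stop reading.
getline() {
    local n="$1" file="$2"
    sed -n "${n}{p;q;}" "$file"
}

# Made-up manifest with one "<case> <fasta path>" pair per line.
manifest=$(mktemp)
printf '%s\n' \
    'nr /scratch/name/a.fasta' \
    'swissprot /scratch/name/b.fasta' > "$manifest"

taskid=2   # in a real array job this would be $SLURM_ARRAY_TASK_ID
line=$(getline "$taskid" "$manifest")
case=$(echo "$line" | cut -f 1 -d ' ')
fasta=$(echo "$line" | cut -f 2 -d ' ')
echo "case=$case fasta=$fasta"

rm -f "$manifest"
```

With this layout, the job array range passed to sbatch (for example --array=1-2 here) would need to match the number of lines in the manifest, so that each sub-job picks up exactly one line.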
...
This is a more complicated example: it needs a manifest file, an sbatch script, and a Python script that runs the actual machine learning training process. Please refer to case 1 for a more detailed explanation of the shared steps.
...
Generate the manifest file
...