Overview
Both Sol and Phoenix use the same partition and QoS options. Reference this page to help select the best option for your job.
If not explicitly defined, jobs default to the htc partition and the public QOS.
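Because unspecified jobs fall back to these defaults, it can help to state the partition and QOS explicitly in every batch script. The following is a minimal sketch, not an official template: the one-hour walltime is an arbitrary example, and the fallback text is only there so the script can also be run outside of Slurm for a quick check. `SLURM_JOB_PARTITION` and `SLURM_JOB_QOS` are variables that Slurm sets inside a running job.

```shell
#!/bin/bash
# Partition: Research Computing-owned nodes.
#SBATCH -p general
# QOS: the usual choice for public resources.
#SBATCH -q public
# Walltime in D-HH:MM:SS (one hour is an arbitrary example value).
#SBATCH -t 0-01:00:00

# Slurm exports these variables inside a job. The fallbacks only apply
# when the script is run outside of Slurm.
partition="${SLURM_JOB_PARTITION:-not running under Slurm}"
qos="${SLURM_JOB_QOS:-not running under Slurm}"

# Record where the job actually landed, which makes a silent fallback
# to the defaults easy to spot in the job output.
summary="partition=${partition} qos=${qos}"
echo "$summary"
```

Checking this line in the job output is a quick way to confirm that the job ran where you intended.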
Partitions
general
The general-use partition comprises all Research Computing-owned nodes. This partition has a wall time limit of 7 days. CPU-only, GPU-accelerated, and FPGA-accelerated jobs will typically use this.
Code Block |
---|
#SBATCH -p general |
#SBATCH -t 7-00:00:00 |
interactive -p general |
htc
The high-throughput partition is aimed at jobs that can complete within a four-hour walltime. Jobs that fit within this window have a scheduling advantage, as this partition includes not only Research Computing-owned nodes but also privately owned nodes. As part of the arrangement with infrastructure-purchasing labs, htc jobs can run without risk of pre-emption when using the public QOS.
Code Block |
---|
#SBATCH -p htc |
#SBATCH -t 0-04:00:00 |
interactive -p htc -t 0-4 |
highmem
The highmem partition is aimed at jobs that require more memory than the regular compute nodes can provide. The computing power is the same as on regular compute nodes, but with greater upper-end memory capacity (up to 2 TB). Walltime on the highmem partition is currently capped at 48 hours (2 days) for any given job. If more than two days is required, a lengthier QOS is available that extends the limit to 7 days. To use this QOS, please reach out to the Research Computing staff, who can help evaluate your job and its fitness for these nodes.
Code Block |
---|
#SBATCH -p highmem |
#SBATCH -t 2-00:00:00 |
interactive -p highmem --mem=1000G |
lightwork
The lightwork partition is aimed at jobs that require relatively less computing power than typical supercomputing jobs and may stay idle for longer periods of time. Good examples include creating mamba environments, compiling software, running VSCode tunnels, or performing basic software tests. The aux-interactive command automatically allocates on the lightwork partition.
Code Block |
---|
#SBATCH -p lightwork |
#SBATCH -t 1-00:00:00 |
interactive -p lightwork |
Info |
---|
Jobs that utilize cores to their full potential are more appropriately used in |
QOS
public
The public QOS should be used for any jobs that use public Research Computing resources. This includes any job submitted to the above-listed partitions. Under most circumstances, public is the preferred QOS.
Code Block |
---|
#SBATCH -q public |
interactive -q public |
debug
debug is a special QOS for testing your sbatch jobs for syntax errors and for pre-compute testing. When setting up your workflows, submitting with the debug QOS rather than as a regular job offers the following advantages:
Provides a much quicker turnaround in troubleshooting your scripts.
Are there syntax errors as your script progresses?
Are all paths and modules properly set to ensure a successful run?
Has shorter expected times for jobs to start.
Using the debug QOS with a limited (small) dataset means you can quickly confirm the validity of your pipelines; you can then switch to your full dataset with greater confidence that it will complete with the desired output.
Code Block |
---|
#SBATCH -p general |
#SBATCH -q debug |
#SBATCH -t 0-00:15:00 |
interactive -q debug -t 15 |
The debug QOS works with the general and htc partitions for walltimes up to 15 minutes.
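The path and command checks described in this section can be sketched as a short pre-flight script submitted with the debug QOS. This is only an illustration: `check_commands` and `check_paths` are hypothetical helper names, and the commands and paths passed to them (`bash`, `cat`, `$PWD`) are placeholders for whatever your own pipeline needs.

```shell
#!/bin/bash
#SBATCH -p general
#SBATCH -q debug
#SBATCH -t 0-00:15:00

# Hypothetical helper: return 0 if every named command is on PATH, 1 otherwise.
check_commands() {
    local rc=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Hypothetical helper: return 0 if every named path exists, 1 otherwise.
check_paths() {
    local rc=0 path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing path: $path" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Placeholder arguments: list the tools and inputs your pipeline relies on.
if check_commands bash cat && check_paths "$PWD"; then
    echo "pre-flight checks passed"
else
    echo "pre-flight checks failed" >&2
fi
```

Once the checks pass on a small dataset, the same script body can be reused with the walltime and QOS changed for the full run.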
private
The private QOS signifies a willingness to have your job flagged as pre-emptable (cancellable by hardware owners) as a trade-off for being able to use privately owned resources for longer than the protected four-hour htc partition offering. This can sometimes greatly reduce the time-to-start for jobs, but it is heavily dependent on the aggregate supercomputer load. Pre-emptable jobs will be canceled if private owners submit a job to those resources.
One example is using privately owned GPUs, especially where checkpointing is used.
Code Block |
---|
#SBATCH -p general |
#SBATCH -q private |
interactive -p general -q private |
Job preemption will occur when a member of the hardware-owning group, e.g., grp_labname, schedules a job that cannot be satisfied because of the private job allocation.
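Since private jobs can be canceled at any time, a job that records its progress can pick up where it left off when resubmitted. The sketch below shows one simple way to do that, assuming the work can be split into numbered steps. `run_from_checkpoint`, the `checkpoint.txt` file name, and the `CHECKPOINT_FILE` and `TOTAL_STEPS` variables are hypothetical names chosen for illustration, and the one-day walltime is an arbitrary example.

```shell
#!/bin/bash
#SBATCH -p general
#SBATCH -q private
#SBATCH -t 1-00:00:00

# Hypothetical helper: run steps 1..total_steps, skipping any steps that a
# previous (possibly preempted) run already recorded as finished.
run_from_checkpoint() {
    local checkpoint_file="$1" total_steps="$2"
    local last_done=0 step

    # Resume from the last completed step if a checkpoint exists.
    if [ -f "$checkpoint_file" ]; then
        last_done=$(cat "$checkpoint_file")
    fi
    # Treat an empty or non-numeric checkpoint as "nothing done yet".
    case "$last_done" in
        ''|*[!0-9]*) last_done=0 ;;
    esac

    step=$((last_done + 1))
    while [ "$step" -le "$total_steps" ]; do
        # Placeholder for one unit of real work.
        echo "running step $step of $total_steps"
        # Record progress only after the step has finished.
        echo "$step" > "$checkpoint_file"
        step=$((step + 1))
    done
}

# Placeholder defaults: a checkpoint file in the working directory, 5 steps.
run_from_checkpoint "${CHECKPOINT_FILE:-checkpoint.txt}" "${TOTAL_STEPS:-5}"
```

If the job is canceled partway through, resubmitting the same script continues from the step after the last one written to the checkpoint file. Keep the checkpoint on storage that is visible from every node the job might land on.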
grp_labname
Any resources purchased by independent labs will be given a grp_labname QOS. For users in the group, there is no fairshare impact for using owned hardware.
Code Block |
---|
#SBATCH -p general |
#SBATCH -q grp_labname |
interactive -p general -q grp_labname |
Note that the general partition is still used by hardware owners, but the QOS itself limits the job to the group’s eligible hardware. grp_labname jobs will not preempt public jobs of 4 hours or shorter, but will preempt private jobs of any length.
long
The long QOS permits runtimes on Research Computing-owned hardware to exceed the 7-day maximum, extending it to 14 days. It should be used only with sbatch jobs, not with interactive.
Code Block |
---|
#SBATCH -p general |
#SBATCH -q long |
This is a special-case QOS that is not available by default; it is granted on a user-by-user basis. If you are interested in using this QOS, please be ready to share a job ID demonstrating the need and effective use of existing core allocations. If you have any questions, feel free to ask staff and also explore the Slurm EFFiciency reporter: /wiki/spaces/RC/pages/395083777
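The walltime limits described on this page (4 hours for htc, 7 days for general, and 14 days with the long QOS) can be summarized in a small helper. This is a rough sketch for orientation only: `suggest_partition_qos` is a hypothetical name, it considers walltime alone, and it ignores memory needs, the debug, private, and class QOS, and whether your account has actually been granted long.

```shell
#!/bin/bash

# Hypothetical helper: print "<partition> <qos>" for a requested walltime
# given in whole hours, using the limits stated on this page.
suggest_partition_qos() {
    local hours="$1"
    if [ "$hours" -le 4 ]; then
        # Fits the four-hour htc window.
        echo "htc public"
    elif [ "$hours" -le 168 ]; then
        # Up to 7 days on the general partition.
        echo "general public"
    elif [ "$hours" -le 336 ]; then
        # Up to 14 days, only if the long QOS has been granted to you.
        echo "general long"
    else
        echo "walltime exceeds the 14-day maximum" >&2
        return 1
    fi
}

suggest_partition_qos 3     # prints: htc public
suggest_partition_qos 72    # prints: general public
```

The two printed values map directly onto the `-p` and `-q` options of an `#SBATCH` header.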
class
If you are using Research Computing supercomputers as part of coursework, you will have been granted access to the class QOS.
Code Block |
---|
#SBATCH -p general |
#SBATCH -q class |
interactive -p general -q class |
Jobs submitted with the class QOS are limited to 1 CPU, 1 A100 MIG slice of 10 GB, and 40 GB of system memory at any given time. This helps to ensure that classes can have faster access to compute resources for coursework, albeit at a limited capacity.