...

Both Sol and Phoenix use the same partition and QoS options. Refer to this page to help select the best options for your job.

Info

If not explicitly specified, jobs will default to the htc partition and the public QoS.
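For example, the defaults can also be written out explicitly in a batch script or an interactive request. This is only an illustrative sketch; the wall time shown is an arbitrary example within the four-hour htc window described further down.

Code Block
#SBATCH -p htc           # default partition, written out explicitly
#SBATCH -q public        # default QOS, written out explicitly
#SBATCH -t 0-04:00:00    # example wall time within the four-hour htc limit

interactive -p htc -q public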

Partitions

general

The general-use partition comprises all Research Computing-owned nodes. This partition has a wall time limit of 7 days. CPU-only, GPU-accelerated, and FPGA-accelerated jobs will typically use this partition.
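As an illustrative sketch, a job targeting the general partition might look like the following; the QoS and wall time lines are examples, with the time set at the partition's 7-day ceiling.

Code Block
#SBATCH -p general       # general-use partition (all Research Computing-owned nodes)
#SBATCH -q public        # public QOS
#SBATCH -t 7-00:00:00    # example wall time at the 7-day partition limit

interactive -p general -q public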

...

The lightwork partition is aimed at jobs that require less computing power than typical supercomputing jobs and may sit idle for longer periods of time. Good examples include creating mamba environments, compiling software, running VSCode tunnels, or performing basic software tests. The aux-interactive command will automatically allocate on the lightwork partition.

Code Block
#SBATCH -p lightwork
#SBATCH -t 1-00:00:00

interactive -p lightwork --mem=1000

The maximum job time is one day, and the maximum number of CPU cores per node is 8:

Code Block
[spock@sg008:~]$ scontrol show partition lightwork
PartitionName=lightwork
   AllowGroups=ALL AllowAccounts=ALL AllowQos=public,debug
   AllocNodes=ALL Default=NO QoS=public
   DefaultTime=04:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=8 MaxCPUsPerSocket=UNLIMITED
   Nodes=sc[001-002]

Info

Jobs that utilize cores to their full potential are more appropriately run in the htc or general partitions, where cores are not shared/oversubscribed. Jobs that drive full cores to >99% utilization for a sustained period of time, or jobs that request excessive resources and prevent other users from using this partition, are subject to cancellation, as this is not an appropriate use of the lightwork nodes. Repeated misuse of this partition will result in losing eligibility to use lightwork going forward.

QOS

public

The public QOS should be used for any jobs that use public Research Computing resources. This includes any job submitted to the partitions listed above. Under most circumstances, public is the preferred QOS.

...

The private QOS signifies a willingness to have your job flagged as pre-emptable (cancellable by the hardware owners) in exchange for being able to use that privately owned resource for longer than the protected four-hour htc partition allows. This can sometimes greatly reduce a job's time-to-start, but that is heavily dependent on the aggregate supercomputer load. Pre-emptable jobs will be canceled if and only if the private owners submit a job to those resources.

A typical example would be using privately owned GPUs, especially where checkpointing is used.
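A rough sketch of such a pre-emptable GPU job is shown below; the pairing with the general partition, the generic --gres=gpu:1 request, and the wall time are illustrative assumptions, not cluster-specific values.

Code Block
#SBATCH -p general        # assumed partition; use whichever partition contains the private hardware
#SBATCH -q private        # marks the job as pre-emptable by the hardware owners
#SBATCH --gres=gpu:1      # generic GPU request; adjust to the private GPU type in use
#SBATCH -t 2-00:00:00     # example wall time beyond the four-hour protected htc window

# Checkpoint regularly so the job can resume if it is pre-empted.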

...

This is a special-case QOS that is not available by default; it is granted on a user-by-user basis. If you are interested in using this QOS, please be ready to share a job ID demonstrating the need for, and effective use of, existing core allocations. If you have any questions, feel free to ask staff, and also explore the Slurm EFFiciency reporter: /wiki/spaces/RC/pages/395083777
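If you need to gather that evidence, standard Slurm accounting tools can report how well a completed job used its allocation. This assumes the seff utility is what the Slurm EFFiciency reporter page refers to, and the job ID below is a placeholder.

Code Block
# Summarize CPU and memory efficiency for a completed job (placeholder job ID)
seff 1234567

# Or pull the raw usage figures from the accounting database
sacct -j 1234567 --format=JobID,Elapsed,TotalCPU,MaxRSS,ReqMem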

class

If you are using Research Computing supercomputers as part of coursework, you will have been granted access to the class QOS.

Code Block
#SBATCH -p general
#SBATCH -q class

interactive -p general -q class

Jobs submitted with the class QOS are limited to 1 CPU, 1 A100 MIG slice of 10GB, and 40GB of system memory at any given time. This helps to ensure that classes can have faster access to compute resources for coursework, albeit at a limited capacity.
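As an illustrative sketch, a batch script that stays within those class limits could request resources as follows; the wall time is an arbitrary example, and the MIG slice request is omitted because its exact gres string is cluster-specific.

Code Block
#SBATCH -p general      # partition from the class example above
#SBATCH -q class        # class QOS
#SBATCH -c 1            # class QOS allows at most 1 CPU
#SBATCH --mem=40G       # class QOS allows at most 40 GB of system memory
#SBATCH -t 0-02:00:00   # example wall time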

Additional Help

Contact RC Support.