...

Info

Some privately owned hardware may have slightly different specifications. See the Sol Status Page for the full feature list of every node.

Note

Requesting too many resources can lead to long queue times.

Using too many resources also costs a large amount of fairshare points, which in turn leads to longer queue times. Checking the efficiency of a completed test job can help you determine an appropriate amount of resources to request.
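As a minimal sketch, the CPU efficiency of a finished job can be estimated as the CPU time actually used divided by the CPU time reserved (elapsed time × allocated cores). The inputs can be read from Slurm accounting with `sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,AllocCPUS,MaxRSS,ReqMem`, or summarized by `seff <jobid>` if that utility is installed. The `cpu_efficiency` function below is a hypothetical helper that assumes both times have already been converted to seconds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate CPU efficiency (integer percent) of a finished job.
# Inputs come from Slurm accounting, e.g.:
#   sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,AllocCPUS
# ASSUMPTION: Elapsed and TotalCPU have already been converted to seconds.
cpu_efficiency() {
    local total_cpu_s=$1   # TotalCPU, in seconds
    local elapsed_s=$2     # Elapsed (wall-clock), in seconds
    local ncpus=$3         # AllocCPUS
    # CPU time actually used / CPU time that was reserved
    echo $(( 100 * total_cpu_s / (elapsed_s * ncpus) ))
}

# Example: 5 cores reserved for 1000 s, only 3000 CPU-seconds used -> prints 60
cpu_efficiency 3000 1000 5
```

A result well below 100% suggests that the next run can request fewer cores.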

Requesting Resources


Requesting CPUs

To request a given number of CPUs on the same node, you can use the following in your SBATCH script:

Code Block
#SBATCH -N 1    # Number of nodes
#SBATCH -c 5    # Number of cores per task
or
interactive -N 1 -c 5

This will create a job with 5 CPU cores on one node.

To request a given number of CPUs spread across multiple nodes, you can use the following:

Code Block
#SBATCH -N 2-4    # number of nodes to allow tasks to spread across (MIN & MAX)
#SBATCH -n 10    # number of TASKS
#SBATCH -c 5     # CPUs per TASK
or
interactive -N 2-4 -n 10 -c 5

The above example will allocate a total of 50 cores spread across as few as 2 nodes or as many as 4 nodes.
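To confirm what a job actually received, a bash job script can read the environment variables Slurm sets inside the allocation. This is a sketch under the assumption that `SLURM_NTASKS` and `SLURM_CPUS_PER_TASK` may be unset when `-n` or `-c` were not requested, so each falls back to 1. `total_cpus` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total cores = number of tasks x CPUs per task.
# SLURM_NTASKS / SLURM_CPUS_PER_TASK may be unset if -n / -c were not given,
# so each defaults to 1.
total_cpus() {
    local ntasks=${SLURM_NTASKS:-1}
    local cpt=${SLURM_CPUS_PER_TASK:-1}
    echo $(( ntasks * cpt ))
}

# Report the allocation (prints "unknown" outside of a Slurm job)
echo "Nodes:       ${SLURM_JOB_NUM_NODES:-unknown}"
echo "Node list:   ${SLURM_JOB_NODELIST:-unknown}"
echo "Total cores: $(total_cpus)"
```

For the `-N 2-4 -n 10 -c 5` request, `total_cpus` reports 50, regardless of how many nodes the tasks landed on.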

Take note of the inclusion or omission of -N:

Code Block
#SBATCH -c 5     # CPUs per TASK
#SBATCH -n 10    # number of TASKS
or
interactive -n 10 -c 5

This reduced example will still allocate 50 cores (5 cores per task) on any number of available nodes. Note that unless you are using MPI-aware software, you will likely want to always add -N, since software that is not MPI-aware generally cannot use cores allocated on a different node.

-c and -n both allocate cores in Slurm, but -n sets the number of tasks and -c sets the number of cores per task. MPI processes bind to tasks, so as a rule of thumb, MPI jobs allocate tasks, serial jobs allocate cores, and hybrid jobs allocate both.
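As an illustration of the hybrid case, a job combining MPI ranks (tasks) with OpenMP threads (cores per task) might look like the sketch below. It assumes an MPI program built with OpenMP support; `my_hybrid_app` is a placeholder name, and the counts are only examples.

```bash
#!/bin/bash
#SBATCH -N 2        # two nodes
#SBATCH -n 4        # 4 TASKS (MPI ranks) in total
#SBATCH -c 8        # 8 CPUs per TASK (OpenMP threads per rank), 32 cores total

# One OpenMP thread per core allocated to each task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# srun starts one MPI rank per task. -c is passed again explicitly because
# some Slurm versions do not carry --cpus-per-task over from sbatch to srun.
# my_hybrid_app is a placeholder for your own executable.
srun --cpus-per-task="$SLURM_CPUS_PER_TASK" ./my_hybrid_app
```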

See the official Slurm documentation for more information: https://slurm.schedmd.com/sbatch.html

Requesting Memory

Cores and memory are decoupled: if you need only a single CPU core but ample memory, you can request it like this:

Code Block
#SBATCH -c 1
#SBATCH -N 1
#SBATCH --mem=120G
or
interactive -N 1 -c 1 --mem=120G

If you do not specify --mem, you will be allocated 2GiB per CPU core or 24GiB per GPU.
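A minimal sketch of the default rule, under the assumption that it means GPU jobs receive 24GiB per GPU while CPU-only jobs receive 2GiB per core. That reading and the helper name `default_mem_gib` are assumptions; if in doubt, check the memory of a running job with `scontrol show job <jobid>`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: default memory (GiB) when --mem is omitted.
# ASSUMPTION: jobs with GPUs get 24 GiB per GPU; CPU-only jobs get 2 GiB per core.
default_mem_gib() {
    local cores=$1
    local gpus=${2:-0}
    if [ "$gpus" -gt 0 ]; then
        echo $(( 24 * gpus ))
    else
        echo $(( 2 * cores ))
    fi
}

default_mem_gib 5      # -c 5, no GPU  -> prints 10
default_mem_gib 4 2    # -c 4, -G 2    -> prints 48
```

If the default is not enough for your workload, set --mem explicitly.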

To request more than 512GiB of memory, you will need to use the highmem partition:

Code Block
#SBATCH -p highmem
#SBATCH --mem=1400G

To request all available memory on a node:

Warning

This will allocate all of the node's CPU cores and memory (up to 2TiB, depending on the node) to your job, which prevents any other jobs from running on that node. Only use this if you truly need that much memory.

Code Block
#SBATCH --exclusive 
#SBATCH --mem=0

Requesting GPUs

To request a GPU, you can specify the -G option within your job request.

This will allocate the first available GPU that fits your job request:

Code Block
#SBATCH -G 1
or 
interactive -G 1

To request multiple GPUs, specify a number greater than 1:

Code Block
#SBATCH -G 4
or 
interactive -G 4

To request a specific number of GPUs per node when running multi-node:

Code Block
#SBATCH -N 2               # Request two nodes
#SBATCH --gpus-per-node=2  # Four GPUs in total, two per node

To request a specific type of GPU (a100 for example):

Code Block
#SBATCH -G a100:1
or
interactive -G a100:1

GPU Varieties Available

Below is a table demonstrating the available GPU instance sizes you can allocate:

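| GPU Name               | GPU Memory | Count                   |
|------------------------|------------|-------------------------|
| a100                   | 80GB, 40GB | 4 per node, NVLinked    |
| a30                    | 24GB       | 4 per node, NVLinked    |
| 1g.20gb                | 20GB       | 4 per node              |
| 2g.20gb                | 20GB       | 12 per node             |
| h100 (Privately Owned) | 96GB       | 4-8 per node, NVLinked  |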

The a100s come in two varieties, as shown above.

To guarantee an 80GB a100, include this feature: #SBATCH -C a100_80. The same constraint works with interactive -C a100_80 (a100_40 is also available). To request more than one a100 while specifying the variety:

Code Block
interactive -G a100:2 -C a100_80
or
#SBATCH -G a100:2
#SBATCH -C a100_80
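Once the job starts, you can check which GPUs were actually assigned. This sketch uses the query mode of `nvidia-smi`, which is available on NVIDIA GPU nodes. An 80GB a100 should report a total memory close to 80GB.

```bash
# Run inside the job, e.g. after: interactive -G a100:2 -C a100_80
# Lists each assigned GPU with its index, model name, and total memory.
nvidia-smi --query-gpu=index,name,memory.total --format=csv

# CUDA_VISIBLE_DEVICES shows which GPU indices were exposed to the job
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
```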

Requesting FPGAs

Sol has two nodes with a Field Programmable Gate Array (FPGA) accelerator: an Intel-based node with a BittWare 520N-MX FPGA, and an AMD-based node with a Xilinx U280. Because there is only one FPGA per node, it is recommended to allocate the entire node.

...

Note that there should not be a space between "-L" and the FPGA name on the web portal.


Requesting the Grace Hopper ARM

The Grace Hopper node is a specialized system using the ARM architecture (aarch64), which is not compatible with x86_64 applications. Although this node is frequently idle, you should expect compiled applications to fail on execution unless they were built for this less common architecture.
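To fail early instead of crashing on an architecture mismatch, a job script can check the machine architecture before launching anything. A minimal sketch, assuming bash; `require_arch` is a hypothetical helper built on `uname -m`, which prints `aarch64` on the Grace Hopper node and `x86_64` on the Intel and AMD nodes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if this machine has the expected architecture.
# `uname -m` prints e.g. "aarch64" (Grace Hopper) or "x86_64" (Intel/AMD nodes).
require_arch() {
    local expected=$1
    local actual
    actual=$(uname -m)
    if [ "$actual" != "$expected" ]; then
        echo "Expected $expected but running on $actual; rebuild for this architecture." >&2
        return 1
    fi
}

# Example at the top of a Grace Hopper job script:
#   require_arch aarch64 || exit 1
```

The `file` utility (e.g. `file ./my_app`, where `my_app` is a placeholder) reports which architecture a binary was compiled for, which helps confirm a build before submitting.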

This node must be requested exclusively:

Code Block
#SBATCH --exclusive
#SBATCH -p highmem
#SBATCH -L gracehopper
#SBATCH -G 1
or
interactive --exclusive -L gracehopper -G 1 -p highmem

Additional Help

For further assistance, see the Contact RC Support page.