
Selecting a Partition

The Aloe environment has a single partition (general) and a single QOS (normal). Because these values are applied by default, they may be omitted from jobscripts entirely.
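If you prefer to state the defaults explicitly, they can be spelled out with the standard Slurm partition and QOS directives (these lines are optional on Aloe, since general and normal are the only choices):

```shell
#SBATCH -p general   # partition (the default, so this line may be omitted)
#SBATCH -q normal    # QOS (also the default)
```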

Requesting CPUs

When requesting resources, performance is generally best when all the resources are kept close together. This is most easily accomplished by requesting that all of your cores be placed on the same node with -N 1.

To request a given number of CPUs sharing the same node, add the following to your sbatch script:

#SBATCH -c 5     # CPUs per TASK (this example allocates 5 cores)
#SBATCH -N 1     # keep all tasks on the same node

To request a given number of CPUs spread across multiple nodes, use the following:

#SBATCH -c 5     # CPUs per TASK
#SBATCH -n 10    # number of TASKS
#SBATCH -N 10    # spread tasks across 10 nodes (a single -N value sets both MIN and MAX)

The above example allocates 50 cores in total: 10 tasks of 5 cores each, with one task on each of 10 independent nodes.

Take note of the inclusion or omission of -N:

#SBATCH -c 5     # CPUs per TASK
#SBATCH -n 10    # number of TASKS

This reduced example still allocates 50 cores (5 cores per task), but the tasks may land on any number of available nodes. Note that unless you are using MPI-aware software, you will likely prefer to always add -N 1, so that all job workers share a node and communicate with the lowest possible latency.

As a general rule, CPU-only nodes have 128 cores and GPU-present nodes have 48 cores.

Requesting Memory

On Aloe, cores and memory are de-coupled: if you need only a single CPU core but ample memory, you can request that combination like this:

#SBATCH -c 1
#SBATCH -N 1
#SBATCH --mem=120G

Every node has at least 500GB of memory, and High Memory Nodes have 1TB.
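A request for more memory than a standard node provides can only be satisfied by a High Memory Node. This sketch assumes the scheduler places the job purely by the --mem constraint, which is standard Slurm behavior:

```shell
#SBATCH -c 1
#SBATCH -N 1
#SBATCH --mem=900G   # exceeds the 500GB of a standard node, so only a 1TB High Memory Node qualifies
```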

Requesting GPUs

Aloe offers 20 NVIDIA A100s (2 GPUs per node). These GPUs are publicly available and may be used up to the maximum walltime of 28 days.

GPUs can be requested interactively with salloc:

salloc -G a100:1
  -- or --
salloc -G a100:2

SBATCH scripts can request GPUs with this format: -G [type]:[qty]

#SBATCH -N 1
...
#SBATCH -G a100:1

Minimum Memory Allocation

To maximize GPU throughput, the requested system memory (RAM) should be at least the aggregate amount of GPU RAM allocated. To help ensure this efficiency, every GPU request is accompanied by a minimum of 40GB of system RAM per GPU.

For example, if you request 2 A100s, your job will be allocated no less than 80GB of system RAM; naturally, you may choose to ask for more than 80GB, e.g., 500GB.

#SBATCH --gres=gpu:a100:2
#SBATCH --mem=50G   # Even though you requested 50GB, it will be MAX(50,80) = 80GB

Techniques to Determine Resource Usage

After each job completes, you can execute the seff command to get a Slurm EFFiciency report. This report shows how much of your requested cores and memory were actually utilized:
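seff takes the completed job's ID as its only argument; the ID below matches the sample report that follows:

```shell
seff 12345
```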

Job ID: 12345
Cluster: cluster
User/Group: wdizon/wdizon
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 23
CPU Utilized: 4-13:43:50
CPU Efficiency: 2.84% of 161-01:57:18 core-walltime
Job Wall-clock time: 7-00:05:06
Memory Utilized: 133.85 GB
Memory Efficiency: 66.93% of 200.00 GB

Understanding seff efficiency numbers

CPU Utilized: 4-13:43:50
CPU Efficiency: 2.84% of 161-01:57:18 core-walltime

The above job accumulated roughly 4 days, 14 hours of CPU time over a wall-clock runtime of just over the 7 days requested.

Efficiency does not compare a job's runtime to its requested walltime (4d vs 7d); it compares the amount of CPU time actually used against the core-walltime allocated, i.e., cores multiplied by the job's actual wall-clock time (unused time after the job completes is not counted). At 2.84% CPU efficiency, this job shows a clear affinity for being memory-bound, with a significant over-allocation of CPUs.
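The 2.84% figure can be reproduced by hand from the report above: convert both durations to seconds and divide. A sketch using awk for the arithmetic (the second conversions are precomputed in the comments):

```shell
# CPU Utilized:   4-13:43:50 -> 4*86400 + 13*3600 + 43*60 + 50 = 395030 seconds
# core-walltime:  23 cores * wall-clock 7-00:05:06 (605106 s)  = 13917438 seconds
awk 'BEGIN { printf "%.2f%%\n", 395030 / 13917438 * 100 }'   # prints 2.84%
```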

Suppose the job above was submitted with the following sbatch directives:

#SBATCH -c 23
#SBATCH -N 1
#SBATCH --mem=200G
#SBATCH -t 7-00:00:00

Modified to:

#SBATCH -c 1          # roughly only one core was used
#SBATCH -N 1          
#SBATCH --mem=150G    # 67% of 200GB, rounding up a bit

While it is also possible to reduce the requested walltime (e.g., -t 5-00:00:00), walltime does not factor into the efficiency calculation.
