Selecting a Partition
There is a single partition (general) and a single QOS (normal) in the Aloe environment. To improve ease of use, these values are applied by default and may be left out of jobscripts entirely.
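If you prefer to state them explicitly, the standard Slurm partition and QOS directives still work; a minimal sketch (redundant on Aloe, shown for completeness):

#SBATCH -p general    # partition: the only one available
#SBATCH -q normal     # QOS: the only one available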
Requesting CPUs
When requesting resources, it is best for performance to keep them close together. This is most easily accomplished by requesting that all of the cores you request stay on the same node with -N.
To request a given number of CPUs sharing the same node, you can use the following in your SBATCH script:
#SBATCH -c 5    # CPUs per TASK (in this ex. will get 5 cores)
#SBATCH -N 1    # keep all tasks on the same node
To request a given number of CPUs spread across multiple nodes, you can use the following:
#SBATCH -c 5    # CPUs per TASK
#SBATCH -n 10   # number of TASKS
#SBATCH -N 10   # allow tasks to spread across multiple nodes (MIN & MAX)
The above example will allocate 50 cores: 5 cores per task, with the 10 tasks placed on 10 independent nodes.
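For context, here is how those directives might sit in a complete jobscript; a minimal sketch, in which ./myprogram is a hypothetical executable:

#!/bin/bash
#SBATCH -c 5    # CPUs per TASK
#SBATCH -n 10   # number of TASKS
#SBATCH -N 10   # spread the 10 tasks across 10 nodes

# srun launches one copy of the program per task (10 in total),
# each with 5 cores available to it
srun ./myprogram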
Take note of the inclusion or omission of -N:
#SBATCH -c 5    # CPUs per TASK
#SBATCH -n 10   # number of TASKS
This reduced example will still allocate 50 cores, 5 cores per task, on any number of available nodes. Note that since there is no MPI capability in the Aloe environment, you will likely prefer to always add -N 1, to ensure that the job's workers have the lowest possible latency to one another.
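Following that advice, a sketch of the same 50-core request pinned to a single node (which fits, given the core counts noted below):

#SBATCH -c 5    # CPUs per TASK
#SBATCH -n 10   # number of TASKS
#SBATCH -N 1    # all 50 cores on one node for lowest latency between workers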
As a general rule, CPU-only nodes have 128 cores and GPU-present nodes have 48 cores.
Requesting Memory
On Aloe, cores and memory are de-coupled: if you need only a single CPU core but ample memory, you can request exactly that:
#SBATCH -c 1
#SBATCH -N 1
#SBATCH --mem=120G
Every single node has at least 500GB of memory, with High Memory Nodes having 2TB.
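If scaling memory with core count is more natural for your workload, Slurm's standard --mem-per-cpu directive (an alternative to --mem, not covered above) can be used instead; a sketch:

#SBATCH -c 16
#SBATCH -N 1
#SBATCH --mem-per-cpu=8G   # 16 cores x 8GB = 128GB total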
Requesting GPUs
Aloe offers 20 Nvidia A100s (2 GPUs per node). These GPUs are publicly available and usable up to the maximum walltime of 28 days.
Requesting GPUs can be done interactively with salloc:
salloc -G a100:1
-- or --
salloc -G a100:2
SBATCH scripts can request GPUs with this format: -G [type]:[qty]
#SBATCH -N 1
...
#SBATCH -G a100:1
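A complete GPU jobscript might then look like the following; a minimal sketch, where the core count is an arbitrary choice and ./train.sh stands in for your actual workload:

#!/bin/bash
#SBATCH -N 1
#SBATCH -c 8          # arbitrary core count for illustration
#SBATCH -G a100:1     # one A100

nvidia-smi            # confirm the allocated GPU is visible
./train.sh            # hypothetical workload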
Minimum Memory Allocation
To best maximize GPU throughput, requested system memory (RAM) should be at least the aggregate amount of GPU RAM allocated. To help ensure this efficiency, every GPU is accompanied by a minimum of 40GB of system RAM.
For example, if you request 2 A100s, your job will be allocated no less than 80GB of system RAM; naturally, you may choose to ask for more than 80GB, e.g., 500GB.
#SBATCH --gres=gpu:a100:2
#SBATCH --mem=50G   # even though you requested 50GB, it will be MAX(50,80) = 80GB
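To avoid leaning on the implicit floor, you can simply request at least the aggregate GPU RAM up front; a sketch:

#SBATCH --gres=gpu:a100:2
#SBATCH --mem=80G   # explicitly matches the 2 x 40GB minimum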
Techniques to Determine Resource Usage
At the end of each job, you can execute the seff command to get a Slurm EFFiciency report. This report will tell you how much of your requested cores and memory were utilized.
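For example, using the job ID of the completed job shown below:

seff 12345

which produces a report like this: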
Job ID: 12345
Cluster: cluster
User/Group: wdizon/wdizon
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 23
CPU Utilized: 4-13:43:50
CPU Efficiency: 2.84% of 161-01:57:18 core-walltime
Job Wall-clock time: 7-00:05:06
Memory Utilized: 133.85 GB
Memory Efficiency: 66.93% of 200.00 GB
Understanding seff efficiency numbers
CPU Utilized: 4-13:43:50
CPU Efficiency: 2.84% of 161-01:57:18 core-walltime
The above job accumulated roughly 4 days, 14 hours of CPU time while running for the full 7 days requested.
Efficiency is not a measure of how long the job ran, but of how much CPU time was utilized compared to the amount allocated (unused time after execution completes is not counted). This job is clearly memory-bound, with a significant over-allocation of CPUs.
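Concretely, the 2.84% figure follows from the report's own numbers:

23 cores × 7-00:05:06 wall-clock = 161-01:57:18 core-walltime
CPU Efficiency = 4-13:43:50 ÷ 161-01:57:18 ≈ 2.84%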
Such a job's sbatch could be revised. Starting from the original:
#SBATCH -c 23
#SBATCH -N 1
#SBATCH --mem=200G
#SBATCH -t 7-00:00:00
Modified to:
#SBATCH -c 1         # roughly only one core was used
#SBATCH -N 1
#SBATCH --mem=150G   # 67% of 200GB, rounding up a bit
While it is also possible to reduce -t (e.g., to 5-00:00:00), doing so is not required: the efficiency calculation is based on actual wall-clock time, not the requested limit.