Summer Maintenance 2024 Information
General Updates:
Loading a CUDA module is now required
CUDA is no longer installed directly on the nodes. This allows more control over the CUDA version and improves compatibility. Jobs must now load a specific version of CUDA with module load.
A new version, CUDA 12.5, is available.
Current jobs that rely on an unspecified CUDA version may fail; their sbatch scripts should be adjusted to load a CUDA module explicitly.
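As a minimal sketch of what an adjusted sbatch script might look like, the example below loads CUDA explicitly before using it. The module name "cuda" is a placeholder and an assumption, not the confirmed name on the cluster; check "module avail cuda" on a login node for the real entries. The job name, GPU request, and time limit are likewise illustrative.

```shell
#!/bin/bash
#SBATCH --job-name=cuda-check
#SBATCH --gres=gpu:1
#SBATCH --time=00:05:00

# Hypothetical module name: replace with an entry from `module avail cuda`.
CUDA_MODULE="${CUDA_MODULE:-cuda}"

# Load CUDA explicitly; it is no longer present on the nodes by default.
if command -v module >/dev/null 2>&1; then
    module load "$CUDA_MODULE"
else
    echo "module command not found; run this on a cluster node" >&2
fi

# Report whether the CUDA compiler is now on PATH.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
    CUDA_READY=yes
else
    echo "nvcc not found after loading $CUDA_MODULE" >&2
    CUDA_READY=no
fi
```

Pinning one version in the script, rather than relying on whatever the default happens to be, keeps a job reproducible when newer CUDA modules are added later.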
Node naming convention update
All nodes are now prefixed with an “s”
Login nodes are now named sol-login0[1-3]
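The bracket notation above is a range, so sol-login0[1-3] stands for three hosts. The short sketch below spells them out with a plain shell loop; on the cluster itself, "scontrol show hostnames" performs the same expansion for any Slurm host list. The variable name login_nodes is only illustrative.

```shell
#!/bin/sh
# Expand sol-login0[1-3] into the individual login node names.
login_nodes=""
for i in 1 2 3; do
    login_nodes="$login_nodes sol-login0$i"
done
login_nodes="${login_nodes# }"   # drop the leading space

echo "$login_nodes"
# On a cluster node, Slurm can do the same expansion:
#   scontrol show hostnames 'sol-login0[1-3]'
```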
HTC jobs will only run on public nodes
This affects previously submitted and future jobs.
This change will temporarily make the 32 private nodes on the Sol supercomputer unavailable for HTC jobs.
This will allow private node job recovery.
This change will last up to a week.
Rocky OS update
Updated from 8.9 to 8.10 for security updates
Slurm version update
Updated from 23.11 to 24.05 for security updates
An additional 16 GPU MIG instances are available
Modernized interactive script
The “interactive” script has been updated to provide a more seamless experience on compute nodes.
The old version of interactive is still available via the command “classic-interactive”.
Benchmarks completed (https://top500.org/)
Green500 - results posted in November
Top500 - results posted in November
Technical Updates:
Implementation of Warewulf
Deployment tool for OS images
Keeps the OS image consistent across all of the compute nodes
Revamped the account creation script for better usability and functionality across the supercomputers
The arbiter tool for monitoring the login nodes has been updated to version 2.1
The Slurm scheduler now runs in a container
The proxy servers for Sol are now containerized
The Cholla storage array software has been updated from version 6.2.0.1 to 7.1.0.1