Systems Changelog
RC frequently performs maintenance and improvements on the systems we support, including supercomputers, collocated systems, supporting storage, and networking infrastructure. These changes often require bringing the systems offline so the work can be performed safely, which results in maintenance periods.
This page highlights these changes to improve transparency and share our solutions. Feel free to reach out to rtshelp@asu.edu with any questions or feedback on this page.
2024-01-30 - VASP license compliance
Software:
Implemented grp_vasp to restrict VASP usage in accordance with the license
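For users who want to check their own access, the following is a minimal Python sketch. It assumes that grp_vasp is an ordinary Unix group and that membership in it is what gates VASP usage; this page does not state how the restriction is enforced, so treat both as assumptions. The helper name user_in_group is hypothetical.

```python
import getpass
import grp
import pwd


def user_in_group(username: str, group_name: str) -> bool:
    """Return True if `username` belongs to the Unix group `group_name`.

    Assumption: access is gated by plain Unix group membership.
    """
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        # The group does not exist on this system.
        return False
    # Supplementary membership is listed in gr_mem.
    if username in group.gr_mem:
        return True
    # Primary membership is recorded in the passwd entry instead.
    try:
        return pwd.getpwnam(username).pw_gid == group.gr_gid
    except KeyError:
        return False


if __name__ == "__main__":
    me = getpass.getuser()
    print(f"{me} in grp_vasp: {user_in_group(me, 'grp_vasp')}")
```

If the check reports False and you hold a valid VASP license, contact rtshelp@asu.edu.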
2023-12-19 - Emergency Maintenance
Slurm:
Upgraded Slurm to 23.11.2 to pick up a major CVE security fix
2023-10-11 - Fall Maintenance
Slurm:
Upgraded Slurm to 23.02.5 to improve MPI support
Changed resource limits (TRES) from a per-group to a per-user basis
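To illustrate what the move from per-group to per-user limits means in practice, here is a toy Python model of the admission check. It is only a sketch: the limit value, the user and group names, and the admits function are hypothetical, and it is not how Slurm itself evaluates limits. In Slurm terms the change plausibly corresponds to moving from a group-wide setting such as GrpTRES to a per-user setting such as MaxTRESPerUser, but this page does not say which settings were changed.

```python
def admits(job: dict, running: list, limit: int, scope: str) -> bool:
    """Return True if `job` fits under `limit` CPUs.

    scope="group": usage is summed over everyone in the job's group.
    scope="user":  usage is summed over the submitting user only.
    """
    key = job[scope]
    used = sum(j["cpus"] for j in running if j[scope] == key)
    return used + job["cpus"] <= limit


# Hypothetical state: alice already holds 96 CPUs in the shared group "lab".
running = [{"user": "alice", "group": "lab", "cpus": 96}]
new_job = {"user": "bob", "group": "lab", "cpus": 64}

# Per-group limit: bob is blocked by alice's usage (96 + 64 > 128).
print(admits(new_job, running, limit=128, scope="group"))  # False
# Per-user limit: bob is only measured against himself (0 + 64 <= 128).
print(admits(new_job, running, limit=128, scope="user"))   # True
```

Under a per-user limit, one heavy user in a group no longer prevents other members of the same group from starting jobs.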
Web Portal:
Upgraded Open OnDemand to version 3
New applications: Available Modules, Storage Quotas
Added dynamic forms and memory options to the interactive apps
Fixed text formatting
Updated Jupyter
Software:
Updated Mamba from version 1.2 to 1.5.1
Rebuilt UCX and PMIx to improve multi-node and MPI performance
Hardware:
Completed several hardware swaps to replace non-functioning systems and systems that were out of warranty
Moved 12 Agave nodes to create the initial setup for the Phoenix Cluster
Added 3 new Condo GPU nodes to Sol (moved from Agave)
Misc:
Ran HPL benchmarks to improve our placement on the TOP500 list (results to be made public in November)