Systems Changelog

RC frequently performs maintenance and improvements on the systems we support, including supercomputers, collocated systems, supporting storage, and networking infrastructure. These changes often require systems to be taken offline so the work can be performed safely, which creates maintenance periods.

This page highlights these changes to improve transparency and share our solutions. Feel free to reach out to rtshelp@asu.edu with any questions or feedback on this page.

2024-01-30 - VASP license compliance

  • Software:

    • Implemented grp_vasp to restrict VASP usage in accordance with its license

2023-12-19 - Emergency Maintenance

  • Slurm:

    • Upgraded Slurm to 23.11.2 to pick up a major CVE fix

2023-10-11 - Fall Maintenance

  • Slurm:

    • Upgraded Slurm to 23.02.5 for improved MPI support

    • Changed resource limits (TRES) from a per-group to a per-user basis

  • Web Portal:

    • Upgraded Open OnDemand to version 3

    • New applications: Available Modules, Storage Quotas

    • Dynamic forms and memory options on the interactive apps

    • Fixed text formatting

    • Updated Jupyter

  • Software:

    • Updated Mamba from version 1.2 to 1.5.1

    • Rebuilt UCX and PMIx to improve multi-node and MPI performance

  • Hardware:

    • Completed several hardware swaps to replace non-functioning systems and systems that were out of warranty

    • Moved 12 Agave nodes to create the initial setup for the Phoenix Cluster

    • Added 3 new Condo GPU nodes to Sol (moved from Agave)

  • Misc:

    • Ran HPL benchmarks to improve our placement on the TOP500 (results to be made public in November)