
Overview

This page summarizes the updates and improvements made during the recent maintenance period. These changes aim to improve the system's reliability, performance, and user experience. Each major change is described below.

Notable Changes

System Software and Security Upgrades

The cluster's operating system was upgraded from Rocky Linux 8.9 to 8.10. This point release includes security improvements, performance optimizations, and support for newer libraries and tools, which improve system stability and compatibility with current software.
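To confirm which release a node is running, you can read the standard /etc/os-release file. The short sketch below assumes a Linux host that provides this file, as Rocky Linux does. On an upgraded node it should report version 8.10.

```shell
#!/usr/bin/env bash
# Print the distribution name and version of the current node.
# /etc/os-release is a standard file on modern Linux distributions, including
# Rocky Linux; sourcing it defines shell variables such as NAME and VERSION_ID.
. /etc/os-release
echo "${NAME} ${VERSION_ID}"
```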


Slurm Scheduler Upgrade


The Slurm scheduler was upgraded from version 23.11.5 to 24.05.0. This release improves job scheduling and resource management, adds the features introduced in the 24.05 series, and fixes a number of bugs.
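If your workflows depend on a particular Slurm release, you can check the version of the client tools on a node. Slurm's sinfo command accepts a --version flag and prints a line such as "slurm 24.05.0". The helper below, slurm_version, is hypothetical and not a cluster-provided command. It reports "unavailable" when the Slurm tools are not on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Slurm client version on this node, or
# "unavailable" if the Slurm commands are not installed (e.g. off-cluster).
slurm_version() {
    if command -v sinfo >/dev/null 2>&1; then
        # `sinfo --version` prints e.g. "slurm 24.05.0"; keep only the number.
        sinfo --version | awk '{print $2}'
    else
        echo "unavailable"
    fi
}

slurm_version
```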


Mamba and Jupyter Environment Updates


The Mamba package manager was updated from version 1.5.1 to 1.5.9, alongside updates to the Jupyter environments. These updates improve compatibility with newer Python libraries and address performance and stability issues.

  • If you need the older Mamba environment, load it with module load mamba/.1.5.1 instead of module load mamba/latest.
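A minimal way to switch to the older environment from an interactive shell is sketched below. The function name use_old_mamba is hypothetical. The sketch assumes the cluster's module system has defined the module shell function, as it normally does on login and compute nodes. The final mamba --version call only confirms which release is active.

```shell
# Hypothetical helper: switch the current shell to the previous Mamba release.
use_old_mamba() {
    # `module` is a shell function set up by the cluster's module system;
    # stop early if it is not defined (e.g. when run off-cluster).
    if ! type module >/dev/null 2>&1; then
        echo "module command not found; run this on a cluster node" >&2
        return 1
    fi
    # A leading dot usually marks a hidden module: it can be loaded by its full
    # name even though it may not appear in a default `module avail` listing.
    module load mamba/.1.5.1 || return 1
    # Confirm which Mamba release is now on the PATH.
    mamba --version
}

use_old_mamba
```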


High-Availability Networking Repairs


Critical repairs were completed on the high-availability networking infrastructure to address reliability issues. These changes ensure a more robust and fault-tolerant network, reducing the risk of disruptions and improving overall connectivity for compute nodes and services.


Improved Zsh Compatibility


Updates were made to improve compatibility with the Zsh shell:

  • Bash functions were migrated to standalone Bash scripts, so they work as expected regardless of which shell you use.
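This change matters because a shell function exists only inside the shell that defined it. A Bash function sourced from a startup file is not visible to Zsh. An executable script that starts with a #!/usr/bin/env bash line, however, is run by Bash no matter which shell launches it. The sketch below illustrates this with a hypothetical command named hello_cluster. It is not one of the migrated site commands.

```shell
#!/usr/bin/env bash
# Illustrative only: the same logic as a Bash function vs. a standalone script.
#
# Before: a function defined in a Bash startup file, e.g.
#   hello_cluster() { echo "Hello"; }
# exists only in Bash sessions, so a Zsh user gets "command not found".

# After: put the logic in an executable file that starts with a Bash shebang.
bindir=$(mktemp -d)
trap 'rm -rf "$bindir"' EXIT   # remove the temporary directory on exit

cat > "$bindir/hello_cluster" <<'EOF'
#!/usr/bin/env bash
# The shebang tells the kernel to run this file with Bash,
# regardless of which shell invoked it.
echo "Hello from Bash ${BASH_VERSION}"
EOF
chmod +x "$bindir/hello_cluster"

# Any shell that finds the file on its PATH can now run it.
export PATH="$bindir:$PATH"
hello_cluster

# If Zsh is installed, it runs the very same script.
if command -v zsh >/dev/null 2>&1; then
    zsh -c 'hello_cluster'
fi
```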


Rebuilt OpenMPI for Broader Application Support


OpenMPI was rebuilt to expand compatibility and resolve prior issues:

  • Previously, OpenMPI was built with compilers optimized for AVX512 instructions, which caused it to fail silently on nodes without AVX512 support.

  • The new version (4.1.7) is available via module load openmpi/4.1.7.

  • The older version remains accessible via module load openmpi/4.1.5.

  • Users are encouraged to try the new module, as it will become the default in the future. However, the older module will remain available for now.
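The old build failed silently instead of producing a clear error, so it can help to know whether the node you are on supports AVX512. On x86 Linux, the kernel lists CPU feature flags in /proc/cpuinfo, and avx512f is the AVX512 Foundation flag. The helper has_avx512 below is hypothetical. It reports only what the kernel exposes and does not check which OpenMPI module is loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the current node's CPU advertises AVX512.
# Linux lists CPU feature flags in /proc/cpuinfo; "avx512f" is the AVX512
# Foundation flag reported by AVX512-capable x86 CPUs.
has_avx512() {
    grep -qw avx512f /proc/cpuinfo 2>/dev/null
}

if has_avx512; then
    echo "avx512: yes"
else
    echo "avx512: no"
fi
```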


Other Notable Changes

  • The thisjob script now falls back to $SLURM_JOB_ID when no job ID is provided, so it can be run inside a job without arguments.

  • Automated node health checks have been revised to include additional checks.
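The thisjob fallback follows a common shell pattern. If a first argument is given, the script uses it. Otherwise it uses $SLURM_JOB_ID, which Slurm sets in the environment of a running job. The sketch below illustrates that pattern only. The function name show_job_id is hypothetical, and this is not the actual thisjob source.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of a thisjob-style fallback (not the real script).
show_job_id() {
    # Prefer an explicit job ID argument; otherwise fall back to $SLURM_JOB_ID,
    # which Slurm sets inside batch and interactive jobs.
    local jobid="${1:-${SLURM_JOB_ID:-}}"
    if [[ -z "$jobid" ]]; then
        echo "usage: show_job_id [JOBID] (or run inside a Slurm job)" >&2
        return 1
    fi
    echo "$jobid"
}

show_job_id "$@"
```

For example, show_job_id 12345 prints the ID you pass, while running show_job_id with no arguments inside a job prints that job's ID.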

If you would like more information about these changes, or find that they are negatively affecting your work, please reach out to us.


We also offer a series of Educational Opportunities and Workshops.
