Recommended Reading
- 1 Overview
- 1.1 Recommended Reading
- 1.1.1 Modules and Software
- 1.1.2 The FairShare score
- 1.1.3 Using GPUs on Sol
- 1.1.4 Command-line switches
- 1.1.5 Sol Node Status
- 1.1.6 Sol XdMod (Job Statistics)
- 1.1.7 File Systems
- 2 Additional Help
Overview
The information below will strengthen your understanding of the supercomputer environment and may give you insight into the jobs you are trying to run.
Recommended Reading
That covers the basic steps, but you may still be wondering, “How do I get my specific work done?” Here is a little more reading that may help you get fully started.
Modules and Software
RC already has many software packages, and many versions of the same software, available. These are referred to as modules on the supercomputer.
Users can also install software in their home directories so long as it does not require a license, or request a software install if they would prefer a module that is not already present. Software that is free for ASU but requires a license is acceptable for modules; paid licenses are not covered by RC.
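To make this concrete, here is a minimal sketch of the usual module commands; the package name and version shown are hypothetical, so run module avail to see what Sol actually provides:

    # List the modules available on the system
    module avail

    # Load a package into your environment (name/version below are hypothetical)
    module load python/3.10

    # Show what is currently loaded, and clear everything when done
    module list
    module purge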
The FairShare score
Computational resources on the supercomputer are free for ASU faculty, students, and collaborators. To keep things fair, computational jobs are prioritized based on computational usage through a priority multiplier called FairShare, which ranges from 0 (lowest priority) to 1 (highest priority). Usage is “forgotten” via exponential decay with a half-life of one week, e.g., if a researcher instantaneously consumed 10,000 core-hour equivalents (CHE), then after one week the system would only “remember” 5,000 core hours of usage. See more on the dynamics here. CHE are tracked based on a linear combination of different hardware allocations, i.e.,
CHE = (core-hour equivalents) = (
(number of cores)
+ (total RAM allocated) / (4 GiB)
+ 3 * (number of Multi-Instance GPU slices)
+ 20 * (number of A30 GPUs)
+ 25 * (number of A100 GPUs)
) * (wall hours)
Thus, using one core with four GiB of RAM and one A100 GPU allocated for four hours would be tracked as roughly 108 CHE. Researchers who are more careful with their hardware allocations will see lower impacts on their FairShare as a result of the CHE FairShare system. Currently, the system dynamically determines the impact of CHE on FairShare as a function of total system utilization (10,000 CHE might halve FairShare this month, but only cost a quarter the following month). As the system approaches full utilization, the impact is more stable.
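As an illustrative sketch of the arithmetic above (the variable names are just for this example; the weights come straight from the formula):

    # Example job: 1 core, 4 GiB of RAM, 1 A100 GPU, 4 wall hours
    cores=1; ram_gib=4; mig_slices=0; a30_gpus=0; a100_gpus=1; wall_hours=4

    # CHE = (cores + RAM/4 GiB + 3*MIG + 20*A30 + 25*A100) * wall hours
    che=$(( (cores + ram_gib / 4 + 3*mig_slices + 20*a30_gpus + 25*a100_gpus) * wall_hours ))
    echo "CHE = $che"    # prints: CHE = 108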
All jobs will eventually run; however, researchers with higher utilization of the system may have to wait longer for their new jobs to start.
Using GPUs on Sol
Scientific research increasingly takes advantage of the power of GPUs. See our page on Requesting Resources on Sol using GPUs.
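As a rough sketch, a GPU request in a batch script can look like the following; the partition name and GPU type string are assumptions here, so check that page for the values Sol actually expects:

    #!/bin/bash
    # Partition and GPU type below are assumptions; confirm against the GPU wiki page
    #SBATCH -p general
    #SBATCH --gres=gpu:a100:1
    #SBATCH -c 4
    #SBATCH -t 0-04:00:00

    nvidia-smi    # confirm the GPU is visible inside the job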
Command-line switches
The interactive and sbatch commands accept command-line switches that greatly affect the resources a job is assigned.
See our Scheduling Jobs on Sol wiki or cheat sheet for a brief (but not complete) list of commonly used switches, as well as a list of partitions.
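For orientation, a sketch of a batch script using a few of these switches (the partition name is an assumption; the cheat sheet has the authoritative list):

    #!/bin/bash
    # -J job name, -p partition, -N nodes, -c cores per task,
    # --mem total RAM, -t wall-time limit (D-HH:MM:SS), -o output file
    #SBATCH -J my_analysis
    #SBATCH -p general
    #SBATCH -N 1
    #SBATCH -c 8
    #SBATCH --mem=16G
    #SBATCH -t 0-02:00:00
    #SBATCH -o slurm.%j.out

    ./my_program    # placeholder for the actual work

The same idea applies to interactive jobs, which accept resource switches on the command line.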
Sol Node Status
See the supercomputer’s node-level status here.
Sol XdMod (Job Statistics)
You can see day-to-day system utilization details at https://xdmod.sol.rc.asu.edu/
File Systems
There are two primary file systems, referred to as home and scratch, accessed at the paths /home/&lt;username&gt; and /scratch/&lt;username&gt;. Home provides a default 100 GB of storage, and scratch is provided for compute jobs: only data being actively computed on may reside on the scratch filesystem.
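As a short, hedged sketch of working with the two filesystems (the project and file names below are placeholders):

    # Home is for long-lived files; check your usage against the 100 GB default
    echo "$HOME"            # /home/<username>
    du -sh "$HOME"

    # Scratch is for data actively being computed on (paths below are placeholders)
    mkdir -p /scratch/$USER/my_project
    cp -r ~/my_project/input_data /scratch/$USER/my_project/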
ASU provides limited cloud storage through an enterprise license for Google Drive, which may be used for archiving data (Google Drive & Globus).
Additional details are provided on this page: Storage & Filesystems.