Slurm Fairshare Score
Fairshare and Equitable Use of the Supercomputer
Computational resources on the supercomputer are free for ASU faculty, students, and collaborators. To keep availability equitable, jobs in the queue are prioritized according to the submitting user’s Fairshare score, a factor between zero and one that depends on recent usage and scales a base priority. A user’s fairshare halves for every 10,000 core-hour equivalents (CHE) of usage; e.g., 20,000 CHE consumed today would reduce a user’s fairshare to 0.25. More specifically, FairShare = 2^{-current_core_hour_equivalent_usage / 10,000}.
Usage is “forgotten” at an exponential rate, with a half-life of one week. For instance, 20,000 CHE consumed today would be “remembered” as 10,000 CHE a week from now and, after 26 weeks, as roughly 1.07 core-second equivalents. For more details on how fairshare is dynamically controlled on RC’s systems, see this 2020 PEARC proceedings paper.
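The two rules above (halving per 10,000 CHE and a one-week half-life) can be sketched in a few lines of Python. This is only an illustration of the stated formulas; the function and constant names are hypothetical and are not part of any tool on the cluster.

```python
# Illustrative sketch of the formulas stated above; names are hypothetical.
HALVING_CHE = 10_000     # usage (in CHE) that halves the fairshare score
HALF_LIFE_WEEKS = 1.0    # remembered usage halves every week


def fairshare(usage_che: float) -> float:
    """FairShare = 2^(-usage / 10,000), a value between zero and one."""
    return 2.0 ** (-usage_che / HALVING_CHE)


def remembered_usage(usage_che: float, weeks_elapsed: float) -> float:
    """Usage still counted after exponential decay with a one-week half-life."""
    return usage_che * 0.5 ** (weeks_elapsed / HALF_LIFE_WEEKS)


if __name__ == "__main__":
    print(fairshare(20_000))                        # score right after 20,000 CHE
    print(fairshare(remembered_usage(20_000, 1)))   # score one week later
    print(remembered_usage(20_000, 26) * 3600)      # core-second equivalents after 26 weeks
```

Under these assumptions, a user who consumes 20,000 CHE and then stays idle recovers from a score of 0.25 to 0.5 after one week.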
CHE are tracked based on a linear combination of different hardware allocations, i.e.,
CHE = (core-hour equivalents) = (
(number of cores)
+ (total RAM allocated) / (4 GiB)
+ 3 * (number of Multi-Instance GPU slices)
+ 20 * (number of A30 GPUs)
+ 25 * (number of A100 GPUs)
+ 40 * (number of H100 GPUs)
) * (job runtime)
Thus, utilizing a single core with four GiB of RAM and one A100 GPU for four hours would equate to (1 + 1 + 25) * 4 = 108 CHE. Researchers who carefully manage their hardware allocations to maximize job efficiency (see our documentation page on seff and mysacct for tools to track efficiency) will experience less impact on their FairShare.
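As a rough aid for estimating a job's cost before submitting it, the CHE formula above can be written as a small Python function. The weights come from the formula in the text; the function name, the argument names, and the GPU keys ("mig", "a30", "a100", "h100") are hypothetical labels chosen for this sketch.

```python
# Illustrative sketch of the CHE formula above; names and keys are hypothetical.
GIB_PER_CORE_EQUIVALENT = 4
GPU_WEIGHTS = {"mig": 3, "a30": 20, "a100": 25, "h100": 40}


def core_hour_equivalents(cores, ram_gib, hours, gpus=None):
    """Return the CHE charged for a job.

    cores   -- number of CPU cores allocated
    ram_gib -- total RAM allocated, in GiB
    hours   -- job runtime, in hours
    gpus    -- optional mapping of GPU kind to count, e.g. {"a100": 1}
    """
    gpus = gpus or {}
    gpu_term = sum(GPU_WEIGHTS[kind] * count for kind, count in gpus.items())
    return (cores + ram_gib / GIB_PER_CORE_EQUIVALENT + gpu_term) * hours


if __name__ == "__main__":
    # The example from the text: 1 core, 4 GiB, one A100, four hours.
    print(core_hour_equivalents(1, 4, 4, {"a100": 1}))
```

Because RAM is charged per 4 GiB, requesting far more memory than a job uses raises its CHE just as requesting idle cores does.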
All jobs will eventually run; however, researchers with higher recent utilization (resulting in a lower score) may experience longer wait times as other jobs are prioritized.
Requesting more or fewer resources does not affect your position in line. However, if your job can run on resources that would otherwise sit idle, without delaying any job from a user with a greater fairshare, it becomes eligible for backfilling and may start immediately.
Checking Fairshare Score
The fairshare score is printed in the login welcome text. Two equivalent commands are also available on both Sol and Phx to check it:
myfairshare
mybalance
The value in the last column is the actual fairshare score, i.e., the final result of the calculation. If the output looks broken or abnormal, re-run the command; it may take several attempts.
Working with Fairshare Score
Here are two examples of how to work around a low fairshare score and get jobs started as soon as possible:
# | Scenario | Consequence | Workaround |
---|---|---|---|
1 | A job asking for 300 CPUs and 7 days on Sol. | This is a very large job and will wait a very long time even with a perfect fairshare score, given how busy Sol is. | Break the job into 300 * 7 * 24 = 50400 small jobs. |
2 | Launching 50400 small jobs one after another using a Python script. | Each submission takes a deduction on the fairshare score, so the waiting time gets longer and longer, and some of the jobs will wait for days. | Instead of submitting the jobs one by one, launch a job array with 50400 sub-jobs, which takes only a one-time deduction on the fairshare score, so all the sub-jobs start queuing with the same fairshare score. |
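For the job-array workaround, each sub-job needs a way to decide which piece of the work it owns. A minimal Python sketch follows. It assumes the array is submitted with indices 0 through 50399 (for example with sbatch's `--array` option) and that the work splits into one slice per CPU, day, and hour; that decomposition and the function names are hypothetical. Whether a single array of this size is accepted depends on the cluster's configured limits, which are not stated here.

```python
# Illustrative sketch: map a Slurm array index to one slice of the work.
# The (cpu, day, hour) decomposition is an assumption made for this example.
import os

CPUS, DAYS, HOURS = 300, 7, 24
TOTAL_TASKS = CPUS * DAYS * HOURS  # 50400 sub-jobs, as in the table above


def task_index(environ=os.environ):
    """Read this sub-job's index from SLURM_ARRAY_TASK_ID (0 if unset)."""
    return int(environ.get("SLURM_ARRAY_TASK_ID", "0"))


def work_item(index):
    """Translate an array index into a hypothetical (cpu, day, hour) slice."""
    if not 0 <= index < TOTAL_TASKS:
        raise ValueError(f"index {index} outside 0..{TOTAL_TASKS - 1}")
    cpu, rest = divmod(index, DAYS * HOURS)
    day, hour = divmod(rest, HOURS)
    return cpu, day, hour


if __name__ == "__main__":
    cpu, day, hour = work_item(task_index())
    print(f"processing slice cpu={cpu} day={day} hour={hour}")
```

Every sub-job runs the same script and differs only in the value of `SLURM_ARRAY_TASK_ID`, so no per-job submission logic is needed.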