FairShare Scheduling on Agave

On Agave, FairShare is a usage-dependent score between zero and one that affects a researcher's job priority. A researcher may expect their FairShare to halve for every 10,000 core hours of recent usage. That is, FairShare = 2^{-current_core_hour_usage / 10,000}. Researchers will see their FairShare score when logging into Agave, or may check it at any time with the mybalance command.

A researcher's “current usage” decays with a half-life of one week. For example, if a researcher were to suddenly run a job for 1 second that used 10,000 core hours, then their initial current usage would be 10,000 core hours and their FairShare would be 0.5. Exactly one week later, that “current usage” would be half its initial value (5,000 core hours), and the FairShare would be 2^{-0.5} (approximately 0.71). After four weeks, the original 10,000 core hours of usage would decay to 625 core hours (for a FairShare of approximately 0.96).
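The worked example above can be checked with a short Python sketch. It only encodes the two rules stated in this section (FairShare halves per 10,000 core hours of current usage, and usage has a one-week half-life); the function and constant names are illustrative and are not part of any Agave or SLURM tool.

```python
# Minimal sketch of the FairShare rules described above.
# All names here are hypothetical; only the two constants come from the text.
CORE_HOURS_PER_HALVING = 10_000   # FairShare halves for every 10,000 core hours
USAGE_HALF_LIFE_WEEKS = 1.0       # "current usage" halves every week


def decayed_usage(initial_core_hours: float, weeks_elapsed: float) -> float:
    """Usage remaining after exponential decay, assuming no new jobs run."""
    return initial_core_hours * 2.0 ** (-weeks_elapsed / USAGE_HALF_LIFE_WEEKS)


def fairshare(current_core_hours: float) -> float:
    """FairShare score for a given amount of current (decayed) usage."""
    return 2.0 ** (-current_core_hours / CORE_HOURS_PER_HALVING)


if __name__ == "__main__":
    for weeks in (0, 1, 4):
        usage = decayed_usage(10_000, weeks)
        print(f"week {weeks}: usage = {usage:.0f} core hours, "
              f"FairShare = {fairshare(usage):.2f}")
```

The sketch assumes the researcher runs nothing else during the decay period; any new usage would be added to the decaying total.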

Since usage decays exponentially with a half-life of one week, FairShare recharges exponentially. For more details, see this PEARC20 proceedings paper, which determines the dynamical nature of FairShare on Agave.
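To get a feel for how quickly FairShare recharges, the two rules can be inverted to estimate how long a researcher would wait for a target score. The following Python sketch is an illustration only: it assumes no new usage accrues during the wait, and the function name is hypothetical.

```python
import math

# Constants taken from the description of FairShare on Agave.
CORE_HOURS_PER_HALVING = 10_000   # FairShare halves per 10,000 core hours
USAGE_HALF_LIFE_WEEKS = 1.0       # usage half-life of one week


def weeks_until_fairshare(initial_core_hours: float, target_fairshare: float) -> float:
    """Weeks of idle decay needed before FairShare reaches the target.

    Hypothetical helper: assumes no further jobs run while waiting.
    """
    if not 0.0 < target_fairshare < 1.0:
        raise ValueError("target_fairshare must be strictly between 0 and 1")
    # Largest usage that still yields the target score:
    # target = 2^(-usage / 10,000)  =>  usage = -10,000 * log2(target)
    allowed_usage = -CORE_HOURS_PER_HALVING * math.log2(target_fairshare)
    if initial_core_hours <= allowed_usage:
        return 0.0  # already at or above the target
    # usage(t) = initial * 2^(-t / half_life)  =>  solve for t
    return USAGE_HALF_LIFE_WEEKS * math.log2(initial_core_hours / allowed_usage)


if __name__ == "__main__":
    # Starting from 10,000 core hours of usage (FairShare 0.5):
    print(weeks_until_fairshare(10_000, 0.9))
```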

This FairShare factor contributes, along with job size and age, to a job's final priority. The command-line tool sprio may be used to break down a job’s priority score by these factors (after multiplication by admin-defined “Priority Weights”). The next section is taken from the SLURM documentation pages to help clarify the role of FairShare in job priority calculations.

Use sq on the command line to see the job queue sorted by priority! For instance, to see all pending jobs on the main CPU partitions, run: sq -t PD -p serial,parallel,htc | less

Job Priority Factors In General

The job's priority at any given time will be a weighted sum of all the factors that have been enabled in the slurm.conf file. This is reflected in the output of sprio. Job priority can be expressed as:

Job_priority = (PriorityWeightFairshare) * (fair-share_factor) + (PriorityWeightAge) * (age_factor) + (PriorityWeightJobSize) * (job_size_factor) + (PriorityWeightQOS) * (QOS_factor) - nice_factor

All of the factors in this formula are floating-point numbers that range from 0.0 to 1.0. The weights are unsigned, 32-bit integers. The job's priority is an integer that ranges between 0 and 4,294,967,295. The larger the number, the higher the job will be positioned in the queue, and the sooner the job will be scheduled. A job's priority, and hence its order in the queue, can vary over time. For example, the longer a job sits in the queue, the higher its priority will grow when the age weight (PriorityWeightAge) is non-zero.
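The weighted sum above can be illustrated with a small Python sketch. The weight values below are made up for the example and are not Agave's actual settings; the clamping to the 32-bit range and the truncation to an integer are simplifying assumptions, not a description of SLURM's internal arithmetic.

```python
MAX_PRIORITY = 4_294_967_295  # largest unsigned 32-bit integer


def job_priority(weights: dict, factors: dict, nice: int = 0) -> int:
    """Weighted sum of priority factors minus the nice value.

    Simplified illustration of the formula above; names are hypothetical.
    Each factor is expected to lie between 0.0 and 1.0.
    """
    weighted = sum(weights[name] * factors[name] for name in weights)
    # Keep the result inside the stated priority range (an assumption here).
    return int(min(max(weighted - nice, 0), MAX_PRIORITY))


if __name__ == "__main__":
    # Hypothetical weights, standing in for the PriorityWeight* settings.
    weights = {"fairshare": 100_000, "age": 1_000, "job_size": 1_000, "qos": 1_000_000}
    # FairShare of 0.5 corresponds to 10,000 core hours of current usage.
    factors = {"fairshare": 0.5, "age": 0.25, "job_size": 1.0, "qos": 0.0}
    print(job_priority(weights, factors))            # 51250
    print(job_priority(weights, factors, nice=100))  # 51150
```

With weights like these, the FairShare term dominates the sum, so a change in recent usage moves a job in the queue far more than its age or size does.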

Age Factor

The age factor represents the length of time a job has been sitting in the queue and eligible to run. In general, the longer a job waits in the queue, the larger its age factor grows. However, the age factor for a dependent job will not change while it waits for the job it depends on to complete. Also, the age factor will not change when scheduling is withheld for a job whose node or time limits exceed the scheduler’s current limits.

Job Size Factor

The job size factor correlates to the number of nodes or CPUs the job has requested. A single-node job receives a job size factor of 1.0.

Quality of Service (QOS) Factor

On Agave, researchers who contribute nodes to the supercomputer are provided an exclusive QOS to access those nodes. These exclusive QOSs have very large QOS factors to ensure that hardware contributors have top priority on their own equipment. All other QOS factors are set to zero.

Nice Factor

This is an atypical factor that can be set with the sbatch --nice flag, which allows users to decrease the priority of their own jobs. As with the Unix nice command, positive values negatively impact a job's priority. The adjustment range is up to 2,147,483,645.