With faculty governance board approval, on September 4th, 2019, ASU Research Computing began running a new compute job scheduling algorithm called FairShare on the main cluster.

...

  • FairShare is implemented only on the public Agave cluster. No changes are being made to the legacy Saguaro cluster or the private Ocotillo cluster.
  • No jobs that were running at the time we made the change were interrupted.  
  • FairShare replaced the monthly 30,000 CPU-hour credit system, under which jobs submitted beyond this balance became preemptible.
  • CPU hours are now relative rather than a fixed monthly allotment. Under FairShare, all jobs that start on public cluster resources will run to completion. No jobs run on public resources will be preempted.
  • FairShare is not a free-for-all, however.  When the cluster is under heavy load and jobs are waiting to start, the order in which those waiting jobs will be scheduled to run is determined by each researcher's FairShare score.
  • The more CPU hours a researcher has used in a given month, the lower their score becomes, and the lower their priority is for newly submitted jobs. Their jobs will have a lower priority than those submitted by a researcher with a higher FairShare score.
  • All submitted jobs will eventually run, but the order will be determined by the FairShare scores of each user, not the order in which those jobs are submitted.
  • When the cluster has available, unused resources and no jobs are waiting to run, a low FairShare score will not prevent a researcher's jobs from running. FairShare only becomes important when there are jobs waiting to be run. However, jobs that run when the cluster is relatively idle will still affect a researcher's FairShare score.
  • At the start of each month, every user's FairShare score is reset to the same starting value, much the same way that CPU hours reset under the previous system.
  • A researcher may expect their FairShare score to halve for every 10,000 hours of research recently conducted. FairShare recharges exponentially, as a researcher's usage history decays with a half-life of one week.
  • The Wildfire queue will no longer exist on public resources, but will still exist on private cluster resources, such as faculty-owned GPU systems.
  • Wildfire jobs run on private cluster resources are still subject to preemption by the owners of these resources.
  • Wildfire jobs run on private cluster resources DO NOT affect a researcher's FairShare score.
  • Privileged jobs run by the owners of private resources DO NOT affect the owner’s FairShare score.
  • The aggressive queue will still exist in name, but using this queue will have no impact on job scheduling.  The name is being kept active for the convenience of our customers.
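
The scoring and ordering rules above can be illustrated with a minimal Python sketch. It is not the scheduler's actual implementation. It takes the two figures stated above (a score that halves per 10,000 hours of recent usage, and usage that decays with a one-week half-life) and assumes the rest: a score normalized to 1.0 for a researcher with no recent usage, ties broken by submission order, and the hypothetical function names effective_usage, fairshare_score, and schedule_order.

```python
HALF_LIFE_DAYS = 7.0        # usage history half-life of one week (from the text)
HOURS_PER_HALVING = 10_000  # score halves per 10,000 recent hours (from the text)


def effective_usage(usage_events, now_day):
    """Sum usage, discounting each entry by its age.

    usage_events is a list of (day, cpu_hours) pairs; an entry that is
    one half-life old counts for half of its original hours.
    """
    return sum(
        hours * 0.5 ** ((now_day - day) / HALF_LIFE_DAYS)
        for day, hours in usage_events
    )


def fairshare_score(usage_events, now_day):
    """Assumed normalization: 1.0 with no recent usage, halving per 10,000 hours."""
    return 0.5 ** (effective_usage(usage_events, now_day) / HOURS_PER_HALVING)


def schedule_order(waiting_jobs, scores):
    """Order waiting jobs by owner score, highest first.

    waiting_jobs is a list of (job_id, user) pairs in submission order.
    sorted() is stable, so equal scores keep their submission order
    (an assumption; the text does not say how ties are broken).
    """
    return sorted(waiting_jobs, key=lambda job: -scores[job[1]])


if __name__ == "__main__":
    # Hypothetical researchers: alice used 20,000 hours today, bob used 10,000 a week ago.
    usage = {"alice": [(14, 20_000)], "bob": [(7, 10_000)]}
    scores = {user: fairshare_score(events, now_day=14) for user, events in usage.items()}
    queue = [("job-1", "alice"), ("job-2", "bob")]
    print(scores)
    print(schedule_order(queue, scores))
```

In this sketch, a researcher who stops submitting jobs sees their effective usage halve each week, so their score climbs back toward the starting value without waiting for a reset.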

...