Poll Job Runtime Stats

sstat

This command lets a user pull up status information about their running jobs: CPU usage, task information, node information, resident set size (RSS), and virtual memory (VM).

Basic usage is shown below. Use the --format flag to limit the output to the relevant details.

sstat --jobs=your_job-id

Some relevant --format variables are listed in the table below:

Variable      Description
avecpu        Average (system + user) CPU time of all tasks in job.
averss        Average resident set size of all tasks in job.
jobid         The number of the job or step.
maxdiskwrite  Maximum number of bytes written by all tasks in job.
ntasks        Total number of tasks in a job or step.

The example below uses the --format flag to print only the job's ID, average CPU time, and number of tasks.

sstat --jobs=your_job-id --format=jobid,avecpu,ntasks

sacct

This command lets a user pull up accounting information about past jobs.

Specify either a job ID or a username with the --jobs or --user flag, respectively, to pull up all information on the matching jobs:

sacct --jobs=<jobid[,additional_jobids]>

Some available --format variables are listed in the table below and may be passed as a comma-separated list:

Variable      Description
account       Account the job ran under.
allocTRES     Allocated trackable resources (e.g. cores/RAM).
avecpu        Average CPU time of all tasks in job.
cputime       Elapsed time multiplied by the number of allocated cores, as a formatted time.
elapsed       Job's elapsed time, formatted as DD-HH:MM:SS.
state         The job's state.
jobid         The ID of the job.
jobname       The name of the job.
maxdiskread   Maximum number of bytes read.
maxdiskwrite  Maximum number of bytes written.
maxrss        Maximum RAM use of all job tasks.
ncpus         The number of allocated CPUs.
nnodes        The number of allocated nodes.
ntasks        Number of tasks in a job.
priority      Slurm priority.
qos           Quality of service.
user          Username of the person who ran the job.

Examples for better understanding job hardware utilization

Note that by default, only jobs run on the current day will be listed. To search within a different period of time, use the --starttime flag. The --long flag can also be used to show a non-abbreviated version of sacct output. For example, to list detailed job characteristics for a user’s jobs since December 15th, 2020:
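
A command matching this description could be sketched as below. The helper name jobs_since and its two arguments are hypothetical, introduced only for illustration; the sacct flags themselves (--user, --starttime, --long) are the ones described above.

```shell
# Hypothetical helper: list every sacct field for one user's jobs
# that started on or after a given date.
jobs_since() {
    # $1 = username, $2 = start date as YYYY-MM-DD
    sacct --user="$1" --starttime="$2" --long
}

# Example call (needs a system with Slurm installed):
# jobs_since "$USER" 2020-12-15
```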

This produces a lot of output. For formatted output, the following command lists, for a user's jobs that ran today, the job ID, average CPU use, maximum amount of RAM (memory) used, core time (wall time multiplied by the number of cores allocated), and the job's state:
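
One way to write such a command is sketched here. The field list is taken from the description above; the helper name job_usage_today is an assumption made for illustration.

```shell
# Hypothetical helper: summarize today's jobs for one user with the
# fields named in the text (job ID, average CPU, max RSS, core time, state).
job_usage_today() {
    # $1 = username
    sacct --user="$1" --format=jobid,avecpu,maxrss,cputime,state
}

# Example call (needs a system with Slurm installed):
# job_usage_today "$USER"
```

Since no --starttime is given, sacct falls back to its default of listing only jobs from the current day.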

The above command, combined with appropriate --starttime filtering, is useful for working out more efficient hardware requests for future jobs. For instance, if maxrss is 1 GB, then the default memory allocated to a job (4 GB+) is more than sufficient.
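
To make that comparison easier, a maxrss value can be converted to gigabytes before setting it against the memory request. The sketch below is only illustrative: rss_to_gib is a hypothetical helper, and it assumes the value carries a K, M, or G suffix denoting powers of 1024 (a value with no suffix is treated as plain bytes). Check how maxrss is reported on your system before relying on it.

```shell
# Hypothetical helper: convert a maxrss-style value such as "1048576K"
# to GiB with two decimals. Assumes K/M/G suffixes are powers of 1024.
rss_to_gib() {
    printf '%s\n' "$1" | awk '{
        value = $1 + 0                      # leading number, e.g. 1048576
        unit  = substr($1, length($1), 1)   # last character, e.g. K
        if (unit == "K")      gib = value / (1024 * 1024)
        else if (unit == "M") gib = value / 1024
        else if (unit == "G") gib = value
        else                  gib = value / (1024 * 1024 * 1024)  # bytes
        printf "%.2f\n", gib
    }'
}

# Illustrative input, not real job data:
rss_to_gib 1048576K   # prints 1.00
```

A result well below the allocated memory suggests the request can stay at the default or be reduced.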

Another useful --format field is allocTRES%42, which prints the allocated “trackable resources” associated with the job in a column 42 characters wide, e.g. billing=1,cpu=1,mem=4G,node=1 would be printed for a 1-core job. The allocTRES field is helpful for comparison with the avecpu and maxrss values, for instance.

If a + is listed at the end of a field, then that field has likely been truncated to fit into a fixed number of characters. Consider increasing the width by appending a % followed by a number to the field name. For example, allocTRES%42 overrides the default width with 42 characters.

mysacct

For convenience, the command mysacct has been added to the system. This is equivalent to sacct --user=$USER --format=jobid,avecpu,maxrss,cputime,allocTRES%42,state and accepts the same flags that sacct would, e.g. --starttime=YYYY-MM-DD or --endtime=YYYY-MM-DD.
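
On a system without mysacct, a roughly equivalent wrapper could be defined as in the sketch below. This is an assumption based on the description above, not the actual implementation of the site-provided command; the name mysacct_sketch is hypothetical.

```shell
# Sketch of a wrapper matching the description of mysacct; the real
# command may be implemented differently.
mysacct_sketch() {
    # "$@" forwards any extra sacct flags, e.g. --starttime or --endtime.
    sacct --user="$USER" \
          --format=jobid,avecpu,maxrss,cputime,allocTRES%42,state "$@"
}

# Example call (needs a system with Slurm installed):
# mysacct_sketch --starttime=2020-12-01 --endtime=2020-12-15
```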

References:
https://curc.readthedocs.io/en/latest/running-jobs/slurm-commands.html#learning-status-information-with-sstat

https://slurm.schedmd.com/sstat.html

https://slurm.schedmd.com/sacct.html

man sacct