Poll Job Runtime Stats

sstat

This command allows a user to pull up status information about their currently running jobs: CPU usage, task information, node information, resident set size (RSS), and virtual memory (VM).

Basic usage is as follows, but the --format flag can and should also be used to limit output to relevant details.

sstat --jobs=your_job-id

Some relevant --format variables are listed in the table below:

Variable	Description
avecpu	Average (system + user) CPU time of all tasks in job.
averss	Average resident set size of all tasks in job.
jobid	The number of the job or step.
maxdiskwrite	Maximum number of bytes written by all tasks in job.
ntasks	Total number of tasks in a job or step.

The example below uses the --format flag to print only the job ID, average CPU time, and number of tasks.

sstat --jobs=your_job-id --format=jobid,avecpu,ntasks
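To watch a running job over time, sstat can simply be called in a loop. The sketch below assumes a bash- or dash-compatible shell. poll_job_stats is a hypothetical helper (not a command provided on the system), and its sample count and interval defaults are arbitrary. It uses sstat's --noheader and --parsable2 flags so each sample prints without a header, with fields separated by '|'.

```shell
# Hypothetical helper: poll sstat for a running job a fixed number of times.
# Usage: poll_job_stats JOBID [COUNT] [INTERVAL_SECONDS]
poll_job_stats() {
  local jobid="$1"
  local count="${2:-5}"      # number of samples (arbitrary default)
  local interval="${3:-60}"  # seconds between samples (arbitrary default)
  local i=1
  while [ "$i" -le "$count" ]; do
    # --noheader drops the column header; --parsable2 separates fields with '|'
    sstat --jobs="$jobid" --noheader --parsable2 \
          --format=jobid,avecpu,averss,ntasks
    # Do not sleep after the final sample
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}
```

For example, poll_job_stats 1234567 10 30 (with 1234567 standing in for a real job ID) takes ten samples, 30 seconds apart.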

sacct

This command allows a user to easily pull up information about past jobs that have completed.

Specify either job IDs or a username with the --jobs or --user flag, respectively, to pull up information on those jobs:

sacct --jobs=<jobid[,additional_jobids]>
sacct --user=<username>

Some available --format variables are listed in the table below; pass them to --format as a comma-separated list.

Variable	Description
`account`	Account the job ran under.
`allocTRES`	Allocated trackable resources (e.g. cores/RAM)
`avecpu`	Average CPU time of all tasks in job.
`cputime`	Elapsed time multiplied by the number of allocated cores, formatted like `elapsed`.
`elapsed`	The job's elapsed time, formatted as [DD-]HH:MM:SS.
`state`	The job’s state
`jobid`	The id of the job.
`jobname`	The name of the job.
`maxdiskread`	Maximum number of bytes read
`maxdiskwrite`	Maximum number of bytes written
`maxrss`	Maximum RAM use of all job tasks
`ncpus`	The number of allocated CPUs
`nnodes`	The number of allocated nodes
`ntasks`	Number of tasks in a job
`priority`	Slurm priority
`qos`	Quality of service
`user`	Username of the person who ran the job
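Since cputime is the elapsed wall time multiplied by the allocated core count, it can be reproduced from the elapsed and ncpus fields. The sketch below is illustrative only: elapsed_to_seconds and core_seconds are hypothetical shell helpers, and they assume elapsed is printed as [DD-]HH:MM:SS.

```shell
# Hypothetical helper: convert an elapsed value ([DD-]HH:MM:SS) to seconds.
# Returns non-zero for any other format.
elapsed_to_seconds() {
  printf '%s\n' "$1" | awk -F'[-:]' '
    NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; next }  # DD-HH:MM:SS
    NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }               # HH:MM:SS
    { exit 1 }                                                     # unrecognized
  '
}

# Hypothetical helper: core-seconds = elapsed seconds * allocated cores (ncpus)
core_seconds() {
  local secs
  secs=$(elapsed_to_seconds "$1") || return 1
  echo $((secs * $2))
}
```

For example, a job with an elapsed time of 00:10:00 on 4 cores used 2400 core-seconds, which corresponds to a cputime of 00:40:00.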

Examples for better understanding job hardware utilization

Note that by default, only jobs run on the current day are listed. To search within a different period of time, use the --starttime flag. The --long flag can also be used to show a non-abbreviated version of sacct output. For example, to list detailed job characteristics for a user’s jobs since December 15th, 2020:

sacct --user=<username> --starttime=2020-12-15 --long

This produces a lot of output. As an example of formatted output, the following command lists information about a user’s jobs that ran today, specifically the job ID, average CPU use, maximum amount of RAM (memory) used, core time (wall time multiplied by the number of allocated cores), and job state:

sacct --user=<username> --format=jobid,avecpu,maxrss,cputime,state

The above command, combined with appropriate --starttime filtering, is very useful for deciding on more efficient hardware requests for future jobs. For instance, if maxrss is about 1 GB, then the default memory allocated to a job (4 GB or more) is more than sufficient.
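One way to act on this comparison is to express maxrss as a percentage of the memory requested. The sketch below is hypothetical: mem_to_mib and mem_usage_pct are illustrative helpers, and they assume K/M/G/T suffixes are 1024-based multiples, with a bare number treated as bytes.

```shell
# Hypothetical helper: convert a memory value such as 1048576K or 4G to whole MiB.
# Assumes K/M/G/T are 1024-based; a bare number is taken as bytes.
mem_to_mib() {
  awk -v s="$1" 'BEGIN {
    n = length(s)
    u = toupper(substr(s, n, 1))
    f["K"] = 1 / 1024; f["M"] = 1; f["G"] = 1024; f["T"] = 1048576
    if (u in f) { v = substr(s, 1, n - 1) * f[u] } else { v = s / 1048576 }
    printf "%.0f\n", v
  }'
}

# Hypothetical helper: integer percentage of requested memory actually used.
# Usage: mem_usage_pct MAXRSS REQUESTED
mem_usage_pct() {
  local used req
  used=$(mem_to_mib "$1")
  req=$(mem_to_mib "$2")
  [ "$req" -gt 0 ] || return 1   # avoid dividing by zero
  echo $((used * 100 / req))
}
```

With the numbers from the text, mem_usage_pct 1G 4G prints 25, meaning only a quarter of the default request was used.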

Another useful field for --format is allocTRES%42, which prints the allocated “trackable resources” associated with the job in a column 42 characters wide; e.g. billing=1,cpu=1,mem=4G,node=1 would be printed for a 1-core job. The allocTRES field is helpful for comparison with the avecpu and maxrss values, for instance.
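Because the allocTRES string is a comma-separated list of key=value pairs, a single resource can be pulled out of it for comparison with maxrss or avecpu. tres_value below is a hypothetical helper for illustration, not a Slurm command.

```shell
# Hypothetical helper: print the value of one key from an allocTRES string,
# e.g. tres_value "billing=1,cpu=1,mem=4G,node=1" mem  ->  4G
# Returns non-zero if the key is absent.
tres_value() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F= -v k="$2" '
    $1 == k { print $2; found = 1 }
    END { exit !found }
  '
}
```

Splitting on commas first keeps the lookup exact, so a key such as cpu is not confused with any other key that merely contains those letters.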

If a + appears at the end of a field, that field has likely been truncated to fit into a fixed number of characters. To widen the column, append a % followed by a number to the field name. For example, allocTRES%42 overrides the default width to 42 characters.

mysacct

For convenience, the command mysacct has been added to the system. This is equivalent to sacct --user=$USER --format=jobid,avecpu,maxrss,cputime,allocTRES%42,state and accepts the same flags that sacct would, e.g. --starttime=YYYY-MM-DD or --endtime=YYYY-MM-DD.
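For reference, a wrapper with the same behavior could be written as a shell function like the one below. This is only a sketch of the equivalence described above; mysacct_sketch is a hypothetical name chosen so it does not shadow the real command, and the system's actual mysacct may be implemented differently.

```shell
# Sketch of a mysacct-style wrapper: fixed --user and --format options,
# with any extra arguments (e.g. --starttime=YYYY-MM-DD) passed through to sacct.
mysacct_sketch() {
  sacct --user="$USER" \
        --format=jobid,avecpu,maxrss,cputime,allocTRES%42,state \
        "$@"
}
```

Called as mysacct_sketch --starttime=2021-01-01, it runs sacct with the fixed options followed by the extra flag.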

References:
https://curc.readthedocs.io/en/latest/running-jobs/slurm-commands.html#learning-status-information-with-sstat

https://slurm.schedmd.com/sstat.html

https://slurm.schedmd.com/sacct.html

man sacct