New User Guide for RC Compute Resources
Your first time using a High-Performance Computing (HPC) environment can be intimidating, but it doesn’t have to be. This guide will get you started with the basics.
This article assumes a basic familiarity with the Linux command line. If you are new to Linux, or need a refresher, RC has created a guide at https://jyalim.github.io/agave-shell-novice/
This document also assumes you have already requested and been granted an account. If not, please see Getting an Account.
Please also familiarize yourself with our policies before getting started. Sections 4-8 in particular contain important information on permitted and prohibited activities. If not all the terms make sense yet, don’t worry; we cover them further down in this document.
If you run into problems or need additional help, we hold regular weekly office hours (Holiday & Summer hours may vary).
Quick Start
For users who have never used any HPC environment before, we would recommend reading through the detailed start.
However, for those who wish to get started quickly, please be sure to read our policies before diving into the basic steps:
Choose a connection method (command prompt/terminal/browser)
Connect to the ASU VPN
Transfer files as needed
Log in with your ASURITE & password
Run an interactive session or create an SBATCH script
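For SSH users, the steps above can be sketched as follows. The hostname and the exact session command are illustrative assumptions; use the login address and instructions from your account details:

```shell
# Quick-start sketch (hostname is illustrative; substitute the real login address).
# Step 1: connect to the ASU VPN with the Cisco client before anything else.

# Step 2: copy input data to your scratch directory (run from your own machine):
scp mydata.csv <asurite>@agave.asu.edu:/scratch/<asurite>/

# Step 3: log in with your ASURITE and password:
ssh <asurite>@agave.asu.edu

# Step 4: from the login node, ask the scheduler for a compute node,
# either interactively or by submitting a batch script:
srun --pty bash    # interactive session (generic Slurm form)
sbatch myjob.sh    # or submit an unattended batch job
```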
Important Terms
Login Node: A node intended as a jumping point to compute nodes. Login nodes have minimal resources and should not be used for any application that consumes a lot of CPU or memory. Also known as a head node.
Compute Node: Nodes intended for heavy computation. This is where all heavy processing should be done.
RC: Short for Research Computing, the team that manages the ASU HPC supercomputer.
HPC: Short for “High Performance Computing,” it refers to a group (cluster) of computers designed for parallelism across many computers at once. Publicly these are often called “supercomputers.”
Cluster: A group of interconnected computers that can work cooperatively or independently.
Job: Work assigned to be done on a compute node. A job is created any time the scheduler assigns work to a compute node.
Scheduler: The application on our end that assigns compute resources for jobs.
Slurm: The brand name of our scheduler, much as “Airbnb” and “VRBO” are both brand names for short-term rentals.
Detailed Start
Choosing a connection method
Research Computing provides three methods for connecting to the supercomputer. Each has its advantages and disadvantages.
Connecting with SSH is the most versatile method; however, it tends to be slower with graphical applications. For example, if you intend to use MATLAB graphically (as opposed to the MATLAB command line only), the screen draw will be very slow. For graphical applications we recommend our webapp instead.
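As a sketch, connecting over SSH looks like the following; the hostname is a placeholder for the actual login address:

```shell
# Plain SSH login (substitute the real login address for the placeholder):
ssh <asurite>@agave.asu.edu

# X11 forwarding lets graphical applications draw on your local screen,
# but this channel is exactly why graphical screen draws are slow:
ssh -Y <asurite>@agave.asu.edu
```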
Our webapp has become the standard for new users, as it provides a file system viewer and editor, a job submission tool, the ability to view the job queue, and a zoo of interactive applications including a virtual desktop, Jupyter Lab, and RStudio. In the file manager, uploading files is as easy as dragging-and-dropping through the interface! This webapp is accessible through login.rc.asu.edu.
The virtual desktop provided by login.rc.asu.edu is the best way to use graphical applications on the supercomputer. However, please try to avoid using graphical sessions unless you’re first learning how to work with the supercomputer or you’re working with software that is only accessible through a graphical user interface. The goal of any interactive session on the supercomputer should be to develop a working sbatch (scheduler) script so that you may properly begin to take advantage of what supercomputing offers.
Connect through the Cisco VPN
All RC resources require the user to be connected to the ASU Cisco VPN. While it is sometimes possible to connect without the VPN while on campus, this is not reliable; always connecting to the VPN first, even on campus, will help you avoid issues.
For details please go to the SSL VPN page
PLEASE NOTE: If you are having trouble connecting to the ASU VPN you will need to contact ASU support. RC does not have any control or insight into the VPN and cannot assist with VPN issues.
Transfer needed files
This step is optional; however, most research is likely to require data sets to be imported. For details please see Transferring files or Google Drive & Globus.
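For command-line transfers, standard tools such as scp and rsync work against the login node. This is a sketch with a placeholder hostname:

```shell
# Copy a data set from your machine into your scratch directory:
scp -r my_dataset/ <asurite>@agave.asu.edu:/scratch/<asurite>/

# rsync can resume interrupted transfers and skips files already copied:
rsync -avP my_dataset/ <asurite>@agave.asu.edu:/scratch/<asurite>/my_dataset/
```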
Login to Agave
You should now be ready to log into Agave using the method of your choice from above. When you first log in you will be connected to a login node. Continue reading for information on getting a compute node.
Run Interactive or SBATCH
If you are using RC’s Jupyter or RStudio, this section can be skipped. If you are using a personally installed version of RStudio or Jupyter, continue with this part.
Once you have a command prompt, there are two ways to get to a compute node:
Interactive: Will assign a compute node and connect your command prompt to it. This is good when working by hand to establish the commands needed to run your work. When your session disconnects, the interactive session also closes. Any unsaved work will be lost.
sbatch: This is a method of telling the scheduler you want an unattended job run. When an sbatch job is submitted, it will run until it either completes, fails, or runs out of time. Once submitted, sbatch jobs run without requiring you to remain connected to the supercomputer.
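A minimal sbatch script might look like the sketch below. The resource values and module name are illustrative assumptions; adjust them to your work:

```shell
#!/bin/bash
#SBATCH --job-name=myjob        # short name shown in the queue
#SBATCH --time=0-01:00:00       # wall-time limit (1 hour); the job ends if this runs out
#SBATCH --ntasks=1              # one task (process)
#SBATCH --cpus-per-task=4       # cores for that task
#SBATCH --mem=8G                # memory for the whole job
#SBATCH --output=slurm.%j.out   # output file; %j expands to the job ID

# Load the software the job needs (module name is illustrative):
module load python/3.8

python my_analysis.py
```

Submit with `sbatch myjob.sh`; the printed job ID is what lets RC pull detailed information if the job fails, and `squeue -u $USER` shows the job waiting or running in the queue.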
Recommended Reading
That covers the basic steps, but you may still be wondering, “How do I get my specific work done?” Here’s a little more reading that may help you get fully started.
Modules and Software
RC already has many software packages and many versions of the same software available. They can be accessed using modules.
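Modules follow the standard Environment Modules/Lmod commands; the package and version shown are illustrative:

```shell
module avail              # list every available module
module avail matlab       # search for modules matching a name
module load matlab/2020a  # load a specific version (illustrative)
module list               # show currently loaded modules
module unload matlab/2020a
module purge              # unload everything and start clean
```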
Users can also install software in their home directory so long as it does not require a license. Users can also request a software install if they prefer to have a module available and the module is not already present. Software that is free for ASU but requires a license is acceptable for modules. Paid licenses are not covered by RC.
The FairShare score
Submitted jobs are subject to the FairShare score. The more a user uses the supercomputer, the lower their score. Jobs will always eventually run, however, the lower the score, the longer a job may wait before actually starting (depending on the number of jobs pending in the queue). Usage is tracked but “forgotten” through exponential decay with a half-life of one week. A user may expect their FairShare score to halve for every 10,000 core hours of tracked usage.
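To make the decay concrete: with a one-week half-life, usage tracked n weeks ago contributes usage × 0.5ⁿ today. A quick sketch of the arithmetic for 40,000 core-hours of past usage:

```shell
# Remembered contribution of 40,000 core-hours after 0-3 weeks
# (a one-week half-life halves the tracked usage every week):
awk 'BEGIN { for (w = 0; w <= 3; w++) printf "week %d: %d core-hours\n", w, 40000 / 2^w }'
```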
Using GPUs
Scientific research increasingly takes advantage of the power of GPUs. See our page on using GPUs
Command line switches
Interactive sessions and sbatch can take some command line switches which greatly affect the resources a job is assigned.
See our cheat sheet for a brief (but not complete) list of commonly used switches.
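A few of the most commonly used switches, sketched with illustrative values:

```shell
# These switches work the same for sbatch and interactive/srun sessions:
#   --time=0-04:00:00   wall-time limit (days-hours:minutes:seconds)
#   --ntasks=1          number of tasks (processes)
#   --cpus-per-task=8   cores per task
#   --mem=16G           memory for the whole job
sbatch --time=0-04:00:00 --ntasks=1 --cpus-per-task=8 --mem=16G myjob.sh
```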
RC status page
We recommend you check https://rcstatus.asu.edu regularly. This page contains important updates about planned and unplanned outages.
You can also see the status of the supercomputer, including utilization, at https://rcstatus.asu.edu/agave/smallstatus.php Each box represents a compute node on the supercomputer. Hovering your mouse over a box provides detailed information on the node, including what resources are available and which partitions it is assigned to. A legend for the node labels is given here: Node Legend for rcstatus (archived).
File Systems
There are two file systems available on Agave by default to first-time users, referred to as home and scratch. These are accessed at the paths /home/<username> and /scratch/<username>. Home provides a default 100 GB of storage, and scratch is provided for compute jobs: only actively computed data may reside on the scratch filesystem.
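Standard tools show where these file systems are and how much of the home quota you are using:

```shell
echo "$HOME"                # your home directory, i.e. /home/<username>
du -sh "$HOME" 2>/dev/null  # total size of home (100 GB default quota)
echo "/scratch/$USER"       # your scratch path, for actively computed data only
```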
ASU provides cloud storage through an enterprise license for Google Drive, which may be used for archiving data (Google Drive & Globus).
Additional details are provided on this page.
Additional help
Once you have gone through this document, if you still require additional assistance, you can submit a ticket at https://rcstatus.asu.edu/servicerequest/
If your job is failing, a jobID helps us significantly as we can pull detailed information about the job by using the ID.
For a great reference on building proficiency with command-line tools, we provide the following MIT link from CSAIL.