New User Guide for RC Compute Resources

Your first time using a High-Performance Computing (HPC) environment can be intimidating, but it doesn’t have to be. This guide will get you started with the basics.

This article assumes a basic familiarity with the Linux command line. If you are new to Linux or need a refresher, RC has created a guide at https://jyalim.github.io/agave-shell-novice/

This document also assumes you have already requested and been granted an account. If not, please see Getting an Account.

Please also familiarize yourself with our policies before getting started. Sections 4-8 in particular contain important information on permitted and prohibited activities. If not all the terms make sense yet, don’t worry; we’ll cover them further down in this document.

If you run into problems or need additional help, we hold regular weekly office hours (Holiday & Summer hours may vary).

Quick Start

For users who have never used any HPC environment before, we recommend reading through the Detailed Start section below.

However, for those who wish to get started quickly, please be sure to read our policies before diving into the basic steps:

  1. Choose a connection method (command prompt/terminal/browser)

  2. Connect to the ASU VPN

  3. Transfer files as needed

  4. Login with your ASURITE & password

  5. Run an interactive session or create an SBATCH script

Important Terms

  • Login Node: A node intended as a jumping point to compute nodes. Login nodes have minimal resources and should not be used for any application that consumes a lot of CPU or memory. Also known as a head node.

  • Compute Node: A node intended for heavy computation. This is where all resource-intensive processing should be done.

  • RC: Short for Research Computing, the team that manages the ASU HPC supercomputer.

  • HPC: Short for “High Performance Computing,” it refers to a group (cluster) of computers designed for parallelism across many computers at once. Publicly, these are often called “supercomputers.”

  • Cluster: A group of interconnected computers that can work cooperatively or independently.

  • Job: Work assigned to be done on a compute node. Any time compute resources are assigned, a job is created.

  • Scheduler: The application on our end that assigns compute resources for jobs.

  • Slurm: The brand name of our scheduler, much like “AirBNB” and “VRBO” are brand names for short-term rentals.

Detailed Start

Choosing a connection method

Research Computing provides three methods for connecting to the supercomputer. Each has its advantages and disadvantages.

SSH (from a command prompt or terminal) is the most versatile method; however, it tends to be slow with graphical applications. For example, if you intend to use MATLAB graphically (as opposed to the MATLAB command line only), screen drawing will be very slow. For graphical applications we recommend our webapp instead.
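
As a rough sketch, connecting over SSH from a terminal looks like the following. The login hostname below is a placeholder, not the actual address; use the hostname given in our connection documentation, and replace <ASURITE> with your own ID:

  # Replace <login-hostname> with the address from the RC connection docs
  # and <ASURITE> with your ASURITE ID
  ssh <ASURITE>@<login-hostname>

You will be prompted for your ASURITE password and, once authenticated, placed on a login node.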

Our webapp has become the standard for new users, as it provides a file system viewer and editor, a job submission tool, the ability to view the job queue, and a zoo of interactive applications including a virtual desktop, Jupyter Lab, and RStudio. In the file manager, uploading files is as easy as dragging-and-dropping through the interface! This webapp is accessible through login.rc.asu.edu.

The virtual desktop provided by login.rc.asu.edu is the best way to use graphical applications on the supercomputer. However, please try to avoid using graphical sessions unless you’re first learning how to work with the supercomputer or you’re working with software that is only accessible through a graphical user interface. The goal of any interactive session on the supercomputer should be to develop a working sbatch (scheduler) script so that you may properly begin to take advantage of what supercomputing offers.

Connect through the Cisco VPN

All RC resources require the user to be connected to the ASU Cisco VPN. While it is sometimes possible to connect without the VPN while on campus, there are times when the VPN may still be required. Always connecting to the VPN first, even while on campus, will help avoid issues.

For details, please see the SSL VPN page.

PLEASE NOTE: If you are having trouble connecting to the ASU VPN you will need to contact ASU support. RC does not have any control or insight into the VPN and cannot assist with VPN issues.

Transfer needed files

This step is optional; however, most research requires importing data sets. For details, please see Transferring Files or Google Drive & Globus.
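
For simple command-line transfers, scp or rsync are the usual tools. The hostname and paths below are placeholders; the Transferring Files page covers the supported endpoints in detail:

  # Copy a local data set into your scratch directory (placeholders in angle brackets)
  scp -r ./my_dataset <ASURITE>@<login-hostname>:/scratch/<ASURITE>/
  # rsync can resume interrupted transfers, which helps with large data sets
  rsync -avP ./my_dataset <ASURITE>@<login-hostname>:/scratch/<ASURITE>/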

Login to Agave

You should now be ready to log into Agave using your chosen connection method from above. When you first log in, you will be connected to a login node. Continue reading for information on getting a compute node.

Run Interactive or SBATCH

If you are using RC’s Jupyter or RStudio applications, you can skip this section. If you are using a personally installed version of Jupyter or RStudio, continue with this part.

Once you have a command prompt, there are two ways to get to a compute node:

Interactive: Assigns a compute node and connects your command prompt to it. This is good when working by hand to establish the commands needed to run your work. When your session disconnects, the interactive session also closes, and any unsaved work will be lost.
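
As a minimal sketch using standard Slurm commands (your cluster may also provide its own interactive wrapper; the resource values below are only illustrative):

  # Request one node with 4 cores for 2 hours and open a shell on it
  srun -N 1 -c 4 -t 0-2:00 --pty bash
  # When finished, type "exit" to release the node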

sbatch: This is a method of telling the scheduler that you want to run an unattended job. When an sbatch script is submitted, the job runs until it completes, fails, or runs out of time. Once submitted, sbatch jobs run whether or not you remain connected to the supercomputer.
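
A minimal sbatch script might look like the sketch below; the job name, resource values, module, and script name are placeholders rather than recommendations:

  #!/bin/bash
  #SBATCH -J example_job          # job name (placeholder)
  #SBATCH -N 1                    # number of nodes
  #SBATCH -c 4                    # CPU cores per task
  #SBATCH -t 0-04:00              # wall-time limit (D-HH:MM)
  #SBATCH -o slurm.%j.out         # standard output file (%j expands to the job ID)
  #SBATCH -e slurm.%j.err         # standard error file

  module load python/3.9.0        # placeholder module; run "module avail" to see real versions
  python my_script.py             # placeholder for the work you actually want to run

Save this as something like my_job.sh and submit it with "sbatch my_job.sh"; the scheduler will print the job ID it assigns.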

That covers the basic steps, but you may still be wondering, “How do I get my specific work done?” Here’s a little more reading that may help you get fully started.

Modules and Software

RC already has many software packages available, often in multiple versions of the same software. These can be accessed using modules.
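
The standard module commands look like this; the package and version names are placeholders, so use "module avail" to see what is actually installed:

  module avail                   # list all available modules
  module avail matlab            # search for modules matching a name
  module load matlab/2021a       # load a specific version (placeholder version)
  module list                    # show modules currently loaded in your session
  module unload matlab/2021a     # unload a module when finished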

Users can install software to their home directory, so long as it does not require a license. If a module is not already present, users can request a software install to have one made available. Software that is free for ASU but requires a license is acceptable for modules; paid licenses are not covered by RC.

The FairShare score

Submitted jobs are subject to the FairShare score. The more a user uses the supercomputer, the lower their score. Jobs will always eventually run, however, the lower the score, the longer a job may wait before actually starting (depending on the number of jobs pending in the queue). Usage is tracked but “forgotten” through exponential decay with a half-life of one week. A user may expect their FairShare score to halve for every 10,000 core hours of tracked usage.
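
To check your current usage and FairShare standing, Slurm's standard sshare command reports per-user values (the exact columns shown depend on site configuration):

  # Show fair-share information for your own account
  sshare -u $USER
  # The FairShare column ranges from 0 to 1; lower values mean lower queue priority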

Using GPUs

Scientific research increasingly takes advantage of the power of GPUs. See our page on using GPUs.
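
As a sketch, GPUs are typically requested through Slurm's generic resource (GRES) syntax. The partition name and GPU count below are assumptions; check the GPU page for the options available on Agave:

  #!/bin/bash
  #SBATCH -p gpu                  # GPU partition (name is an assumption; see the GPU page)
  #SBATCH --gres=gpu:1            # request one GPU
  #SBATCH -t 0-01:00              # wall-time limit (D-HH:MM)

  nvidia-smi                      # report which GPU the scheduler assigned to the job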

Command line switches

Interactive sessions and sbatch scripts accept command-line switches that greatly affect the resources assigned to a job.

See our cheat sheet for a brief (but not complete) list of commonly used switches.
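
For reference, the switches below are standard Slurm options accepted by both srun (interactive) and sbatch submissions; the values shown are illustrative only:

  # -N 1            number of nodes
  # -n 8            number of tasks
  # -c 4            CPU cores per task
  # -t 1-00:00      wall-time limit (D-HH:MM)
  # --mem=16G       memory per node
  # -p <partition>  partition (queue) to submit to
  # -q <qos>        quality of service
  sbatch -N 1 -c 4 -t 0-02:00 --mem=8G my_job.sh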

RC status page

We recommend you check https://rcstatus.asu.edu regularly. This page contains important updates about planned and unplanned outages.

You can also see the status of the supercomputer, including utilization, at https://rcstatus.asu.edu/agave/smallstatus.php. Each box represents a compute node on the supercomputer. Hovering your mouse over a box provides detailed information about that node, including what resources are available and which partitions it is assigned to. A legend for the node labels is also available.

File Systems

There are two file systems available on Agave by default for first-time users, referred to as home and scratch. These are accessed at the paths /home/<username> and /scratch/<username>. Home provides 100 GB of storage by default, and scratch is provided for compute jobs: only actively computed data may reside on the scratch file system.
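
For example, assuming the default paths above, you can move into each file system and check how much space you are using with standard commands:

  cd /home/$USER                 # home directory (100 GB by default)
  du -sh /home/$USER             # summarize your home directory usage
  cd /scratch/$USER              # scratch space for data under active computation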

ASU provides cloud storage through an enterprise license for Google Drive, which may be used for archiving data.


Additional help

Once you have gone through this document, if you still require additional assistance, you can submit a ticket at https://rcstatus.asu.edu/servicerequest/

If your job is failing, including the job ID in your ticket helps us significantly, as we can pull detailed information about the job using that ID.
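
If you no longer have the job ID handy, standard Slurm commands can recover it; the date and job ID below are placeholders:

  squeue -u $USER                                        # jobs currently queued or running
  sacct -u $USER -S 2024-01-01                           # jobs since a given date, including completed/failed ones
  sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS   # details for a specific job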

For a great reference on building proficiency with command-line tools, we recommend the course materials published by MIT CSAIL.