
Note

This document assumes a basic familiarity with the Linux command line. If you are new to Linux, or need a refresher, Research Computing has created a guide at https://jyalim.github.io/sol-shell-novice/.

Quick Start

For users who have never used a supercomputer before, we recommend reading through the “Detailed Start” section of this document.

...

  1. Connect to the ASU Cisco AnyConnect VPN

  2. Log in with your ASURITE and password

  3. Choose a connection method (terminal / web portal)

  4. Transfer files as needed

  5. Run an interactive session or create an SBATCH script
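
As a rough sketch of what these steps look like from a terminal, assuming the SSH connection method (the hostname, file names, and script name below are placeholders rather than official values):

    # From your own computer, after connecting to the VPN (steps 1-3):
    ssh <asurite>@<sol-login-hostname>

    # Still from your own computer, copy input files to your scratch directory (step 4):
    scp data.csv <asurite>@<sol-login-hostname>:/scratch/<asurite>/

    # On the login node, submit a batch script to the scheduler (step 5):
    sbatch my_job.sh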

...

  • HPC: Short for “High Performance Computing”, it refers to a group (or cluster) of interconnected computers designed to run work in parallel across many machines at once. Publicly, these are often called “supercomputers”.

  • Node: A single machine in a supercomputer. This will be either a physical machine or a virtual machine. 

  • Login Node: A node intended as a launching point to compute nodes. Login nodes have minimal resources and should not be used for any application that consumes a lot of CPU or memory. This is also known as a “head node”.

  • Compute Node: A node intended for heavy computation. This is where all heavy processing should be done.

  • Job: Work assigned to be done on a compute node. Any time compute resources are allocated to a user, a job is created.

  • Memory (RAM): Short for “Random-Access Memory”. This is the working memory that a calculation or computation requires in order to execute and complete successfully. The term “memory” does not refer to disk space. Memory is another main component that defines a node.

  • CPU: Short for “Central Processing Unit”, also called a core. This is one of the main components that defines a computing device, such as a node.

  • GPU: Short for “Graphics Processing Unit”. This is a specialized piece of hardware that can enable and accelerate certain computational research.

  • Scheduler: The application on our end that manages and assigns (allocates) compute resources for jobs. The scheduler used on the ASU Supercomputers is called Slurm.
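
Because the scheduler is Slurm, a few standard Slurm commands are worth knowing from the start; these are generic Slurm commands rather than anything Sol-specific:

    sinfo              # list partitions and the state of their nodes
    squeue -u $USER    # show your own pending and running jobs
    scancel <jobid>    # cancel one of your jobs by its job ID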

Detailed Start

Connect through the Cisco VPN

...

Warning

PLEASE NOTE: If you are having trouble connecting to the ASU VPN, you will need to contact ASU Enterprise Technology. Research Computing cannot assist with issues related to the VPN.

Choosing a Connection Method

Research Computing provides two methods for connecting to the supercomputer. Each has its advantages and disadvantages.

  1. Connecting to the Supercomputer with the Web Portal
    The web portal has become the standard for new users. It provides a file system viewer and editor, a job submission tool, the ability to view the job queue, and a zoo of interactive applications including a virtual desktop, Jupyter Lab, and RStudio. In the file manager, uploading files is as easy as dragging and dropping through the interface! The web portal is accessible at sol.asu.edu.

    The virtual desktop provided by sol.asu.edu is the best way to use graphical applications on the supercomputer. However, please try to avoid using graphical sessions unless you are first learning how to work with the supercomputer or you are working with software that is only accessible through a graphical user interface. The goal of any interactive session on the supercomputer should be to develop a working scheduling batch (SBATCH) script so that you may properly begin to take advantage of what supercomputing offers.


...

  2. Connecting to the Supercomputer with SSH
    SSH is the most versatile method. It is ideal for submitting jobs at scale by allowing you to create custom workflows, submit multiple jobs simultaneously through job arrays, and explore options to avoid data loss through job dependencies. However, it tends to be slower with interactive graphical applications. If you intend to use MATLAB graphically (as opposed to the MATLAB command line only), the screen draw will be very slow. For graphical applications, we recommend our web portal instead.

Login to Sol

You are now ready to reach the login node! The login node is intended as a launching point for allocating compute nodes for your jobs. You only need to provide your ASURITE and password, if prompted.
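
If you chose the SSH connection method, logging in looks roughly like the following; the hostname here is a placeholder, so use the login hostname provided by Research Computing (the web portal itself is reached at sol.asu.edu):

    # Connect to the ASU VPN first if you are off campus, then:
    ssh <asurite>@<sol-login-hostname>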

Info

The login node runs software called Arbiter2, which monitors and protects interactive nodes with cgroups. It records the activity on nodes, automatically sets limits on the resources available to each user, and notifies users and administrators by email when users are penalized for using excessive resources.

Transfer Needed Files

This is optional. However, most research requires data sets or other files to be imported. For details, please see these tutorials on Transferring Files to and from the Supercomputer or using Google Drive & Globus.
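
As a minimal command-line sketch (the linked tutorials are the authoritative reference; the hostname and paths below are placeholders), copying files from your own computer with scp or rsync looks like:

    # Copy a single file into your scratch directory:
    scp inputs.tar.gz <asurite>@<sol-login-hostname>:/scratch/<asurite>/

    # Synchronize a whole project directory; rsync only transfers what has changed:
    rsync -av --progress my_project/ <asurite>@<sol-login-hostname>:/scratch/<asurite>/my_project/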

Run an Interactive Session or Create an SBATCH Script

If you are using an interactive app provided in the web portal, such as Jupyter or RStudio, this section can be skipped. If you are using a personally installed version of RStudio or Jupyter, please continue reading this section.

There are three ways to use resources on the supercomputer:

  1. Creating an interactive session in the web portal using an interactive app, such as Jupyter, RStudio, or MATLAB. This will assign a compute node to your interactive session in an interactive app of your choice. This is a great option for users to become familiar with using the supercomputer as well as to develop, test, and debug code.

  2. Starting an interactive session in the shell. This will assign a compute node and connect your command prompt to it. This is good when working by hand to establish the commands needed to run your work. When your session disconnects, the interactive session also closes, and any unsaved work will be lost.

  3. Scheduling batch (SBATCH) scripts. This is a method of telling the scheduler you want an unattended (or non-interactive) job to run. When an sbatch script is submitted, the job will run until it either completes, fails, or runs out of time. These sbatch scripts can be submitted through the shell or through the “Job Composer” in the web portal. A minimal example is sketched after this list.
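
The following is a minimal sketch of an sbatch script; the resource requests, module name, and workload are placeholder assumptions, so consult the linked example and the command-line switches section below for the values appropriate to your work:

    #!/bin/bash
    #SBATCH --job-name=my_analysis      # a label for the job
    #SBATCH --ntasks=1                  # number of tasks (processes)
    #SBATCH --cpus-per-task=4           # CPU cores per task
    #SBATCH --mem=16G                   # memory for the job
    #SBATCH --time=0-01:00:00           # time limit (days-hours:minutes:seconds)
    #SBATCH --output=slurm.%j.out       # output file; %j expands to the job ID

    module load <your-software>         # placeholder module name
    python my_script.py                 # placeholder workload

Save the script (for example as my_job.sh) and submit it from the login node with sbatch my_job.sh; squeue -u $USER will then show it in the queue.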

Recommended Reading

This tutorial covered the basic steps of getting started on the supercomputer, but you may still be wondering, “How do I get my specific work done?” Here is a little more reading that may help you get fully started.

Modules and Software

Research Computing already has many software packages and many versions of the same software available. They can be accessed using modules.
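
A minimal sketch of working with modules is shown below; the module name and version are placeholders, and module spider assumes the Lmod implementation common on HPC systems:

    module avail                  # list the modules available on the system
    module spider <name>          # search for a module and list its versions (Lmod)
    module load <name>/<version>  # load a specific version into your environment
    module list                   # show which modules are currently loaded
    module purge                  # unload all currently loaded modules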

Users can also install software to their home directory so long as it does not require a license. Users can also request a software install if they prefer to have a module available and the module is not already present. Software that is free for ASU but requires a license is acceptable for modules. Paid licenses are not covered by Research Computing.

The FairShare Score

Computational resources on the supercomputer are free for ASU faculty, students, and collaborators. To keep things fair, computational jobs are prioritized based on computational usage through a priority multiplier called FairShare, which ranges from 0 (lowest priority) to 1 (highest priority). Usage is “forgotten” via exponential decay with a half-life of one week, e.g., if a researcher instantaneously consumed 10,000 core-hour equivalents (CHE), then after one week the system would only “remember” 5,000 core hours of usage. See more on the dynamics here. CHE are tracked based on a linear combination of different hardware allocations, i.e.,

...

All jobs will eventually run; however, researchers with higher utilization of the system may have to wait longer for their new jobs to start.
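
Restating the half-life description above as a simple formula (a plain restatement of that paragraph, not an official expression from the scheduler documentation), the usage the system still “remembers” after time t is roughly:

    U(t) = U_0 * (1/2)^(t / T_half),    T_half = 1 week

so U_0 = 10,000 CHE gives U(1 week) = 5,000 CHE, matching the example above.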

Using GPUs

Scientific research increasingly takes advantage of the power of GPUs. See our page on Requesting Resources on Sol.
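
For orientation, requesting a GPU through Slurm generally looks like the lines below; the partition and QOS names are placeholders, and the Requesting Resources on Sol page is the authoritative source for the options actually available on Sol:

    #SBATCH --gres=gpu:1        # request one GPU (generic Slurm syntax)
    #SBATCH -p <gpu-partition>  # placeholder partition name
    #SBATCH -q <qos>            # placeholder QOS name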

Command-line Switches

The interactive and sbatch commands take command-line switches that greatly affect the resources a job is assigned.

See our Scheduling Jobs on Sol wiki page for a brief list of commonly used switches, as well as a list of partitions and QOSes.
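
For orientation, a submission using several standard Slurm switches might look like the following; the partition and QOS names are placeholders, and the wiki page above remains the authoritative list for Sol:

    sbatch -N 1 -n 1 -c 8 --mem=32G -t 0-04:00:00 -p <partition> -q <qos> my_job.sh
    # -N nodes   -n tasks   -c cores per task   --mem memory
    # -t time limit   -p partition   -q QOS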

XDMoD (Job Statistics)

You can see day-to-day system utilization details at https://xdmod.sol.rc.asu.edu/

Sol Node Status

See the supercomputer’s node-level status here.

File Systems

There are two primary file systems, referred to as home and scratch. These are accessed at the paths /home/<username> and /scratch/<username>. Home provides 100 GB of storage by default, and scratch is provided for compute jobs: only actively computed data may reside on the scratch filesystem.
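
In practice, staging data onto scratch for a job might look like the following sketch; these are standard shell commands, and the Storage & Filesystems page linked below describes the actual quota tools and policies:

    du -sh /home/$USER                      # check how much space your home directory uses
    mkdir -p /scratch/$USER/my_project      # create a working directory on scratch
    cp -r /home/$USER/inputs /scratch/$USER/my_project/   # stage input data for the job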

...

Additional details are provided on this page: Storage & Filesystems.

Additional Help

Once you have gone through this document, if you still require additional assistance, you can submit a ticket.

...

For a great reference on building proficiency with command-line tools, we recommend the following resource from MIT CSAIL.

...
