Overview

The supercomputer uses Mamba, a high-performance parallel package manager, to let users install the Python modules they need. The instructions below cover loading the mamba module and then creating and activating environments.

The supercomputers use mamba instead of conda and pip. mamba is a parallelized C++ reimplementation of conda that sets up Python environments much faster. If you’re familiar with conda, you only need to replace the word conda in a command with the word mamba. pip is not stable in a multi-user environment like a supercomputer, so Research Computing discourages the use of pip except when necessary.

Be very careful with pip: it can easily break a mamba environment!

Do NOT install a package manager like conda in your account!

Do NOT use any conda command except in special cases!

Why Create a New Environment?

In a fresh terminal session, python or python3 points to a system-installed copy of Python (typically in /usr/bin). As the operating system heavily depends on this python instance, the version is fixed and only the most basic, built-in libraries are available.

Creating a new environment gives you full control over the Python version, the selection of libraries, and their specific versions. Python environments can then be engaged and disengaged freely, enabling a wide variety of uses, including CPU compute and GPU acceleration.
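
To see which interpreter the current shell resolves, a small check along these lines can help. This is only a sketch: show_python is a hypothetical helper name, not a command provided by Research Computing, and treating /usr/bin and /bin as the "system" locations is an assumption.

```shell
#!/usr/bin/env bash
# Report which Python the shell would run and whether it looks like the
# system copy. show_python is a hypothetical helper, not a site tool.
show_python() {
    local exe
    exe="$(command -v python3 || command -v python)" || {
        echo "no python found on PATH"
        return 1
    }
    case "$exe" in
        /usr/bin/*|/bin/*) echo "system: $exe" ;;       # OS-managed interpreter
        *)                 echo "environment: $exe" ;;  # anything else on PATH
    esac
}

show_python || true
```

In a fresh session this would normally report the system copy; after activating an environment it should report a path inside that environment.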

Load the Package and Environment Manager

Load the latest stable version of the mamba Python manager with:

module load mamba/latest

List Available Environments

Many Python suites, such as PyTorch or QIIME, are commonly requested and are therefore already provided on the supercomputers by Research Computing staff. These environments are version-fixed and read-only, so any number of users may use them simultaneously without risk of the environment changing.

All global/admin-maintained Python environments may be found under /packages/envs:

$ ls /packages/envs/
caffe-1.0/      gpaw-22.1.0/   multiqc-1.14/    qiime2-2023.2/       shpc/
caffe-1.0-gpu/  gurobi-9.5.1/  parallelfold/    rapids22.10/         sleap/
cenote-taker2/  keras-2.9.0/   pyklip/          repeatmasker-2.0.3/  sparkhpc/
fastqc-0.11.9/  metl/          pytorch-1.8.2/   samtools-1.16.1/     tensorflow-gpu-2.10.0/
ffcv/           mlagents/      qiime2-2022.11/  scicomp/             tensorflow-gpu-2.6.0/

User environments are by default installed to ~/.conda/envs, and after running module load mamba/latest, all available environments may be listed with mamba info --envs.

[<asurite>@login1 ~]$ mamba info --envs

          mamba version : 1.5.1
# conda environments:
#
pytorchGPU               /home/<asurite>/.conda/envs/pytorchGPU
testing                  /home/<asurite>/.conda/envs/testing
updateTest               /home/<asurite>/.conda/envs/updateTest
base                     /packages/apps/mamba/1.5.1
pytorch-1.8.2            /packages/envs/pytorch-1.8.2
scicomp                  /packages/envs/scicomp
...
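
When mamba is not yet loaded, the same two locations can be inspected directly. The sketch below simply lists subdirectories; list_env_dirs is a hypothetical helper name, and mamba info --envs remains the authoritative listing.

```shell
#!/usr/bin/env bash
# List environment names found under the given directories.
# list_env_dirs is a hypothetical helper; missing directories are skipped.
list_env_dirs() {
    local dir env
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for env in "$dir"/*/; do
            [ -d "$env" ] || continue   # the glob matched nothing
            basename "$env"
        done
    done
}

# Locations named on this page: user environments first, then global ones.
list_env_dirs "$HOME/.conda/envs" /packages/envs
```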

Load Available Environments

Use the source activate command to load the environment you want.

$ module load mamba/latest
$ source activate gurobi-9.5.1

The name of the environment will appear to the left of the command prompt so that you know what environment is currently active.

(gurobi-9.5.1) $ python nobel_prize.py

Environments may also be activated with a full path, e.g.:

$ source activate /data/sciencelab/.conda/envs/pysci

This capability makes /data (/wiki/spaces/RC/pages/60915741) an ideal location for groups sharing Python environments!

Do NOT load environments with conda or mamba as the prefix (e.g., mamba activate gurobi-9.5.1), as this injects supercomputing-unfriendly cruft into your shell configuration files!
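
Besides the prompt prefix, the active environment can be confirmed from the CONDA_PREFIX and CONDA_DEFAULT_ENV variables that conda-style activation sets. A minimal sketch, where active_env is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the active environment using the variables set on activation.
# active_env is a hypothetical helper, not a site-provided command.
active_env() {
    if [ -n "${CONDA_PREFIX:-}" ]; then
        echo "${CONDA_DEFAULT_ENV:-unknown} -> $CONDA_PREFIX"
    else
        echo "no environment active"
    fi
}

active_env
```

This is handy inside batch scripts, where no prompt is displayed.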

Creating Environments

Do NOT perform these steps on login nodes.

$ module load mamba/latest
$ mamba create -n <environment_name> -c conda-forge -c <channel> <packages>
$ source activate <environment_name>

In the above commands, the -c flag stands for "channel", the name of a repository from which mamba finds and downloads packages. conda-forge is one of the most popular channels. A package installs correctly only if the right channel is given; the channel name can be found by searching for the package on anaconda.org.

It is best to install all necessary packages in a single command, as in the mamba create command above. Resolving all major dependencies at once maximizes environment stability and minimizes total build time.
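
The two points above (stay off login nodes, create everything in one command) can be combined in a small wrapper. This is a sketch under stated assumptions: the "login*" hostname pattern is inferred from the login1 prompt shown earlier, and is_login_node and create_env are hypothetical helper names. As written, the wrapper only prints the command for review.

```shell
#!/usr/bin/env bash
# Return success if the given (or current) hostname looks like a login node.
# ASSUMPTION: login nodes are named login*, as in the login1 prompt above.
is_login_node() {
    case "${1:-$(uname -n)}" in
        login*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Hypothetical wrapper: builds ONE create command with all packages.
# It echoes the command for review; remove "echo" to actually run it.
create_env() {
    local name="$1"; shift
    if is_login_node; then
        echo "refusing to build on a login node" >&2
        return 1
    fi
    echo mamba create -n "$name" -c conda-forge "$@"
}
```

For example, create_env myenv python=3.11 numpy pandas prints a single mamba create command covering all three packages (the environment name and package list are illustrative).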

To create an environment at a specific path, e.g., the data directory of a research group, pass that path to the mamba create command with the -p flag:

$ mamba create -p /data/example_group/ENV_NAME -c conda-forge [-c <channel>] [packages]

Be careful when specifying a path: creating environments in non-default locations makes it easy to lose or break the environment!

When using mamba to install packages or create environments, you may see errors related to opening files in /packages/apps/mamba. These errors are harmless. An example is shown below.

Always verify the Prefix: is pointing where you need it to before proceeding with an installation, but otherwise, errors and warnings made by mamba may be ignored.

It is also good practice to verify what is being installed as a new package, what existing packages are being modified, and what existing packages are being removed before proceeding with the install.

Please review the mamba install section below for a summary of the components of mamba install.

Adding Dependencies to Existing Environments

The global/admin-maintained environments are read-only and can’t be changed by users. To add packages to one of these environments, you will need to clone it.

To clone a public environment:

$ module load mamba/latest
$ source activate <public_environment_name>
$ mamba env export --from-history -n <public_environment_name> > /your/path/to/<public_environment_name>.yaml
$ source deactivate
$ mamba create -n <your_environment_name> python=3
$ mamba env update -n <your_environment_name> --file /your/path/to/<public_environment_name>.yaml

The third command above exports only the packages that were explicitly requested when the public environment was built; version numbers are included only where they were specified at install time. To preserve every version number, remove the --from-history flag. Note that some public environments are old, so pinning every version in the .yaml file may cause conflicts.
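
If a full export (made without --from-history) does produce version conflicts, one option is to relax the pins in the file before running mamba env update. The sketch below is an illustration rather than an official procedure: strip_pins is a hypothetical helper that removes everything from the first version operator onward on each dependency line.

```shell
#!/usr/bin/env bash
# Drop version/build pins from the dependency lines of an exported YAML file.
# Only list items such as "  - numpy=1.23.5=py310_0" are rewritten; lines
# without a version operator (channels, "- pip:", name:, prefix:) are kept.
# strip_pins is a hypothetical helper name.
strip_pins() {
    sed -E 's/^([[:space:]]*-[[:space:]]+[^=<>!~[:space:]]+)[=<>!~].*$/\1/' "$1"
}
```

Usage would look like strip_pins full.yaml > relaxed.yaml (file names are illustrative), followed by a review of relaxed.yaml to restore any pins you actually need, such as the Python version.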

It is recommended to use a new name for your own environment.

To install a new package to this new mamba environment you just made:

$ module load mamba/latest
$ source activate <your_environment_name>
$ mamba install -c <channel> <packages>

Please review the screenshot of an example mamba install below before proceeding. Annotations are shown in cyan with a black background and cyan outline.

mamba-install.png

Using Environments in Jupyter

Once an environment is created, a kernel interface must be made for that environment to be available in Jupyter. This is as easy as running mkjupy <env_name>. Please review /wiki/spaces/RC/pages/1905788308 for additional details.

ADVANCED: Building from a GitHub Repository

Some Python packages are not available on any mamba channel. It is best to avoid these packages when possible; however, they can be integrated into a workflow. First, clone the git repository into your home directory:

$ git clone <url of github repository>

This URL can be copied from the GitHub repository page. In the figure below, the blue line indicates the URL of the corresponding repository (repo) page.

The cloned directory should include instructions for installing the Python package.

Be sure that you are working in a mamba environment, either an existing one or a new one, that supports the listed dependencies. TYPICALLY THE DEPENDENCIES ARE OVERSPECIFIED: dependency files tend to be fragile and non-portable, and they pin precise versions of second-order dependencies. If your build is failing, try removing all but the first-order dependencies (e.g., installing a versioned pytorch will automatically install a compatible version of numpy).

Once the environment is created and activated, and all dependencies installed, the new repository module may be installed as specified in the README, typically a pip install. To use pip properly with mamba, please follow this guide: Python Package Installation Method Comparison
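
Before any pip install, it is worth confirming that python resolves inside the active environment, so that python -m pip installs there rather than somewhere else. The check below is a sketch and not part of the linked guide; python_in_active_env is a hypothetical helper that relies on the CONDA_PREFIX variable set on activation.

```shell
#!/usr/bin/env bash
# Succeed only if the python on PATH lives inside the active environment.
# python_in_active_env is a hypothetical helper, not a site-provided command.
python_in_active_env() {
    local exe
    [ -n "${CONDA_PREFIX:-}" ] || return 1      # no environment active
    exe="$(command -v python)" || return 1      # no python on PATH
    case "$exe" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *)                 return 1 ;;
    esac
}

if python_in_active_env; then
    echo "python resolves inside $CONDA_PREFIX"
else
    echo "python is NOT from an active environment; do not pip install yet"
fi
```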

Be very careful with pip: it can easily break a mamba environment!

Never use sudo, which is often provided in instructions for system-wide installations by an administrator. It is unnecessary when installing into your own home directory.

Additional Help

If you require further assistance on this topic, please contact the Research Computing Team. To create a support ticket, review our RTO Request Help page. For quick inquiries, reach out via our #rc-support Slack Channel or attend our office hours for live assistance.

We also offer a series of Educational Opportunities and Workshops.
