Managing Python Modules Through the Mamba Environment Manager
Overview
The supercomputer uses Mamba, a high-performance parallel package manager, to let users install the Python modules they need and to keep software environments on the supercomputer efficient. The instructions below cover loading the Mamba module, then creating and loading environments.
The supercomputers use mamba instead of conda and pip. mamba is a parallel C++ implementation of conda that provides a much faster experience for setting up Python environments on the supercomputer. If you’re familiar with conda, all you need to do is replace the word conda in the command with the word mamba. pip is not stable in a multi-user environment like a supercomputer, so Research Computing discourages the use of pip except when necessary.
Be very careful with pip; it can easily break a mamba environment!
Do NOT install any package manager like conda on your account!
Do NOT use any conda command except in special cases!
Why Use Python Environments?
In a fresh terminal session, python or python3 points to a system-installed copy of Python (typically in /usr/bin). As the operating system heavily depends on this Python instance, the version is fixed and only the most basic, built-in libraries are available.
Creating a new environment gives you full control over the Python version, the selection of libraries, and their specific versions. Python environments can then be activated and deactivated freely, enabling a wide variety of uses, including CPU compute and GPU acceleration.
Below is a cartoon created with ChatGPT 4o that represents a Python environment as a tool shed and the Python packages installed in that environment as the various tools.
Load the Environment Manager “Mamba”
Load the latest stable version of the mamba Python manager with:
module load mamba/latest
Find Available Environments
Many Python packages, such as PyTorch or QIIME, are commonly requested and are therefore pre-installed on the supercomputers by Research Computing staff. These environments are version-fixed and read-only, so they may be used freely by any number of users simultaneously without any risk of the environment changing.
All global/public/admin-maintained Python environments may be found under /packages/envs. User environments are installed to ~/.conda/envs by default. After running module load mamba/latest, all available environments should be listed with mamba info --envs.
[<asurite>@login1 ~]$ mamba info --envs
mamba version : 1.5.1
# conda environments:
#
pytorchGPU               /home/<asurite>/.conda/envs/pytorchGPU
testing                  /home/<asurite>/.conda/envs/testing
updateTest               /home/<asurite>/.conda/envs/updateTest
base                     /packages/apps/mamba/1.5.1
pytorch-1.8.2            /packages/envs/pytorch-1.8.2
scicomp                  /packages/envs/scicomp
...
Load Available Environments
Use the source activate command to load the environment you want.
$ module load mamba/latest
$ source activate gurobi-9.5.1
The name of the environment will appear to the left of the command prompt so that you know what environment is currently active.
# To run a python script using the loaded env
(gurobi-9.5.1) $ python nobel_prize.py
# To check what packages are installed in the loaded env
(gurobi-9.5.1) $ mamba list
Creating Environments
In mamba create and mamba install commands, the -c flag means "channel", which is the name of the repository location where mamba finds and downloads the package. conda-forge is one of the most popular channels. The channel must be correct for the intended package to be installed. The correct channel name can be found by searching for the package name on anaconda.org.
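As a minimal sketch of how the -c flag fits into environment creation, the commands might look like the following. The function name create_env, the environment name myenv, and the python=3.11 and numpy package choices are placeholders for illustration, not site defaults. The steps are wrapped in a function so that nothing runs until it is called.

```shell
# Hypothetical helper: create a named environment from the conda-forge channel.
# The default name "myenv" and the packages listed are illustrative only.
create_env() {
    local env_name="${1:-myenv}"
    module load mamba/latest
    # -n names the new environment; -c selects the channel to search
    mamba create -n "$env_name" -c conda-forge python=3.11 numpy
}
```

Calling create_env myenv would create the environment, which could then be loaded with source activate myenv.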
To create an environment at a specific path, e.g. the data directory of a research group, include that path with the -p flag of the mamba create command.
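A sketch of a path-based creation is shown here. The function name create_env_at and the python=3.11 package are assumptions made for illustration, and the path is supplied by the caller rather than taken from any real group directory.

```shell
# Hypothetical helper: create an environment at an explicit directory.
create_env_at() {
    local env_path="$1"
    if [ -z "$env_path" ]; then
        echo "usage: create_env_at <path>" >&2
        return 1
    fi
    module load mamba/latest
    # -p takes a full path in place of the -n environment name
    mamba create -p "$env_path" -c conda-forge python=3.11
}
```

For example, create_env_at /path/to/group/envs/myenv, where the path is a placeholder. An environment created this way is normally activated by its full path rather than by name.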
Example
Please check here for a brief example: A Brief Example
Adding Packages to Public or Existing Environments
To clone a public environment:
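The sequence below is a hedged sketch of one way to do this: export the public environment's package list to a .yaml file, then build a new environment from that file. The function name clone_public_env and the default file name are hypothetical. Passing the public environment's name to the export with -n, rather than activating it first, is also an assumption of this sketch.

```shell
# Hypothetical helper: copy a public environment into one you own.
clone_public_env() {
    local public_env="$1"
    local new_env="$2"
    local yaml="${3:-${new_env}.yml}"
    module load mamba/latest
    # Export only the explicitly requested packages, without build hashes
    mamba env export -n "$public_env" --from-history --no-builds > "$yaml"
    # Rebuild under a new name; -n overrides the name stored in the file
    mamba env create -n "$new_env" -f "$yaml"
}
```

For example, clone_public_env scicomp my-scicomp would write my-scicomp.yml in the current directory and create an environment named my-scicomp from it.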
The export command, run with the --from-history and --no-builds flags, asks mamba to list the packages in the public environment without version numbers or build hashes, except where version numbers were specified when the public environment was installed. If you wish to preserve all the version numbers, remove the --from-history and --no-builds flags. Note that some public environments are old, and version conflicts may arise if you specify the version numbers in the .yaml file.
It is recommended to use a new name for your own environment.
To install a new package to this new mamba environment you just made:
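A minimal sketch, assuming the package is available on conda-forge. The function name install_into_env and the example package names are placeholders.

```shell
# Hypothetical helper: add packages to an environment you own.
install_into_env() {
    local env_name="$1"
    shift
    module load mamba/latest
    # -n targets the environment by name; remaining arguments are package names
    mamba install -n "$env_name" -c conda-forge "$@"
}
```

For example, install_into_env my-scicomp pandas. Public environments under /packages/envs are read-only, so this only works on environments you created yourself.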
Using environments in Jupyter
Once an environment is created, a kernel interface must be made for that environment to be available in Jupyter. This is as easy as running mkjupy <env_name>. Please check the next page for details:
Preparing Python Environments for Jupyter
ADVANCED: Building from GitHub repository
Many Python packages are not available through mamba channels. It is best to avoid these packages when possible. However, it is possible to integrate them into a workflow. First, clone the git repository into your home directory:
$ git clone <url of github repository>
This URL can be copied from the GitHub repository page. In the figure below, the blue line indicates the URL of the corresponding repository (repo) page.
The cloned directory should include instructions for installing the Python package.
Once the environment is created and activated and all dependencies are installed, the new package may be installed as specified in the repository's README, typically with pip install.
To use pip properly with mamba, please follow this guide: Python Package Installation Method Comparison
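Because pip can easily break an environment, one defensive pattern is to refuse to run pip unless an environment is active. The sketch below is an assumption-laden illustration, not the method from the linked guide. The function name pip_install_repo is hypothetical, and it relies on CONDA_PREFIX, which conda-style activation sets. Whether module load mamba/latest alone also sets that variable on this system is not stated in the source.

```shell
# Hypothetical helper: pip-install a cloned repository into the active environment.
pip_install_repo() {
    local repo_dir="${1:-.}"
    # CONDA_PREFIX is set when an environment is activated
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "No active environment; run 'source activate <env_name>' first." >&2
        return 1
    fi
    # python -m pip uses the pip that belongs to the active environment's Python
    python -m pip install "$repo_dir"
}
```

After source activate <env_name>, this would be called from the cloned directory as pip_install_repo, or with the path to the clone as its argument.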