Preparing Python environments for Jupyter with Anaconda

If you need a special-purpose Python environment beyond what is already available on the Agave supercomputer, you can add your own custom Anaconda environments as kernels to the webapp’s Jupyter server.

Anaconda provides a Python package manager, usually called conda after the shell program used for package management. Anaconda organizes Python workflows into environments that are independent of Jupyter. For Jupyter to recognize an Anaconda environment, a kernel must be created for it: ipykernel must be installed in the target environment, and special configuration files must be set up in your home directory. All of this is automated by the script mkjupy <environment>.
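To see what those configuration files look like, the sketch below prints the kernel spec for a named kernel. It assumes that kernels are registered in Jupyter's default per-user location on Linux (~/.local/share/jupyter/kernels, or under JUPYTER_DATA_DIR if that variable is set). Whether mkjupy writes to exactly this location is an assumption, and show_kernelspec is a hypothetical helper name used only for illustration.

```shell
# Default per-user kernel directory on Linux (Jupyter convention).
# ASSUMPTION: mkjupy registers kernels here; this page does not confirm the location.
kernel_root="${JUPYTER_DATA_DIR:-$HOME/.local/share/jupyter}/kernels"

# Hypothetical helper: print the kernel.json of a named kernel,
# or a short note if no spec with that name is found.
show_kernelspec() {
    spec="${kernel_root}/$1/kernel.json"
    if [ -f "$spec" ]; then
        cat "$spec"
    else
        echo "no kernel spec found at ${spec}"
    fi
}

show_kernelspec nobel
```

A kernel.json normally names the python executable to launch (argv) and the title shown in the Jupyter menu (display_name), which is how a notebook ends up running inside the chosen environment.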

What if I already have an environment created? If you have an existing, working environment, you can skip straight to the “Add the Kernel to Jupyter” step.

Most external documentation assumes that you are installing a project on a workstation or server, and so provides instructions that install packages into your base Python environment, which causes many issues downstream on any system. Python has many uses on the supercomputer, including regular operating system tasks, so take care to use the conda provided by the anaconda/py3 module to maintain scientific computing Python environments.

Create the Anaconda Environment

In order for the webapp to load your environment, you must first create the environment from the terminal. This can be accomplished with the following shell commands (where the $ indicates the shell prompt) for an example environment called nobel:

$ module load anaconda/py3
$ conda create -n nobel -c conda-forge python matplotlib pandas natsort ipykernel
[... output truncated]
$ source activate nobel
(nobel) $ python nobel_prize.py

You may give the environment any name that is currently unused by any other existing environment.
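A quick way to check whether a name is already taken is to look for it in the output of conda env list. The helper below is a minimal sketch: env_name_in_use is a hypothetical function name, and it assumes anaconda/py3 has already been loaded so that conda is on the PATH.

```shell
# Hypothetical helper: succeed (exit status 0) if the given name appears in
# the first column of `conda env list`, and fail otherwise.
env_name_in_use() {
    conda env list | awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if env_name_in_use nobel; then
    echo "nobel is already taken; choose another name"
else
    echo "nobel is free to use"
fi
```

Comment lines in the conda output start with #, so they never match an environment name.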

Install additional packages to environment post creation

The target environment must be activated before new software is installed into it. The following example installs the package seaborn into the example environment nobel, assuming a fresh shell.

$ module load anaconda/py3
$ source activate nobel
(nobel) $ conda install -c conda-forge seaborn

Add the Kernel to Jupyter

From the shell, add the kernel to the web user interface with mkjupy <environment> ["Fancy Title"]:

$ mkjupy nobel

Start the Session

In the supercomputer’s webapp (https://login.rc.asu.edu), create a new interactive session after selecting Jupyter from the server list. The new kernel is now available for driving notebooks!

How do I remove a kernel from this list?

From the terminal, execute the following commands (once again assuming a fresh shell session):
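One way to do this, offered as a sketch rather than the site's official procedure, uses Jupyter's standard kernelspec subcommands. It assumes that anaconda/py3 has been loaded so that jupyter is on the PATH, and that the kernel was registered under the same name as the environment (nobel here); check the name in the list output before removing anything. remove_kernel is a hypothetical wrapper name.

```shell
# Hypothetical wrapper around Jupyter's kernelspec subcommands.
# ASSUMPTION: the kernel name matches the environment name passed to mkjupy.
remove_kernel() {
    jupyter kernelspec list              # show registered kernels and their directories
    jupyter kernelspec uninstall "$1"    # asks for confirmation before deleting the spec
}

# Usage on a fresh shell:
#   module load anaconda/py3
#   remove_kernel nobel
```

Uninstalling the kernel spec only removes the entry from Jupyter's list. The conda environment itself and its packages are left in place.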

Okay, but how do I do all this from inside a notebook?

To do this appropriately, we first have to understand how Jupyter is launched on the webapp, and recall that Jupyter kernels, not the parent environment, drive notebook cell execution.

Jupyter is hosted in the (base) environment of the anaconda/py3 module, so the default shell inherits the (base) environment. For instance, even in a notebook driven by nobel, the system command !which python shows /packages/7x/anaconda3/5.3.0/bin/python, not the python in the nobel environment! However, !source activate nobel; which python shows the desired python path.

Because the base environment must be left unchanged, quick shell commands run with the shell escape ! may be better replaced with a cell that uses the Jupyter magic %%bash.
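A minimal sketch of such a cell follows. In the notebook, the first line of the cell is the magic %%bash; it is shown as a comment here so that the body remains plain bash. The environment name nobel is the example from this page. The guard around the activation and the active_env variable are additions for illustration, so that the snippet also runs harmlessly where conda is absent.

```shell
# %%bash   <- in the notebook, this magic is the first line of the cell (drop the "# ")

# Activate the example environment for the rest of this cell only.
# Illustrative guard: skip activation where conda is not on the PATH.
if command -v conda >/dev/null 2>&1 && source activate nobel; then
    active_env="nobel"
else
    active_env="none"
fi

echo "active environment: ${active_env}"
# Same role as `which python`: with nobel active, this prints the python
# that lives inside the nobel environment.
command -v python || echo "python not found on PATH"
```

Because %%bash hands the whole cell to a separate bash process, the activation applies to every line of the cell but does not alter the (base) environment that Jupyter itself runs in.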