Overview

Ollama is a lightweight model inference platform that simplifies deploying large language models (LLMs). It streamlines interaction with LLMs by removing the need to write or run scripts, provides access to a wide variety of models, and lowers the resources needed to run them by downloading quantized versions.

Steps

To run Ollama interactively on the command line, follow these steps:

Ollama will not run on a login node. Request an interactive shell session with a GPU from the command line, e.g., for a 30-minute session:
$ interactive -t 30 -G 1 -p htc
Load the Ollama module (to list the available versions, run module keyword ollama):
$ module load ollama/0.3.12
Start the Ollama server in the background:
$ ollama-start
Run the model. A list of available models can be found in the Ollama model library.
The first time a model is run, Ollama automatically performs an ollama pull to download it. If the model has already been downloaded, Ollama loads it into memory and starts the chat.
$ ollama run llama3.2
To stop the model, type /bye at the prompt:
>>> /bye
To stop the Ollama server:
$ ollama-stop
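
The steps above can also be wrapped into a single one-shot call, which may be convenient when only one answer is needed rather than a chat session. The following is a minimal sketch, not an official Sol utility: the function name ask_once is hypothetical, and it assumes that the ollama module is already loaded on a GPU node and that ollama-start returns once the server is ready to accept requests. It relies on ollama run accepting the prompt as an argument, in which case it prints a single reply and exits instead of opening the interactive prompt.

```shell
# Hypothetical helper: start the server, ask one question, stop the server.
# Assumes the ollama module is loaded and the session is on a GPU node.
ask_once() {
    local model="$1" prompt="$2"

    # Site-provided wrapper from the steps above; give up if it fails.
    ollama-start || return 1

    # With a prompt argument, `ollama run` answers once and exits.
    ollama run "$model" "$prompt"
    local status=$?

    # Always stop the server, then report the exit status of the model run.
    ollama-stop
    return "$status"
}
```

For example, ask_once llama3.2 "Explain quantization in one sentence." would download the model on first use, print one reply, and shut the server down again.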

Research Computing

Ollama on Sol
