Conda Environments

Per their website, Conda is a tool that "provides package, dependency, and environment management for any language." In practice, Conda lets you create your own environments, each containing the software needed to run a particular program or pipeline.

Creating and Activating an Environment

Conda is provided on Luria as an environment module, so to use it you'll first have to load the miniconda3/v4 module:

module load miniconda3/v4

When miniconda3 is loaded, you'll be asked to run:

source /home/software/conda/miniconda3/bin/condainit

Make sure to do so.

Now, the conda program will be available to you.

To create a new Conda environment, you first need to give it a name. It's typical to make a new environment for a particular task or pipeline, or for a single tool that needs to be isolated, and to name the environment accordingly. Once the environment is created, it can be activated.

conda create --name example_environment

conda activate example_environment

What's happening here? When you create an environment, Conda creates a new directory with the environment's name in ~/.conda/envs. Any packages and libraries you install with Conda are placed in this directory. Activating the environment adds this directory to your shell's environment (most visibly, its bin directory is put at the front of your PATH), so you can use the packages and libraries in it as if they were installed on the system.
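To see this for yourself, the short sketch below reports which environment is currently active. It relies on the CONDA_PREFIX and CONDA_DEFAULT_ENV variables that `conda activate` sets; the function name `show_conda_env` is only illustrative and is not part of Conda.

```shell
# Illustrative helper (hypothetical name): report the active Conda
# environment using the variables that `conda activate` sets.
show_conda_env() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no Conda environment active"
        return 1
    fi
    echo "environment: ${CONDA_DEFAULT_ENV:-unknown}"
    echo "directory:   $CONDA_PREFIX"
    # After activation, the environment's bin directory should be on PATH.
    case ":$PATH:" in
        *":$CONDA_PREFIX/bin:"*) echo "bin on PATH: yes" ;;
        *)                       echo "bin on PATH: no" ;;
    esac
}
```

Run `show_conda_env` after `conda activate example_environment` and the directory line should point inside ~/.conda/envs.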

While the Conda environment is activated, you can use Conda to install packages to it. Any packages you tell it to install will be placed in the active environment directory.

conda install pigz # install a parallel compression program

pigz --version

which pigz # this will show you that the pigz program is installed in the Conda environment directory

Conda installs packages from "channels", which are remote repositories of packages. Commonly used channels include anaconda, conda-forge, and bioconda. Each channel has its own set of packages, so it helps to know which channel provides the software you need. You can search a channel for a package by running:

conda search <channel name>::<package>

You can also simply look the package up online, for example on anaconda.org.

Once you are done using an environment, you can deactivate it. This is similar to unloading an environment module. If you ever need that environment again, simply activate it and keep using the programs you installed in it.

conda deactivate

conda env list # Lists what environments you've created

conda activate example_environment # Activate the environment again

The following is a real-world example of a good use-case for creating a Conda environment.

Let's say you want to use radian, a program that provides a more modern R console experience than base R. However, there is no module available for radian. According to radian's documentation, radian is available from the conda-forge channel. Therefore, to make a Conda environment for radian and install both it and R, you'd do the following:

module load miniconda3/v4

source /home/software/conda/miniconda3/bin/condainit

conda create --name radian_environment

conda activate radian_environment

conda install -c conda-forge radian r-base # -c conda-forge tells Conda which channel to install from

Now, whenever you want to run radian, you just load Conda (the module load and source steps above) and activate the environment you created for it.
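If you start radian often, you could wrap those steps in a shell function in your ~/.bashrc. This is a hypothetical convenience (the function name `radian_session` is made up), assuming the module name, init script path, and environment name used above.

```shell
# Hypothetical ~/.bashrc helper: load Conda the way this page describes,
# activate the radian environment, then start radian with any arguments given.
# Each step stops the function early if it fails.
radian_session() {
    module load miniconda3/v4 || return 1
    source /home/software/conda/miniconda3/bin/condainit || return 1
    conda activate radian_environment || return 1
    radian "$@"
}
```

Because the function runs in your current shell, the environment stays active after radian exits; run `conda deactivate` when you're done.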

Declaratively Defining a Conda Environment

Instead of creating an environment imperatively, you can write a YAML file that describes your environment (its name, the channels it pulls packages from, and the packages it needs) and then have Conda create the environment from that file.

Defining an environment this way makes it easy to remember what packages you need for your use-case in case you need to recreate the environment in the future. It also makes it easy to share an environment with other researchers so that they can get up and running quickly.

Consider the following example: you need to run a pipeline that requires these pieces of software, at these specific versions:

trim_galore/0.6.6
parallel/20200922
bioawk/1.0
perl/5.26.2
cutadapt/1.18
bowtie2/2.2.4

You can describe this environment in a YAML file called pipeline_example.yml:

name: pipeline_example
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- trim-galore=0.6.6
- parallel=20200922
- bioawk=1.0
- perl=5.26.2
- cutadapt=1.18
- bowtie2=2.2.4
prefix: /home/software/conda/miniconda3

This YAML file specifies the name of the Conda environment, the channels to install packages from, the packages to install (note that the Conda package for Trim Galore is named trim-galore, not trim_galore), and the Conda prefix directory.
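If you'd rather create the file from the command line than in a text editor, a here-document works. This sketch writes the same contents shown above (indenting the list items, which YAML also accepts) and then lists the pinned packages as a quick check.

```shell
# Write pipeline_example.yml from the shell; quoting 'EOF' stops the shell
# from expanding anything inside the here-document.
cat > pipeline_example.yml <<'EOF'
name: pipeline_example
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - trim-galore=0.6.6
  - parallel=20200922
  - bioawk=1.0
  - perl=5.26.2
  - cutadapt=1.18
  - bowtie2=2.2.4
prefix: /home/software/conda/miniconda3
EOF

# Sanity check: print every pinned package (lines of the form "  - name=version").
grep -E '^  - [A-Za-z0-9_.-]+=' pipeline_example.yml
```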

Now, you can have Conda create the environment and activate it:

module load miniconda3/v4

source /home/software/conda/miniconda3/bin/condainit

conda env create -f pipeline_example.yml

conda activate pipeline_example

This saves you the trouble of creating the environment yourself then manually installing each package.

If you ever need to change this environment, edit the YAML file, then run the following (--prune removes any installed packages that are no longer listed in the file):

conda activate pipeline_example
conda env update --file pipeline_example.yml --prune
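When one YAML file is used both to build and to maintain an environment, a small wrapper can pick the right command for you. The sketch below is a hypothetical helper (`sync_env` is not a Conda command); it assumes Conda has already been loaded and that the name you pass matches the `name:` field in the file. It reads `conda env list`, which prints comment lines starting with `#` and then one environment per line with its name in the first column.

```shell
# Hypothetical helper: create the environment from a YAML file the first
# time, and update it (pruning packages no longer listed) on later runs.
# Usage: sync_env <file.yml> <env-name>
sync_env() {
    local file="$1"
    local name="$2"
    # Skip '#' comment lines, keep the first column (the environment name),
    # and look for an exact match.
    if conda env list | awk '!/^#/ {print $1}' | grep -qx "$name"; then
        conda env update --name "$name" --file "$file" --prune
    else
        conda env create -f "$file"
    fi
}
```

For example, `sync_env pipeline_example.yml pipeline_example` creates the environment on the first run and updates it afterwards.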

Conda Environments in Slurm

Using your Conda environments in a Slurm script is very similar to using Environment Modules in a Slurm script. Before your pipeline commands, add lines that load miniconda3 and then activate the appropriate environment, like so:

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mail-type=END
#SBATCH --mail-user=example@mit.edu
###################################

module load miniconda3/v4
source /home/software/conda/miniconda3/bin/condainit
conda activate pipeline_example

# <your pipeline commands>
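A batch script keeps going after a failed command unless told otherwise, so a job whose environment didn't activate may only fail later with a "command not found" error. One way to fail fast is a check right after activation. The sketch below uses a hypothetical function name and assumes the tool names from the pipeline example above.

```shell
# Hypothetical pre-flight check for a job script: report the first tool
# that is not on PATH and return non-zero so the job can stop early.
require_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "required tool not found on PATH: $tool" >&2
            return 1
        fi
    done
}

# In the job script, after `conda activate pipeline_example`:
# require_tools trim_galore cutadapt bowtie2 parallel bioawk || exit 1
```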

Sharing Conda Environments

If you would like to share a Conda environment you've created with others, it's important to export the environment first. This protects your environment from any changes the other user might make, and it makes the environment portable: they can recreate it in their own directory, or build on top of it, without affecting your work.

# To export a Conda environment to a YAML file:

[user1]~ conda activate myenv # Activate the environment you'd like to export
[user1]~ conda env export > environment.yml # You may now share this file with whoever wishes to use it

# After copying the YAML file to their own home directory, another user can create a new environment from it:

[user2]~ conda env create -f environment.yml # Create the environment described in the file
[user2]~ conda activate myenv # Activate the new environment
[user2]~ conda env list # Verify that the new environment was installed correctly
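The file written by `conda env export` ends with a `prefix:` line recording where the environment lives on your account, which is not useful to the person receiving it. The sketch below, a hypothetical helper rather than a Conda command, exports the active environment and drops that line; any extra arguments are passed through to `conda env export`, for example `--no-builds` to omit build strings, which tend to be platform-specific.

```shell
# Hypothetical helper: export the *active* environment for sharing and drop
# the "prefix:" line, which points inside your own home directory.
# Usage: export_for_sharing [output.yml] [extra `conda env export` options]
export_for_sharing() {
    local out="${1:-environment.yml}"
    if [ "$#" -gt 0 ]; then
        shift
    fi
    conda env export "$@" | sed '/^prefix:/d' > "$out"
}

# Example: also omit build strings.
# export_for_sharing environment.yml --no-builds
```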

For more information, see the official Conda documentation and the more detailed guide from The Carpentries.


Massachusetts Institute of Technology