Conda Environments


Last updated 9 months ago


Per its website, Conda is a tool that "provides package, dependency, and environment management for any language." Essentially, Conda lets you create your own environments containing the pieces of software needed to run whatever programs or pipelines you work with.

Creating and Activating an Environment

Conda is provided on Luria as an environment module, so to use it you'll first have to load the module miniconda3/v4.

module load miniconda3/v4

When miniconda3 is loaded, you'll be prompted to run the following command; make sure to do so:

source /home/software/conda/miniconda3/bin/condainit

Now, the conda program will be available to you.

To create a new Conda environment, you'll first have to name it. It's typical to make a new environment for a particular task or pipeline, or for a single tool that needs to be isolated, so name the environment accordingly. Once the environment is created, it can be activated:

conda create --name example_environment

conda activate example_environment

What's happening here? When you create an environment, Conda creates a new directory in ~/.conda/envs with the environment's name. This directory is where any packages and libraries that are installed via Conda will be placed. Activating a Conda environment will add this directory to your shell's environment, so that you can use any packages and libraries present in it as if they were installed to the system.

While the Conda environment is activated, you can use Conda to install packages to it. Any packages you tell it to install will be placed in the active environment directory.

conda install pigz # install a parallel compression program

pigz --version

which pigz # this will show you that the pigz program is installed in the Conda environment directory

Conda installs packages from "channels": remote repositories that host packages. Common channels include anaconda, conda-forge, and bioconda. Each channel contains its own set of packages, so it's best to know which channel carries the software you need. You can search a channel for a package by running:

conda search <channel name>::<package>

For example, conda search bioconda::bowtie2 lists the bowtie2 builds available on the bioconda channel. You can also simply look the package up online.

Once you are done using an environment, you can deactivate it. This is similar to unloading an environment module. If you ever need that environment again, you simply activate it and proceed to use the programs you installed to it previously.

conda deactivate

conda env list # Lists what environments you've created

conda activate example_environment # Activate the environment again

The following is a real-world example of a good use case. Say you want to use the program radian, which provides a more modern R console experience than base R, but no module is available for it on Luria. According to radian's documentation, the package is hosted on the conda-forge channel, so to make a Conda environment for radian and install both it and R, you'd do the following:

module load miniconda3/v4

source /home/software/conda/miniconda3/bin/condainit

conda create --name radian_environment

conda activate radian_environment

conda install -c conda-forge radian r-base

Now, whenever you want to run radian, you just load in Conda and activate the environment you created for it.

Declaratively Defining a Conda Environment

Instead of creating an environment imperatively, you can write a YAML file that describes the structure of your environment (its name, the channels it will pull packages from, and the packages it needs), then have Conda create an environment from that file.

Defining an environment this way makes it easy to remember what packages you need for your use-case in case you need to recreate the environment in the future. It also makes it easy to share an environment with other researchers so that they can get up and running quickly.

Consider the following example: you need to run a pipeline that requires the following pieces of software at the corresponding versions:

trim_galore/0.6.6
parallel/20200922
bioawk/1.0
perl/5.26.2
cutadapt/1.18
bowtie2/2.2.4

You can create the following yaml file called pipeline_example.yml:

name: pipeline_example
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- trim-galore=0.6.6
- parallel=20200922
- bioawk=1.0
- perl=5.26.2
- cutadapt=1.18
- bowtie2=2.2.4
prefix: /home/software/conda/miniconda3

This YAML file specifies the name of the Conda environment, the channels to install packages from, the packages to install, and the Conda prefix directory. (The prefix line records a local installation path; it is machine-specific and can usually be omitted, especially when sharing the file.)

Now, you can have Conda create the environment and activate it:

module load miniconda3/v4

source /home/software/conda/miniconda3/bin/condainit

conda env create -f pipeline_example.yml

conda activate pipeline_example

This saves you the trouble of creating the environment yourself and then installing each package manually.

If you ever need to make changes to this environment, you can update the yaml file, then run:

conda activate pipeline_example
conda env update --file pipeline_example.yml --prune
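For instance, if the pipeline later needs an additional tool, you could extend the dependencies list in pipeline_example.yml and rerun the update command above. In this sketch, the added samtools entry and its version are purely illustrative:

```yaml
name: pipeline_example
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- trim-galore=0.6.6
- parallel=20200922
- bioawk=1.0
- perl=5.26.2
- cutadapt=1.18
- bowtie2=2.2.4
- samtools=1.9   # newly added package; version shown is illustrative
```

The --prune flag then removes any packages no longer listed in the file, keeping the environment in sync with its definition.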

Conda Environments in Slurm

Using your Conda environments in a Slurm script is very similar to using environment modules in a Slurm script: add lines that load miniconda3 and activate the appropriate environment, like so:

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mail-type=END	
#SBATCH --mail-user=example@mit.edu
###################################

module load miniconda3/v4
source /home/software/conda/miniconda3/bin/condainit
conda activate pipeline_example

# <your pipeline commands>

Sharing Conda Environments

If you would like to share a Conda environment that you've created with others, it's important to export the environment first. Doing so protects your environment from any modifications another user might make, and makes the environment portable: others can copy it to their own directories and build on top of it without affecting your work.

# To export a Conda environment to a YAML file:

[user1]~ conda activate myenv # Activate the environment you'd like to export
[user1]~ conda env export > environment.yml # You may now share this file with whoever wishes to use it

# After copying the YAML file to their home directory, another user can create a new environment from it:

[user2]~ conda env create -f environment.yml # Create the new environment from the file
[user2]~ conda activate myenv # Activate the new environment
[user2]~ conda env list # Verify that the new environment was created correctly

Tip: conda env export --from-history records only the packages you explicitly asked for, which often yields a shorter, more portable file.


For more information, see the official Conda documentation, as well as a more detailed guide from The Carpentries.
