Running Nextflow / nf-core Pipelines


Last updated 5 months ago

Nextflow is a system that allows you to build reproducible pipelines by chaining simple steps into a complex data analysis workflow. Nextflow has been used to create bioinformatics pipelines for many different analyses, including RNA-seq and Hi-C.

nf-core is "a community effort to collect a curated set of analysis pipelines built using Nextflow." You can find many popular bioinformatics Nextflow pipelines on nf-core's website, https://nf-co.re.

We can take advantage of nf-core on our cluster by installing it in a Conda environment. Before doing so, however, we must set a couple of environment variables in our ~/.bashrc file so that Nextflow and nf-core can correctly cache the Singularity images they'll use throughout the pipeline.

Edit your ~/.bashrc file and append these environment variables to the end of the file:

export NXF_SINGULARITY_CACHEDIR="$HOME/.singularity/cache"
export NXF_OFFLINE='TRUE'

To make sure these environment variables are set, either log out of Luria and log back in, or reload your shell configuration:

source ~/.bashrc
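After reloading, you can confirm the variables took effect. A minimal sanity check (the two exports are repeated here so the snippet is self-contained):

```shell
# Assuming the two exports above are in ~/.bashrc; repeated here so the
# check stands on its own
export NXF_SINGULARITY_CACHEDIR="$HOME/.singularity/cache"
export NXF_OFFLINE='TRUE'

# Create the cache directory if needed and confirm both variables are set
mkdir -p "$NXF_SINGULARITY_CACHEDIR"
echo "cache dir: $NXF_SINGULARITY_CACHEDIR"
echo "offline mode: $NXF_OFFLINE"
```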

Installing nf-core / Nextflow

Nextflow and nf-core are installed through Conda, so we'll want to make sure we activate the Conda module before starting:

srun --pty bash # Start an interactive session on a compute node

module load miniconda3/v4 # Load the Conda module

source /home/software/conda/miniconda3/bin/condainit # Initialize Conda in this shell

They also require us to have specific channels configured:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Once these channels have been added, we can go along with the installation:

conda create --name nf-core
conda activate nf-core
conda install python=3.12 nf-core nextflow

Once installed, update the software to the latest versions:

nextflow self-update
conda update nf-core
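To confirm the installation worked, you can print the tool versions. The checks below are guarded with command -v so they report a missing tool instead of failing when the nf-core environment isn't active:

```shell
# Print versions if the tools are on PATH; otherwise report that they're missing
command -v nextflow >/dev/null 2>&1 && nextflow -version || echo "nextflow not on PATH"
command -v nf-core >/dev/null 2>&1 && nf-core --version || echo "nf-core not on PATH"
```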

Using nf-core / Nextflow

To see which pipelines are available, along with their latest releases and whether you have them pulled locally, run:

nf-core pipelines list

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Pipeline Name             ┃ Stars ┃ Latest Release ┃      Released ┃ Last Pulled ┃ Have latest release? ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ riboseq                   │     4 │          1.0.1 │   2 weeks ago │           - │ -                    │
│ sarek                     │   339 │          3.4.1 │   1 weeks ago │           - │ -                    │
│ oncoanalyser              │    14 │            dev │  17 hours ago │           - │ -                    │
│ tfactivity                │     7 │            dev │     yesterday │           - │ -                    │
│ pangenome                 │    47 │          1.1.2 │  1 months ago │           - │ -                    │
│ scnanoseq                 │     2 │            dev │     yesterday │           - │ -                    │
│ fetchngs                  │   123 │         1.12.0 │  2 months ago │           - │ -                    │
│ rnaseq                    │   778 │         3.14.0 │  4 months ago │ 2 hours ago │ No (v3.14.0)         │
...........................................................................................................

│ slamseq                   │     4 │          1.0.0 │   4 years ago │           - │ -                    │
└───────────────────────────┴───────┴────────────────┴───────────────┴─────────────┴──────────────────────┘

Nextflow pipelines all require a revision number and various parameters to run. You can see which parameters are available for a particular revision of a pipeline, and which are required, on the pipeline's corresponding web page, or by running the pipeline without any parameters and reading the Nextflow error log.

Nextflow also requires you to specify a "profile" for running a pipeline. A profile is essentially a set of sensible settings that the pipeline should run with. Each pipeline ships its own profiles, including two test profiles: test, which runs the pipeline on a minimal public dataset, and test_full, which runs it on a full-size public dataset.

In addition to these, nf-core provides profiles for common containerization software, such as Docker, Podman, and Singularity.
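Under the hood, a profile is just a named block of settings in the pipeline's Nextflow configuration. A minimal, purely illustrative sketch of what container profiles look like (the real nf-core configs are considerably more involved):

```groovy
// Sketch of profile definitions in a nextflow.config (illustrative values only)
profiles {
    singularity {
        singularity.enabled    = true
        singularity.autoMounts = true   // bind host paths into the container
    }
    docker {
        docker.enabled = true
    }
}
```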

We'll use the test profile to confirm the pipeline can install and run correctly, together with the singularity profile, since Luria is set up for Singularity. The test profile supplies the pipeline's own inputs, so we only need to specify --outdir. Make sure to load the singularity module, since the singularity profile instructs Nextflow to use Singularity when setting up the pipeline.

module load singularity/3.5.0

nextflow run nf-core/rnaseq -r 3.14.0 -profile test,singularity --outdir test

Nextflow will begin to download the necessary Singularity images to run the rnaseq pipeline v3.14.0. This should take anywhere between 7 and 12 minutes. Since we've set the environment variables that point Nextflow at the Singularity image cache, subsequent runs of this revision of the pipeline will start up much faster.
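You can check that the images actually landed in the cache. The path below assumes the NXF_SINGULARITY_CACHEDIR setting from earlier, and the snippet is guarded in case the cache doesn't exist yet:

```shell
# List cached Singularity images, or report an empty cache
cache="${NXF_SINGULARITY_CACHEDIR:-$HOME/.singularity/cache}"
if [ -d "$cache" ] && [ -n "$(ls -A "$cache" 2>/dev/null)" ]; then
    ls "$cache"
else
    echo "cache is empty"
fi
```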

As the Nextflow pipeline runs, it will put metadata into .nextflow/cache and other data into the work/ directory. If the pipeline errors out at any point, you can read the error log, fix the issue, then add the -resume flag to your command to resume from where you left off. Nextflow will read the metadata and data it generated in the previous run to know where in the pipeline to start back up from.

Once the pipeline is finished setting itself up, it will run with a minimal public dataset as input, then output the results into the test/ directory we specified. This directory will have extensive information about multiple points of the run.

ls test/
bbsplit  fastqc  multiqc  pipeline_info  salmon  star_salmon  trimgalore

You can either check nf-core's website to see what Nextflow pipelines are available, or use the nf-core command-line tool as shown above. The command-line tool will also tell you which pipelines you have installed, the versions installed, the last time you used them, and so on.

The example above ran the rnaseq pipeline v3.14.0. The parameters for this pipeline are enumerated at https://nf-co.re/rnaseq/3.14.0/parameters. The two required parameters are --input, the "path to comma-separated file containing information about the samples in the experiment," and --outdir, "the output directory where the results will be saved."
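For a real (non-test) run, you'd build your own samplesheet and pass it with --input. The sketch below uses hypothetical sample names and placeholder file paths; check the pipeline's parameters documentation for the exact column format your revision expects:

```shell
# Hypothetical samplesheet for nf-core/rnaseq (placeholder paths)
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,/path/to/control_R1.fastq.gz,/path/to/control_R2.fastq.gz,auto
TREATED_REP1,/path/to/treated_R1.fastq.gz,/path/to/treated_R2.fastq.gz,auto
EOF

# A full run would then look something like this (not executed here):
# nextflow run nf-core/rnaseq -r 3.14.0 -profile singularity \
#     --input samplesheet.csv --outdir results
```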
