# Running Nextflow / nf-core Pipelines

<figure><img src="/files/sMCLojDn1QwmnJYqyHW0" alt=""><figcaption></figcaption></figure>

Nextflow is a workflow system for building reproducible pipelines. It chains simple steps together into a complex data analysis pipeline. People have used Nextflow to build bioinformatics pipelines for many different tasks, including RNA-seq analysis and Hi-C analysis.

nf-core is "a community effort to collect a curated set of analysis pipelines built using Nextflow." You can find many popular bioinformatics Nextflow pipelines on [nf-core's website](https://nf-co.re/pipelines).

We can take advantage of nf-core on our cluster by installing it in a Conda environment. Before doing so, however, we must set a couple of environment variables in our `~/.bashrc` files that Nextflow and nf-core need to correctly cache the Singularity images they'll be using throughout the pipeline.

Edit your `~/.bashrc` file and append these environment variables to the end of the file:

```bash
export NXF_SINGULARITY_CACHEDIR="$HOME/.singularity/cache"
export NXF_OFFLINE='TRUE'
```

To make sure these environment variables are set, you can either log out of Luria and log back in, or run the following to load the new shell environment:

```bash
source ~/.bashrc
```
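To confirm that both variables are visible in the current shell, a quick check along these lines can help. This is a minimal sketch: `check_nxf_var` is a hypothetical helper name used only for illustration, not part of Nextflow or nf-core.

```bash
# Hypothetical helper: report whether an exported variable is visible in this shell.
check_nxf_var() {
    # $1 is the variable name; printenv exits non-zero if it is not exported
    if value=$(printenv "$1"); then
        echo "$1=$value"
    else
        echo "$1 is not set"
    fi
}

check_nxf_var NXF_SINGULARITY_CACHEDIR
check_nxf_var NXF_OFFLINE
```

If either line reports that the variable is not set, the new shell environment has not been loaded yet.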

## Installing nf-core / Nextflow

Nextflow and nf-core are installed through Conda, so we need to load the Conda module before starting:

```bash
srun --pty bash # Start an interactive session on a compute node

module load miniconda3/v4

source /home/software/conda/miniconda3/bin/condainit
```

They also require us to have specific channels configured:

```bash
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
```

Once these channels have been added, we can proceed with the installation:

```bash
conda create --name nf-core
conda activate nf-core
conda install python=3.12 nf-core=2.13.1 nextflow=24.10.4
```

{% hint style="warning" %}
Nextflow 24 is currently the version most compatible with our system. Nextflow will advise you to update, but please do not, as updating will break your pipelines.
{% endhint %}
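A quick sanity check after installation might look like the following. This is a sketch only: `report_tools` is a hypothetical helper name, and it only checks whether the two executables resolve on your `PATH` with the `nf-core` environment active.

```bash
# Hypothetical helper: report where each tool resolves on PATH, or flag it as missing.
report_tools() {
    for tool in nextflow nf-core; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $(command -v "$tool")"
        else
            echo "$tool: not found (is the nf-core environment active?)"
        fi
    done
}

report_tools
```

If both tools are found, `nextflow -version` and `nf-core --version` should report the versions pinned in the install command.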

## Using nf-core / Nextflow

You can see which Nextflow pipelines are available either on [nf-core's website](https://nf-co.re/pipelines) or with the `nf-core` command line tool. The command line tool also tells you which pipelines you have installed, which version is installed, when you last used them, etc.

```bash
nf-core list

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Pipeline Name             ┃ Stars ┃ Latest Release ┃      Released ┃ Last Pulled ┃ Have latest release? ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ riboseq                   │     4 │          1.0.1 │   2 weeks ago │           - │ -                    │
│ sarek                     │   339 │          3.4.1 │   1 weeks ago │           - │ -                    │
│ oncoanalyser              │    14 │            dev │  17 hours ago │           - │ -                    │
│ tfactivity                │     7 │            dev │     yesterday │           - │ -                    │
│ pangenome                 │    47 │          1.1.2 │  1 months ago │           - │ -                    │
│ scnanoseq                 │     2 │            dev │     yesterday │           - │ -                    │
│ fetchngs                  │   123 │         1.12.0 │  2 months ago │           - │ -                    │
│ rnaseq                    │   778 │         3.14.0 │  4 months ago │ 2 hours ago │ No (v3.14.0)         │
...........................................................................................................

│ slamseq                   │     4 │          1.0.0 │   4 years ago │           - │ -                    │
└───────────────────────────┴───────┴────────────────┴───────────────┴─────────────┴──────────────────────┘
```

Every Nextflow pipeline run needs a revision number and a set of parameters. You can see which parameters a particular revision of a pipeline accepts, and which of them are required, on the pipeline's web page, or by running the pipeline without any parameters and reading the Nextflow error log.

Nextflow also requires you to specify a "profile" for running a pipeline. A profile is essentially a set of sensible settings for the pipeline to run with. Each pipeline has a profile of its own, plus two test profiles: `test`, which runs the pipeline with a minimal public dataset, and `test_full`, which runs the pipeline with a full-size public dataset.

In addition to these, nf-core provides profiles for common containerization software, such as Docker, Podman, and Singularity.

We're going to run an example using v3.14.0 of the rnaseq pipeline. The parameters for this pipeline are enumerated here: <https://nf-co.re/rnaseq/3.14.0/parameters>. The two required parameters are `--input`, the "path to comma-separated file containing information about the samples in the experiment" and `--outdir`, "the output directory where the results will be saved."
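For a run on your own data, `--input` points to a samplesheet. The sketch below writes one with the column layout (`sample`, `fastq_1`, `fastq_2`, `strandedness`) that the nf-core/rnaseq 3.14.0 usage documentation describes, as far as we understand it. The sample names and FASTQ paths are placeholders, so check the pipeline's usage page before relying on this layout.

```bash
# Write a two-sample, paired-end samplesheet.
# Sample names and /path/to/... FASTQ paths are placeholders, not real files.
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,strandedness
control_rep1,/path/to/control_rep1_R1.fastq.gz,/path/to/control_rep1_R2.fastq.gz,auto
treated_rep1,/path/to/treated_rep1_R1.fastq.gz,/path/to/treated_rep1_R2.fastq.gz,auto
EOF
```

A real run would then pass this file as `--input samplesheet.csv` instead of relying on the inputs bundled with the `test` profile.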

We'll use the `test` profile to ensure the pipeline can install and run correctly. We'll also use the `singularity` profile, since Luria is set up for use with Singularity. The `test` profile gives the pipeline its own inputs, so we only need to specify `--outdir`. Make sure you load the Singularity module first, because the `singularity` profile instructs Nextflow to use Singularity to set up the pipeline.

```bash
module load singularity/3.10.4

nextflow run nf-core/rnaseq -r 3.14.0 -profile test,singularity --outdir test
```

Nextflow will begin to download the Singularity images needed to run the rnaseq pipeline v3.14.0. This should take between 7 and 12 minutes. Since we've set the necessary environment variables for Nextflow to see the Singularity image cache, subsequent runs of this revision of the pipeline will start up much faster.
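To see what has been cached so far, you can list the contents of the cache directory. The sketch below assumes images are stored as files directly inside `$NXF_SINGULARITY_CACHEDIR`; `list_cached_images` is a hypothetical helper name.

```bash
# Hypothetical helper: list the files in a Singularity image cache directory.
list_cached_images() {
    # $1: cache directory; prints nothing if it does not exist yet
    if [ -d "$1" ]; then
        find "$1" -maxdepth 1 -type f | sort
    fi
}

# Fall back to the path set in ~/.bashrc if the variable is not loaded.
list_cached_images "${NXF_SINGULARITY_CACHEDIR:-$HOME/.singularity/cache}"
```

An empty listing before the first run is expected; after a successful run, the directory should contain one file per image the pipeline pulled.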

As the Nextflow pipeline runs, it will put metadata into `.nextflow/cache` and other data into the `work/` directory. If the pipeline errors out at any point, you can read the error log, fix the issue, then add the `-resume` flag to your command to resume from where you left off. Nextflow will read the metadata and data it generated in the previous run to know where in the pipeline to start back up from.
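Resuming the test run described here could look like the sketch below. It assumes the default work directory and that you are in the directory where the pipeline was first launched, since that is where `.nextflow/` and `work/` live. `resume_rnaseq_test` is a hypothetical wrapper name.

```bash
# Hypothetical wrapper: re-run the test pipeline with -resume, but only from a
# directory that holds the metadata and data of a previous run.
resume_rnaseq_test() {
    if [ -d .nextflow ] && [ -d work ]; then
        # -resume takes a single dash: it is a Nextflow option, not a pipeline parameter
        nextflow run nf-core/rnaseq -r 3.14.0 -profile test,singularity --outdir test -resume
    else
        echo "No .nextflow/ and work/ here; cd to the original launch directory first." >&2
        return 1
    fi
}
```

Calling `resume_rnaseq_test` from the launch directory re-issues the original command with `-resume` appended; anywhere else it prints a reminder and returns a non-zero status.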

Once the pipeline is finished setting itself up, it will run with a minimal public dataset as input, then output the results into the `test/` directory we specified. This directory will have extensive information about multiple points of the run.

```bash
ls test/
bbsplit  fastqc  multiqc  pipeline_info  salmon  star_salmon  trimgalore
```

