
# Galaxy NGS Illumina QC

This tutorial shows how to perform basic QC on Illumina data: computing quality statistics, drawing quality score boxplots, and trimming and masking reads.

**1. Load a fastq file and annotate the uploaded data**

* On the Tool Panel, click on Get Data → Upload File.
  * Browse to your home folder and select the file *Galaxy\_GM12878.fastqillumina*.
  * Set the format to "fastqillumina".
* Click the "Execute" button.
* Once the job is completed, click on the pencil icon to edit the attributes.
  * Set the Database field to "Human Mar 2006 (NCBI36/hg18)".
  * Click the "Save" button.

<figure><img src="/files/dxuhHpqetbriZwalW6Bz" alt=""><figcaption></figcaption></figure>

**2. Convert to Sanger FASTQ format**

* On the Tool Panel, click on NGS Toolbox Beta → NGS: QC and Manipulation → FASTQ Groomer.
* This tool converts between various FASTQ quality formats.
  * By default, the quality format output is Sanger FASTQ.
  * Sanger FASTQ is the required format for downstream analyses in Galaxy.
  * Set the input type to "Illumina 1.3+".
  * Click the "Execute" button.
* Once the job is completed, click on the pencil icon and change the name of the resulting dataset to "GM12878 fastqsanger".

<figure><img src="/files/uwnnrus32JAxEpaKKPuO" alt=""><figcaption></figcaption></figure>
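
The conversion performed in this step amounts to changing the ASCII offset of the quality characters: Illumina 1.3+ stores each Phred score as score + 64, while Sanger FASTQ stores it as score + 33. The following is a minimal Python sketch of that idea, not the FASTQ Groomer implementation; the function name `illumina13_to_sanger` is a hypothetical helper introduced for illustration.

```python
def illumina13_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 (Illumina 1.3+) quality string as Phred+33 (Sanger)."""
    converted = []
    for ch in quality:
        phred = ord(ch) - 64  # Illumina 1.3+ stores Phred score + 64
        if phred < 0:
            raise ValueError(f"{ch!r} is not a valid Illumina 1.3+ quality character")
        converted.append(chr(phred + 33))  # Sanger stores Phred score + 33
    return "".join(converted)


# 'h' encodes Phred 40 and 'B' encodes Phred 2 in Illumina 1.3+.
print(illumina13_to_sanger("hhhhBB"))  # prints IIII##
```

Only the quality line of each record changes; the sequence line is left as-is.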

**3. Compute Quality Statistics**

* On the Tool Panel, click on NGS Toolbox Beta → Fastx-Toolkit → Compute Quality Statistics.
* This tool computes quality statistics such as min, max, mean, median, Q1, Q3, IQR, etc. of quality scores.
  * Select Data 2 as input library.
  * Click the "Execute" button.

<figure><img src="/files/pYPwwkQNrA2pk9QDv9xs" alt=""><figcaption></figcaption></figure>
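
To make the report easier to interpret, here is a rough Python sketch of per-position (per-column) quality statistics. It assumes Sanger-encoded quality strings (offset 33) and uses the standard `statistics` module; the function name `column_quality_stats` is hypothetical, and the quartile method may differ from the one used by the Fastx-Toolkit, so values can differ slightly from the Galaxy output.

```python
import statistics


def column_quality_stats(quality_strings, offset=33):
    """Summarize Phred scores at each read position across all reads."""
    columns = {}
    for qual in quality_strings:
        for position, ch in enumerate(qual, start=1):
            columns.setdefault(position, []).append(ord(ch) - offset)

    report = []
    for position in sorted(columns):
        scores = columns[position]
        if len(scores) >= 2:
            # Quartiles by linear interpolation; an assumption, not the Fastx method.
            q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        else:
            q1 = q3 = float(scores[0])
        report.append({
            "column": position,
            "count": len(scores),
            "min": min(scores),
            "max": max(scores),
            "mean": statistics.mean(scores),
            "median": statistics.median(scores),
            "Q1": q1,
            "Q3": q3,
            "IQR": q3 - q1,
        })
    return report


# Three toy quality strings; the last read is one base shorter.
for row in column_quality_stats(["II#", "5I#", "+I"]):
    print(row)
```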

**4. Draw Quality Score Boxplot**

* On the Tool Panel, click on NGS Toolbox Beta → Fastx-Toolkit → Draw Quality Score Boxplot.
* This tool creates a boxplot of the quality scores at each position in the library.
  * Select Data 3 as statistic report file.
  * Click the "Execute" button.
* Once the job is completed, click on the eye icon to see the boxplot figure. You can expand and collapse the figure by clicking on the arrows placed on the sides of the main panel.

<figure><img src="/files/IsKTXUx2eDzqfoEQBjUM" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/kNPa6Pi6mok5gIPY3kjC" alt=""><figcaption></figcaption></figure>

**5. Trim Sequence Reads to length of 60 bases**

* On the Tool Panel, click on NGS Toolbox Beta → FASTQ Trimmer (by column).
* This tool trims the end of the reads.
  * Select Data 2 as input FASTQ file.
  * Set the offset from 3' end to 16.
  * With these parameters, the last 16 bases of each read are removed, so a 76-base read is trimmed after the 60th base.
  * Click the "Execute" button.
* Once the job is completed, click on the pencil icon to edit the attributes of the resulting data and change the name to "GM12878 Trimmed fastqsanger".

<figure><img src="/files/3AjnWOHNY6ipNCBhrZT7" alt=""><figcaption></figcaption></figure>
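
Trimming by column removes a fixed number of bases from either end of every read, regardless of quality. The sketch below illustrates how the two offsets relate to the final read length; it assumes a hypothetical 76-base read, and `trim_read` is an illustrative helper rather than the Galaxy tool's code.

```python
def trim_read(sequence, quality, offset_5prime=0, offset_3prime=0):
    """Drop a fixed number of bases from the 5' and/or 3' end of one read."""
    # Guard against offsets larger than the read: the result is then empty.
    end = max(len(sequence) - offset_3prime, offset_5prime)
    return sequence[offset_5prime:end], quality[offset_5prime:end]


# Hypothetical 76-base read with uniform quality (assumed length, for illustration).
seq = "ACGT" * 19
qual = "I" * 76

trimmed_seq, trimmed_qual = trim_read(seq, qual, offset_3prime=16)
print(len(trimmed_seq))  # prints 60
```

The sequence and quality strings are sliced identically so that each base keeps its own quality score.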

**6. Apply Quality Masker to bases with quality lower than 20**

* On the Tool Panel, click on NGS Toolbox Beta → FASTQ Masker.
* This tool allows masking base characters in FASTQ files according to quality score value and comparison method.
  * Select Data 2 as input file to mask.
  * Set the criterion as "less than" and the threshold to 20.
  * Click the "Execute" button.
  * With these parameters, any base with a quality score lower than 20 is masked with the symbol "N".

<figure><img src="/files/l8cu6H9ZS2iSoGVsyt6o" alt=""><figcaption></figcaption></figure>
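
The masking rule can be sketched in a few lines of Python. This is a minimal illustration assuming Sanger-encoded qualities (offset 33); `mask_low_quality` is a hypothetical name, and the Galaxy tool offers further options not covered here.

```python
def mask_low_quality(sequence, quality, threshold=20, offset=33, mask_char="N"):
    """Replace every base whose Phred score is below `threshold` with `mask_char`."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    return "".join(
        mask_char if ord(q) - offset < threshold else base
        for base, q in zip(sequence, quality)
    )


# Quality characters I, 5, + and 4 encode Phred 40, 20, 10 and 19 respectively.
print(mask_low_quality("ACGTAC", "II5+4I"))  # prints ACGNNC
```

Note that a base with a score of exactly 20 is kept, because the criterion is "less than".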

**7. Apply FASTQ Quality Trimmer**

* On the Tool Panel, click on NGS Toolbox Beta → FASTQ Quality Trimmer (by sliding window).
* This tool allows trimming the ends of reads based upon the aggregate value of quality scores found within a sliding window. Several criteria can be used to determine the aggregate value (min, max, sum, mean) within the sliding window.
  * Select Data 2 as input file.
  * Select "Trim 5' end" only from the drop-down menu.
  * Set window size to 3.
  * Select "max score" as aggregate action.
  * Select ">= 2" as criterion for trimming.
  * Click the "Execute" button.

<figure><img src="/files/LzhAZQVgGMUmDOWbOIpv" alt=""><figcaption></figcaption></figure>
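
One plausible reading of these settings is: starting at the 5' end, look at a window of 3 bases, and keep removing the first base until the maximum score in the window is at least 2. The Python sketch below implements that reading only; it is an assumption about the behavior, not the Galaxy tool's actual algorithm, and `trim_5prime_by_window` is a hypothetical helper.

```python
def trim_5prime_by_window(sequence, quality, window_size=3, aggregate=max,
                          threshold=2, offset=33):
    """Trim bases from the 5' end until the window's aggregate score >= threshold."""
    scores = [ord(ch) - offset for ch in quality]
    start = 0
    while start < len(scores):
        window = scores[start:start + window_size]
        if aggregate(window) >= threshold:
            break  # the window passes the criterion, so trimming stops here
        start += 1  # otherwise drop one base from the 5' end and re-test
    return sequence[start:], quality[start:]


# Phred scores for "!!!!#II" are 0, 0, 0, 0, 2, 40, 40.
print(trim_5prime_by_window("ACGTACG", "!!!!#II"))  # prints ('GTACG', '!!#II')
```

With "max score" as the aggregate, a single acceptable base anywhere in the window stops the trimming, so a few low-quality bases can remain at the start of the read; passing `aggregate=min` to this sketch would be stricter.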

**8. Create a Data Subset by selecting the first 2,500 sequence reads**

* On the Tool Panel, click on Text Manipulation → Select first lines.
* This tool selects the first N lines of the input dataset.
  * Set the number of lines to select to 10,000.
  * Select Data 5 as input.
  * Click the "Execute" button.
  * With these parameters, the first 10,000 lines of the input FASTQ file are selected, corresponding to the first 2,500 sequence reads.

<figure><img src="/files/rpFF5C6p900yCwjEDBsI" alt=""><figcaption></figcaption></figure>
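
The line count follows from the FASTQ layout: each record takes four lines (header, sequence, separator, quality), so 2,500 reads correspond to 2,500 × 4 = 10,000 lines. A minimal Python sketch of the same selection is shown below; it assumes unwrapped four-line records, and `first_reads` is a hypothetical helper name.

```python
import io
from itertools import islice

LINES_PER_RECORD = 4  # header, sequence, '+' separator, quality


def first_reads(handle, n_reads):
    """Return the lines of the first `n_reads` records from an open FASTQ file."""
    return list(islice(handle, n_reads * LINES_PER_RECORD))


# Toy in-memory FASTQ with five identical records (illustrative data only).
record = "@read\nACGT\n+\nIIII\n"
lines = first_reads(io.StringIO(record * 5), n_reads=2)
print(len(lines))               # prints 8
print(2500 * LINES_PER_RECORD)  # prints 10000
```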

