Galaxy NGS Illumina SE Mapping
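This tutorial shows how to map single-end Illumina reads to a reference genome with Bowtie, filter the alignments on their bitwise flags, count the reads mapped to each chromosome, convert SAM to BAM, and compute summary statistics with flagstat.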
1. Load fastq file and annotate the uploaded data
On the Tool Panel, click on Get Data → Upload File.
This tool allows you to upload data from a file, a URL or a text box.
Select Galaxy_GM12878_trimmed.fastq as input file.
Select "fastqsanger" as file format.
Select "hg18" as genome.
Click the "Execute" button.
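For readers unfamiliar with the "fastqsanger" format, the following minimal Python sketch shows how a FASTQ record is laid out (four lines per read) and how Sanger-encoded quality characters (ASCII offset 33) decode to Phred scores. The function name and the example record are made up for illustration; this is not how Galaxy itself reads the file.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, phred_scores) from FASTQ text.

    Assumes the simple four-line layout: @id, sequence, +, qualities.
    """
    lines = [line for line in text.splitlines() if line]
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record starting at line %d" % (i + 1))
        # Sanger encoding: Phred score = ASCII code of the character minus 33.
        scores = [ord(ch) - 33 for ch in qual]
        yield header[1:], seq, scores


# Illustrative record only (not taken from Galaxy_GM12878_trimmed.fastq).
example = "@read1\nACGT\n+\nII5!\n"
records = list(parse_fastq(example))
print(records)
```

In this encoding the character "I" stands for a Phred score of 40 and "!" for a score of 0.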
2. Map to reference genome (hg18) using Bowtie
On the Tool Panel, click on NGS Toolbox Beta → NGS Mapping → Map with Bowtie for Illumina.
This tool allows you to run the aligner Bowtie. The output is a SAM file with all the read alignments.
Select hg18 canonical as the reference genome.
Leave other settings as default.
Click the "Execute" button.
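Since the later steps refer to SAM columns by number, it may help to see how one alignment line breaks down. The sketch below splits a SAM line into its 11 mandatory tab-separated fields as defined by the SAM format (column 1 is the read name, column 2 the bitwise flag, column 3 the reference name). The helper name and the example line are hypothetical and are not taken from the Bowtie output of this tutorial.

```python
from collections import namedtuple

# The 11 mandatory SAM columns, in order.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual",
)


def parse_sam_line(line):
    """Parse one SAM alignment line; optional tags after column 11 are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


# Made-up alignment line for illustration.
line = "read1\t16\tchr19\t1000\t255\t4M\t*\t0\t0\tACGT\tIIII\tXA:i:0"
rec = parse_sam_line(line)
print(rec.qname, rec.flag, rec.rname, rec.pos)
```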
3. Filter SAM file on bitwise flag values
On the Tool Panel, click on NGS Toolbox Beta → NGS SAMtools → Filter SAM on bitwise flag values.
This tool filters a SAM file on the values of its bitwise flag field.
Select data 2 as input dataset.
Add new flag with type set to "the read is unmapped" and the value set to "No".
Add new flag with type set to "read strand" and the value set to "Yes".
Click the "Execute" button.
With these parameters, the resulting output consists of those reads that are mapped and lie on the reverse strand.
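The two flag settings above correspond to simple bit tests on column 2 of each SAM line. In the SAM format, bit 0x4 marks an unmapped read and bit 0x10 marks a read on the reverse strand. The sketch below is a rough stand-in for the filter, assuming header lines (starting with "@") are simply dropped; the function names are hypothetical and the Galaxy tool may treat headers differently.

```python
UNMAPPED = 0x4   # SAM flag bit: the read is unmapped
REVERSE = 0x10   # SAM flag bit: the read is on the reverse strand


def keep_flag(flag):
    """True for reads that are mapped (0x4 unset) and on the reverse strand (0x10 set)."""
    return (flag & UNMAPPED) == 0 and (flag & REVERSE) != 0


def filter_sam_lines(lines):
    """Yield alignment lines passing keep_flag; header lines are skipped (an assumption)."""
    for line in lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if keep_flag(flag):
            yield line


# Flag 0: mapped, forward strand. Flag 16: mapped, reverse strand.
# Flag 4: unmapped. Flag 20: unmapped with the reverse bit set.
print([flag for flag in (0, 16, 4, 20) if keep_flag(flag)])
```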
4. Find how many reads map to each chromosome
On the Tool Panel, click on Join, Subtract and Group → Group.
This tool groups the input dataset by a column and performs aggregate operations on other columns.
Select data 2 (bowtie output) as input dataset.
Group by column 3 (reference name, i.e. chromosome name)
Add new operation: count on column 1.
Click the "Execute" button.
With these parameters, the resulting output lists each reference name (chromosome) together with the number of reads assigned to it.
Edit the attributes of this dataset by clicking on the pencil icon: rename it as "read distribution by chromosome".
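The grouping in this step amounts to counting how often each value of column 3 occurs. A minimal Python sketch of the same idea follows, assuming tab-separated SAM lines and skipping header lines; the function name and the toy lines are invented for illustration.

```python
from collections import Counter


def count_reads_per_reference(sam_lines):
    """Count alignment lines per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        # Unmapped reads carry "*" in column 3 and are counted under that key.
        counts[fields[2]] += 1
    return counts


# Toy input, not real tutorial data.
toy = [
    "r1\t0\tchr1\t100",
    "r2\t16\tchr2\t200",
    "r3\t0\tchr2\t300",
    "r4\t4\t*\t0",
]
counts = count_reads_per_reference(toy)
print(dict(counts))
```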
5. Find the most represented chromosome
On the Tool Panel, click on Filter and Sort → Sort data.
This tool sorts the dataset on the selected column.
Select the column representing the key to sort
Select "Numerical sort" in "descending order" as options.
Click the "Execute" button.
With these parameters, the results show that chr19 is the most represented chromosome.
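The sort in this step is a numerical, descending sort on the count column, so the most represented chromosome ends up in the first row. As a small sketch, assuming the grouped table is available as (chromosome, count) pairs; the counts below are invented and do not reproduce the tutorial's result.

```python
def sort_by_count(rows):
    """Return (name, count) rows sorted numerically by count, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented counts for illustration only.
rows = [("chr1", 3), ("chr2", 7), ("chrX", 1)]
ranked = sort_by_count(rows)
print(ranked[0])
```

Sorting the counts as numbers rather than as text matters here: a text sort would place "10" before "9".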
6. Convert SAM to BAM
On the Tool Panel, click on NGS Toolbox Beta → NGS SAMtools → SAM to BAM converter.
This tool converts SAM-formatted files into BAM-formatted files.
Select the SAM file to convert.
Click the "Execute" button.
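Outside Galaxy, a comparable conversion can be done with the samtools command line. The sketch below only builds (and optionally runs) a "samtools view -b" call from Python; it assumes a samtools 1.x installation on the PATH, where the input format is detected automatically. The file names are hypothetical, and the Galaxy converter may invoke samtools with different options.

```python
import shutil
import subprocess


def sam_to_bam_command(sam_path, bam_path):
    """Build the samtools call: -b requests BAM output, -o names the output file."""
    return ["samtools", "view", "-b", "-o", bam_path, sam_path]


def convert_sam_to_bam(sam_path, bam_path):
    """Run the conversion; raises if samtools is missing or exits with an error."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on the PATH")
    subprocess.run(sam_to_bam_command(sam_path, bam_path), check=True)


# Hypothetical file names; only the command is printed here, nothing is executed.
cmd = sam_to_bam_command("mapped.sam", "mapped.bam")
print(" ".join(cmd))
```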
7. Compute general statistics via Flagstat operation
On the Tool Panel, click on NGS Toolbox Beta → NGS SAMtools → flagstat.
This tool provides a simple summary of a BAM-format file.
Select data 6 (BAM file) as input.
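To give an idea of what such a summary is built from, the sketch below tallies a few categories directly from SAM flag values. It is a simplified illustration covering only total, mapped, unmapped and reverse-strand reads, not a reimplementation of flagstat, which reports more categories. The function name and the flag list are made up.

```python
UNMAPPED = 0x4   # SAM flag bit: the read is unmapped
REVERSE = 0x10   # SAM flag bit: the read is on the reverse strand


def summarize_flags(flags):
    """Return simple counts derived from a sequence of SAM flag integers."""
    flags = list(flags)
    total = len(flags)
    mapped = sum(1 for flag in flags if not flag & UNMAPPED)
    # Count reverse-strand reads among the mapped ones only.
    reverse = sum(1 for flag in flags
                  if not flag & UNMAPPED and flag & REVERSE)
    return {
        "total": total,
        "mapped": mapped,
        "unmapped": total - mapped,
        "reverse_strand": reverse,
    }


# Invented flags: forward mapped, reverse mapped, unmapped, reverse mapped.
summary = summarize_flags([0, 16, 4, 16])
print(summary)
```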