Galaxy NGS Illumina SE Mapping

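This tutorial shows how to map single-end Illumina reads to a reference genome with Bowtie, filter the alignments on SAM flag values, count the reads mapped to each chromosome, convert SAM to BAM, and compute summary statistics with flagstat.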

Single-End Mapping of Illumina Data

1. Load fastq file and annotate the uploaded data

  • On the Tool Panel, click on Get Data → Upload File.

  • This tool allows you to upload data from a file, a URL or a text box.

    • Select Galaxy_GM12878_trimmed.fastq as input file.

    • Select "fastqsanger" as file format.

    • Select "hg18" as genome.

  • Click the "Execute" button.

2. Map to reference genome (hg18) using Bowtie

  • On the Tool Panel, click on NGS Toolbox Beta → NGS Mapping → Map with Bowtie for Illumina.

  • This tool allows you to run the Bowtie aligner. The output is a SAM file containing all the read alignments.

    • Select hg18 canonical as reference genome.

    • Leave other settings as default.

  • Click the "Execute" button.

3. Filter SAM file on bitwise flag values

  • On the Tool Panel, click on NGS Toolbox Beta → NGS SAMtools → Filter SAM on bitwise flag values.

  • This tool filters the alignments in a SAM file according to the bits set in the flag field of each record.

    • Select data 2 as input dataset.

    • Add new flag with type set to "the read is unmapped" and the value set to "No".

    • Add new flag with type set to "read strand" and the value set to "Yes".

  • Click the "Execute" button.

  • With these parameters, the resulting output consists of those reads that are properly mapped and are on the reverse strand.
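To make the flag filter concrete, here is a minimal Python sketch of the same selection applied to SAM text lines. It is not the Galaxy tool's implementation: the function names are hypothetical, the example records are toy data, and passing header lines through unchanged is an assumption. The bit values 0x4 (read unmapped) and 0x10 (read on the reverse strand) come from the SAM format.

```python
# SAM flag bits, as defined by the SAM format
FLAG_UNMAPPED = 0x4   # the read is unmapped
FLAG_REVERSE = 0x10   # the read maps to the reverse strand


def keep_alignment(sam_line):
    """Return True for a mapped read on the reverse strand (hypothetical helper)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2 of a SAM record holds the bitwise flag
    is_unmapped = bool(flag & FLAG_UNMAPPED)
    is_reverse = bool(flag & FLAG_REVERSE)
    # "the read is unmapped" = No, "read strand" = Yes
    return (not is_unmapped) and is_reverse


def filter_sam(lines):
    """Yield header lines unchanged (an assumption) and alignments that pass."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line):
            yield line


# Toy records: forward-strand hit, reverse-strand hit, unmapped read
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
kept = list(filter_sam(example))
print(len(kept))  # header line plus r2
```

Only r2 survives among the alignments, because its flag (16) has the reverse-strand bit set and the unmapped bit clear.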

4. Find how many reads map to each chromosome

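  • On the Tool Panel, click on Join, Subtract and Group → Group.

  • This tool groups the rows of the input dataset by the values in one column and applies aggregate operations to the other columns.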

    • Select data 2 (bowtie output) as input dataset.

    • Group by column 3 (reference name, i.e. chromosome name).

    • Add new operation: count on column 1.

  • Click the "Execute" button.

  • With these parameters, the resulting output lists each reference name (chromosome) together with the number of reads assigned to it.

  • Edit the attributes of this dataset by clicking on the pencil icon: rename it as "read distribution by chromosome".
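The grouping step amounts to counting records per value of column 3. A minimal Python sketch of that idea follows; the function name and the toy records are hypothetical, and skipping header lines is an assumption about the input.

```python
from collections import Counter


def count_reads_by_reference(sam_lines):
    """Count SAM records per reference name (column 3). Hypothetical helper."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines (assumption: they are not data rows)
        fields = line.rstrip("\n").split("\t")
        counts[fields[2]] += 1  # column 3 is the reference (chromosome) name
    return counts


# Toy records, not real data
example = [
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr2\t200\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t300\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
counts = count_reads_by_reference(example)
for name, n in counts.items():
    print(name, n, sep="\t")
```

In this sketch, unmapped reads carry "*" as reference name and therefore end up in their own group.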

5. Find the most represented chromosome

  • On the Tool Panel, click on Filter and Sort → Sort data.

  • This tool sorts the rows of a dataset on the values in a chosen column.

    • Select the column to use as the sort key (the column holding the read counts).

    • Select "Numerical sort" in "descending order" as options.

  • Click the "Execute" button.

  • With these parameters, the results show that chr19 is the most represented chromosome.
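The sort step can be sketched in Python as a numerical, descending sort on the count column. The function name and the rows below are hypothetical placeholders, not results from this dataset.

```python
def sort_by_count(rows, column=1):
    """Sort rows numerically on one column, largest first. Hypothetical helper."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=True)


# Hypothetical (name, count) rows as they might come out of the grouping step
rows = [("chrA", "12"), ("chrB", "140"), ("chrC", "7")]
ranked = sort_by_count(rows)
print(ranked[0][0])  # name of the most represented group
```

Converting the counts to numbers matters: compared as text, "7" would rank above "140", which is why the numerical sort option is selected.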

6. Convert SAM to BAM

  • On the Tool Panel, click on NGS Toolbox Beta → NGS SAMtools → SAM to BAM converter.

  • This tool converts SAM-formatted files into BAM-formatted files.

    • Select the SAM file to convert.

  • Click the "Execute" button.

7. Compute general statistics via Flagstat operation

  • On the Tool Panel, click on NGS Toolbox Beta → NGS SAMtools → flagstat.

  • This tool provides a simple statistical summary of the alignments in a BAM file.

    • Select data 6 (BAM file) as input.
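As a rough illustration of the kind of tally such a summary is built from, the following Python sketch counts total, mapped and reverse-strand records from the flag column of SAM text lines. It covers only a small subset of what flagstat reports, works on SAM text rather than BAM, and uses a hypothetical function name and toy records.

```python
def flag_summary(sam_lines):
    """Tally a few flag-based counts from SAM text lines. Hypothetical helper."""
    total = mapped = reverse = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines carry no alignment
        flag = int(line.split("\t")[1])  # column 2 is the bitwise flag
        total += 1
        if not flag & 0x4:       # bit 0x4 clear: the read is mapped
            mapped += 1
            if flag & 0x10:      # bit 0x10 set: reverse strand
                reverse += 1
    percent = 100.0 * mapped / total if total else 0.0
    return {
        "total": total,
        "mapped": mapped,
        "reverse_strand": reverse,
        "percent_mapped": percent,
    }


# Toy records: forward-strand hit, reverse-strand hit, unmapped read
example = [
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
summary = flag_summary(example)
print(summary["total"], summary["mapped"], summary["reverse_strand"])
```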
