Galaxy SNP Interval Data

This example is inspired by a screencast published on the Galaxy website. It combines exon information and SNP information, both represented as interval data.

1. Load SNP data from UCSC tables

  • On the Tool Panel, click on Get Data → UCSC Main Table Browser.

  • This tool lets you import data from the UCSC tables into Galaxy.

    • Use the following parameters:

      • Group: Variation and Repeats

      • Track: SNP(130)

      • Region: chr19:1-1,000,000

      • Output format: BED

      • Send output to Galaxy: checked

    • Click the "Get Output" button.

      • Select the radio button so that one BED record is created for the whole gene.

      • Click the "Send query to Galaxy" button.

  • With these parameters, this tool creates a BED file containing all the SNPs for the first 1M bases of chromosome 19.

  • Once the job is completed, change the name of the dataset to "SNPs chr19".

2. Load exon data from UCSC tables

  • On the Tool Panel, click on Get Data → UCSC Main Table Browser.

  • This tool lets you import data from the UCSC tables into Galaxy.

    • Use the following parameters:

      • Group: Genes and Gene Prediction

      • Track: UCSC Genes

      • Region: chr19:1-1,000,000

      • Output format: BED

      • Send output to Galaxy: checked

    • Click the "Get Output" button.

      • Select the radio button so that one BED record is created per coding exon.

      • Click the "Send query to Galaxy" button.

  • With these parameters, this tool creates a BED file containing all the exon information for the first 1M bases of chromosome 19.

  • Once the job is completed, change the name of the dataset to "exons chr19".
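
Both datasets are BED files: tab-separated text whose first columns are the chromosome, a 0-based start, an exclusive end, and a feature name. As a rough illustration of how such a record can be read, here is a minimal Python sketch. The helper name parse_bed_line and the example line are hypothetical and not part of Galaxy or UCSC.

```python
def parse_bed_line(line):
    """Split one BED line into (chrom, start, end, name).

    BED coordinates are 0-based and half-open: start is included,
    end is excluded. The name column is optional in BED.
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    start = int(fields[1])
    end = int(fields[2])
    name = fields[3] if len(fields) > 3 else None
    return chrom, start, end, name


# Hypothetical example line, not a real chr19 annotation.
example = "chr19\t100\t200\texonA\t0\t+\n"
record = parse_bed_line(example)
print(record)
print("length in bases:", record[2] - record[1])
```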

3. Join exon and SNP information

  • On the Tool Panel, click on Operate on Genomic Intervals → Join the intervals.

  • This tool joins the information from two interval files based on the coordinates of each feature.

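    • Select the "exons chr19" dataset as the first input and the "SNPs chr19" dataset as the second, so that the exon ID ends up in column 4 of the output (the column used in step 4).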

    • Click on the "Execute" button.

  • Because an exon can contain several SNPs, and each overlapping exon/SNP pair produces one output line, the output can have more lines than either input file.
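
Conceptually, an interval join pairs every feature of the first file with every feature of the second file that overlaps it. The sketch below illustrates the idea in plain Python on made-up records. It is a naive illustration, not Galaxy's implementation, and all names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)
    name: str


def overlaps(a, b, min_overlap=1):
    """True if a and b share at least min_overlap bases on the same chromosome."""
    if a.chrom != b.chrom:
        return False
    return min(a.end, b.end) - max(a.start, b.start) >= min_overlap


def interval_join(first, second):
    """Return one (a, b) pair per overlapping combination (naive O(n*m) scan)."""
    return [(a, b) for a in first for b in second if overlaps(a, b)]


# Hypothetical toy data, not real chr19 annotations.
exons = [
    Interval("chr19", 100, 200, "exonA"),
    Interval("chr19", 300, 400, "exonB"),
]
snps = [
    Interval("chr19", 150, 151, "snp1"),
    Interval("chr19", 199, 200, "snp2"),
    Interval("chr19", 200, 201, "snp3"),  # first base after exonA, no overlap
    Interval("chr19", 350, 351, "snp4"),
]

joined = interval_join(exons, snps)
for exon, snp in joined:
    print(exon.name, snp.name, sep="\t")
```

In this toy example exonA appears twice in the output, once per overlapping SNP, which is why the joined dataset can be longer than the exon file.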

4. Find the number of SNPs per exon

  • On the Tool Panel, click on Join, Subtract and Group → Group.

  • This tool groups the rows by a given column and performs aggregation operations on the other columns.

    • Select data 3 as input.

    • Select column 4 (exon ID).

    • Add a "Count" operation on column 4 (c4).

    • Click on the "Execute" button.
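
The grouping step amounts to counting how many joined rows share the same exon ID. A minimal Python sketch of that idea, assuming the joined rows list the exon columns first (so the exon ID is column 4), could look like this. The rows are hypothetical toy data.

```python
from collections import Counter

# Hypothetical rows from the interval join: exon columns first, then SNP columns.
joined_rows = [
    ("chr19", 100, 200, "exonA", "chr19", 150, 151, "snp1"),
    ("chr19", 100, 200, "exonA", "chr19", 199, 200, "snp2"),
    ("chr19", 300, 400, "exonB", "chr19", 350, 351, "snp4"),
]

GROUP_COLUMN = 4  # 1-based, like Galaxy's c4 (the exon ID)

# Count the rows in each group; each row is one exon/SNP pair.
snps_per_exon = Counter(row[GROUP_COLUMN - 1] for row in joined_rows)

for exon_id, count in snps_per_exon.items():
    print(exon_id, count, sep="\t")
```

The result has two columns, the exon ID and its SNP count, which matches the column numbers used in the following steps.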

5. Find the exon with the most SNPs

  • On the Tool Panel, click on Filter and Sort → Sort.

  • This tool ...

    • Select data 4 as input.

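    • Select column 2 (the SNP count) as the sorting key, sorting numerically in descending order so that the exon with the most SNPs comes first.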

    • Click on the "Execute" button.
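
Because the SNP counts are numbers, they need to be compared numerically rather than as text. The short Python sketch below, using hypothetical rows, shows how the two kinds of sort can disagree.

```python
# Hypothetical (exon ID, SNP count) rows; counts are kept as text,
# the way they would appear in a tabular file.
rows = [("exonA", "9"), ("exonB", "12"), ("exonC", "3")]

# Numeric sort, largest count first: the exon with the most SNPs is on top.
by_count = sorted(rows, key=lambda row: int(row[1]), reverse=True)

# Text sort on the same column: "9" sorts after "12" because "9" > "1".
as_text = sorted(rows, key=lambda row: row[1], reverse=True)

print("numeric:", [name for name, _ in by_count])
print("text:   ", [name for name, _ in as_text])
```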

6. Find how many exons have a given number of SNPs

  • On the Tool Panel, click on Join, Subtract and Group → Group.

  • This tool ...

    • Select data 5 as input (sorted).

    • Select column 2 (the SNP count) as the column to group by.

    • Set the operation to "count" on column 1.

    • Click on the "Execute" button.
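
Grouping on the SNP count and counting the exon IDs yields a small histogram: for each SNP count, the number of exons that have it. A minimal Python sketch with hypothetical values:

```python
from collections import Counter

# Hypothetical output of step 4: exon ID -> number of SNPs.
snps_per_exon = {"exonA": 2, "exonB": 1, "exonC": 2, "exonD": 5}

# Group by the SNP count (column 2) and count the exons (column 1) in each group.
exons_per_count = Counter(snps_per_exon.values())

for snp_count in sorted(exons_per_count):
    print(snp_count, exons_per_count[snp_count], sep="\t")
```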

7. Filter exons with at least 10 SNPs

  • On the Tool Panel, click on Filter and Sort → Filter.

  • This tool ...

    • Select data 5 as input (sorted).

    • Set the condition to an SNP count of at least 10 (i.e. c2 >= 10).

    • Click on the "Execute" button.
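
The condition c2 >= 10 keeps a row when its second column is 10 or more, so an exon with exactly 10 SNPs is retained. The Python sketch below mimics the filter on hypothetical rows to make the boundary case explicit.

```python
# Hypothetical (exon ID, SNP count) rows.
rows = [("exonA", 12), ("exonB", 10), ("exonC", 9)]

THRESHOLD = 10

# Equivalent of the condition c2 >= 10: keep rows whose second column is at least 10.
kept = [row for row in rows if row[1] >= THRESHOLD]

print(kept)
```

With a strict comparison (c2 > 10) exonB would be dropped instead.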

8. Retrieve original information for exons

  • On the Tool Panel, click on Join, Subtract and Group → Join.

  • This tool ...

  • This is equivalent to a relational join (not an interval join).

    • Select the exons with at least 10 SNPs as first input.

    • Select the exon data for chr19:1-1,000,000 as second input.

    • Select column 1 (exonID) for the first file.

    • Select column 4 (exonID) for the second file.

    • Click the "Execute" button.

  • Now repeat this step with the two files in the opposite order. Note that this time the output is in BED format, whereas before it was a tabular file.
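
A relational join matches rows whose key columns are equal, without looking at coordinates, and writes the columns of the first file followed by those of the second. The Python sketch below, with a hypothetical helper key_join and toy rows, illustrates why the order of the inputs changes the layout of the result.

```python
def key_join(first, second, col1, col2):
    """Join rows where first[col1] == second[col2] (columns are 1-based).

    Output rows are the columns of the first file followed by those
    of the second file.
    """
    index = {}
    for row in second:
        index.setdefault(row[col2 - 1], []).append(row)
    return [a + b for a in first for b in index.get(a[col1 - 1], [])]


# Hypothetical toy data.
filtered = [["exonB", "12"]]            # exon ID, SNP count
exons = [
    ["chr19", "100", "200", "exonA"],   # chrom, start, end, exon ID
    ["chr19", "300", "400", "exonB"],
]

counts_first = key_join(filtered, exons, 1, 4)
exons_first = key_join(exons, filtered, 4, 1)

print(counts_first)  # starts with the exon ID: a plain table
print(exons_first)   # starts with chrom, start, end: laid out like BED
```

When the exon file comes first, the leading columns are chromosome, start and end, which is the layout an interval format such as BED expects.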

9. Display using the UCSC Browser

  • On the Data Panel on the right-hand side, click on the last job → Display at UCSC.

    • The user track shows the exons that have at least 10 SNPs in the region of chr19 considered.
