Galaxy SNP Interval Data
This example is inspired by a screencast published on the Galaxy website. It combines exon information and SNP information, both represented as interval data.
1. Load SNP data from UCSC tables
On the Tool Panel, click on Get Data → UCSC Main Table Browser.
This tool allows you to upload data from the UCSC Tables.
Use the following parameters:
Group: Variation and Repeats
Track: SNP(130)
Region: chr19:1-1,000,000
Output format: BED
Send output to Galaxy: checked
Click the "Get Output" button.
Select the radio button so that one BED record is created for the whole gene.
Click the "Send query to Galaxy" button.
With these parameters, this tool creates a BED file containing all the SNPs for the first 1M bases of chromosome 19.
Once the job is completed, change the name of the dataset to "SNPs chr19".
2. Load exon data from UCSC tables
On the Tool Panel, click on Get Data → UCSC Main Table Browser.
This tool allows you to upload data from the UCSC Tables.
Use the following parameters:
Group: Genes and Gene Prediction
Track: UCSC Genes
Region: chr19:1-1,000,000
Output format: BED
Send output to Galaxy: checked
Click the "Get Output" button.
Select the radio button so that one BED record is created per coding exon.
Click the "Send query to Galaxy" button.
With these parameters, this tool creates a BED file containing all the exon information for the first 1M bases of chromosome 19.
Once the job is completed, change the name of the dataset to "exons chr19".
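For orientation, each line of a BED file is a tab-separated record whose first four columns are chromosome, start, end, and name, with 0-based start and exclusive end coordinates. A minimal Python sketch of reading such a record outside Galaxy might look like the following. The sample line and the parse_bed_line helper are invented for illustration and are not part of the data downloaded above.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def parse_bed_line(line: str) -> BedRecord:
    """Parse the first four tab-separated columns of a BED line."""
    fields = line.rstrip("\n").split("\t")
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), fields[3])


# Made-up example line, not real UCSC output.
record = parse_bed_line("chr19\t1000\t1200\texon_A\t0\t+\n")
```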
3. Join exon and SNP information
On the Tool Panel, click on Operate on Genomic Intervals → Join the intervals.
This tool allows you to join the information from two interval files based on the coordinates of each feature.
Select the "SNPs chr19" and "exons chr19" datasets as input.
Click on the "Execute" button.
Because some exons might contain multiple SNPs, the resulting output might contain more lines than either of the two input files.
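The idea behind an interval join can be sketched in a few lines of Python. This is a naive all-pairs comparison on made-up intervals, meant only to show why one exon can produce several output rows; it is not how Galaxy implements the tool, and the exon and SNP names are hypothetical.

```python
# Hypothetical intervals as (chrom, start, end, name), half-open coordinates.
exons = [("chr19", 100, 200, "exon_A"), ("chr19", 300, 400, "exon_B")]
snps = [
    ("chr19", 150, 151, "snp_1"),
    ("chr19", 180, 181, "snp_2"),
    ("chr19", 250, 251, "snp_3"),
]


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


# One output row per overlapping (exon, SNP) pair; exon columns come first.
joined = [exon + snp for exon in exons for snp in snps if overlaps(exon, snp)]
```

Here exon_A overlaps two SNPs and therefore appears in two output rows, while exon_B and snp_3 overlap nothing and are dropped.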
4. Find the number of SNPs per exon
On the Tool Panel, click on Join, Subtract and Group → Group.
This tool groups the information based on a given column and performs aggregation operations on the other columns.
Select data 3 as input.
Select column 4 (exon ID).
Add a "Count" operation on column 4 (c4).
Click on the "Execute" button.
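As a rough equivalent outside Galaxy, grouping on the exon ID and counting rows could be sketched in Python as follows. The rows are made up, and the sketch assumes the exon ID sits in column 4 (index 3) of the joined output, as in the step above.

```python
from collections import Counter

# Hypothetical joined rows; column 4 (index 3) holds the exon ID.
joined_rows = [
    ["chr19", "100", "200", "exon_A", "chr19", "150", "151", "snp_1"],
    ["chr19", "100", "200", "exon_A", "chr19", "180", "181", "snp_2"],
    ["chr19", "300", "400", "exon_B", "chr19", "350", "351", "snp_3"],
]

# Group by column 4 and count the rows in each group.
snps_per_exon = Counter(row[3] for row in joined_rows)
```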
5. Find the exon with the most SNPs
On the Tool Panel, click on Filter and Sort → Sort.
This tool ...
Select data 4 as input.
Select column 2 as sorting key.
Click on the "Execute" button.
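The sorting step can be pictured with the short Python sketch below. It assumes a numeric sort in descending order, which the steps above do not state explicitly but which puts the exon with the most SNPs on the first line. The counts are invented.

```python
# Hypothetical (exon ID, SNP count) rows, as produced by the Group step.
counts = [("exon_A", 2), ("exon_B", 12), ("exon_C", 7)]

# Sort on column 2 (index 1), largest count first.
by_snp_count = sorted(counts, key=lambda row: row[1], reverse=True)

# The first row is the exon with the most SNPs.
top_exon = by_snp_count[0]
```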
6. Find how many exons have a given number of SNPs
On the Tool Panel, click on Join, Subtract and Group → Group.
This tool ...
Select data 5 as input (sorted).
Select column 2 (SNP count) as the column to group by.
Set the operation to "count" on column 1.
Click on the "Execute" button.
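This second grouping amounts to building a histogram of SNP counts. A minimal Python sketch on made-up data, grouping on column 2 and counting the exon IDs in column 1:

```python
from collections import Counter

# Hypothetical (exon ID, SNP count) rows from the previous steps.
counts = [("exon_A", 2), ("exon_B", 12), ("exon_C", 2), ("exon_D", 7)]

# Group by the SNP count (column 2) and count the exons in each group.
exons_per_snp_count = Counter(n_snps for _exon_id, n_snps in counts)
```

In this toy data, two exons have 2 SNPs each, one has 7, and one has 12.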
7. Filter exons with at least 10 SNPs
On the Tool Panel, click on Filter and Sort → Filter.
This tool ...
Select data 5 as input (sorted).
Set the condition to an SNP count of at least 10 (i.e. c2 >= 10).
Click on the "Execute" button.
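The condition c2 >= 10 keeps every row whose second column is 10 or more. In Python, the same filter could be sketched as below, on invented rows:

```python
# Hypothetical (exon ID, SNP count) rows.
counts = [("exon_A", 2), ("exon_B", 12), ("exon_C", 10)]

# Keep rows where column 2 (index 1) is at least 10, like c2 >= 10.
snp_rich = [row for row in counts if row[1] >= 10]
```

Note that an exon with exactly 10 SNPs passes the filter.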
8. Retrieve original information for exons
On the Tool Panel, click on Join, Subtract and Group → Join.
This tool ...
This is equivalent to a relational join (not an interval join).
Select the exons with at least 10 SNPs as the first input.
Select the exon data for chr19:1-1,000,000 (the "exons chr19" dataset) as the second input.
Select column 1 (exonID) for the first file.
Select column 4 (exonID) for the second file.
Click the "Execute" button.
Now repeat this step with the two files in the opposite order. Note that this time the output is BED-formatted, whereas before it was a tabular file.
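To see why the input order changes the output format, consider this Python sketch of a relational join on the exon ID. It uses made-up rows and assumes exon IDs are unique in the BED data; the variable names are illustrative only.

```python
# Hypothetical filtered counts: (exon ID, SNP count).
filtered = [("exon_B", 12), ("exon_C", 10)]
# Hypothetical exon BED rows: (chrom, start, end, exon ID).
exon_bed = [
    ("chr19", 100, 200, "exon_A"),
    ("chr19", 300, 400, "exon_B"),
    ("chr19", 500, 600, "exon_C"),
]

# Index each dataset by exon ID for lookup.
bed_by_id = {row[3]: row for row in exon_bed}
count_by_id = dict(filtered)

# Counts first: rows start with the exon ID, so the result is plain tabular.
counts_then_bed = [c + bed_by_id[c[0]] for c in filtered if c[0] in bed_by_id]

# BED first: rows start with chrom, start, end, so the result still reads as BED.
bed_then_counts = [
    row + (row[3], count_by_id[row[3]])
    for row in exon_bed
    if row[3] in count_by_id
]
```

Both results contain the same matched exons; only the column order differs, and only the second keeps the coordinate columns in the leading positions that BED requires.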
9. Display using the UCSC Browser
On the Data Panel on the right-hand side, click on the last job → Display at UCSC.
The user track shows the exons that have at least 10 SNPs in the region of chr19 considered.