This example is inspired by a screencast published on the Galaxy website. It combines exon information and SNP information, both represented as interval data.
1. Load exon data from UCSC tables
On the Tool Panel, click on Get Data → UCSC Main Table Browser.
This tool allows you to upload data from the UCSC tables.
Use the following parameters:
Track: UCSC Genes
Region: chr19:1-1,000,000
Output format: BED
Send output to Galaxy: checked
Click the "Get Output" button.
Select the radio button so that one BED record is created per coding exon.
Click the "send query to Galaxy" button.
With these parameters, this tool creates a BED file containing all the exon information for the first 1M bases of chromosome 19.
Once the job is completed, change the name of the dataset to "exons chr19".
2. Load SNP data from UCSC tables
On the Tool Panel, click on Get Data → UCSC Main Table Browser.
This tool allows you to upload data from the UCSC tables.
Use the following parameters:
Group: Variation and Repeats
Track: SNP(130)
Region: chr19:1-1,000,000
Send output to Galaxy: checked
Click the "Get Output" button.
Select the radio button so that one BED record is created for the whole gene.
Click the "send query to Galaxy" button.
With these parameters, this tool creates a BED file containing all the SNPs for the first 1M bases of chromosome 19.
Once the job is completed, change the name of the dataset to "SNPs chr19".
3. Join exon and SNP information
On the Tool Panel, click on Operate on Genomic Intervals → Join the intervals.
This tool joins the information from two interval files based on the coordinates of each feature.
Select the "SNPs chr19" and the "exons chr19" datasets as input.
Because some exons might contain multiple SNPs, the resulting output might be larger than either of the two input files.
Click on the "Execute" button.
4. Find the number of SNPs per exon
On the Tool Panel, click on Join, Subtract and Group → Group.
This tool groups the information based on a given column and performs aggregation operations on the other columns.
Select data 3 as input.
Add operation to count c4.
Click on the "Execute" button.
5. Find the exon with the most SNPs
On the Tool Panel, click on Filter and Sort → Sort.
This tool ...
Select data 4 as input.
Select column 2 as sorting key.
Click on the "Execute" button.
6. Find how many exons have a given number of SNPs
On the Tool Panel, click on Join, Subtract and Group → Group.
This tool ...
Select data 5 as input (sorted).
Select column 2 as the grouping column.
Set the operation to "count" on column 1.
Click on the "Execute" button.
7. Filter exons with at least 10 SNPs
On the Tool Panel, click on Filter and Sort → Filter.
This tool ...
Select data 5 as input (sorted).
Set the condition to an SNP count of at least 10 (i.e. c2 >= 10).
Click on the "Execute" button.
8. Retrieve original information for exons
On the Tool Panel, click on Join, Subtract and Group → Join.
This tool ...
This is equivalent to a relational join (not an interval join).
Select the exons with at least 10 SNPs as first input.
Select the exon data for chr19:1-1,000,000 as second input.
Select column 1 (exonID) for the first file.
Select column 4 (exonID) for the second file.
Click on the "Execute" button.
Now repeat this step but invert the order of the files. Note that this time the output is BED-formatted, whereas before it was a tabular file.
9. Display using the UCSC Browser
On the Data Panel on the right-hand side, click on the last job → Display at UCSC.
The User Track shows the exons that have at least 10 SNPs in the region of chr19 considered.
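To make steps 3 and 4 more concrete, here is a minimal Python sketch of an interval join followed by a count per exon. It is not Galaxy's implementation: the function name join_intervals, the toy records, and the 0-based half-open coordinate convention are all assumptions made for illustration.

```python
from collections import Counter

# Toy BED-like records (chrom, start, end, name); all values are made up.
# Coordinates are assumed to be 0-based and half-open, as in BED.
exons = [
    ("chr19", 100, 200, "exonA"),
    ("chr19", 300, 400, "exonB"),
    ("chr19", 500, 600, "exonC"),
]
snps = [
    ("chr19", 110, 111, "rs1"),
    ("chr19", 150, 151, "rs2"),
    ("chr19", 350, 351, "rs3"),
    ("chr19", 450, 451, "rs4"),  # lies between exons, so it joins with nothing
]


def join_intervals(left, right):
    """Return one combined row per overlapping (left, right) pair."""
    joined = []
    for l in left:
        for r in right:
            same_chrom = l[0] == r[0]
            overlaps = l[1] < r[2] and r[1] < l[2]
            if same_chrom and overlaps:
                joined.append(l + r)
    return joined


joined = join_intervals(exons, snps)

# Group on the exon name (column 4, index 3) and count the rows per group.
snps_per_exon = Counter(row[3] for row in joined)
```

In this toy data exonA appears in two joined rows because it overlaps two SNPs, which is why a join can produce more rows than there are exons. Exons without any SNP (exonC here) do not appear in the joined output at all.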

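Steps 5 to 7 (sort, group, filter) can be sketched in plain Python as well. The rows below are hypothetical (exon ID, SNP count) pairs standing in for the output of step 4, and descending sort order is an assumption chosen here so that the exon with the most SNPs comes first.

```python
from collections import Counter

# Hypothetical (exon ID, SNP count) rows; the values are made up.
snp_counts = [
    ("exonA", 12),
    ("exonB", 3),
    ("exonC", 12),
    ("exonD", 10),
    ("exonE", 3),
]

# Sort: order rows by column 2 (the SNP count), largest first.
by_count = sorted(snp_counts, key=lambda row: row[1], reverse=True)
top_exon = by_count[0]

# Group on column 2 and count column 1: how many exons share each SNP count.
exons_per_count = Counter(count for _exon, count in by_count)

# Filter: keep only the rows that satisfy the condition c2 >= 10.
frequent = [row for row in by_count if row[1] >= 10]
```

Note that the condition c2 >= 10 keeps exons with exactly 10 SNPs (exonD here); a strict "greater than" test would drop them.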

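The relational join of step 8 matches rows on equal key values rather than on overlapping coordinates. The sketch below illustrates the idea with a hypothetical relational_join helper and made-up rows; it also shows how swapping the inputs changes the column order, which is one plausible reason (an assumption, not confirmed by the source) why the output is recognized as BED only when the exon file comes first.

```python
# Hypothetical inputs: filtered (exon ID, SNP count) rows and BED-like exon rows.
frequent = [("exonA", 12), ("exonC", 11)]
exons = [
    ("chr19", 100, 200, "exonA"),
    ("chr19", 300, 400, "exonB"),
    ("chr19", 500, 600, "exonC"),
]


def relational_join(first, second, first_col, second_col):
    """Join rows whose key columns are equal (0-based column indexes)."""
    index = {}
    for row in second:
        index.setdefault(row[second_col], []).append(row)
    # Columns of the first input always come first in each output row.
    return [f + s for f in first for s in index.get(f[first_col], [])]


# Column 1 of the first file against column 4 of the second file.
tabular = relational_join(frequent, exons, 0, 3)

# Inverted order: the exon columns (chrom, start, end, name) now lead each row.
bed_like = relational_join(exons, frequent, 3, 0)
```

Both calls return the same pairs of matching rows; only the column layout differs.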










