# Galaxy SNP Interval Data

This example is inspired by a screencast published on the Galaxy website. It combines exon information and SNP information, both represented as interval data.

**1. Load SNP data from UCSC tables**

* On the Tool Panel, click on Get Data → UCSC Main Table Browser.
* This tool allows you to import data from the UCSC tables.
  * Use the following parameters:
    * Group: Variation and Repeats
    * Track: SNP(130)
    * Region: chr19:1-1,000,000
    * Output format: BED
    * Send output to Galaxy: checked
  * Click the "Get Output" button.
    * Select the radio button so that one BED record is created for the whole gene.
    * Click the "send query to Galaxy" button.
* With these parameters, this tool creates a BED file containing all the SNPs for the first 1M bases of chromosome 19.
* Once the job is completed, change the name of the dataset to "SNPs chr19".

<figure><img src="/files/AZeU5VUbLWkKPcRcRyev" alt=""><figcaption></figcaption></figure>

**2. Load exon data from UCSC tables**

* On the Tool Panel, click on Get Data → UCSC Main Table Browser.
* This tool allows you to import data from the UCSC tables.
  * Use the following parameters:
    * Group: Genes and Gene Prediction
    * Track: UCSC Genes
    * Region: chr19:1-1,000,000
    * Output format: BED
    * Send output to Galaxy: checked
  * Click the "Get Output" button.
    * Select the radio button so that one BED record is created per coding exon.
    * Click the "send query to Galaxy" button.
* With these parameters, this tool creates a BED file containing all the exon information for the first 1M bases of chromosome 19.
* Once the job is completed, change the name of the dataset to "exons chr19".

<figure><img src="/files/nwy8xrYHgQ58Qm4PPTAp" alt=""><figcaption></figcaption></figure>
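Both datasets loaded so far are BED files, that is, tab-separated text in which the first columns give the chromosome, the start, the end, and a feature name. The following is a minimal Python sketch of how such lines can be read. The `BedRecord` type, the `parse_bed_line` helper, and the two example lines are hypothetical and made up for illustration; they are not actual UCSC output.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # BED starts are 0-based
    end: int    # BED ends are exclusive
    name: str


def parse_bed_line(line: str) -> BedRecord:
    """Parse the first four tab-separated columns of a BED line."""
    fields = line.rstrip("\n").split("\t")
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), fields[3])


# Made-up lines for illustration only (not real UCSC records).
example_lines = [
    "chr19\t1000\t1200\texonA\t0\t+",
    "chr19\t1100\t1101\tsnp1\t0\t+",
]
records = [parse_bed_line(line) for line in example_lines]
for record in records:
    print(record.name, record.end - record.start, "bp")
```

Because BED coordinates are 0-based with an exclusive end, the length of a feature is simply `end - start`.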

**3. Join exon and SNP information**

* On the Tool Panel, click on Operate on Genomic Intervals → Join the intervals.
* This tool allows you to join the information from two interval files based on the coordinates of each feature.
  * Select the "SNPs chr19" and the "exons chr19" datasets as input.
  * Click on the "Execute" button.
* Because some exons might contain multiple SNPs, the resulting output might be larger than the input files.

<figure><img src="/files/kkcYHa4LhCXIsbyQml8k" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/EFza1Yw0R4znwVdwS1iJ" alt=""><figcaption></figcaption></figure>
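To make the idea of an interval join concrete, here is a small Python sketch. It is not Galaxy's implementation: it uses a simple nested loop, assumes that any overlap of at least one base counts as a match, and places the exons first so that column 4 of the result holds the exon ID. The function names and the toy intervals are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def join_intervals(first, second):
    """Pair every row of `first` with every overlapping row of `second`.

    Rows are tuples of (chrom, start, end, name). Each output row is the
    two matching rows placed side by side.
    """
    joined = []
    for a in first:
        for b in second:
            if a[0] == b[0] and overlaps(a[1], a[2], b[1], b[2]):
                joined.append(a + b)
    return joined


# Toy data, made up for illustration.
exons = [("chr19", 1000, 1200, "exonA"), ("chr19", 5000, 5300, "exonB")]
snps = [
    ("chr19", 1050, 1051, "snp1"),
    ("chr19", 1150, 1151, "snp2"),
    ("chr19", 5100, 5101, "snp3"),
    ("chr19", 9000, 9001, "snp4"),  # falls outside every exon
]
joined = join_intervals(exons, snps)
for row in joined:
    print(row)
```

In this toy example `exonA` appears on two output lines, one per overlapping SNP, while `snp4` does not appear at all.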

**4. Find the number of SNPs per exon**

* On the Tool Panel, click on Join, Subtract and Group → Group.
* This tool groups the data by a given column and performs aggregation operations on the other columns.
  * Select data 3 as input.
  * Select column 4 (exon ID).
  * Add operation to count c4.
  * Click on the "Execute" button.

<figure><img src="/files/WqsyhlxMPH1OD7e4fvas" alt=""><figcaption></figcaption></figure>
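Grouping on the exon ID and counting is comparable to a `GROUP BY ... COUNT` in SQL. A rough Python equivalent, assuming the joined rows carry the exon ID in column 4 (index 3 in Python, which counts from zero), could look like this. The rows are made up for illustration.

```python
from collections import Counter

# Toy output of the interval join: exon columns followed by SNP columns.
joined_rows = [
    ("chr19", 1000, 1200, "exonA", "chr19", 1050, 1051, "snp1"),
    ("chr19", 1000, 1200, "exonA", "chr19", 1150, 1151, "snp2"),
    ("chr19", 5000, 5300, "exonB", "chr19", 5100, 5101, "snp3"),
]

# Group by exon ID (Galaxy column 4) and count the rows in each group.
snps_per_exon = Counter(row[3] for row in joined_rows)
for exon_id, snp_count in snps_per_exon.items():
    print(exon_id, snp_count)
```

The result has two columns, the exon ID and its SNP count, which is the layout the following steps refer to as columns 1 and 2.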

**5. Find the exon with the most SNPs**

* On the Tool Panel, click on Filter and Sort → Sort.
* This tool ...
  * Select data 4 as input.
  * Select column 2 as sorting key.
  * Click on the "Execute" button.

<figure><img src="/files/a0KjylMNTXMuxsCdtl6S" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/QBFNj64puUzMK3JmSvgC" alt=""><figcaption></figcaption></figure>
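As an illustration of this step, the sketch below sorts a two-column table of exon IDs and SNP counts numerically on the second column. It assumes a descending order so that the exon with the most SNPs comes first; the counts are made up.

```python
# Toy (exon ID, SNP count) rows, made up for illustration.
counts = [("exonA", 2), ("exonB", 7), ("exonC", 4)]

# Sort on the SNP count (Galaxy column 2), largest first.
sorted_counts = sorted(counts, key=lambda row: row[1], reverse=True)
top_exon = sorted_counts[0]
print(top_exon)
```

The counts must be compared as numbers: compared as text, "10" would sort before "9".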

**6. Find how many exons have a given number of SNPs**

* On the Tool Panel, click on Join, Subtract and Group → Group.
* This tool ...
  * Select data 5 as input (sorted).
  * Select column 2 as the column to group by.
  * Set the operation to "count" on column 1.
  * Click on the "Execute" button.

<figure><img src="/files/xIc47ljbM3lHIYtFgsxj" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/ocgxSLkf9xTbxLIiTXLn" alt=""><figcaption></figcaption></figure>
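Grouping on the SNP count and counting the exon IDs yields a distribution: for each SNP count, the number of exons that have it. A minimal Python sketch with made-up values:

```python
from collections import Counter

# Toy (exon ID, SNP count) rows, made up for illustration.
counts = [("exonA", 2), ("exonB", 7), ("exonC", 2), ("exonD", 1)]

# Group by SNP count (Galaxy column 2) and count the exons in each group.
exons_per_snp_count = Counter(snp_count for _, snp_count in counts)
for snp_count in sorted(exons_per_snp_count):
    print(snp_count, exons_per_snp_count[snp_count])
```

Here two exons have 2 SNPs, while the counts 1 and 7 each occur for a single exon.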

**7. Filter exons with at least 10 SNPs**

* On the Tool Panel, click on Filter and Sort → Filter.
* This tool ...
  * Select data 5 as input (sorted).
  * Set the condition to an SNP count of at least 10 (i.e. c2 >= 10).
  * Click on the "Execute" button.

<figure><img src="/files/QrHHrFkPfTpX9bbC0hqy" alt=""><figcaption></figcaption></figure>
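The filter condition `c2 >= 10` keeps every row whose second column is at least 10. The same test in Python, on made-up rows, shows that an exon with exactly 10 SNPs is kept:

```python
MIN_SNPS = 10  # threshold taken from the condition c2 >= 10

# Toy (exon ID, SNP count) rows, made up for illustration.
counts = [("exonA", 12), ("exonB", 10), ("exonC", 9)]

# Keep rows whose SNP count (Galaxy column 2) is at least MIN_SNPS.
filtered = [row for row in counts if row[1] >= MIN_SNPS]
print(filtered)
```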

**8. Retrieve original information for exons**

* On the Tool Panel, click on Join, Subtract and Group → Join.
* This tool ...
* This is equivalent to a relational join (not an interval join).
  * Select the exons with at least 10 SNPs as first input.
  * Select the exon data for chr19:1-1,000,000 as second input.
  * Select column 1 (exon ID) for the first file.
  * Select column 4 (exon ID) for the second file.
  * Click the "Execute" button.
* Now repeat this step but invert the order of the files. Note that this time the output is BED-formatted, whereas before it was a tabular file.

<figure><img src="/files/oUiOpBeF2j3eXcI4qmWJ" alt=""><figcaption></figcaption></figure>

![](/files/tys6p6nco1L5LlViK8AK)
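A relational join matches rows on equal key values rather than on overlapping coordinates. The sketch below is one simple way to express it in Python, using a dictionary as an index. The `join_on_column` helper and the data are hypothetical, and column numbers are zero-based here whereas Galaxy counts from 1.

```python
def join_on_column(first, first_col, second, second_col):
    """Join two tables on equal values of the given (0-based) columns."""
    # Index the second table by its key column for fast lookups.
    index = {}
    for row in second:
        index.setdefault(row[second_col], []).append(row)
    joined = []
    for row in first:
        for match in index.get(row[first_col], []):
            joined.append(row + match)
    return joined


# Toy data, made up for illustration.
high_snp_exons = [("exonA", 12), ("exonB", 10)]
exon_bed = [
    ("chr19", 1000, 1200, "exonA"),
    ("chr19", 5000, 5300, "exonB"),
    ("chr19", 8000, 8200, "exonC"),
]

# Counts first: rows start with the exon ID, so they are not in BED layout.
tabular = join_on_column(high_snp_exons, 0, exon_bed, 3)
# Exons first: rows start with chrom, start, end, as a BED file does.
bed_like = join_on_column(exon_bed, 3, high_snp_exons, 0)
print(tabular[0])
print(bed_like[0])
```

Swapping the inputs changes only the column order, but that is what decides whether the rows begin with chromosome, start, and end, as an interval format requires.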

**9. Display using the UCSC Browser**

* On the Data Panel on the right-hand side, click on the last job → Display at UCSC.
  * The user track shows the exons that have at least 10 SNPs in the region of chr19 considered.

<figure><img src="/files/0bGbsipJo9dXQb91QLVp" alt=""><figcaption></figcaption></figure>

