  • The Barbara K. Ostrom (1978) Bioinformatics and Computing Facility

Galaxy SNP Interval Data




This example is inspired by a screencast published on the Galaxy website. It combines exon information and SNP information, both represented as interval data.

1. Load SNP data from UCSC tables

  • On the Tool Panel, click on Get Data → UCSC Main Table Browser.

  • This tool allows you to upload data from the UCSC Tables.

    • Use the following parameters:

      • Group: Variation and Repeats

      • Track: SNP(130)

      • Region: chr19:1-1,000,000

      • Output format: BED

      • Send output to Galaxy: checked

    • Click the "Get Output" button.

      • Select the radio button so that one BED record is created for the whole gene.

      • Click the "send query to Galaxy" button.

  • With these parameters, this tool creates a BED file containing all the SNPs for the first 1M bases of chromosome 19.

  • Once the job is completed, change the name of the dataset to "SNPs chr19".

2. Load exon data from UCSC tables

  • On the Tool Panel, click on Get Data → UCSC Main Table Browser.

  • This tool allows you to upload data from the UCSC Tables.

    • Use the following parameters:

      • Group: Genes and Gene Prediction

      • Track: UCSC Genes

      • Region: chr19:1-1,000,000

      • Output format: BED

      • Send output to Galaxy: checked

    • Click the "Get Output" button.

      • Select the radio button so that one BED record is created per coding exon.

      • Click the "send query to Galaxy" button.

  • With these parameters, this tool creates a BED file containing all the exon information for the first 1M bases of chromosome 19.

  • Once the job is completed, change the name of the dataset to "exons chr19".
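
Both datasets are now BED files: tab-separated lines whose first columns are chromosome, start, end and feature name. As a rough illustration of what those columns hold, here is a minimal Python sketch that parses such lines. The records and the names Interval and parse_bed_line are invented for this example and are not real UCSC output or part of Galaxy.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # BED start is 0-based
    end: int    # BED end is exclusive
    name: str


def parse_bed_line(line: str) -> Interval:
    """Parse the first four columns of one BED line."""
    fields = line.rstrip("\n").split("\t")
    return Interval(fields[0], int(fields[1]), int(fields[2]), fields[3])


# Hypothetical records, made up for illustration only.
sample = "chr19\t100\t200\texon_A\nchr19\t150\t151\tsnp_1\n"
intervals = [parse_bed_line(line) for line in sample.splitlines()]
```

Because the end coordinate is exclusive, a single-base SNP such as snp_1 spans exactly one position (end minus start equals 1).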

3. Join exon and SNP information

  • On the Tool Panel, click on Operate on Genomic Intervals → Join the intervals.

  • This tool allows you to join the information from two interval files based on the coordinates of each feature.

    • Select the "exons chr19" and "SNPs chr19" datasets as first and second input, so that the exon ID ends up in column 4 of the output.

    • Click on the "Execute" button.

  • Because some exons contain multiple SNPs, each such exon appears once per overlapping SNP, so the output can contain more lines than the exon file.
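
To make the idea of an interval join concrete, the sketch below pairs each exon with every SNP that overlaps it. It is a simplified illustration, not Galaxy's implementation: the function name join_intervals and the toy coordinates are assumptions, and a real tool would use an indexed structure rather than a nested loop.

```python
def join_intervals(first, second):
    """Pair every record in `first` with every overlapping record in `second`.

    Records are tuples (chrom, start, end, name) with half-open coordinates.
    """
    joined = []
    for a in first:
        for b in second:
            same_chrom = a[0] == b[0]
            overlaps = a[1] < b[2] and b[1] < a[2]
            if same_chrom and overlaps:
                joined.append(a + b)  # concatenate the columns of both records
    return joined


# Hypothetical toy data, made up for illustration only.
exons = [("chr19", 100, 200, "exon_A"), ("chr19", 300, 400, "exon_B")]
snps = [
    ("chr19", 120, 121, "snp_1"),
    ("chr19", 180, 181, "snp_2"),
    ("chr19", 350, 351, "snp_3"),
    ("chr19", 500, 501, "snp_4"),
]
joined = join_intervals(exons, snps)
```

Here exon_A overlaps two SNPs and therefore appears on two output rows, while snp_4 overlaps no exon and is dropped.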

4. Find the number of SNPs per exon

  • On the Tool Panel, click on Join, Subtract and Group → Group.

  • This tool groups the data by a given column and performs aggregation operations on the other columns.

    • Select data 3 as input.

    • Select column 4 (exon ID).

    • Add a "Count" operation on column 4 (c4).

    • Click on the "Execute" button.
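
Grouping by the exon ID and counting rows amounts to a tally per exon. A minimal sketch in Python, assuming joined rows whose fourth column (index 3) holds the exon ID; the rows are invented for illustration.

```python
from collections import Counter

# Hypothetical joined rows: exon columns followed by SNP columns.
joined_rows = [
    ("chr19", 100, 200, "exon_A", "chr19", 120, 121, "snp_1"),
    ("chr19", 100, 200, "exon_A", "chr19", 180, 181, "snp_2"),
    ("chr19", 300, 400, "exon_B", "chr19", 350, 351, "snp_3"),
]

# Column 4 (c4) in Galaxy's 1-based numbering is index 3 in Python.
snps_per_exon = Counter(row[3] for row in joined_rows)
```

The result maps each exon ID to its number of SNPs, which corresponds to the two-column table (exon ID, count) that the later steps work on.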

5. Find the exon with the most SNPs

  • On the Tool Panel, click on Filter and Sort → Sort.

  • This tool ...

    • Select data 4 as input.

    • Select column 2 as sorting key.

    • Click on the "Execute" button.
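
The sort can be pictured as ordering the (exon ID, count) rows by their second column. The sketch below assumes a numerical sort in descending order, so the exon with the most SNPs comes first; the counts are invented for illustration.

```python
# Hypothetical (exon ID, SNP count) rows.
counts = [("exon_A", 2), ("exon_B", 12), ("exon_C", 5)]

# Sort on column 2 (index 1), largest count first.
ranked = sorted(counts, key=lambda row: row[1], reverse=True)
top_exon = ranked[0]
```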

6. Find how many exons have a given number of SNPs

  • On the Tool Panel, click on Join, Subtract and Group → Group.

  • This tool ...

    • Select data 5 as input (sorted).

    • Select column 2 (SNP count) as the column to group by.

    • Set the operation to "count" on column 1.

    • Click on the "Execute" button.
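
Grouping on the count column and counting the exon IDs gives a small histogram: for each SNP count, the number of exons that have it. A minimal sketch with invented data:

```python
from collections import Counter

# Hypothetical (exon ID, SNP count) rows.
counts = [("exon_A", 2), ("exon_B", 12), ("exon_C", 2), ("exon_D", 5)]

# Group by column 2 (the SNP count) and count the rows in each group.
exons_per_snp_count = Counter(snp_count for _, snp_count in counts)
```

In this toy example two exons have exactly 2 SNPs, and one exon each has 5 and 12.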

7. Filter exons with at least 10 SNPs

  • On the Tool Panel, click on Filter and Sort → Filter.

  • This tool ...

    • Select data 5 as input (sorted).

    • Set the condition to an SNP count of at least 10 (i.e. c2 >= 10).

    • Click on the "Execute" button.
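
The filter condition c2 >= 10 keeps only the rows whose second column is 10 or more. The equivalent in Python, with invented counts:

```python
# Hypothetical (exon ID, SNP count) rows.
counts = [("exon_A", 2), ("exon_B", 12), ("exon_C", 10)]

# Keep rows where column 2 (index 1) is at least 10, mirroring c2 >= 10.
kept = [row for row in counts if row[1] >= 10]
```

Note that exon_C, with exactly 10 SNPs, passes the filter because the comparison is inclusive.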

8. Retrieve original information for exons

  • On the Tool Panel, click on Join, Subtract and Group → Join.

  • This tool ...

  • This is equivalent to a relational join (not an interval join).

    • Select the exons with at least 10 SNPs as first input.

    • Select the exon data for chr19:1-1,000,000 as second input.

    • Select column 1 (exonID) for the first file.

    • Select column 4 (exonID) for the second file.

    • Click the "Execute" button.

  • Now repeat this step with the order of the two files inverted. Note that this time the output is BED-formatted, whereas before it was a tabular file.
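
A relational join matches rows on equal key values rather than on overlapping coordinates. The sketch below shows why the input order matters for the output format: the columns of the first file always come first. The function join_on_column and the data are assumptions made for illustration, not Galaxy's code.

```python
def join_on_column(first, second, col_first, col_second):
    """Join two tables on equality of one column each (0-based indices)."""
    index = {}
    for row in second:
        index.setdefault(row[col_second], []).append(row)
    joined = []
    for row in first:
        for match in index.get(row[col_first], []):
            joined.append(row + match)
    return joined


# Hypothetical data: filtered counts and the original exon intervals.
filtered = [("exon_B", 12)]
exons = [("chr19", 100, 200, "exon_A"), ("chr19", 300, 400, "exon_B")]

counts_first = join_on_column(filtered, exons, 0, 3)
exons_first = join_on_column(exons, filtered, 3, 0)
```

With the counts first, each row starts with the exon ID and count, so it is plain tabular data. With the exons first, each row starts with chromosome, start and end, which is the column layout of a BED record.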

9. Display using the UCSC Browser

  • On the Data Panel on the right-hand side, click on the last job → Display at UCSC.

    • The user track shows the exons that have at least 10 SNPs in the region of chr19 considered.