UCSC Intersection Queries
Intersection queries are among the most useful functions implemented by UCSC. They let users characterize the positional relationship between two tracks, for example by finding the items in one track that overlap items in another.
The demonstration of this tool builds on observations made while examining the 5' end of the PCDH10 gene in Obtaining_genomic_data_from_the_UCSC_database_using_table_browser_queries. The region upstream of the 5' end of the gene overlapped a CpG island. CpG islands correlate with promoters because promoter CpGs tend to be unmethylated. In non-promoter sequence, the C of a CG dinucleotide is usually methylated and thus subject to spontaneous deamination, which converts it to T and depletes CG dinucleotides over evolutionary time; unmethylated promoter regions escape this loss and remain CG-rich. See the details page of the CpG Islands track for more information. Note: all UCSC tracks are well documented and together contain extensive information about a wide range of genomics methods and data types.
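The CpG depletion described above is usually quantified with an observed/expected CpG ratio. The following is a minimal Python sketch, not the track's actual scanning algorithm: it scores one sequence that is already in hand. The function names `cpg_stats` and `meets_island_criteria` are hypothetical, and the thresholds (length greater than 200 bp, GC content of at least 50%, observed/expected ratio greater than 0.6) are the criteria as commonly quoted for the CpG Islands track; check the track's details page before relying on them.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC content, and observed/expected CpG ratio for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this count is exact
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently is (C * G) / N,
    # so observed/expected = CpG * N / (C * G).
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_content": gc_content, "obs_exp": obs_exp}


def meets_island_criteria(seq: str) -> bool:
    """Apply the assumed thresholds to a single candidate sequence."""
    s = cpg_stats(seq)
    return s["length"] > 200 and s["gc_content"] >= 0.5 and s["obs_exp"] > 0.6


if __name__ == "__main__":
    cg_rich = "CG" * 101   # 202 bp with every C followed by G
    depleted = "CA" * 101  # same length and C content, but no CpG at all
    print(cpg_stats(cg_rich), meets_island_criteria(cg_rich))
    print(cpg_stats(depleted), meets_island_criteria(depleted))
```

A sequence in which methylated CpGs have decayed keeps its C content for a while but loses CG dinucleotides, which is why the ratio, and not GC content alone, separates the two toy sequences.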
The summary/statistics button and the genome browser can be used to confirm that the custom track contains the desired data.
This processing creates one interval for each GENCODE transcript. Many of the intervals are similar because many transcript start positions are close to one another. This redundancy can be collapsed by sending the results to Galaxy and using the bedtools functions "Sort BED files" followed by "Merge BED files". These tools offer fine control over the processing, which will not be demonstrated here, but the output is available here.
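To make the sort-then-merge step concrete, here is a minimal Python sketch of what collapsing redundant intervals amounts to. It is an illustration only, not the Galaxy or bedtools implementation: the function name `merge_intervals` and the toy coordinates are hypothetical, and it assumes the default behaviour of combining intervals that overlap or directly abut.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting (chrom, start, end) intervals.

    Coordinates follow the BED convention: 0-based start, exclusive end.
    """
    merged = []
    # Sorting by (chrom, start, end) puts candidates for merging next to each other.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and touching the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    # Hypothetical upstream intervals from transcripts with nearby start sites.
    upstream = [
        ("chr2", 1000, 1500),
        ("chr2", 1400, 1900),
        ("chr2", 5000, 5500),
        ("chr2", 1900, 2000),
    ]
    result = merge_intervals(upstream)
    print(len(upstream), "intervals before merging,", len(result), "after")
    for interval in result:
        print(*interval, sep="\t")
```

Sorting first is what lets a single pass suffice, which is why the Galaxy workflow runs the sort tool before the merge tool.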
Create a custom track using the URL to that merged data and count the results.
Compare the contents of the custom tracks in the browser. The region is chr2:109,000,000-119,000,000.
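When checking custom track contents against a browser position, it helps to remember that browser position strings are 1-based and include both ends, whereas BED intervals use a 0-based start and an exclusive end. The short Python sketch below converts one to the other; the helper name `position_to_bed` is hypothetical, and the pattern assumes a simple `chrom:start-end` string with optional thousands separators.

```python
import re


def position_to_bed(position: str):
    """Convert a browser position string to a BED-style (chrom, start, end) tuple."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    # 1-based inclusive start becomes 0-based; the inclusive end equals the exclusive end.
    return chrom, start - 1, end


if __name__ == "__main__":
    chrom, start, end = position_to_bed("chr2:109,000,000-119,000,000")
    print(chrom, start, end, "spans", end - start, "bases")
```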
Use an intersection query to identify regions in the merged upstream sequences that overlap with a CpG island.
Open Tools --> Table Browser and set the "group" and "track" options so that they specify the merged BED custom track.
Select the "create" button next to "intersection" and choose intersection with the CpG Islands track according to the following image:
Note the ability to control the degree of intersection (some, none, partial).
After the intersection is in place, use summary/statistics to count the results and view some of them in the genome browser.
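The logic of the intersection steps above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the Table Browser's implementation: the function names, the `mode` values ("any", "none", "fraction"), and the toy coordinates are all hypothetical, and the secondary intervals are assumed not to overlap one another, so their overlaps with a primary interval can simply be summed.

```python
def overlap_bp(a, b):
    """Number of bases shared by two (chrom, start, end) intervals, 0 if none."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def intersect(primary, secondary, mode="any", min_fraction=0.0):
    """Keep primary intervals according to how much of each one is covered.

    mode="any":      keep intervals with at least one overlapping base
    mode="none":     keep intervals with no overlap at all
    mode="fraction": keep intervals covered over at least min_fraction of their length
    """
    kept = []
    for p in primary:
        length = p[2] - p[1]
        # Assumes secondary intervals do not overlap each other.
        covered = sum(overlap_bp(p, s) for s in secondary)
        if mode == "any":
            keep = covered > 0
        elif mode == "none":
            keep = covered == 0
        elif mode == "fraction":
            keep = length > 0 and covered / length >= min_fraction
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if keep:
            kept.append(p)
    return kept


if __name__ == "__main__":
    # Hypothetical merged upstream regions and CpG islands.
    upstream = [("chr2", 1000, 2000), ("chr2", 5000, 5500), ("chr2", 9000, 9400)]
    islands = [("chr2", 1800, 2300), ("chr2", 5100, 5400)]
    print("any overlap:", len(intersect(upstream, islands, "any")))
    print("no overlap:", len(intersect(upstream, islands, "none")))
    print("at least 50% covered:", len(intersect(upstream, islands, "fraction", 0.5)))
```

Counting the kept intervals in each mode plays the same role as the summary/statistics check, and the three modes loosely correspond to the degrees of intersection noted above.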
The results of the intersection query can be used in the same ways as any other kind of data.