

Microarray data processing with R/Bioconductor


Last updated 1 year ago


Bioconductor is:

  • a large collection of life sciences software packages implemented in R

  • an extensive collection of experimental and annotation data related to the analysis software.

Documentation

  • The packages are written, documented, and supported according to consistent standards.

  • Each package has a website. For example, see the pages for affy and affydata.

  • Both PDF documentation and example R scripts exist.
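
The documentation for an installed package can also be reached from within R. The block below is a minimal sketch, assuming BiocManager is the installation route (the original page does not say how the packages were installed). The install and vignette calls are left commented out, so the block only reports which of the packages used on this page are missing.

```r
# Packages used on this page.
pkgs <- c("affy", "affydata", "gcrma", "LPE", "affyPLM", "pvclust")

# requireNamespace() returns FALSE instead of stopping when a package is absent.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
missing_pkgs

# To install anything that is missing (assumes BiocManager is available):
# BiocManager::install(missing_pkgs)

# To open a package's PDF vignettes and example R scripts in a browser:
# browseVignettes("affy")
```

An empty result from `missing_pkgs` means every package needed below can be loaded with `library()`.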

Mailing List

IMPORTANT: Carefully read the posting guide and follow its instructions.

  • Instructions on joining the mailing list are available from the Bioconductor project.

  • The list is high-traffic and widely read, so take care with postings.

Use Example

#Connect to rous.mit.edu and start R
R
#check available packages:
library()
#load affydata
library(affydata)
#Run some of the commands
data(Dilution)
ls()
Dilution
class(Dilution)
expressionData<-exprs(Dilution)
class(expressionData)
expressionData[1:3,]
log2(expressionData[1:3,])
round(log2(expressionData[1:3,]), digits=2)
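
Without access to rous.mit.edu or the affydata package, the matrix operations at the end of the example can still be tried on a small stand-in. The sketch below assumes a hypothetical 3-probe by 2-sample intensity matrix (`toy_exprs`, with made-up names and values) in place of `exprs(Dilution)`. It illustrates the indexing and log2 steps only and is not output from the Dilution data.

```r
# Hypothetical raw intensities: rows are probes, columns are samples.
toy_exprs <- matrix(
  c(128, 512, 64, 256, 1024, 32),
  nrow = 3,
  dimnames = list(c("probe1", "probe2", "probe3"), c("sampleA", "sampleB"))
)

# Like the result of exprs(), this is an ordinary numeric matrix.
is.matrix(toy_exprs)

# Row indexing selects probes; leaving the column index empty keeps all samples.
toy_exprs[1:2, ]

# Intensities are usually examined on the log2 scale, rounded for readability.
toy_log2 <- round(log2(toy_exprs), digits = 2)
toy_log2
```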

Processing Microarray Data with R/Bioconductor

The following series of commands can be used to process array data with gcrma and do differential expression testing with LPE:

library(affy)
library(gcrma)
library(LPE)
library(affyPLM)

#Set the working directory to the location of your CEL files.

setwd("/Path/TO/CEL_Files")

#Import Data
#Order of samples in the resulting matrix can be specified by the order in the list.

Name_Dat<-ReadAffy(
"Condition1a.CEL", "Condition1b.CEL", "Condition1c.CEL",
"Condition2a.CEL", "Condition2b.CEL", "Condition2c.CEL")

#RNA degradation Work
RNAdeg<-AffyRNAdeg(Name_Dat)
png(file="Name_rnaDeg.png", bg="white")
plotAffyRNAdeg(RNAdeg,cols=c(1:16))
dev.off()

#PLM work
pset1<-fitPLM(Name_Dat)

#RLE plot
png(file="rle.png", bg="white")
par(mar=c(3, 10, 3, 3))
RLE(pset1, main = "RLE for Name", horizontal=TRUE, las=2)
dev.off()

#NUSE plot
png(file="nuse.png", bg="white")
par(mar=c(3, 10, 3, 3))
NUSE(pset1, ylim= c(0.95,1.2), main = "NUSE for Name", horizontal=TRUE, las=2)
dev.off()

#Process the Data
Name_Exp<-gcrma(Name_Dat, fast=FALSE)

Name_Tab<-exprs(Name_Exp)
Name_Tab<-round(Name_Tab, digits=2)
write.table(data.frame(Name_Tab), sep="\t", quote=FALSE, file="Name.txt")

#Differential Expression Testing
#In editor, add "ProbeID" to top of first column and delete the affy control rows that start with AFFX

Name<-read.table("Name.txt", header=TRUE)
attach(Name)
names(Name)

set.seed(0)

#testing the columns by printing the first 3 rows for each condition
#var.Cond1
Name[1:3,c(2,3,4)]
#var.Cond2
Name[1:3,c(5,6,7)]

#LPE tests
var.Cond1<-baseOlig.error(Name[,c(2,3,4)],q=0.01)
var.Cond2<-baseOlig.error(Name[,c(5,6,7)],q=0.01)

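#lpe() takes the baseline variance for its first data argument first, so var.Cond2 is paired with columns 5-7
lpeVal.Cond1.Cond2<-data.frame(lpe(Name[,c(5,6,7)], Name[,c(2,3,4)], var.Cond2, var.Cond1, probe.set.name = Name$ProbeID))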

lpeVal.Cond1.Cond2<-round(lpeVal.Cond1.Cond2, digits=2)

fdrBH.Cond1.Cond2<-fdr.adjust(lpeVal.Cond1.Cond2, adjp="BH")

write.table(lpeVal.Cond1.Cond2, quote=FALSE, sep="\t", file="lpeVal.Cond1.Cond2.txt")
write.table(fdrBH.Cond1.Cond2, quote=FALSE, sep="\t", file="fdrBH.Cond1.Cond2.txt")

#clustering with pvalues

library(pvclust)

#drop the ProbeID column so that only the numeric sample columns are clustered
Name.pv<-pvclust(Name[,-1], method.hclust="ward",
method.dist="correlation", use.cor="pairwise.complete.obs",
nboot=1000, r=seq(.5,1.4,by=.1), store=FALSE, weight=FALSE)

png(filename="Name_PV.png", bg="white",width=960, height=500)
plot(Name.pv)
dev.off()
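
The pvclust call above clusters the samples (columns) by correlation distance with Ward linkage and then adds bootstrap p-values. As a minimal sketch of the underlying clustering step only, the block below uses base R on a hypothetical toy matrix (`toy`, with simulated values) rather than the real expression table. It assumes "ward.D", the name current R versions use for the linkage that older versions called "ward", and it computes no p-values.

```r
# Toy expression matrix: rows are probes, columns are samples (simulated values).
set.seed(1)
signal <- rnorm(50)
toy <- cbind(
  c1a = signal + rnorm(50, sd = 0.1),
  c1b = signal + rnorm(50, sd = 0.1),
  c2a = -signal + rnorm(50, sd = 0.1),
  c2b = -signal + rnorm(50, sd = 0.1)
)

# Correlation distance between samples: 1 minus the Pearson correlation of the columns.
d <- as.dist(1 - cor(toy, use = "pairwise.complete.obs"))

# Ward linkage on the correlation distances.
hc <- hclust(d, method = "ward.D")

# Cut the tree into two groups; replicates of one condition should share a group.
groups <- cutree(hc, k = 2)
groups
```

Calling `plot(hc)` draws the dendrogram. pvclust draws the same kind of tree with bootstrap support values added at each branch.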