Basic Plotting in R

The mathematician Richard Hamming once said, “The purpose of computing is insight, not numbers”, and the best way to develop insight is often to visualize data. Visualization deserves an entire lecture (or course) of its own, but we can explore a few features of R’s base plotting package.

When we are working with large sets of numbers, it can be useful to display that information graphically. R has a number of built-in tools for basic graph types such as histograms, scatter plots, bar charts, and boxplots. We’ll try a few of these out on our samplemeans vector, but first we will create a combined data frame that maps our metadata to the sample mean values.

Prepare data to practice basic plots in R

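# Locate the data directory inside the current working directory
baseDir <- getwd()
dataDir <- file.path(baseDir, "data")

# Read the sample metadata and the RPKM expression values (both comma-separated files)
metadata <- read.table(file.path(dataDir, 'mouse_exp_design.csv'), header=TRUE, sep=",", row.names=1)
rpkm_data <- read.table(file.path(dataDir, 'counts.rpkm'), header=TRUE, sep=",", row.names=1)

# Reorder the expression columns to follow the order of the metadata rows
m <- match(row.names(metadata), colnames(rpkm_data))
data_ordered <- rpkm_data[,m]

# Average expression of each sample (column)
samplemeans <- apply(data_ordered, 2, mean)

# Create a combined data frame
all(rownames(metadata) == names(samplemeans)) # sanity check for sample order; should return TRUE
df <- cbind(metadata, samplemeans)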
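The reordering and averaging steps are easier to follow on a tiny example. The sketch below uses two made-up data frames (toy_meta and toy_rpkm are hypothetical stand-ins for metadata and rpkm_data) whose columns are deliberately stored out of order. It shows what match() returns and that colMeans() gives the same result as apply(x, 2, mean).

```r
# Hypothetical toy data: three samples whose expression columns are stored
# in a different order than the metadata rows
toy_meta <- data.frame(celltype = c("typeA", "typeA", "typeB"),
                       row.names = c("sample1", "sample2", "sample3"))
toy_rpkm <- data.frame(sample3 = c(5, 7),
                       sample1 = c(1, 3),
                       sample2 = c(2, 4),
                       row.names = c("gene1", "gene2"))

# For each metadata row name, match() gives its column position in toy_rpkm
m <- match(rownames(toy_meta), colnames(toy_rpkm))
m  # 2 3 1

# Reorder the columns so they follow the metadata order
toy_ordered <- toy_rpkm[, m]

# Column means; colMeans() is a faster equivalent of apply(x, 2, mean)
toy_means <- apply(toy_ordered, 2, mean)
all.equal(toy_means, colMeans(toy_ordered))  # TRUE

# Combine only after confirming the order matches
stopifnot(all(rownames(toy_meta) == names(toy_means)))
toy_df <- cbind(toy_meta, toy_means)
toy_df
```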

Scatter Plot

Let’s start with a scatter plot. A scatter plot provides a graphical view of the relationship between two sets of numbers. Our metadata has no continuous variable to plot the sample means against, but to demonstrate the function we can plot the values against their index.

par(mar = rep(5, 4)) # set all four plot margins (bottom, left, top, right) to 5 lines
plot(samplemeans)

Each point represents a sample: the value on the x-axis is the sample number, and the value on the y-axis is the average expression for that sample. For any plot, you can customize many features (fonts, colors, axes, titles) through graphical parameters. We can change the shape of the data points using pch.

plot(samplemeans, pch=8)
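pch accepts integer codes from 0 to 25 (it also accepts a single character such as "+"). A quick way to choose a symbol is to draw them all. In this sketch, codes 21 to 25 are filled with the bg color, which is how those symbols take a fill. The layout, size, and grey fill are arbitrary choices.

```r
# Draw pch codes 0 to 25 side by side as a quick reference chart
pch_codes <- 0:25
plot(pch_codes, rep(1, length(pch_codes)),
     pch = pch_codes, cex = 1.5, bg = "grey",  # bg fills symbols 21 to 25
     xlab = "pch value", ylab = "", yaxt = "n",  # yaxt = "n" hides the y-axis
     main = "Plotting symbols (pch)")
```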

We can add a title to the plot by passing a string to main:

plot(samplemeans, pch=8, main="Scatter plot of mean values")
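The scatter plots above all pass a single vector to plot(). In that case the x coordinates are simply the positions 1, 2, ..., n. The sketch below makes this explicit with simulated values (sim_means is a made-up stand-in for samplemeans, not the course data) and xy.coords(), the grDevices helper that base plotting functions use to assemble coordinates.

```r
# Simulated stand-in for samplemeans: 12 made-up values, not the course data
set.seed(1)
sim_means <- setNames(rnorm(12, mean = 12, sd = 2), paste0("sample", 1:12))

# Given one vector, plot() puts each value at x = its position in the vector
plot(sim_means)

# xy.coords() builds the coordinates that base plots use;
# with no y supplied, the index 1..12 becomes x
coords <- xy.coords(sim_means)
coords$x

# Equivalent call with the index passed explicitly
idx <- seq_along(sim_means)
plot(idx, sim_means, xlab = "Index", ylab = "sim_means")
```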

Barplot

For our data, a barplot is more informative. The barplot function draws one bar per sample, and the height of each bar indicates that sample’s average expression level.

barplot(samplemeans)

The sample names are too large to fit under the bars; we can shrink them by lowering the cex.names value.

barplot(samplemeans, cex.names=0.5)

Now the names are too small to read. Alternatively, we can replace the names with numbers and keep the default size.

barplot(samplemeans, names.arg=1:12) # supply numbers as labels
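Hard-coding 1:12 only works while there are exactly 12 samples. A slightly more robust sketch derives the labels from the data with seq_along() and keeps a small lookup table so each number can be traced back to its sample. The values are simulated, and sim_means, bar_ids, and label_key are illustrative names.

```r
# Simulated stand-in for samplemeans: 12 made-up, named values
set.seed(1)
sim_means <- setNames(rnorm(12, mean = 12, sd = 2), paste0("sample", 1:12))

# seq_along() yields 1, 2, ..., length(sim_means), so the labels adapt
# if samples are added or removed
bar_ids <- seq_along(sim_means)
barplot(sim_means, names.arg = bar_ids)

# Lookup table mapping each bar number back to its sample name
label_key <- data.frame(bar = bar_ids, sample = names(sim_means),
                        stringsAsFactors = FALSE)
head(label_key, 3)
```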

We can also flip the axes so that the bars are drawn horizontally.

barplot(samplemeans, names.arg=1:12, horiz=TRUE)
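Another option, not used in this lesson, is to keep the full sample names and rotate them. The sketch below uses simulated values (sim_means is a made-up stand-in for samplemeans), widens the bottom margin, and sets las = 2 so labels are drawn perpendicular to the axis. It also uses the bar midpoints that barplot() returns invisibly to print each value on its bar. The margin size and text size are arbitrary choices.

```r
# Simulated stand-in for samplemeans: 12 made-up, named values
set.seed(1)
sim_means <- setNames(rnorm(12, mean = 12, sd = 2), paste0("sample", 1:12))

# Widen the bottom margin (in lines of text) for the rotated labels,
# keeping the old settings so they can be restored afterwards
old_par <- par(mar = c(7, 4, 4, 2))

# las = 2: axis labels perpendicular to the axis, so full names fit.
# barplot() invisibly returns the x position of each bar's midpoint.
bp <- barplot(sim_means, las = 2, ylab = "Mean expression")

# Print each rounded value at half the height of its bar
text(as.vector(bp), sim_means / 2, labels = round(sim_means, 1), cex = 0.7)

par(old_par)  # restore the previous margins
```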

Histogram

If we are interested in the overall distribution of values, a histogram is a common choice. It shows how many values fall within each of a series of ranges (bins). To plot a histogram of the data, use the hist function:

hist(samplemeans)

The sample means range from 9 to 16. As you can see, R automatically calculates the intervals to use, and there are many options for controlling how they are chosen. Let’s increase the number of breaks to see how that changes the plot:

hist(samplemeans, xlab="Mean expression level", main="", breaks=20)
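The breaks argument behaves differently depending on what you pass. Per the hist() documentation, a single number is treated only as a suggestion, because R rounds the bin edges to "pretty" values. A vector of break points is used as given and must cover the range of the data. The sketch below contrasts the two on simulated values. sim_means and my_breaks are illustrative names, and the 0.5 bin width is an arbitrary choice.

```r
# Simulated values standing in for samplemeans (made-up, not the course data)
set.seed(1)
sim_means <- rnorm(12, mean = 12, sd = 2)

# A single number is only a suggestion: R rounds to "pretty" break points,
# so the bins you get may not match the number you asked for
h_suggested <- hist(sim_means, breaks = 20, plot = FALSE)
length(h_suggested$counts)  # number of bins actually used

# A vector of break points is used exactly as given; it must span the data
my_breaks <- seq(floor(min(sim_means)), ceiling(max(sim_means)), by = 0.5)
h_exact <- hist(sim_means, breaks = my_breaks,
                xlab = "Mean expression level", main = "")
```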

Similar to the other plots, we can tweak the aesthetics. Let’s fill in the bars with color and remove the borders (border=NA omits them):

hist(samplemeans, xlab="Mean expression level", main="", col="darkgrey", border=NA)
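To see the numbers behind the bars, hist() can return its computations without drawing anything. In this sketch (simulated values, not the course data), plot = FALSE gives back the break points and the count of values in each bin. Because every value falls into exactly one bin, the counts sum to the number of values.

```r
# Simulated values standing in for samplemeans (made-up, not the course data)
set.seed(1)
sim_means <- rnorm(12, mean = 12, sd = 2)

# plot = FALSE returns the histogram object without drawing it
h <- hist(sim_means, plot = FALSE)
h$breaks  # edges of the bins
h$counts  # number of values in each bin

# Each value lands in exactly one bin
sum(h$counts) == length(sim_means)  # TRUE
```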

Boxplot

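Using additional sample information from our metadata, we can compare values between the two cell types, ‘typeA’ and ‘typeB’, with a boxplot. A boxplot summarizes a data set by its median and quartiles (the box) and its spread (the whiskers). In R, the whiskers extend to the most extreme values within 1.5 times the interquartile range of the box, and any points beyond them are drawn individually as outliers.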

boxplot(samplemeans~celltype, df)
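boxplot.stats() returns the numbers that boxplot() draws, which makes the whisker rule visible. In this sketch, vals is made-up data with one deliberately extreme value (25). Because that value lies more than 1.5 interquartile ranges (the default coef) beyond the box, it is reported as an outlier, and the upper whisker stops at the largest value inside that limit.

```r
# Made-up data: 11 values near 12 plus one deliberately extreme value
set.seed(1)
vals <- c(rnorm(11, mean = 12, sd = 1), 25)

# The same statistics boxplot() uses to draw a single box
bs <- boxplot.stats(vals)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points beyond the whiskers, drawn individually

# The maximum is flagged as an outlier, so the upper whisker stops short of it
max(vals) %in% bs$out  # TRUE
boxplot(vals, ylab = "Value")
```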

As with the plots above, we can pass in arguments to add extras such as a plot title, axis labels, and colors.

boxplot(samplemeans~celltype, df,  col=c("blue","red"), main="Average expression differences between celltypes", ylab="Expression")
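The colors in col are matched to the boxes in order, and the boxes follow the levels of the grouping factor, which for character data are sorted alphabetically by default. In the boxplot above, blue therefore goes to typeA and red to typeB. The sketch below uses a hypothetical data frame, sim_df, with made-up values and typeB listed first, to show that setting the levels explicitly controls both the box order and the color pairing.

```r
# Hypothetical data frame shaped like df; values are made up
set.seed(1)
sim_df <- data.frame(
  celltype    = rep(c("typeB", "typeA"), each = 6),  # typeB listed first on purpose
  samplemeans = c(rnorm(6, mean = 14, sd = 1), rnorm(6, mean = 11, sd = 1))
)

# Default level order is alphabetical, not the order values appear in
levels(factor(sim_df$celltype))  # "typeA" "typeB"

# Setting the levels explicitly controls both box order and color pairing
sim_df$celltype <- factor(sim_df$celltype, levels = c("typeB", "typeA"))
boxplot(samplemeans ~ celltype, data = sim_df, col = c("red", "blue"),
        main = "Average expression differences between celltypes",
        ylab = "Expression")
```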
