Building Docker Images

Docker is a container engine, but it's also an image build tool. You can build Docker images yourself by creating a Dockerfile, essentially a file that outlines each step in creating your image.

Below are the common commands used in a Dockerfile to outline these steps:

  • FROM - Specifies the base image you're building on top of.

  • LABEL - Attaches metadata to your image as a key-value pair. A common label is description, which holds a short description of the image.

  • RUN - Runs the command you specify while the image is being built, and saves the result as part of the image. For example, if the base image is Ubuntu, you can run any Ubuntu command here. A common use is apt-get install <package> to install an Ubuntu package into the image.

  • CMD - The default command that runs when a container is started from the image. This is usually the main software being packaged.
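Here is a small, hypothetical example that shows how the four instructions fit together before we tackle Seurat. It is unrelated to this workshop's image. The snippet writes the Dockerfile to a file named Dockerfile.example using a shell heredoc. The file name, the ubuntu:22.04 base, and curl are all arbitrary illustrative choices.

```shell
#!/bin/sh
# Write a small illustrative Dockerfile. Quoting 'EOF' stops the shell
# from expanding anything inside the heredoc.
cat > Dockerfile.example <<'EOF'
# FROM: the base image to build on top of
FROM ubuntu:22.04

# LABEL: metadata stored with the image
LABEL description="Minimal example image"

# RUN: executed once, at build time; the result becomes part of the image
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# CMD: the default command when a container starts from this image
CMD ["curl", "--version"]
EOF
```

You could build it with docker build -f Dockerfile.example -t example . and then start it with docker run --rm example. The container should print curl's version information and exit, because CMD is the only thing it runs.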

Knowing these is enough to build a simple Docker image. We'll be using this knowledge to build our own Docker image for Seurat.

Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data. Seurat aims to enable users to identify and interpret sources of heterogeneity from single-cell transcriptomic measurements, and to integrate diverse types of single-cell data.

We'll use rocker/rstudio as a base so that RStudio Server is available out of the box.

Create a file named "Dockerfile".

First, we must select the base image. We'll use rocker/rstudio version 4.3.2, which comes with R 4.3.2. We'll make sure to label the image with a simple description.

FROM rocker/rstudio:4.3.2
LABEL description="Docker image for Seurat4"

Then, we must outline the steps needed to install Seurat4. rocker/rstudio is built on top of Ubuntu, so system packages are installed with Ubuntu's apt-get utility. The following system libraries are needed to compile Seurat and the other R packages we'll install:

RUN apt-get update && apt-get install -y \
    libhdf5-dev build-essential libxml2-dev \
    libssl-dev libv8-dev libsodium-dev libglpk40 \
    libgdal-dev libboost-dev libomp-dev \
    libbamtools-dev libboost-iostreams-dev \
    libboost-log-dev libboost-system-dev \
    libboost-test-dev libcurl4-openssl-dev libz-dev \
    libarmadillo-dev libhdf5-cpp-103
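Some package names, such as libglpk40 and libhdf5-cpp-103, are specific to a particular Ubuntu release. A list that works on one base image can therefore fail on another. One way to catch this before a long build is to ask apt inside the base image itself. The helper below is a hypothetical sketch: check_apt_packages is not part of Docker. It assumes Docker is installed, and it will pull the image if needed.

```shell
#!/bin/sh
# check_apt_packages IMAGE PKG...
# Starts a throwaway container from IMAGE and prints its Ubuntu release.
# It then simulates installing each PKG (apt-get -s changes nothing) and
# reports whether apt could resolve it.
check_apt_packages() {
  image="$1"
  shift
  docker run --rm "$image" bash -c '
    . /etc/os-release && echo "Base OS: $PRETTY_NAME"
    apt-get update -qq
    for pkg in "$@"; do
      if apt-get install -s -qq "$pkg" >/dev/null 2>&1; then
        echo "ok:      $pkg"
      else
        echo "MISSING: $pkg"
      fi
    done
  ' _ "$@"
}
```

Example: check_apt_packages rocker/rstudio:4.3.2 libglpk40 libhdf5-cpp-103 libarmadillo-dev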

Now we can run R to install Seurat and other useful R packages from CRAN. This includes BiocManager, which we'll use in the next step to install bioinformatics packages from Bioconductor.

RUN R -e "install.packages(c('Seurat', 'hdf5r', 'dplyr', 'tidyverse', 'cowplot', 'knitr', 'msigdbr', 'remotes', 'metap', 'devtools', 'R.utils', 'ggalt', 'ggpubr', 'BiocManager'), repos='http://cran.rstudio.com/')"

Next, we install Bioconductor packages using BiocManager:

RUN R -e "BiocManager::install(c('SingleR', 'slingshot', 'scRNAseq', 'celldex', 'fgsea', 'multtest', 'scuttle', 'BiocGenerics', 'DelayedArray', 'DelayedMatrixStats', 'limma', 'S4Vectors', 'SingleCellExperiment', 'SummarizedExperiment', 'batchelor', 'org.Mm.eg.db', 'AnnotationHub', 'scater', 'edgeR', 'apeglm', 'DESeq2', 'pcaMethods', 'clusterProfiler'))"

Finally, we install a few more packages directly from GitHub using remotes:

RUN R -e "remotes::install_github(c('satijalab/seurat-wrappers', 'kevinblighe/PCAtools', 'chris-mcginnis-ucsf/DoubletFinder', 'velocyto-team/velocyto.R'))"
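The R steps above have one caveat. When a package fails to install, install.packages() and BiocManager::install() report it with a warning rather than an error. R still exits successfully and docker build carries on, which can leave you with an image that is silently missing packages.

A minimal safeguard is a final RUN step that tries to load a few key packages and stops the build if any are missing. The snippet below is a sketch of that idea. It appends the step to the end of the Dockerfile in the current directory. The package list is an illustrative subset, so extend it as needed.

```shell
#!/bin/sh
# Append a check to the end of the Dockerfile in the current directory.
# requireNamespace() returns FALSE instead of erroring when a package is
# missing, so the results are collected and stop() is called explicitly.
# stop() makes R exit with a non-zero status, which fails the docker build.
cat >> Dockerfile <<'EOF'

RUN R -e "pkgs <- c('Seurat', 'SingleR', 'clusterProfiler', 'SeuratWrappers', 'DoubletFinder', 'velocyto.R'); \
    ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE); \
    if (!all(ok)) stop('Missing R packages: ', paste(pkgs[!ok], collapse = ', '))"
EOF
```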

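All together, the Dockerfile should look like this. There's no CMD line because the image inherits rocker/rstudio's default command, which starts RStudio Server when the container launches: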

FROM rocker/rstudio:4.3.2
LABEL description="Docker image for Seurat4"

RUN apt-get update && apt-get install -y \
    libhdf5-dev build-essential libxml2-dev \
    libssl-dev libv8-dev libsodium-dev libglpk40 \
    libgdal-dev libboost-dev libomp-dev \
    libbamtools-dev libboost-iostreams-dev \
    libboost-log-dev libboost-system-dev \
    libboost-test-dev libcurl4-openssl-dev libz-dev \
    libarmadillo-dev libhdf5-cpp-103

RUN R -e "install.packages(c('Seurat', 'hdf5r', 'dplyr', 'tidyverse', 'cowplot', 'knitr', 'msigdbr', 'remotes', 'metap', 'devtools', 'R.utils', 'ggalt', 'ggpubr', 'BiocManager'), repos='http://cran.rstudio.com/')"

RUN R -e "BiocManager::install(c('SingleR', 'slingshot', 'scRNAseq', 'celldex', 'fgsea', 'multtest', 'scuttle', 'BiocGenerics', 'DelayedArray', 'DelayedMatrixStats', 'limma', 'S4Vectors', 'SingleCellExperiment', 'SummarizedExperiment', 'batchelor', 'org.Mm.eg.db', 'AnnotationHub', 'scater', 'edgeR', 'apeglm', 'DESeq2', 'pcaMethods', 'clusterProfiler'))"

RUN R -e "remotes::install_github(c('satijalab/seurat-wrappers', 'kevinblighe/PCAtools', 'chris-mcginnis-ucsf/DoubletFinder', 'velocyto-team/velocyto.R'))"

Now that we have the Dockerfile, we can build the image from the command line. We'll tag the image with our Docker Hub username and a descriptive image name, in the form username/image-name. I'll use asoberan/abrfseurat for my build. Run the build from the directory that contains the Dockerfile. The trailing . tells Docker to use that directory as the build context.

cd /path/to/directory/where/Dockerfile/is/located

docker buildx build -t asoberan/abrfseurat .
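After the build finishes, you can confirm that the image exists and check what it recorded. The sketch below uses docker image inspect with a Go template to print two things: the description label set by LABEL, and the image's default command. inspect_image is a hypothetical helper name. Because our Dockerfile has no CMD, the command shown is the one inherited from rocker/rstudio.

```shell
#!/bin/sh
# inspect_image IMAGE
# Prints the image's "description" label, then its default command (CMD) as JSON.
inspect_image() {
  docker image inspect --format '{{ index .Config.Labels "description" }}' "$1"
  docker image inspect --format '{{ json .Config.Cmd }}' "$1"
}
```

For the image built above, inspect_image asoberan/abrfseurat should print "Docker image for Seurat4" on its first line, taken from the LABEL instruction.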

Of course, each of you could build this image yourself and have a custom local copy of it. However, a key benefit of containerization is that it makes programs and environments portable. I've already built the image and uploaded it to Docker Hub. So instead of everyone creating their own image, you can simply pull my existing image and use it immediately.

I've created images for both amd64 (x86_64) and arm64. If you're running an Intel or AMD PC or an Intel-based Mac, use the tag latest-x86_64. If you're running an Apple Silicon Mac or another ARM machine, use the tag latest-arm64.
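If you're not sure which architecture you're on, uname -m reports it. It prints x86_64 on Intel and AMD machines, arm64 on Apple Silicon Macs, and aarch64 on most ARM Linux systems. This small sketch maps that output to the two tags above. pick_tag is a hypothetical helper, not something provided by Docker or the image.

```shell
#!/bin/sh
# pick_tag: print the image tag that matches this machine's CPU architecture.
pick_tag() {
  arch="$(uname -m)"
  case "$arch" in
    x86_64|amd64)  echo "latest-x86_64" ;;
    arm64|aarch64) echo "latest-arm64" ;;
    *) echo "No prebuilt tag for architecture: $arch" >&2; return 1 ;;
  esac
}

TAG="$(pick_tag)" && echo "Use tag: $TAG"
```

You can then pass the result straight to Docker, for example docker run --rm -it -p 8787:8787 "asoberan/abrfseurat:$TAG".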

docker run --rm -it -p 8787:8787 asoberan/abrfseurat:<tag>

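docker run pulls the image from Docker Hub automatically if it isn't already on your machine. The -p 8787:8787 flag maps RStudio Server's port inside the container to the same port on your machine. Once the container is running, navigate to http://localhost:8787 and log in with the username rstudio and the password printed in the container's output. All the libraries needed for Seurat should be available out of the box.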
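Two further options are worth knowing here. First, you can set the PASSWORD environment variable, which the rocker images read at startup, to choose your own password instead of reading a generated one from the output. Second, --rm deletes the container when it stops, so anything you want to keep should live in a directory mounted from your machine.

The sketch below wraps both options in a hypothetical run_seurat helper. The mount point /home/rstudio/work is an arbitrary choice.

```shell
#!/bin/sh
# run_seurat TAG PASSWORD
# Starts RStudio on http://localhost:8787 with a password you choose.
# The current directory is mounted at /home/rstudio/work inside the
# container, so files saved there survive after the container is removed.
run_seurat() {
  docker run --rm -it -p 8787:8787 \
    -e PASSWORD="$2" \
    -v "$PWD":/home/rstudio/work \
    "asoberan/abrfseurat:$1"
}
```

Example: run_seurat latest-arm64 'choose-a-password'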

However, we've fallen into the same problem as previously: we are running this instance of RStudio locally on our computers. How can we take advantage of this image on the Luria cluster?


Massachusetts Institute of Technology