
Massachusetts Institute of Technology


Building Docker Images

Docker is a container engine, but it's also an image build tool. You can build Docker images yourself by creating a Dockerfile, essentially a file that outlines each step in creating your image.

Below are the instructions most commonly used in a Dockerfile to outline these steps:

  • FROM - Specifies the base image you're building on.

  • LABEL - Attaches a key-value pair to your image as metadata. A common label is description, which holds a short description of the image.

  • RUN - Runs the command you specify while the image is being built. For example, if the base image is Ubuntu, you can run any Ubuntu command here. A common use is apt-get install <package>, which installs an Ubuntu package into your image.

  • CMD - The default command that runs when a container is started from the image. This tends to be the main piece of software being packaged.
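As a minimal sketch of how the four instructions fit together, the shell snippet below writes a small Dockerfile into a hypothetical minimal-image directory. The ubuntu:22.04 base, the curl package, and the directory and tag names are assumptions chosen only for illustration; they are not part of the Seurat image built later on this page.

```shell
# Write an illustrative Dockerfile into ./minimal-image (hypothetical directory name).
mkdir -p minimal-image
cat > minimal-image/Dockerfile <<'EOF'
# FROM: the base image to build on (assumed here: Ubuntu 22.04).
FROM ubuntu:22.04

# LABEL: key-value metadata stored with the image.
LABEL description="Minimal example image that ships curl"

# RUN: executed at build time; the result is saved in the image.
RUN apt-get update && apt-get install -y curl

# CMD: the default command when a container starts from the image.
CMD ["curl", "--version"]
EOF

# To build and run it (requires Docker; the tag name is a placeholder):
#   docker buildx build -t yourname/minimal minimal-image
#   docker run --rm yourname/minimal
```

The heredoc delimiter is quoted ('EOF') so the shell writes the Dockerfile text exactly as shown, without expanding anything inside it.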

Knowing these is enough to build a simple Docker image. We'll be using this knowledge to build our own Docker image for Seurat.

Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data. Seurat aims to enable users to identify and interpret sources of heterogeneity from single-cell transcriptomic measurements, and to integrate diverse types of single-cell data.

We'll use rocker/rstudio as a base so that we can have RStudio available to us automatically.

Create a file named "Dockerfile".

First, we must select the base image. We'll use rocker/rstudio version 4.3.2, which comes with R 4.3.2. We'll make sure to label the image with a simple description.

FROM rocker/rstudio:4.3.2
LABEL description="Docker image for Seurat4"

Then, we must outline the steps needed to install Seurat4. rocker/rstudio is built on top of Ubuntu, so any packages we need to install should use Ubuntu's apt-get utility. The following packages are needed for the installation of Seurat and other tools:

RUN apt-get update && apt-get install -y \
    libhdf5-dev build-essential libxml2-dev \
    libssl-dev libv8-dev libsodium-dev libglpk40 \
    libgdal-dev libboost-dev libomp-dev \
    libbamtools-dev libboost-iostreams-dev \
    libboost-log-dev libboost-system-dev \
    libboost-test-dev libcurl4-openssl-dev libz-dev \
    libarmadillo-dev libhdf5-cpp-103

Now, we can run R to install Seurat and other useful R tools, including BiocManager, which we'll use in the next step to install useful bioinformatics R libraries.

RUN R -e "install.packages(c('Seurat', 'hdf5r', 'dplyr', 'tidyverse', 'cowplot', 'knitr', 'slingshot', 'msigdbr', 'remotes', 'metap', 'devtools', 'R.utils', 'ggalt', 'ggpubr', 'BiocManager'), repos='http://cran.rstudio.com/')"

Installing R libraries using BiocManager:

RUN R -e "BiocManager::install(c('SingleR', 'slingshot', 'scRNAseq', 'celldex', 'fgsea', 'multtest', 'scuttle', 'BiocGenerics', 'DelayedArray', 'DelayedMatrixStats', 'limma', 'S4Vectors', 'SingleCellExperiment', 'SummarizedExperiment', 'batchelor', 'org.Mm.eg.db', 'AnnotationHub', 'scater', 'edgeR', 'apeglm', 'DESeq2', 'pcaMethods', 'clusterProfiler'))"

Installing other tools from GitHub:

RUN R -e "remotes::install_github(c('satijalab/seurat-wrappers', 'kevinblighe/PCAtools', 'chris-mcginnis-ucsf/DoubletFinder', 'velocyto-team/velocyto.R'))"

All together, the Dockerfile should look like this:

FROM rocker/rstudio:4.3.2
LABEL description="Docker image for Seurat4"

RUN apt-get update && apt-get install -y \
    libhdf5-dev build-essential libxml2-dev \
    libssl-dev libv8-dev libsodium-dev libglpk40 \
    libgdal-dev libboost-dev libomp-dev \
    libbamtools-dev libboost-iostreams-dev \
    libboost-log-dev libboost-system-dev \
    libboost-test-dev libcurl4-openssl-dev libz-dev \
    libarmadillo-dev libhdf5-cpp-103

RUN R -e "install.packages(c('Seurat', 'hdf5r', 'dplyr', 'tidyverse', 'cowplot', 'knitr', 'slingshot', 'msigdbr', 'remotes', 'metap', 'devtools', 'R.utils', 'ggalt', 'ggpubr', 'BiocManager'), repos='http://cran.rstudio.com/')"

RUN R -e "BiocManager::install(c('SingleR', 'slingshot', 'scRNAseq', 'celldex', 'fgsea', 'multtest', 'scuttle', 'BiocGenerics', 'DelayedArray', 'DelayedMatrixStats', 'limma', 'S4Vectors', 'SingleCellExperiment', 'SummarizedExperiment', 'batchelor', 'org.Mm.eg.db', 'AnnotationHub', 'scater', 'edgeR', 'apeglm', 'DESeq2', 'pcaMethods', 'clusterProfiler'))"

RUN R -e "remotes::install_github(c('satijalab/seurat-wrappers', 'kevinblighe/PCAtools', 'chris-mcginnis-ucsf/DoubletFinder', 'velocyto-team/velocyto.R'))"

Now that we have the Dockerfile, we can run the Docker build command from the command line. We'll want to tag our Docker image with our username and the name of the image, preferably something descriptive. I'll choose asoberan/abrfseurat for my build.

cd /path/to/directory/where/Dockerfile/is/located

docker buildx build -t asoberan/abrfseurat .
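One way to check that the build produced a usable image is to start a throwaway container and load Seurat in it. The sketch below assumes the image lets you override its default command with Rscript, which I believe holds for rocker-based images but is worth confirming. build_and_check and the DOCKER variable are hypothetical names introduced here; setting DOCKER=echo prints the commands instead of running them.

```shell
# build_and_check IMAGE: build the Dockerfile in the current directory,
# then try to load Seurat inside a temporary container.
# DOCKER defaults to the real docker CLI; set DOCKER=echo for a dry run.
build_and_check() {
    image="$1"
    "${DOCKER:-docker}" buildx build -t "$image" . || return 1
    "${DOCKER:-docker}" run --rm "$image" Rscript -e 'library(Seurat)'
}

# Usage (run from the directory containing the Dockerfile):
#   build_and_check asoberan/abrfseurat
```

If the library fails to load, Rscript exits with a non-zero status, so the function's return value can be used in scripts to detect a broken build.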

Of course, each of you could build this yourselves and have a custom local copy of this image. However, the main benefit of containerization is that it makes programs and environments portable. I've already created the image and uploaded it to Docker Hub, so instead of everyone building their own image, you can just pull my existing image and use it immediately.

I've created images for both amd64 and arm64. If you're running a PC or an Intel-based Mac, you'll want to use the tag latest-x86_64. If you're running Apple Silicon or another ARM processor, you'll want to use the tag latest-arm64.
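To avoid picking the tag by hand, a small shell function can map the machine architecture reported by uname -m to one of the two tags above. This is only a sketch: seurat_tag_for_arch is a hypothetical helper, and the architecture strings it matches (x86_64 and amd64 for Intel/AMD, arm64 and aarch64 for ARM) are assumptions about what common systems report.

```shell
# seurat_tag_for_arch [ARCH]: print the image tag matching a CPU architecture.
# ARCH defaults to the output of `uname -m` on the current machine.
seurat_tag_for_arch() {
    arch="${1:-$(uname -m)}"
    case "$arch" in
        x86_64|amd64)  echo "latest-x86_64" ;;
        arm64|aarch64) echo "latest-arm64" ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
}

# Usage:
#   docker run --rm -it -p 8787:8787 "asoberan/abrfseurat:$(seurat_tag_for_arch)"
```

Whichever way you choose the tag, the run command takes the following form: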

docker run --rm -it -p 8787:8787 asoberan/abrfseurat:<tag>

Once the Docker image has been pulled and the container is running, navigate to the RStudio instance at the address below and log in as user rstudio with the password given in the container's startup output. All the libraries needed for Seurat should be available out of the box.

http://localhost:8787

However, we've run into the same problem as before: this instance of RStudio is running locally on our own computers. How can we take advantage of this image on the Luria cluster?