Running Docker Images

To create a basic Docker container from the Debian image, we run the following:

docker run debian echo 'Hello, World!'

# If this is your first time using the debian image, Docker will first pull a lot of data from Dockerhub, then run:

Hello, World!

What's happening here? We invoke the docker command, and tell it to run a command in a container created using the debian image, which it gets by default from Dockerhub. The rest of this line simply tells Docker what command to run in the container, in this case echo 'Hello, World!'.
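
If you'd like to see the image download as a separate step, you can pull the image explicitly and then list the images cached on your machine. This is a small sketch using standard docker subcommands; the debian image may already be cached from the run above.

# Download the debian image from Dockerhub without running a container
docker pull debian

# List the images cached locally; debian should now appear in the output
docker images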

Important note: Docker images are built for specific CPU architectures (e.g., amd64 vs. arm64), and you can only run an image if it matches your computer's architecture. Many popular Docker images provide versions for both amd64 and arm64, but it's up to you to check whether a version compatible with your CPU's architecture exists before trying to run an image.
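
If you're unsure, you can compare your machine's architecture with the image's. This is a minimal sketch; the --format template assumes the standard docker image inspect output.

# Print your computer's CPU architecture (e.g. x86_64 for amd64, aarch64 for arm64)
uname -m

# Print the architecture the locally cached debian image was built for
docker image inspect debian --format '{{.Architecture}}'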

We can explore this container a bit more by creating an interactive session inside of it. This allows us to see the filesystem present in the container.

To start an interactive session in a container created using the debian image:

docker run -it debian bash
root@dsliajldkajs:/#

Inside the container, we can't do very much, but we can see that it has its own filesystem with the usual FHS layout. We can make changes to this filesystem just as we would on a normal Unix system. However, those changes only live as long as the container: once the container stops, they are gone, and a new container started from the same image begins with a fresh copy of the image's filesystem.
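
For example, a file created in one container will not exist in a fresh container started from the same image. This is a sketch; the prompts and the file name scratch.txt are just illustrative.

docker run -it debian bash
root@abc123:/# touch /root/scratch.txt    # create a file inside the container
root@abc123:/# exit

docker run -it debian bash                # start a fresh container from the same image
root@def456:/# ls /root/scratch.txt
ls: cannot access '/root/scratch.txt': No such file or directory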

If data inside the container does not survive container reboots, how does any data persist?

There are two ways that Docker allows us to persist data: bind mounting the host's filesystem or creating a Docker volume.
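
For reference, a named Docker volume looks like the sketch below; the volume name myvolume is hypothetical, and we won't use volumes further in this section.

# Create a named volume managed by Docker
docker volume create myvolume

# Mount the volume at /data inside the container; files written there persist across containers
docker run -v myvolume:/data -it debian bash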

We'll focus on bind mounting here. Bind mounting essentially punches a hole in the container's filesystem that points to a location on your host computer's filesystem, so reads and writes at that path go directly to the host directory.

To bind mount a directory, we pass the -v flag when running the container, followed by <source>:<destination>, where <source> is the host directory you want to make available in the container and <destination> is the path inside the container where it will appear.

For example, to bind mount a local directory at /mnt in the debian container:

docker run -v "/home/asoberan:/mnt" -it debian bash
root@mlfmlkma:/# ls /mnt

# Your files should be present in the /mnt directory inside the container

Now, if you create something in /mnt inside the container, you'll see those changes reflected in your local directory as well.
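
You can verify this round trip from the host. This is a sketch; from_container.txt is a hypothetical file name, and the host path matches the bind mount above.

# Inside the container, create a file in the bind-mounted directory
root@mlfmlkma:/# touch /mnt/from_container.txt
root@mlfmlkma:/# exit

# Back on the host, the file appears in the original directory
ls /home/asoberan/from_container.txt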

This debian image is pretty barebones, as you've seen. Images like this aren't meant to be used outright; instead, they serve as bases for building other, more useful images.

We'll use one of these more useful images to set up an R development environment.

The image we'll be using is rocker/rstudio, an image made by the R community for setting up a basic R environment or for building a more robust one.

Let's start up an interactive session using the rocker/rstudio image available on Dockerhub.

docker run --rm -it rocker/rstudio bash
root@damldkmsla:/# R
> library("tidyverse")
Error in library("tidyverse") : there is no package called ‘tidyverse’
> install.packages(c("tidyverse"))
# tidyverse installation output
> library(tidyverse)
── Attaching core tidyverse packages ─────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all
  conflicts to become errors

As you can see, rocker/rstudio does not come with tidyverse built-in. However, the R environment it provides is just like any other R environment, so it's incredibly simple to install it.

Working with R in the command line can be fairly cumbersome. The real power of rocker/rstudio is that it comes built-in with an RStudio server.

By default, RStudio Server binds to port 8787. However, that port lives inside the container's own network, so, as with SSH port forwarding, we need to forward it to our local machine. Thankfully, Docker has a built-in way of doing this: the -p flag, which is supplied with <host port>:<container port>, where the host port is the port on your own computer and the container port is the port inside the container. To keep things simple, we'll use the same number for both.

docker run --rm -it -p 8787:8787 rocker/rstudio

This should start an RStudio Server, which you can access in your computer's web browser at http://localhost:8787.

Remember, any files you create in this RStudio Server are created inside the Docker container, so when the container stops, those files will be gone. If you want to save your work or reuse R files from previous projects, it's best to bind mount the directory containing those files and make your changes only within that bind-mounted directory inside the container.
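
Putting the pieces together, a session that both forwards the port and bind mounts a project directory might look like the sketch below. The host path is hypothetical, and mounting under /home/rstudio assumes the default rstudio user in rocker images; depending on the image version, you may also need to set a login password with -e PASSWORD=<something>.

# Forward RStudio Server's port and mount a host project directory into the container
docker run --rm -it \
  -p 8787:8787 \
  -v "/home/asoberan/projects:/home/rstudio/projects" \
  rocker/rstudio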
