Basic Unix

Why Unix?

UNIX is an operating system (a suite of programs) originally developed at Bell Labs in 1969, and it has been under development ever since. UNIX is a very popular operating system for many reasons, including:

  • multi-user (multiple users can log in to a computer at the same time and use its resources concurrently).

  • multi-tasking (each user can perform many tasks at the same time).

  • network-ready (built-in TCP/IP networking makes it easy to communicate between computers).

  • very powerful programming environments (free of the many limits imposed by other operating systems).

  • robust and stable.

  • scalable, portable, flexible.

  • open source.

UNIX systems also have a graphical user interface (GUI), similar to Microsoft Windows, which provides an easy-to-use environment. However, knowledge of UNIX is required for operations that aren't covered by a graphical program, or when no window interface is available (for example, in a telnet session).

UNIX at a glance

UNIX consists of three main components: kernel, shell, and programs.

  • KERNEL: hub of the operating system.

  • SHELL: command line interpreter.

  • PROGRAMS: collections of operations.

Each component has specific roles:

  • KERNEL: it allocates time and memory to programs and handles the filestore and communications in response to system calls.

  • SHELL: it interprets the commands the user types in and arranges for them to be carried out.

  • PROGRAMS: execute sets of predefined operations.

As an illustration of the way the shell and the kernel work together, suppose a user types the following command:

rm myfile 

Internally,

  1. The shell searches the PATH environment variable for the file containing the program rm.

  2. The shell requests the kernel, through system calls, to execute the program rm on myfile.

  3. When the process has finished running, the shell returns the UNIX prompt to the user, indicating that it is waiting for further commands.

Introduction to Unix and KI Computational Resources

THIS TRAINING MATERIAL IS DEPRECATED. PLEASE REFER TO THE FOLLOWING TWO COURSES:

KI Bioinformatics Support

  • Modes of Support

    • Individual Training

    • Project Work

      • $90 per hour
      • $1200 per 0.1 FTE per month
  • Requesting Support

    • Send email to Charlie Whittaker or Stuart Levine to arrange for a free outreach consultation.

  • Project Tracking and Billing Handled with ilabs

    • Core users must have an ilabs account and valid cost objects in place. The BCC/BMC core must provide a cost estimate, and the project has to be approved by the faculty member before work begins.

Overview of the KI Computing Systems

A model of the KI computing environment
Introduction to Unix
Advanced Utilization of IGB Computational Resources

Unix Text Editors

The Unix Tree

  • The UNIX file system consists of files and directories organized in a hierarchical structure.

  • Avoid file names and directory names with spaces and special characters.

  • For example, consider a root directory with 5 subdirectories: bin, tmp, home, usr, and src. The directory home has two subdirectories (iap and paolaf); thus the parent directory of iap is home.

The system provides a few shorthand notations for important locations:

  • The root is denoted by a forward slash (/).

  • The home directory of any given user is specified by a tilde (~).

  • A single dot (".") indicates the current working directory.

  • A double dot ("..") indicates the directory above the current directory.

You can find out your current working directory (in other words, where you currently are in the Unix tree) by typing the command pwd (print working directory) at the prompt and then hitting return.

# print working directory  
pwd
# change directory to root directory
cd /
pwd
# change directory to home directory
cd ~
pwd
# change directory to root directory
cd /
pwd

Anatomy of a Unix Command

A basic UNIX command has 3 parts. These are:

  1. command name

  2. options (zero or more)

  3. arguments (zero or more)

In general, UNIX commands are written as single strings followed by the return key. Note that UNIX commands are case-sensitive, i.e. it does matter whether you type a letter in uppercase or lowercase.

Example

wc -l myfile.txt

The example above has the following components:

  • the command name (wc)

  • one option (-l)

  • one argument (myfile.txt)

Options

  • Also called “switches” or “flags”.

  • Specify the way a given command must work.

  • Preceded by one or two dashes (one dash if the switch is a single character, two dashes otherwise).
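As an illustration (using GNU ls on Linux systems, where many options have both a short single-dash form and a long double-dash form):

# short form: single dash, single character
ls -a
# long form: double dash, full word (equivalent to -a on GNU systems)
ls --all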

Tab completion

Tab completion is a useful feature of UNIX where the shell automatically fills in partially typed commands when the Tab key is pressed. This is advantageous in many ways:

  • Commands or filenames with long or difficult spellings require fewer keystrokes to type.

  • In the case of multiple possible completions, the shell will list all filenames beginning with those few characters.
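As an illustration (the exact completions depend on the commands and files present on your system):

# typing the first letters of a command and pressing Tab completes it,
# e.g. "hist" + Tab completes to "history" (if no other command starts with those letters)
# pressing Tab twice after an ambiguous prefix lists all possible completions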

History

Another useful feature is the fact that the shell remembers each typed command, so you can easily recall and run previous commands.

  • By using the up and down arrows at the prompt, you can revisit the command history and recall previously typed commands.

  • Previous commands can be re-executed as they are or can be edited using the left/right arrow.

Use the history command to specify how many and which previous commands to display.

Examples:
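For instance (these commands also appear in the command listing later on this page):

# recall the last 5 typed commands
history 5
# recall and execute the 3rd last command
!-3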

Manual Pages

Unix comes with preloaded documentation, also known as "man pages" (manual pages).

Anatomy of a man page

Each page is a self-contained document consisting of the following sections:

  • NAME: the name and a brief description of the command.

  • SYNOPSIS: how to use the command, including a listing of the various options and arguments you can use with the command. Square brackets ([ ]) are often used to indicate optional arguments. Any arguments or options that are not in square brackets are required.

  • DESCRIPTION: a more detailed description of the command including descriptions of each option.

  • SEE ALSO: References to other man pages that may be helpful in understanding how to use the command in question.

View man pages

To view a manual page, use the command man.

When viewing a man page, use the following keys to navigate the pages:

  • spacebar - view the next screen;

  • b (for "back") - view the previous screen;

  • arrow keys - navigate within the pages;

  • q (for "quit") - quit and return to the shell prompt.

For example, to view the man page associated with the command cp, type the following:

Search man pages

When you are not sure of the exact name of a command, you can use the apropos command to see all the commands with the given keyword on their man page.

For example, to see which commands are relevant to the task of copying, type the following:

wc -l myfile.txt
# display last typed commands
history 
#display the last n typed commands
history n
#execute command n
!n
#execute last command in the list
!!
# recall and execute the n-th last command
!-n
# recall the last 5 typed commands
history 5 
# recall and execute the 3rd last command
!-3
# recall and execute command number 120
!120
man command_name
man cp
apropos keyword 
apropos copy

Module

Multiple versions of various software packages are available on luria.mit.edu. Versions are managed using the module command.

module commands

  • List available applications on the system:

module avail
  • Load an application so it can be used:

module add
  • Specify version number of an application, e.g.

module add tophat/2.0.12

We recommend you always include the version number when running module add, so that you know which version of the software is used. Otherwise, a running application may generate different results or break when the software is upgraded in the future.

Note: some packages are available only after you load the python (for python packages) or jre (for java packages) modules. For example, compare the output of the following:

module avail

with

module add jre/1.6.0-29
module add python/2.7.2
module avail
  • R and R packages - Various versions of R are installed on luria. Each one has a collection of associated packages. Execute the following commands taking note of the comment lines:

# Load an R version
module add r/2.15.3
# Start R
R
  • Once inside R, check out the available R packages

#List R packages and direct the list into an object called "a"
a<-installed.packages()
#Display the dimensions of the object "a"
dim(a)
#List the parts of object "a" that have package name and package version
a[,c(1,3)]
  • NOTE: Those package version numbers may not be consistent from month to month and can change without warning.

  • When working with R, capture your session info with the command below so you can reproduce the environment if needed:

sessionInfo()
  • quit R

q(save="yes")
#enter y at prompt
  • Once back in the shell, view the R data that remains

ls -lat | head
cat .Rhistory
  • List loaded applications:

module list
  • Unload an application:

module del
  • Print module help:

module help

vi / vim

How it works

  • The Unix command vi starts up the visual editor.

  • Typing vi followed by a file name will automatically open the file.

  • After issuing the command, the appearance of your screen changes. Instead of the shell prompt, the content of the file filename.txt appears on the screen. If filename.txt didn't exist before you invoked vi, the screen will appear mostly blank.

vi filename.txt

Modes

  • One of the fundamental concepts to keep in mind when using vi is that three modes exist: command, insert, and visual.

  • In command mode, everything you type on the keyboard gets interpreted as a command.

  • In insert mode, almost everything you type on the keyboard gets interpreted as text and appears in your file: letters, numbers, punctuation, line returns, etc.

  • When you are in insert mode, you can switch to command mode by pressing the Esc key on your keyboard (this is often at the left and upper portion of your keyboard).

  • When you are in command mode, there are many keys you can use to get into insert mode, each giving you a slightly different way of starting to type your text. For example, from command mode you can simply type i to enter insert mode (i stands for insert); the characters you then type go into the file.

Basic commands

  • h = move one character to the left

  • l = move one character to the right

  • k = move up one line

  • j = move down one line

  • [ctrl] b = move back one screen

  • [ctrl] f = move forward one screen

  • quit Vi without saving anything (you'll lose any changes you made when using this command) type: :q!

  • save/write the file you're working on without exiting type: :w followed by a filename

  • save/write your file and quit the vi editor in one step by typing: :wq

The Unix Terminal and Shell

Terminal

  • Common name for the interface that allows you to input commands and see the output from commands.

Command prompt

  • Sequence of one or more characters, such as $ or % in combination with other customizable elements, such as the username or the current working directory.

  • It indicates readiness to accept commands.

Shell

  • Command line interpreter.

  • Users operate the computer by entering command input as text for a shell to execute or by creating text scripts of one or more such commands.

nano

  • Nano is freely available under the GPL for Unix and Unix-like systems.

  • It is a keyboard-oriented editor, controlled with key combinations, i.e. the "control" key (denoted by "^") PLUS certain letter keys.

Basic commands:

  • ^O : saves the current file

  • ^W : goes to the search menu

  • ^G : gets the help screen

  • ^X : exits

Basic Usage

  • Launch the editor by typing "nano" at the shell prompt. An overview of available commands is given at any time at the bottom of the editor screen.

  • Open an existing file by typing "nano" followed by the file name.

  • If you modify the file, when you exit you will be asked whether to save the changes and the name of the file to save.

Advanced Usage

Find and replace in nano

  • While holding down Ctrl key, press \

  • Enter your search string at prompt and press return

  • Enter your replacement string at prompt and press return

  • Respond to "Replace this instance" option menu:

    • Y to replace current instance

    • N to skip current instance

    • A to replace all instances

  • Note: The search string can also be a regular expression.

emacs

  • Prior to starting this demonstration, you should set up your unix directory according to the setup instructions.

  • emacs is a powerful text editor with extensive functionality.

1) Control mode is activated by pressing and holding the <ctrl> key while pressing the second key. This process is written as:

C-key

For example, C-s starts an incremental search.

2) Meta-mode is activated by pressing and releasing the <ESC> key followed by some other key that activates a sub-menu of commands. This process is written as:

ESC option

For example, ESC x activates the execute-extended-command menu.


  • Check that you are in the unix directory under IAP_2010 by entering the command pwd;

  • the result should be (where USERNAME is your own unix username)

/home/USERNAME/IAP_2010/unix

  • If it is not, you can change to that directory by executing cd /home/USERNAME/IAP_2010/unix.

  • Create a copy of the file replace.txt with cp replace.txt replace2.txt.

  • Begin editing replace2.txt with emacs replace2.txt.

  • Note: on new accounts, a splash screen welcomes you to emacs; use C-l to exit the screen and proceed to editing.

  • Insert text by regular typing.

  • Delete letters with C-d

  • Delete lines with C-k

  • Find and replace text with ESC x, then type replace-string at the M-x prompt; enter the search string, press return, then enter the replacement text.

  • Move to the end of the document with ESC >

  • Move back to the start of the document with ESC <

  • Play tetris with ESC x, tetris

  • Save and exit with C-x followed by C-c then answer prompts if changes were made.

  • To turn off the splash welcome screen, edit a file called .emacs to have the contents:

Shell Scripts

  • Shell scripts are sequential collections of commands that generally represent a workflow.

  • Shell scripts can use other commands as building blocks.

Example

Suppose that occasionally you want the system to tell you some simple information, like your user id, the current directory and its contents, and the current date. To avoid typing this sequence of commands every time you want this information, you can compose a short script that executes the commands in a specific order.

  • Open a text editor (e.g. nano).

  • Type the sequence of commands to execute in the specific order they should be executed.

  • Save the file, giving the script a meaningful name, for example info.sh.

  • Execute the script by invoking your shell (e.g. bash) followed by the script name. If you don't know what shell you are running, type "echo $SHELL" at the prompt.
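A minimal sketch of what info.sh might contain (it mirrors the commands shown in the listing at the end of this page; whoami is added here to print the user id mentioned above):

#!/bin/bash
# info.sh - print some simple session information
echo Your user id is:
whoami
echo
echo Your current directory is:
pwd
echo
echo The files in your current directory are:
ls -lt
echo
echo The date is:
date

Run it with bash info.sh.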

Output Redirection and Piping

Output Redirection

  • Most processes initiated by UNIX commands take their input from the standard input (i.e. the keyboard) and write their output to the standard output (i.e. the terminal screen).

  • The "<" sign can be used to redirect the input, that is to specify that the input comes from something other than the keyboard.

  • The ">" sign can be used to redirect the output, that is to specify that the output goes to something other than the terminal screen.

  • The ">>" sign can be used to append the output to something other than the terminal screen.

Piping

  • A pipe is denoted by "|".

  • Several pipes can be used in the same command line to create a "pipeline".

  • A pipe takes the output of a command and immediately sends it as input to another command.

  • A pipe is often used in conjunction with the command "less" to view the output within the pager.

# list the current files and redirect the output to a file named "mylist.txt"
ls > mylist.txt
# view content of mylist.txt
cat mylist.txt
# redirect the input to a command 
cat < mylist.txt
# redirect the output and append 
cat mylist.txt > list1.txt
cat mylist.txt >> list2.txt
# view content 
cat list2.txt
#view users connected
who
#count the number of users connected
who | wc -l
#display the content of bin
ls -la /usr/local/bin
#display the content of bin within the pager provided by "less"
ls -la /usr/local/bin | less
nano
nano file1.txt
clear
echo Today date is:
date
echo
echo Your current directory is:
pwd
echo
echo The files in your current directory are:
ls -lt
echo
echo Have a nice day!
bash info.sh

Software Installation

pwd
cd /home/USERNAME/IAP_2010/unix
cp replace.txt replace2.txt
emacs replace2.txt
(setq inhibit-splash-screen t)

Access Rights

  • Every user has a unique username and is a member of one or more groups.

  • Every file and directory has an owner, a group, and a set of permission flags.

  • Flags specify permissions:

    • read, write and execute(rwx)

    • owner, group and others (ugo).

  • Programs need to have execute (x) permission to run in the shell.

ls -lt
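For example, in a hypothetical long-listing entry the mode string breaks down as follows:

# -rwxr-x--- 1 user labgroup 1024 Jan 1 12:00 myscript.sh
# first character: "-" for a regular file, "d" for a directory
# characters 2-4 (rwx): owner (u) permissions
# characters 5-7 (r-x): group (g) permissions
# characters 8-10 (---): permissions for others (o)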

Permissions are controlled with the commands chmod, chown, chgrp

  • Only the owner can change the permissions of a file.

chmod

  • chmod has 2 different syntaxes

Syntax 1

  • assign (=), give (+), or take away (-) permissions

  • who corresponds to --> u (user), g (group), o (other)

  • permission corresponds to --> read (r), write (w), execute (x)

    • chmod who=permission

    • chmod who+permission

    • chmod who-permission

 ls -l catfile.txt
 chmod g=rwx catfile.txt
 ls -l catfile.txt
 ls -l catfile.txt
 chmod g-w catfile.txt
 ls -l catfile.txt
 ls -l catfile.txt
 chmod o+x catfile.txt
 ls -l catfile.txt

chmod n

  • numeric shortcuts to change the mode in a compact way.

  • format: chmod n filename

    • n is a 3-digit number, each digit with a value from 0 to 7

    • the digits correspond, in order, to user, group, and other

    • each digit is the sum of the permissions it grants: read = 4, write = 2, execute = 1 (e.g. 644 = rw-r--r--, 700 = rwx------)

  • examples:

ls -l catfile.txt
chmod 644 catfile.txt
ls -l catfile.txt
chmod 700 catfile.txt
ls -l catfile.txt

Conda Environment

About Conda

1. Conda is an open-source package management system and environment management system.

2. Package, dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN.

3. Comparison of pip and conda

  • Pip installs Python software packaged as wheels or source distributions, which may require compilers and libraries. Conda packages are binaries. No need for compilers.

  • Conda packages are not limited to Python packages.

  • Conda can create isolated environments. Pip has no built in support for environments but rather depends on other tools like virtualenv or venv to create isolated environments.

  • Pip installation does not check dependencies of all packages. Conda installation will verify that all requirements of all packages installed in an environment are met. This check can be slow.

  • A package may not be available as a conda package but may be available on PyPI, where it can be installed with pip.

4. Miniconda is a free, minimal version of Anaconda.

Use Conda

1. Load miniconda3 module

Some online conda documentation suggests running conda init to initialize the conda environment. Please do not run this, as it will add conda settings to your .bashrc file, which may pollute your environment. You only want to activate the conda environment when needed, not by default.
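Instead, make conda available only for the current session by loading the module and sourcing the init script (these commands also appear in the listing at the end of this page):

module load miniconda3/v4
source /home/software/conda/miniconda3/bin/condainit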

2. Managing environments

Conda allows you to create separate environments containing files, packages, and their dependencies that will not interact with other environments. When you begin using conda, you already have a default environment named base. To see the available packages in the base environment, run conda list.

You can run the command "python -V" to see what python version is in the base environment. You can't add your own programs into the base environment because you will not have permissions. Create your own separate environments to keep your programs isolated from each other. To see the available conda environments, run conda env list.

Create a new environment and install a package in it. We will name the environment snowflakes and install the package BioPython. Type the following:
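For example (the same commands appear in the listing at the end of this page):

conda create --name snowflakes
conda activate snowflakes
conda install biopython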

Run python and then import Bio to confirm biopython can be imported. Then run conda deactivate to leave the snowflakes environment.
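For example, a quick check from the shell before deactivating (Bio is the module name that biopython installs):

python -c "import Bio; print(Bio.__version__)"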

To delete an entire environment, run conda remove --name snowflakes --all.

3. Show conda environment info

4. Exercise 1: install samtools

If you run "module avail samtools", you will find the latest module available is 1.10. Now you are going to install a newer version of samtools using conda. First create an environment named samtools

Visit https://anaconda.org and search for "samtools", which will lead you to a page with installation instructions. Follow the instructions there. Run the samtools command to confirm it worked.

What if you want to install a different version of samtools? You can change the syntax to conda install -c bioconda samtools=1.11. This will downgrade your samtools from 1.12 to 1.11.

What if you want to keep both versions? HINT: create a separate conda environment named samtools-1.11.

5. Exercise 2: install umi_tools

Next you want to install a python package named umi_tools. Search for umi_tools on https://anaconda.org (or on https://pypi.org) to get instructions.

You can download the conda cheat sheet from https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html or google "conda cheat sheet".

module load miniconda3/v4
source /home/software/conda/miniconda3/bin/condainit
conda list
conda env list
conda create --name snowflakes
conda env list
conda activate snowflakes
conda install biopython
conda list
conda deactivate
conda remove --name snowflakes --all
conda info
conda create --name samtools
conda activate samtools

Basic Unix Commands

pwd

  • This command name stands for "print working directory".

  • It displays on the terminal the absolute path name of the current (working) directory.

  • It is useful to locate where you are currently in the Unix tree.

# print working directory
pwd

cd

  • This command name stands for "change directory".

  • It changes your current working directory to the specified location.

  • Your home directory is referred to as "~" (tilde).

  • The current directory is referred to with a single dot ( ".").

  • The directory located one level up the current directory (also known as "parent directory") is referred to with two dots ("..").

  • The last visited directory is referred to with a hyphen ("-").

# go to root directory of the system and print the working directory
cd /
pwd
# go to the home directory and print the working directory
cd ~
pwd
# change directory using the absolute path and print the working directory
cd /net/bmc-pub14/data/
pwd

ls

  • This command stands for "list".

  • It displays the content of a directory.

  • By default, the content is listed in lexicographic order.

# list the content of the current directory
ls

# list content of the parent directory
ls ../
  • Command switches allow you to list the directory's content with more or less detail and sorted according to different criteria.

    • The switch "a" lists all files and directories, even those starting with "." that are normally hidden.

    • The switch "l" lists the content in "long" format, including the total size (in blocks), mode (permission), the number of links, the owner, the size, the date of last modification, and the file or subdirectory name. Note that the first hyphen is replaced with a “d” if the item is a directory.

    • The switch "t" sorts the content of the directory by time modified (most recently modified first).

    • The switch "S" sorts the content of the directory by size (smaller first).

    • The switch "r" reverses the order of the sort.

    • The switch "R" recursively lists subdirectories encountered.

    • The switch "u" specifies the use of time of last access of the file, instead of last modification, for sorting the content.

    • The switch "U" specifies the use of time of file creation, instead of last modification, for sorting the content.

    • The switch "F" displays a slash ("/") immediately after a directory, an asterisk ("*") after an executable, an at sign ("@") after a symbolic link.

# list all files
ls -a
# list in long format
ls -l
# list in long format, sorting by date
ls -lt 
# list in reverse lexicographic order
ls -r
# list by size 
ls -S
# list in long format according to last access
ls -lu 
# list in long format according to the time of creation
ls -lU
# display “/” after a directory, “*” after an executable, “@” after a symbolic link
ls -F 
  • To view the commands available on luria, execute the following (note the space-separated list of directories to list):

ls /bin/ /usr/bin/ /usr/local/bin/

mkdir

  • This command name stands for "make a directory".

  • It creates a new folder (or directory). If no path is specified, the new directory is created in the current directory.

  • The switch "-v" specifies a verbose mode: a message with the folder(s) created is printed on the screen.

# create a directory named "testdir1" with a subdirectory named "testdir2"
mkdir testdir1
mkdir testdir1/testdir2

# change current directory directly to "testdir2"
cd testdir1/testdir2 
# go to the parent directory (i.e. testdir1) and print the working directory
cd ..
pwd
# create a new directory named "testdir3" with the verbose mode on
mkdir -v testdir3

cp

  • This command name stands for "copy".

  • It makes copies of files and directories to the specified location.

  • The switch "v" enables the verbose mode: messages describing the files copied are displayed on the screen.

  • The switch "i" enables the interactive mode: confirmation is requested before overwriting files.

  • Wildcards symbols such as "*" or "?" are commonly used to copy multiple files with a single command.

    • The symbol "*" stands for any number of alphanumeric characters.

    • The symbol "?" stands for a single alphanumeric character.

The following examples assume you have a directory named "unix_class" in your home directory:

# create a directory "unix_class" in your home directory and access it 
mkdir ~/unix_class
cd ~/unix_class
# copy the file named arrayDat.txt into your unix_class directory
cp /net/ostrom/data/dropbox/arrayDat.txt ~/unix_class/
ls

# the above command is equivalent to the following: 
cp /net/ostrom/data/dropbox/arrayDat.txt ./
ls
# use the interactive mode and type "y" to confirm the choice 
cp -i /net/ostrom/data/dropbox/arrayDat.txt ~/unix_class/
ls
# use the verbose mode to see the file being copied 
cp -v /net/ostrom/data/dropbox/arrayDat.txt ~/unix_class/
# make a local copy of the file call it "arrayDat1.txt" 
cp arrayDat.txt arrayDat1.txt
ls
# copy all the files whose names start with "array" into the current directory 
cp /net/ostrom/data/dropbox/UNIX/array* ./
ls

# copy any file whose extension is "txt" 
cp -i /net/ostrom/data/dropbox/UNIX/*.txt ./
ls

mv

  • This command name stands for "move".

  • It moves (renames) files and directories.

  • Several switches are available to specify the behavior of this command, including "-i" (interactive mode) and "-v" (verbose).

# cp arrayDat.txt into arrayDat1.txt, then rename arrayDat1.txt as arrayDat2.txt 
cp arrayDat.txt arrayDat1.txt
mv arrayDat1.txt arrayDat2.txt
ls
# rename in interactive mode, i.e. ask for confirmation before overwriting existing file 
cp arrayDat2.txt arrayDat3.txt
ls
mv -i arrayDat2.txt arrayDat3.txt
ls
# rename in verbose mode, i.e. print information on the screen 
mv -v arrayDat3.txt arrayDat4.txt
ls

rmdir

  • This command stands for "remove directory".

  • It deletes the specified directory.

  • The switch "-v" specifies a verbose mode: a message with the folder(s) deleted is printed on the screen.

  • Note that in general you cannot delete a directory that is not empty. The content of such directory has to be deleted before the directory itself can be deleted.

# remove the folder "testdir2" in the location specified by the path
rmdir ~/testdir1/testdir2
# remove the folder "newdir1"
rmdir testdir1
# remove the folder "newdir3" using the verbose mode
rmdir -v testdir3

rm

  • This command name stands for "remove".

  • It deletes files and directories.

  • Several switches are available to specify the behavior of this command, including "-i" (interactive mode) and "-v" (verbose).

# create copies of arrayDat.txt 
cp arrayDat.txt arrayDat1.txt
cp arrayDat.txt arrayDat2.txt
cp arrayDat.txt arrayDat3.txt
cp arrayDat.txt arrayDat4.txt
ls
# delete file 
rm arrayDat2.txt
# delete file in interactive mode, i.e. ask for confirmation before removing 
rm -i arrayDat3.txt
# delete file in verbose mode, i.e. prints the name of file deleted 
rm -v arrayDat4.txt
  • The "rm" command can be used in conjunction with wildcard symbols to delete multiple files at once. Extreme caution should be used, as action are not often reveresible. It is good practice either to use the interactive/verbose mode or use the command "ls" with the intended pattern, before invoking the command "rm".

# create a copy of arrayDat1.txt, then delete files whose names start with "arrayDat",
# followed by a digit and the extension "txt"
cp arrayDat1.txt arrayDat2.txt
rm -i arrayDat?.txt
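Following the good-practice note above, a safer sequence previews the matching files before deleting them:

# preview which files match the pattern
ls arrayDat?.txt
# then delete them, confirming each one
rm -i arrayDat?.txt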
  • The switch "-r" deletes the content of a directory recursively,

i.e. deletes the directory's files and subdirectories, including their content.

#create a subdirectory and copy *.txt files 
mkdir testdir
cp *.txt testdir/
ls 
ls testdir/*

# try to remove the directory with rmdir (this fails because the directory is not empty) 
rmdir testdir
# remove the directory and its content recursively 
rm -r testdir
ls

cat

  • This command name stands for "concatenate".

  • It concatenates the content of file(s) and prints it out.

  • The switch "-n" specifies to number each line, starting from 1.

  • The "cat" command is often used in conjuction with the pipe ("|") to view the content of long files.

  • The "cat" command can be used to copy and append the content of files using output redirection (">").

# view the content of arrayDat.txt 
cat arrayDat.txt
# number each line of arrayDat.txt 
cat -n arrayDat.txt
# copy the content of arrayDat.txt using output redirection
cat arrayDat.txt > file1.txt
cat arrayDat.txt > file2.txt
ls

# concatenate the content of the two files
cat file1.txt file2.txt
# append the content of file2.txt to file1.txt
cat file2.txt >> file1.txt
cat file1.txt
  • When used without argument, the "cat" command takes STDIN as default argument. You can use this behavior to type in brief information, rather than invoking a standard text editor.

# type the following command 
cat > testfile.txt
# type any text, then hit Return, then press Ctrl-D to end the input 
#view content of newly created file
cat testfile.txt

less

  • This command is a "pager" that allows the user to view (but not modify) the contents of a text file one screen at a time.

  • The space-bar is used to advance to the next page.

  • The key "q" is used to "quit" the pager.

  • The switch "-n" allows to view the content of a file starting at the specified line.

# view the content of arrayDat.txt, hit the space bar to advance, type "q" to quit
less arrayDat.txt 
# view content of file starting at line 5
less +5 arrayDat.txt

head

  • This command displays the top part of a file.

  • By default, the first 10 lines of a file are displayed.

  • The switch "-n" specifies the number of lines to display.

  • The switch "-b" specifies the number of bytes to display.

# display the first lines of a file (10 lines by default)
head arrayDat.txt
#display the first 2 lines of a file
head -n 2 arrayDat.txt

tail

  • This command displays the bottom part of a file.

  • By default, the last 10 lines of a file are displayed.

  • The switch "-n" specifies the number of lines to display.

  • The switch "-b" specifies the number of bytes to display.

# display the last lines of a file (10 lines by default)
tail arrayDat.txt
#display the last 2 lines of a file
tail -n 2 arrayDat.txt

top

  • This command displays Linux tasks

  • The top program provides a dynamic real-time view of a running system

du

  • This command stands for Disk Usage

  • It estimates file space usage

# print sizes in human readable format
du -h

df

  • This command stands for Disk Free

  • It reports file system disk space usage

# print sizes in human readable format
df -h

Slurm

About Slurm

  • Jobs are managed on luria.mit.edu using Slurm.

  • Slurm is an advanced job scheduler for a cluster environment.

  • The main purpose of a job scheduler is to utilize system resources in the most efficient way possible.

  • The number of tasks (slots) required for each job should be specified with the "-n" flag

  • Each node provides 16, 32, or 96 slots.

  • The process of submitting jobs to Slurm is done using a script.

Creating a simple Slurm script

  • The process of submitting jobs to Slurm is done generally using a script. The job script allows all options and the programs/commands to be placed in a single file.

  • It is possible to specify options via command line, but it becomes cumbersome when the number of options is significant.

  • An example of a script that can be used to submit a job to the cluster is reported below. Start by opening a file, copy and paste the following commands, then save the file as myjob.sh or another meaningful name. Note: job names cannot start with a number.

The first 5 lines specify important information about the job submitted; the rest of the file contains some simple UNIX commands (date, sleep) and comments (lines starting with #####).

  • The "#SBATCH" is used in the script to indicate an slurm option.

  • #SBATCH -N 1: You should always include this line exactly. The number following -N must always be 1 unless you run MPI applications, which is rare for typical bioinformatics software.

  • #SBATCH -n: This is the number of tasks requested. The recommended maximum is 16 in the normal partition, i.e., don't ask for more than 16 tasks in your script unless you receive special instructions from the system administrator. It is important to request resources as accurately as you can. If possible, do not request more than what you need, and do not request less than what you need. The best way to find out how much you need is through testing. While your job is running, you can ssh to the node and use the top command to see if it is using the requested resources properly. Note that what you request from the Slurm scheduler with -n is not necessarily the actual number of CPUs allocated by the OS.

  • #SBATCH --mail-user=[]: You must replace [] with your email address

  • #SBATCH -t [min] OR -t [days-hh:mm:ss]: Specifies the wall clock limit. The maximum is 14 days, i.e. a job cannot run for more than 14 days on luria.
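For example, a 2-day wall clock limit (an illustrative value) would be written as:

#SBATCH -t 2-00:00:00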

Submitting a job

Submit your job by executing the command sbatch myjob.sh,

where myjob.sh is the name of the submit script. After submission, you should see the message: Submitted batch job XXX where XXX is an auto-incremented job number assigned by the scheduler. For example: Submitted batch job 3200863

Submitting jobs to a specific node

  • To submit your job to a specific node, use the command sbatch -p [normal] -w [cX] [script_file],

where X is a number specifying the node you intend to use. For example, the following command will submit myjob.sh to node c5 : sbatch -w c5 myjob.sh

  • To submit your job while excluding nodes (for example, to exclude c[5-22]), use sbatch --exclude c[5-22] myjob.sh.

Interactive Sessions

You should not run interactive jobs on the head node of Luria. The head node is shared by all users. An interactive job may negatively affect how other users interact with the head node or even make the head node inaccessible to all users. Thus, instead of running myjob.sh on the head node, you should run "sbatch myjob.sh". However, you can run an interactive job on a compute node. This can be done using command "srun --pty bash", which will open up a remote shell on a random compute node.

Then you can run program interactively. This is often useful when you are compiling, debugging, or testing a program, and the program does not take long to finish.

Sometimes your program (such as matlab or R) may need the X11 window for graphical user interface, and then you can use the command srun --pty bash. You will also need to install an X11 client such as Xming or XQuartz on your machine to display X window and enable X11 forwarding on your ssh client.

Remember to exit cleanly from interactive sessions when done; otherwise, the session may be killed without notice.

User job limitations

A user can submit up to 1000 jobs at a time. Jobs are typically scheduled on a first-come, first-served basis. If you submit a lot of jobs at a time and take a lot of resources, others will have to wait until your jobs complete, which is not optimal for cluster usage. If you do need to submit a lot of jobs, please add the options shown below to your job scripts.

The nice option lowers the job priority, and the exclude option excludes half of the nodes from your jobs. This allows others' jobs to get a chance to run while still allowing you to run some of your jobs.
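For example (the same lines appear in the listing at the end of this page):

#SBATCH --nice=100000
#SBATCH --exclude=c[5-22]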

Monitoring and controlling a job

  • To monitor the progress of your job, use the command squeue.

  • To display information relative only to the jobs you submitted, use squeue -u username (where username is your username).

  • A useful tip on customizing the output of squeue

  • Get more information on a job with scontrol show job <jobid>.

Viewing job results

  • Any job run on the cluster is output to a slurm output file slurm-XXX.out where XXX is the job ID number (for example: slurm-3200707.out).

After submitting myjob.sh, any output that would normally be printed to the screen is now redirected to slurm-XXX.out.

You can also redirect output within the submission script.
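For example, Slurm's -o and -e options redirect standard output and standard error to files of your choosing (the filenames here are illustrative; %j expands to the job ID):

#SBATCH -o myjob_%j.out
#SBATCH -e myjob_%j.err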

Deleting a job

  • To stop and delete a job, use the command scancel XXX,

where XXX is the job number assigned by slurm when you submit the job using sbatch. You can only delete your jobs.

Checking the host status

  • To check the status of the host and its nodes, you can use the command sinfo -N.

states of nodes:

  • Customizing output

Job arrays

Slurm job arrays provide an easy way to submit a large number of independent processing jobs. For example, job arrays can be used to process the same workflow with different datasets. When a job array script is submitted, a specified number of array tasks are created based on the master sbatch script.

SLURM provides a variable named $SLURM_ARRAY_TASK_ID to each task. It can be used inside the job script to handle input/output for that task. For example, create a file named ex3.sh with the following lines.

Example job

Create a slurm job script named ex1.sh that processes a fastq file. Include the following lines in your job script. Determine what should be the appropriate -n value. Use the top command to watch the CPU and memory while the job is running on a compute node.

Now you want to process multiple fastq files and run the script in parallel. One way to do this is to make the fastq filename an argument and sbatch the script with the filename as the argument. Create a new script named ex2.sh and include the following lines.

You can then submit the script twice with different arguments:

Running special software through Slurm

Running Jupyter notebook

  • Get an interactive terminal

You will get a node allocated by the slurm scheduler. For example, c2.

  • Start notebook on the allocated node (e.g. c2).

  • Open an ssh connection to luria.mit.edu and the node (e.g. c2) from your local machine, i.e. from an SSH client on your local desktop or laptop, either the Terminal (Mac) or PowerShell (Windows). Replace username with your own username, and c2 with the actual compute node.

The above commands are actually running

It is likely that some other user has taken port 8888 ($HEAD_NODE_PORT) on the head node. In that case, you will get an error "bind: Address already in use". You should then change $HEAD_NODE_PORT from 8888 to a different port such as 8887 or 8886.

You can also change $MY_MACHINE_PORT and $COMPUTE_NODE_PORT, but that is only needed if you have another process that has taken 8888 on your local machine, or another user happens to take 8888 on the same compute node.

  • Tunnel from your local machine (either Windows or Mac) to Jupyter notebook running on $COMPUTE_NODE_NAME

Direct your browser on your local machine to http://localhost:8888 (i.e. localhost:$MY_MACHINE_PORT).

  • Close connection when finished

Running Rstudio server

There is no Rstudio server installed on Luria. You can run Rstudio server using a singularity image with a wrapper script; please see an example wrapper script. The steps of getting an interactive terminal and opening an ssh tunnel from your local machine to the allocated compute node are similar to the Running Jupyter notebook steps above, but with a different module (module load singularity) and a different default port (for example, 8787 on your local machine). Here local machine refers to your Mac or Windows PC. Run the ssh command from the Terminal (Mac) or the Windows PowerShell (Windows). For example:

Tip 1: To allow the Rstudio server singularity container to access your data located on your storage server (e.g. rowley or bmc-lab2), you need to edit the rstudio.sh file to bind your data path. At the end of the rstudio.sh script, you will see a singularity exec command with many --bind arguments. In the script, you should add additional --bind arguments, for example --bind /net/rowley/ifs/data/labname/ or --bind /net/bmc-lab2/data/lab/labname/username/, where you need to replace labname and username with actual values.

Tip 2: Ignore the "open an SSH tunnel" command printed in the rstudio.sh standard output; use the ssh command in the above example on your local machine instead.

Tip 3: If someone else has taken the port 8787 on the head node of Luria, you will get an error like "bind: Address already in use" when you run the ssh command from your laptop. In that case, choose a different port number, e.g. 8777 (please refer to the previous section on Jupyter notebook for an explanation of port numbers). For example

Alternatively, you can choose a different port number before starting the rstudio.sh script on the compute node, for example

Tip 4: The script uses a custom image with Seurat dependencies pre-installed. You can select your own R version or image based on the documentation of the example wrapper script.

Tip 5: If you are using Windows SecureCRT to connect to Luria, you can set up Port Forwarding (Tunneling). Select Options -> Session Options, click on "Port Forwarding", and then click "Add" to add a forwarding setup. You will get the Local Port Forwarding Properties dialog. Choose a name, then set the port for both Local and Remote, for example 8777. On the head node of Luria, you will also need to run an ssh forwarding command.
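For example (taken from the command listing below; replace c7 with the node where the Rstudio server is running):

ssh c7 -L 8777:localhost:8787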

Running Matlab

  • An example MATLAB script: matlabcode.m

Note: fname needs to be changed correspondingly.

  • A shell job submission script submitting matlabcode.m to a compute node:

  • To automate a MATLAB script, the interactive session can be turned off.

  • On a UNIX system, MATLAB uses the X-Windows Interface to interact with the user. In a batch execution, this communication is not necessary.

  • We don't need the initial "splash" which displays the MATLAB logo. We can turn this off with the -nosplash option.

  • We also don't want the interactive menu window to be set up, which can be suppressed with the -nodisplay option

  • Two other options may be useful for suppressing visual output and logs: the -nojvm option ("no Java Virtual Machine") and -nodesktop.

#!/bin/bash
#SBATCH -N 1                      # Number of nodes. You must always set -N 1 unless you receive special instruction from the system admin
#SBATCH -n 1                      # Number of tasks. Don't specify more than 16 unless approved by the system admin
#SBATCH --mail-type=END           # Type of email notification- BEGIN,END,FAIL,ALL. Equivalent to the -m option in SGE
#SBATCH --mail-user=[]            # Email to which notifications will be sent. Equivalent to the -M option in SGE. You must replace [] with your email address.
#############################################
echo print all system information
uname -a
echo print effective userid
whoami
echo Today date is:
date
echo Your current directory is:
pwd
echo The files in your current directory are:
ls -lt
echo Have a nice day!
sleep 20
sbatch myjob.sh
 sbatch -p [normal] -w [cX] [script_file]
 sbatch --exclude c[5-22] myjob.sh
srun --pty bash
#SBATCH --nice=100000
#SBATCH --exclude=c[5-22]
 squeue
 squeue -u username
alias qf='squeue --format="%.18i %.16j %.8u %.8T %.10M %.4C %.20V %.8R"'
qf
scontrol show job <jobid>
scancel XXX
sinfo -N 
mix : consumable resources partially allocated
idle : available to requests for consumable resources
drain : unavailable for use per system administrator request
drng : currently executing a job, but will not be allocated to additional jobs.
alloc : consumable resources fully allocated
down : unavailable for use. Slurm can automatically place nodes in this state if some failure occurs.
alias qn='sinfo -N --format="%.8N %.9P %.13C %.8O %.8e %.8m %.8T"'
qn
#!/bin/bash

#SBATCH -N 1
#SBATCH -n 4
#SBATCH --array=1-2

module load fastqc/0.11.5
module load bwa/0.7.17
FASTQDIR=/net/rowley/ifs/data/dropbox/UNIX
WORKDIR=~/data/class
mkdir -p $WORKDIR
cd $WORKDIR
FILE=$(ls $FASTQDIR/*.fastq | sed -n ${SLURM_ARRAY_TASK_ID}p)
fastqc -o $WORKDIR $FILE
bwa mem -t 4 /home/Genomes/bwa_indexes/mm10.fa $FILE > $(basename $FILE).sam
module load fastqc/0.11.5
module load bwa/0.7.17
mkdir -p ~/data/class
cd ~/data/class
fastqc -o ~/data/class /net/rowley/ifs/data/dropbox/UNIX/test_1.fastq
bwa mem -t 16 /home/Genomes/bwa_indexes/mm10.fa /net/rowley/ifs/data/dropbox/UNIX/test_1.fastq > ex1.sam
module load fastqc/0.11.5
module load bwa/0.7.17
FILE=$1
WORKDIR=~/data/class
mkdir -p $WORKDIR
cd $WORKDIR
fastqc -o $WORKDIR $FILE
bwa mem -t 16 /home/Genomes/bwa_indexes/mm10.fa $FILE > $(basename $FILE).sam
sbatch ex2.sh /net/rowley/ifs/data/dropbox/UNIX/test_1.fastq
sbatch ex2.sh /net/rowley/ifs/data/dropbox/UNIX/test_2.fastq
$ srun --pty bash
$ module load python3/3.6.4 # skip this if you have your own jupyter notebook installed through conda, pip or any other approach
$ export XDG_RUNTIME_DIR="" # skip this if you have your own jupyter notebook installed through conda, pip or any other approach
$ jupyter notebook --no-browser --port=8888 # 8888 is the port running on the compute node $COMPUTE_NODE_PORT
$ MY_MACHINE_PORT=8888
$ HEAD_NODE_PORT=8888
$ COMPUTE_NODE_PORT=8888
$ COMPUTE_NODE_NAME=c2
$ ssh -t username@luria.mit.edu -L ${MY_MACHINE_PORT}:localhost:${HEAD_NODE_PORT} ssh ${COMPUTE_NODE_NAME} -L ${HEAD_NODE_PORT}:localhost:${COMPUTE_NODE_PORT}
$ ssh -t username@luria.mit.edu -L 8888:localhost:8888 ssh c2 -L 8888:localhost:8888
ctrl-c
Run on head node: $ srun --pty bash # Get interactive terminal
Run on compute node (e.g. c7): $ module load singularity/3.5.0 # load singularity module on computing node
Run on compute node (e.g. c7): $ IMAGE=pansapiens/rocker-seurat:4.1.2-4.1.0 ./rstudio.sh # Launch wrapper script on computing node (which downloads the Singularity image only once if run for the first time)
Run on your local machine: % ssh -t username@luria.mit.edu -L 8787:localhost:8787 ssh c7 -L 8787:localhost:8787 # replace username and c7 with actual values
Run on your local machine: open a browser and point URL to http://localhost:8787
Run on your local machine: % ssh -t username@luria.mit.edu -L 8787:localhost:8777 ssh c7 -L 8777:localhost:8787 # replace username and c7 with actual values
Run on your local machine: open a browser and point URL to http://localhost:8787
Run on compute node (e.g. c7): $ export RSTUDIO_PORT=8777
Run on compute node (e.g. c7): $ IMAGE=pansapiens/rocker-seurat:4.1.2-4.1.0 ./rstudio.sh
Run on your local machine: % ssh -t username@luria.mit.edu -L 8777:localhost:8777 ssh c7 -L 8777:localhost:8777 # replace username and c7 with actual values
Run on your local machine: open a browser and point URL to http://localhost:8777
Run on head node: $ ssh c7 -L 8777:localhost:8787 # replace c7 with actual node where the Rstudio server is running
Run on your local machine: open a browser and point URL to http://localhost:8777
fprintf('create a 3 by 3 random matrix X\n')
X=rand(3)
fprintf('create a 3 by 4 random matrix Y\n')
Y=rand(3,4)
fprintf('calculate the product of matrices X and Y\n')
Z=X*Y
fprintf('calculate the inverse of matrix X\n')
A=inv(X)
fprintf('transpose matrix Z\n')
B=Z'
fprintf('find out the toolboxes installed on rous\n')
ver
fprintf('find out the location of matlab toolboxes on rous\n')
matlabroot
fprintf('find out how to use Matlab Bioinformatics Toolbox\n')
help bioinfo
fprintf('A brief Example of using Matlab Bioinformatics Toolbox\n')
fprintf('Load data into the Matlab enviroment\n')
load yeastdata.mat
fprintf('Get the size of the data\n')
numel(genes)
fprintf('Display the 15th gene\n')
genes{15}
fprintf('Display the expression of the 15th gene\n')
yeastvalues(15,:)
fprintf('Display the time series in hours\n')
times
fprintf('Plot expression profile for gene15\n')
figure1=figure
plot(times, yeastvalues(15,:))
xlabel('Time Hours')
ylabel('Log2 Relative Expression Level')
fname='/home/duan/course/2015/June'
saveas(figure1,fullfile(fname,'YAL054C_expression_profile'),'jpg')
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -V
#$ -m e
#$ -M [email protected]
#$ -pe whole_nodes 1
#############################################
module load matlab/2011b
matlab -nosplash -nodisplay -nojvm -nodesktop <matlabcode.m >output