LogoLogo
LogoLogo
  • The Barbara K. Ostrom (1978) Bioinformatics and Computing Facility
  • Computing Resources
    • Active Data Storage
    • Archive Data Storage
    • Luria Cluster
      • FAQs
    • Other Resources
  • Bioinformatics Topics
    • Tools - A Basic Bioinformatics Toolkit
      • Getting more out of Microsoft Excel
      • Bioinformatics Applications of Unix
        • Unix commands applied to bioinformatics
        • Manipulate NGS files using UNIX commands
        • Manipulate alignment files using UNIX commands
      • Alignments and Mappers
      • Relational databases
        • Running Joins on Galaxy
      • Spotfire
    • Tasks - Bioinformatics Methods
      • UCSC Genome Bioinformatics
        • Interacting with the UCSC Genome Browser
        • Obtaining DNA sequence from the UCSC Database
        • Obtaining genomic data from the UCSC database using table browser queries
        • Filtering table browser queries
        • Performing a BLAT search
        • Creating Custom Tracks
        • UCSC Intersection Queries
        • Viewing cross-species alignments
        • Galaxy
          • Intro to Galaxy
          • Galaxy NGS Illumina QC
          • Galaxy NGS Illumina SE Mapping
          • Galaxy SNP Interval Data
        • Editing and annotation gene structures with Argo
      • GeneGO MetaCore
        • GeneGo Introduction
        • Loading Data Into GeneGO
        • Data Management in GeneGO
        • Setting Thresholds and Background Sets
        • Search And Browse Content Tab
        • Workflows and Reports Tab
        • One-click Analysis Tab
        • Building Network for Your Experimental Data
      • Functional Annotation of Gene Lists
      • Multiple Sequence Alignment
        • Clustalw2
      • Phylogenetic analysis
        • Neighbor Joining method in Phylip
      • Microarray data processing with R/Bioconductor
    • Running Jupyter notebooks on luria cluster nodes
  • Data Management
    • Globus
  • Mini Courses
    • Schedule
      • Previous Teaching
    • Introduction to Unix and KI Computational Resources
      • Basic Unix
        • Why Unix?
        • The Unix Tree
        • The Unix Terminal and Shell
        • Anatomy of a Unix Command
        • Basic Unix Commands
        • Output Redirection and Piping
        • Manual Pages
        • Access Rights
        • Unix Text Editors
          • nano
          • vi / vim
          • emacs
        • Shell Scripts
      • Software Installation
        • Module
        • Conda Environment
      • Slurm
    • Introduction to Unix
      • Why Unix?
      • The Unix Filesystem
        • The Unix Tree
        • Network Filesystems
      • The Unix Shell
        • About the Unix Shell
        • Unix Shell Manual Pages
        • Using the Unix Shell
          • Viewing the Unix Tree
          • Traversing the Unix Tree
          • Editing the Unix Tree
          • Searching the Unix Tree
      • Files
        • Viewing File Contents
        • Creating and Editing Files
        • Manipulating Files
        • Symbolic Links
        • File Ownership
          • How Unix File Ownership Works
          • Change File Ownership and Permissions
        • File Transfer (in-progress)
        • File Storage and Compression
      • Getting System Information
      • Writing Scripts
      • Schedule Scripts Using Crontab
    • Advanced Utilization of IGB Computational Resources
      • High Performance Computing Clusters
      • Slurm
        • Checking the Status of Computing Nodes
        • Submitting Jobs / Slurm Scripts
        • Interactive Sessions
      • Package Management
        • The System Package Manager
        • Environment Modules
        • Conda Environments
      • SSH Port Forwarding
        • SSH Port Forwarding Jupyter Notebooks
      • Containerization
        • Docker
          • Docker Installation
          • Running Docker Images
          • Building Docker Images
        • Singularity
          • Differences from Docker
          • Running Images in Singularity
      • Running Nextflow / nf-core Pipelines
    • Python
      • Introduction to Python for Biologists
        • Interactive Python
        • Types
          • Strings
          • Lists
          • Tuples
          • Dictionaries
        • Control Flow
        • Loops
          • For Loops
          • While Loops
        • Control Flows and Loops
        • Storing Programs for Re-use
        • Reading and Writing Files
        • Functions
      • Biopython
        • About Biopython
        • Quick Start
          • Basic Sequence Analyses
          • SeqRecord
          • Sequence IO
          • Exploration of Entrez Databases
        • Example Projects
          • Coronavirus Exploration
          • Translating a eukaryotic FASTA file of CDS entries
        • Further Resources
      • Machine Learning with Python
        • About Machine Learning
        • Hands-On
          • Project Introduction
          • Supervised Approaches
            • The Logistic Regression Model
            • K-Nearest Neighbors
          • Unsupervised Approaches
            • K-Means Clustering
          • Further Resources
      • Data Processing with Python
        • Pandas
          • About Pandas
          • Making DataFrames
          • Inspecting DataFrames
          • Slicing DataFrames
          • Selecting from DataFrames
          • Editing DataFrames
        • Matplotlib
          • About Matplotlib
          • Basic Plotting
          • Advanced Plotting
        • Seaborn
          • About Seaborn
          • Basic Plotting
          • Visualizing Statistics
          • Visualizing Proteomics Data
          • Visualizing RNAseq Data
    • R
      • Intro to R
        • Before We Start
        • Getting to Know R
        • Variables in R
        • Functions in R
        • Data Manipulation
        • Simple Statistics in R
        • Basic Plotting in R
        • Advanced Plotting in R
        • Writing Figures to a File
        • Further Resources
    • Version Control with Git
      • About Version Control
      • Setting up Git
      • Creating a Repository
      • Tracking Changes
        • Exercises
      • Exploring History
        • Exercises
      • Ignoring Things
      • Remotes in Github
      • Collaborating
      • Conflicts
      • Open Science
      • Licensing
      • Citation
      • Hosting
      • Supplemental
Powered by GitBook

MIT Resources

  • https://accessibility.mit.edu

Massachusetts Institute of Technology

On this page

Was this helpful?

Export as PDF
  1. Bioinformatics Topics
  2. Tasks - Bioinformatics Methods
  3. Multiple Sequence Alignment

Clustalw2

PreviousMultiple Sequence AlignmentNextPhylogenetic analysis

Last updated 1 year ago

Was this helpful?

Multiple sequence alignment and phylogenetic analysis allow the identification of conserved positions in protein and nucleic acid sequences. This can lead to an appreciation of the evolutionary history of a group of sequences.

  • The clustal family of programs is commonly used to produce multiple sequence alignments. Other options are available as well:

  • There are 3 different ways to use the clustal programs.

  1. Web-based clustalw can be used .

  2. ClustalX GUI (and also the command-line clustalw executable) is available for download .

  3. Command-line clustalw2 is installed on rous

  • The major difference between the 3 options is the interface and the calculation of bootstrap values for the tree, which is only available in the command-line and GUI versions. Other details of running the program are the same. For this lesson, we will use the web-based version to align these sequences.

  • Login to rous and copy the clustal training files to your home directory.

cp -r /net/n3/data/Teaching/IAP_2010_day3/clustal .
NOTE: -r means "recursive", copy the folder and everything inside it to the new location "."
The lack of a trailing "/" on the clustal path ensures that the folder is moved, not just it's contents.
  • enter the clustal directory:

cd clustal
  • view the raw sequence files:

cat *.pep
  • launch the clustalw application:

clustalw2
 **************************************************************
 ******** CLUSTAL 2.0.12 Multiple Sequence Alignments  ********
 **************************************************************


     1. Sequence Input From Disc
     2. Multiple Alignments
     3. Profile / Structure Alignments
     4. Phylogenetic trees

     S. Execute a system command
     H. HELP
     X. EXIT (leave program)


Your choice: 
  • Select option 1 to load sequence and specify the file "tiny.pep". Before being returned to the original meny, you should see:

Sequences should all be in 1 file.

7 formats accepted: 
NBRF/PIR, EMBL/SwissProt, Pearson (Fasta), GDE, Clustal, GCG/MSF,                  RSF.


Enter the name of the sequence file : tiny.pep
Sequence format is Pearson
Sequences assumed to be PROTEIN


Sequence 1: zf_12a1_a     28 aa
Sequence 2: zf_12a1_b     28 aa
Sequence 3: hs_a11        28 aa
Sequence 4: hs_22a1       28 aa
Sequence 5: zf_a11        28 aa
  • From the original menu, now select option 2. Multiple Alignments. This activates the Alignment menu:

****** MULTIPLE ALIGNMENT MENU ******


    1.  Do complete multiple alignment now Slow/Accurate
    2.  Produce guide tree file only
    3.  Do alignment using old guide tree file

    4.  Toggle Slow/Fast pairwise alignments = SLOW

    5.  Pairwise alignment parameters
    6.  Multiple alignment parameters

    7.  Reset gaps before alignment? = OFF
    8.  Toggle screen display          = ON
    9.  Output format options
    I. Iteration = NONE

    S.  Execute a system command
    H.  HELP
    or press [RETURN] to go back to main menu
  • All default options are correct except option 9,select this to see the output format menu:

 ********* Format of Alignment Output *********


     F. Toggle FASTA format output       =  OFF

     1. Toggle CLUSTAL format output     =  ON
     2. Toggle NBRF/PIR format output    =  OFF
     3. Toggle GCG/MSF format output     =  OFF
     4. Toggle PHYLIP format output      =  OFF
     5. Toggle NEXUS format output       =  OFF
     6. Toggle GDE format output         =  OFF

     7. Toggle GDE output case           =  LOWER
     8. Toggle CLUSTALW sequence numbers =  OFF
     9. Toggle output order              =  ALIGNED

     0. Create alignment output file(s) now?

     T. Toggle parameter output          = OFF
     R. Toggle sequence range numbers =  OFF

     H. HELP
  • User toggle #4 to activate PHYLIP output, hit return to revisit the alignment menu

  • Select option 1 to do the alignment. Accept the default conditions and hit return until you end up at the main menu. Exit the program with "X".

cat tiny.[ap][lh][ny]
NOTE: Square brackets are regular expression syntax. For example [ap] = either an "a" or a "p" at that position.

This is the clustal format output:

CLUSTAL 2.0.12 multiple sequence alignment


zf_12a1_a       VADLVFLVDGSWSVGRENFRFIRSFIGA--
zf_12a1_b       KADLVFLIDGSWSIGDDSFAKVRQFVFS--
hs_22a1         HYDLVFLLDTSSSVGKEDFEKVRQWVAN--
hs_a11          YMDIVIVLDGSNSIYP--WVEVQHFLINIL
zf_a11          YMDIVIVLDGSNSIYP--WNEVQDFLINIL
                  *:*:::* * *:    :  :: ::  

This is the phylip format output:

     5     30
zf_12a1_a  VADLVFLVDG SWSVGRENFR FIRSFIGA-- 
zf_12a1_b  KADLVFLIDG SWSIGDDSFA KVRQFVFS-- 
hs_22a1    HYDLVFLLDT SSSVGKEDFE KVRQWVAN-- 
hs_a11     YMDIVIVLDG SNSIYP--WV EVQHFLINIL 
zf_a11     YMDIVIVLDG SNSIYP--WN EVQDFLINIL

The following interactive menu appears, options 1 and 2 will be used in this demonstration. Additional information is available

Muscle
T-Coffee
HERE
HERE
exampledata
HERE