Clustalw2

Multiple sequence alignment and phylogenetic analysis allow the identification of conserved positions in protein and nucleic acid sequences. This can lead to an appreciation of the evolutionary history of a group of sequences.

  • The clustal family of programs is commonly used to produce multiple sequence alignments. Other options are available as well.

  • There are 3 different ways to use the clustal programs.

  1. Web-based clustalw can be used HERE.

  2. ClustalX GUI (and also the command-line clustalw executable) is available for download HERE. exampledata

  3. Command-line clustalw2 is installed on rous

  • The major differences among the 3 options are the interface and the calculation of bootstrap values for the tree, which is available only in the command-line and GUI versions. Other details of running the program are the same. For this lesson, we will use the command-line version on rous to align these sequences.

  • Log in to rous and copy the clustal training files to your home directory.

cp -r /net/n3/data/Teaching/IAP_2010_day3/clustal .
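NOTE: -r means "recursive": copy the folder and everything inside it to the new location ".", which is the current directory.
Leaving the trailing "/" off the clustal path ensures that the folder itself is copied, not just its contents (some versions of cp, such as BSD cp, copy only the contents when the source path ends in "/").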
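
The behaviour of cp -r can be checked safely before touching real data. The sketch below is only an illustration: the scratch directory and the names "shared" and "home" are made up for the example and are not part of the training setup.

```shell
# Sketch: "cp -r SRC ." copies the folder itself into the current directory.
# Every path here is a throwaway stand-in, not the real training data.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/shared/clustal" "$work/home"
printf '>seq1\nMDIV\n' > "$work/shared/clustal/tiny.pep"

cd "$work/home" || exit 1
cp -r "$work/shared/clustal" .   # no trailing "/" on the source path

ls clustal                       # the copy, now under ./clustal
ls "$work/shared/clustal"        # the original is untouched
```

Both ls commands should list tiny.pep, since cp copies rather than moves.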
  • Enter the clustal directory:

cd clustal
  • View the raw sequence files:

cat *.pep
  • Launch the clustalw application:

clustalw2
  • The following interactive menu appears. Options 1 and 2 will be used in this demonstration. Additional information is available HERE

 **************************************************************
 ******** CLUSTAL 2.0.12 Multiple Sequence Alignments  ********
 **************************************************************


     1. Sequence Input From Disc
     2. Multiple Alignments
     3. Profile / Structure Alignments
     4. Phylogenetic trees

     S. Execute a system command
     H. HELP
     X. EXIT (leave program)


Your choice: 
  • Select option 1 to load sequences and specify the file "tiny.pep". Before being returned to the original menu, you should see:

Sequences should all be in 1 file.

7 formats accepted: 
NBRF/PIR, EMBL/SwissProt, Pearson (Fasta), GDE, Clustal, GCG/MSF, RSF.


Enter the name of the sequence file : tiny.pep
Sequence format is Pearson
Sequences assumed to be PROTEIN


Sequence 1: zf_12a1_a     28 aa
Sequence 2: zf_12a1_b     28 aa
Sequence 3: hs_a11        28 aa
Sequence 4: hs_22a1       28 aa
Sequence 5: zf_a11        28 aa
  • From the original menu, now select option 2, Multiple Alignments. This opens the alignment menu:

****** MULTIPLE ALIGNMENT MENU ******


    1.  Do complete multiple alignment now Slow/Accurate
    2.  Produce guide tree file only
    3.  Do alignment using old guide tree file

    4.  Toggle Slow/Fast pairwise alignments = SLOW

    5.  Pairwise alignment parameters
    6.  Multiple alignment parameters

    7.  Reset gaps before alignment? = OFF
    8.  Toggle screen display          = ON
    9.  Output format options
    I. Iteration = NONE

    S.  Execute a system command
    H.  HELP
    or press [RETURN] to go back to main menu
  • All default options are correct except those under option 9. Select it to see the output format menu:

 ********* Format of Alignment Output *********


     F. Toggle FASTA format output       =  OFF

     1. Toggle CLUSTAL format output     =  ON
     2. Toggle NBRF/PIR format output    =  OFF
     3. Toggle GCG/MSF format output     =  OFF
     4. Toggle PHYLIP format output      =  OFF
     5. Toggle NEXUS format output       =  OFF
     6. Toggle GDE format output         =  OFF

     7. Toggle GDE output case           =  LOWER
     8. Toggle CLUSTALW sequence numbers =  OFF
     9. Toggle output order              =  ALIGNED

     0. Create alignment output file(s) now?

     T. Toggle parameter output          = OFF
     R. Toggle sequence range numbers =  OFF

     H. HELP
  • Use toggle 4 to activate PHYLIP output, then hit return to go back to the alignment menu.

  • Select option 1 to do the alignment. Accept the default conditions and hit return until you end up at the main menu. Exit the program with "X".
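
The menu steps can also be approximated without the interactive prompts, which is handy for scripting. This is a hedged sketch rather than part of the lesson: it assumes clustalw2 is on your PATH and tiny.pep is in the current directory, and it uses the -INFILE, -ALIGN, -OUTPUT and -OUTFILE command-line options. Check the program's own help for the options your version supports.

```shell
# Sketch: run the same alignment without the interactive menus.
# Assumes clustalw2 is on PATH and tiny.pep is in the current directory.
if command -v clustalw2 >/dev/null 2>&1 && [ -f tiny.pep ]; then
    # Default CLUSTAL-format output (written to tiny.aln)
    clustalw2 -INFILE=tiny.pep -ALIGN
    # Same alignment again, written in PHYLIP format
    clustalw2 -INFILE=tiny.pep -ALIGN -OUTPUT=PHYLIP -OUTFILE=tiny.phy
    status=done
else
    echo "clustalw2 or tiny.pep not found; skipping" >&2
    status=skipped
fi
echo "$status"
```

Running the program twice is a simplification. The interactive session writes both formats in a single pass.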

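  • View the two alignment output files:

cat tiny.[ap][lh][ny]
NOTE: Square brackets are shell wildcard (glob) syntax, similar to character classes in regular expressions. For example, [ap] matches either an "a" or a "p" at that position, so the pattern matches both tiny.aln and tiny.phy.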
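
The bracket pattern used in the cat command can be tried out on empty files to see exactly which names it picks up. The scratch directory and the empty files below are stand-ins created only for this illustration.

```shell
# Sketch: see which file names the pattern tiny.[ap][lh][ny] matches.
# The empty files below are stand-ins created in a scratch directory.
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch" || exit 1
touch tiny.pep tiny.dnd tiny.aln tiny.phy

# Each bracket matches exactly one character from its set:
# [ap] then [lh] then [ny], so "aln" and "phy" both fit.
matched=$(echo tiny.[ap][lh][ny])
echo "$matched"
```

This prints "tiny.aln tiny.phy". The pattern skips tiny.pep and tiny.dnd, but it would also match an unrelated name such as tiny.ahy if one existed, so it is a shortcut rather than a precise filter.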

This is the clustal format output:

CLUSTAL 2.0.12 multiple sequence alignment


zf_12a1_a       VADLVFLVDGSWSVGRENFRFIRSFIGA--
zf_12a1_b       KADLVFLIDGSWSIGDDSFAKVRQFVFS--
hs_22a1         HYDLVFLLDTSSSVGKEDFEKVRQWVAN--
hs_a11          YMDIVIVLDGSNSIYP--WVEVQHFLINIL
zf_a11          YMDIVIVLDGSNSIYP--WNEVQDFLINIL
                  *:*:::* * *:    :  :: ::  
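
In the last line of the clustal output, "*" marks columns in which every sequence has the same residue. As a rough cross-check, the sketch below recounts those columns from the five aligned rows. The awk logic is an illustration written for this page, not something clustalw2 provides.

```shell
# Sketch: count columns where all five sequences share the same residue.
# The rows are copied from the clustal-format alignment on this page.
alignment='zf_12a1_a       VADLVFLVDGSWSVGRENFRFIRSFIGA--
zf_12a1_b       KADLVFLIDGSWSIGDDSFAKVRQFVFS--
hs_22a1         HYDLVFLLDTSSSVGKEDFEKVRQWVAN--
hs_a11          YMDIVIVLDGSNSIYP--WVEVQHFLINIL
zf_a11          YMDIVIVLDGSNSIYP--WNEVQDFLINIL'

identical=$(printf '%s\n' "$alignment" | awk '
{ seq[NR] = $2 }                      # second field is the aligned sequence
END {
    n = 0
    for (i = 1; i <= length(seq[1]); i++) {
        c = substr(seq[1], i, 1)
        same = (c != "-")             # gap columns never count as conserved
        for (r = 2; r <= NR; r++)
            if (substr(seq[r], i, 1) != c) same = 0
        if (same) n++
    }
    print n
}')
echo "$identical"
```

The count is 5, matching the five "*" characters in the consensus line.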

This is the phylip format output:

     5     30
zf_12a1_a  VADLVFLVDG SWSVGRENFR FIRSFIGA-- 
zf_12a1_b  KADLVFLIDG SWSIGDDSFA KVRQFVFS-- 
hs_22a1    HYDLVFLLDT SSSVGKEDFE KVRQWVAN-- 
hs_a11     YMDIVIVLDG SNSIYP--WV EVQHFLINIL 
zf_a11     YMDIVIVLDG SNSIYP--WN EVQDFLINIL
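
The first line of the phylip output gives the number of sequences and the alignment length. A quick consistency check of a file against its own header might look like the sketch below. It assumes sequence names contain no spaces, so that splitting on whitespace is safe, which holds for this example but not for every PHYLIP file.

```shell
# Sketch: check a PHYLIP alignment against its own header line.
# The header "5 30" promises 5 sequences of 30 columns each.
phylip='     5     30
zf_12a1_a  VADLVFLVDG SWSVGRENFR FIRSFIGA--
zf_12a1_b  KADLVFLIDG SWSIGDDSFA KVRQFVFS--
hs_22a1    HYDLVFLLDT SSSVGKEDFE KVRQWVAN--
hs_a11     YMDIVIVLDG SNSIYP--WV EVQHFLINIL
zf_a11     YMDIVIVLDG SNSIYP--WN EVQDFLINIL'

report=$(printf '%s\n' "$phylip" | awk '
BEGIN { count = 0; bad = 0 }
NR == 1 { nseq = $1; ncol = $2; next }
NF > 0 {
    row = ""
    for (f = 2; f <= NF; f++) row = row $f   # rejoin the 10-column blocks
    count++
    if (length(row) != ncol) bad++
}
END {
    if (count == nseq && bad == 0) print "ok"
    else print "mismatch"
}')
echo "$report"
```

For the alignment shown here the check prints "ok".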
