Clustalw2
Multiple sequence alignment and phylogenetic analysis allow the identification of conserved positions in protein and nucleic acid sequences. This can lead to an appreciation of the evolutionary history of a group of sequences.
The Clustal family of programs is commonly used to produce multiple sequence alignments, although other alignment tools are available as well.
There are 3 different ways to use the clustal programs.
Web-based clustalw can be used HERE.
ClustalX GUI (and also the command-line clustalw executable) is available for download HERE. exampledata
Command-line clustalw2 is installed on rous
The main differences among the 3 options are the interface and the calculation of bootstrap values for the tree, which is available only in the command-line and GUI versions. Other details of running the program are the same. For this lesson, we will use the command-line version on rous to align these sequences.
Log in to rous and copy the clustal training files to your home directory.
cp -r /net/n3/data/Teaching/IAP_2010_day3/clustal .
NOTE: -r means "recursive": copy the folder and everything inside it to the new location "." (the current directory).
Omitting the trailing "/" on the clustal path ensures that the folder itself is copied, not just its contents (some versions of cp treat a trailing "/" differently).
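The effect of the recursive copy can be tried safely in a scratch area first. This is a minimal sketch, not part of the lesson data: the directories under the temporary folder are hypothetical stand-ins for the training folder and your home directory.

```shell
# Scratch area with a stand-in "clustal" folder (hypothetical paths, not the real training data).
work=$(mktemp -d)
mkdir -p "$work/src/clustal" "$work/home"
touch "$work/src/clustal/tiny.pep"

# Same shape as the tutorial command: copy the folder into the current directory (".").
(cd "$work/home" && cp -r "$work/src/clustal" .)

# The folder itself now exists at the destination, with its contents inside it.
copied=$(ls "$work/home/clustal")
echo "$copied"    # prints: tiny.pep

rm -rf "$work"
```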
Enter the clustal directory:
cd clustal
View the raw sequence files:
cat *.pep
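Before aligning, it can help to confirm how many sequences a file holds. The sketch below assumes the .pep files are in Pearson (FASTA) format, where every record begins with a ">" header line. The sample file is built here for illustration from two of the lesson's sequences with gaps removed; the exact layout of the real tiny.pep may differ.

```shell
# Illustrative FASTA file (assumed layout; the real tiny.pep is not reproduced here).
sample=$(mktemp)
cat > "$sample" <<'EOF'
>zf_12a1_a
VADLVFLVDGSWSVGRENFRFIRSFIGA
>zf_12a1_b
KADLVFLIDGSWSIGDDSFAKVRQFVFS
EOF

# Each record starts with a ">" header line, so counting those lines counts the sequences.
nseq=$(grep -c '^>' "$sample")
echo "$nseq sequences"    # prints: 2 sequences

rm -f "$sample"
```

On the real data, the same count would come from running grep -c '^>' tiny.pep inside the clustal directory.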
Launch the clustalw application:
clustalw2
The following interactive menu appears. Options 1 and 2 will be used in this demonstration. Additional information is available HERE
**************************************************************
******** CLUSTAL 2.0.12 Multiple Sequence Alignments ********
**************************************************************
1. Sequence Input From Disc
2. Multiple Alignments
3. Profile / Structure Alignments
4. Phylogenetic trees
S. Execute a system command
H. HELP
X. EXIT (leave program)
Your choice:
Select option 1 to load sequences and specify the file "tiny.pep". Before being returned to the original menu, you should see:
Sequences should all be in 1 file.
7 formats accepted:
NBRF/PIR, EMBL/SwissProt, Pearson (Fasta), GDE, Clustal, GCG/MSF, RSF.
Enter the name of the sequence file : tiny.pep
Sequence format is Pearson
Sequences assumed to be PROTEIN
Sequence 1: zf_12a1_a 28 aa
Sequence 2: zf_12a1_b 28 aa
Sequence 3: hs_a11 28 aa
Sequence 4: hs_22a1 28 aa
Sequence 5: zf_a11 28 aa
From the original menu, now select option 2, "Multiple Alignments". This activates the Alignment menu:
****** MULTIPLE ALIGNMENT MENU ******
1. Do complete multiple alignment now Slow/Accurate
2. Produce guide tree file only
3. Do alignment using old guide tree file
4. Toggle Slow/Fast pairwise alignments = SLOW
5. Pairwise alignment parameters
6. Multiple alignment parameters
7. Reset gaps before alignment? = OFF
8. Toggle screen display = ON
9. Output format options
I. Iteration = NONE
S. Execute a system command
H. HELP
or press [RETURN] to go back to main menu
All default options are correct except those under option 9. Select it to see the output format menu:
********* Format of Alignment Output *********
F. Toggle FASTA format output = OFF
1. Toggle CLUSTAL format output = ON
2. Toggle NBRF/PIR format output = OFF
3. Toggle GCG/MSF format output = OFF
4. Toggle PHYLIP format output = OFF
5. Toggle NEXUS format output = OFF
6. Toggle GDE format output = OFF
7. Toggle GDE output case = LOWER
8. Toggle CLUSTALW sequence numbers = OFF
9. Toggle output order = ALIGNED
0. Create alignment output file(s) now?
T. Toggle parameter output = OFF
R. Toggle sequence range numbers = OFF
H. HELP
Use toggle 4 to activate PHYLIP output, then hit return to go back to the alignment menu.
Select option 1 to do the alignment. Accept the default conditions and hit return until you end up at the main menu. Exit the program with "X".
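The menu session above can also be approximated without the interactive prompts, using clustalw2's command-line options. This is a sketch under stated assumptions: it expects tiny.pep in the current directory and relies on the default output names (tiny.aln and tiny.phy). It runs the alignment twice, once per output format, and only prints the commands if clustalw2 or the input file is not available.

```shell
# Assumes tiny.pep is in the current directory (as in the steps above).
infile="tiny.pep"

# CLUSTAL format is the default output; PHYLIP must be requested explicitly.
cmd_aln="clustalw2 -INFILE=$infile -ALIGN"
cmd_phy="clustalw2 -INFILE=$infile -ALIGN -OUTPUT=PHYLIP"

if command -v clustalw2 >/dev/null 2>&1 && [ -f "$infile" ]; then
    $cmd_aln    # expected to write tiny.aln (plus the guide tree tiny.dnd)
    $cmd_phy    # expected to write tiny.phy
else
    # Dry run: show what would be executed.
    echo "would run: $cmd_aln"
    echo "would run: $cmd_phy"
fi
```

Running the alignment twice is wasteful for large inputs; it is done here only to keep the sketch simple.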
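View the two alignment output files:
cat tiny.[ap][lh][ny]
NOTE: Square brackets are shell filename-pattern (glob) syntax, similar to character classes in regular expressions. For example, [ap] matches either an "a" or a "p" at that position, so this pattern matches both tiny.aln and tiny.phy.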
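To see exactly which files a bracket pattern selects before passing it to cat, the shell can be asked to expand it with echo. The sketch below uses empty stand-in files in a temporary directory; the file names mirror those in this lesson (tiny.dnd is the guide tree file clustalw writes), but the directory itself is hypothetical.

```shell
# Throwaway directory with empty files named like the lesson's files.
demo_dir=$(mktemp -d)
touch "$demo_dir/tiny.pep" "$demo_dir/tiny.dnd" "$demo_dir/tiny.aln" "$demo_dir/tiny.phy"

# The shell expands the pattern before the command runs; echo shows what cat would receive.
matched=$(cd "$demo_dir" && echo tiny.[ap][lh][ny])
echo "$matched"    # prints: tiny.aln tiny.phy

rm -rf "$demo_dir"
```

tiny.pep is not matched because its second extension letter, "e", is not in [lh]; tiny.dnd fails on the first letter.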
This is the clustal format output:
CLUSTAL 2.0.12 multiple sequence alignment
zf_12a1_a VADLVFLVDGSWSVGRENFRFIRSFIGA--
zf_12a1_b KADLVFLIDGSWSIGDDSFAKVRQFVFS--
hs_22a1 HYDLVFLLDTSSSVGKEDFEKVRQWVAN--
hs_a11 YMDIVIVLDGSNSIYP--WVEVQHFLINIL
zf_a11 YMDIVIVLDGSNSIYP--WNEVQDFLINIL
*:*:::* * *: : :: ::
This is the phylip format output:
5 30
zf_12a1_a VADLVFLVDG SWSVGRENFR FIRSFIGA--
zf_12a1_b KADLVFLIDG SWSIGDDSFA KVRQFVFS--
hs_22a1 HYDLVFLLDT SSSVGKEDFE KVRQWVAN--
hs_a11 YMDIVIVLDG SNSIYP--WV EVQHFLINIL
zf_a11 YMDIVIVLDG SNSIYP--WN EVQDFLINIL
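The first line of the PHYLIP file gives the number of sequences and the number of alignment columns (here 5 and 30). A quick consistency check, sketched below, reads that header and confirms that every row has the stated length. It assumes a single-block file like the one shown, with sequence names that contain no spaces; longer alignments written in several blocks would need a different approach.

```shell
# Copy of the PHYLIP output shown above, written to a temporary file for the check.
phy=$(mktemp)
cat > "$phy" <<'EOF'
5 30
zf_12a1_a VADLVFLVDG SWSVGRENFR FIRSFIGA--
zf_12a1_b KADLVFLIDG SWSIGDDSFA KVRQFVFS--
hs_22a1 HYDLVFLLDT SSSVGKEDFE KVRQWVAN--
hs_a11 YMDIVIVLDG SNSIYP--WV EVQHFLINIL
zf_a11 YMDIVIVLDG SNSIYP--WN EVQDFLINIL
EOF

# Header line: number of sequences, then number of alignment columns.
read -r nseq ncol < "$phy"
echo "$nseq sequences, $ncol columns"    # prints: 5 sequences, 30 columns

# Join the 10-character chunks of each row (fields 2 onward) and list the distinct lengths.
lengths=$(awk 'NR > 1 { s = ""; for (i = 2; i <= NF; i++) s = s $i; print length(s) }' "$phy" | sort -u)
echo "row lengths: $lengths"    # prints: row lengths: 30

rm -f "$phy"
```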