# Clustalw2

Multiple sequence alignment and phylogenetic analysis allow the identification of conserved positions in protein and nucleic acid sequences. This can lead to an appreciation of the evolutionary history of a group of sequences.

* The clustal family of programs is commonly used to produce multiple sequence alignments. Other options are available as well:

1. [Muscle](http://www.ebi.ac.uk/Tools/muscle/index.html)
2. [T-Coffee](http://www.ebi.ac.uk/Tools/t-coffee/)

* There are 3 different ways to use the clustal programs.

1. Web-based clustalw can be used [HERE](http://www.ebi.ac.uk/Tools/clustalw2/index.html).
2. ClustalX GUI (and also the command-line clustalw executable) is available for download [HERE](ftp://ftp.ebi.ac.uk/pub/software/clustalw2/2.0.12/). Example data: [example data](http://luria.mit.edu/Bio2Bioinfo/clustal/)
3. Command-line clustalw2 is installed on rous.

* The three options differ in their interface and in the calculation of bootstrap values for the tree, which is only available in the command-line and GUI versions. Otherwise, the program runs the same way. In this lesson we will use the command-line version installed on rous to align the sequences.
* Log in to rous and copy the clustal training files to your home directory.

```
cp -r /net/n3/data/Teaching/IAP_2010_day3/clustal .
```

```
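NOTE: -r means "recursive": copy the folder and everything inside it to the new location "." (the current directory).
Because the clustal path has no trailing "/", the folder itself is copied, not just its contents (on some systems, such as BSD/macOS cp, a trailing "/" would copy only the contents).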
```
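To see what `cp -r` with a destination of `.` does without touching the shared course data, here is a small sandbox sketch. The directories `source/clustal` and `home` and the file `example.pep` are hypothetical stand-ins created inside a temporary directory, not the real course paths.

```shell
#!/usr/bin/env bash
# Sandbox demo of `cp -r SRC .` using throwaway directories from mktemp.
# "source/clustal", "home" and "example.pep" are hypothetical stand-ins.
set -euo pipefail

demo=$(mktemp -d)
mkdir -p "$demo/source/clustal" "$demo/home"
printf '>example\nACDEFGHIK\n' > "$demo/source/clustal/example.pep"

cd "$demo/home"                    # pretend this is your home directory
cp -r "$demo/source/clustal" .     # copy the folder itself into "."
find . -type f                     # lists ./clustal/example.pep
```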

* Enter the clustal directory:

```
cd clustal
```

* View the raw sequence files:

```
cat *.pep
```

* Launch the clustalw2 application:

```
clustalw2
```

* The following interactive menu appears; options 1 and 2 are used in this demonstration. More information is available [HERE](http://luria.mit.edu/bio2bioinfo/clustalw_help).

```
 **************************************************************
 ******** CLUSTAL 2.0.12 Multiple Sequence Alignments  ********
 **************************************************************


     1. Sequence Input From Disc
     2. Multiple Alignments
     3. Profile / Structure Alignments
     4. Phylogenetic trees

     S. Execute a system command
     H. HELP
     X. EXIT (leave program)


Your choice: 
```

* Select option 1 to load sequences and enter the file name "tiny.pep". Before you are returned to the main menu, you should see:

```
Sequences should all be in 1 file.

7 formats accepted: 
NBRF/PIR, EMBL/SwissProt, Pearson (Fasta), GDE, Clustal, GCG/MSF,                  RSF.


Enter the name of the sequence file : tiny.pep
Sequence format is Pearson
Sequences assumed to be PROTEIN


Sequence 1: zf_12a1_a     28 aa
Sequence 2: zf_12a1_b     28 aa
Sequence 3: hs_a11        28 aa
Sequence 4: hs_22a1       28 aa
Sequence 5: zf_a11        28 aa
```

* From the main menu, select option 2 (Multiple Alignments). This opens the Multiple Alignment menu:

```
****** MULTIPLE ALIGNMENT MENU ******


    1.  Do complete multiple alignment now Slow/Accurate
    2.  Produce guide tree file only
    3.  Do alignment using old guide tree file

    4.  Toggle Slow/Fast pairwise alignments = SLOW

    5.  Pairwise alignment parameters
    6.  Multiple alignment parameters

    7.  Reset gaps before alignment? = OFF
    8.  Toggle screen display          = ON
    9.  Output format options
    I. Iteration = NONE

    S.  Execute a system command
    H.  HELP
    or press [RETURN] to go back to main menu
```

* The default settings are fine except for the output format (option 9). Select option 9 to open the output format menu:

```
 ********* Format of Alignment Output *********


     F. Toggle FASTA format output       =  OFF

     1. Toggle CLUSTAL format output     =  ON
     2. Toggle NBRF/PIR format output    =  OFF
     3. Toggle GCG/MSF format output     =  OFF
     4. Toggle PHYLIP format output      =  OFF
     5. Toggle NEXUS format output       =  OFF
     6. Toggle GDE format output         =  OFF

     7. Toggle GDE output case           =  LOWER
     8. Toggle CLUSTALW sequence numbers =  OFF
     9. Toggle output order              =  ALIGNED

     0. Create alignment output file(s) now?

     T. Toggle parameter output          = OFF
     R. Toggle sequence range numbers =  OFF

     H. HELP
```

* Use option 4 to toggle PHYLIP output ON, then press Return to go back to the alignment menu.
* Select option 1 to run the alignment. Accept the defaults by pressing Return at each prompt until you are back at the main menu, then exit the program with "X".
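If you would rather skip the menus, the same run can be scripted with clustalw2's command-line flags. This is a sketch assuming the ClustalW 2.0 flag names `-INFILE`, `-ALIGN`, `-OUTPUT` and `-BOOTSTRAP` (check `clustalw2 -help` on rous); `run`, `align_both_formats` and `bootstrap_tree` are hypothetical helper names. A bootstrap step is included because bootstrap values are one of the things only the command-line and GUI versions can compute. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: the menu run above (align tiny.pep, write CLUSTAL and PHYLIP output)
# done with command-line flags. Assumes ClustalW 2.0 flag names; verify with
# `clustalw2 -help`. Helper names are hypothetical.
set -euo pipefail

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"      # dry run: show what would be executed
  else
    "$@"
  fi
}

align_both_formats() {
  local infile=${1:-tiny.pep}
  run clustalw2 -INFILE="$infile" -ALIGN                  # CLUSTAL output (the default format)
  run clustalw2 -INFILE="$infile" -ALIGN -OUTPUT=PHYLIP   # same alignment, PHYLIP output
}

bootstrap_tree() {
  # Bootstrapped neighbour-joining tree from an existing alignment.
  local alnfile=${1:-tiny.aln}
  run clustalw2 -INFILE="$alnfile" -BOOTSTRAP=1000
}

DRY_RUN=1 align_both_formats tiny.pep
```

Drop `DRY_RUN=1` to execute the commands for real; by default ClustalW names its output files after the input file (for example tiny.aln and tiny.phy).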

```
cat tiny.[ap][lh][ny]
```

```
NOTE: Square brackets are shell wildcard (glob) syntax, not regular expressions. For example, [ap] matches either an "a" or a "p" at that position, so tiny.[ap][lh][ny] matches both tiny.aln and tiny.phy.
```
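A quick way to check which files a pattern like this selects is to expand it against empty placeholder files in a scratch directory. The file names below are assumed to be the ones present after the run above (input, guide tree, CLUSTAL alignment and PHYLIP alignment).

```shell
#!/usr/bin/env bash
# Show which file names the glob tiny.[ap][lh][ny] selects, using empty
# placeholder files in a scratch directory.
set -euo pipefail

cd "$(mktemp -d)"
touch tiny.pep tiny.dnd tiny.aln tiny.phy   # assumed input, guide tree, CLUSTAL and PHYLIP files
printf '%s\n' tiny.[ap][lh][ny]             # the shell expands the pattern before printf runs
```

Only tiny.aln and tiny.phy match: tiny.pep fails at the second bracket ("e" is not "l" or "h") and tiny.dnd fails at the first.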

This is the CLUSTAL format output (tiny.aln):

```
CLUSTAL 2.0.12 multiple sequence alignment


zf_12a1_a       VADLVFLVDGSWSVGRENFRFIRSFIGA--
zf_12a1_b       KADLVFLIDGSWSIGDDSFAKVRQFVFS--
hs_22a1         HYDLVFLLDTSSSVGKEDFEKVRQWVAN--
hs_a11          YMDIVIVLDGSNSIYP--WVEVQHFLINIL
zf_a11          YMDIVIVLDGSNSIYP--WNEVQDFLINIL
                  *:*:::* * *:    :  :: ::  
```
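In the CLUSTAL output, the bottom line marks each column: `*` for residues identical in every sequence, `:` for strongly similar residues and `.` for weakly similar ones. The sketch below counts the `*`-type columns directly from the sequences, as a simple way to quantify conservation. `count_identical_columns` is a hypothetical helper; it joins multi-block files by sequence name and reads only the name and sequence fields of each line.

```shell
#!/usr/bin/env bash
# Count alignment columns in which every sequence has the same residue (no gaps),
# i.e. the columns marked "*" on the CLUSTAL conservation line.
# count_identical_columns is a hypothetical helper name.
set -euo pipefail

count_identical_columns() {
  awk '
    /^CLUSTAL/ || /^[ \t]/ || NF < 2 { next }    # skip header, conservation and blank lines
    {
      if (!($1 in seq)) order[++n] = $1          # remember the order of sequence names
      seq[$1] = seq[$1] $2                       # append this block (handles multi-block files)
    }
    END {
      total = length(seq[order[1]]); same = 0
      for (c = 1; c <= total; c++) {
        r = substr(seq[order[1]], c, 1)
        ok = (r != "-")
        for (i = 2; i <= n && ok; i++)
          if (substr(seq[order[i]], c, 1) != r) ok = 0
        same += ok
      }
      printf "%d of %d columns identical\n", same, total
    }
  ' "$1"
}

# Demo input: the CLUSTAL alignment shown above.
aln=$(mktemp)
cat > "$aln" <<'EOF'
CLUSTAL 2.0.12 multiple sequence alignment


zf_12a1_a       VADLVFLVDGSWSVGRENFRFIRSFIGA--
zf_12a1_b       KADLVFLIDGSWSIGDDSFAKVRQFVFS--
hs_22a1         HYDLVFLLDTSSSVGKEDFEKVRQWVAN--
hs_a11          YMDIVIVLDGSNSIYP--WVEVQHFLINIL
zf_a11          YMDIVIVLDGSNSIYP--WNEVQDFLINIL
                  *:*:::* * *:    :  :: ::  
EOF

count_identical_columns "$aln"
```

For this alignment it reports 5 of 30 columns, matching the five `*` symbols on the conservation line.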

This is the PHYLIP format output (tiny.phy):

```
     5     30
zf_12a1_a  VADLVFLVDG SWSVGRENFR FIRSFIGA-- 
zf_12a1_b  KADLVFLIDG SWSIGDDSFA KVRQFVFS-- 
hs_22a1    HYDLVFLLDT SSSVGKEDFE KVRQWVAN-- 
hs_a11     YMDIVIVLDG SNSIYP--WV EVQHFLINIL 
zf_a11     YMDIVIVLDG SNSIYP--WN EVQDFLINIL
```
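The first line of a PHYLIP file gives the number of sequences and the number of alignment columns (here 5 and 30), and the residues are written in chunks of 10. Below is a minimal consistency check, assuming a single-block file whose sequence names contain no spaces, as in tiny.phy; interleaved files whose later blocks omit the names are not handled. `check_phylip_header` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Check that a single-block PHYLIP file agrees with its header line
# (number of sequences, number of columns). Hypothetical helper name.
set -euo pipefail

check_phylip_header() {
  awk '
    NR == 1 { ntax = $1; nchar = $2; next }      # header, e.g. "5 30"
    NF == 0 { next }
    {
      s = ""
      for (i = 2; i <= NF; i++) s = s $i          # join the 10-residue chunks
      n++
      if (length(s) != nchar) {
        printf "%s: %d columns, header says %d\n", $1, length(s), nchar
        bad = 1
      }
    }
    END {
      if (n != ntax) { printf "%d sequences, header says %d\n", n, ntax; bad = 1 }
      if (!bad) printf "OK: %d sequences x %d columns\n", ntax, nchar
      exit bad
    }
  ' "$1"
}

# Demo input: the PHYLIP output shown above.
phy=$(mktemp)
cat > "$phy" <<'EOF'
     5     30
zf_12a1_a  VADLVFLVDG SWSVGRENFR FIRSFIGA--
zf_12a1_b  KADLVFLIDG SWSIGDDSFA KVRQFVFS--
hs_22a1    HYDLVFLLDT SSSVGKEDFE KVRQWVAN--
hs_a11     YMDIVIVLDG SNSIYP--WV EVQHFLINIL
zf_a11     YMDIVIVLDG SNSIYP--WN EVQDFLINIL
EOF

check_phylip_header "$phy"
```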

