# Translating a eukaryotic FASTA file of CDS entries

This example follows the Biopython Cookbook entry ["From gene sequence to predicted protein with the GFF module"](https://biopython.org/wiki/Gene_predictions_to_protein_sequences), which goes from DNA sequence to predicted protein. Given a reference DNA sequence and [GlimmerHMM](http://ccb.jhu.edu/software/glimmerhmm/) gene predictions in [GFF3](https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md) format, the project script writes the predicted protein sequences in FASTA format.
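
To make the idea concrete, here is a minimal standard-library sketch of the kind of processing described above: collect the CDS intervals for each predicted transcript from GFF3 lines, splice them out of the reference, reverse-complement minus-strand genes, and translate with the standard genetic code. It is not the project script, which relies on Biopython and bcbio-gff. The function names (`cds_intervals`, `translate_gene`) are hypothetical, and the sketch assumes a single reference sequence, CDS features linked to a transcript through a `Parent` attribute, and a reading frame that starts at the first spliced base.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built from the
# conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cds_intervals(gff_lines):
    """Group CDS features by parent: {parent: (strand, [(start, end), ...])}."""
    genes = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only CDS rows contribute coding sequence
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Assumption: CDS rows point at their transcript via Parent=...
        parent = attrs.get("Parent", attrs.get("ID", "unknown"))
        _strand, intervals = genes.setdefault(parent, (cols[6], []))
        intervals.append((int(cols[3]), int(cols[4])))
    return genes


def translate_gene(ref_seq, strand, intervals):
    """Splice CDS intervals from ref_seq and translate them to protein."""
    # GFF3 coordinates are 1-based and inclusive.
    spliced = "".join(ref_seq[start - 1:end] for start, end in sorted(intervals))
    if strand == "-":
        spliced = reverse_complement(spliced)
    # Trailing bases that do not fill a codon are ignored;
    # codons outside the table (for example with N) become X.
    codons = (spliced[i:i + 3].upper() for i in range(0, len(spliced) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Toy data: two CDS segments (3-8 and 13-18) on the plus strand.
toy_ref = "CCATGGCCTTTTAAATAGGG"
toy_gff = [
    "##gff-version 3\n",
    "chr1\tGlimmerHMM\tmRNA\t3\t18\t.\t+\t.\tID=g1.mrna1\n",
    "chr1\tGlimmerHMM\tCDS\t3\t8\t.\t+\t0\tID=g1.cds1;Parent=g1.mrna1\n",
    "chr1\tGlimmerHMM\tCDS\t13\t18\t.\t+\t0\tID=g1.cds2;Parent=g1.mrna1\n",
]
for name, (strand, intervals) in cds_intervals(toy_gff).items():
    print(f">{name}\n{translate_gene(toy_ref, strand, intervals)}")
```

The toy reference and GFF3 lines are invented for illustration. With real data, Biopython's `Seq.translate()` and the bcbio-gff parser handle details this sketch skips, such as multiple reference sequences.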

The project script and example input files are available from: <https://ki-data.mit.edu/bcc/teaching/Biopython>

The login and password will be distributed during class.

```
** Install the GFF module: pip install bcbio-gff

** Download the reference DNA sequence file: ref.fa

** Download the GlimmerHMM GFF3 output file: glimmer.gff

** Download the project script: glimmergff_to_proteins.py

** Run glimmergff_to_proteins.py with the GFF3 file as the first argument and the reference DNA file as the second argument

** The output is a FASTA file of protein sequences, named with the same prefix as the GFF file and ending in proteins.fa
```
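
The argument order and output naming in the steps above can be sketched as follows. This is an illustration, not an excerpt from `glimmergff_to_proteins.py`: the helper names `parse_arguments` and `proteins_path` are hypothetical, and the `-` separator between the GFF base name and `proteins.fa` is an assumption.

```python
import os


def parse_arguments(argv):
    """Return (gff_path, ref_path) from a command line such as sys.argv."""
    if len(argv) != 3:
        raise SystemExit("usage: glimmergff_to_proteins.py <glimmer.gff> <ref.fa>")
    # First argument: GFF3 predictions; second argument: reference DNA.
    return argv[1], argv[2]


def proteins_path(gff_path, ending="proteins.fa", separator="-"):
    """Build the output name from the GFF file's base name.

    Assumption: the base name and the ending are joined with "-".
    """
    base, _ext = os.path.splitext(gff_path)
    return f"{base}{separator}{ending}"


gff_path, ref_path = parse_arguments(
    ["glimmergff_to_proteins.py", "glimmer.gff", "ref.fa"]
)
print(proteins_path(gff_path))
```

Under these assumptions, running the script on `glimmer.gff` and `ref.fa` would write its proteins next to the GFF file, in a file whose name starts with `glimmer`.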
