Translating a eukaryotic FASTA file of CDS entries

This exercise follows the Biopython Cookbook example "From gene sequence to predicted protein with the GFF module". Starting from a reference DNA sequence and GlimmerHMM gene predictions in GFF3 format, the project script writes the predicted protein sequences in FASTA format.
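The idea behind the conversion can be sketched with the Python standard library alone. This is not the project script, which relies on Biopython and bcbio-gff; it is a simplified illustration, and every function name in it is hypothetical. It assumes uppercase DNA, CDS features that carry a Parent (or ID) attribute, the standard genetic code, and it ignores the GFF3 phase column.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unrecognised codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def cds_by_gene(gff_lines):
    """Group CDS coordinates by their Parent (or ID) attribute."""
    genes = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        seqid, ftype, start, end, strand, attrs = (
            cols[0], cols[2], cols[3], cols[4], cols[6], cols[8])
        if ftype != "CDS":
            continue
        attributes = dict(a.split("=", 1) for a in attrs.split(";") if "=" in a)
        gene_id = attributes.get("Parent", attributes.get("ID"))
        genes[gene_id].append((seqid, int(start), int(end), strand))
    return genes


def predicted_proteins(gff_lines, reference):
    """Splice each gene's CDS segments from the reference and translate them."""
    proteins = {}
    for gene_id, parts in cds_by_gene(gff_lines).items():
        parts.sort(key=lambda p: p[1])  # genomic order
        seqid, strand = parts[0][0], parts[0][3]
        # GFF3 coordinates are 1-based and inclusive.
        spliced = "".join(reference[seqid][s - 1:e] for _, s, e, _ in parts)
        if strand == "-":
            spliced = reverse_complement(spliced)
        proteins[gene_id] = translate(spliced)
    return proteins


# Toy input: two exons (3-8 and 15-20) separated by a short intron.
reference = {"chr1": "CC" + "ATGGCC" + "GTAAGT" + "AAATGA" + "GG"}
gff_lines = [
    "##gff-version 3",
    "chr1\tGlimmerHMM\tmRNA\t3\t20\t.\t+\t.\tID=gene1",
    "chr1\tGlimmerHMM\tCDS\t3\t8\t.\t+\t0\tID=gene1.cds1;Parent=gene1",
    "chr1\tGlimmerHMM\tCDS\t15\t20\t.\t+\t0\tID=gene1.cds2;Parent=gene1",
]
print(predicted_proteins(gff_lines, reference))  # {'gene1': 'MAK'}
```

Concatenating the segments in genomic order and reverse-complementing the whole string once handles minus-strand genes without reordering the exons by hand.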

The project script and example input files are available from: https://ki-data.mit.edu/bcc/teaching/Biopython

The login and password will be distributed during class.

** Install the GFF module: pip install bcbio-gff

** Download reference DNA sequence file: ref.fa

** Download GlimmerHMM GFF3 output file: glimmer.gff

** Download project script: glimmergff_to_proteins.py 

** Run glimmergff_to_proteins.py with the GFF3 file as the first argument and the reference DNA file as the second argument

** The output is a FASTA file of protein sequences whose name starts with the base name of the GFF file (the part before the extension) and ends in proteins.fa
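The argument order and output naming from the steps above can be sketched as follows. This is an illustration, not the project script: the helper names are hypothetical, and the hyphen joining the base name to proteins.fa is an assumption taken from the Biopython Cookbook version of the example.

```python
import os


def parse_args(argv):
    """Expect: script name, GFF3 file, reference FASTA file (in that order)."""
    if len(argv) != 3:
        raise SystemExit("usage: glimmergff_to_proteins.py <glimmer.gff> <ref.fa>")
    return argv[1], argv[2]


def output_path(gff_path):
    """Replace the GFF extension with '-proteins.fa' (separator is assumed)."""
    base, _ext = os.path.splitext(gff_path)
    return base + "-proteins.fa"


# Simulated command line; a real script would pass sys.argv instead.
gff_path, ref_path = parse_args(["glimmergff_to_proteins.py", "glimmer.gff", "ref.fa"])
print(gff_path, ref_path, output_path(gff_path))
# glimmer.gff ref.fa glimmer-proteins.fa
```

Under these assumptions, running the script on glimmer.gff and ref.fa would leave the result next to the GFF file as glimmer-proteins.fa.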

Massachusetts Institute of Technology