Translating a eukaryotic FASTA file of CDS entries

From DNA sequence to predicted protein is an example in the Biopython Cookbook, "From gene sequence to predicted protein with the GFF module". Starting from a DNA sequence and GlimmerHMM output in GFF3 format, the project script writes the predicted protein sequences in FASTA format.
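The idea behind this conversion can be sketched with the standard library alone. This is only an illustration, not the project script, which relies on Biopython and bcbio-gff. It assumes that each CDS row in the GFF3 file carries a Parent attribute naming its gene, that the standard genetic code applies, and that every gene is translated in frame 0. All function names here are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines):
    """Return {sequence ID: upper-case sequence} from FASTA-formatted lines."""
    chunks, current = {}, None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            chunks[current] = []
        elif line and current is not None:
            chunks[current].append(line.upper())
    return {name: "".join(parts) for name, parts in chunks.items()}


def read_cds(gff_lines):
    """Group CDS rows by parent: {gene ID: (seqid, strand, [(start, end), ...])}."""
    genes = {}
    for raw in gff_lines:
        if raw.startswith("#") or not raw.strip():
            continue  # skip directives, comments and blank lines
        cols = raw.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # Assumption: fall back to the row's own ID when no Parent is given.
        gene_id = attrs.get("Parent", attrs.get("ID"))
        _, _, exons = genes.setdefault(gene_id, (cols[0], cols[6], []))
        exons.append((int(cols[3]), int(cols[4])))
    return genes


def translate(cds):
    """Translate complete codons in frame 0; unrecognised codons become 'X'."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def predicted_proteins(gff_lines, fasta_lines):
    """Return {gene ID: protein sequence}, with stop codons shown as '*'."""
    reference = read_fasta(fasta_lines)
    proteins = {}
    for gene_id, (seqid, strand, exons) in read_cds(gff_lines).items():
        # GFF3 coordinates are 1-based and inclusive, hence start - 1.
        spliced = "".join(reference[seqid][s - 1:e] for s, e in sorted(exons))
        if strand == "-":
            spliced = spliced.translate(COMPLEMENT)[::-1]  # reverse complement
        proteins[gene_id] = translate(spliced)
    return proteins
```

Both functions take any iterable of lines, so open file handles for the GFF3 and reference FASTA files can be passed in directly. The project script performs the same splice, reverse-complement and translate steps through Biopython sequence objects instead.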

The project script and example input files are available from: https://ki-data.mit.edu/bcc/teaching/Biopython

The login and password will be distributed during class.

** Install the GFF module: pip install bcbio-gff

** Download the reference DNA sequence file: ref.fa

** Download the GlimmerHMM GFF3 output file: glimmer.gff

** Download the project script: glimmergff_to_proteins.py

** Run glimmergff_to_proteins.py with the GFF3 file as the first argument and the reference DNA file as the second argument

** The output is a FASTA file of protein sequences, named with the same base name as the GFF file and ending in proteins.fa
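Once the script has run, the protein FASTA file can be given a quick sanity check. The sketch below is a minimal standard-library example, not part of the project script. It assumes stop codons appear as "*" in the output, which is Biopython's default when translating. The function name summarize_proteins is hypothetical.

```python
def summarize_proteins(lines):
    """Return (record ID, length, internal stop count) for each FASTA record."""
    records, current = {}, None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)

    summary = []
    for rec_id, chunks in records.items():
        seq = "".join(chunks)
        # A '*' anywhere before the last residue marks an internal stop codon.
        internal_stops = seq[:-1].count("*")
        summary.append((rec_id, len(seq), internal_stops))
    return summary
```

To use it, open the protein FASTA file produced by the run and pass the file handle to summarize_proteins. Records with a nonzero internal stop count may point to a frame or strand problem in the gene prediction.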
