Coronavirus Exploration

COVID-19 still affects us. Let's see how Biopython can help us explore and understand coronaviruses.

  • 1. First, we search PubMed for papers to find recent coronavirus findings. This gives us a good idea of what is already known, and what is still unknown, about coronaviruses.

Search PubMed for coronavirus information:
In [1]: import Bio
In [2]: Bio.__version__
In [3]: from Bio import Entrez
In [4]: Entrez.email = "your.name@example.com"  # placeholder; NCBI asks for a contact address
In [5]: handle = Entrez.esearch(db="pubmed", retmax=10, term="coronavirus")
In [6]: record = Entrez.read(handle)
In [7]: record["IdList"]
Out[7]: ['32526774', '32526763', '32526759', '32526746', '32526627',
'32526560', '32526559', '32526545', '32526541', '32526530']
  • 2. Next, we look at a specific paper. We use PubMed ID '32526774' as an example.

In [1]: import Bio
In [2]: Bio.__version__
In [3]: from Bio import Entrez
In [4]: Entrez.email = "your.name@example.com"  # placeholder; NCBI asks for a contact address
In [5]: handle = Entrez.efetch(db='pubmed', id='32526774')
In [6]: print(handle.read())
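The raw record can be hard to read. As a hedged sketch, one option is to request the MEDLINE text format and parse it with `Bio.Medline`. The helper names `fetch_medline` and `summarize`, the placeholder e-mail address, and the made-up sample record are assumptions for illustration only.

```python
import io

from Bio import Entrez, Medline

Entrez.email = "your.name@example.com"  # placeholder address (assumption)


def fetch_medline(pmid):
    """Return a text handle with the MEDLINE-formatted record for one PubMed ID."""
    return Entrez.efetch(db="pubmed", id=pmid, rettype="medline", retmode="text")


def summarize(handle):
    """Parse one MEDLINE record and pull out a few commonly used fields."""
    record = Medline.read(handle)
    return {
        "pmid": record.get("PMID", ""),
        "title": record.get("TI", ""),
        "authors": record.get("AU", []),
        "abstract": record.get("AB", ""),
    }


# Offline demonstration with a made-up record (not a real paper).
sample = io.StringIO(
    "PMID- 00000000\n"
    "TI  - A made-up title used only for illustration.\n"
    "AU  - Doe J\n"
    "AU  - Roe R\n"
    "AB  - A made-up abstract.\n"
)
info = summarize(sample)
print(info["title"])
```

With network access, `summarize(fetch_medline("32526774"))` should return the same fields for the example paper.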
  • 3. We then search the nucleotide database to begin a bioinformatics study.

In [1]: import Bio
In [2]: Bio.__version__
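In [3]: from Bio import Entrez, SeqIO
In [4]: Entrez.email = "your.name@example.com"  # placeholder; NCBI asks for a contact address
In [5]: handle = Entrez.esearch(db="nucleotide", retmax=10, term="coronavirus")
In [6]: record = Entrez.read(handle)
In [7]: record["IdList"]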
  • 4. We retrieve individual coronavirus genome sequences. Here, we use '1850952228' as an example.
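A minimal sketch of this retrieval step, assuming the record is requested in FASTA format and parsed with `SeqIO.read`. The helper names `fetch_genome` and `describe`, the placeholder e-mail address, and the toy FASTA record are illustrative assumptions, not part of the original walkthrough.

```python
import io

from Bio import Entrez, SeqIO

Entrez.email = "your.name@example.com"  # placeholder address (assumption)


def fetch_genome(nucleotide_id):
    """Download one nucleotide record as FASTA and parse it into a SeqRecord."""
    handle = Entrez.efetch(
        db="nucleotide", id=nucleotide_id, rettype="fasta", retmode="text"
    )
    try:
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()


def describe(record):
    """Return the record ID, sequence length, and GC fraction."""
    seq = record.seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return record.id, len(seq), gc


# Offline demonstration with a toy FASTA record (not a real genome).
toy = SeqIO.read(io.StringIO(">toy_record example\nATGCGCTA\n"), "fasta")
print(describe(toy))
```

With network access, `describe(fetch_genome("1850952228"))` should report the ID, length, and GC fraction of the example genome.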

  • 5. We identify open reading frames in the genomic sequence. Here, we again use '1850952228' as an example.

The code is from "Cookbook: Translating a FASTA file of CDS entries".
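A minimal sketch of this step, written for illustration rather than copied from the cookbook: translate all six reading frames and keep the long stretches between stop codons. The function name `six_frame_proteins`, the standard genetic code (table 1), and the length threshold of 100 amino acids are assumptions. The stretches run from stop to stop, so they do not necessarily begin with a start codon.

```python
from Bio.Seq import Seq


def six_frame_proteins(seq, table=1, min_protein_length=100):
    """Yield (strand, frame, protein) for stop-to-stop stretches in all six frames."""
    for strand, nuc in [(+1, seq), (-1, seq.reverse_complement())]:
        for frame in range(3):
            # Trim so the translated slice is a whole number of codons.
            length = 3 * ((len(nuc) - frame) // 3)
            translated = nuc[frame:frame + length].translate(table=table)
            for protein in translated.split("*"):
                if len(protein) >= min_protein_length:
                    yield strand, frame, protein


# Toy sequence, not a real genome: ATG GCC AAA TAA reads as "MAK" then a stop.
toy = Seq("ATGGCCAAATAA")
for strand, frame, protein in six_frame_proteins(toy, min_protein_length=3):
    print(strand, frame, protein)
```

For a real genome, pass the `seq` attribute of the record retrieved in step 4 and keep a larger threshold to filter out short spurious stretches.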

  • 6. We build a list of the candidate proteins and keep track of where they are located in the genome. Here, we use '1850952228' as an example.

The code is from "Cookbook 20.1.13 Identifying open reading frames".
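As a hedged sketch of the idea (a simplified variant, not the cookbook's exact listing), the function below records each candidate protein together with its position. The name `find_orfs_with_positions`, translation table 1, and the default threshold are assumptions. Coordinates are 0-based, end-exclusive, given on the forward strand, and in this version they exclude the stop codon.

```python
from Bio.Seq import Seq


def find_orfs_with_positions(seq, table=1, min_protein_length=100):
    """Return sorted (start, end, strand, protein) tuples in forward-strand coordinates."""
    orfs = []
    seq_len = len(seq)
    for strand, nuc in [(+1, seq), (-1, seq.reverse_complement())]:
        for frame in range(3):
            length = 3 * ((seq_len - frame) // 3)
            translated = str(nuc[frame:frame + length].translate(table=table))
            aa_start = 0
            for protein in translated.split("*"):
                aa_end = aa_start + len(protein)
                if len(protein) >= min_protein_length:
                    # Offsets on the strand being read (0-based, end-exclusive).
                    nt_start = frame + 3 * aa_start
                    nt_end = frame + 3 * aa_end
                    if strand == -1:
                        # Map back onto forward-strand coordinates.
                        nt_start, nt_end = seq_len - nt_end, seq_len - nt_start
                    orfs.append((nt_start, nt_end, strand, protein))
                aa_start = aa_end + 1  # skip the stop symbol
    return sorted(orfs)


# Toy sequence, not a real genome.
toy = Seq("ATGGCCAAATAA")
for start, end, strand, protein in find_orfs_with_positions(toy, min_protein_length=3):
    print(start, end, strand, protein)
```

Each tuple can be checked by slicing the genome with `start` and `end`, taking the reverse complement when `strand` is -1, and translating the result.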

Now we have coronavirus protein sequences. There is much more we can do with them, and I leave it to you to do great work with Biopython.
