Coronavirus Exploration
COVID-19 still affects us. Let's see how Biopython can help us explore and understand coronaviruses.
1. We search PubMed for papers to get the most recent coronavirus findings. This gives us a good idea of what is already known about coronaviruses and what is not.
Search PubMed for coronavirus information:
In [1]: import Bio
In [2]: Bio.__version__
In [3]: from Bio import Entrez
In [4]: Entrez.email = "your.name@example.com"  # placeholder; NCBI asks for a contact address
In [5]: handle = Entrez.esearch(db="pubmed", retmax=10, term="coronavirus")
In [6]: record = Entrez.read(handle)
In [7]: record["IdList"]
Out[7]: ['32526774', '32526763', '32526759', '32526746', '32526627',
'32526560', '32526559', '32526545', '32526541', '32526530']
2. We look at a specific record. We will use the PubMed ID '32526774' as an example.
In [1]: import Bio
In [2]: Bio.__version__
In [3]: from Bio import Entrez
In [4]: Entrez.email = "your.name@example.com"  # placeholder; NCBI asks for a contact address
In [5]: handle = Entrez.efetch(db="pubmed", id="32526774")
In [6]: print(handle.read())
3. We mine the nucleotide database to start a bioinformatics study.
In [1]: import Bio
In [2]: Bio.__version__
In [3]: from Bio import Entrez, SeqIO
In [4]: Entrez.email = "your.name@example.com"  # placeholder; NCBI asks for a contact address
In [5]: handle = Entrez.esearch(db="nucleotide", retmax=10, term="coronavirus")
In [6]: record = Entrez.read(handle)
In [7]: record["IdList"]
4. We retrieve individual coronavirus genome sequences. Here, we use '1850952228' as an example.
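A minimal sketch of the retrieval step, assuming FASTA is the format you want. The helper names `fetch_fasta_text` and `parse_fasta_text` and the e-mail address are placeholders chosen for illustration. The download function is only defined, not called, so the block runs offline on a made-up record.

```python
from io import StringIO

from Bio import Entrez, SeqIO


def fetch_fasta_text(seq_id, email):
    """Download one nucleotide record as FASTA text (needs network access)."""
    Entrez.email = email  # NCBI asks Entrez users to identify themselves
    handle = Entrez.efetch(db="nucleotide", id=seq_id,
                           rettype="fasta", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


def parse_fasta_text(fasta_text):
    """Turn FASTA text holding exactly one record into a SeqRecord."""
    return SeqIO.read(StringIO(fasta_text), "fasta")


# Offline demonstration with a made-up record (not real coronavirus data).
example_text = ">demo placeholder\nATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG\n"
record = parse_fasta_text(example_text)
print(record.id, len(record.seq))
```

With network access, `parse_fasta_text(fetch_fasta_text("1850952228", "your.name@example.com"))` would be the corresponding call for the example ID used here.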
5. We identify open reading frames in the genomic sequence. Here, we use '1850952228' as an example.
The code is from the Biopython Cookbook section "Translating a FASTA file of CDS entries".
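A minimal sketch of a six-frame scan, loosely modeled on the Biopython Cookbook approach. It assumes the standard genetic code (table 1) and treats every stop-free stretch as a candidate, without requiring a start codon. The function name `six_frame_orfs`, the length threshold, and the demo sequence are illustrative choices, not values from the original analysis.

```python
from Bio.Seq import Seq


def six_frame_orfs(seq, table=1, min_pro_len=100):
    """Yield (strand, frame, protein) for stop-free stretches in six frames."""
    for strand, nuc in [(+1, seq), (-1, seq.reverse_complement())]:
        for frame in range(3):
            # Trim to a whole number of codons so translate() sees no partial codon.
            length = 3 * ((len(nuc) - frame) // 3)
            protein_text = str(nuc[frame:frame + length].translate(table=table))
            for pro in protein_text.split("*"):
                if len(pro) >= min_pro_len:
                    yield strand, frame, pro


# Made-up demo sequence; for a real genome use the Seq of the fetched record.
demo = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
orfs = list(six_frame_orfs(demo, table=1, min_pro_len=5))
for strand, frame, pro in orfs:
    print("strand %+d, frame %d, length %d: %s" % (strand, frame, len(pro), pro))
```

For a full genome, a larger `min_pro_len` (the Cookbook example uses 100) keeps the list of candidates manageable.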
6. We build a list of the candidate proteins and keep track of where they are located in the genome. Here, we use '1850952228' as an example.
The code is from the Biopython Cookbook section 20.1.13, "Identifying open reading frames".
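A minimal sketch of location tracking, written in the spirit of the Cookbook's ORF example rather than copied from it. It assumes the standard genetic code (table 1) and reports coordinates as 0-based, end-exclusive positions on the forward strand, with the stop codon included when one is present. The name `find_orfs_with_locations` and the demo sequence are illustrative.

```python
from Bio.Seq import Seq


def find_orfs_with_locations(seq, table=1, min_pro_len=100):
    """Return sorted (start, end, strand, protein) tuples for candidate ORFs."""
    answer = []
    seq_len = len(seq)
    for strand, nuc in [(+1, seq), (-1, seq.reverse_complement())]:
        for frame in range(3):
            length = 3 * ((seq_len - frame) // 3)  # whole codons only
            trans = str(nuc[frame:frame + length].translate(table=table))
            aa_start = 0
            while aa_start < len(trans):
                aa_end = trans.find("*", aa_start)
                if aa_end == -1:
                    aa_end = len(trans)  # no further stop codon in this frame
                if aa_end - aa_start >= min_pro_len:
                    if strand == 1:
                        start = frame + aa_start * 3
                        end = min(seq_len, frame + aa_end * 3 + 3)
                    else:
                        # Map reverse-strand positions back to forward coordinates.
                        start = max(0, seq_len - frame - aa_end * 3 - 3)
                        end = seq_len - frame - aa_start * 3
                    answer.append((start, end, strand, trans[aa_start:aa_end]))
                aa_start = aa_end + 1
    answer.sort()
    return answer


# Made-up demo sequence; for a real genome use the Seq of the fetched record.
demo = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
orf_list = find_orfs_with_locations(demo, table=1, min_pro_len=5)
for start, end, strand, pro in orf_list:
    print("%s - length %i, strand %i, %i:%i" % (pro, len(pro), strand, start, end))
```

Sorting by start position makes it easier to compare the candidates against annotated genes in the same record.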
Now we have coronavirus protein sequences. There is much more that we can do with them, and I will leave it to you to do great work with Biopython.
