Before using Biopython to access the NCBI's online resources (via Bio.Entrez or some of the other modules), please read the NCBI's Entrez User Requirements. If the NCBI finds you are abusing their systems, they can and will ban your access!
To paraphrase:
- For any series of more than 100 requests, do this at weekends or outside USA peak times. This is up to you to obey.
- Use the http://eutils.ncbi.nlm.nih.gov address, not the standard NCBI Web address. Biopython uses this web address.
- Make no more than 10 queries per second if you are using an API key, otherwise at most 3 queries per second (relaxed from at most one request every three seconds in early 2009). This is automatically enforced by Biopython.
- Use the optional email parameter so the NCBI can contact you if there is a problem. You can either set this explicitly as a parameter with each call to Entrez (e.g. include email="A.N.Other@example.com" in the argument list), or you can set a global email address as follows:
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"
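Biopython already enforces the request rate for you, so nothing like the following is needed when you go through Bio.Entrez. As an illustration of what the limit means in practice, here is a minimal sketch of a throttle that spaces calls evenly. The Throttle class and its clock/sleep parameters are hypothetical names invented for this example, not part of Biopython.

```python
import time


class Throttle:
    """Space calls at least 1 / max_per_second seconds apart (illustrative only)."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock    # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last_call = None  # time of the previous call, if any

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


# Limits quoted in the text: 3 requests per second without an API key,
# 10 per second with one.
throttle = Throttle(max_per_second=3)
```

Calling throttle.wait() before each request would keep a hand-rolled client within the stated limit; with Bio.Entrez you can rely on the built-in enforcement instead.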
Which databases do I have access to?
In [1]: import Bio
In [2]: from Bio import Entrez
In [3]: Entrez.email="duan@mit.edu"
In [4]: handle=Entrez.einfo()
In [5]: record=Entrez.read(handle)
In [6]: record["DbList"]
Out[6]: ['pubmed', 'protein', 'nuccore', 'nucleotide', 'nucgss', 'nucest', 'structure',
'genome', 'gpipe', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo',
'books', 'cdd', 'clinvar', 'clone', 'gap', 'gapplus', 'grasp', 'dbvar', 'epigenomics',
'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog',
'omim', 'orgtrack', 'pmc', 'popset', 'probe', 'proteinclusters', 'pcassay',
'biosystems', 'pccompound', 'pcsubstance', 'pubmedhealth', 'seqannot', 'snp', 'sra',
'taxonomy', 'unigene', 'gencoll', 'gtr']
What if I want info about a database?
In [1]: import Bio
In [2]: from Bio import Entrez
In [3]: handle=Entrez.einfo(db="pubmed")
In [4]: record=Entrez.read(handle)
In [5]: record["DbInfo"]["Description"]
Out[5]: 'PubMed bibliographic record'
In [6]: record["DbInfo"]["Count"]
Out[6]: '36234233'
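Note that the count in the session above comes back as a string ('36234233'), not a number. A small sketch of converting it before doing arithmetic or comparisons follows; db_summary is a hypothetical helper written for this example, and the dict is built by hand from the values shown above rather than fetched live.

```python
def db_summary(db_info):
    """Return (description, record_count) with the count converted to an int."""
    return db_info["Description"], int(db_info["Count"])


# Values copied from the session above; with a live result you would pass
# record["DbInfo"] instead of this hand-built dict.
description, count = db_summary(
    {"Description": "PubMed bibliographic record", "Count": "36234233"}
)
print(f"{description}: {count:,} records")
```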
How do I search for a given term?
Example 1:
In [1]: import Bio
In [2]: from Bio import Entrez
In [3]: handle=Entrez.esearch(db="pubmed",term="biopython")
In [4]: record=Entrez.read(handle)
In [5]: record["IdList"]
Out[5]: ['29641230', '28011774', '24929426', '24497503', '24267035', '24194598', '23842806', '23157543',
'22909249', '22399473', '21666252', '21210977', '20015970', '19811691', '19773334', '19304878',
'18606172', '21585724', '16403221', '16377612']
Example 2:
In [1]: import Bio
In [2]: from Bio import Entrez
In [3]: handle = Entrez.esearch(db="nucleotide", retmax=10, term="human[ORGN] tp53", idtype="acc")
In [4]: record=Entrez.read(handle)
In [5]: record["Count"]
Out[5]: '4253'
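In both examples the number of IDs returned is smaller than the total number of hits: Example 1 returns 20 IDs, and Example 2 asks for retmax=10 out of 4253 matches. To collect every ID you can page through the results with the retstart and retmax parameters of esearch. The sketch below shows one way to do that. The names page_offsets and fetch_all_ids are hypothetical, and the search argument is assumed to be any callable that returns a dict with "Count" and "IdList" keys, like a parsed esearch record.

```python
def page_offsets(total, page_size):
    """Yield the retstart value for each page needed to cover `total` hits."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    yield from range(0, total, page_size)


def fetch_all_ids(search, term, db, page_size=1000):
    """Collect every ID for `term` by calling `search` once per page.

    `search` takes db, term, retstart and retmax as keyword arguments and
    returns a dict with "Count" and "IdList" (an assumption of this sketch).
    """
    first = search(db=db, term=term, retstart=0, retmax=page_size)
    ids = list(first["IdList"])
    for start in page_offsets(int(first["Count"]), page_size):
        if start == 0:
            continue  # the first page was fetched above
        page = search(db=db, term=term, retstart=start, retmax=page_size)
        ids.extend(page["IdList"])
    return ids


# With Biopython, `search` could be a thin wrapper (not run here, needs network):
#
# def entrez_search(**kwargs):
#     handle = Entrez.esearch(**kwargs)
#     try:
#         return Entrez.read(handle)
#     finally:
#         handle.close()

# The 4253 hits from Example 2 would need five pages of 1000:
print(list(page_offsets(4253, 1000)))
```

Keeping the search function as a parameter makes the paging logic easy to check without touching the network, and keeps the request count visible, which matters given the usage guidelines at the top of this section.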
How do I retrieve a specific record?
Example 1: retrieve a previously identified Biopython article (id=29641230) from PubMed
In [1]: import Bio
In [2]: from Bio import Entrez
In [3]: handle=Entrez.efetch(db='pubmed',id='29641230')
In [4]: print(handle.read())
Example 2: retrieve gene information from GenBank
In [1]: import Bio
In [2]: from Bio import Entrez,SeqIO
In [3]: handle=Entrez.efetch(db='nucleotide',id='AF307851',rettype='gb',retmode='text')
In [4]: record=SeqIO.read(handle,'genbank')
In [5]: handle.close()
In [6]: print(record)
ID: AF307851.1
Name: AF307851
Description: Homo sapiens p53 protein mRNA, complete cds
Number of features: 2
/taxonomy=['Eukaryota', 'Metazoa', 'Chordata', 'Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia', 'Eutheria', 'Euarchontoglires', 'Primates', 'Haplorrhini', 'Catarrhini', 'Hominidae', 'Homo']
/keywords=['']
/data_file_division=PRI
/organism=Homo sapiens
/sequence_version=1
/molecule_type=mRNA
/source=Homo sapiens (human)
/topology=linear
/date=29-JAN-2001
/references=[Reference(title='Hyaluronidase induction of a WW domain-containing oxidoreductase that enhances tumor necrosis factor cytotoxicity', ...), Reference(title='Direct Submission', ...)]
/accessions=['AF307851']
Seq('GGCACGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGGGGACACTTTG...AAA', IUPACAmbiguousDNA())
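Both efetch examples above request a single ID. The E-utilities also accept several IDs in one request as a comma-separated string, which keeps the number of requests down. Here is a minimal sketch of splitting a list of IDs into such strings; id_batches is a hypothetical helper, and the batch size of 2 is chosen only to keep the output short.

```python
def id_batches(ids, batch_size):
    """Yield comma-separated ID strings, each holding at most batch_size IDs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield ",".join(ids[start:start + batch_size])


# First five PubMed IDs from the esearch output earlier in this section.
pubmed_ids = ["29641230", "28011774", "24929426", "24497503", "24267035"]
for batch in id_batches(pubmed_ids, batch_size=2):
    print(batch)
```

Each string produced this way could be passed as the id argument of Entrez.efetch in place of the single ID used above.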