Reading and Writing Files

  • We open files with the built-in open function. Its second argument sets the mode: 'r' for reading (the default), 'w' for writing, or 'a' for appending.

  • All the test files for the course are available as an archive: https://ki-data.mit.edu/bcc/teaching/IntroToPython.tgz

  • If you are on our cluster, you can copy them all to the current directory by typing:

cp /net/bmc-pub15/data/bmc/public/BCC/external/teaching/IntroToPython/* ./

Examples

  • Reading file seq.txt

In [1]: fin=open('seq.txt')

In [2]: fin=open('/net/rowley/ifs/data/bcc/dropbox/teaching_python/seq.txt')

In [3]: fin=open('seq.txt','r')
  • Writing to file seq2.txt

In [1]: Aa="GLECDGRTNLCCRQQFF"
In [2]: fo=open('seq2.txt','w')
In [3]: fo.write(Aa)
In [4]: fo.close()
In [5]: less seq2.txt
GLECDGRTNLCCRQQFF

* There is no need to remember to close the file handle when using a "with" statement; the file is closed automatically when the block ends
In [1]: with open('seq2.txt','w') as fo:
   ...:     fo.write("ABC")
   ...:     
In [2]: less seq2.txt
ABC

Note: opening an existing file with 'w' deletes its content before anything is written
  • Appending to file seq2.txt

In [1]: with open('seq2.txt','a') as fo:
   ...:     fo.write("CDE")
   ...:     
In [2]: less seq2.txt
ABCCDE
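
The write and append examples above can be replayed in one short script. This is a minimal sketch, assuming a throwaway temporary directory so that no course file is touched; the scratch file path is hypothetical.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched (assumption).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "seq2.txt")  # hypothetical scratch copy of seq2.txt

with open(path, "w") as fo:   # 'w' creates the file, or empties an existing one
    fo.write("GLECDGRTNLCCRQQFF")

with open(path, "w") as fo:   # opening with 'w' again discards the old content
    fo.write("ABC")

with open(path, "a") as fo:   # 'a' keeps the content and adds to the end
    fo.write("CDE")

with open(path) as fin:       # the default mode is 'r'
    content = fin.read()

print(content)      # ABCCDE
print(fin.closed)   # True: the with block closed the file
```

Reading the file back in the same script makes the difference between 'w' and 'a' visible without leaving Python.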
  • Reading files with read() and readlines()

The read and readlines methods both load the entire contents of a file for further processing.
The difference is that read returns the content as a single string,
while readlines returns it as a list of lines, each still ending with its newline character

In [1]: seq=open('seq.txt','r').read()

In [2]: seq
Out[2]: 'ACTGATG\nACTGGTCA\nATGATG\nTCGAAGCT\nGCAGGCG\nGATCCTAG\nCATGTCGT\nCTCTATCTC\n'

In [3]: type(seq)
Out[3]: str


In [1]: seq=open('seq.txt','r').readlines()

In [2]: seq
Out[2]: 
['ACTGATG\n',
 'ACTGGTCA\n',
 'ATGATG\n',
 'TCGAAGCT\n',
 'GCAGGCG\n',
 'GATCCTAG\n',
 'CATGTCGT\n',
 'CTCTATCTC\n']

In [3]: type(seq)
Out[3]: list
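
The two methods can be compared side by side, together with a third common pattern: looping over the file object directly. This is a small sketch, assuming a hypothetical three-line stand-in for seq.txt written to a temporary directory.

```python
import os
import tempfile

# Hypothetical three-line stand-in for seq.txt (assumption).
path = os.path.join(tempfile.mkdtemp(), "seq_demo.txt")
with open(path, "w") as fo:
    fo.write("ACTGATG\nACTGGTCA\nATGATG\n")

with open(path) as fin:
    text = fin.read()            # one string, newlines included

with open(path) as fin:
    lines = fin.readlines()      # list of strings, each ending in "\n"

with open(path) as fin:
    # A file object can be looped over line by line;
    # strip() removes the trailing newline from each line.
    stripped = [line.strip() for line in fin]

print(repr(text))
print(lines)
print(stripped)
```

Looping over the file object avoids holding the raw list from readlines in memory, which may matter for large sequence files.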
  • We can read in a file using our Python script, process it, and output the results to an output file

    • Let's read in file seq.txt

    • find the palindrome sequences using our python script

    • Then output the palindrome sequences to file palindrome.txt

example1: select palindrome sequences

write palindrome2.py using a text editor:

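# Read all sequences from the input file
with open('seq.txt','r') as fin:
     manyseqs=fin.readlines()

# Open the output file once; 'w' starts from an empty file on every run
with open('palindrome.txt','w') as fo:
     for seq in manyseqs:
          s=seq.strip()
          # A palindrome reads the same forwards and backwards
          if s==s[::-1]:
               fo.write(s)
               fo.write("\n")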



In Unix:
python palindrome2.py 
less palindrome.txt 
ACTGGTCA
TCGAAGCT
GATCCTAG
CTCTATCTC
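
The same read, process, write pattern can also be wrapped in functions and run in a single pass, with both files open in one with statement. This is a sketch rather than the course script: the helper names is_palindrome and write_palindromes and the demo file names are hypothetical.

```python
import os
import tempfile

def is_palindrome(s):
    """Return True if s reads the same forwards and backwards."""
    return s == s[::-1]

def write_palindromes(in_path, out_path):
    """Copy palindromic lines from in_path to out_path; return how many were written."""
    count = 0
    with open(in_path) as fin, open(out_path, "w") as fo:
        for line in fin:
            s = line.strip()
            if s and is_palindrome(s):   # skip blank lines
                fo.write(s + "\n")
                count += 1
    return count

# Demo on a hypothetical four-line input in a temporary directory.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "seq_demo.txt")
out_path = os.path.join(workdir, "palindrome_demo.txt")
with open(in_path, "w") as fo:
    fo.write("ACTGATG\nACTGGTCA\nATGATG\nTCGAAGCT\n")

n = write_palindromes(in_path, out_path)
with open(out_path) as fin:
    result = fin.read()

print(n)
print(result)
```

Opening the output file once with 'w' means that running the function twice gives the same output file instead of appending duplicate lines.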
  • Let's do an exercise by writing a Python script to say hello to the class

    • First read in file class_list as a list

    • Then output our greetings to file greetings

example2: Say Hi to our class

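write hello_class.py using a text editor:

# Read the class list, one student per line
with open('class_list.txt','r') as fin:
        classlist=fin.readlines()

# Open the output file once; each line from readlines still ends with "\n"
with open('greetings','w') as fo:
        for student in classlist:
                fo.write("Hello,")
                fo.write(student)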


In Unix:
python hello_class.py
less greetings
Hello,Manijeh
Hello,Shawn
Hello,Giorgio
Hello,Shuyu
Hello,Britt
Hello,Tu
Hello,Benjamin
Hello,Priyanka
Hello,Sabrina
Hello,Eric
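  • To avoid editing the script for each new file, we can pass the input and output file names as command-line arguments, which Python makes available in sys.argv

    • ./hello_class2.py class_list greetings_again

    • Run chmod +x hello_class2.py once beforehand so the script can be executed directly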

hello_class2.py


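#!/usr/bin/env python
import sys

# Input and output file names come from the command line
InFileName=sys.argv[1]
OutFileName=sys.argv[2]

# Read the class list, one student per line
with open(InFileName,'r') as fin:
        classlist=fin.readlines()

# Open the output file once and write one greeting per student
with open(OutFileName,'w') as fo:
        for line in classlist:
                student=line.strip()
                fo.write("Hello,")
                fo.write(student)
                fo.write(". It is nice to have you here!\n")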
  • Passing a different class list to hello_class2.py writes greetings for that class

    • ./hello_class2.py future_class_list greetings_to_future_class
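
A script that reads sys.argv fails with an IndexError when an argument is missing. One possible way to guard against that is sketched below; the function names greet and main, the usage message, and the demo file names are assumptions made for illustration, not part of the course script.

```python
import os
import sys
import tempfile

def greet(in_path, out_path):
    """Write one greeting per non-empty line of in_path to out_path."""
    with open(in_path) as fin, open(out_path, "w") as fo:
        for line in fin:
            student = line.strip()
            if student:
                fo.write("Hello," + student + ". It is nice to have you here!\n")

def main(argv):
    """Return 0 on success, 1 if the wrong number of arguments was given."""
    if len(argv) != 3:
        print("Usage: " + argv[0] + " <class_list> <output_file>", file=sys.stderr)
        return 1
    greet(argv[1], argv[2])
    return 0

# Demo with a hypothetical class list in a temporary directory.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "class_list_demo")
out_path = os.path.join(workdir, "greetings_demo")
with open(in_path, "w") as fo:
    fo.write("Manijeh\nShawn\n")

status_bad = main(["hello_class2.py"])                    # missing arguments
status_ok = main(["hello_class2.py", in_path, out_path])  # normal run
with open(out_path) as fin:
    greetings = fin.read()

print(status_bad, status_ok)
print(greetings)
```

In a standalone script the last step would typically be sys.exit(main(sys.argv)), so that the shell sees a non-zero exit status when the arguments are wrong.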

https://ki-data.mit.edu/bcc/teaching/IntroToPython.tgz