About Pandas

Pandas is a Python library for holding, manipulating, and plotting labeled datasets. It is similar to NumPy, but differs in two important ways:

  1. Rows and columns can be indexed by labels (such as strings) as well as by integer position, whereas NumPy arrays are always indexed by integer position.

  2. Different columns can hold different data types, whereas every element of a NumPy array must share a single data type.
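A minimal sketch of both differences, using made-up values: the DataFrame below has string row labels and one column per data type, and the same cell can be reached either by label (`.loc`) or by integer position (`.iloc`). The NumPy array at the end shows what happens when strings and numbers are mixed in one array.

```python
import numpy as np
import pandas as pd

# Made-up data: row labels "a", "b", "c" and a different type in each column
df = pd.DataFrame(
    {"sample": ["x1", "x2", "x3"], "count": [36, 85, 41], "mass_g": [1.65, 1.52, 1.78]},
    index=["a", "b", "c"],
)

print(df.loc["b", "count"])  # select by row label and column label
print(df.iloc[1, 1])         # the same cell, selected by integer position
print(df.dtypes)             # one dtype per column (string/object, int64, float64)

# A NumPy array has a single dtype, so mixing strings and integers
# converts every element to a string
arr = np.array([["x1", 36], ["x2", 85]])
print(arr.dtype)             # a Unicode string dtype
```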

Why pandas?

It is a great way to:

  1. Read data from a file into a structured, easy-to-access format

  2. Clean data: deal with missing values, etc.

  3. Select data of interest

  4. Analyze and plot

  5. Output results
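The five steps above can be sketched end to end as follows. This is an illustration under stated assumptions: the CSV text and its column names (`site`, `temp_c`, `rain_mm`) are hypothetical, `io.StringIO` stands in for a real file path, and dropping incomplete rows is just one of several ways to handle missing values. Plotting (for example `clean.plot(...)`) is left out because it additionally requires matplotlib.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a data file on disk
csv_text = """site,temp_c,rain_mm
A,21.5,0.0
B,,3.2
C,19.0,1.1
D,24.3,
"""

# 1. Read: parse the CSV into a DataFrame (normally pd.read_csv("path/to/file.csv"))
df = pd.read_csv(io.StringIO(csv_text))

# 2. Clean: drop any row that has a missing value (fillna() is an alternative)
clean = df.dropna()

# 3. Select: keep only rows whose temperature exceeds 20
warm = clean[clean["temp_c"] > 20]

# 4. Analyze: a simple summary statistic of one column
print(clean["temp_c"].mean())

# 5. Output: write the selection as CSV text (pass a filename to write a file)
print(warm.to_csv(index=False))
```

Here `dropna()` removes sites B and D, leaving A and C, and the temperature filter keeps only site A.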

Pandas provides two basic data structures: Series (1-D) and DataFrames (2-D). You can think of a DataFrame as a collection of Series that share the same row labels, each stored under its own column label. Conveniently, you can slice a DataFrame across rows or down columns using the row and column labels, much as you would look up values in a dictionary by key.
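A small sketch of the Series/DataFrame relationship, with made-up values: two Series sharing the same row labels are combined into a DataFrame through a dictionary, and the result is then sliced by column label and by row label. Note that a label-based slice with `.loc` includes both endpoints, unlike ordinary Python slicing.

```python
import pandas as pd

# Two Series that share the same row labels (their index)
temp = pd.Series([21.5, 19.0, 24.3], index=["A", "C", "D"])
rain = pd.Series([0.0, 1.1, 2.4], index=["A", "C", "D"])

# Build a DataFrame from a dict of Series: each key becomes a column label
df = pd.DataFrame({"temp_c": temp, "rain_mm": rain})

col = df["temp_c"]                    # down a column: returns a Series
row = df.loc["C"]                     # across a row: a Series indexed by column labels
block = df.loc["A":"C", ["rain_mm"]]  # rows "A" through "C" (inclusive), one column

print(col)
print(row["rain_mm"])
print(block)
```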

You can learn more about pandas [here]


Massachusetts Institute of Technology