Editing DataFrames

You can easily add new columns or change the values in existing columns.

Add a new column

  1. Be sure that the column you are adding has the same index as the existing dataframe.

  2. This is most easily accomplished by manipulating an existing column and assigning the result to a new column name.

  3. Example: dataframe['new_column_name'] = dataframe['old_column']*value

  4. Example: dataframe['new_column_name'] = dataframe['old_column1']*dataframe['old_column2']
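The two examples above, plus the index requirement from step 1, can be sketched on a small made-up table. The column names and values here are placeholders chosen for illustration, not part of the course dataset.

```python
import pandas as pd

# Hypothetical toy data; names and values are placeholders.
dataframe = pd.DataFrame(
    {"old_column1": [1.0, 2.0, 3.0], "old_column2": [10.0, 20.0, 30.0]}
)

# New column from an existing column times a constant.
dataframe["scaled"] = dataframe["old_column1"] * 2

# New column from two existing columns, multiplied row by row.
dataframe["product"] = dataframe["old_column1"] * dataframe["old_column2"]

# A Series whose index does not match the dataframe is aligned by label:
# rows with no matching label are filled with NaN.
mismatched = pd.Series([100.0, 200.0, 300.0], index=[1, 2, 3])
dataframe["misaligned"] = mismatched

print(dataframe)
```

The last column shows why the index matters: the row labeled 0 has no partner in the mismatched Series, so it ends up as a missing value rather than raising an error.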

Alter a column

  1. Select a set of cells using the tools from above, then assign new values to them.

  2. Note that dataframes are MUTABLE! The assignment changes the dataframe in place.

  3. Example: dataframe.loc[row_selection,column_selection] = 7
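As a minimal sketch of altering cells in place, the snippet below uses a boolean mask for the row selection and a column label for the column selection. The table is invented for illustration, and taking a copy first is just one way to keep the original values around.

```python
import pandas as pd

# Hypothetical toy data.
dataframe = pd.DataFrame({"a": [1, 5, 9], "b": [2, 6, 10]})

# Keep an independent copy, because the assignment below modifies
# `dataframe` itself rather than returning a new object.
original = dataframe.copy()

# Rows where column "a" is greater than 4, column "b" only.
dataframe.loc[dataframe["a"] > 4, "b"] = 7

print(dataframe)
print(original)
```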

Dealing with missing data

  1. The dataframe.dropna() and dataframe.fillna() methods are very helpful for removing or replacing missing values.

  2. Example: only_complete_rows = dataframe.dropna(how='any')

  3. Example: replace_with_0 = dataframe.fillna(value=0.0)
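The two calls above can be tried on a small table with a few gaps. This is only a sketch with invented values; note that both methods return a new dataframe by default and leave the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing values.
dataframe = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, np.nan]})

# Drop every row that has at least one missing value.
only_complete_rows = dataframe.dropna(how="any")

# Replace every missing value with 0.0.
replace_with_0 = dataframe.fillna(value=0.0)

print(only_complete_rows)
print(replace_with_0)
```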

# what is this line doing?
ms['light +1 charge mass']=ms['light Precursor Mz']*ms['Precursor Charge'] - ((ms['Precursor Charge']-1)*1.0078)
ms[['Peptide Modified Sequence', 'light Precursor Mz', 'Precursor Charge', 'light +1 charge mass']]
# think through what this line is doing
ms.loc[ms['light +1 charge mass']>2000,['Peptide Modified Sequence', 'light Precursor Mz', 'Precursor Charge', 'light +1 charge mass']]
# note: writing text into a numeric column turns it into a mixed (object) column; newer pandas versions may warn about this
ms.loc[ms['light +1 charge mass']>2000,'light +1 charge mass'] = 'way too big!'
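# you can quickly save your work as a .csv file using the command .to_csv(path_to_file)
# the r prefix makes this a raw string, so the backslashes in the Windows path are taken literally
ms.to_csv(r"C:\Users\duan\Desktop\PythonDataProcessingVisualization\mass_spec_new.csv")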
#look in your directory for a new .csv file!
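To experiment without the course data file, the same steps can be sketched on a two-row stand-in table. Everything here is an assumption made for illustration: the m/z and charge values are invented, the name PROTON_MASS is ours (it holds the 1.0078 constant from the line above), the file is written to a temporary directory, and index=False is an optional choice that leaves the row labels out of the file.

```python
import os
import tempfile

import pandas as pd

PROTON_MASS = 1.0078  # constant used in the tutorial's calculation

# Hypothetical stand-in for the `ms` dataframe.
ms = pd.DataFrame(
    {"light Precursor Mz": [500.0, 1200.0], "Precursor Charge": [2, 2]}
)

# m/z times charge gives the mass of the multiply charged ion; subtracting
# (charge - 1) proton masses leaves the mass of the singly charged ion.
ms["light +1 charge mass"] = (
    ms["light Precursor Mz"] * ms["Precursor Charge"]
    - (ms["Precursor Charge"] - 1) * PROTON_MASS
)

# Rows whose +1 charge mass exceeds 2000.
heavy = ms.loc[ms["light +1 charge mass"] > 2000]

# Save to a temporary location and read the file back to check it.
out_path = os.path.join(tempfile.mkdtemp(), "mass_spec_new.csv")
ms.to_csv(out_path, index=False)
reloaded = pd.read_csv(out_path)

print(heavy)
print(reloaded)
```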

Massachusetts Institute of Technology