Selecting from DataFrames

You can search or select data by building a boolean array from some criterion. The result is effectively a column of True/False values, one per row, and pandas keeps only the rows where the value is True. There are a few ways to generate this True/False selection column.

Value-based selections

You provide a selection criterion for a particular column. Example:

# generates the true/false array
my_dataframe['my_column']>=some_value
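To make the value-based selection concrete, here is a minimal sketch. The DataFrame contents and the threshold are made up for illustration; only the comparison pattern comes from the example above.

```python
import pandas as pd

# Hypothetical data; the values and threshold are illustrative only.
my_dataframe = pd.DataFrame({"my_column": [1, 5, 10, 20]})
some_value = 10

# Comparing a column to a value yields a boolean Series, one entry per row.
mask = my_dataframe["my_column"] >= some_value

# Indexing the DataFrame with that Series keeps only the rows where it is True.
selected = my_dataframe[mask]
print(mask)
print(selected)
```

Note that the comparison on its own only produces the True/False column; the rows are selected when that column is passed back into the square brackets.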

Is-in based selections

You provide a list of values you want to search for. Example:

# generates the true/false array, then uses it to select the matching rows
subset_of_rows = my_dataframe[my_dataframe['column_name'].isin(list_of_values)]
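As a small sketch of the is-in pattern, assuming a made-up column of letters: `isin` takes the list of values directly and returns a True/False column, which is then used to pick out the matching rows.

```python
import pandas as pd

# Hypothetical data; column contents are illustrative only.
my_dataframe = pd.DataFrame({"column_name": ["a", "b", "c", "a"]})
list_of_values = ["a", "c"]

# True wherever the row's value appears in list_of_values.
mask = my_dataframe["column_name"].isin(list_of_values)

# Apply the mask to get the matching rows themselves.
subset_of_rows = my_dataframe[mask]
print(subset_of_rows)
```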

Other

There are many other ways to build a selection; the pandas documentation on indexing and selecting data covers them.

Boolean Indexing

ms['Precursor Charge']==3

This is boolean indexing. You can build quite complicated selection criteria to pull out just the data you want.

selection_criteria = ms['Precursor Charge'] == 3  # now we have saved the selection criteria

selection_criteria

ms[selection_criteria]  # note that only the "True" rows are selected

ms[ms['Precursor Charge']==3]

# Try to select all of the rows with "light Precursor Mz" greater than 800, and do it in one line.
ms[ms['light Precursor Mz']>800]
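The single-condition selections above can be combined into more complicated criteria. The sketch below assumes a tiny stand-in for `ms` with invented values; only the column names come from the examples. In pandas, `&` means "and", `|` means "or", and `~` means "not", and each comparison needs its own parentheses.

```python
import pandas as pd

# Hypothetical stand-in for the ms DataFrame; the values are invented.
ms = pd.DataFrame({
    "Precursor Charge": [2, 3, 3, 4],
    "light Precursor Mz": [650.3, 812.4, 790.1, 905.7],
})

# Both conditions must hold: charge of 3 AND m/z above 800.
charge_3_and_heavy = (ms["Precursor Charge"] == 3) & (ms["light Precursor Mz"] > 800)

# Negate a condition with ~ to get every row that does NOT match.
not_charge_3 = ~(ms["Precursor Charge"] == 3)

print(ms[charge_3_and_heavy])
print(ms[not_charge_3])
```

The parentheses matter because `&` and `|` bind more tightly than `==` and `>` in Python, so leaving them out typically raises an error or gives the wrong result.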

ms[ms['Peptide Modified Sequence'].str.contains('Q')][['Protein Preferred Name', 'Peptide Modified Sequence']]

ms[ms['Peptide Modified Sequence'].str.contains('SV')]

# Edit the above to get only peptides with the motif 'SV' and output only the columns of interest
ms[ms['Peptide Modified Sequence'].str.contains('SV')][['Protein Preferred Name', 'Peptide Modified Sequence']]
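An alternative way to write the same kind of selection is with `.loc`, which takes the row selection and the column list in one step. This is a sketch on an invented stand-in for `ms`, not the tutorial's own approach. The `na=False` argument is an added precaution: it treats missing sequences as non-matches, and `regex=False` makes the search a plain substring match.

```python
import pandas as pd

# Hypothetical stand-in for the ms DataFrame; the rows are invented.
ms = pd.DataFrame({
    "Protein Preferred Name": ["RL27_ECOLI", "RS1_ECOLI", "EFTU_ECOLI"],
    "Peptide Modified Sequence": ["AQSVGK", "LLDEK", None],
    "Precursor Charge": [2, 3, 2],
})

# True where the sequence contains the literal substring "SV";
# rows with a missing sequence count as False because of na=False.
has_sv = ms["Peptide Modified Sequence"].str.contains("SV", regex=False, na=False)

# .loc[rows, columns] selects matching rows and the columns of interest together.
result = ms.loc[has_sv, ["Protein Preferred Name", "Peptide Modified Sequence"]]
print(result)
```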

# now let's try using "isin"
ms[ms['Protein Preferred Name'].isin(['RL27_ECOLI'])]
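`isin` becomes more useful with several values at once, and the same mask can be flipped with `~` to exclude them instead. The sketch below uses an invented stand-in for `ms`; the protein names other than `RL27_ECOLI` are placeholders chosen for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the ms DataFrame; the rows are invented.
ms = pd.DataFrame({
    "Protein Preferred Name": ["RL27_ECOLI", "RS1_ECOLI", "EFTU_ECOLI", "RL27_ECOLI"],
    "Precursor Charge": [2, 3, 2, 3],
})
proteins_of_interest = ["RL27_ECOLI", "EFTU_ECOLI"]

# True for rows whose protein name is any of the listed values.
in_list = ms["Protein Preferred Name"].isin(proteins_of_interest)

selected = ms[in_list]    # rows for the listed proteins
excluded = ms[~in_list]   # every other row
print(selected)
print(excluded)
```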


