Selecting from DataFrames

You can select data by generating boolean arrays based on some criterion. Each comparison produces a column of True/False values, and Pandas uses that column to keep only the rows where the value is True. There are a few ways to generate this True/False selection column.

Value-based selections

You provide a selection criterion for a particular column. Example:

# generates the true/false array
my_dataframe['my_column']>=some_value
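
As a minimal sketch of how the comparison becomes a selection, assume a small made-up DataFrame (the data, the `label` column, and the names `mask` and `selected` are hypothetical and only for illustration):

```python
import pandas as pd

# Hypothetical example data, for illustration only
my_dataframe = pd.DataFrame({
    "my_column": [1, 5, 10, 15],
    "label": ["a", "b", "c", "d"],
})
some_value = 10

# The comparison yields a boolean Series: one True/False per row
mask = my_dataframe["my_column"] >= some_value

# Indexing the DataFrame with the mask keeps only the True rows
selected = my_dataframe[mask]

print(mask.tolist())
print(selected)
```

The mask has the same index as the DataFrame, which is how Pandas lines up each True/False value with its row.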

Is-in based selections

You provide a list of values you want to search for. Example:

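# pass the list itself; indexing with the resulting True/False array keeps the matching rows
subset_of_rows = my_dataframe[my_dataframe['column_name'].isin(list_of_values)]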
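
A short sketch of an is-in selection on made-up data may help here. The column contents, the `value` column, and the name `mask` are assumptions for illustration:

```python
import pandas as pd

# Hypothetical example data, for illustration only
my_dataframe = pd.DataFrame({
    "column_name": ["x", "y", "z", "x"],
    "value": [1, 2, 3, 4],
})
list_of_values = ["x", "z"]

# True wherever the cell matches any entry in list_of_values
mask = my_dataframe["column_name"].isin(list_of_values)

# Keep only the rows where the mask is True
subset_of_rows = my_dataframe[mask]

print(subset_of_rows)
```

Note that `isin` takes the list itself as its argument, so a list variable is passed without extra brackets around it.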

Other

There are many other ways to build a selection. The pandas documentation on indexing and selecting data covers them.

Boolean Indexing

ms['Precursor Charge']==3

This is boolean indexing. You can build very complicated selection criteria to pull out just the data you want.
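
One way to build more complicated criteria is to combine boolean arrays with `&` (and), `|` (or), and `~` (not). The sketch below uses a small made-up table, `ms_demo`, that borrows two column names from `ms`; the values and the variable names are hypothetical:

```python
import pandas as pd

# Made-up stand-in for the `ms` table; values are illustrative only
ms_demo = pd.DataFrame({
    "Precursor Charge": [2, 3, 3, 2],
    "light Precursor Mz": [750.4, 812.9, 640.1, 905.3],
})

# Save each criterion as its own boolean Series
is_charge_3 = ms_demo["Precursor Charge"] == 3
is_above_800 = ms_demo["light Precursor Mz"] > 800

# Combine criteria element-wise
both = ms_demo[is_charge_3 & is_above_800]    # rows meeting both criteria
either = ms_demo[is_charge_3 | is_above_800]  # rows meeting at least one
not_charge_3 = ms_demo[~is_charge_3]          # rows where the charge is not 3

print(both)
```

When the comparisons are written inline instead of saved to variables, each one needs its own parentheses, for example `(ms_demo["Precursor Charge"] == 3) & (ms_demo["light Precursor Mz"] > 800)`, because `&` binds more tightly than `==` and `>` in Python.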

selection_criteria = ms['Precursor Charge']==3 #now we have saved the selection criteria
selection_criteria
ms[selection_criteria] #note that only the "True" rows are selected
ms[ms['Precursor Charge']==3]
# Try to select all of the rows with "light Precursor Mz" greater than 800, and do it in one line.
ms[ms['light Precursor Mz']>800]
ms[ms['Peptide Modified Sequence'].str.contains('Q')][['Protein Preferred Name', 'Peptide Modified Sequence']]
ms[ms['Peptide Modified Sequence'].str.contains('SV')]
# Edit the above to get only peptides with the motif 'SV' and output only the columns of interest
ms[ms['Peptide Modified Sequence'].str.contains('SV')][['Protein Preferred Name', 'Peptide Modified Sequence']]
# now let's try using "isin"
ms[ms['Protein Preferred Name'].isin(['RL27_ECOLI'])]
