Selecting from DataFrames
You can search or select data by generating Boolean arrays from some criterion. The criterion produces a column of True/False values, one per row, and pandas keeps only the rows where the value is True. There are a few ways to generate this True/False selection column.
Value-based selections
You provide a selection criterion for a particular column. Example:
# generates the true/false array
my_dataframe['my_column']>=some_value
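As a minimal sketch of the two steps involved (build the True/False array, then use it to pick rows), the example below uses a small made-up DataFrame. The data, the `label` column, and the value assigned to `some_value` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
my_dataframe = pd.DataFrame({
    'my_column': [1, 5, 10, 15],
    'label': ['a', 'b', 'c', 'd'],
})
some_value = 10

# Step 1: build the True/False Series (one entry per row)
mask = my_dataframe['my_column'] >= some_value

# Step 2: index the DataFrame with it to keep only the True rows
selected_rows = my_dataframe[mask]

print(mask.tolist())
print(selected_rows)
```

The comparison alone never filters anything; the rows are only selected once the mask is passed back to the DataFrame.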
Is-in based selections
You provide a list of values you want to search for. Example:
# isin() returns the true/false array; index the DataFrame with it to get the rows
subset_of_rows = my_dataframe[my_dataframe['column_name'].isin(list_of_values)]
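The following sketch shows an is-in selection on a small made-up DataFrame, including how to invert it with `~`. The data and the `value` column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
my_dataframe = pd.DataFrame({
    'column_name': ['x', 'y', 'z', 'x'],
    'value': [1, 2, 3, 4],
})
list_of_values = ['x', 'z']

# True where the entry matches any item in the list
is_in_list = my_dataframe['column_name'].isin(list_of_values)

# Rows whose 'column_name' is in the list
subset_of_rows = my_dataframe[is_in_list]

# ~ flips True/False, giving the rows that are NOT in the list
rows_not_in_list = my_dataframe[~is_in_list]

print(subset_of_rows)
print(rows_not_in_list)
```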
Other
There are many other ways to build a selection; the pandas documentation on indexing and selecting data covers them.
Boolean Indexing
ms['Precursor Charge']==3
This is Boolean indexing. The comparison returns a True/False value for every row, and you can combine such comparisons into very specific selection criteria to pull out just the data you want.
selection_criteria = ms['Precursor Charge']==3 #now we have saved the selection criteria
selection_criteria

ms[selection_criteria] #note that only the "True" rows are selected

ms[ms['Precursor Charge']==3]
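More complicated criteria are usually built by combining several True/False arrays with `&` (and), `|` (or), and `~` (not). The sketch below assumes a small stand-in DataFrame named `ms_demo` with made-up values; only the column names are borrowed from `ms`.

```python
import pandas as pd

# Hypothetical stand-in for the real `ms` table; the values are invented
ms_demo = pd.DataFrame({
    'Precursor Charge': [2, 3, 3, 2],
    'light Precursor Mz': [650.3, 812.4, 701.9, 905.1],
})

# Save each criterion separately so the combinations stay readable
charge_is_3 = ms_demo['Precursor Charge'] == 3
mz_above_800 = ms_demo['light Precursor Mz'] > 800

both = ms_demo[charge_is_3 & mz_above_800]        # rows meeting both criteria
either = ms_demo[charge_is_3 | mz_above_800]      # rows meeting at least one
neither = ms_demo[~(charge_is_3 | mz_above_800)]  # rows meeting none

print(both)
print(either)
print(neither)
```

When the comparisons are written inline instead of saved to variables, each one needs its own parentheses, for example `ms_demo[(ms_demo['Precursor Charge'] == 3) & (ms_demo['light Precursor Mz'] > 800)]`, because `&` and `|` bind more tightly than `==` and `>`.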

# Try to select all of the rows with "light Precursor Mz" greater than 800, and do it in one line.
ms[ms['light Precursor Mz']>800]

ms[ms['Peptide Modified Sequence'].str.contains('Q')][['Protein Preferred Name', 'Peptide Modified Sequence']]

ms[ms['Peptide Modified Sequence'].str.contains('SV')]

# Edit the above to get only peptides with the motif 'SV' and output only the columns of interest
ms[ms['Peptide Modified Sequence'].str.contains('SV')][['Protein Preferred Name', 'Peptide Modified Sequence']]
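An alternative way to write the same kind of selection is `.loc`, which takes the row mask and the column list in one step. The sketch below assumes a made-up stand-in DataFrame `ms_demo` (the protein names and sequences are invented). It also passes `na=False` so that missing sequences count as "no match", since a mask containing missing values can cause an error when used for selection, and `regex=False` so the motif is treated as plain text.

```python
import pandas as pd

# Hypothetical stand-in for the real `ms` table; the values are invented
ms_demo = pd.DataFrame({
    'Protein Preferred Name': ['PROT_A', 'PROT_B', 'PROT_C'],
    'Peptide Modified Sequence': ['ASVK', None, 'QLTR'],
    'Precursor Charge': [2, 3, 2],
})

# na=False: rows with a missing sequence become False instead of NaN
# regex=False: match the literal text 'SV' rather than a regular expression
has_sv = ms_demo['Peptide Modified Sequence'].str.contains('SV', regex=False, na=False)

columns_of_interest = ['Protein Preferred Name', 'Peptide Modified Sequence']

# .loc[rows, columns] selects the matching rows and the chosen columns together
result = ms_demo.loc[has_sv, columns_of_interest]
print(result)
```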

# now let's try using "isin"
ms[ms['Protein Preferred Name'].isin(['RL27_ECOLI'])]
