Selecting from DataFrames
You can select data by generating Boolean arrays based on some criterion. Comparing a column against a condition produces a column of True/False values, one per row, and pandas uses it to keep only the rows where the value is True. There are a few ways to generate this True/False selection column.
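As a minimal sketch of this idea, the example below builds a small made-up DataFrame (the names `demo`, `name`, and `score` are invented for illustration and are not part of the tutorial's data), generates a True/False Series from a comparison, and uses it to keep only the True rows.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration
demo = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [10, 25, 7, 30]})

# Comparing a column to a value gives a Boolean Series, one entry per row
mask = demo["score"] >= 20

# Indexing the DataFrame with the mask keeps only the rows where it is True
selected = demo[mask]

print(mask.tolist())
print(selected)
```

The mask has the same index as the DataFrame, which is how pandas lines up each True/False value with its row.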
Value-based selections
You provide a selection criterion for a particular column. Example:
# generates the true/false array
my_dataframe['my_column'] >= some_value
Is-in based selections
You provide a list of values you want to search for. Example:
# isin() generates the true/false array; indexing with it returns the matching rows
subset_of_rows = my_dataframe[my_dataframe['column_name'].isin(list_of_values)]
Other
There are lots of ways to do this - you can learn more here
Boolean Indexing
ms['Precursor Charge']==3
This is boolean indexing - you can build very complicated selection criteria to pull out just the data you want
selection_criteria = ms['Precursor Charge']==3 # now we have saved the selection criteria
selection_criteria
ms[selection_criteria] #note that only the "True" rows are selected
ms[ms['Precursor Charge']==3]
# Try to select all of the rows with "light Precursor Mz" greater than 800, and do it in one line.
ms[ms['light Precursor Mz']>800]
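More complicated criteria can be built by combining Boolean Series. The sketch below assumes a small stand-in table, `ms_demo`, with invented values (it reuses two column names from `ms` but is not the real dataset). It combines two conditions with `&` (and), `|` (or), and `~` (not).

```python
import pandas as pd

# Hypothetical stand-in for the ms table; the values are invented
ms_demo = pd.DataFrame({
    "Precursor Charge": [2, 3, 3, 2],
    "light Precursor Mz": [650.3, 812.4, 790.1, 905.7],
})

# Save each criterion as its own Boolean Series
is_charge_3 = ms_demo["Precursor Charge"] == 3
is_above_800 = ms_demo["light Precursor Mz"] > 800

both = ms_demo[is_charge_3 & is_above_800]    # rows where both are True
either = ms_demo[is_charge_3 | is_above_800]  # rows where at least one is True
not_charge_3 = ms_demo[~is_charge_3]          # rows where the first is False
```

When the comparisons are written inline instead of saved first, each one needs its own parentheses, for example `ms_demo[(ms_demo["Precursor Charge"] == 3) & (ms_demo["light Precursor Mz"] > 800)]`, because `&` and `|` bind more tightly than `==` and `>`.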
ms[ms['Peptide Modified Sequence'].str.contains('Q')][['Protein Preferred Name', 'Peptide Modified Sequence']]
ms[ms['Peptide Modified Sequence'].str.contains('SV')]
# Edit the above to only get peptides with the motif 'SV' and only output the columns of interest
ms[ms['Peptide Modified Sequence'].str.contains('SV')][['Protein Preferred Name', 'Peptide Modified Sequence']]
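The row filter and the column list can also be passed together to `.loc`, which is one possible alternative to chaining two sets of brackets. This sketch uses an invented table, `pep_demo` (placeholder values, including one missing sequence). It also assumes you want literal text matching (`regex=False`) and want missing sequences treated as non-matches (`na=False`).

```python
import pandas as pd

# Hypothetical data for illustration only
pep_demo = pd.DataFrame({
    "Protein Preferred Name": ["RL27_ECOLI", "RS1_ECOLI", "RL2_ECOLI"],
    "Peptide Modified Sequence": ["ASVLK", "QTEELR", None],
    "Precursor Charge": [2, 3, 2],
})

# regex=False matches the literal text 'SV'; na=False counts missing values as False
has_sv = pep_demo["Peptide Modified Sequence"].str.contains("SV", regex=False, na=False)

columns_of_interest = ["Protein Preferred Name", "Peptide Modified Sequence"]

# .loc takes the row mask first and the column list second
sv_peptides = pep_demo.loc[has_sv, columns_of_interest]
print(sv_peptides)
```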
# now let's try using "isin"
ms[ms['Protein Preferred Name'].isin(['RL27_ECOLI'])]
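`isin` is most useful with more than one value, and its result can be inverted with `~` to exclude the listed values instead. The sketch below uses an invented table, `prot_demo`, and a placeholder list, `wanted`; neither comes from the original dataset.

```python
import pandas as pd

# Hypothetical data for illustration only
prot_demo = pd.DataFrame({
    "Protein Preferred Name": ["RL27_ECOLI", "RS1_ECOLI", "RL2_ECOLI", "RL27_ECOLI"],
    "Precursor Charge": [2, 3, 2, 3],
})

# Placeholder list of protein names to search for
wanted = ["RL27_ECOLI", "RL2_ECOLI"]

# True for every row whose protein name appears in the list
in_list = prot_demo["Protein Preferred Name"].isin(wanted)

matching = prot_demo[in_list]    # rows for the listed proteins
excluded = prot_demo[~in_list]   # every other row
```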
