Slicing DataFrames

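You can access subsets of your DataFrame in a few different ways, but we will focus on two here: .loc and .iloc. Depending on the operation, pandas may hand back either a view or a copy of the original data.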

Name-based indexing

You pass row labels and column labels to the .loc[row_names, col_names] indexer. Each one can be a single label, a list of labels, or a slice.
Example: [your_dataframe_name].loc[my_row_names, my_col_names]
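
As a minimal sketch of label-based selection, the snippet below builds a small made-up frame (`toy` and its values are hypothetical and are not the `ms` dataset used later; only the column names are borrowed from it) and selects by a single label, a label slice, and lists of labels.

```python
import pandas as pd

# Toy data for illustration only; the values are invented.
toy = pd.DataFrame(
    {
        "Protein Name": ["P1", "P2", "P3", "P4"],
        "Protein Gene": ["G1", "G2", "G3", "G4"],
        "Precursor Charge": [2, 3, 2, 4],
    },
    index=["a", "b", "c", "d"],
)

# One row label plus one column label returns a single value.
one_value = toy.loc["b", "Protein Gene"]

# A label slice includes BOTH endpoints: rows "a", "b" and "c".
row_slice = toy.loc["a":"c", "Protein Name"]

# Lists of labels select exactly those rows and columns, in the order given.
sub = toy.loc[["d", "a"], ["Precursor Charge", "Protein Name"]]
```

Note that the rows here have string labels, so it is easy to see that .loc works with names rather than positions.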

Index-based indexing

You pass integer row and column positions (counting from 0) to the .iloc[row_numbers, col_numbers] indexer. Each one can be a single integer, a list of integers, or a slice.
Example: [your_dataframe_name].iloc[my_row_numbers, my_col_numbers]
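
The same idea by position can be sketched as follows. Again, `toy` is a hypothetical frame with invented values, used only to show how .iloc counts rows and columns.

```python
import pandas as pd

# Toy data for illustration only; the values are invented.
toy = pd.DataFrame(
    {
        "Protein Name": ["P1", "P2", "P3", "P4"],
        "Protein Gene": ["G1", "G2", "G3", "G4"],
        "Precursor Charge": [2, 3, 2, 4],
    },
    index=["a", "b", "c", "d"],
)

# Positions start at 0: row position 1 is "b", column position 2 is "Precursor Charge".
one_value = toy.iloc[1, 2]

# A position slice EXCLUDES the stop value: only row positions 0 and 1.
first_two = toy.iloc[:2, 0]

# Lists of positions select those rows and columns, in the order given.
sub = toy.iloc[[3, 0], [2, 0]]
```

The row labels ("a" to "d") play no role here; .iloc only looks at where a row or column sits in the frame.
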
ms.loc[:,'Protein Name'] #get all rows (:), 'Protein Name' column
#How would you get the first 10 rows using .loc? (note that here the row "names" are just numbers)
ms.loc[:9, 'Protein Name']
ms.loc[0:10,['Protein Name', 'Protein Gene']] #what will this return?
# Note that you can pass any list of column names to the column indexer
ms.loc[:8,[col for col in ms.columns if "Protein" in col]] #what is this doing?

Side topic: get familiar with list comprehensions. The for loop and the one-line list comprehension that follow build the same list.

my_list = []
for col in ms.columns:
    if "Protein" in col:
        my_list.append(col)
my_list

['Protein Name', 'Protein Preferred Name', 'Protein Gene']

my_list = [col for col in ms.columns if "Protein" in col]
my_list

['Protein Name', 'Protein Preferred Name', 'Protein Gene']

ms.loc[:5,my_list] #what is this doing?
list(ms.columns) #this provides the full list of the columns in the dataframe
# write a line to access all columns related to sample BT2_HFX_6
ms.loc[:,[col for col in ms.columns if "BT2_HFX_6" in col]]
# Now let's try indexing with .iloc
ms.iloc[:5,3:9] #note the difference in how iloc and loc work!
ms.iloc[:20,'Precursor Charge'] #Will this work?
ms.iloc[:20,4]
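
To make the difference between the two indexers concrete, here is a small sketch on a made-up frame with the default 0, 1, 2, ... row labels (`toy` is hypothetical, not the `ms` table). It compares how the slice endpoint is treated and shows one way to reach a column by name while still using .iloc, via `columns.get_loc`.

```python
import pandas as pd

# Toy data for illustration only; 30 rows with default integer row labels.
toy = pd.DataFrame(
    {
        "Protein Name": [f"P{i}" for i in range(30)],
        "Precursor Charge": [2, 3] * 15,
    }
)

# .loc slices by label and includes the stop label: labels 0..9, so 10 rows.
by_label = toy.loc[:9, "Protein Name"]

# .iloc slices by position and excludes the stop: positions 0..8, so 9 rows.
by_position = toy.iloc[:9, 0]

# .iloc does not accept column names; the exact exception type may vary by pandas version.
try:
    toy.iloc[:20, "Precursor Charge"]
    iloc_accepts_names = True
except (ValueError, TypeError, IndexError):
    iloc_accepts_names = False

# Look up the column's integer position first, then pass that to .iloc.
col_pos = toy.columns.get_loc("Precursor Charge")
by_name_via_pos = toy.iloc[:20, col_pos]
```

This is why `ms.loc[:9, ...]` and `ms.iloc[:10, ...]` both give the first 10 rows when the row labels are the default integers.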
