# Selecting from DataFrames

You can "search/select" rows in a DataFrame by generating boolean arrays from some criterion. A comparison produces a column of True/False values, one per row, and pandas uses it to keep only the rows marked True. There are a few ways to generate this True/False selection column.
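To see the mechanism in isolation, here is a minimal sketch that writes the True/False column by hand instead of computing it. The DataFrame `df` and its values are hypothetical, not the tutorial's data. The list must have one entry per row.

```python
import pandas as pd

# A tiny hypothetical DataFrame (not the tutorial's data)
df = pd.DataFrame({"peptide": ["AAK", "SVR", "QLK"], "charge": [2, 3, 3]})

# A hand-written True/False "column": one value per row
keep = [False, True, True]

# pandas keeps only the rows whose matching entry is True
selected = df[keep]
print(selected)
```

The selected rows keep their original index labels (1 and 2), which is handy for tracing results back to the full table.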

### **Value-based selections**

You provide a selection criterion for a particular column. For example:

```
# generates the True/False array
my_dataframe['my_column'] >= some_value
```
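The comparison on its own only builds the mask; to get the rows you pass it back into `[]`. A small sketch, assuming hypothetical values for `my_dataframe` and `some_value`:

```python
import pandas as pd

# Hypothetical data standing in for `my_dataframe`
my_dataframe = pd.DataFrame({"my_column": [1.5, 4.0, 7.2, 3.9]})
some_value = 4.0

# Step 1: the comparison returns a boolean Series, one value per row
mask = my_dataframe["my_column"] >= some_value
print(mask.tolist())  # [False, True, True, False]

# Step 2: index the DataFrame with that Series to keep only the True rows
subset = my_dataframe[mask]
print(subset)
```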

### **Is-in based selections**

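You provide a list of the values you want to match. `isin` returns a True/False Series, which you then use to select rows. For example:

```
# list_of_values is already a list, so pass it directly (not wrapped in another [])
subset_of_rows = my_dataframe[my_dataframe['column_name'].isin(list_of_values)]
```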
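A runnable sketch of `isin`, also showing `~`, which flips a mask so you get every row whose value is *not* in the list. The protein names other than `RL27_ECOLI` (`PROT_A`, `PROT_B`) and all peptide sequences are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical data; PROT_A and PROT_B are made-up names
df = pd.DataFrame({
    "Protein Preferred Name": ["RL27_ECOLI", "PROT_A", "PROT_B", "RL27_ECOLI"],
    "Peptide Modified Sequence": ["AAK", "SVR", "QLK", "GGR"],
})
wanted = ["RL27_ECOLI", "PROT_B"]

# True where the protein name appears in `wanted`
in_list = df["Protein Preferred Name"].isin(wanted)
subset = df[in_list]

# ~ inverts the mask: rows whose name is NOT in `wanted`
excluded = df[~in_list]
print(subset)
print(excluded)
```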

### **Other**

There are many other ways to build these selections; you can learn more in the [boolean indexing section of the pandas user guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-boolean).
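One of those other ways is `DataFrame.query`, which takes the condition as a string. A sketch, assuming a small hypothetical DataFrame `demo` that borrows column names used later in this section; note that column names containing spaces must be wrapped in backticks inside the query string.

```python
import pandas as pd

# Hypothetical stand-in data (named `demo` so it won't overwrite your own `ms`)
demo = pd.DataFrame({
    "Precursor Charge": [2, 3, 3, 4],
    "light Precursor Mz": [650.3, 812.4, 790.1, 905.7],
})

# Boolean-mask form
via_mask = demo[demo["Precursor Charge"] == 3]

# Equivalent query form; backticks quote the name containing a space
via_query = demo.query("`Precursor Charge` == 3")

print(via_mask.equals(via_query))  # True
```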

### Boolean Indexing

```
ms['Precursor Charge']==3
```

This is boolean indexing: the comparison returns a True/False value for every row. You can build very complex selection criteria this way to pull out exactly the data you want.

```
selection_criteria = ms['Precursor Charge'] == 3  # now we have saved the selection criteria
```

```
selection_criteria
```

<figure><img src="https://498238201-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWuHhstIreJ3jFvE4gQ3y%2Fuploads%2F501B0dkEflYl5gGFbh41%2Fimage.png?alt=media&#x26;token=ca41be49-6e34-4abd-94a7-0e0678dae20d" alt=""><figcaption></figcaption></figure>

```
ms[selection_criteria]  # note that only the "True" rows are selected
```

<figure><img src="https://498238201-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWuHhstIreJ3jFvE4gQ3y%2Fuploads%2FTfkPiNnCwQIuUrG3Ijcu%2Fimage.png?alt=media&#x26;token=32e061cd-7a57-4794-b816-e3830b6c1476" alt=""><figcaption></figcaption></figure>
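Because True counts as 1 and False as 0, summing a saved mask tells you how many rows it will select before you even apply it. A quick sketch on hypothetical data (`demo` is not the tutorial's `ms`):

```python
import pandas as pd

# Hypothetical charges for five rows
demo = pd.DataFrame({"Precursor Charge": [2, 3, 3, 4, 3]})
selection_criteria = demo["Precursor Charge"] == 3

# sum() counts the True values, i.e. the number of matching rows
n_matches = selection_criteria.sum()
selected = demo[selection_criteria]
print(n_matches, len(selected))  # 3 3
```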

```
ms[ms['Precursor Charge']==3]
```

<figure><img src="https://498238201-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWuHhstIreJ3jFvE4gQ3y%2Fuploads%2FHh9p52iWAGGh9uImE2Ex%2Fimage.png?alt=media&#x26;token=9a626770-bce7-4187-b9f7-d722c71193d8" alt=""><figcaption></figcaption></figure>

```
# Try to select all of the rows with "light Precursor Mz" greater than 800, and do it in one line.
ms[ms['light Precursor Mz']>800]
```

<figure><img src="https://498238201-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWuHhstIreJ3jFvE4gQ3y%2Fuploads%2FgWlG1IGLrujpiIlPFDgd%2Fimage.png?alt=media&#x26;token=c0084345-3ee1-4754-9b25-218a00adfd82" alt=""><figcaption></figcaption></figure>
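The two filters used so far (charge equal to 3, and light precursor m/z above 800) can be combined with `&` (and) and `|` (or). Python's `and`/`or` keywords do not work element-wise on Series and raise an error. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical stand-in data
demo = pd.DataFrame({
    "Precursor Charge": [2, 3, 3, 4],
    "light Precursor Mz": [650.3, 812.4, 790.1, 905.7],
})

# Wrap each condition in parentheses: & and | bind more tightly than == and >
charge3_and_heavy = demo[(demo["Precursor Charge"] == 3) & (demo["light Precursor Mz"] > 800)]
charge3_or_heavy = demo[(demo["Precursor Charge"] == 3) | (demo["light Precursor Mz"] > 800)]
print(charge3_and_heavy)
print(charge3_or_heavy)
```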

```
ms[ms['Peptide Modified Sequence'].str.contains('Q')][['Protein Preferred Name', 'Peptide Modified Sequence']]
```

<figure><img src="https://498238201-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWuHhstIreJ3jFvE4gQ3y%2Fuploads%2F0xefFUPJOU0Hep0glKuZ%2Fimage.png?alt=media&#x26;token=8c180d94-14a8-4da2-aec4-2f6f8419ccb2" alt=""><figcaption></figcaption></figure>
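Two `str.contains` parameters are worth knowing when searching sequences. Depending on the pandas version and column dtype, missing values can come back as NaN rather than False, which makes the mask unusable for indexing; `na=False` avoids that. The pattern is also treated as a regular expression by default, so `regex=False` is safer when searching for literal characters such as brackets. The sequences below, including the bracketed modification notation, are hypothetical examples.

```python
import pandas as pd

# Hypothetical sequences, including one missing value
demo = pd.DataFrame({
    "Peptide Modified Sequence": ["SVQK", "AAR", None, "M[+16]QK"],
})
seq = demo["Peptide Modified Sequence"]

# na=False turns missing values into False so the mask is purely boolean
has_q = seq.str.contains("Q", na=False)

# regex=False matches the text literally; with the default regex=True,
# "[+16]" would be read as a character class rather than literal text
has_mod = seq.str.contains("[+16]", regex=False, na=False)

print(demo[has_q])
print(demo[has_mod])
```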

```
ms[ms['Peptide Modified Sequence'].str.contains('SV')]
```

<figure><img src="https://498238201-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWuHhstIreJ3jFvE4gQ3y%2Fuploads%2FLMoOGoXySdf6jtvWYKvv%2Fimage.png?alt=media&#x26;token=1a38fd83-f51c-4444-b6db-5947741f45e6" alt=""><figcaption></figcaption></figure>

```
# Edit the above to only get peptides with the motif 'SV' and only output the columns of interest
ms[ms['Peptide Modified Sequence'].str.contains('SV')][['Protein Preferred Name', 'Peptide Modified Sequence']]
```

<figure><img src="https://498238201-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWuHhstIreJ3jFvE4gQ3y%2Fuploads%2FKVius33KHRZxporPfObI%2Fimage.png?alt=media&#x26;token=61fbad1b-5a35-4bf4-b40c-db3b4e5cd4c9" alt=""><figcaption></figcaption></figure>
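The chained form `df[mask][columns]` used above works for reading. `.loc[mask, columns]` does the row and column selection in a single step, and it is the form pandas recommends if you later want to assign values to the selection. A sketch with hypothetical protein names (`P1` to `P3`) and sequences:

```python
import pandas as pd

# Hypothetical data; P1..P3 are made-up protein names
demo = pd.DataFrame({
    "Protein Preferred Name": ["P1", "P2", "P3"],
    "Peptide Modified Sequence": ["ASVK", "QLR", "SVQE"],
    "Precursor Charge": [2, 3, 2],
})
cols = ["Protein Preferred Name", "Peptide Modified Sequence"]
mask = demo["Peptide Modified Sequence"].str.contains("SV")

# Chained selection: rows first, then columns
chained = demo[mask][cols]

# Same result with .loc: rows and columns in one indexing call
with_loc = demo.loc[mask, cols]
print(with_loc)
```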

```
# now let's try using "isin"
ms[ms['Protein Preferred Name'].isin(['RL27_ECOLI'])]
```

<figure><img src="https://498238201-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWuHhstIreJ3jFvE4gQ3y%2Fuploads%2FRefus7csy0NU4tC6ilEW%2Fimage.png?alt=media&#x26;token=343fd25f-735e-4fd1-9ffe-66f4ed6e0185" alt=""><figcaption></figcaption></figure>
