# K-Nearest Neighbors

### A Quick Review

<figure><img src="/files/94IopdoJnAuevAz4gIT1" alt=""><figcaption></figcaption></figure>

1. The k-nearest neighbors method is a supervised learning approach that does not need to fit a model to the data.
2. Data points are classified based on the categories of their k nearest neighbors in the training data set.
3. In Biopython, the k-nearest neighbors method is available in `Bio.kNN`.
4. k is the number of neighbors considered for the classification.
5. For classification into two classes, choosing an odd number for k lets you avoid tied votes.
6. There are no exact physical or biological rules that define the best value of k.
7. To choose k, apply different values of k to one part of the training data, then evaluate the performance using the other part.
8. Low values of k, such as 1 or 2, can be noisy and sensitive to outliers.
9. High values of k smooth things over, but k should not be too big, because small clusters will be outweighed by other clusters.
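Point 7 can be sketched in plain Python. The example below is a minimal illustration, not Biopython's implementation: `knn_predict` and `leave_one_out_accuracy` are hypothetical helper names, Euclidean distance with one vote per neighbor is an assumption, and leave-one-out (hold out one point, classify it from the rest) is used as one simple way to split the training data. The data is the operon data set used in the rest of this page.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

# Operon data set from this page: [distance, expression similarity score]
xs = [[-53, -200.78], [117, -267.14], [57, -163.47], [16, -190.30],
      [11, -220.94], [85, -193.94], [16, -182.71], [15, -180.41],
      [-26, -181.73], [58, -259.87], [126, -414.53], [191, -249.57],
      [113, -265.28], [145, -312.99], [154, -213.83], [147, -380.85],
      [93, -291.13]]
ys = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]


def knn_predict(train_xs, train_ys, k, x):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(zip(train_xs, train_ys), key=lambda p: dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k):
    """Hold out each point in turn and classify it from the remaining points."""
    correct = 0
    for i in range(len(xs)):
        rest_xs = xs[:i] + xs[i + 1:]
        rest_ys = ys[:i] + ys[i + 1:]
        if knn_predict(rest_xs, rest_ys, k, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Compare a few odd values of k (odd values avoid tied votes with two classes)
accuracies = {k: leave_one_out_accuracy(xs, ys, k) for k in (1, 3, 5, 7)}
for k, accuracy in accuracies.items():
    print(f"k={k}: accuracy={accuracy:.2f}")
```

With only 17 data points the accuracies are rough estimates, so treat them as a guide for comparing values of k rather than as a definitive ranking.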

### Basic Flow of KNN

Import the kNN module from Biopython

```
from Bio import kNN
```

Define a list in which each entry holds two values for a gene pair: the distance between the two genes and the similarity score of their expression profiles

```
# Each row: [distance, expression similarity score] for one gene pair
xs = [
    [-53, -200.78],
    [117, -267.14],
    [57, -163.47],
    [16, -190.30],
    [11, -220.94],
    [85, -193.94],
    [16, -182.71],
    [15, -180.41],
    [-26, -181.73],
    [58, -259.87],
    [126, -414.53],
    [191, -249.57],
    [113, -265.28],
    [145, -312.99],
    [154, -213.83],
    [147, -380.85],
    [93, -291.13],
]
```

Define a list that specifies whether each gene pair belongs to the same operon (1) or to different operons (0)

```
ys = [
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
]
```

Define the number of nearest neighbors. Choosing an odd number for k lets you avoid tied votes.

```
k = 3
```

Create and initialize a k-nearest neighbors model

The function name `train` is a bit misleading, since no model training is done.

The function simply stores `xs`, `ys`, and `k` in `model`.

```
model = kNN.train(xs, ys, k)
```

Use the k-nearest neighbors model for classification

Classify yxcE and yxcD

```
pair1 = [6, -173.143442352]
kNN.classify(model, pair1)
```

```
Out[1]: 1
```

Classify yxiB and yxiA

```
pair2 = [309, -271.005880394]
kNN.classify(model, pair2)
```

```
Out[2]: 0
```

Both predictions are consistent with the results from logistic regression.

Hooray! Let's celebrate again!!!
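To see what the classification calls above are doing conceptually, here is a minimal pure-Python sketch. It assumes Euclidean distance and one equal vote per neighbor; Biopython's own implementation may differ in its details. The helper names `nearest_neighbors` and `majority_vote` are hypothetical and exist only for this illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

# Same training data as above: [distance, expression similarity score]
xs = [[-53, -200.78], [117, -267.14], [57, -163.47], [16, -190.30],
      [11, -220.94], [85, -193.94], [16, -182.71], [15, -180.41],
      [-26, -181.73], [58, -259.87], [126, -414.53], [191, -249.57],
      [113, -265.28], [145, -312.99], [154, -213.83], [147, -380.85],
      [93, -291.13]]
ys = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]


def nearest_neighbors(xs, ys, k, x):
    """Return the k (distance, label) pairs closest to x, nearest first."""
    by_distance = sorted((dist(point, x), label) for point, label in zip(xs, ys))
    return by_distance[:k]


def majority_vote(neighbors):
    """Return the label that occurs most often among the neighbors."""
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


k = 3
for name, pair in [("yxcE, yxcD", [6, -173.143442352]),
                   ("yxiB, yxiA", [309, -271.005880394])]:
    neighbors = nearest_neighbors(xs, ys, k, pair)
    print(name, "neighbor labels:", [label for _, label in neighbors],
          "-> class", majority_vote(neighbors))
```

Printing the neighbor labels makes the vote visible: for these two gene pairs, all three nearest neighbors share one label, so the sketch arrives at the same classes as the Biopython calls above.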

### Fancier Analyses

To run the code in this section, the following key steps must be run ahead of time:

1. Import the kNN module from Biopython.
2. Define the list `xs` containing, for each gene pair, the distance between the two genes and the similarity score of their expression profiles.
3. Define the list `ys` that specifies whether each gene pair belongs to the same operon (1) or to different operons (0).
4. Define `k` and create the model with `kNN.train(xs, ys, k)`.

#### Print Classification Output in Intuitive Ways

Classify yxcE and yxcD

```
pair1 = [6, -173.143442352]
print("yxcE, yxcD:", kNN.classify(model, pair1))
```

Output:

```
yxcE, yxcD: 1
```

Classify yxiB and yxiA

```
pair2 = [309, -271.005880394]
print("yxiB, yxiA:", kNN.classify(model, pair2))
```

Output:

```
yxiB, yxiA: 0
```
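The numeric labels can also be turned into words. The sketch below is one possible way to do it: `LABELS` and `describe` are hypothetical names introduced here, and the label meanings (1 for the same operon, 0 for different operons) come from the definition of `ys` earlier on this page.

```python
# Meaning of each class label, as defined for ys on this page
LABELS = {1: "same operon", 0: "different operons"}


def describe(pair_name, label):
    """Format a predicted class label as a readable sentence fragment."""
    return f"{pair_name}: {LABELS[label]}"


# The labels 1 and 0 are the predictions shown above for the two gene pairs
print(describe("yxcE, yxcD", 1))
print(describe("yxiB, yxiA", 0))
```

In practice, the second argument would be the value returned by `kNN.classify(model, pair1)` rather than a hard-coded number.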


