K-Nearest Neighbors
The k-nearest neighbors method is a supervised learning approach that does not need to fit a model to the data.
Data points are classified based on the categories of the k nearest neighbors in the training data set.
In Biopython, the k-nearest neighbors method is available in Bio.kNN.
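To make the voting rule concrete, here is a minimal plain-Python sketch. It is not Biopython's implementation: the function name knn_classify, the choice of Euclidean distance, and all data values are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(xs, ys, k, x):
    """Return the majority label among the k training points nearest to x."""
    # Euclidean distance from x to every training point, paired with its label.
    ranked = sorted((math.dist(point, x), label) for point, label in zip(xs, ys))
    # Let the k closest points vote, one vote each.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training data: two features per point, class label 1 or 0.
xs = [[-50, -200.0], [20, -180.0], [10, -190.0], [60, -165.0],
      [130, -410.0], [190, -250.0], [150, -310.0], [145, -380.0]]
ys = [1, 1, 1, 1, 0, 0, 0, 0]

near_class_1 = knn_classify(xs, ys, 3, [6, -173.0])
near_class_0 = knn_classify(xs, ys, 3, [300, -270.0])
print(near_class_1, near_class_0)
```

Nothing is fitted in advance: every call ranks the whole training set against the query point.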
k is the number of nearest neighbors considered for the classification.
For classification into two classes, choosing an odd number for k lets you avoid tied votes.
There are no exact physical or biological rules that define the best value of k.
Apply different values of k to one part of the training data, then evaluate the performance on the other part of the training data.
Low values of k, such as 1 or 2, can be noisy and sensitive to outliers.
High values of k smooth over noise, but k should not be too large, because small clusters will be outweighed by other, larger clusters.
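The advice above on picking k can be sketched as a simple hold-out check in plain Python. This is an illustration under assumptions: the helper names knn_classify and holdout_accuracy are hypothetical, the split is done by hand, and the data values are made up (one training point is deliberately mislabeled to act as an outlier).

```python
import math
from collections import Counter


def knn_classify(xs, ys, k, x):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    ranked = sorted((math.dist(p, x), label) for p, label in zip(xs, ys))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def holdout_accuracy(fit_xs, fit_ys, held_xs, held_ys, k):
    """Fraction of held-out points whose predicted label matches the true one."""
    hits = sum(knn_classify(fit_xs, fit_ys, k, x) == y
               for x, y in zip(held_xs, held_ys))
    return hits / len(held_ys)


# Made-up training data, split by hand into two parts.
# The last point in fit_xs is a deliberate outlier: it carries label 1
# although it sits among the class-0 points.
fit_xs = [[-50, -200.0], [20, -180.0], [60, -165.0],
          [130, -410.0], [190, -250.0], [150, -310.0], [125, -275.0]]
fit_ys = [1, 1, 1, 0, 0, 0, 1]
held_xs = [[10, -190.0], [15, -185.0], [145, -380.0], [120, -270.0]]
held_ys = [1, 1, 0, 0]

# Odd candidates only, so a two-class vote can never tie.
scores = {k: holdout_accuracy(fit_xs, fit_ys, held_xs, held_ys, k)
          for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With these made-up values, k = 1 is misled by the outlier and k = 5 lets the larger class outvote the three class-0 points, so k = 3 scores best on the held-out part. Real data will not separate this cleanly, and a single split is a rough guide only.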
Import the kNN module from Biopython
Define a list containing, for each gene pair, the distance between the two genes and the similarity score of their expression profiles
Define a list that specifies whether each gene pair belongs to the same operon (1) or to different operons (0)
Define the number of nearest neighbors. Choosing an odd number for k lets you avoid tied votes
Create and initialize a k-nearest neighbors model
The function name train is a bit misleading, since no model training is actually done.
This function simply stores xs, ys, and k in model.
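The point that train only stores its arguments can be illustrated with a plain-Python stand-in. This is not the source of Bio.kNN: the class StoredModel and the functions train and classify below are hypothetical names that merely mirror the steps in this section, and the data values are made up.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class StoredModel:
    """Hypothetical stand-in for a kNN model: it holds data, nothing learned."""
    xs: list
    ys: list
    k: int


def train(xs, ys, k):
    """'Training' is only bookkeeping: keep xs, ys and k for later."""
    return StoredModel(xs=xs, ys=ys, k=k)


def classify(model, x):
    """All the real work happens here, when a query point arrives."""
    ranked = sorted((math.dist(p, x), y) for p, y in zip(model.xs, model.ys))
    votes = Counter(y for _, y in ranked[:model.k])
    return votes.most_common(1)[0][0]


# Made-up values for illustration, not the data set used in this section.
xs = [[-50, -200.0], [20, -180.0], [10, -190.0], [60, -165.0],
      [130, -410.0], [190, -250.0], [150, -310.0], [145, -380.0]]
ys = [1, 1, 1, 1, 0, 0, 0, 0]

model = train(xs, ys, 3)
print(model.xs is xs, model.ys is ys, model.k)  # the inputs, kept as passed in
print(classify(model, [6, -173.0]))
```

Because nothing is computed at training time, the cost of the method is paid at every classification, which scans the full stored data set.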
Using a k-nearest neighbors model for classification
Classify yxcE and yxcD
Classify yxiB and yxiA
These classifications are consistent with the results from logistic regression.
Hooray! Let's celebrate again!!!
To run the code in this section, the following key steps must be run ahead of time:
Import the kNN module from Biopython
Define a list containing, for each gene pair, the distance between the two genes and the similarity score of their expression profiles
Define a list that specifies whether each gene pair belongs to the same operon (1) or to different operons (0)
Classify yxcE and yxcD
Output:
Classify yxiB and yxiA
Output: