# The Logistic Regression Model

### Quick Review

<figure><img src="/files/jIXkFgF51YQ5nkh2VnlC" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/Vxx3drscgb5KlcNeE1iq" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/vgh3OiYsNr07Z7gCFzEn" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/cO9JgrQ5LLZsoUTEfFfC" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/TFiU52BnqwLUSQZxmKLQ" alt=""><figcaption></figcaption></figure>

### Basic Flow

Import LogisticRegression module from Biopython

```
 
from Bio import LogisticRegression
```

Define a list containing the distance and the score of similarity in expression profile between the 2 genes

```
xs =      [[-53, -200.78],
          [117, -267.14],
          [57, -163.47],
          [16, -190.30],
          [11, -220.94],
          [85, -193.94],
          [16, -182.71],
          [15, -180.41],
          [-26, -181.73],
          [58, -259.87],
          [126, -414.53],
          [191, -249.57],
          [113, -265.28],
          [145, -312.99],
          [154, -213.83],
          [147, -380.85],
          [93, -291.13]]
```

Define a list specifies if the gene pair belongs to the same operon (1) or different operons (0)

```
ys =     [1,
          1,
          1,
          1,
          1,
          1,
          1,
          1,
          1,
          1,
          0,
          0,
          0,
          0,
          0,
          0,
          0]
```

Run logistic regression on the training data to esitmate coefficients

```
model = LogisticRegression.train(xs, ys)
```

Output the weights estimated from logistic regression training model

```
model.beta
```

```
Out[1]: [8.983029015714463, -0.03596896044485087, 0.02181395662983518]
```

Using the logistic regression model for classification

Let's predict gene pair yxcE — yxcD

```
LogisticRegression.classify(model, [6, -173.143442352])
```

```
Out[2]: 1
```

Let's predict gene pair yxiB — yxiA

```
LogisticRegression.classify(model, [309, -271.005880394])
```

```
Out[3]: 0
```

The above prediction agrees with the biological literature

Hooray! Let's celebrate!!!

### Fancier Analyses

To run the code in this section, the following keys steps must be run ahead of time

1. Import LogisticRegression module from Biopython
2. Define a list containing the distance and the score of similarity in expression profile between the 2 genes
3. Define a list specifies if the gene pair belongs to the same operon (1) or different operons (0)

#### Using arguments in train function

We can use update\_fn argument in the train function to track the progress of the model calculation

First define a fundtion show\_progress to print the desired outputs

```
def show_progress(iteration, loglikelihood):
        print("Log-likelihood function:", loglikelihood)
```

Then call train function with update\_fn argument to output log-likelihood at each iteration

```
model = LogisticRegression.train(xs, ys, update_fn=show_progress)
```

The iteration stops once the increase in the log-likelihood function is less than 0.01. If no convergence is reached after 500 iterations, the train function returns with an AssertionError.

<br>

We can also apply typecode argument in train function to save memory for gigantic data it may be necessary to use single-precision floats rather than double

```
model = LogisticRegression.train(xs, ys, typecode="float")
```

Take a look at the estimated weights

```
model.beta
```

```
Out[1]: [8.983029015714463, -0.03596896044485087, 0.02181395662983518]
```

#### Print the classification output in more intuitive ways

Print the classification results in more intuitive ways

```
print("yxcE, yxcD:", LogisticRegression.classify(model, [6, -173.143442352]))
```

Output

```
yxcE, yxcD: 1
```

```
print("yxiB, yxiA:", LogisticRegression.classify(model, [309, -271.005880394]))
```

Output

```
yxiB, yxiA: 0
```

#### Obtain the probabilities

Obtain the probabilities to find out how confident we can be in these predictions

```
q, p = LogisticRegression.calculate(model, [309, -271.005880394])
print("class OP: probability =", p, "class NOP: probability =", q)
```

Output

```
class OP: probability = 0.00032121125181733316 class NOP: probability = 0.9996787887481826
```

#### Prediction Accuracy

To get some idea of the prediction accuracy of the logistic regression model, we can apply it to the training data:

```
for i in range(len(ys)):
        print("True:", ys[i], "Predicted:", LogisticRegression.classify(model, xs[i]))
```

Output

```
True: 1 Predicted: 1
True: 1 Predicted: 0
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
```

A more reliable estimate of the prediction accuracy can be found from a leave-one-out analysis the model is recalculated from the training data after removing the gene to be predicted:

```
for i in range(len(ys)):
        model = LogisticRegression.train(xs[:i]+xs[i+1:], ys[:i]+ys[i+1:])
        print("True:", ys[i], "Predicted:", LogisticRegression.classify(model, xs[i]))
```

Output

```
True: 1 Predicted: 1
True: 1 Predicted: 0
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 1
True: 0 Predicted: 0
True: 0 Predicted: 0
```

#### Using optional arguments in classify function to specify distance and weight&#x20;

By default, the Euclidean distance is used.

<figure><img src="/files/ceXX4f0flPRICvYaHEof" alt=""><figcaption></figcaption></figure>

Instead, we could use the city-block (Manhattan) distance:

```
def cityblock(x1, x2):
    assert len(x1)==2
    assert len(x2)==2
    distance = abs(x1[0]-x2[0]) + abs(x1[1]-x2[1])
    return distance

pair1=[6, -173.143442352]
print("yxcE, yxcD:", kNN.classify(model, pair1, distance_fn = cityblock))
```

Output:

```
yxcE, yxcD: 1
```

Using weight function to assign higher weight for closer pairs

<figure><img src="/files/YXZ0msyQTIrE6BNJI0Ma" alt=""><figcaption></figcaption></figure>

```
def weight(x1, x2):
    assert len(x1)==2
    assert len(x2)==2
    return math.exp(-abs(x1[0]-x2[0]) - abs(x1[1]-x2[1]))

x = [6, -173.143442352]
print("yxcE, yxcD:", kNN.classify(model, x, weight_fn = weight))
```

Output:

```
yxcE, yxcD: 1
```

## Call the calculate function to calculate the confidence in predictions <a href="#firstheading" id="firstheading"></a>

This will calculate the total weight assigned to the classes OP and NOP

```
pair1 = [6, -173.143442352]
weight = kNN.calculate(model, pair1)
print("class NOP: weight =", weight[0], "class OP: weight =", weight[1])
```

Output:

```
class NOP: weight = 0.0 class OP: weight = 3.0
```

```
pair2 = [117, -267.14]
weight = kNN.calculate(model, pair2)
print("class NOP: weight =", weight[0], "class OP: weight =", weight[1])
```

Output:

```
class NOP: weight = 2.0 class OP: weight = 1.0
```

#### Prediction Accuracy

```
for i in range(len(ys)):
        print("True:", ys[i], "Predicted:", kNN.classify(model, xs[i]))
```

Output:

```
True: 1 Predicted: 1
True: 1 Predicted: 0
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
```

The prediction is correct for all but two of the gene pairs

\
A more reliable estimate of the prediction accuracy can be found from a leave-one-out analysis The model is recalculated from the training data after removing the gene to be predicted:

```
k = 3
for i in range(len(ys)):
    model = kNN.train(xs[:i]+xs[i+1:], ys[:i]+ys[i+1:], k)
    print("True:", ys[i], "Predicted:", kNN.classify(model, xs[i]))
```

Output:

```
True: 1 Predicted: 1
True: 1 Predicted: 0
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 1
True: 1 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 1
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 0
True: 0 Predicted: 1
```

The prediction is correct for 13 out of 17 gene pairs, 76%


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://igb.mit.edu/mini-courses/python/machine-learning-with-python/hands-on/supervised-approaches/the-logistic-regression-model.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
