Logistic regression is a powerful technique for categorical prediction tasks in data science. Data professionals often use metrics such as precision, recall, and accuracy, as well as visualizations such as ROC curves, to gauge the performance of their logistic regression models. It is important to evaluate the performance of a model, as this shows how well the model can make predictions.
Some of these metrics are based on a confusion matrix. A confusion matrix is a tabular (often plotted) representation of how accurate a classifier is at predicting the labels for a categorical variable, built from four counts: true negatives (TN), false positives (FP), false negatives (FN), and true positives (TP). It helps summarize the performance of a classifier.
|               | Predicted label: 0 | Predicted label: 1 |
| ------------- | ------------------ | ------------------ |
| True label: 0 | TN                 | FP                 |
| True label: 1 | FN                 | TP                 |
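As a minimal sketch of how to build one in scikit-learn (assuming y_test holds the true labels and y_pred holds the model's predicted labels, as in the code snippets later in this post):
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# Rows are true labels, columns are predicted labels: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(confusion_matrix=cm).plot()
plt.show()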
Numbers in a confusion matrix may appear in scientific notation. Briefly, e+03 means multiplying the value by 10³, or 1000. Similarly, e-02 means multiplying the value by 10⁻², or 0.01.
- 9.230705e-01 = 0.9230705
- 8.339610e+03 = 8339.610
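A quick way to see this conversion is Python's e format specifier, which prints a number in scientific notation:
# The two example values above, printed in scientific notation
print(f"{0.9230705:e}")  # 9.230705e-01
print(f"{8339.610:e}")   # 8.339610e+03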
Key evaluation metrics
Precision, Recall and Accuracy
Precision measures the proportion of positive predictions that were true positives.
Precision = True Positives / (True Positives + False Positives)
import sklearn.metrics as metrics
metrics.precision_score(y_test, y_pred)
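For example, if a hypothetical model makes 5 positive predictions and 3 of them are correct (TP = 3, FP = 2), its precision is 3 / (3 + 2) = 0.6.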
Recall measures the proportion of positives the model was able to identify correctly.
Recall = True Positives / (True Positives + False Negatives)
metrics.recall_score(y_test, y_pred)
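Continuing the same hypothetical example, if the data contain 4 actual positives and the model identifies 3 of them (TP = 3, FN = 1), recall is 3 / (3 + 1) = 0.75.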
Accuracy is the proportion of data points that were correctly categorized.
Accuracy = (True Positives + True Negatives) / Total Predictions
metrics.accuracy_score(y_test, y_pred)
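To tie the three metrics together, here is a small self-contained check using the same made-up counts (the label arrays below are purely illustrative):
import sklearn.metrics as metrics
# Ten made-up labels: TP = 3, FN = 1, FP = 2, TN = 4
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat  = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(metrics.precision_score(y_true, y_hat))  # 3 / (3 + 2) = 0.6
print(metrics.recall_score(y_true, y_hat))     # 3 / (3 + 1) = 0.75
print(metrics.accuracy_score(y_true, y_hat))   # (3 + 4) / 10 = 0.7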
ROC and AUC
Two other common evaluation techniques that are helpful when working with classifiers are the ROC curve (receiver operating characteristic curve) and the AUC (area under the curve).
These concepts are related to thresholds, true positives, and false positives. Although we typically use a threshold of 0.5 to generate predictions, sometimes the threshold is chosen based on the scenario.
Notice that when we decrease the threshold, the true positive rate increases because we predict more observations as positive, but the false positive rate also increases.
The model’s true positive rate and false positive rate change at every threshold. For an ideal model, there would exist a threshold where the true positive rate is high and the false positive rate is low.
We can use an ROC curve and AUC to examine how the true positive and false positive rate change together at every threshold.
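As a minimal sketch of this effect (assuming a fitted scikit-learn classifier clf and a test set X_test, y_test; the variable names are illustrative, not from the original material):
from sklearn.metrics import confusion_matrix
# Predicted probability of the positive class for each observation
y_prob = clf.predict_proba(X_test)[:, 1]
# Lowering the threshold predicts more observations as positive,
# which raises both the true positive rate and the false positive rate
for threshold in [0.7, 0.5, 0.3]:
    y_pred_t = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred_t).ravel()
    print(f"threshold={threshold}: TPR={tp / (tp + fn):.2f}, FPR={fp / (fp + tn):.2f}")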
ROC curves
To visualize the performance of a classifier at different classification thresholds, we can graph an ROC curve. In the context of binary classification, a classification threshold is a cutoff for differentiating the positive class from the negative class.
An ROC curve plots two key quantities:
True Positive Rate: equivalent to Recall. Recall measures the proportion of data points that are predicted as True, out of all the data points that are actually True.
True Positive Rate = True Positives / (True Positives + False Negatives)
False Positive Rate: the proportion of actual negatives (observations that should be predicted as False) that the model incorrectly predicts as True.
False Positive Rate = False Positives / (False Positives + True Negatives)
For each point on the curve, the x and y coordinates represent the False Positive Rate and the True Positive Rate respectively at the corresponding threshold.
We can examine an ROC curve to observe how the False Positive Rate and True Positive Rate change together over the different thresholds. In the ROC curve for an ideal model, there would exist a threshold at which the True Positive Rate is high and the False Positive Rate is low. The more that the ROC curve hugs the top left corner of the plot, the better the model does at classifying the data.
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay
RocCurveDisplay.from_predictions(y_test, y_pred)
plt.show()
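One note on the snippet above: from_predictions accepts either hard 0/1 predictions or scores. Passing the predicted probability of the positive class (sketched below with an assumed fitted classifier clf) traces the curve across all thresholds rather than a single operating point:
# clf is assumed to be a fitted binary classifier with predict_proba, e.g. LogisticRegression
y_prob = clf.predict_proba(X_test)[:, 1]
RocCurveDisplay.from_predictions(y_test, y_prob)
plt.show()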
AUC (Area under the curve)
AUC stands for area under the ROC curve. AUC provides an aggregate measure of performance across all possible classification thresholds. AUC ranges in value from 0.0 to 1.0.
An AUC smaller than 0.5 indicates that the model performs worse than a random classifier (i.e. a classifier that randomly assigns each example to True or False), and an AUC larger than 0.5 indicates that the model performs better than a random classifier.
In Python we can get the AUC score with the simple line of code below:
metrics.roc_auc_score(y_test, y_pred)
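As with the ROC curve, roc_auc_score typically expects scores (probabilities or decision-function values) rather than hard labels; with the assumed fitted model clf from the sketches above, the call would look like:
metrics.roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])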
Disclaimer: Like most of my posts, this content is intended solely for educational purposes and was created primarily for my personal reference. At times, I may rephrase original texts, and in some cases, I include materials such as graphs, equations, and datasets directly from their original sources.
I typically reference a variety of sources and update my posts whenever new or related information becomes available. For this particular post, the primary source was Google Advanced Data Analytics Professional Certificate.