Introduction
A Confusion Matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.
True Positive: Number of positive examples classified correctly
False Negative: Number of positive examples classified incorrectly
False Positive: Number of negative examples classified incorrectly
True Negative: Number of negative examples classified correctly
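To make these four counts concrete, here is a minimal sketch in plain Python. The label lists `y_true` and `y_pred` are hypothetical, with 1 as the positive class and 0 as the negative class:

```python
# Hypothetical actual and predicted labels (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each of the four cells by comparing actual and predicted labels.
TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(TP, FN, FP, TN)  # 3 1 1 3
```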
Classification models have multiple output categories. Most error measures report only the total error of the model; on their own, they cannot tell us what kinds of errors the model is making or on which classes it makes them.
During classification, we also have to overcome the limitations of accuracy, which can be misleading for classification problems: if there is a significant class imbalance, a model can predict the majority class for every case and still achieve a high accuracy score, as the sketch below illustrates.
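The sketch below shows this pitfall on a hypothetical dataset of 95 negative and 5 positive examples; the always-negative "model" is an assumption used only for illustration:

```python
# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority (negative) class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95 -- looks impressive, yet every positive example is missed
```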
Why do you need a Confusion Matrix?
A confusion matrix shows where a classification model gets confused when it makes predictions. It gives you insight not only into the errors your classifier is making but also into the types of errors being made, and this breakdown helps you overcome the limitation of relying on classification accuracy alone. Each column of the confusion matrix represents the instances of a predicted class, and each row represents the instances of an actual class.
The four quadrants of a Confusion Matrix
In layman's terms:
True Positive: You predicted positive and it turned out to be true. For example, you predicted that France would win the World Cup, and it won.
True Negative: You predicted negative and it turned out to be true. You predicted that England would not win, and it lost.
False Positive: You predicted positive and it turned out to be false. You predicted that England would win, but it lost.
False Negative: You predicted negative and it turned out to be false. You predicted that France would not win, but it won.
With respect to a model:
True Positive (TP): The predicted value matches the actual value. The actual value was positive and the model predicted a positive value.
True Negative (TN): The predicted value matches the actual value. The actual value was negative and the model predicted a negative value.
False Positive (FP): The predicted value does not match the actual value. The actual value was negative, but the model predicted a positive value. Also known as a Type 1 error.
False Negative (FN): The predicted value does not match the actual value. The actual value was positive, but the model predicted a negative value. Also known as a Type 2 error.
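If scikit-learn is available, these four values can be read directly off its confusion matrix. The sketch below uses the same hypothetical labels as before; for a binary problem, `confusion_matrix` places actual classes in rows and predicted classes in columns, so `ravel()` returns the cells in the order TN, FP, FN, TP:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```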
Calculations using Confusion Matrix
Classification Measures
These measures are, in essence, an extended view of the confusion matrix: derived from its four cells, they help us achieve a better understanding and analysis of our model and its performance. The most common ones are:
Accuracy
Precision
Recall (TPR, Sensitivity)
F1-Score
FPR (Type I Error)
FNR (Type II Error)
Precision
Precision answers the question: out of all the instances the model predicted as positive, how many were actually positive? It is defined as the ratio of the number of correctly classified positive instances (TP) to the total number of instances predicted as positive (TP + FP). Precision should be as high as possible (ideally 1). It is a useful metric in cases where false positives are a bigger concern than false negatives. It can be calculated using the formula below:
$$Precision = \frac{TP}{TP + FP}$$
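As a quick sketch, the formula can be evaluated directly from the cell counts; the numbers below come from the cat/dog example at the end of this article (TP = 6, FP = 2):

```python
# Precision from the cat/dog example counts.
TP, FP = 6, 2
precision = TP / (TP + FP)
print(precision)  # 0.75
```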
Recall
Recall answers the question: out of all the actual positive instances, how many did the model predict correctly? It is defined as the ratio of the number of correctly classified positive instances (TP) to the total number of actual positive instances (TP + FN), i.e. it measures how many observations of the positive class are actually predicted as positive. Recall is also known as Sensitivity or the True Positive Rate (TPR). It should be as high as possible (ideally 1) and is the right choice of evaluation metric when we want to capture as many positives as possible, i.e. when false negatives are a bigger concern than false positives. It can be calculated using the formula below:
$$Recall = \frac{TP}{TP + FN}$$
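Using the same example counts (TP = 6, FN = 1), recall works out as follows:

```python
# Recall from the cat/dog example counts.
TP, FN = 6, 1
recall = TP / (TP + FN)
print(recall)  # 0.857...
```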
Sensitivity and Specificity
In statistics, there are two other related evaluation measures:
Sensitivity: Same as TPR.
Specificity: Also called True Negative Rate (TNR).
$$TNR = \frac{TN}{TN + FP}$$
$$FPR = 1 - Specificity = \frac{FP}{FP + TN}$$
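With the same example counts (TN = 11, FP = 2), specificity and FPR work out as follows:

```python
# Specificity (TNR) and FPR from the cat/dog example counts.
TN, FP = 11, 2
specificity = TN / (TN + FP)   # ~0.846
fpr = FP / (FP + TN)           # ~0.154, equal to 1 - specificity
print(specificity, fpr)
```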
F-measure / F1-score
If one model has low precision and high recall, and another has the opposite, it is difficult to compare them directly. For this purpose we can use the F-score, which evaluates precision and recall at the same time. The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall; for a given arithmetic mean of the two, it is highest when recall equals precision. We use the harmonic mean rather than a simple average because it is pulled toward the smaller of the two values: if either precision or recall is low, the F1 score is also low. In this way the F1 score captures both trends in a single value and maintains a balance between precision and recall for your classifier. The F-score should be high (ideally 1). It can be calculated using the formula below:
$$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
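Continuing the same example, the F1 score combines the precision (0.75) and recall (6/7) computed above:

```python
# F1 score as the harmonic mean of precision and recall.
precision, recall = 0.75, 6 / 7
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # 0.8
```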
Classification Accuracy
Accuracy is one of the most common metrics for classification problems. It measures how often the model predicts the correct output and is calculated as the ratio of the number of correct predictions made by the classifier to the total number of predictions made. The formula is given below:
$$Accuracy = \frac{TP + TN}{TP + FP + FN + TN}$$
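With the example counts, accuracy works out to:

```python
# Accuracy from the cat/dog example counts.
TP, TN, FP, FN = 6, 11, 2, 1
accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.85
```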
Misclassification/Error Rate
Also termed the error rate, this metric measures how often the model makes wrong predictions. It is calculated as the ratio of the number of incorrect predictions to the total number of predictions made by the classifier. The formula is given below:
$$Error\ Rate = \frac{FP + FN}{TP + FP + FN + TN}$$
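And the corresponding error rate, which is simply 1 minus accuracy:

```python
# Error rate from the cat/dog example counts.
TP, TN, FP, FN = 6, 11, 2, 1
error_rate = (FP + FN) / (TP + TN + FP + FN)
print(error_rate)  # 0.15
```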
Example
We have a total of 20 animals (cats and dogs), and our model predicts whether each animal is a cat or not, with 'cat' as the positive class.
Actual values = [‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’]
Predicted values = [‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘cat’, ‘cat’, ‘cat’, ‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’]
True Positive (TP) = 6: You predicted positive and it's true. You predicted that the animal is a cat, and it actually is.
True Negative (TN) = 11: You predicted negative and it's true. You predicted that the animal is not a cat, and it actually is not (it's a dog).
False Positive (Type 1 Error) (FP) = 2: You predicted positive and it's false. You predicted that the animal is a cat, but it actually is not (it's a dog).
False Negative (Type 2 Error) (FN) = 1: You predicted negative and it's false. You predicted that the animal is not a cat, but it actually is.
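Putting it all together, here is a sketch that reproduces these counts and the metrics above with scikit-learn, assuming it is installed; 'cat' is treated as the positive class:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

actual    = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog',
             'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat',
             'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']

# Rows are actual classes, columns are predicted classes.
# With labels=['cat', 'dog'] the matrix reads [[TP, FN], [FP, TN]] for the 'cat' class.
print(confusion_matrix(actual, predicted, labels=['cat', 'dog']))
# [[ 6  1]
#  [ 2 11]]

print(accuracy_score(actual, predicted))                    # (6 + 11) / 20 = 0.85
print(precision_score(actual, predicted, pos_label='cat'))  # 6 / (6 + 2) = 0.75
print(recall_score(actual, predicted, pos_label='cat'))     # 6 / (6 + 1) = 0.857...
print(f1_score(actual, predicted, pos_label='cat'))         # 0.8
```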