A 'Confusion Matrix' is a tool for expressing the performance of a classification algorithm. It tells us how good or bad the algorithm is by capturing the success and failure cases in a single table, and the ratios between these numbers give us a good perspective on the algorithm's characteristics.
Usually, the confusion matrix is written in this format, with rows representing the actual classes and columns the predicted classes:

|                | Predicted False/0 | Predicted True/1 |
|----------------|-------------------|------------------|
| Actual False/0 | True Negative     | False Positive   |
| Actual True/1  | False Negative    | True Positive    |
The layout above is the way the confusion matrix is usually represented, but I have seen many occasions where people arrange the four cells differently. Also, in the confusion matrix, False/0 usually represents the majority class and True/1 the minority class. One consequence of this: an easy way to get greater than 50% accuracy is to simply predict all samples as False/0. This is called the null accuracy. The goal of a machine learning algorithm is to characterise and identify the minority class of the population using the available sample data, and to do better than the null accuracy.
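To make the idea concrete, here is a minimal sketch of null accuracy using a small, made-up set of labels (the data is purely illustrative):

```python
# Hypothetical binary labels: 0 is the majority class, 1 the minority.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

# Null accuracy: the accuracy obtained by always predicting
# the majority class for every sample.
majority_count = max(y_true.count(0), y_true.count(1))
null_accuracy = majority_count / len(y_true)
print(null_accuracy)  # 0.7
```

Any model trained on this data has to beat 0.7 to be doing anything useful.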
The outcome of a prediction is written as a combination of two words: True/False + Positive/Negative:
- True Positive
- True Negative
- False Positive
- False Negative
The first word, True/False, represents whether the prediction was correct: did the prediction match the actual class or not? The second word represents the predicted class, with Negative meaning the majority class and Positive meaning the minority class.
So the interpretation of the outcome will be as follows:
- True Positive:- Correctly classified as 1
- True Negative:- Correctly classified as 0
- False Positive:- Incorrectly (falsely) classified as 1
- False Negative:- Incorrectly (falsely) classified as 0
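The four outcomes above can be counted directly from a list of true labels and predictions. A minimal sketch, with made-up labels (1 = minority/positive class):

```python
# Hypothetical true labels and model predictions.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly classified as 1
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly classified as 0
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # incorrectly classified as 1
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # incorrectly classified as 0
print(tp, tn, fp, fn)  # 3 5 1 1
```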
To judge the performance of a classification algorithm, people usually use accuracy as a metric. Accuracy is simply the sum of all correct predictions divided by the total number of predictions.
Accuracy = (TP + TN)/(TP + TN + FP + FN)
In the confusion matrix layouts mentioned above, we can observe that the "True" counts always sit on the matrix diagonal. A simple way to calculate accuracy without interpreting the matrix axes would be:
Accuracy = sum(matrix diagonal)/sum(matrix)
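The diagonal trick above can be sketched in a few lines; the matrix values here are invented for illustration:

```python
# Hypothetical 2x2 confusion matrix: rows = actual, columns = predicted.
matrix = [[50, 10],   # actual 0: 50 true negatives, 10 false positives
          [5,  35]]   # actual 1: 5 false negatives, 35 true positives

diagonal = sum(matrix[i][i] for i in range(len(matrix)))  # 50 + 35 = 85
total = sum(sum(row) for row in matrix)                   # 100
accuracy = diagonal / total
print(accuracy)  # 0.85
```

Note this works for multi-class confusion matrices too, since correct predictions always land on the diagonal.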
Apart from accuracy, the precision and recall metrics are equally important, and based on the use case one can decide which metric matters more: is one trying to maximize hits or minimize misses, and which errors are more costly?
Unlike accuracy, precision and recall cannot be computed without knowing the matrix axes. But since we know the true values are on the matrix diagonal, one can easily locate the False Positives and False Negatives. The formulas for precision and recall are given below:
Precision = TP/(TP+FP)
Recall = TP/(TP+FN)
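The two formulas translate directly into code; the counts below are hypothetical, as if read off a confusion matrix:

```python
# Hypothetical counts taken from a confusion matrix.
tp, fp, fn = 35, 10, 5

precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was found
print(round(precision, 3), round(recall, 3))  # 0.778 0.875
```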
In other words, precision relates True Positives to False Positives, i.e., Type I errors, while recall relates True Positives to False Negatives, i.e., Type II errors. Usually, in life-threatening use cases, Type II errors matter more, and hence maximizing recall (minimizing False Negatives) would be the goal.
There are also use cases where Type I and Type II errors are both important; in that case we have the F1 score, which is expressed as the harmonic mean of precision and recall.
F1 = (2 * Precision * Recall)/(Precision + Recall)
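Putting the three metrics together in one sketch, again with hypothetical counts:

```python
# Hypothetical counts taken from a confusion matrix.
tp, fp, fn = 35, 10, 5

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# F1: harmonic mean of precision and recall, which punishes
# a large imbalance between the two metrics.
f1 = (2 * precision * recall) / (precision + recall)
print(round(f1, 4))  # 0.8235
```

Algebraically this is equivalent to 2*TP / (2*TP + FP + FN), which makes it clear that true negatives play no role in F1.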