👉 Note: Reading: Hands-On ML - Chap 3: Classification (section “Confusion Matrices”)
- True Positive (TP): what we predict Positive is really Positive.
- True Negative (TN): what we predict Negative is really Negative.
- False Negative (FN): what we predict Negative is actually Positive.
- False Positive (FP): what we predict Positive is actually Negative.
- True/False indicates what we predicted is right/wrong.
- Positive/Negative is what we predicted (yes or no).
- FP = Type I error = rejection of a true null hypothesis = a negative case predicted wrongly = what we predict positive is actually negative.
- FN = Type II error = non-rejection of a false null hypothesis = a positive case predicted wrongly = what we predict negative is actually positive. (A quick scikit-learn check of these four counts follows this list.)
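As that quick check, scikit-learn's `confusion_matrix()` can be unpacked into the four counts for a binary problem. A minimal sketch, using toy label lists `y_true` and `y_pred` made up for illustration:

```python
from sklearn.metrics import confusion_matrix

# toy binary labels: 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# for binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # prints: 3 1 1 3
```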
Precision and recall give a more general view of our model: is it really good?
- Precision: How many of our positive predictions are really true? (It checks the accuracy of our positive predictions; see the formulas after this list.)
- Recall: How many of the actual positive cases do we catch? (Do we miss some actual positives?)
- Precision matters most when a "wrongly predicted yes" (FP) is costly (e.g., Is this email spam? — the model says yes but it is actually no, and we lose important emails!).
- Recall (Sensitivity) matters most when a "wrongly predicted no" (FN) is costly (e.g., in banking: Is this transaction fraudulent? — the model says no but it is actually yes, and we lose money!).
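For reference, the standard textbook formulas for these two metrics, written in terms of the confusion-matrix counts above:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$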
With thresholds, we can use `precision_recall_curve()` to compute precision and recall for all possible thresholds. Trade-off: higher precision means lower recall, and vice versa.
```python
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

# precision and recall for every possible decision threshold
precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

# precisions/recalls have one more element than thresholds, so drop the last value
plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
plt.legend()
plt.show()
```
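A common follow-up is to pick the lowest threshold that reaches a target precision. A minimal sketch in the spirit of Hands-On ML, reusing the `precisions`, `thresholds`, and `y_scores` variables from the block above (the 90% target is just an example, and it assumes that precision is actually reachable):

```python
import numpy as np

# index of the first threshold whose precision reaches 90%
# (assumes such a threshold exists; otherwise argmax returns 0)
idx_90 = np.argmax(precisions >= 0.90)
threshold_90_precision = thresholds[idx_90]

# classify manually with that threshold instead of the default one
y_pred_90 = (y_scores >= threshold_90_precision)
```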
High precision and low recall, or vice versa? The F1-score gives us a balance between precision and recall.
The F1-score depends on how we label the class "positive": Is this email spam? is very different from Is this email not spam?
- When you need a balance between precision and recall.
- When we have a "skewed class" problem (uneven class distribution, too many "yes" and very few "no", for example).
- If one of precision and recall improves but the other drops sharply, the F1-score stays small (the harmonic mean is dominated by the smaller value).
Normally, $F_1 \in [0, 1]$ and the higher it gets, the better our model is.
- The best value ($F_1 = 1$) is reached when both precision and recall equal $1$.
- If one of precision and recall is very small (close to $0$), $F_1$ is very small too, so our model is not good!
What if we prefer one of precision and recall over the other? We consider the $F_\beta$ score (ref); see the formulas after these bullets. $F_1$ is a special case of $F_\beta$ with $\beta = 1$:
- When precision is more important than recall, we choose $\beta < 1$ (usually $\beta = 0.5$).
- When recall is more important than precision, we choose $\beta > 1$ (usually $\beta = 2$).
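For completeness, the standard definitions behind these rules of thumb ($F_1$ is the harmonic mean of precision and recall; $F_\beta$ treats recall as $\beta$ times as important as precision):

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}, \qquad F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}$$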
- Accuracy: How many of all predictions (both positive and negative) are correct? (Formulas after this list.)
- Specificity: How many of the actual negative cases do we correctly predict as negative?
- Accuracy is used when we have symmetric (balanced) datasets.
- Specificity is used when we care about TN values and don't want false alarms from FP (e.g., a drug test).
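Again for reference, the standard formulas:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$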
- ROC = Receiver operating characteristic.
- A common tool used with binary classifiers.
- Different from the precision/recall curve, the ROC curve plots the true positive rate (recall) against the false positive rate (1 - specificity).
Trade-off: the higher the recall (TPR), the more false positives (FPR) the classifier produces.
```python
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
%matplotlib inline

# FPR and TPR for every possible threshold on the predicted probabilities
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)

# create plot
plt.plot(fpr, tpr, label='ROC curve')
plt.plot([0, 1], [0, 1], 'k--')  # dashed diagonal = random classifier
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()
```
- AUC = Area under the curve.
- A perfect classifier has AUC = 1 (its ROC curve hugs the top-left corner).
- A purely random classifier (the dashed diagonal) has AUC = 0.5. (A sketch of computing the AUC follows.)
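scikit-learn computes this area directly with `roc_auc_score()`. A minimal sketch, reusing `y_test` and `y_pred_prob` from the ROC block above:

```python
from sklearn.metrics import roc_auc_score

# area under the ROC curve: 1.0 = perfect, 0.5 = random guessing
print(roc_auc_score(y_test, y_pred_prob))
```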
When to use the PR curve and when to use the ROC curve? As a rule of thumb (from Hands-On ML), prefer the PR curve when the positive class is rare or when false positives matter more than false negatives; otherwise use the ROC curve. → Read this note.
```python
from sklearn.metrics import confusion_matrix
import numpy as np

n_classes = len(np.unique(y_true))  # number of distinct classes
confusion_matrix(y_true, y_pred, labels=range(n_classes))
```
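As a side note, a recent scikit-learn (≥ 1.0, which is an assumption about your environment) can also plot the confusion matrix directly as a heatmap:

```python
from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# plot the confusion matrix computed from the raw labels
ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
plt.show()
```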
Precision / Recall / F1-score / support
```python
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
```
- Classification: Precision and Recall - Google Developers, Machine Learning Crash Course.
- Classification: Check Your Understanding (Accuracy, Precision, Recall) - Google Developers, Machine Learning Crash Course.
- F-measure versus Accuracy - NLP blog.
- Accuracy, Precision, Recall or F1? - Koo Ping Shung, Towards Data Science.
- Dealing with Imbalanced data: undersampling, oversampling and proper cross-validation - Marco Altini.
- Accuracy, Recall, Precision, F-Score & Specificity, which to optimize on? - Salma Ghoneim, Towards Data Science.