Evaluate your binary classification model

Share this post

The purpose of this article is to give you the keys to properly evaluate your binary classification model. For that I would quickly go to the essential because the goal is not to determine how to choose such or such algorithm but to evaluate its relevance. Don’t worry because the choice aspect will of course be the subject of a future article. Let’s see from the same data set and especially from the same work on these data how we must evaluate the execution of several models.

We are already training our models

To do this we will start with the kaggle data of the Titanic (which I have already used several times in my previous articles ). Download them here if you haven’t picked them up yet. After some preparations, we will apply 3 different Machine learning algorithms:

First of all the preparation of the data:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.dummy import DummyClassifier

titanic = pd.read_csv("../titanic/data/train.csv")
def Prepare_Modele(X, cabin):
    target = X.Survived
    sexe = pd.get_dummies(X['Sex'], prefix='sex')
    cabin = pd.get_dummies(cabin.str[0], prefix='Cabin')
    age = X['Age'].fillna(X['Age'].mean())
    X = X[['Pclass', 'SibSp']].join(cabin).join(sexe).join(age)
    return X, target
cabin = titanic['Cabin'].fillna('X')
X, y = Prepare_Modele(titanic, cabin)

Once again, the objective here is not to explain how the data was prepared, I refer you for that to the article on the one-hot in particular .

Now let’s train our models, we thus obtain an overall scoring of:

Precision & Recall

An overall scoring is good but when it comes to classification it is important to go further. Indeed if the classification is binary, the errors must be evaluated more closely because the importance of a false positive will not be the same as that of a false negative. You will probably even want to play the cursor between these two types of errors. But what is a False-Positive? and who of a False-Negative?

A False-Positive is a false positive prediction, a False-Negative is a false negative prediction!

Imagine being screened for some disease. A False-Positive verdict tells you that you are sick when you are not. A False-Negative verdict tells you that you are not sick… although you are. An overall scoring can be useful but what you will really want to measure is the False-Negative rate because it is this one that can have a big impact on your prediction.

In Machine Learning parlance, you’re going to want to maximize recall!

How to calculate the recall , and very simply using this formula:

As for precision, it is its counterpart:

A third element measures the f-measure indicator , allowing you to combine the two elements into one indicator:

The precision will allow us in particular to measure the capacity of the model to refuse irrelevant results.

Retrieve all these measurement elements simply with the classification_report (scikit-learn) method.

Confusion matrix

Now how do we just get those True-False Positives-Negatives back? Well, quite simply by using the Confusion Matrix.

Building this matrix is ​​very simple because it represents precisely these 4 values:

Of course scikit-learn provides it to you very simply with the function  confusion_matrix :

from sklearn.metrics import confusion_matrix
c_dm = confusion_matrix (y, p_dm)
print ("Matrice de confusion / Dummy\n", c_dm)
c_rl = confusion_matrix (y, p_rl)
print ("Matrice de confusion / Reg. Linéaire\n", c_rl)
c_tr = confusion_matrix (y, p_tr)
print ("Matrice de confusion / Random Foret\n", c_tr)

ROC and AUC curve

In order to bring a visual touch and above all to be more relevant in the analysis, an effective tool is the ROC curve. This curve allows you to see the rate of false positives versus true positives at a glance.

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
faux_positifs_rl, vrais_positifs_rl, seuil_rl = roc_curve(y, lr1.decision_function(X))
plt.plot(faux_positifs_rl, vrais_positifs_rl, label="Régression Linéaire")
faux_positifs_tr, vrais_positifs_tr, seuil_tr = roc_curve(y, tree.predict_proba(X)[:,1])
plt.plot(faux_positifs_tr, vrais_positifs_tr, label="Random Forest")
faux_positifs_dm, vrais_positifs_dm, seuil_dm = roc_curve(y, dummy.predict_proba(X)[:,1])
plt.plot(faux_positifs_dm, vrais_positifs_dm, label="Dummy")
plt.xlabel ("Faux positifs")
plt.ylabel ("Vrais positifs")
plt.legend ()

To read it better, it is simply wrong to understand that the more the curve is squashed towards the Top-Left edge, the better the model! In our example it is rather obvious of course, but sometimes we will have to measure the area under the curve to measure precisely.

the area under the ROC curve is the AUC area.

To measure it, use scikitlearn’s metrics.auc method:

from sklearn import metrics
print ("AUC Dummy: ",metrics.auc(faux_positifs_dm, vrais_positifs_dm))
print ("AUC Reg. linéaire: ",metrics.auc(faux_positifs_rl, vrais_positifs_rl))
print ("AUC Random Forest: ",metrics.auc(faux_positifs_tr, vrais_positifs_tr))
Share this post

Benoit Cayla

In more than 15 years, I have built-up a solid experience around various integration projects (data & applications). I have, indeed, worked in nine different companies and successively adopted the vision of the service provider, the customer and the software editor. This experience, which made me almost omniscient in my field naturally led me to be involved in large-scale projects around the digitalization of business processes, mainly in such sectors like insurance and finance. Really passionate about AI (Machine Learning, NLP and Deep Learning), I joined Blue Prism in 2019 as a pre-sales solution consultant, where I can combine my subject matter skills with automation to help my customers to automate complex business processes in a more efficient way. In parallel with my professional activity, I run a blog aimed at showing how to understand and analyze data as simply as possible: datacorner.fr Learning, convincing by the arguments and passing on my knowledge could be my caracteristic triptych.

View all posts by Benoit Cayla →

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Fork me on GitHub