LOGISTIC REGRESSIONIn this tutorial, we will cover how to build the logistic regression model which is classification problem. Cases where we have a discrete variable target variable (dependent variable) we perform logistic regression. Situations where we have a continuous target variable, we perform Linear Regression.
Examples of Logistic Regression
- Predicting if a given credit card transaction is fraud or not (Binary Classification)
- Customer will buy the life insurance.
- Email is Spam or Not
- Predicting if a customer will default a loan
- Order Delivery Mode: Fist Class, Second Class or Standard (Multiple Classification)
In the above examples, the Customer will buy the insurance or not int his we have only two outputs (yes or no). Predicting credit card transactions is fraud or not here also we have only two outputs (fraud or not fraud). So, these types of cases where the target variable has only two values are binary classification problems. On the other hand, cases where target variables have more than two output and it’s a discrete variable also, like predicting Delivery Mode. It’s First Class, Second Class or Standard Delivery comes under Multi Classification problems.
In this screenshot, we have to scatter plot where X-axis defining Age and Y-Axis is defining Customer Purchase Status.
If we plot a linear regression line (y = b0 +b1X) on these discrete values, that is not a perfect fit line. You can take a reference from the below screenshot. Cut offline is ignoring most cases that are on the right-hand side on X-axis and left-hand side on Y-axis.
To create the best fit line in logistic regression we use the sigmoid function.
This sigmoid function converts input range into 0 and 1. In this e is Euler’s number which is constant 2.71828.Now let’s perform Logistic Regression using Python on Jupyter Notebook.
The Dataset consists of Age and Insurance purchase status. We want to predict on the basis of customer Age, customer can buy the insurance or not.
Here we have only two outputs in the target variable (Purchase status), So it’s a Binary Classification Problem.
Steps for Regression Program in Python:Let’s load the dataset in Pandas dataframe.
import pandas as pd
df = pd.read_excel('Logistic_Regression(Binary Classification).xlsx')
from matplotlib import pyplot as plt
Split Dataset into training and test using sklearn library.
from sklearn.model_selection import
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(solver='lbfgs')
y_pred = model.predict(X_test)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
import seaborn as sn
plt.figure(figsize = (10,7))
sn.heatmap(cm, annot = True)
If you read this confusion matrix, diagonally this is showing 3 and 4 which are the correct prediction cases and there is one case where the program has predicted the wrong value.