Share Now


In this tutorial, we will cover how to build the logistic regression model which is classification problem. Cases where we have a discrete variable target variable (dependent variable) we perform logistic regression. Situations where we have a continuous target variable, we perform Linear Regression.

Examples of Logistic Regression

  • Predicting if a given credit card transaction is fraud or not (Binary Classification)
  • Customer will buy the life insurance.
  • Email is Spam or Not
  • Predicting if a customer will default a loan
  • Order Delivery Mode: Fist Class, Second Class or Standard (Multiple Classification)

In the above examples, the Customer will buy the insurance or not int his we have only two outputs (yes or no). Predicting credit card transactions is fraud or not here also we have only two outputs (fraud or not fraud). So, these types of cases where the target variable has only two values are binary classification problems. On the other hand, cases where target variables have more than two output and it’s a discrete variable also, like predicting Delivery Mode. It’s First Class, Second Class or Standard Delivery comes under Multi Classification problems.

In this screenshot, we have to scatter plot where X-axis defining Age and Y-Axis is defining Customer Purchase Status.

If we plot a linear regression line (y = b0 +b1X) on these discrete values, that is not a perfect fit line. You can take a reference from the below screenshot. Cut offline is ignoring most cases that are on the right-hand side on X-axis and left-hand side on Y-axis.

To create the best fit line in logistic regression we use the sigmoid function.

Sigmoid Function in Logistic Regression

This sigmoid function converts input range into 0 and 1. In this e is Euler’s number which is constant 2.71828.

Now let’s perform Logistic Regression using Python on Jupyter Notebook.

Problem Statement:

The Dataset consists of Age and Insurance purchase status. We want to predict on the basis of customer Age, customer can buy the insurance or not.

Here we have only two outputs in the target variable (Purchase status), So it’s a Binary Classification Problem.

Steps for Regression Program in Python:

Let’s load the dataset in Pandas dataframe. Now let’s plot the chart using Matplotlib library.

Split Dataset into training and test using sklearn library.

After splitting dataset into training and test, will fit logistic regression model using Sklearn.linear_model library. And now to check the Accuracy Score let’s use score method. The model accuracy Score is 0.875, which is random may be when you perform this you will get the different score. We can also check the accuracy of logistic regression models using the confusion matrix. For Classification problems, we create a confusion matrix for accuracy check. To Import confusion matrix will take help from skelarn.metrics library and will plot the same using the Seaborn library. How to create a confusion matrix? Confusion Matrix in Python

If you read this confusion matrix, diagonally this is showing 3 and 4 which are the correct prediction cases and there is one case where the program has predicted the wrong value.

Share Now
November 18, 2019


    Leave a Message

    Your email address will not be published. Required fields are marked *

    Social Media

    facebook page EC Analytics Consulting   Linkedin Page EC Analytics Consulting   Youtube EC Analytics Consulting


    0124- 4601426


    EC Analytics will help your business make better decisions by providing expert-level business intelligence (BI) services. Forecasting, strategy, optimization, performance analysis, trend analysis, customer analysis, budget planning, financial reporting and more. EC Analytics also offers Advanced Data Analytics training in corporate and retail.

    EC Analytics Consulting @ 2019 ALL RIGHTS RESERVED