How to Interpret R Squared and Adjusted R Squared



Hi everyone. In this article of our machine learning training series using Python from EC Analytics, we'll look at the R-squared value and adjusted R-squared. We use both of these values to evaluate the accuracy of our linear regression models.

Let's start with the R-squared value. R-squared is used to determine the goodness of fit in regression analysis. It is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variables, that is, how well the independent variables are able to explain the dependent variable.

R Squared Equation:

R² = 1 – (Sum of squares of residuals / Total sum of squares)

Now let's understand what the sum of squares of residuals is.

In the screenshot below, we have a scatter plot with the predicted line (best-fit line), which we created using a regression model. The difference between the actual value and the predicted value on the best-fit line is the residual, or error.

[Image: R Squared]

For SSR (sum of squares of residuals), we square each residual and then add up the squared values.

For Example:

Predicted Values are 10, 12, 8, 15, 13

Actual Values are 9, 11, 8, 20, 17

Then the sum of squares of residuals = sum of (Actual – Predicted)^2

= (9-10)^2 + (11-12)^2 + (8-8)^2 + (20-15)^2 + (17-13)^2 = 1 + 1 + 0 + 25 + 16 = 43

For SST (total sum of squares), we use the average of the actual values as the predicted line: we take the difference between each actual value and that average, square it, and add up the squared values.
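To make the arithmetic concrete, here is a minimal sketch in plain Python that computes SSR, SST and R-squared for the five example values above. The variable names are only illustrative.

```python
# Example values from the article
predicted = [10, 12, 8, 15, 13]
actual = [9, 11, 8, 20, 17]

# SSR: square each residual (actual - predicted) and add them up
ssr = sum((a - p) ** 2 for a, p in zip(actual, predicted))

# SST: square each deviation of the actual values from their mean and add them up
mean_actual = sum(actual) / len(actual)
sst = sum((a - mean_actual) ** 2 for a in actual)

# R-squared = 1 - SSR / SST
r_squared = 1 - ssr / sst

print(ssr, sst, round(r_squared, 4))  # 43 110.0 0.6091
```

So for this example the best-fit line explains roughly 61% of the variance in the actual values.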

How to interpret the R-squared value in regression:

For a linear regression model with an intercept, fitted by least squares, R-squared is always between 0 and 100%:

  • 0% indicates that the model explains none of the variability of the response data (dependent variable) around its mean.
  • 100% indicates that the model explains all of the variability of the response data (dependent variable) around its mean.
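The two extremes can be checked with a small sketch. The helper `r_squared` below is a hypothetical name written for this illustration, not a library function: predicting the mean for every observation gives 0, and predicting every value exactly gives 1.

```python
def r_squared(actual, predicted):
    """Return 1 - SSR/SST for two equal-length lists of numbers."""
    mean_actual = sum(actual) / len(actual)
    ssr = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ssr / sst

actual = [9, 11, 8, 20, 17]
mean_only = [sum(actual) / len(actual)] * len(actual)  # always predict the mean

r2_mean = r_squared(actual, mean_only)   # model explains nothing beyond the mean
r2_perfect = r_squared(actual, actual)   # model reproduces every value exactly

print(r2_mean, r2_perfect)  # 0.0 1.0
```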

Now let's understand adjusted R-squared, which we also use to check the accuracy of our regression model. When our model has multiple independent variables, we use adjusted R-squared instead of R-squared for model evaluation.

This is because R-squared never decreases when we add more independent variables, whether or not they are actually related to the dependent variable: each extra variable gets its own slope coefficient, which gives the model more freedom to fit the data.

[Image: R Squared Formula]

Adjusted R-squared penalizes the model for independent variables that do not help explain the dependent variable. Adding such a variable typically lowers adjusted R-squared; the value increases only when the new variable improves the fit enough to offset the penalty for the extra term. So in multiple regression we use adjusted R-squared instead of R-squared to check model accuracy.
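As a rough illustration of the penalty, the sketch below uses the standard textbook formula, Adjusted R² = 1 – (1 – R²)(n – 1)/(n – k – 1), where n is the number of observations and k is the number of independent variables. The function name and the input values (R² of 0.80, 50 observations) are hypothetical and chosen only for this example.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Same R-squared and sample size, different number of independent variables
few = adjusted_r_squared(0.80, n_obs=50, n_predictors=3)
many = adjusted_r_squared(0.80, n_obs=50, n_predictors=10)

print(round(few, 3), round(many, 3))  # 0.787 0.749
```

With the same R-squared, the model that needed ten variables gets a lower adjusted value than the one that needed three.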

How to calculate R-squared and adjusted R-squared in Python using the statsmodels library:

The model we are using here is a multiple regression model. In this example, Sales is the dependent variable, and Marketing Expenses (Direct, Tele, Email) and Region are the independent variables (predictors).

#Dummy Variable Trap

#compare this using formula from statsmodel library

In the code above, the sm.ols function uses direct, tele and region as independent variables. You can add or remove independent variables to see how the R-squared and adjusted R-squared values change.
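For readers who want to try this out, here is a minimal sketch using the statsmodels formula API. It is not the article's original code: the data values are made up for illustration, the column names (sales, direct, tele, region) are assumed from the description above, and `smf` is just the alias chosen here for `statsmodels.formula.api`.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up illustrative data; column names are assumptions based on the article
df = pd.DataFrame({
    "sales":  [100, 125, 90, 140, 128, 130, 80, 135, 160, 98],
    "direct": [10, 12, 9, 15, 11, 14, 8, 13, 16, 10],
    "tele":   [5, 7, 4, 6, 8, 5, 3, 7, 9, 4],
    "region": ["North", "South", "East", "North", "South",
               "East", "North", "South", "East", "North"],
})

# C(region) dummy-encodes the category and drops one level,
# which avoids the dummy variable trap
reduced = smf.ols("sales ~ direct", data=df).fit()
full = smf.ols("sales ~ direct + tele + C(region)", data=df).fit()

for name, model in [("direct only", reduced), ("direct + tele + region", full)]:
    print(name, round(model.rsquared, 4), round(model.rsquared_adj, 4))
```

Comparing the two fitted models shows the behaviour described earlier: R-squared of the larger model is never below that of the smaller one, while adjusted R-squared can move in either direction.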



November 19, 2019
