Common evaluation metrics are:
- R-squared
- Mean squared error, MSE
- Mean absolute error, MAE
R²: The coefficient of determination
The main metric that academic researchers and data professionals use when evaluating regression models is called the coefficient of determination, or R-squared.
R² measures the proportion of the variation in the dependent variable, Y, that is explained by the independent variable(s), X. It is calculated by subtracting the ratio of the sum of squared residuals to the total sum of squares from 1:
R² = 1 − (Sum of squared residuals / Total sum of squares)
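As a minimal sketch of this formula, here is the calculation in NumPy. The actual and predicted values below are made up for illustration; they are not from the penguin dataset.

```python
import numpy as np

# Hypothetical actual and predicted values (illustrative only)
y_actual = np.array([3.0, 4.5, 5.0, 6.5, 7.0])
y_predicted = np.array([3.2, 4.1, 5.3, 6.2, 7.2])

ss_residuals = np.sum((y_actual - y_predicted) ** 2)    # sum of squared residuals
ss_total = np.sum((y_actual - np.mean(y_actual)) ** 2)  # total sum of squares
r_squared = 1 - ss_residuals / ss_total
print(round(r_squared, 3))  # 0.959
```

Here the predictions track the actual values closely, so the residual sum is small relative to the total sum of squares and R² is close to 1.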
Previously, our regression analysis found the best-fit line. But the data points only cluster around this line; many of them do not fall on it exactly. This means that bill length (X) accounts for only some of the variation in body mass (Y).
R-squared helps data professionals determine how much of the variation in the Y variable is explained by the variation in the X variable:
- At most R-squared can equal 1, which would mean that X explains 100% of the variance in Y.
- If R-squared equals 0, then that would mean X explains 0% of the variance in Y.
The OLS summary table shows the model has an R-squared of 0.769. This means that bill length explains about 77% of the variance in body mass, leaving about 23% unexplained. That remaining variance might be due to other factors or to natural differences from penguin to penguin.
There is no benchmark value that R-squared has to meet, but in general, the higher the R-squared, the better, because it adds validity to any recommendation we make based on our analysis.
Using a hold-out sample to help evaluate a model
There are also processes that strengthen the evaluation of a model. Typically, when we have a dataset, we use at least part of it to build and test the regression model. The computer uses the data to calculate a measure of the difference between the actual and predicted values, such as the sum of squared residuals, and then finds the line that minimizes that measure. But sometimes we want our model to be good at generating predictions for data we haven't collected, or that doesn't exist yet.
In those cases, we want to know both how the model performs on the data it learned from and how it performs on data it hasn't seen before. To do this, we need to set aside a hold-out sample before we build the model.
A hold-out sample is a random sample of observed data that is not used to fit the model. We can then evaluate both how well the model fits the data used to build it and how well it fits the hold-out sample.
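A hold-out split can be sketched by randomly partitioning row indices before fitting. The synthetic dataset, the seed, and the 20% split size below are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

# Hypothetical dataset of 100 (X, y) observations
X = rng.normal(size=100)
y = 2.0 * X + rng.normal(scale=0.5, size=100)

# Randomly reserve 20% of the rows as a hold-out sample before fitting
indices = rng.permutation(len(X))
holdout_size = int(0.2 * len(X))
holdout_idx, train_idx = indices[:holdout_size], indices[holdout_size:]

X_train, y_train = X[train_idx], y[train_idx]
X_holdout, y_holdout = X[holdout_idx], y[holdout_idx]

# The model would be fit only on (X_train, y_train); (X_holdout, y_holdout)
# is kept aside to check how the model performs on data it never saw.
print(len(X_train), len(X_holdout))  # 80 20
```

Shuffling the indices rather than the data itself keeps each X value paired with its y value. Libraries such as scikit-learn provide a ready-made version of this split, but the idea is the same.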
MSE: Mean squared error
MSE (mean squared error) is the average of the squared differences between the predicted and actual values. Because the errors are squared before they are averaged, MSE is very sensitive to large errors.
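To see that sensitivity concretely, here is a small sketch with made-up numbers, where a single bad prediction dominates the score:

```python
import numpy as np

# Hypothetical actual and predicted values (illustrative only)
y_actual = np.array([3.0, 4.5, 5.0, 6.5, 7.0])
y_predicted = np.array([3.2, 4.1, 5.3, 6.2, 7.2])

# MSE: average of the squared differences
mse = np.mean((y_actual - y_predicted) ** 2)
print(round(mse, 3))  # 0.084

# Change a single prediction from 7.2 to 10.0 (an error of 3.0)
y_pred_outlier = np.array([3.2, 4.1, 5.3, 6.2, 10.0])
mse_outlier = np.mean((y_actual - y_pred_outlier) ** 2)
print(round(mse_outlier, 3))  # 1.876 -- one large error inflates MSE ~22x
```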
MAE: Mean absolute error
MAE (mean absolute error) is the average of the absolute differences between the predicted and actual values. If our data has outliers that we want to ignore, we can use MAE, as it is much less sensitive to large errors than MSE.
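Repeating the same made-up example with MAE shows the difference: the single large error still moves the score, but far less dramatically than it moves MSE, because the errors are not squared.

```python
import numpy as np

# Same hypothetical values as before (illustrative only)
y_actual = np.array([3.0, 4.5, 5.0, 6.5, 7.0])
y_predicted = np.array([3.2, 4.1, 5.3, 6.2, 7.2])

# MAE: average of the absolute differences
mae = np.mean(np.abs(y_actual - y_predicted))
print(round(mae, 2))  # 0.28

# The same single large error (last prediction off by 3.0)
y_pred_outlier = np.array([3.2, 4.1, 5.3, 6.2, 10.0])
mae_outlier = np.mean(np.abs(y_actual - y_pred_outlier))
print(round(mae_outlier, 2))  # 0.84 -- grows 3x, versus ~22x for MSE
```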
Interpret and present linear regression results
In the Execute phase of the PACE framework, our ability to communicate is crucial. Stakeholders are often non-technical business partners, so data-specific terminology wouldn't interest them. In fact, it would likely be too technical, and they would probably lose interest quickly, which would cause a lack of buy-in.
By providing measures of uncertainty around our estimates, we’re responsibly reporting our results.
Correlation versus causation: Interpret regression results
To generalize, correlation measures the way two variables tend to change together. The Pearson correlation coefficient, a metric that ranges from -1 to 1, measures the strength and direction of the linear relationship between two variables.
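As a minimal sketch, NumPy's `corrcoef` computes the Pearson correlation coefficient. The variable names and values below are hypothetical, loosely echoing the runner example later in this post:

```python
import numpy as np

# Two hypothetical variables that tend to increase together
water_liters = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
stamina_score = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation coefficient, in [-1, 1]
r = np.corrcoef(water_liters, stamina_score)[0, 1]
print(round(r, 3))  # 0.999 -- a strong positive correlation
```

A value near 1 means the variables tend to rise together; near -1, one tends to fall as the other rises; near 0, there is little linear relationship. None of this says anything about causation.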
Note that correlation is purely observational. Two variables can be correlated, meaning they tend to change together, without one variable causing the other to change. (In fact, there is an entire website and book, Spurious Correlations, devoted to documenting interesting and unexpected correlations between variables.)
Causation describes a cause-and-effect relationship in which one variable directly causes the other to change in a particular way. Although this is an intuitive definition, proving causation requires very specific conditions to be met.
To argue for causation between variables, in general, we must run a randomized controlled experiment. The following are some key components of a proper randomized controlled experiment:
- We must control for every factor in the experiment.
- We must have a control group that is not exposed to the treatment.
- We must have at least one treatment group that is exposed to the treatment.
- The difference(s) between the control and treatment groups must be observable and measurable.
Here are two examples:
Claims we can make (correlation)
- When the runner drinks more water the day before a race, they tend to have more stamina.
- When the runner doesn’t run long distances the week before a race, they tend to feel better on race day.
Claims we cannot make (causation)
- Drinking more water the day before a race causes the runner to run faster.
- Not running long distances the week before a race causes the runner to run faster.
Claims we can make (correlation)
- When I use fresher ingredients, the final dish tends to taste better.
- When I am very hungry, the final dish tends to taste better.
Claims we cannot make (causation)
- Using fresher ingredients makes the dish taste better.
- Being hungrier makes the dish taste better.
Disclaimer: Like most of my posts, this content is intended solely for educational purposes and was created primarily for my personal reference. At times, I may rephrase original texts, and in some cases, I include materials such as graphs, equations, and datasets directly from their original sources.
I typically reference a variety of sources and update my posts whenever new or related information becomes available. For this particular post, the primary source was Google Advanced Data Analytics Professional Certificate.