Multiple linear regression, also known as multiple regression, is a technique that estimates the linear relationship between one continuous dependent variable and two or more independent variables.
Full multiple regression equation is:
y = β0 + β1 · X1 + β2 · X2 + … + βn · Xn
We will use two concepts further:
- One hot encoding allows us to use categorical independent variables in our multiple linear regression.
For example, we might have print advertisements and digital advertisements. This is a categorical variable and one hot encoding will allow us to incorporate it into our regression model.
- If we want to account for how two independent variables affect the y variable together, we can use something called an interaction term.
Represent categorical variables
There are two main ways to handle categorical data, one hot encoding and label encoding.
One hot encoding is a data transformation technique that turns one categorical variable into several binary variables.
Sample 1: Let’s take the example of an advertisement having a call to action or not:
y = β0 + β1 · X1 + β2 · X2 + βaction · Xaction
If an advertisement has a call to action then X_action equals 1.
If an advertisement does not have a call to action then X_action equals 0.
In our sample X_1 measures the number of people in the advertisement and X_2 measures the length of the advertisement. Without changing the X1 and X2, aka keeping them the same, we can check on with CTA and without.
Sample 2: What if we’re interested in which streaming platform that ad is on?
Let’s say the company is running ads on three services, A, B, and C. Let’s also assume that ads can only run on one platform at a time. Now we have a categorical X variable that has three possibilities.
To represent two possibilities (eg. has a call to action versus does not have a call to action), we use one binary variable. In order to represent three possibilities, we need two binary variables.
# of categories | # of binary variables |
2 | 1 |
3 | 2 |
The below table explains why we need two variables for three possibilities.
Xservice A | Xservice B | Service A | Service B | Service C |
1 | 0 | ✓ | x | x |
0 | 1 | x | ✓ | x |
0 | 0 | x | x | ✓ |
Now our equation would be:
y = β0 + β1 · X1 + β2 · X2 + βservice A · Xservice A + βservice B · Xservice B
Notice that there is no variable X_service C, because it would not provide us with more information.
But the interpretation is a little bit different:
We can think of service C as the default streaming service. Beta service A is the difference in website clicks for two advertisements that are the exact same, except one ad is played on service C and one is played on service A.
Similarly, Beta_service B is the difference in website clicks for two advertisements that are the exact same, except one ad is played on service C and one is played on service B.
Interpret multiple regression coefficients
Let’s give more samples for interpreting multiple regression coefficients. Imagine we have a simple linear regression example as:
(ice-coffee) sales = -44 + 2.2 · temperature
- We can say that a one degree increase in temperature is associated with 2.2 more ice coffee sales.
But there may be other factors involved. Imagine there is an ad near the store.
sales = β0 + βtemperature · Xtemperature + βad · Xad
That ad variable is a binary variable, so there are two possible scenarios: There is an ad nearby and there isn’t.
If there is an ad posted nearby, then X_ad equals 1, and the temperature will take on some value. Let’s say it’s 75 degrees Fahrenheit. The equation becomes:
sales = β0 + βtemperature · 75 + βad · 1
- We estimate that the presence of the ad is associated with some increase in sales.
Now let’s take the same temperature and say that there isn’t an ad nearby. The equation then becomes:
sales = β0 + βtemperature · 75 + βad · 0
- When there isn’t an ad then the sales are only a function of the temperature.
Or we can introduce the question of proximity to public transportation. The variable transportation will measure how many kilometers away a given coffee shop is to a bus, train or subway stop. The equation becomes:
sales = β0 + βtemperature · Xtemperature + βtransportation · Xtransportation
Both temperature and distance to transportation are continuous variables.
- For every 1 degree increase in the temperature, while holding distance to public transportation constant, we expect ice coffee sales to increase by Beta_temperature.
- For every 1 kilometer further a store is from public transportation, while holding temperature constant, we expect iced coffee sales to decrease by Beta_transportation.
So we have to hold the other variable constant when interpreting the results.
Interaction Term
But there are cases when we might expect two variables to interact. For example in the case of temperature and distance to transportation, we might expect that if it’s cooler, distance to transportation might have a different effect. If we want to account for how two variable values affect each other, we include an interaction term.
An interaction term represents how the relationship between two independent variables is associated with changes in the mean of the dependent variable. Typically we represent an interaction term as the product of the two independent variables in question.
Going back to our example of coffee shops sales: If we suspect that distance from transportation might be associated with different changes in coffee shops sales based on the temperature, we can include the interaction term –temperature times transportation-.
sales = β0 + βtemperature · Xtemperature + βtransportation · Xtransportation
+ βinteraction · (temperature · transportation)
Disclaimer: Like most of my posts, this content is intended solely for educational purposes and was created primarily for my personal reference. At times, I may rephrase original texts, and in some cases, I include materials such as graphs, equations, and datasets directly from their original sources.
I typically reference a variety of sources and update my posts whenever new or related information becomes available. For this particular post, the primary source was Google Advanced Data Analytics Professional Certificate.