Statistical power is the probability that a test will detect a real effect. In other words, it’s the likelihood of getting statistically significant results when there is actually something to find.
As data analysts, our projects might begin with a test or study. Hypothesis testing is a way to determine whether a survey or experiment produced meaningful results. Here’s an example.
Let’s say we work for a restaurant chain that’s planning a marketing campaign for their new milkshakes. We need to test the ad on a group of customers before turning it into a nationwide ad campaign.
In the test, we want to check whether customers like or dislike the campaign. We also want to rule out any factors outside of the ad that might lead them to say they don’t like it.
Using all our customers would be too time consuming and expensive, so we’ll need to figure out how many customers we’ll need to show that the ad is effective. 50 probably wouldn’t be enough.
Even if we randomly chose 50 customers, we might end up with customers who don’t like milkshakes at all. If that happens, we won’t be able to measure the ad’s effectiveness at driving more milkshake orders, since no one in the sample would order them.
That’s why we need a larger sample size: it helps us ensure we get a good mix of all types of customers for our test. Usually, the larger the sample size, the greater the chance of statistically significant results, and that chance is statistical power.
In this case, using as many customers as possible will show the actual differences between the groups who like or dislike the ad versus people whose decision wasn’t based on the ad at all.
There are ways to calculate statistical power precisely, and as data analysts we might sometimes need to do so ourselves, but we won’t go into the details here. For now, we should know that statistical power is usually expressed as a value out of 1, so a statistical power of 0.6 is the same as saying 60%.
In the milkshake ad test, a statistical power of 60% means there’s a 60% chance of getting a statistically significant result on the ad’s effectiveness. In basic terms, if a test is statistically significant, its results are real and not an error caused by random chance.
So there’s a 60% chance that the milkshake ad test will detect a real effect, and a 40% chance that it will miss one. Usually, we aim for a statistical power of at least 0.8, or 80%, before trusting a study’s results.
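The post skips the underlying math, but as a rough illustration, here’s a minimal sketch of how power could be estimated for a test comparing milkshake order rates between two groups of customers, using the normal approximation for a two-proportion z-test. The order rates (10% without the ad, 15% with it) are made-up numbers for illustration only:

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def two_proportion_power(p1, p2, n):
    """Approximate power of a two-sided two-proportion z-test,
    with n customers in each group (normal approximation)."""
    z_crit = 1.96  # critical value for a 5% significance level, two-sided
    p_bar = (p1 + p2) / 2
    se = math.sqrt(2 * p_bar * (1 - p_bar) / n)
    return normal_cdf(abs(p1 - p2) / se - z_crit)

# Hypothetical scenario: the ad lifts order rates from 10% to 15%.
print(round(two_proportion_power(0.10, 0.15, 50), 2))   # → 0.11
print(round(two_proportion_power(0.10, 0.15, 500), 2))  # → 0.67
```

With only 50 customers per group, the power is around 11%, which is why 50 customers “probably wouldn’t be enough”; with 500 per group, it climbs toward the 80% benchmark.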
Another sample
Let’s check out one more scenario.
Imagine we work for a restaurant chain that wants to launch a brand new birthday-cake-flavored milkshake. This milkshake will be more expensive to produce than our other milkshakes. Our company hopes that the buzz around the new flavor will bring in more customers and money to offset this cost. They want to test this out in a few restaurant locations first, so let’s figure out how many locations we’d have to use to be confident in our results.
First, we’d have to think about what might prevent us from getting statistically significant results.
- Are there restaurants running any other promotions that might bring in new customers?
- Do some restaurants have customers that always buy the newest item no matter what it is?
- Do some locations have construction that recently started that would prevent customers from even going to the restaurant?
To get a higher statistical power, we’d have to consider all of these factors before we decide how many locations to include in our sample.
We want to make sure any effect is most likely due to the new milkshake flavor, not another factor. The measurable effects would be an increase in sales or the number of customers at locations in our sample size.
Determine the best sample size
Sometimes a store hands out free samples for customers to try. Those small samples are a smart way for businesses to learn about their products from customers without giving everyone a free sample. A lot of organizations use sampling in a similar way.
They take one part of something larger, in this case, a sample of a population. Sometimes, they’ll perform complex tests on their data to see if it meets their business objectives. (We won’t go into all the calculations needed to do this effectively. Instead, we’ll focus on a big-picture look at the process and what it involves.)
Sample size
As a quick reminder, a sample is a part of a population that is representative of the whole population, and the sample size is the number of individuals in that sample. For businesses, sampling is a very important tool. It can be both expensive and time consuming to analyze an entire population of data, so working with a sample usually makes the most sense and can still lead to valid and useful findings.
There are handy calculators online that can help us find sample size. We need to input the confidence level, population size, and margin of error.
Confidence level
The confidence level is the probability that our sample accurately reflects the greater population. We can think of it the same way as confidence in anything else. It’s how strongly we feel that we can rely on something or someone.
Having a 99% confidence level is ideal, but most industries hope for at least a 90% or 95% confidence level. Industries like pharmaceuticals usually want a confidence level that’s as high as possible when they are using a sample size. This makes sense because they’re testing medicines and need to be sure they work and are safe for everyone to use. For other studies, organizations might just need to know that the test or survey results have them heading in the right direction. For example, if a paint company is testing out new colors, a lower confidence level is OK.
Margin of error
We also want to consider the margin of error for our study. I’ll write more about this below, but it basically tells us how close our sample’s results are to what we’d get if we surveyed the entire population the sample represents. Think of it like this:
Let’s say that the principal of a middle school approaches us with a study about students’ candy preferences. They need to know an appropriate sample size, and they need it now. The school has a student population of 500, and they’re asking for a confidence level of 95% and a margin of error of 5%.
Let’s say we’ve set up a sample size calculator in a spreadsheet. Like most sample size calculators, it hides the more complex calculations needed to figure out sample size.
So all we need to do is input the numbers. When we enter 500 for the population size, 95 for the confidence level percentage, and 5 for the margin of error percentage, the result is about 218. That means an appropriate sample size for this study would be 218. So if we surveyed 218 students and found that 55% of them preferred chocolate, we could be reasonably confident that would be true of all 500 students.
Note that 218 is the minimum number of people we need to survey based on our criteria of a 95% confidence level and a 5% margin of error. Also, the confidence level and margin of error don’t have to add up to 100%; they’re independent of each other.
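Under the hood, calculators like this typically use Cochran’s formula with a finite-population correction. Here’s a minimal sketch (assuming the conventional conservative choice of a 50% response proportion, and z ≈ 1.96 for 95% confidence), which reproduces the principal’s result:

```python
import math

def sample_size(population, z, margin_of_error, p=0.5):
    """Cochran's sample size formula with a finite-population correction.
    p = 0.5 is the most conservative (largest-sample) assumption."""
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)                 # correct for finite N
    return math.ceil(n)

# The principal's study: N = 500, 95% confidence (z ≈ 1.96), 5% margin of error
print(sample_size(500, 1.96, 0.05))  # → 218
```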
Evaluate the reliability of the data
As a data analyst, it’s important for us to figure out sample size and variables like confidence level and margin of error before running any kind of test or survey. It’s the best way to make sure our results are objective, and it gives us a better chance of getting statistically significant results.
But if we already know the sample size, such as when we’re given survey results to analyze, we can calculate the margin of error ourselves. Then we’ll have a better idea of how much of a difference there is between our sample and our population. Let’s start with a more complete definition:
Margin of error
Margin of error is the maximum amount that the sample results are expected to differ from those of the actual population.
It would be great to survey or test an entire population, but it’s usually impossible or impractical to do this. So instead, we take a sample of the larger population. Based on the sample size, the resulting margin of error will tell us how different the results might be compared to the results if we had surveyed the entire population.
Margin of error helps us understand how reliable the data from our hypothesis testing is. The closer the margin of error is to 0, the more closely our sample results would match results from the overall population.
A sample
For example, let’s say we completed a nationwide survey using a sample of the population. We asked people who work five-day work weeks whether they like the idea of a four-day work week. So our survey tells us that 60% prefer a four-day work week.
The margin of error was 10%, which tells us that between 50% and 70% like the idea. So if we were to survey all five-day workers nationwide, we’d expect between 50% and 70% of them to agree. Keep in mind our range is between 50% and 70% because the margin of error is counted in both directions from the survey result of 60%.
If we set up a 95% confidence level for our survey, there will be a 95% chance that the entire population’s responses will fall between 50% and 70% saying, yes, they want a four-day work week.
Since our margin of error overlaps with that 50% mark, we can’t say for sure that the public likes the idea of a four-day work week. In that case, we’d have to say our survey was inconclusive.
Now, if we wanted a lower margin of error, say 5%, with a range between 55% and 65%, we could increase the sample size. But if we’ve already been given the sample size, we can calculate the margin of error ourselves and then judge how much of a chance our results have of being statistically significant.
In general, the more people we include in our survey, the more likely our sample is representative of the entire population. Decreasing the confidence level would also shrink the margin of error, but it would make it less likely that our survey results accurately reflect the population.
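A quick sketch of that trade-off, using the standard margin-of-error formula for a surveyed proportion (the sample sizes and z-values here are illustrative, and the population is treated as very large):

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error for a surveyed proportion (normal approximation).
    z = 1.96 corresponds to a 95% confidence level; p = 0.5 is the
    most conservative assumption about the true proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Growing the sample shrinks the margin of error:
for n in (100, 500, 1000):
    print(n, round(margin_of_error(n), 3))   # → 0.098, 0.044, 0.031

# Dropping to a 90% confidence level (z ≈ 1.645) also shrinks it,
# at the cost of being less sure the interval captures the truth:
print(round(margin_of_error(500, z=1.645), 3))  # → 0.037
```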
Calculating margin of error
So to calculate margin of error, we need three things: population size, sample size, and confidence level. We’ll use a spreadsheet set up as a margin of error calculator, just like we did when we calculated sample size.
Let’s say we’re running a study on the effectiveness of a new drug. We have a sample size of 500 participants whose condition affects 1% of the world’s population. That’s about 80 million people, which is the population for our study. Since it’s a drug study, we need a confidence level of 99%. We also need a low margin of error.
When we put the numbers for population and confidence level and sample size in the appropriate spreadsheet cells, our result is a margin of error of close to 6%.
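As a sketch of what the spreadsheet is likely doing (assuming the standard formula for a proportion, with z ≈ 2.576 for a 99% confidence level and a finite-population correction that is negligible at this scale):

```python
import math

# Drug study from above: n = 500 sampled from a population of ~80 million
z = 2.576                      # critical value for 99% confidence
n, N, p = 500, 80_000_000, 0.5 # p = 0.5 is the conservative assumption

fpc = math.sqrt((N - n) / (N - 1))          # finite-population correction (≈ 1 here)
moe = z * math.sqrt(p * (1 - p) / n) * fpc

print(round(moe * 100, 1))  # → 5.8 (percent, i.e. "close to 6%")
```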
When the drug study is complete, we’d apply the margin of error to our results to determine how reliable they might be. Calculators like this one in the spreadsheet are just one of the many tools we can use to ensure data integrity. It’s also good to remember that checking for data integrity and aligning the data with our objectives will put us in good shape to complete our analysis.
Disclaimer: Like most of my posts, this content is intended solely for educational purposes and was created primarily for my personal reference. At times, I may rephrase original texts, and in some cases, I include materials such as graphs, equations, and datasets directly from their original sources.
I typically reference a variety of sources and update my posts whenever new or related information becomes available. For this particular post, the primary source was Google Data Analytics Professional Certificate.