Introduction to Hypothesis Testing

Hypothesis testing is a statistical procedure that uses sample data to evaluate an assumption about a population parameter. For example, hypothesis tests are often used in clinical trials to determine whether a new medicine leads to better outcomes in patients.

Statistical significance is the claim that the results of a test or experiment are not explainable by chance alone.

Steps for performing a hypothesis test:

  • State the null hypothesis and the alternative hypothesis. 
  • Choose a significance level. 
  • Find the p-value.
  • Reject or fail to reject the null hypothesis.

The null hypothesis is a statement that is assumed to be true unless there is convincing evidence to the contrary. The null hypothesis typically assumes that our observed data occurs by chance.

The alternative hypothesis is a statement that contradicts the null hypothesis, and is accepted as true only if there’s convincing evidence for it. The alternative hypothesis typically assumes that our observed data does not occur by chance.

Note: The null and alternative hypotheses are always claims about the population. That’s because the aim of hypothesis testing is to make inferences about a population based on a sample.

The significance level is the probability of rejecting the null hypothesis when it is in fact true.

Typically, data professionals set the significance level at five percent. But we should note that this threshold is a convention rather than a mathematical necessity. We can adjust the significance level to meet the requirements of our analysis.

The p-value is the probability of observing results as extreme as, or more extreme than, those actually observed, assuming the null hypothesis is true. A lower p-value means stronger evidence against the null hypothesis, and thus for the alternative.

Finally, we have to decide whether to reject or fail to reject the null hypothesis. Statisticians always say, “fail to reject” rather than “accept”. This is because hypothesis tests are based on probability, not certainty. And acceptance implies certainty.

There are two main rules for drawing a conclusion about a hypothesis test: 

  • If p-value ≤ significance level: reject the null hypothesis. 
  • If p-value > significance level: fail to reject the null hypothesis.
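The steps above can be sketched in code. Below is a minimal one-sample t-test using SciPy; the sample values and the hypothesized mean of 30 are made up purely for illustration:

```python
from scipy import stats

# Hypothetical sample: recovery times (in days) for patients on a new medicine.
# H0: the population mean recovery time is 30 days.
# Ha: the population mean recovery time is not 30 days.
sample = [24, 28, 26, 31, 22, 27, 25, 29, 23, 26]

alpha = 0.05  # chosen significance level
t_stat, p_value = stats.ttest_1samp(sample, popmean=30)

if p_value <= alpha:
    print(f"p = {p_value:.4f} <= {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} > {alpha}: fail to reject the null hypothesis")
```

Note that the code only automates steps three and four; stating the hypotheses and choosing the significance level happen before any data is analyzed.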

A statistically significant result cannot prove with 100 percent certainty that our hypothesis is correct. Because hypothesis testing is based on probability, there’s always a chance of drawing the wrong conclusion about the null hypothesis.

Types of Errors

In hypothesis testing, there are two types of errors we can make when drawing a conclusion:

  • Type I error
  • Type II error

A Type I error, also known as a false positive, occurs when we reject a null hypothesis that is actually true. In other words, we conclude that our result is statistically significant when in fact it occurred by chance.

The probability of making a Type I error is called alpha (α), and it equals the significance level of the test. A significance level of five percent means we are willing to accept a five percent chance of being wrong when we reject the null hypothesis. 

To reduce our chance of making a Type I error, we choose a lower significance level. However, choosing a lower significance level means we’re more likely to make a Type II error or a false negative.
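To see that α really is the Type I error rate, we can run a quick simulation (a sketch using NumPy and SciPy; the sample size, trial count, and random seed are arbitrary choices). We draw many samples from a population where the null hypothesis is true by construction and count how often the test rejects it anyway:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_trials = 10_000
false_positives = 0

for _ in range(n_trials):
    # H0 is true by construction: the population mean really is 0.
    sample = rng.normal(loc=0, scale=1, size=30)
    _, p_value = stats.ttest_1samp(sample, popmean=0)
    if p_value <= alpha:  # we (wrongly) reject a true null hypothesis
        false_positives += 1

print(f"Type I error rate: {false_positives / n_trials:.3f}")  # close to 0.05
```

Lowering alpha in this simulation lowers the observed false-positive rate accordingly.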

A Type II error, also known as a false negative, occurs when we fail to reject a null hypothesis that is actually false. In other words, we conclude our result occurred by chance when it’s in fact statistically significant.

The probability of making a Type II error is called beta (β), and beta is related to the power of a hypothesis test (power = 1 − β). Power refers to the likelihood that a test can correctly detect a real effect when there is one.

We can reduce our risk of making a Type II error by ensuring our test has enough power. In data work, power is usually set at 0.80 or 80%. The higher the statistical power, the lower the probability of making a Type II error. To increase power, we can increase our sample size or our significance level.

                                  Null hypothesis is TRUE           Null hypothesis is FALSE
  Reject null hypothesis          Type I error (false positive)     Correct outcome (true positive)
  Fail to reject null hypothesis  Correct outcome (true negative)   Type II error (false negative)

A real-life example: Imagine we’re testing the strength of the fabric for a parachute manufacturer. We want to be very confident that the material we’re using is strong enough for a functional parachute. A Type I error, or false positive, means we falsely identify the material as strong enough. Obviously, in this case, we want to minimize the risk of a Type I error. To do so, we can choose a significance level of one percent instead of the standard five percent.

Ultimately, it’s our responsibility as a data professional to determine how much evidence we need to decide that a result is statistically significant.

Potential risks of Type I and Type II errors 

It’s important to be aware of the potential risks involved in making these two types of errors.

A Type I error means rejecting a null hypothesis that is actually true. In general, making a Type I error often leads to implementing changes that are unnecessary and ineffective, and which waste valuable time and resources.

For example, if we make a Type I error in our clinical trial, the new medicine will be considered effective even though it’s actually ineffective. Based on this incorrect conclusion, an ineffective medication may be prescribed to a large number of people. Plus, other treatment options may be rejected in favor of the new medicine. 

A Type II error means failing to reject a null hypothesis that is actually false. In general, making a Type II error may result in missed opportunities for positive change and innovation. A lack of innovation can be costly for people and organizations. 

For example, if we make a Type II error in our clinical trial, the new medicine will be considered ineffective even though it’s actually effective. This means that a useful medication may not reach a large number of people who could benefit from it.

In summary, here are some important differences between the null and alternative hypotheses:

            Null hypothesis (H0)                                   Alternative hypothesis (Ha)
  Claims    There is no effect in the population.                  There is an effect in the population.
  Language  No effect; no difference; no relationship; no change   An effect; a difference; a relationship; a change
  Symbols   Equality (=, ≤, ≥)                                     Inequality (≠, <, >)

Disclaimer: Like most of my posts, this content is intended solely for educational purposes and was created primarily for my personal reference. At times, I may rephrase original texts, and in some cases, I include materials such as graphs, equations, and datasets directly from their original sources.

I typically reference a variety of sources and update my posts whenever new or related information becomes available. For this particular post, the primary source was Google Advanced Data Analytics Professional Certificate.