As we saw in the previous post, for a population, the variance is calculated as
σ² = ( Σ (x-μ)² ) / N.
Another equivalent formula is σ² = ( (Σ x²) / N ) – μ². If we need to calculate variance by hand, this alternate formula is easier to work with.
Let’s build a little more intuition by manipulating the sigma notation.
The formula for population variance is:
[math]\sigma^{2}=\frac{\sum_{i=1}^{N}(x_{i}-\mu)^{2}}{N}[/math]
Let’s focus on the numerator, multiply out the squared term, and see where it takes us.
The numerator is the same thing as:
[math]\sum_{i=1}^{N}(x_{i}^{2}-2x_{i}\mu+\mu^{2})[/math]
[math]=\sum_{i=1}^{N}x_{i}^{2}-2\mu\sum_{i=1}^{N}x_{i}+\mu^{2}\sum_{i=1}^{N}1[/math]
Before we bring the denominator back, let’s focus on the last part first.
[math]\sum_{i=1}^{N}c[/math] This means: whatever constant sits where ‘c’ is, add it up N times, which gives Nc.
So in our case, this part is equal to N, since the term being summed is 1.
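To make the "add it up N times" idea concrete, here is a minimal Python sketch. The value of `N` and the constant `c` are arbitrary choices for illustration, not values from the derivation.

```python
N = 8  # hypothetical population size, chosen only for illustration

# Summing the constant 1 once for each i = 1..N just counts to N.
sum_of_ones = sum(1 for i in range(1, N + 1))

# Summing any other constant c the same way gives N * c.
c = 3
sum_of_c = sum(c for i in range(1, N + 1))

print(sum_of_ones)  # 8
print(sum_of_c)     # 24
```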
Now let’s put the denominator back.
[math]=\frac{\sum_{i=1}^{N}x_{i}^{2}}{N}-\frac{2\mu\sum_{i=1}^{N}x_{i}}{N}+\frac{\mu^{2}N}{N}[/math]
We also remember that the [math]\frac{\sum_{i=1}^{N}x_{i}}{N}[/math] part is the mean of the population, which equals μ. So the middle term becomes 2μ · μ = 2μ².
[math]=\frac{\sum_{i=1}^{N}x_{i}^{2}}{N}-2\mu^{2}+\mu^{2}=\frac{\sum_{i=1}^{N}x_{i}^{2}}{N}-\mu^{2}[/math]
Now we have reached a neat way of writing the variance.
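As a quick sanity check, the sketch below computes the variance both ways on a small made-up data set. The function names and the numbers are assumptions for illustration only; they do not come from the original source.

```python
def variance_definition(xs):
    """Population variance from the definition: mean of squared deviations."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def variance_shortcut(xs):
    """Population variance as (mean of the squares) minus (square of the mean)."""
    n = len(xs)
    mu = sum(xs) / n
    return sum(x * x for x in xs) / n - mu ** 2


# Made-up population: mean is 5, squared deviations sum to 32, so variance is 4.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance_definition(data))  # 4.0
print(variance_shortcut(data))    # 4.0
```

Both functions agree here, as the algebra says they should. On other inputs the two results can differ by a tiny floating-point rounding error.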
This means that we can take the average of the squares of all the numbers in the population and then subtract the square of the mean.
We can go a little further and rewrite that μ² in terms of the data, since μ = (Σ x) / N.
[math]\frac{\sum_{i=1}^{N}x_{i}^{2}}{N}-\frac{(\sum_{i=1}^{N}x_{i})^{2}}{N^{2}}[/math]
With this last one, we don’t even have to calculate the mean ahead of time.
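Because this form only needs the running total of the values and the running total of their squares, it can be computed in a single pass over the data. Here is a minimal Python sketch of that idea. The function name and the sample data are assumptions for illustration.

```python
def variance_one_pass(xs):
    """Population variance from two running sums, with no mean computed up front."""
    n = 0
    total = 0.0     # running sum of x
    total_sq = 0.0  # running sum of x squared
    for x in xs:
        n += 1
        total += x
        total_sq += x * x
    # (sum of x^2) / N  -  (sum of x)^2 / N^2
    return total_sq / n - (total ** 2) / (n ** 2)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up population
print(variance_one_pass(data))  # 4.0
```

One caveat: when the values are very large compared with their spread, subtracting two nearly equal numbers can lose floating-point precision. The formula is handy for hand calculation and small examples, but it is not always the most numerically robust choice.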
Disclaimer: Like most of my posts, this content is intended solely for educational purposes and was created primarily for my personal reference. At times, I may rephrase original texts, and in some cases, I include materials such as graphs, equations, and datasets directly from their original sources. I typically reference a variety of sources and update my posts whenever new or related information becomes available. For this particular post, the primary source was Khan Academy’s Statistics and Probability series.