Central Limit Theorem & Law of Large Numbers
Learn through Comic!
The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the sample means will be approximately normally distributed, with mean μ and standard deviation σ/√n. This holds regardless of whether the source population is normal or skewed, provided the sample size is sufficiently large (a common rule of thumb is n ≥ 30).
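In symbols, the statement above is often written as follows. This is the usual textbook form, assuming independent draws and a finite variance σ²; the notation X̄ₙ for the mean of a sample of size n is introduced here only for illustration.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i
\;\overset{\text{approx.}}{\sim}\;
\mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right)
\quad \text{for large } n
```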
Let’s first look at the prerequisites needed to understand the Central Limit Theorem (CLT).
- Expected value: In simple terms, the expected value is the mean, or average, of the given data. It is the sum of all the data points divided by the total number of data points.
Expected height: (150 cm + 160 cm + 170 cm + 180 cm + 180 cm)/5 = 168 cm
- Variance: It measures how spread out the data points are.
There is a high variance in color between Row1 and Row2, but the variance between Di’s of each row is small.
Mathematically, variance is defined as the average squared difference between each data point and the mean value of the data.
- Normal Distribution: It is also known as the Gaussian distribution or, because of its shape, the bell-shaped distribution. It appears widely in nature and, thanks to its simple characteristics, it is one of the most important distributions in statistics. A normal distribution is fully defined by its mean and variance, as described above.
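As a quick check of the first two definitions, here is a minimal Python sketch that computes the expected value and the variance of the five heights from the example. The variable names are illustrative, and the variance is the population variance (dividing by the number of data points), which matches the definition given above.

```python
# Heights (in cm) from the expected-value example above.
heights = [150, 160, 170, 180, 180]

# Expected value: sum of the data points divided by their count.
mean_height = sum(heights) / len(heights)

# Variance: average squared difference between each point and the mean
# (population variance, dividing by n as in the definition above).
variance_height = sum((h - mean_height) ** 2 for h in heights) / len(heights)

print(mean_height)      # 168.0
print(variance_height)  # 136.0
```

The square root of the variance, about 11.7 cm here, is the standard deviation σ that appears in the statement of the CLT.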
With these prerequisites in place, let’s understand the concept of the CLT.
Consider the game of tossing a coin. Every time the coin is tossed, note the result. Say the coin is tossed 100 times.
The results of the tosses are noted as follows:
S = {H , T, T, T, H, H, T, H, H, T, T, H, H, T, T, T, H, T, H, H, T, H, T, H, H, T, H, H, H, T,T, T, H, T, T, T, H, H, T, H, T, H, H, T, T, H, H, T, T, T, H, T, H, T, H, T, H, H, T, T,H, H, T, T, H, H, T, H, T, T, H, H, T, H, H, T, T, H, T, H, H, T, H, T, T, T, H, T, H, T,T, H, T, H, H, T, T, H, T, H}
Let’s define a random variable as follows
X: Number of heads (H)
Draw random samples from S. Let the sample size be n = 30, and let S1, S2, and S3 be three random samples drawn from S.
S1: {H, H, T, H, T, T, T, H, T, H, T, H, H, T, H, H, T, T, H, T, T, T, H, H, T, T, T, H, T, H}
S2: {H, H, T, H, T, H, H, T, H, H, T, H, H, T, H, H, T, T, H, T, T, H, T, H, H, T, T, H, T, H}
S3: {T, T, H, T, T, T, H, H, T, H, T, H, H, T, T, H, H, T, T, T, T, H, H, T, H, H, T, T, H, T}
Compute the proportion of heads (the number of heads divided by the sample size) in each of the samples, truncated to two decimal places:
- S1: 14/30 ≈ 0.46
- S2: 17/30 ≈ 0.56
- S3: 13/30 ≈ 0.43
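The counting above can be reproduced with a short Python sketch. The samples are written as strings of H and T, and `head_proportion` is a helper name chosen for this illustration. The proportions are printed to three decimals, so S1 and S2 show as 0.467 and 0.567.

```python
# The three samples from the text, written as strings of H/T.
samples = {
    "S1": "HHTHTTTHTH" "THHTHHTTHT" "TTHHTTTHTH",
    "S2": "HHTHTHHTHH" "THHTHHTTHT" "THTHHTTHTH",
    "S3": "TTHTTTHHTH" "THHTTHHTTT" "THHTHHTTHT",
}

def head_proportion(tosses: str) -> float:
    """Return the fraction of tosses that came up heads."""
    return tosses.count("H") / len(tosses)

# Print the sample name, its head count, and its proportion of heads.
for name, tosses in samples.items():
    print(name, tosses.count("H"), round(head_proportion(tosses), 3))
# S1 14 0.467
# S2 17 0.567
# S3 13 0.433
```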
Let’s plot these values!
Hurrah! With only three samples the shape is hard to see, but as more and more samples are drawn, the distribution of the sample means approximately follows a normal distribution. 😄
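To see the bell shape emerge, one option is to simulate many more samples than three. The sketch below is an illustration under stated assumptions rather than the procedure used above: it simulates a fair coin directly instead of drawing from S, uses 10,000 samples of size 30, and fixes the random seed so the run is repeatable. All names are hypothetical.

```python
import random
import statistics
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30      # n, as in the text
NUM_SAMPLES = 10_000  # assumption: far more samples than the three above

def heads_in_sample(n: int) -> int:
    """Number of heads in n simulated tosses of a fair coin."""
    return sum(random.random() < 0.5 for _ in range(n))

head_counts = [heads_in_sample(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]
means = [k / SAMPLE_SIZE for k in head_counts]

# CLT prediction for a fair coin: mean 0.5, standard deviation sqrt(0.25 / n).
predicted_sd = (0.25 / SAMPLE_SIZE) ** 0.5

print("mean of sample means:", round(statistics.mean(means), 3))
print("sd of sample means:  ", round(statistics.pstdev(means), 3))
print("predicted sd:        ", round(predicted_sd, 3))

# Text histogram: one row per observed sample mean, one '#' per 50 samples.
histogram = Counter(head_counts)
for k in sorted(histogram):
    print(f"{k / SAMPLE_SIZE:.2f} " + "#" * (histogram[k] // 50))
```

The rows of `#` characters should peak near 0.50 and fall off roughly symmetrically on both sides, which is the bell shape the theorem describes.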
To understand the second part of the definition of the Central Limit Theorem (CLT), let’s first understand the concept of the Law of Large Numbers (LLN).
In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and will tend to become closer to the expected value as more trials are performed.
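The law can be watched in action with a small simulation. This is a minimal sketch assuming a fair coin (expected proportion of heads 0.5) and a fixed random seed; the function name `running_head_proportion` is made up for this example.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def running_head_proportion(num_tosses: int) -> list:
    """Proportion of heads after each toss of a simulated fair coin."""
    heads = 0
    proportions = []
    for toss in range(1, num_tosses + 1):
        heads += random.random() < 0.5  # True counts as 1, False as 0
        proportions.append(heads / toss)
    return proportions

proportions = running_head_proportion(100_000)

# The running proportion should drift toward 0.5 as the toss count grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(proportions[n - 1], 4))
```

Early values can be far from 0.5, while the later ones settle close to it, which is the behaviour the law describes.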
The average proportion of heads (H) across the three samples: (0.46 + 0.56 + 0.43)/3 ≈ 0.483
Proof
Let’s calculate the proportion of heads (H) in S.
Proportion of heads in S = 49/100 = 0.49
Result
The average proportion of heads estimated from the samples is very close to the proportion of heads in S, the population the samples were drawn from:
0.483 ≈ 0.49
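The comparison can be checked directly from the recorded tosses. In this sketch S is written as a string of H and T, and the head counts 14, 17 and 13 for S1, S2 and S3 are taken from the text. Using unrounded proportions gives about 0.489 instead of 0.483; the small gap comes from truncating each sample proportion to two decimals before averaging.

```python
# The 100 recorded tosses S from the text, written as a string of H/T.
S = (
    "HTTTHHTHHT" "THHTTTHTHH" "THTHHTHHHT"
    "TTHTTTHHTH" "THHTTHHTTT" "HTHTHTHHTT"
    "HHTTHHTHTT" "HHTHHTTHTH" "HTHTTTHTHT"
    "THTHHTTHTH"
)

# Head counts of the three samples S1, S2, S3 (each of size 30).
sample_head_counts = [14, 17, 13]
sample_size = 30

# Proportion of heads in the full record S.
population_proportion = S.count("H") / len(S)

# Average of the three unrounded sample proportions.
sample_proportions = [k / sample_size for k in sample_head_counts]
average_of_samples = sum(sample_proportions) / len(sample_proportions)

print(population_proportion)         # 0.49
print(round(average_of_samples, 3))  # 0.489
```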
A similar calculation can be done for variance!