Proof of the Central Limit Theorem
I've always found the CLT to be a bit of a mind bender. For example, the outcome of a die roll is a discrete uniform RV taking values $1$ through $6$. This is very different from a Normal RV, which is continuous and can take values from $-\infty$ to $\infty$. If you look at the pmf (six bars of equal height) and the pdf (a bell curve), it's hard to see how they could be related.
But if you roll a bunch of dice and take the sum, the distribution of the sum will be approximately Normal!
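To see this numerically, here's a minimal simulation sketch (assuming `numpy` is available; the dice count, trial count, and seed are arbitrary choices). It sums 30 dice many times, standardizes the sums, and compares the empirical CDF to $\Phi$:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_dice, n_trials = 30, 100_000
mu, sigma = 3.5, math.sqrt(35 / 12)  # mean and sd of a single fair die

# Roll n_dice dice n_trials times and standardize each sum.
sums = rng.integers(1, 7, size=(n_trials, n_dice)).sum(axis=1)
z = (sums - n_dice * mu) / (sigma * math.sqrt(n_dice))

# Compare the empirical CDF of the standardized sums to Phi(a).
for a in (-2, -1, 0, 1, 2):
    empirical = np.mean(z <= a)
    phi = 0.5 * (1 + math.erf(a / math.sqrt(2)))  # standard normal cdf
    print(f"a={a:+d}  empirical={empirical:.4f}  Phi(a)={phi:.4f}")
```

Even with only 30 summands, the empirical probabilities should match $\Phi(a)$ to about two decimal places.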
The proof below relies on moment generating functions, and especially on the fact that the MGF uniquely determines the distribution (which is apparently a difficult result to prove, and one I have never worked through).
Claim
Let $X_1,...,X_n$ be iid with mean $\mu$ and variance $\sigma^2$. Then the distribution of
$$\frac{X_1+...+X_n - n\mu}{\sigma \sqrt{n}}$$tends to the standard normal as $n \rightarrow \infty$. That is, as $n \rightarrow \infty$,
$$P \left( \frac{X_1+...+X_n - n\mu}{\sigma \sqrt{n}} \le a \right) \rightarrow \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2} \, dx = F_Z(a) = \Phi(a)$$
Proof
1. First we will prove the CLT for the easier case where the $X_i$'s have $\mu = 0$ and $\sigma^2 = 1$; this will also prove the general case. To see why, suppose the $X_i$'s have mean $\mu_X$ and variance $\sigma_X^2$, not necessarily $0$ and $1$.
Note that $Y_i = (X_i - \mu_X)/\sigma_X$ has mean $0$ and variance $1$, which a quick calculation confirms. Then,
$$\frac{Y_1 + ... + Y_n - n\mu_Y}{\sigma_Y\sqrt{n}} = \frac{Y_1 + ... + Y_n - n \cdot 0}{1 \cdot \sqrt{n}} = \frac{Y_1 + ... + Y_n}{\sqrt{n}} = \frac{\frac{X_1 - \mu_X}{\sigma_X}+...+ \frac{X_n - \mu_X}{\sigma_X}}{\sqrt{n}} = \frac{X_1 +...+ X_n - n\mu_X}{\sigma_X\sqrt{n}}$$Since the following equality is true,
$$\frac{Y_1 + ... + Y_n}{\sqrt{n}} = \frac{X_1 +...+ X_n - n\mu_X}{\sigma_X\sqrt{n}}$$if you prove it for the LHS you prove it for the general case of the RHS since you can always standardize a RV.
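As a quick sanity check on this algebra, here's a sketch (with `numpy`; the values $\mu_X = 2$, $\sigma_X = 3$ are arbitrary) confirming that the two sides of the equality are the same number for any sample:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_x, sigma_x, n = 2.0, 3.0, 10

x = rng.normal(mu_x, sigma_x, size=n)  # any iid draws work; the identity is algebraic
y = (x - mu_x) / sigma_x               # the standardized Y_i's

lhs = y.sum() / np.sqrt(n)
rhs = (x.sum() - n * mu_x) / (sigma_x * np.sqrt(n))
print(np.isclose(lhs, rhs))  # True: exact equality, not an approximation
```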
2. So we are trying to prove that the distribution of $(X_1 + ... + X_n)/\sqrt{n}$ tends to that of a standard normal RV, where $\mu_X = 0$ and $\sigma_X^2 = 1$.
3. We will now assume that the MGF of $X_i$, $M_{X_i}(t)$, exists and is finite in a neighborhood of $0$. This will make life easier. Now the MGF of $X_i/\sqrt{n}$ is
$$M_{\frac{X_i}{\sqrt{n}}}(t) = E \left[ \exp \left\{ t \frac{X_i}{\sqrt{n}} \right\} \right] = M_{X_i} \left( \frac{t}{\sqrt{n}} \right)$$and since the $X_i$'s are iid, the MGF of $(X_1 + ... + X_n) / \sqrt{n}$ is
$$M_{(X_1 + ... + X_n) / \sqrt{n}}(t) = E \left[ \exp \left\{ t \left( \frac{X_1}{\sqrt{n}} +...+ \frac{X_n}{\sqrt{n}} \right) \right\} \right] = E \left[ \exp \left\{ t \frac{X_1}{\sqrt{n}} \right\} \cdot ... \cdot \exp \left\{ t \frac{X_n}{\sqrt{n}} \right\} \right]$$ $$= E \left[ \exp \left\{ t \frac{X_1}{\sqrt{n}} \right\} \right] \cdot ... \cdot E \left[ \exp \left\{ t \frac{X_n}{\sqrt{n}} \right\} \right] = \left( E \left[ \exp \left\{ t \frac{X_i}{\sqrt{n}} \right\} \right] \right)^n = \left( M_{\frac{X_i}{\sqrt{n}}}(t) \right)^n = \left( M_{X_i}\left(\frac{t}{\sqrt{n}} \right) \right)^n$$which follows because the MGF of a sum of independent RVs equals the product of the individual MGFs. That is,
$$M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}e^{tY}] = E[e^{tX}]E[e^{tY}] = M_X(t) M_Y(t)$$4. Now the key is to show that the MGF of the RV $(X_1 + ... + X_n) / \sqrt{n}$ tends to the MGF of the standard normal RV as $n \rightarrow \infty$. The MGF of a standard normal is $M_Z(t) = e^{t^2/2}$. So if we show that
$$\left( M_{\frac{X_i}{\sqrt{n}}}(t) \right)^n = \left( M_{X_i} \left( \frac{t}{\sqrt{n}} \right) \right)^n \rightarrow e^{t^2/2} \quad \text{as} \quad n \rightarrow \infty$$then we are done. Note that as $n \rightarrow \infty$, $M_{X_i} \left( \frac{t}{\sqrt{n}} \right) \rightarrow M_{X_i}(0) = 1$, so we get $1^\infty$, which is an indeterminate form. To show the above is true as $n \rightarrow \infty$ we will take the log of both sides so we can use L'Hopital's rule. The log of the MGF is called the cumulant generating function, and we will use the notation $L_{X_i}(t) = L(t) = \log M_{X_i}(t)= \log M(t)$.
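Before grinding through the limit, it's reassuring to watch it happen numerically. A sketch for a standardized fair die (chosen just because its MGF is an exact six-term sum; $t = 1$ is arbitrary):

```python
import math

mu, sigma = 3.5, math.sqrt(35 / 12)  # mean and sd of a single fair die

def mgf(t):
    """MGF of a standardized die: E[exp(t * (X - mu) / sigma)]."""
    return sum(math.exp(t * (k - mu) / sigma) for k in range(1, 7)) / 6

t = 1.0
print(f"target e^(t^2/2) = {math.exp(t ** 2 / 2):.6f}")
for n in (10, 100, 1_000, 10_000):
    print(f"n={n:>6}  (M(t/sqrt(n)))^n = {mgf(t / math.sqrt(n)) ** n:.6f}")
```

The powers creep up to $e^{1/2} \approx 1.6487$, even though each individual factor tends to $1$.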
5. Before using L'Hopital's rule we need to calculate $L(0)$, $L'(0)$ and $L''(0)$. We see that
$$L(0) = L_{X_i}(0) = \log \left( M_{X_i}(0) \right) = \log \left( E \left[ e^{ 0 \cdot X_i } \right] \right) = \log(E[1]) = \log(1) = 0$$6. Now let's figure out $L'(0)$ and $L''(0)$. Recall that $M_{X_i}'(0) = E[X_i] = \mu = 0$ and $M_{X_i}''(0) = E[X_i^2] = \sigma^2 + \mu^2 = 1 + 0 = 1$.
$$L'(t) = \left( \log M(t) \right)' = \frac{1}{M(t)} M'(t)$$ $$L'(0) = \frac{M'(0)}{M(0)} = \frac{\mu}{E[e^0]} = \mu = 0$$ $$L''(t) = (L'(t))' = \left( \frac{M'(t)}{M(t)} \right)' = \frac{M''(t)M(t) - (M'(t))^2}{(M(t))^2}$$ $$L''(0) = \frac{M''(0)M(0) - (M'(0))^2}{(M(0))^2} = E[X^2]-\mu^2 = \sigma^2 = 1$$
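These values are easy to double-check symbolically. A sketch with `sympy`, using the RV $X = \pm 1$ with probability $1/2$ each, which already has $\mu = 0$ and $\sigma^2 = 1$ and whose MGF is $\cosh(t)$:

```python
import sympy as sp

t = sp.symbols('t')
M = sp.cosh(t)  # MGF of X = +/-1 with prob 1/2 each: (e^t + e^-t)/2
L = sp.log(M)   # the cumulant generating function

print(L.subs(t, 0))                 # L(0)   = 0
print(sp.diff(L, t).subs(t, 0))     # L'(0)  = 0, the mean
print(sp.diff(L, t, 2).subs(t, 0))  # L''(0) = 1, the variance
```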
7. The ultimate goal is to show that
$$\left( M_{\frac{X_i}{\sqrt{n}}}(t) \right)^n \rightarrow e^{t^2/2} \quad \text{as} \quad n \rightarrow \infty$$Taking the log of both sides of the above expression,
$$n\log \left( M_{\frac{X_i}{\sqrt{n}}}(t) \right) = n \log \left( M \left( \frac{t}{\sqrt{n}} \right) \right) = n L \left( \frac{t}{\sqrt{n}} \right) = \frac{ L \left( \frac{t}{\sqrt{n}} \right) }{n^{-1}} \rightarrow \frac{t^2}{2} \quad \text{as} \quad n \rightarrow \infty$$8. To show this we will use L'Hopital's rule. L'Hopital's rule is very helpful when evaluating limits of indeterminate forms such as $\frac{0}{0}$ and $ \frac{\infty}{\infty}$. For example, since $L(0) = 0$ it follows that
$$\lim_{n \rightarrow \infty} \frac{ L \left( \frac{t}{\sqrt{n}} \right) }{n^{-1}} = \frac{0}{0}$$and the RHS is an indeterminate form (both the numerator and the denominator tend to $0$). This is perfect for L'Hopital's rule.
L'Hopital's rule: For functions $f$ and $g$ which are differentiable on an open interval $I$ except possibly at a point $c$ contained in $I$, if $\lim_{x \rightarrow c} f(x) = \lim_{x \rightarrow c} g(x) = 0$ or $\pm \infty$, $g'(x) \neq 0$ for all $x$ in $I$ with $x \neq c$, and $\lim_{x \rightarrow c} \frac{f'(x)}{g'(x)}$ exists, then
$$\lim_{x \rightarrow c} \frac{f(x)}{g(x)} = \lim_{x \rightarrow c} \frac{f'(x)}{g'(x)} $$Back to the proof, we will use L'Hopital's rule and take the derivative of top and bottom with respect to $n$.
$$\lim_{n \rightarrow \infty} \frac{ L \left( \frac{t}{\sqrt{n}} \right) }{n^{-1}} = \lim_{n \rightarrow \infty} \frac{ L' \left( t n^{-\frac{1}{2}} \right) \frac{-t}{2}n^{-\frac{3}{2}} }{-n^{-2}} = \lim_{n \rightarrow \infty} \frac{-L' \left( t n^{-\frac{1}{2}} \right) n^{-\frac{3}{2}} t}{-2n^{-2}} = \lim_{n \rightarrow \infty} \frac{L' \left( t n^{-\frac{1}{2}} \right)t}{2n^{-1/2}} = \frac{0}{0}$$where we use $L'(0) = 0$. Then we will use L'Hopital's rule again,
$$= \lim_{n \rightarrow \infty} \frac{L'' \left( t n^{-\frac{1}{2}} \right) \frac{-t^2}{2} n^{-3/2}}{-n^{-3/2}} = \lim_{n \rightarrow \infty} \frac{-L'' \left( t n^{-\frac{1}{2}} \right) t^2 n^{-3/2}}{-2n^{-3/2}} = \lim_{n \rightarrow \infty} L'' \left( \frac{t}{\sqrt{n}} \right) \frac{t^2}{2} = \frac{t^2}{2}$$where for the final equality we use the fact that $L''(0) = 1$. Thus we have shown using L'Hopital's rule that $n L \left( \frac{t}{\sqrt{n}} \right) \rightarrow \frac{t^2}{2}$ as $n \rightarrow \infty$.
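For a concrete distribution, `sympy` can evaluate this limit directly and cross-check the L'Hopital computation. A sketch, again with the $\pm 1$ RV so that $L(t) = \log \cosh(t)$ (if the symbolic limit ever proves stubborn, a numeric table like the dice one above makes the same point):

```python
import sympy as sp

t, n = sp.symbols('t n', positive=True)

# n * L(t / sqrt(n)) with L(t) = log(cosh(t)), the cumulant
# generating function of the +/-1 coin-flip RV
expr = n * sp.log(sp.cosh(t / sp.sqrt(n)))
print(sp.limit(expr, n, sp.oo))  # expected: t**2/2
```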
Exponentiating (the exponential function is continuous), it follows that
$$\left( M_{\frac{X_i}{\sqrt{n}}}(t) \right)^n \rightarrow e^{t^2/2} \quad \text{as} \quad n \rightarrow \infty$$which is exactly the MGF of the standard normal. We then take as a fact that if the MGF of $(X_1 +...+ X_n)/\sqrt{n}$ converges to the MGF of a standard normal as $n \rightarrow \infty$, then the cdf of $(X_1 +...+ X_n)/\sqrt{n}$ converges to the cdf of a standard normal (a continuity theorem for MGFs, closely related to the uniqueness fact mentioned at the top). This completes the proof.
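To see the conclusion at work on RVs that look nothing like dice, here's a closing sketch (with `numpy`; Exponential(1) is an arbitrary choice, picked because it is heavily skewed yet has the convenient $\mu = \sigma = 1$):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
a, trials = -1.5, 200_000
phi = 0.5 * (1 + math.erf(a / math.sqrt(2)))  # Phi(-1.5), about 0.0668

# Exponential(1) has mu = 1 and sigma = 1.
for n in (2, 10, 50, 250):
    sums = rng.exponential(1.0, size=(trials, n)).sum(axis=1)
    z = (sums - n) / math.sqrt(n)  # standardize with mu = sigma = 1
    print(f"n={n:>3}  P(Z_n <= {a}) ~ {np.mean(z <= a):.4f}  vs Phi({a}) = {phi:.4f}")
```

At $n = 2$ the event is literally impossible (a sum of two exponentials is nonnegative, and the cutoff $n + a\sqrt{n}$ is negative there), and the probability climbs toward $\Phi(-1.5)$ as $n$ grows.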