Proof of the Strong Law of Large Numbers
The Weak Law of Large Numbers (WLLN) is much easier to prove: it follows from Chebyshev's inequality (itself a consequence of Markov's inequality), at least when the variance is finite.
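As a quick sketch of that route (assuming, in addition, that each $X_i$ has finite variance $\sigma^2$, which is not needed for the SLLN proof below), Chebyshev's inequality applied to the sample mean gives

$$P \left( \left| \frac{X_1 + ... + X_n}{n} - \mu \right| \ge \epsilon \right) \le \frac{Var \left( \frac{X_1 + ... + X_n}{n} \right)}{\epsilon^2} = \frac{\sigma^2}{n \epsilon^2} \rightarrow 0 \quad \text{as} \quad n \rightarrow \infty$$

for every $\epsilon > 0$, which is exactly convergence in probability.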
Just what is the difference between the WLLN and the SLLN? When I first learned about this stuff I was like, "for the love of god, just have one version and go with the strong version, since strong is stronger than weak." As you can see, I have the mind of a mathematical genius.
This is the best explanation I've ever heard and I encourage you to check it out.
What is clear is that the LLN is a hugely important result: it is what lets us uncover the nature of things through repeated experimentation.
Claim
Let $X_1, X_2, ...$ be iid random variables, each having finite mean $\mu$. Then
$$P \left( \lim_{n \rightarrow \infty} \frac{X_1 + ... + X_n}{n} = \mu \right) = 1$$

Proof
1. We begin with an extra assumption that isn't necessary to prove the SLLN, but to make life easier we will assume that the $X_i$'s have a finite fourth moment; that is, $E[X_i^4] = K < \infty$.
2. We will first prove the SLLN assuming that $E[X_i] = \mu_X = 0$. If $\mu_X \neq 0$, consider $Y_i = X_i - \mu_X$, which has $E[Y_i] = \mu_Y = 0$. If you prove that
$$P \left( \lim_{n \rightarrow \infty}\frac{Y_1+...+Y_n}{n} = \mu_Y = 0 \right) = 1$$

then replacing the $Y_i$'s with $X_i - \mu_X$ we have
$$P \left( \lim_{n \rightarrow \infty}\frac{X_1+...+X_n - n\mu_X}{n} = 0 \right) = P \left( \lim_{n \rightarrow \infty}\frac{X_1+...+X_n}{n} = \mu_X \right) = 1$$
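For completeness, the algebra behind that substitution is just

$$\frac{Y_1 + ... + Y_n}{n} = \frac{(X_1 - \mu_X) + ... + (X_n - \mu_X)}{n} = \frac{X_1 + ... + X_n}{n} - \mu_X$$

so the running average of the $Y_i$'s converges to $0$ exactly when the running average of the $X_i$'s converges to $\mu_X$.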
3. Let $S_n = X_1 + ... + X_n$ with $E[X_i] = 0$ and consider
$$E[S_n^4] = E[(X_1 + ... + X_n)( X_1 + ... + X_n)( X_1 + ... + X_n)( X_1 + ... + X_n)]$$

The expansion of this beast results in terms of the form
$$X_i^4, \quad X_i^3X_j, \quad X_i^2X_j^2, \quad X_i^2X_jX_k, \quad \text{and} \quad X_iX_jX_kX_l$$

where $i, j, k, l$ are all different. By linearity of expectation, independence, and the fact that each $X_i$ has mean $0$, every term in which some $X_j$ appears to the first power has expectation zero:
$$ \begin{aligned} E[X_i^3 X_j] & = E[X_i^3]E[X_j] = 0 \\ E[X_i^2X_jX_k] & = E[X_i^2]E[X_j]E[X_k] = 0 \\ E[X_iX_jX_kX_l] & = E[X_i]E[X_j]E[X_k]E[X_l] = 0 \\ \end{aligned} $$

4. Now the remaining terms of $E[S_n^4]$ are of the form $E[X_i^4]$ and $E[X_i^2X_j^2]$. There are ${n \choose 2}$ different pairs $\{i, j\}$, and for a given pair the term $E[X_i^2X_j^2]$ appears ${4 \choose 2} = 6$ times in the expansion. To see this, first look at the expansion of $E[S_n^2S_n^2]$,
$$ E[S_n^4] = E[S_n^2S_n^2] = E \left[ \begin{pmatrix} X_1X_1 + X_1X_2 + ... + X_1X_n + \\ X_2X_1 + X_2X_2 + ... + X_2X_n + \\ \vdots \\ X_nX_1 + X_nX_2 + ... + X_nX_n \\ \end{pmatrix} \begin{pmatrix} X_1X_1 + X_1X_2 + ... + X_1X_n + \\ X_2X_1 + X_2X_2 + ... + X_2X_n + \\ \vdots \\ X_nX_1 + X_nX_2 + ... + X_nX_n \\ \end{pmatrix} \right] $$

and note that for $i=1$ and $j=2$ the term $E[X_1^2X_2^2]$ will show up $6$ times,
$$E[X_1X_1X_2X_2 + X_1X_2X_1X_2 + X_1X_2X_2X_1 + X_2X_1X_1X_2 + X_2X_1X_2X_1 + X_2X_2X_1X_1] = 6E[X_1^2X_2^2]$$

5. Since $X_1,...,X_n$ are iid, it follows that
$$E[S_n^4] = nE[X_i^4] + 6 {n \choose 2} E[X_i^2X_j^2] = nK + 3n(n-1)E[X_i^2]E[X_j^2]$$

where we have used the independence assumption and the assumption that $E[X_i^4] = K < \infty$. Since variance is always greater than or equal to zero,
$$0 \le Var(X_i^2) = E[X_i^4] - (E[X_i^2])^2$$

it follows that
$$(E[X_i^2])^2 \le E[X_i^4] = K$$

And since $E[X_i^2X_j^2] = E[X_i^2]E[X_j^2] = (E[X_i^2])^2$, we can go back to step 5 and say
$$E[S_n^4] \le nK + 3n(n-1)K$$

We then divide both sides by $n^4$,
$$E \left[ \frac{S_n^4}{n^4} \right] \le \frac{K}{n^3} + \frac{3(n-1)K}{n^3}$$

and since $n > n-1$, we can further say
$$E \left[ \frac{S_n^4}{n^4} \right] \le \frac{K}{n^3} + \frac{3K}{n^2}$$
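As a quick sanity check on the moment identity from step 5 (an illustration only, not part of the proof), here is a minimal Monte Carlo sketch in Python. It uses iid draws that are uniform on $[-1, 1]$, so $\mu = 0$, $E[X^2] = 1/3$, and $K = E[X^4] = 1/5$ (these particular choices are just for the example), and compares the simulated $E[S_n^4]$ against $nK + 3n(n-1)(E[X^2])^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20            # number of summands in S_n
trials = 200_000  # Monte Carlo sample size

# Uniform(-1, 1) draws: mean 0, E[X^2] = 1/3, K = E[X^4] = 1/5
X = rng.uniform(-1.0, 1.0, size=(trials, n))
S_n = X.sum(axis=1)

simulated = np.mean(S_n ** 4)                  # Monte Carlo estimate of E[S_n^4]
EX2, K = 1.0 / 3.0, 1.0 / 5.0
formula = n * K + 3 * n * (n - 1) * EX2 ** 2   # identity from step 5

print(f"simulated E[S_n^4] ~ {simulated:.3f}")
print(f"formula nK + 3n(n-1)(E[X^2])^2 = {formula:.3f}")
```

With a couple hundred thousand trials the two numbers typically agree to within a percent or two.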
6. Now examine the following sum, and note that each term is finite because of the inequality above:
$$E \left[ \sum_{n=1}^\infty \frac{S_n^4}{n^4}\right] = \sum_{n=1}^\infty E \left[ \frac{S_n^4}{n^4} \right] \le \sum_{n=1}^\infty \left( \frac{K}{n^3} + \frac{3K}{n^2} \right) < \infty$$

(Swapping the expectation and the infinite sum is justified by the monotone convergence theorem, since every term is non-negative, and the right-hand side is finite because $\sum 1/n^3$ and $\sum 1/n^2$ are convergent $p$-series.)

7. Take as a fact that if a non-negative sum is infinite with positive probability, then its expected value is also infinite. But since the expected value of the sum is finite, it means that, with probability $1$,
$$\sum_{n=1}^\infty \frac{S_n^4}{n^4} < \infty$$

8. The convergence of this series implies that its $n$th term goes to $0$, so we can conclude that, with probability $1$,
$$\lim_{n \rightarrow \infty} \frac{S_n^4}{n^4} = 0$$

9. But if $\frac{S_n^4}{n^4} = \left( \frac{S_n}{n} \right)^4$ goes to zero, then so does $\frac{S_n}{n}$. Why? Because continuous functions preserve limits: if $f$ is a continuous function and $a_n \rightarrow a$, then $f(a_n) \rightarrow f(a)$. Here $f(x) = \sqrt[4]{x}$ is continuous, and since $\frac{S_n^4}{n^4} \rightarrow 0$, we get $\sqrt[4]{(S_n/n)^4} = |S_n/n| \rightarrow \sqrt[4]{0} = 0$. The absolute value then comes off by the squeeze argument spelled out below.
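Spelling out that last bit (on the same probability-$1$ event):

$$- \left| \frac{S_n}{n} \right| \le \frac{S_n}{n} \le \left| \frac{S_n}{n} \right|$$

and both of the outer sequences tend to $0$, so the middle one does too.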
Therefore this proves that with probability $1$,
$$\frac{S_n}{n} \rightarrow 0 \quad \text{as} \quad n \rightarrow \infty$$

which proves that for iid RVs $Y_i$ with $\mu_Y = 0$,
$$P \left( \lim_{n \rightarrow \infty}\frac{Y_1+...+Y_n}{n} = \mu_Y = 0 \right) = 1$$

10. If $Y_i = X_i - \mu_X$ with $\mu_X$ not necessarily $0$, then the result just proven also means that
$$P \left( \lim_{n \rightarrow \infty}\frac{X_1+...+X_n - n\mu_X}{n} = 0 \right) = P \left( \lim_{n \rightarrow \infty}\frac{X_1+...+X_n}{n} = \mu_X \right) = 1$$

which is exactly the claim.
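Finally, just to watch the theorem do its thing (again an illustration, not part of the proof), here is a minimal Python sketch that prints the running average $\frac{X_1 + ... + X_n}{n}$ of iid Exponential($1$) draws, whose mean is $\mu = 1$ (the distribution and sample sizes are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(42)

# iid Exponential(1) draws, so the true mean is mu = 1
N = 1_000_000
X = rng.exponential(scale=1.0, size=N)

# running average S_n / n for every n at once
running_avg = np.cumsum(X) / np.arange(1, N + 1)

for n in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}   S_n / n = {running_avg[n - 1]:.4f}")
```

The printed values drift toward $1$ as $n$ grows, which is the SLLN at work on a single sample path.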