Derivation of the Normal Distribution
Where does the bell curve (pdf) of the Normal distribution come from? When I first saw this function I was like, how in the world did they come up with that?
$$f(x) = \frac{1}{\sqrt{2 \pi}\sigma} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$First, completely forget about the function above. Let's just think about throwing darts at the origin (Cartesian coordinates). In this thought experiment we're not professional dart throwers. So we don't always get a bullseye and there's some randomness to where the darts land. Based on two assumptions about the randomness, the distribution of darts must be Normal.
Assumptions
1. The closer a dart lands to the origin, the higher the probability density there, and this depends only on distance from the origin, not direction ("rotationally invariant")
2. Left-right accuracy or inaccuracy doesn't affect up-down accuracy/inaccuracy at all: $x$ and $y$ are independent (both assumptions are restated in symbols right below)
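In symbols, writing $p(x,y)$ for the joint density of where a dart lands (my notation; the marginals $f_X$ and $f_Y$ reappear in the derivation):
$$p(x,y) = \varphi\left(\sqrt{x^2+y^2}\right) \;\;\text{(assumption 1)} \qquad p(x,y) = f_X(x)\, f_Y(y) \;\;\text{(assumption 2)}$$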
Derivation
Therefore we're looking for a probability density function on the plane, $p : \mathbb{R}^2 \rightarrow [0,\infty)$ (a density is nonnegative but, unlike a probability, not capped at 1).
By assumption 1, in polar coordinates the density depends only on the radius, so we can write it as $\varphi(r)$: the probability density at distance $r$ from the origin.
Note that $\iint_{\mathbb{R}^2} \varphi(r)\, dA = 1$, and also note that it's $\varphi(r)$ and not $\varphi(r,\theta)$: by assumption 1, the angle $\theta$ doesn't matter.
Now from assumption 2 we know that the joint pdf splits into the product of the two marginal pdfs:
$$\varphi(r) = f_X(x) f_Y(y) = f(x)f(y)$$where the last equality comes from the fact that $f_X = f_Y$ (rotational invariance makes the left-right and up-down errors identically distributed), so we can just call it $f$.
Now transform from polar to Cartesian coordinates: $r = \sqrt{x^2+y^2}$, so for all $x$ and all $y$,
$$\varphi(\sqrt{x^2+y^2}) = f(x)f(y)$$Set $y=0$ and get:
$$\varphi(x) = f(x)f(0)$$Note that $f(0)$ is just a constant therefore let $f(0) = \lambda$.
$$\varphi(x)=\lambda f(x)$$Therefore
$$\lambda f(\sqrt{x^2+y^2}) = f(x)f(y)$$Multiply both sides by $\frac{1}{\lambda^2}$, which requires $\lambda \neq 0$. That's safe: $\lambda = f(0)$ is the density at the bullseye, which by assumption 1 is the largest density there is, so it's positive.
$$\frac{f(\sqrt{x^2+y^2})}{\lambda} = \frac{f(x)}{\lambda} \frac{f(y)}{\lambda}$$Let $g(x) = \frac{f(x)}{\lambda}$ which means that:
$$g(\sqrt{x^2+y^2}) = g(x)g(y)$$Now examine $g$: the equation converts addition (inside the square root) into multiplication, which is the signature of an exponential, so $g$ must be of the form $g(x) = e^{Ax^2}$ (the step is spelled out just below). The choice of base $e$ is harmless, since $C^x = e^{x\ln(C)}$.
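To make that step precise, substitute $h(t) = g(\sqrt{t})$ for $t \geq 0$. Applying the functional equation with $x = \sqrt{u}$ and $y = \sqrt{v}$ gives
$$h(u+v) = g\left(\sqrt{u+v}\right) = g\left(\sqrt{u}\right)g\left(\sqrt{v}\right) = h(u)h(v)$$and the only continuous solutions of $h(u+v) = h(u)h(v)$ are exponentials $h(t) = e^{At}$, hence $g(x) = h(x^2) = e^{Ax^2}$.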
If $g$ is in the form $g(x) = e^{Ax^2}$ then plugging into $g(\sqrt{x^2+y^2}) = g(x)g(y)$ you get:
$$e^{Ax^2}e^{Ay^2} = e^{A(x^2+y^2)}$$And since $g(x) = \frac{f(x)}{\lambda}$ we have that
$$ f(x) = \lambda e^{Ax^2} $$Note that $A$ has to be negative: otherwise the pdf opens upward like a parabola, meaning your probability of missing badly gets higher the farther you are from the target, and the area under the curve is infinite. Making $A$ negative gives the classic bell-shaped curve. So define $A=-h^2$ (with $h > 0$) to guarantee a negative coefficient.
$$ f(x) = \lambda e^{-h^2x^2} $$Since a pdf must integrate to one,
$$\int_{-\infty}^{\infty} \lambda e^{-h^2x^2}\, dx = 1 $$Now there is a famous result (the Gaussian integral) which says that $\int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}$, so we need to get our integral into that form.
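If you don't want to take the famous integral on faith, here's a quick numerical check, a minimal sketch assuming numpy and scipy are available (neither appears in the derivation itself):

```python
import numpy as np
from scipy.integrate import quad

# Numerically confirm the Gaussian integral: area under e^{-x^2} over all of R
area, _ = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(area)            # 1.7724538509055159
print(np.sqrt(np.pi))  # 1.7724538509055159
```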
Let $u=hx$ and $du = h dx$
$$\frac{\lambda}{h} \int_{-\infty}^{\infty} e^{-u^2} du = 1$$So this means that $ \lambda = \frac{h}{\sqrt{\pi}}$ and $h^2 = \lambda^2 \pi$ and therefore
$$f(x) = \lambda e^{-\lambda^2 \pi x^2}$$Interesting place to be. Let's mess with various values of $\lambda$. If $\lambda = 1$ then $f(x) = e^{-\pi x^2}$, and I did check that the area under this case is one. If you graph the $\lambda = 1$ case versus $\lambda = 5$ you get the same bell-shaped curve but pulled upward (taller and skinnier). So $\lambda$ controls the spread, moving in the opposite direction from the standard deviation $\sigma$: as $\lambda \nearrow$, $\sigma \searrow$, and as $\lambda \searrow$, $\sigma \nearrow$. Therefore the next objective is to get $\lambda$ in terms of $\sigma$.
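Here's a small sketch of that check (again assuming numpy and scipy), confirming that both curves have area one while the $\lambda = 5$ curve peaks five times higher:

```python
import numpy as np
from scipy.integrate import quad

def f(x, lam):
    """The density f(x) = lambda * e^{-lambda^2 * pi * x^2}."""
    return lam * np.exp(-(lam**2) * np.pi * x**2)

for lam in (1.0, 5.0):
    area, _ = quad(f, -np.inf, np.inf, args=(lam,))
    print(f"lambda={lam}: area={area:.6f}, peak f(0)={f(0.0, lam):.1f}")

# lambda=1.0: area=1.000000, peak f(0)=1.0
# lambda=5.0: area=1.000000, peak f(0)=5.0   -> taller and skinnier, same area
```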
By definition $\sigma_X^2 = Var(X) = E[(X-\mu)^2] = \int_{-\infty}^{\infty} (x-\mu)^2f(x) dx$
$$\sigma^2 = \int_{-\infty}^{\infty} (x-0)^2 f(x)\, dx = \int_{-\infty}^{\infty} x^2 \lambda e^{-\lambda^2 \pi x^2}\, dx$$(here $\mu = 0$ because we're aiming at the origin). Integration by parts:
$$u = x \qquad du = dx$$ $$dv = x e^{-\pi \lambda^2 x^2}\, dx \qquad v = \frac{-1}{2\pi \lambda^2} e^{-\pi \lambda^2 x^2} \;\text{(via the substitution } w = -\pi\lambda^2 x^2\text{)}$$ $$\sigma^2 = \lambda \left( \frac{-x}{2\pi\lambda^2} e^{-\pi \lambda^2 x^2} \Biggr|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} \frac{1}{2 \pi \lambda^2} e^{-\pi \lambda^2 x^2}\, dx \right) = \lambda \int_{-\infty}^{\infty} \frac{1}{2 \pi \lambda^2} e^{-\pi \lambda^2 x^2}\, dx$$(the boundary term vanishes because $e^{-\pi \lambda^2 x^2}$ decays far faster than $x$ grows) $$ = \frac{1}{2 \pi \lambda^2} \int_{-\infty}^{\infty} \lambda e^{-\pi \lambda^2 x^2}\, dx = \frac{1}{2 \pi \lambda^2} \cdot 1 = \frac{1}{2 \pi \lambda^2}$$ $$\Rightarrow \lambda^2 = \frac{1}{2\pi \sigma^2} \Rightarrow \lambda = \frac{1}{\sqrt{2 \pi}\, \sigma}$$This matches the earlier observation that $\lambda$ and $\sigma$ are inversely proportional. Plugging this $\lambda$ back in,
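A quick numerical check of the variance identity $\sigma^2 = \frac{1}{2\pi\lambda^2}$ (a sketch assuming numpy and scipy; the value of $\lambda$ is an arbitrary choice of mine):

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0  # any positive value works; 2.0 is arbitrary
f = lambda x: lam * np.exp(-(lam**2) * np.pi * x**2)

# Variance = E[X^2] since mu = 0
var, _ = quad(lambda x: x**2 * f(x), -np.inf, np.inf)
print(var)                       # ~0.0397887
print(1 / (2 * np.pi * lam**2))  # same value
```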
$$f(x) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2}$$And if you want to center this at some other value $\mu$, just replace $x$ with $x - \mu$:
$$f(x) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$which is exactly the pdf we started with.
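As a final sanity check, here's a sketch comparing the derived formula against scipy.stats.norm.pdf (assuming numpy and scipy; the parameter values are arbitrary test choices, not from the derivation):

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.5, 0.7  # arbitrary test parameters
x = np.linspace(-2.0, 5.0, 15)

# The formula we just derived
derived = (1 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
print(np.allclose(derived, norm.pdf(x, loc=mu, scale=sigma)))  # True
```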