Notes on the Beta Distribution
A very interesting distribution. I've seen it come up a lot when you have a Bernoulli RV with an unknown parameter $p$.
For example, if you have an unfair coin, the probability that it lands heads can be thought of as a Beta RV. There's a whole frequentist versus Bayesian debate here, but honestly I don't have the expertise to comment on that.
If you have $X \sim Beta(a=10, b=1)$ then the density is supported on $(0,1)$ and
$$f_X(x) = 10x^9, \quad x \in (0,1)$$
is a legitimate pdf! Also note that when $a < 1$ and $b < 1$ the Beta density is U-shaped, which can look bimodal (the peaks are at the endpoints).
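A quick sanity check of that pdf, as a minimal sketch assuming `numpy` and `scipy` are around:

```python
import numpy as np
from scipy import stats

# The Beta(10, 1) density from scipy should match 10 * x**9 on (0, 1).
x = np.linspace(0.05, 0.95, 5)
print(stats.beta(10, 1).pdf(x))  # library density
print(10 * x**9)                 # the closed form above
```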
The Beta and the Gamma are closely related. Note that
$$B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} = \frac{(a-1)!(b-1)!}{(a+b-1)!}$$
where the factorial form holds when $a$ and $b$ are positive integers.
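Here's a tiny numerical check of that identity (again just a sketch using `scipy.special`; the factorial form only applies for integer $a, b$, and the values picked are arbitrary):

```python
from math import factorial
from scipy.special import beta, gamma

# Check B(a, b) = Gamma(a)Gamma(b)/Gamma(a+b) = (a-1)!(b-1)!/(a+b-1)! for integers.
a, b = 4, 7
print(beta(a, b))                                                   # B(a, b) directly
print(gamma(a) * gamma(b) / gamma(a + b))                           # via the Gamma function
print(factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1))   # integer case
```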
Creating a Beta from two Gamma RVs with a common rate parameter - Bank and Post Office
Waiting time at a bank is $X \sim Gamma(a,1)$
Waiting time at a post office is $Y \sim Gamma(b,1)$
$X$ and $Y$ are independent. Let $T = X+Y$ and $W = \frac{X}{X+Y}$.
What is the joint distribution of $T$ and $W$? Are they independent?
$$f_{T,W}(t,w) = f_{X,Y}(x,y) \left| \frac{d(x,y)}{d(t,w)} \right|$$
Solving for $x$ and $y$ in terms of $t$ and $w$ we get
$$X = TW, \qquad Y = T(1-W)$$
$$\left| \frac{d(x,y)}{d(t,w)} \right| = \left| \det \begin{bmatrix} \frac{\partial x}{\partial t} & \frac{\partial x}{\partial w} \\ \frac{\partial y}{\partial t} & \frac{\partial y}{\partial w} \\ \end{bmatrix} \right| = \left| \det \begin{bmatrix} w & t \\ 1-w & -t \\ \end{bmatrix} \right| = \left| -wt - t(1-w) \right| = \left| -t \right| = t$$
Since $X$ and $Y$ are independent we have
$$f_{X,Y}(x,y) = \frac{e^{-x} x^{a-1}}{\Gamma(a)} \cdot \frac{e^{-y} y^{b-1}}{\Gamma(b)}$$ $$f_{T,W}(t,w) = \frac{e^{-tw} (tw)^{a-1}}{\Gamma(a)} \cdot \frac{e^{-t(1-w)} (t(1-w))^{b-1}}{\Gamma(b)} \, t = \frac{e^{-tw - t +tw} t^{a-1} w^{a-1}}{\Gamma(a)} \cdot \frac{t^{b-1} (1-w)^{b-1}}{\Gamma(b)} \, t $$ $$ = \frac{w^{a-1} (1-w)^{b-1}}{\Gamma(a) \Gamma(b)} e^{-t} t^{a+b-1} = \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} w^{a-1} (1-w)^{b-1} \cdot \frac{e^{-t} t^{a+b-1}}{\Gamma(a+b)}$$
where we get the last equality by multiplying by $1 = \frac{\Gamma(a+b)}{\Gamma(a+b)}$. From the joint pdf of $T$ and $W$ we see that
1. $T$ and $W$ are independent!
2. $T \sim Gamma(a+b, 1)$ and $W \sim Beta(a,b)$
3. Apparently this is special to the Gamma: if $X$ and $Y$ are not Gamma RVs with a common rate, then $T$ and $W$ will not be independent
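A quick simulation sketch is consistent with points 1 and 2 (assuming `numpy` and `scipy`; the seed and parameter choices are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, n = 3.0, 5.0, 100_000

# Bank and post office waiting times, both Gamma with rate 1.
x = rng.gamma(shape=a, scale=1.0, size=n)
y = rng.gamma(shape=b, scale=1.0, size=n)
t = x + y            # total waiting time
w = x / (x + y)      # fraction of time spent at the bank

# W should behave like Beta(a, b) and T like Gamma(a + b, 1).
print(w.mean(), a / (a + b))                         # sample mean vs a / (a + b)
print(t.mean(), a + b)                               # Gamma(a + b, 1) has mean a + b
print(stats.kstest(w, stats.beta(a, b).cdf).pvalue)  # KS test against Beta(a, b)
print(np.corrcoef(t, w)[0, 1])                       # roughly 0, consistent with independence
```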
To get the expected value of a Beta RV, note that
$$E[W] = E \left[ \frac{X}{X+Y} \right] = \frac{E[X]}{E[X+Y]} = \frac{a}{a+b}$$
The second equality is NOT by linearity of expectation; in general $E\left[\frac{X}{Z}\right] \ne \frac{E[X]}{E[Z]}$, and assuming it usually gives a wrong answer. In this specific case it happens to be true. Since $W$ and $T$ are independent we know that $Cov(W,T) = 0$, so by the definition of covariance
$$E[WT] = E[W]E[T]$$ $$ E \left[ \frac{X(X+Y)}{X+Y} \right] = E[W] E[X+Y] $$ $$ E[X] = E[W] E[X+Y]$$ $$E[W] = \frac{E[X]}{E[X+Y]}$$
LOTUS is also a good way to get the moments of a Beta RV.
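Here's a small sketch of the LOTUS route (assuming `scipy`; the parameter values are arbitrary), comparing numerically integrated moments against the library's:

```python
from scipy import stats, integrate

a, b = 3.0, 5.0
W = stats.beta(a, b)

# E[W] should be a / (a + b).
print(W.mean(), a / (a + b))

# LOTUS: E[W^k] = integral over (0, 1) of w^k * f_W(w) dw.
for k in (1, 2, 3):
    lotus, _ = integrate.quad(lambda w: w**k * W.pdf(w), 0, 1)
    print(k, lotus, W.moment(k))
```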
Example 1
Great example from Math Stack Exchange: Predicting batting average at the start of the season.
Example 2
You're flipping a coin $n+m$ times. Each flip is independent, and all flips have the same success probability $p$, which is chosen before the first flip from a $Uniform(0,1)$ RV $U$. Given that the number of successful flips is $N = n$, the conditional density $f_{U \mid N}(u \mid n)$ is a Beta pdf with parameters $(n+1, m+1)$, and as you add more data the posterior concentrates more tightly around the probability that was chosen. So if $n = 100 = m$ you get a symmetric spike around $0.5$, which makes sense, and if $n = 10$ and $m = 90$ you get a spike around $0.1$. I'm just trying to make an analogy with the batting average example.
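A small sketch of those two cases, plus a larger-sample one to show the spike tightening (assuming `scipy`; the extra case is just for illustration):

```python
from scipy import stats

# Uniform(0, 1) prior, n successes and m failures -> posterior Beta(n + 1, m + 1).
for n, m in [(100, 100), (10, 90), (1000, 9000)]:
    post = stats.beta(n + 1, m + 1)
    mode = n / (n + m)   # where the spike sits
    print(n, m, mode, post.mean(), post.std())  # std shrinks as n + m grows
```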
Example 3
If $X \sim Binomial(n,p)$ and $B \sim Beta(j, n-j+1)$ then
$$ P(X \ge j) = P(U_{(j)} \le p) = P(B \le p)$$
where $U_{(j)}$ is the $j$th order statistic of $n$ i.i.d. $Uniform(0,1)$ RVs: having at least $j$ successes is the same as having at least $j$ of the uniforms fall below $p$, and $U_{(j)} \sim Beta(j, n-j+1)$.
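A quick numerical check of the identity for one arbitrary choice of $n$, $j$, and $p$ (assuming `scipy`):

```python
from scipy import stats

# Check P(X >= j) = P(B <= p) for X ~ Binomial(n, p) and B ~ Beta(j, n - j + 1).
n, j, p = 20, 7, 0.3
X = stats.binom(n, p)
B = stats.beta(j, n - j + 1)

print(X.sf(j - 1))  # P(X >= j), since X is integer valued
print(B.cdf(p))     # P(B <= p)
```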