Sun 18 November 2018

Notes on the Beta Distribution

Written by Hongjinn Park in Articles

A very interesting distribution. I've seen it a lot when you have a Bernoulli RV with unknown parameter $p$.

For example, if you have an unfair coin, the probability that it lands heads can be thought of as a Beta RV. There's this whole frequentist versus Bayesian debate, but honestly I don't have the expertise to comment on that.


If you have $X \sim Beta(a=10, b=1)$ then your density function, like every Beta density, is supported on $(0,1)$, and

$$f_X(x) = 10x^9 \quad x \in (0,1)$$

is a legit pdf (it integrates to 1 over $(0,1)$)! Also know that you can get bimodal-looking, U-shaped densities with the Beta when both $a < 1$ and $b < 1$.
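Here's a quick numerical sanity check of that density (a sketch, assuming you have numpy and scipy around; none of this is in the original derivation):

```python
# Check that scipy's Beta(10, 1) density matches the closed form 10 * x**9 on (0, 1).
import numpy as np
from scipy.stats import beta

a, b = 10, 1
x = np.linspace(0.01, 0.99, 5)

print(beta.pdf(x, a, b))  # scipy's Beta(10, 1) density
print(10 * x**9)          # the closed form above; the two rows should agree
```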

The Beta and the Gamma are closely related. Note that

$$B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} = \frac{(a-1)!(b-1)!}{(a+b-1)!}$$

where the factorial form holds when $a$ and $b$ are positive integers.
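A quick check of that identity (a sketch assuming scipy; the values of $a$ and $b$ are just illustrative):

```python
# Check B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b), plus the factorial form
# for positive integer a and b.
from math import factorial
from scipy.special import beta as B, gamma as G

a, b = 10, 3
print(B(a, b))                                                      # Beta function directly
print(G(a) * G(b) / G(a + b))                                       # via Gamma functions
print(factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1))   # factorial form
```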

Creating a Beta from two Gamma RVs with common rate parameter - Bank and Post Office

Waiting time at a bank is $X \sim Gamma(a,1)$

Waiting time at a post office is $Y \sim Gamma(b,1)$

$X$ and $Y$ are independent. Let $T = X+Y$ and $W = \frac{X}{X+Y}$.

What is the joint distribution of $T$ and $W$? Are they independent?

$$f_{T,W}(t,w) = f_{X,Y}(x,y) \left| \frac{d(x,y)}{d(t,w)} \right|$$

Solving for $x$ and $y$ in terms of $t$ and $w$ we get

$$X = TW$$ $$Y = T(1-W)$$ $$ \frac{d(x,y)}{d(t,w)} = \det \begin{bmatrix} \frac{\partial x}{\partial t} & \frac{\partial x}{\partial w} \\ \frac{\partial y}{\partial t} & \frac{\partial y}{\partial w} \\ \end{bmatrix} = \det \begin{bmatrix} w & t \\ 1-w & -t \\ \end{bmatrix} = -wt - t(1-w) = -wt - t + wt = -t $$
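If you want to double-check that determinant, here's a tiny symbolic sketch (assuming sympy is available):

```python
# Symbolic check: the Jacobian determinant of (x, y) = (t*w, t*(1-w))
# with respect to (t, w) should come out to -t, so |J| = t.
import sympy as sp

t, w = sp.symbols("t w", positive=True)
x, y = t * w, t * (1 - w)

J = sp.Matrix([[sp.diff(x, t), sp.diff(x, w)],
               [sp.diff(y, t), sp.diff(y, w)]])
print(sp.simplify(J.det()))  # prints -t
```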

Since $X$ and $Y$ are independent we have

$$f_{X,Y}(x,y) = \frac{e^{-x} x^{a-1}}{\Gamma(a)} \cdot \frac{e^{-y} y^{b-1}}{\Gamma(b)}$$ $$f_{T,W}(t,w) = \frac{e^{-tw} (tw)^{a-1}}{\Gamma(a)} \cdot \frac{e^{-t(1-w)} (t(1-w))^{b-1}}{\Gamma(b)} t = \frac{e^{-tw - t +tw} t^{a-1} w^{a-1}}{\Gamma(a)} \cdot \frac{t^{b-1} (1-w)^{b-1}}{\Gamma(b)} t $$ $$ = \frac{w^{a-1} (1-w)^{b-1}}{\Gamma(a) \Gamma(b)} e^{-t} t^{a+b-1} = \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} w^{a-1} (1-w)^{b-1} \cdot \frac{e^{-t} t^{a+b-1}}{\Gamma(a+b)}$$

where we get the last equality by multiplying by $1 = \frac{\Gamma(a+b)}{\Gamma(a+b)}$. From the joint pdf of $T$ and $W$ we see that

1. $T$ and $W$ are independent (the joint pdf factors into a function of $t$ alone times a function of $w$ alone)!

2. $T \sim Gamma(a+b, 1)$ and $W \sim Beta(a,b)$

3. Apparently this is special to the Gamma: if you don't use Gamma RVs (with a common rate), then $T$ and $W$ will not be independent
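A simulation sketch to sanity-check points 1 and 2 (assuming numpy and scipy; the values of $a$, $b$, and the sample size are arbitrary choices):

```python
# Draw X ~ Gamma(a, 1) and Y ~ Gamma(b, 1), form T = X + Y and W = X / (X + Y),
# then check that W looks Beta(a, b), T looks Gamma(a + b, 1), and Corr(W, T) ~ 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, n = 4.0, 2.0, 100_000

x = rng.gamma(shape=a, scale=1.0, size=n)
y = rng.gamma(shape=b, scale=1.0, size=n)
t = x + y
w = x / (x + y)

print(np.corrcoef(w, t)[0, 1])                          # ~ 0, consistent with independence
print(w.mean(), a / (a + b))                            # sample mean of W vs a/(a+b)
print(stats.kstest(w, "beta", args=(a, b)).pvalue)      # W ~ Beta(a, b)
print(stats.kstest(t, "gamma", args=(a + b,)).pvalue)   # T ~ Gamma(a+b, 1)
```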

To get the expected value of a Beta RV, note that

$$E[W] = E \left[ \frac{X}{X+Y} \right] = \frac{E[X]}{E[X+Y]} = \frac{a}{a+b}$$

The second equality is NOT justified by linearity of expectation; in general $E\left[\frac{X}{X+Y}\right] \ne \frac{E[X]}{E[X+Y]}$, and if you just do that it's usually completely wrong. In this specific case it's true. Since $W$ and $T$ are independent we know that $Cov(W,T) = 0$, so by the definition of covariance

$$E[WT] = E[W]E[T]$$ $$ E \left[ \frac{X(X+Y)}{X+Y} \right] = E[W] E[X+Y] $$ $$ E[X] = E[W] E[X+Y]$$ $$E[W] = \frac{E[X]}{E[X+Y]}$$

LOTUS (the Law of the Unconscious Statistician) is also a good way to get the moments of a Beta RV.
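As a sketch of the LOTUS route (assuming scipy; the second-moment formula $E[W^2] = \frac{a(a+1)}{(a+b)(a+b+1)}$ is a standard fact, not derived above), you can integrate $w^k$ against the Beta density numerically and compare with the closed forms:

```python
# LOTUS-style numerical check: E[W^k] = integral of w^k * f_W(w) dw over (0, 1).
from scipy import integrate
from scipy.stats import beta

a, b = 5.0, 3.0

m1, _ = integrate.quad(lambda w: w * beta.pdf(w, a, b), 0, 1)
m2, _ = integrate.quad(lambda w: w**2 * beta.pdf(w, a, b), 0, 1)

print(m1, a / (a + b))                            # E[W]
print(m2, a * (a + 1) / ((a + b) * (a + b + 1)))  # E[W^2]
```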

Example 1

Great example from Math Stack Exchange: Predicting batting average at the start of the season.

Example 2

You're flipping a coin $n+m$ times. Each flip is independent, and all flips have the same success probability $p$, which is chosen before the first flip as a draw from $U \sim Uniform(0,1)$. Given that the number of successes is $n$ (so the number of failures is $m$), the conditional density of $U$ is a Beta pdf with parameters $(n+1, m+1)$. As you add more data, the posterior gives a sharper and sharper picture of what the uniform draw actually was. So if $n = 100 = m$ you get a symmetric spike around $0.5$, which makes sense, and if $n = 10$ and $m = 90$ you get a spike around $0.1$ as the probability chosen. I'm just trying to make an analogy with the batting average example.
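A small sketch of that (assuming scipy), looking at where the posterior $Beta(n+1, m+1)$ spikes and how tight it is for the two cases above:

```python
# The posterior of U given n successes and m failures is Beta(n + 1, m + 1);
# its mode (the peak of the "spike") is n / (n + m).
from scipy.stats import beta

for n, m in [(100, 100), (10, 90)]:
    post = beta(n + 1, m + 1)
    mode = n / (n + m)                              # mode of Beta(n+1, m+1)
    print(n, m, mode, post.pdf(mode), post.std())   # spike location, height, spread
```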

Example 3

If $X \sim Binomial(n,p)$, $U_{(j)}$ is the $j$-th order statistic of $n$ iid $Uniform(0,1)$ RVs, and $B \sim Beta(j, n-j+1)$, then

$$ P(X \ge j) = P(U_{(j)} \le p) = P(B \le p)$$

The first equality holds because you can build $X$ as the number of the $n$ uniforms that land below $p$: having at least $j$ successes is the same event as the $j$-th smallest uniform being at most $p$. The second holds because order statistics of uniforms are Beta distributed, $U_{(j)} \sim Beta(j, n-j+1)$.
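A quick numerical check of the identity (assuming scipy; the values of $n$, $p$, and $j$ are arbitrary):

```python
# P(X >= j) for X ~ Binomial(n, p) should equal the Beta(j, n - j + 1) CDF at p.
from scipy.stats import binom, beta

n, p, j = 20, 0.3, 7
print(binom.sf(j - 1, n, p))        # P(X >= j) = 1 - P(X <= j - 1)
print(beta.cdf(p, j, n - j + 1))    # P(B <= p) for B ~ Beta(j, n - j + 1)
```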

