Sun 6 January 2019

Coupon collecting with n pulls and different probabilities

Written by Hongjinn Park in Articles

Let $X$ be the number of unique coupons in $n$ pulls, what is the expected value and variance when the probabilities are different?


So the max formula doesn't apply as well here since your total pulls is restricted to a set number.

Let's be very clever and this time think about the number of uncollected types. Let $X$ be the number of uncollected coupons and $X_i = 1$ if coupon $i$ is uncollected in n pulls. Therefore, $Y$ which is the number of unique coupons collected in $n$ pulls is

$$Y = N-X = N-\sum_{i=1}^{N} X_i$$ $$E[Y] = N-\sum_{i=1}^{N} E[X_i] = N-\sum_{i=1}^{N} P(X_i=1) = N - \sum_{i=1}^{N} (1-p_i)^n$$

Let's say $N=20$ and $n=10$ and $p_i = \frac{1}{20}$ then $E[Y] = 20 - 20(\frac{19}{20})^{10} = 8.02$ so you're expected to have two duplicates.

Also, since $X = N-Y$,

$$E[X] = E[N-Y] = N - E[Y] = \sum_{i=1}^{N} (1-p_i)^n$$ Also note that the $Var(Y) =Var(N-X) = 0 + (-1)^2Var(X) = Var(X)$. The $Var(X) = E[X^2] - E[X]^2$ and I already have the expected value. To get $E[X^2]$ I use the this useful identity that you get from looking at the $E[{X \choose 2}]$ which eventually gives you $$E[X^2] = 2 \sum_{i< j} P(A_iA_j) + E[X]$$

For my problem, the event that both $X_i$ and $X_j$ both happen occurs with probability $(1-p_i-p_j)^n$ and therefore,

$$E[X^2] = 2 \sum_{i < j} (1-p_i-p_j)^n + E[X]$$

And from all this crap you can get $Var(X) = Var(Y)$.



Articles

Personal notes I've written over the years.