Thu 6 October 2016

The Negative Hyper Geometric RV

Written by Hongjinn Park in Articles

Some cool applications! For example from a shuffled deck,
  • the expected number of cards to flip over until an Ace (10.6 cards)
  • the expected number of cards to flip over until the Ace of Spade (26.5 cards)
  • the expected number of cards to flip over until anything except Ace of Spade (1.019 cards)
  • the expected number of cards to flip over until any card of any type (1 card)
  • the expected number of cards to flip over until a Spade (3.786 cards).
  • for a Spade, why isn't $X$, the number of cards to flip $\sim Geo(p=\frac{1}{4})$ and $E[X] = 4 > 3.786$? Because for a Geo RV X, each trial is independent and there is full replacement. Same reason the EV of flips for an Ace of Spade is $26.5 < \frac{1}{\frac{1}{52}}=52$.

You have an urn with $n$ special and $m$ ordinary balls and you keep picking with no replacement until you get $r$ specials. Let $Y$ be the total number of picks until you amass $r$ specials. Then $Y$ is a Negative Hyper Geometric RV. What is the pmf of $Y$? Similar to how you get the pmf for a NegBin, you know that the last pick must be a special ball. Therefore,

$$P(Y=k) = \frac{{n \choose r-1}{m \choose k-r}}{{n+m \choose k-1}} \frac{n-r+1}{n+m-k+1}$$

Now we are curious what is the $E[Y]$. Instead of using the pmf above, which could be a bitch, let's look at this...

Let $X$ be the number of ordinary balls you pick before amassing r specials. Then $Y = r + X$. Now label the ordinary balls $o_1, o_2, ... ,o_m$ and let $X_i = 1$ if $o_i$ is chosen before r specials are gathered. Therefore $Y=r+\sum_{i=1}^{m} X_i$ and

$$E[Y] = E[r+X] = E[r+\sum_{i=1}^{m} X_i] = r + \sum_{i=1}^{m} E[X_i] = r + \sum_{i=1}^{m} P(X_i=1) $$

To get $P(X_i=1)$ note that in any ordering of all $n+m$ balls, $o_i$ could be at any spot, from spot 1 to spot number $n+m$. Now after considering any ordering of $n+m$ balls, erase all ordinary balls except $o_i$ so that you are just looking at every special ball and $o_i$ which in total is $n+1$ balls. For $o_i$ to be selected it just has to be in the first r spots of the $n+1$ possible spots. Note that $o_i$ can appear in any of the $n+1$ spots with equal probability. Therefore $P(X_i=1) = \frac{r}{n+1}$ and

$$E[Y] = r + m\frac{r}{n+1} = \frac{r(n+m+1)}{n+1}$$

Another way to calculate $P(X_i=1)$ is by remembering that you just need $o_i$ in the first r of the $n+1$ possible spots to guarantee selection and therefore you just need to choose a group of size r consisting of $o_i$ and $r-1$ specials to go in front. Let's say you have n women and one man. You want a committee of size r to go in front. It should be made up of $r-1$ women and 1 man.

$$P(X_i=1) = \frac{{1 \choose 1}{n \choose r-1}}{{n+1 \choose r}} = \frac{r}{n+1}$$

Here the denominator ${n+1 \choose r}$ includes starting front groups that don't have $o_i$ in the first r slots, as in the entire group consists of specials. The numerator is the number of ways to choose a group of size r that includes the $o_i$. Note that by symmetry the probability that $o_i$ is in the first r spots is the same as the probability that $o_i$ is in the final r spots or really any division of the $n+1$ possible spots. To convince yourself of this, think of spots 2,3,4,5. In the 4 spots starting at location 2 thru 5 you can have any group of the n+1 balls or ${n+1 \choose 4}$ and for the denominator you have ${1 \choose 1}{n \choose 3}$.

In regards to this clever argument, note that $P(k=m+r)$ which is the probability of being very "unlucky" and withdrawing all ordinary balls to get $r$ specials is not necessarily the same as $P(k=1+r)$ which is only getting one dud to get r specials. You can look at the pmf to see this. For example it is not necessarily true that $P(k=r) = P(k=1+r) = ... = P(k=m+r)$ but in regards to whether ball $o_i$ is chosen you don't care about the value of $Y$, whether it is $r$ or $r+m$. For instance say that $o_i$ comes right before a consecutive string of the first r special balls. Then $o_i$ could be the last ordinary ball ($Y=k=m+r$) or the first ordinary ball ($Y=k=1+r$) or the second, etc. Again, for the purposes of determining $P(X_i = 1)$ the value of $Y$ does not matter which is why you can safely erase all the other ordinary balls and retain equal probability for $o_i$ to be in any of the remaining $n+1$ spaces.

Think about it this way. You have ab and ba. Now introduce C. You have Cab, Cba, aCb, bCa, abC, baC. Now take out C. You have ab, ba, ab, ba, ab, ba. You still have half are ab and the other half are ba.

Note that $Var(Y) = Var(r + X) = Var(X).$

(*) To calculate the variance we will use $Var(X) = E[X^2] - E[X]^2$ and to get that we will calculate $E[X^2]$ from the formula $E[X^2] = 2\sum_{i < j}P(A_iA_j)+E[X]$ which comes from looking at $E[{X\choose2}]$.

The event $A_iA_j$ means that both ordinary balls $i$ and $j$ were chosen before r specials were picked. Again, first look at any ordering of all $n+m$ balls. Erase everything except $o_i$, $o_j$, and the r specials. You are left with a string of size $n+2$. Therefore, $A_iA_j$ will happen if in the possible $n+2$ slots $o_i$ and $o_j$ are in the first r+1 spots. As a result,

$$P(A_iA_j) = \frac{{2 \choose 2}{n \choose r-1}}{{n+2 \choose r+1}} = \frac{r(r+1)}{(n+1)(n+2)}$$

After some algebra, see (*) you get,

$$Var(X) = Var(Y) = \frac{mr(n+1-r)(n+m+1)}{(n+1)^2 (n+2)}$$

Let $X$ be the number of cards to flip over to get at least one of any card of any type. Then, $Var(X) = 0$ since $m=0$, $r=1$, $n=52$. Therefore it always takes exactly 1 flip over, never more or less.

Let $X$ be the number of cards to flip over to get the Ace of Spades. Then, since $m=51$, $r=1$, $n=1$,

$$Var(X) = \frac{51(1)(53)}{4(3)} = \frac{2703}{12} = 225.25$$

So for the number of cards to flip over to see the Ace of Spades, $E[X] = 26.5$ and $\sigma \approx 15$.



Articles

Personal notes I've written over the years.