Squid Game conditional probabilities

Difficulty:   ★★★☆☆   undergraduate

Earlier we analysed the probabilities for the bridge-crossing scenario in the Squid Game episode “VIPs”, which has “deadly high stakes” according to the Netflix blurb for the series. 🙂 So far, we made the assumption of no foreknowledge. This means our results for the players’ progress describe their chances as they stand before the game begins. Equivalently, if the game has started, our results assume the analyst knows nothing about prior contestants, and cannot view the state of the bridge.

But now, suppose we are told only that a specific player numbered i died on step number n. (That is, they stood safely on column n – 1, but chose wrongly amongst the next pair of glass panels on column n, breaking a pane and plummeting downwards.) Then the next player is definitely safe on step n, but has no information about later steps, so the game is essentially reset from that point. Hence the “conditional probability” that player I > i is still alive on step N > n is simply:

    \[P(I,N|i\textrm{ died on }n) = a_{I-i,N-n}.\]

Recall a_{i',n'} \equiv P(i',n') is the chance player i′ is alive on step n′ (given no information nor conditions). We labelled as b_{i',n'} = \binom{n'-1}{i'-1}2^{-n'} the chance they died on step n′ specifically, so analogously:

    \[P(I\textrm{ dies on }N|i\textrm{ died on }n) = b_{I-i,N-n}.\]

Now, suppose we are told only that a specific player I will die on step N. What is the probability for an earlier player’s progress? Bayes’ theorem says that given two events A and B, the conditional probabilities are related by P(A|B) = P(B|A)P(A)/P(B), which in our case is:

    \[\begin{aligned}c_{i,n} &:= P(i\textrm{ died on }n|I\textrm{ dies on }N) \\ &= \frac{b_{I-i,N-n}b_{i,n}}{b_{I,N}} \\ &= \frac{\binom{N-n-1}{I-i-1}\binom{n-1}{i-1}}{\binom{N-1}{I-1}}.\end{aligned}\]

The powers of 2 cancelled. The Table below shows some example numbers.

Table: Probability player i died on step n, given player I = 5 will die on step N = 8

n = 1

2 3 4 5 6 7 8

i = 1

4/7 2/7 4/35 1/35 0 0 0 0
2 0 2/7 12/35 9/35 4/35 0 0 0
3 0 0 4/35 9/35 12/35 2/7 0 0
4 0 0 0 1/35 4/35 2/7 4/7 0
5 0 0 0 0 0 0 0 1

In general, on any given row (fixed player i) the entries are nonzero only for n between i and N – I + i inclusive. This forms a diamond shape. For the row sum \sum_{n=i}^{N-I+i}c_{i,n} computer algebra returns a hypergeometric function times two binomial coefficients, which appears to simplify to 1 (for integer parameters) as expected, since player i must die somewhere. On any given column \sum_i c_{i,n} = (I-1)/(N-1) which is independent of n, meaning each step has equal chance that some player will die there. In particular the first entry itself takes this value: c_{1,1} = (I-1)/(N-1).

We examine other properties and special cases. By construction the last row and column are zeroes apart from c_{I,N} := 1; our general formula does not apply for n = N. If we are told where the second player I = 2 died, then player i = 1 has an equal chance 1/(N – 1) of dying on any earlier step. Also from the definition it is clear:

    \[c_{i,n} \equiv c_{I-i,N-n},\]

so the table is symmetric about its central point. The ratio of adjacent entries follows from the binomial coefficients:

    \[\begin{aligned}\frac{c_{i-1,n}}{c_{i,n}} &= \frac{(i-1)(N-I-n+i)}{(I-i)(n-i+1)}, \\ \frac{c_{i,n-1}}{c_{i,n}} &= \frac{(N-n)(n-i)}{(n-1)(N-I-n+i+1)}.\end{aligned}\]

It follows that at step n, player i = (I – 1)n/N and the subsequent player have the same “fail” chance. Presumably the maximum lies within this range. Physically we require the indices i and n to be integers. For the chosen Table parameters above, the relation just given is simply in/2, so every second column contains an adjacent pair of equal values. For the steps (columns) on the other hand, on n = (i – 1)(N – 1) / (I – 2) and the following step the “elimination” chance is equal. Note these special index values are linear functions of the other index (i or n respectively), where we regard I and N as fixed.

By rearranging terms we can write equivalent expressions for the chance to be eliminated, such as:

    \[c_{i,n} \equiv \frac{i\binom{N-I}{n-i}\binom{I-1}{i}}{n\binom{N-1}{n}}.\]

conditional probability in Squid Game
The probability Squid Game bridge contestants will expire on a given step, given the condition: player I = 9 will die on step N = 25. It forms a sort of boat- or saddle-shape. The blue dots are for integer index values, which are physical. In previous results the “ridge line” of high probability was at roughly n = 2i, but for the conditional probability it is spread out between the endpoint events, so in this case is roughly n = 3i.

For suitably large parameters, the probability resembles a gaussian curve. We can apply the de Moivre-Laplace approximation (with parameter p := ½ say) to the binomial coefficients. This gives a gaussian for a fixed step number n, as a function of the player number. I omit the height, but its centre and width are determined from the exponent which is:


The spread is maximum at n = N/2, in this approximation. Now to obtain a gaussian approximation for a fixed player i, apply the results of the previous blog post using the substitutions x \rightarrow n-1, a \rightarrow i-1, b \rightarrow I-i-1, and X \rightarrow N-2. The centre is n_0 := x_0 + 1 = (i-1)(N-1)/(I-2)+1/2. One option for the height of the gaussians — when looking for a simple expression — is to use the sums 1 and (I – 1)/(N – 1) determined before. Recall for a normalised gaussian, the height 1/\sqrt{2\pi}\sigma is inversely proportional to the standard deviation.

There are other conditional probability questions one could pose. Suppose we are given a window, bounded by the events that player J died on column L, and later player K dies on column M? Inside this window, the probabilities reduce to our above analysis: the chance i dies on n is just c_{i-J,n-L}, where we also substitute I \rightarrow K-J and N \rightarrow M-L. As another possible scenario to analyse, we might be informed that player I is alive on step N. Then we would not know how far they progressed, just that it was at least that far. Or, we might be told player I died on or before step N.

A concluding thought: Bayes’ theorem is deceptively simple-looking. I tried harder ways beforehand, trying to puzzle through the subtlety of conditional probability on my own. But with Bayes, the main result followed easily from our previous work.

🡐 asymptotics | Squid Game bridge | ⸻ 🠦

Gaussian approximation to a certain product of binomial coefficients

Difficulty:   ★★★☆☆   undergraduate

Consider the following function, which is the product of a certain pair of binomial coefficients:

    \[f(x) := \binom{x}{a}\binom{X-x}{b}.\]

We take abX >> 1 to be constants, and x to have domain [a – 1, Xb + 1] which implies Xab – 2 at least. As usual \binom{x}{a} := x!/a!(x-a)!, and this is extended beyond integer values by replacing each factorial with a Gamma function. Note the independent variable x appears in the upper entries of the binomial coefficients. Curiously, from inspection f is well-approximated by a gaussian curve. To gain some insight, for integer values of the parameters f is the polynomial:

    \[(a! b!)^{-1}x(x-1)\cdots(x-a+1)\cdot(X-b+1-x)\cdots(X-x).\]

This has many zeroes, and sometimes oscillates wildly in between them, hence the domain of x specified earlier.

plot of function and a gaussian curve approximation
Figure: The function for a = 7, b = 10, and X = 20. It is shown beyond our stated domain, which is bounded by the roots at x = 6 and 11. The gaussian uses our estimated centre of x = 277/34 or approx. 8.147, whereas f‘s actual maximum occurs at around x = 8.139. The variance is from our approximate harmonic number formula, evaluated at the estimated centre point. Alternately, the “finite difference” derivatives give a poor estimate in this case. In general, the gaussian fit looks best for high parameter values with a near b, etc.

Now the usual approximations to a single binomial coefficient (actually, binomial distribution) are not helpful here. For example the de Moivre–Laplace approximation is a gaussian in terms of the lower entry in the binomial coefficient, whereas our x is in the upper entries. More promising is the approximation as a Poisson distribution, which leads to a polynomial which is itself gaussian-like, and motivated the previous post incidentally. However we proceed from first principles, by estimating the centre point and the second derivative there.

At the (central) maximum of f, the slope is zero. In general the derivative is f'(x) = f(x)(H_x-H_{x-a}-H_{X-x}+H_{X-b-x}), where the H’s are called harmonic numbers. There may not exist any simple explicit expression for the turning points. Instead, the ratio of nearby points is comparatively simple:

    \[\frac{f(x-1/2)}{f(x+1/2)} = \frac{(x-a+1/2)(X+1/2-x)}{(x+1/2)(X-b+1/2-x)},\]

using the properties of the binomial coefficient. The derivative is approximately zero where this ratio is unity, which occurs at:

    \[x_0 := \frac{2aX+a-b}{2(a+b)}.\]

This should be a close estimate for the central turning point. [To do better, substitute specific numbers for the parameters, and solve numerically.] It is typically not an integer. Our sought-for gaussian has form C\operatorname{exp}(-(x-x_0)^2/2\sigma^2). We set the height C := f(x_0). Only the width remains to be determined. The gaussian’s second derivative evaluated at its centre point is -C/\sigma^2. On the other hand:

    \[f''(x) = f'(x)^2/f(x) - (H_x^{(2)}-H_{x-a}^{(2)}+H_{X-x}^{(2)}-H_{X-b-x}^{(2)})f(x),\]

which uses the so-called harmonic numbers of order 2, and I incorporate the function and its derivative (both given earlier) for brevity of the expression. Matching the results at x_0 yields the variance parameter \sigma^2:

    \[\sigma^{-2} := H_{x_0}^{(2)}-H_{x_0-a}^{(2)}+H_{X-x_0}^{(2)}-H_{X-b-x_0}^{(2)},\]

using f'(x_0) \approx 0. (At large values the series H_x^{(2)} \approx \pi^2/6 -1/x +1/2x^2\cdots may give insight into the above.) But alternatively, we can approximate the second derivative using elementary operations. By sampling the function at x_0-1, x_0, and x_0+1 say, a “finite differences” approach gives approximate derivatives. We can use the simple ratio formula obtained earlier to reduce the sampling to one or two points only, which might gain some insight along the way (though I currently wonder if this is a dead end…).

Now f'(x_0-1/2) \approx f(x_0) - f(x_0-1), which becomes:


after using the ratio formula to obtain f(x_0-1) in terms of C. Similarly it turns out f'(x_0+1/2) is the negative of the above expression, but with a and b interchanged. Then a second derivative is: f''(x_0) \approx f'(x_0+1/2)-f'(x_0-1/2), but the combined expression does not simplify further so I won’t write it out. The last step is to set \tilde\sigma^2 := -C/f''(x_0), which is different to the earlier choice.

A slightly different approach uses f'(x_0-1/2) \approx (f(x_0+1/2)-f(x_0-3/2))/2, which may be expressed in terms of another sampled point E := f(x_0-1/2) = f(x_0+1/2). Similarly f'(x_0+1/2) \approx (f(x_0+3/2)-f(x_0-1/2))/2. The estimate for the second derivative follows, then later:

    \[\hat\sigma^2 := \frac{-2C(aX-b)(bX+a+2b)(bX-b^2-ab+a+2b)(aX-a^2-ab-b)}{E(a+b)^6}.\]

The expression is a little simpler in this approach, but at the cost of a second sample point. The use of f'(x_0-1) \approx f(x_0-1/2)-f(x_0-3/2) and f'(x_0+1) \approx f(x_0+3/2)-f(x_0+1/2) instead leads to the same result.

Gaussian approximation to a certain polynomial

Difficulty:   ★★★☆☆   undergraduate

Consider the function:


where the independent variable x ranges between 0 and X, and the exponents are large: A, B \gg 1. [We could call it a “polynomial”, though the exponents need not be integers. Specifically it is the product of “monomials” in x and Xx, so might possibly be called a “sparse” polynomial in this sense.] Surprisingly, it closely resembles a gaussian curve, over our specified domain x \in [0,X].

approximation to a certain polynomial using a gaussian curve
Figure: The polynomial with parameters A = 10, B = 13, and X = 11. Our gaussian approximation is visually indistinguishable near the centre. Outside our specified domain the polynomial tends to \pm\infty, and for each non-integer exponent the tail on one side becomes imaginary.

The turning point is where the derivative equals zero. This occurs when x is the surprisingly simple expression:

    \[\tilde x := \frac{X}{1+B/A},\]

at which the function has value:

    \[A^A B^B \Big(\frac{X}{A+B}\Big)^{A+B} \equiv (B/A)^B \tilde x^{A+B}.\]

An arbitrary gaussian, not necessarily normalised, has form: Ce^{-(x-D)^2/2\sigma^2}. This has centre D which we equate with \tilde x, and maximum height C which we set to the above expression. We can fix the final parameter, the standard deviation, by matching the second derivatives at the turning point. Hence the variance is:

    \[\sigma^2 = \frac{AB}{(A+B)^3}X^2 \equiv \frac{B}{A^2X}\tilde x^3.\]

Hence our gaussian approximation may be expressed:

    \[\boxed{(B/A)^B \tilde x^{A+B} \operatorname{exp}\Big( -\frac{(x-\tilde x)^2}{2B\tilde x^3/A^2X} \Big).}\]

The integral of the original curve turns out to be:

    \[\int_0^X x^A(X-x)^Bdx = \frac{X^{A+B+1}}{(A+B+1)\binom{A+B}{A}}.\]

This uses the binomial coefficient \binom{A+B}{A} := (A+B)!/A!B!, which is extended to non-integer values by replacing the factorials with Gamma functions. We could then apply Stirling’s approximation A! \approx \sqrt{2\pi A}(A/e)^A to each factorial, to obtain:

    \[\int\cdots \approx \frac{\sqrt{2\pi(A+B)}}{A+B+1}(B/A)^{B+1/2}\tilde x^{A+B+1},\]

though this is more messy to write out. On the other hand, the integral of the gaussian approximation is:

    \[\int_{-\infty}^\infty \operatorname{exp}\cdots = \sqrt\frac{2\pi}{A+B}(B/A)^{B+1/2}\tilde x^{A+B+1}.\]

We evaluated this integral over all real numbers, because the expression is simpler and still approximately the same. The ratio of the above two expressions is (A+B)/(A+B+1) \approx 1.