Squid Game conditional probabilities

Difficulty:   ★★★☆☆   undergraduate

Earlier we analysed the probabilities for the bridge-crossing scenario in the Squid Game episode “VIPs”, which has “deadly high stakes” according to the Netflix blurb for the series. 🙂 So far, we made the assumption of no foreknowledge. This means our results for the players’ progress describe their chances as they stand before the game begins. Equivalently, if the game has started, our results assume the analyst knows nothing about prior contestants, and cannot view the state of the bridge.

But now, suppose we are told only that a specific player numbered i died on step number n. (That is, they stood safely on column n – 1, but chose wrongly amongst the next pair of glass panels on column n, breaking a pane and plummeting downwards.) Then the next player is definitely safe on step n, but has no information about later steps, so the game is essentially reset from that point. Hence the “conditional probability” that player I > i is still alive on step N > n is simply:

    \[P(I,N|i\textrm{ died on }n) = a_{I-i,N-n}.\]

Recall a_{i',n'} \equiv P(i',n') is the chance player i′ is alive on step n′ (given no information nor conditions). We labelled as b_{i',n'} = \binom{n'-1}{i'-1}2^{-n'} the chance they died on step n′ specifically, so analogously:

    \[P(I\textrm{ dies on }N|i\textrm{ died on }n) = b_{I-i,N-n}.\]

Now, suppose we are told only that a specific player I will die on step N. What is the probability for an earlier player’s progress? Bayes’ theorem says that given two events A and B, the conditional probabilities are related by P(A|B) = P(B|A)P(A)/P(B), which in our case is:

    \[\begin{aligned}c_{i,n} &:= P(i\textrm{ died on }n|I\textrm{ dies on }N) \\ &= \frac{b_{I-i,N-n}b_{i,n}}{b_{I,N}} \\ &= \frac{\binom{N-n-1}{I-i-1}\binom{n-1}{i-1}}{\binom{N-1}{I-1}}.\end{aligned}\]

The powers of 2 cancelled. The Table below shows some example numbers.

Table: Probability player i died on step n, given player I = 5 will die on step N = 8

player \ step   n = 1   2       3       4       5       6       7       8
i = 1           4/7     2/7     4/35    1/35    0       0       0       0
2               0       2/7     12/35   9/35    4/35    0       0       0
3               0       0       4/35    9/35    12/35   2/7     0       0
4               0       0       0       1/35    4/35    2/7     4/7     0
5               0       0       0       0       0       0       0       1

In general, on any given row (fixed player i) the entries are nonzero only for n between i and N – I + i inclusive. This forms a diamond shape. For the row sum \sum_{n=i}^{N-I+i}c_{i,n} computer algebra returns a hypergeometric function times two binomial coefficients, which appears to simplify to 1 (for integer parameters) as expected, since player i must die somewhere. On any given column \sum_i c_{i,n} = (I-1)/(N-1) which is independent of n, meaning each step has equal chance that some player will die there. In particular the first entry itself takes this value: c_{1,1} = (I-1)/(N-1).
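The row and column sums quoted above can be checked with exact fractions. The following is a minimal Python sketch rather than the computer-algebra session mentioned in the text; the function name `cond_death` and the variable names are illustrative choices, and the special value c_{I,N} = 1 is put in by hand.

```python
from fractions import Fraction
from math import comb

def cond_death(i, n, I, N):
    """c_{i,n}: chance player i died on step n, given player I dies on step N."""
    if (i, n) == (I, N):
        return Fraction(1)              # the conditioning event itself
    if not (1 <= i < I and 1 <= n < N):
        return Fraction(0)              # last row and column are otherwise zero
    if n < i or N - n < I - i:
        return Fraction(0)              # outside the band i <= n <= N - I + i
    return Fraction(comb(N - n - 1, I - i - 1) * comb(n - 1, i - 1),
                    comb(N - 1, I - 1))

I, N = 5, 8
table = [[cond_death(i, n, I, N) for n in range(1, N + 1)]
         for i in range(1, I + 1)]
for i, row in enumerate(table, start=1):
    print(i, *row)

# Rows for players 1..I-1 should each sum to 1,
# and columns 1..N-1 should each sum to (I-1)/(N-1).
row_sums = [sum(row) for row in table[:-1]]
col_sums = [sum(table[i][n] for i in range(I - 1)) for n in range(N - 1)]
print(row_sums, col_sums)
```

For the Table parameters every row sum comes out as 1 and every column sum as 4/7, in line with the hypergeometric simplification described above.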

We examine other properties and special cases. By construction the last row and column are zeroes apart from c_{I,N} := 1; our general formula does not apply for n = N. If we are told where the second player I = 2 died, then player i = 1 has an equal chance 1/(N – 1) of dying on any earlier step. Also from the definition it is clear:

    \[c_{i,n} \equiv c_{I-i,N-n},\]

so the table is symmetric about its central point. The ratio of adjacent entries follows from the binomial coefficients:

    \[\begin{aligned}\frac{c_{i-1,n}}{c_{i,n}} &= \frac{(i-1)(N-I-n+i)}{(I-i)(n-i+1)}, \\ \frac{c_{i,n-1}}{c_{i,n}} &= \frac{(N-n)(n-i)}{(n-1)(N-I-n+i+1)}.\end{aligned}\]

It follows that at step n, player i = (I – 1)n/N and the subsequent player have the same “fail” chance. Presumably the maximum lies within this range. Physically we require the indices i and n to be integers. For the chosen Table parameters above, the relation just given is simply i = n/2, so every second column contains an adjacent pair of equal values. For the steps (columns) on the other hand, on n = (i – 1)(N – 1) / (I – 2) and the following step the “elimination” chance is equal. Note these special index values are linear functions of the other index (i or n respectively), where we regard I and N as fixed.

By rearranging terms we can write equivalent expressions for the chance to be eliminated, such as:

    \[c_{i,n} \equiv \frac{i\binom{N-I}{n-i}\binom{I-1}{i}}{n\binom{N-1}{n}}.\]
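One way to gain confidence in a rearrangement like this is to compare it against the Bayes form on every physical index pair. Below is a small Python sketch with exact fractions; the function names are hypothetical, and the parameters I = 9, N = 25 are simply borrowed from the plot.

```python
from fractions import Fraction
from math import comb

def c_bayes(i, n, I, N):
    """c_{i,n} as obtained from Bayes' theorem."""
    return Fraction(comb(N - n - 1, I - i - 1) * comb(n - 1, i - 1),
                    comb(N - 1, I - 1))

def c_rearranged(i, n, I, N):
    """The equivalent rearranged expression."""
    return Fraction(i * comb(N - I, n - i) * comb(I - 1, i),
                    n * comb(N - 1, n))

I, N = 9, 25
# All index pairs inside the nonzero band i <= n <= N - I + i, for players before I.
pairs = [(i, n) for i in range(1, I) for n in range(i, N - I + i + 1)]
agree = all(c_bayes(i, n, I, N) == c_rearranged(i, n, I, N) for i, n in pairs)
print(agree)  # True
```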

conditional probability in Squid Game
Figure: The probability Squid Game bridge contestants will expire on a given step, given the condition: player I = 9 will die on step N = 25. It forms a sort of boat- or saddle-shape. The blue dots are for integer index values, which are physical. In previous results the “ridge line” of high probability was at roughly n = 2i, but for the conditional probability it is spread out between the endpoint events, so in this case is roughly n = 3i.

For suitably large parameters, the probability resembles a gaussian curve. We can apply the de Moivre-Laplace approximation (with parameter p := ½ say) to the binomial coefficients. This gives a gaussian for a fixed step number n, as a function of the player number. I omit the height, but its centre and width are determined from the exponent which is:

    \[-\frac{\Big(i-\frac{N+nI-I-2n}{N-2}\Big)^2}{(n-1)(N-n-1)/2(N-2)}.\]

The spread is maximum at n = N/2, in this approximation. Now to obtain a gaussian approximation for a fixed player i, apply the results of the previous blog post using the substitutions x \rightarrow n-1, a \rightarrow i-1, b \rightarrow I-i-1, and X \rightarrow N-2. The centre is n_0 := x_0 + 1 = (i-1)(N-1)/(I-2)+1/2. One option for the height of the gaussians — when looking for a simple expression — is to use the sums 1 and (I – 1)/(N – 1) determined before. Recall for a normalised gaussian, the height 1/\sqrt{2\pi}\sigma is inversely proportional to the standard deviation.

There are other conditional probability questions one could pose. Suppose we are given a window, bounded by the events that player J died on column L, and later player K dies on column M? Inside this window, the probabilities reduce to our above analysis: the chance i dies on n is just c_{i-J,n-L}, where we also substitute I \rightarrow K-J and N \rightarrow M-L. As another possible scenario to analyse, we might be informed that player I is alive on step N. Then we would not know how far they progressed, just that it was at least that far. Or, we might be told player I died on or before step N.

A concluding thought: Bayes’ theorem is deceptively simple-looking. I tried harder ways beforehand, trying to puzzle through the subtlety of conditional probability on my own. But with Bayes, the main result followed easily from our previous work.


Gaussian approximation to a certain product of binomial coefficients

Difficulty:   ★★★☆☆   undergraduate

Consider the following function, which is the product of a certain pair of binomial coefficients:

    \[f(x) := \binom{x}{a}\binom{X-x}{b}.\]

We take a, b, X \gg 1 to be constants, and x to have domain [a – 1, X – b + 1], which implies X ≥ a + b – 2 at least. As usual \binom{x}{a} := x!/a!(x-a)!, and this is extended beyond integer values by replacing each factorial with a Gamma function. Note the independent variable x appears in the upper entries of the binomial coefficients. Curiously, from inspection f is well-approximated by a gaussian curve. To gain some insight, for integer values of the parameters f is the polynomial:

    \[(a! b!)^{-1}x(x-1)\cdots(x-a+1)\cdot(X-b+1-x)\cdots(X-x).\]

This has many zeroes, and sometimes oscillates wildly in between them, hence the domain of x specified earlier.

plot of function and a gaussian curve approximation
Figure: The function for a = 7, b = 10, and X = 20. It is shown beyond our stated domain, which is bounded by the roots at x = 6 and 11. The gaussian uses our estimated centre of x = 277/34 or approx. 8.147, whereas f’s actual maximum occurs at around x = 8.139. The variance is from our approximate harmonic number formula, evaluated at the estimated centre point. In contrast, the “finite difference” derivatives give a poor estimate in this case. In general, the gaussian fit looks best for high parameter values with a near b, etc.

Now the usual approximations to a single binomial coefficient (actually, binomial distribution) are not helpful here. For example the de Moivre–Laplace approximation is a gaussian in terms of the lower entry in the binomial coefficient, whereas our x is in the upper entries. More promising is the approximation as a Poisson distribution, which leads to a polynomial which is itself gaussian-like, and motivated the previous post incidentally. However we proceed from first principles, by estimating the centre point and the second derivative there.

At the (central) maximum of f, the slope is zero. In general the derivative is f'(x) = f(x)(H_x-H_{x-a}-H_{X-x}+H_{X-b-x}), where the H’s are called harmonic numbers. There may not exist any simple explicit expression for the turning points. Instead, the ratio of nearby points is comparatively simple:

    \[\frac{f(x-1/2)}{f(x+1/2)} = \frac{(x-a+1/2)(X+1/2-x)}{(x+1/2)(X-b+1/2-x)},\]

using the properties of the binomial coefficient. The derivative is approximately zero where this ratio is unity, which occurs at:

    \[x_0 := \frac{2aX+a-b}{2(a+b)}.\]

This should be a close estimate for the central turning point. [To do better, substitute specific numbers for the parameters, and solve numerically.] It is typically not an integer. Our sought-for gaussian has form C\operatorname{exp}(-(x-x_0)^2/2\sigma^2). We set the height C := f(x_0). Only the width remains to be determined. The gaussian’s second derivative evaluated at its centre point is -C/\sigma^2. On the other hand:

    \[f''(x) = f'(x)^2/f(x) - (H_x^{(2)}-H_{x-a}^{(2)}+H_{X-x}^{(2)}-H_{X-b-x}^{(2)})f(x),\]

which uses the so-called harmonic numbers of order 2, and I incorporate the function and its derivative (both given earlier) for brevity of the expression. Matching the results at x_0 yields the variance parameter \sigma^2:

    \[\sigma^{-2} := H_{x_0}^{(2)}-H_{x_0-a}^{(2)}+H_{X-x_0}^{(2)}-H_{X-b-x_0}^{(2)},\]

using f'(x_0) \approx 0. (At large values the series H_x^{(2)} \approx \pi^2/6 -1/x +1/2x^2\cdots may give insight into the above.) But alternatively, we can approximate the second derivative using elementary operations. By sampling the function at x_0-1, x_0, and x_0+1 say, a “finite differences” approach gives approximate derivatives. We can use the simple ratio formula obtained earlier to reduce the sampling to one or two points only, which might gain some insight along the way (though I currently wonder if this is a dead end…).
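The centre estimate x_0 and the variance \sigma^2 can be checked numerically for the Figure's parameters. The sketch below is only an illustration under some assumptions: it works with ln f through the standard library's log-Gamma function, locates the true maximum with a golden-section search, and replaces the order-2 harmonic numbers with a central difference of ln f (the same quantity, since \sigma^{-2} = -(\ln f)''(x_0)). All function names are hypothetical.

```python
from math import lgamma

def log_f(x, a, b, X):
    """ln[binom(x, a) * binom(X - x, b)] via log-Gamma, for a - 1 < x < X - b + 1."""
    return (lgamma(x + 1) - lgamma(a + 1) - lgamma(x - a + 1)
            + lgamma(X - x + 1) - lgamma(b + 1) - lgamma(X - x - b + 1))

def centre_estimate(a, b, X):
    """x_0, from setting the ratio f(x - 1/2) / f(x + 1/2) to unity."""
    return (2 * a * X + a - b) / (2 * (a + b))

def argmax_golden(g, lo, hi, tol=1e-7):
    """Golden-section search for the maximum of a unimodal function g on [lo, hi]."""
    r = (5 ** 0.5 - 1) / 2
    c, d = hi - r * (hi - lo), lo + r * (hi - lo)
    while hi - lo > tol:
        if g(c) > g(d):          # maximum lies in [lo, d]
            hi, d = d, c
            c = hi - r * (hi - lo)
        else:                    # maximum lies in [c, hi]
            lo, c = c, d
            d = lo + r * (hi - lo)
    return (lo + hi) / 2

def variance_estimate(a, b, X, h=1e-3):
    """sigma^2 = -1 / (ln f)''(x_0), using a central difference with step h."""
    x0 = centre_estimate(a, b, X)
    second = (log_f(x0 + h, a, b, X) - 2 * log_f(x0, a, b, X)
              + log_f(x0 - h, a, b, X)) / h ** 2
    return -1.0 / second

a, b, X = 7, 10, 20
x0 = centre_estimate(a, b, X)                      # 277/34
x_max = argmax_golden(lambda x: log_f(x, a, b, X),
                      a - 1 + 1e-6, X - b + 1 - 1e-6)
sigma2 = variance_estimate(a, b, X)
print(x0, x_max, sigma2)
```

The search is restricted to the open interval between the roots at x = a – 1 and x = X – b + 1, where f is positive and ln f is concave, so a single maximum is guaranteed there.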

Now f'(x_0-1/2) \approx f(x_0) - f(x_0-1), which becomes:

    \[\frac{2C(a+b)^3}{(2aX+a-b)(2bX-2b^2-2ab+a+3b)},\]

after using the ratio formula to obtain f(x_0-1) in terms of C. Similarly it turns out f'(x_0+1/2) is the negative of the above expression, but with a and b interchanged. Then a second derivative is: f''(x_0) \approx f'(x_0+1/2)-f'(x_0-1/2), but the combined expression does not simplify further so I won’t write it out. The last step is to set \tilde\sigma^2 := -C/f''(x_0), which is different to the earlier choice.

A slightly different approach uses f'(x_0-1/2) \approx (f(x_0+1/2)-f(x_0-3/2))/2, which may be expressed in terms of another sampled point E := f(x_0-1/2) = f(x_0+1/2). Similarly f'(x_0+1/2) \approx (f(x_0+3/2)-f(x_0-1/2))/2. The estimate for the second derivative follows, then later:

    \[\hat\sigma^2 := \frac{-2C(aX-b)(bX+a+2b)(bX-b^2-ab+a+2b)(aX-a^2-ab-b)}{E(a+b)^6}.\]

The expression is a little simpler in this approach, but at the cost of a second sample point. The use of f'(x_0-1) \approx f(x_0-1/2)-f(x_0-3/2) and f'(x_0+1) \approx f(x_0+3/2)-f(x_0+1/2) instead leads to the same result.

Gaussian approximation to a certain polynomial

Difficulty:   ★★★☆☆   undergraduate

Consider the function:

    \[x^A(X-x)^B,\]

where the independent variable x ranges between 0 and X, and the exponents are large: A, B \gg 1. [We could call it a “polynomial”, though the exponents need not be integers. Specifically it is the product of “monomials” in x and X – x, so might possibly be called a “sparse” polynomial in this sense.] Surprisingly, it closely resembles a gaussian curve, over our specified domain x \in [0,X].

approximation to a certain polynomial using a gaussian curve
Figure: The polynomial with parameters A = 10, B = 13, and X = 11. Our gaussian approximation is visually indistinguishable near the centre. Outside our specified domain the polynomial tends to \pm\infty, and for each non-integer exponent the tail on one side becomes imaginary.

The turning point is where the derivative equals zero. This occurs when x is the surprisingly simple expression:

    \[\tilde x := \frac{X}{1+B/A},\]

at which the function has value:

    \[A^A B^B \Big(\frac{X}{A+B}\Big)^{A+B} \equiv (B/A)^B \tilde x^{A+B}.\]

An arbitrary gaussian, not necessarily normalised, has form: Ce^{-(x-D)^2/2\sigma^2}. This has centre D which we equate with \tilde x, and maximum height C which we set to the above expression. We can fix the final parameter, the standard deviation, by matching the second derivatives at the turning point. Hence the variance is:

    \[\sigma^2 = \frac{AB}{(A+B)^3}X^2 \equiv \frac{B}{A^2X}\tilde x^3.\]

Hence our gaussian approximation may be expressed:

    \[\boxed{(B/A)^B \tilde x^{A+B} \operatorname{exp}\Big( -\frac{(x-\tilde x)^2}{2B\tilde x^3/A^2X} \Big).}\]

The integral of the original curve turns out to be:

    \[\int_0^X x^A(X-x)^Bdx = \frac{X^{A+B+1}}{(A+B+1)\binom{A+B}{A}}.\]

This uses the binomial coefficient \binom{A+B}{A} := (A+B)!/A!B!, which is extended to non-integer values by replacing the factorials with Gamma functions. We could then apply Stirling’s approximation A! \approx \sqrt{2\pi A}(A/e)^A to each factorial, to obtain:

    \[\int\cdots \approx \frac{\sqrt{2\pi(A+B)}}{A+B+1}(B/A)^{B+1/2}\tilde x^{A+B+1},\]

though this is more messy to write out. On the other hand, the integral of the gaussian approximation is:

    \[\int_{-\infty}^\infty \operatorname{exp}\cdots = \sqrt\frac{2\pi}{A+B}(B/A)^{B+1/2}\tilde x^{A+B+1}.\]

We evaluated this integral over all real numbers, because the expression is simpler and still approximately the same. The ratio of the above two expressions is (A+B)/(A+B+1) \approx 1.
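The pieces above can be put together numerically for the Figure's parameters A = 10, B = 13, X = 11. This is a rough Python sketch with illustrative variable names; the exact integral is evaluated through log-Gamma rather than factorials so that non-integer exponents would also work.

```python
from math import exp, lgamma, log, pi, sqrt

A, B, X = 10, 13, 11

x_t = X / (1 + B / A)                       # turning point \tilde x
height = (B / A) ** B * x_t ** (A + B)      # function value at the turning point
sigma2 = A * B * X ** 2 / (A + B) ** 3      # variance from matching second derivatives

def poly(x):
    return x ** A * (X - x) ** B

def gauss(x):
    return height * exp(-(x - x_t) ** 2 / (2 * sigma2))

# Exact integral X^(A+B+1) * A! B! / (A+B+1)!, computed in logs.
exact_integral = exp((A + B + 1) * log(X)
                     + lgamma(A + 1) + lgamma(B + 1) - lgamma(A + B + 2))
gauss_integral = height * sqrt(2 * pi * sigma2)
ratio = exact_integral / gauss_integral

print(x_t, sigma2, ratio)
```

Note the printed ratio compares the exact integral with the gaussian one, so it differs slightly from (A+B)/(A+B+1), which is the ratio after Stirling's approximation has been applied.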

Squid Game asymptotic probability

Difficulty:   ★★★☆☆   undergraduate

We continue with the bridge-crossing scenario from Squid Game called “Glass Stepping Stones”. Here I analyse the probabilities for late contestants on a very long bridge, and the expectation value for a player’s progress. Last time I found an exact expression for the probability P(i,n) that player number i is still alive on the nth step. Now seems a good place to mention there are equivalent expressions, such as:

    \[P(i,n) = 1 - \binom{n}{i}\cdot{_2F_1}(i,n+1,i+1;-1),\]

where {_2F_1} is called the hypergeometric function, and the other term is the binomial coefficient which is read as “n choose i”. The factor 2^{-n} seen previously has been absorbed. We listed several special cases of the probabilities last time. Another is:

    \[P(i,2i-1) = \frac{1}{2}.\]

So remarkably, the chance player i will be alive on step 2i – 1 is precisely 50%! For fixed large i, if we plot the probability distribution as a function of n it looks smooth, remaining near 1 for early steps before rolling down to near 0. Qualitatively this looks like a \tan^{-1}, tanh, or erf (“error function”). We reflect these curves, centre them on the value 1/2 at n = 2i – 1, and scale them linearly: so they have the appropriate bounds and match the slope at the centre point. See the Figure below.

probability for player 400
Figure: The probability player number i = 400 is still alive at step n. The scaled tanh function is close to the exact curve, while the scaled erf function nearly overlays it.

In fact the slope used in the Figure is only an approximation as described next, but this is a deliberate choice to show it still gives a good fit. The exact slope \partial P/\partial n evaluated at n = 2i – 1 seems a little too complicated to be useful. It contains a derivative of the hypergeometric function, which appears to approach -1/2 in the limit of large i, hence the slope at the centre point is asymptotic to -1/\sqrt{4\pi i}. Another approach is to consider the subsequent bridge step, for which:

    \[P(i,2i) = \frac{1}{2} - \frac{\Gamma(i+1/2)}{2\sqrt\pi\,\Gamma(i+1)},\]

which uses the Gamma function. The difference P(i,2i) – P(i,2i-1) approximates the slope, and is also asymptotic to -1/\sqrt{4\pi i} as i \rightarrow \infty. Hence our approximation for late players is:

    \[P(i,n) \approx \frac{1}{2}\Big( 1-\operatorname{erf}\frac{n+1-2i}{2\sqrt i} \Big).\]
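To see how close this approximation is, one can compare it with exact values. The sketch below assumes the finite-sum form P(i,n) = 2^{-n}\sum_{k<i}\binom{n}{k} (fewer than i players lost by step n), which is equivalent to the hypergeometric expression; the function names and the sampled range of n are arbitrary choices.

```python
from math import comb, erf, sqrt

def alive_exact(i, n):
    """P(i, n): fewer than i players have died by step n (finite binomial sum)."""
    return sum(comb(n, k) for k in range(i)) / 2 ** n

def alive_erf(i, n):
    """Error-function approximation for late players."""
    return 0.5 * (1 - erf((n + 1 - 2 * i) / (2 * sqrt(i))))

i = 400
# Largest discrepancy over a range of steps around the centre n = 2i - 1.
worst = max(abs(alive_exact(i, n) - alive_erf(i, n))
            for n in range(700, 901, 5))
print(alive_exact(i, 2 * i - 1), alive_erf(i, 2 * i - 1), worst)
```

Both functions return exactly 0.5 at n = 2i – 1, and for i = 400 the worst discrepancy over the sampled steps stays well below 0.02.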

Now the error function by definition is the integral of a gaussian curve. The derivative with respect to n of our approximation is precisely the negative of the righthand side below, which itself approximates the chance a late player dies on that step (the sign flips because the death chance is the decrease in survival probability from one step to the next):

    \[P(i\textrm{ dies on }n) \approx \frac{1}{\sqrt{4\pi i}}e^{-(n-2i+1)^2/4i}.\]

For fixed i this is a gaussian distribution with centre n = 2i – 1 and standard deviation \sqrt{2i}. It is normalised in the sense its integral over n \in \mathbb R is exactly 1, but physically we want the discrete sum over n \in \mathbb N^+. For the 10th player this is approx. 0.9991, which is already close. The exact chance for dying on a given step was determined in the previous article to be b_{i,n} = \binom{n-1}{i-1} \cdot 2^{-n}. The Figure below shows some early values. As before, we can extend the function beyond integer parameters.

probability of death
Figure: The probability player i will die on step n, given no foreknowledge (that is, before the game begins). The blue dots correspond to integer values of the parameters, which are physical. Contestants face a near-gaussian “hill of death” so to speak, which peaks at n = 2i – 2 and 2i – 1. I have included the “spires” at the back for sake of interest, as a peek into the rich structure for negative n, though this is unphysical.

The ratio b_{i,n}/b_{i,n+1} = 2(n-i+1)/n precisely. Hence for a given player, the adjacent steps n = 2i – 2 and 2i – 1 are equally likely locations their game will be “discontinued”. This is surely the maximum assuming integer parameters, apart from the first player for whom step 0 is safe but step 1 is their most likely “resting place”. Hence the reader might prefer to translate our gaussian approximation by half a step or so; apparently there are various approximations to a binomial coefficient. The subsequent step n = 2i is a more likely endpoint than the earlier step 2i – 3.

The expectation value for a given player’s death is:

    \[\sum_{n=1}^\infty n \cdot b_{i,n} = \binom{0}{i-1} \cdot {_2F_1}(1,-i,2-i;-1).\]

This function is very close to 2i, apart from a small oscillatory wiggle. At integer i it is singular, but from inspection of its plot it may be extended to a continuous function with value precisely 2i on physical parameter values (that is, integer i). Finally, for a given step n, the probability that some player breaks a tile is:

    \[\sum_{i=1}^n b_{i,n} = \frac{1}{2}\]

precisely, which is unsurprising. (Better terminology would be the nth “column”, as Henle+ use.) This assumes that n or more players have finished their run, otherwise the step is less likely to be broken.
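These properties of b_{i,n} are easy to confirm with exact arithmetic. Here is a brief Python sketch; the names are illustrative, and the infinite sum for the expectation is truncated at an arbitrary cutoff (an assumption that is harmless because the terms decay like 2^{-n}).

```python
from fractions import Fraction
from math import comb

def death(i, n):
    """b_{i,n} = binom(n-1, i-1) / 2^n, for i >= 1 and n >= 1."""
    return Fraction(comb(n - 1, i - 1), 2 ** n)

def expected_step(i, cutoff=400):
    """Truncated expectation of the step on which player i dies."""
    return sum(n * death(i, n) for n in range(i, cutoff))

# Ratio of adjacent steps: b_{i,n} / b_{i,n+1} = 2(n-i+1)/n.
ratio_ok = all(death(i, n) / death(i, n + 1) == Fraction(2 * (n - i + 1), n)
               for i in range(1, 8) for n in range(i, 20))
# Steps 2i-2 and 2i-1 are equally likely, for players after the first.
twin_peaks = all(death(i, 2 * i - 2) == death(i, 2 * i - 1)
                 for i in range(2, 8))
# Summing over players for a fixed step gives exactly 1/2.
column_half = all(sum(death(i, n) for i in range(1, n + 1)) == Fraction(1, 2)
                  for n in range(1, 20))

print(ratio_ok, twin_peaks, column_half, float(expected_step(3)))
```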

Update, May 19: The death chance b_{i,n} is ½ times a binomial distribution in the player number (n – 1 trials with p = ½, evaluated at i – 1). We previously found a gaussian curve for a given player i. Now, for a fixed step n, the de Moivre-Laplace approximation is a gaussian over the player number i:

    \[\frac{1}{\sqrt{2\pi(n-1)}}e^{-\frac{(i - n/2 - 1/2)^2}{(n - 1)/2}}.\]


More Squid Game probabilities

Difficulty:   ★★★☆☆   undergraduate

Last time I analysed the bridge-crossing scenario in the series Squid Game. In this fictional challenge called “Glass Stepping Stones”, the front contestant must leap forward along glass panels, choosing left or right each time, knowing only that one side is strengthened glass while the other will shatter. At least later players may learn from the choices of their forerunners. Here I use combinatorial arguments, derive a recurrence relation for the chance to die on a given step, and obtain an analytic solution with a hypergeometric function.

Again, write a_{i,n} or equivalently P(i,n) for the probability the ith player is still alive on the nth step. We showed these probabilities satisfy the recurrence relation a_{i,n} = \frac{1}{2}(a_{i-1,n-1}+a_{i,n-1}), along with initial conditions a_{1,n} = 1/2^n, and a_{i,1} = 1 for all players after the first. Equivalently, we can start from a_{0,n} := 0, and a_{i,0} := 1 for i \ge 1. This is a bit like Pascal’s triangle. Rather than adding the previous two terms, we take their average — which of course is the sum divided by two. And rather than 1’s at the sides, we have 0’s and 1’s.

Let’s write b_{i,n} for the likelihood the ith player will die upon landing on the nth step. Then b_{i,n} = a_{i,n-1} - a_{i,n}. These values satisfy the same recurrence relation as before:

    \[\begin{aligned} b_{i,n} &=a_{i,n-1} - a_{i,n} \\ &= \frac{1}{2}(a_{i-1,n-2}+a_{i,n-2}-a_{i,n-1}-a_{i-1,n-1}) \\ &= \frac{b_{i,n-1}+b_{i-1,n-1}}{2}.\end{aligned}\]

Only the initial conditions are different: b_{1,n} = 1/2^n, and b_{i,1} = 0 for all players after the first. It is aesthetic to begin a step earlier: b_{i,0} := 0 =: b_{0,n}, except for b_{0,0} = 1. The Table below shows a few early entries.

Table: Probability player i will die on step n itself, given no foreknowledge

player \ step   n = 1   2       3       4       5       6       7       8
i = 1           1/2     1/4     1/8     1/16    1/32    1/64    1/128   1/256
2               0       1/4     1/4     3/16    1/8     5/64    3/64    7/256
3               0       0       1/8     3/16    3/16    5/32    15/128  21/256
4               0       0       0       1/16    1/8     5/32    5/32    35/256
5               0       0       0       0       1/32    5/64    15/128  35/256
6               0       0       0       0       0       1/64    3/64    21/256
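The Table can be regenerated directly from the recurrence and its boundary values. The following Python sketch uses exact fractions; `death_table` is a hypothetical helper name, and the comparison against \binom{n-1}{i-1}2^{-n} uses the closed form that appears elsewhere in these posts.

```python
from fractions import Fraction
from math import comb

def death_table(players, steps):
    """b[i][n] from b_{i,n} = (b_{i,n-1} + b_{i-1,n-1}) / 2, seeded with b_{0,0} = 1."""
    b = [[Fraction(0)] * (steps + 1) for _ in range(players + 1)]
    b[0][0] = Fraction(1)       # the only nonzero boundary entry
    for n in range(1, steps + 1):
        for i in range(1, players + 1):
            b[i][n] = (b[i][n - 1] + b[i - 1][n - 1]) / 2
    return b

b = death_table(6, 8)
for i in range(1, 7):
    print(i, *b[i][1:])

# Every entry should agree with the binomial closed form.
matches = all(b[i][n] == Fraction(comb(n - 1, i - 1), 2 ** n)
              for i in range(1, 7) for n in range(1, 9))
print(matches)  # True
```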

Alternatively, there are elegant combinatorial arguments, for which I was initially inspired by another blog. For player i to die on step n, the previous i – 1 players must have died somewhere amongst the n – 1 prior steps. There are “n – 1 choose i – 1” ways to arrange these mistaken steps, out of 2^{n-1} total combinations of equal probability. Given any such arrangement, the next player has a 50% chance their following leap is a misstep, hence:

    \[b_{i,n} := P(i\textrm{ dies on }n) = \binom{n-1}{i-1}2^{-n}.\]

(I originally found this simple formula in a much more roundabout way, as often happens!) If i > n, the probability is zero. By similar reasoning, the chance that precisely i players have died by step n (inclusive) is:

    \[P(i\textrm{ players died by }n) = \binom{n}{i}2^{-n}.\]

A draft paper (Henle+ 2021) gives this result. It may also be obtained by summing over the previous formula: \sum_{n' = i}^n b_{i,n'}/2^{n-n'}. Note if player i died on n′, the next player must make n – n′ correct guesses in a row, so that no-one else dies by the nth step.
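The claim that the sum reproduces \binom{n}{i}2^{-n} can be checked directly. This is a short Python sketch with hypothetical function names, restricted to i ≥ 1.

```python
from fractions import Fraction
from math import comb

def death(i, n):
    """b_{i,n}: chance player i dies on step n."""
    return Fraction(comb(n - 1, i - 1), 2 ** n)

def exactly_dead(i, n):
    """Chance that precisely i players have died by step n (closed form)."""
    return Fraction(comb(n, i), 2 ** n)

def exactly_dead_by_sum(i, n):
    """Same chance: player i dies on step m, then n - m correct guesses follow."""
    return sum(death(i, m) / 2 ** (n - m) for m in range(i, n + 1))

agree = all(exactly_dead(i, n) == exactly_dead_by_sum(i, n)
            for i in range(1, 7) for n in range(1, 11))
print(agree)  # True
```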

Now the probability the ith player is alive at the nth step or further is the probability that any number of the i – 1 previous players, from none up to all of them, died by step n. (So what is ruled out is i or more dying by this stage.) This is just a sum over the previous displayed formula: \sum_{i'=0}^{i-1}P(i'\textrm{ players died by }n), which computer algebra simplifies to:

    \[P(i,n) = 1-\binom{n}{i}2^{-n}\cdot{_2F_1}(1,i-n,i+1;-1).\]

Here _2F_1 is called the “(ordinary) hypergeometric function”. I gave a limited table of these probability values in the previous blog. For fixed integer i \ge 0, the entire expression reduces to a polynomial in n of order i – 1 with rational coefficients, all times 2^{-n}. For example the likelihood the 5th player is alive at step n is:

    \[P(5,n) = 2^{-n}\frac{1}{24}\big( n^4-2n^3+11n^2+14n+24 \big).\]
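As a check on the polynomial, it can be compared with the finite sum 2^{-n}\sum_{i'<5}\binom{n}{i'} that the hypergeometric expression came from. A minimal Python sketch, with illustrative names:

```python
from fractions import Fraction
from math import comb

def alive(i, n):
    """P(i, n): sum over fewer than i deaths by step n."""
    return Fraction(sum(comb(n, k) for k in range(i)), 2 ** n)

def alive_player5(n):
    """The quartic polynomial form for the 5th player."""
    return Fraction(n ** 4 - 2 * n ** 3 + 11 * n ** 2 + 14 * n + 24,
                    24 * 2 ** n)

same = all(alive(5, n) == alive_player5(n) for n in range(0, 30))
print(same, alive_player5(8))  # True 163/256
```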

In general, for n < i we have a_{i,n} = 1, so players get some steps for free. The diagonal terms are a_{i,i} = 1-2^{-i}. This makes sense because for player i to not be alive on the ith step, every leap by previous players must also have been a misstep. Some results like these may also be shown using induction and the recurrence relation. I give more special cases in the next blog post. Yet the general formula works even for non-integers, though this is not physical, as the Figure below shows. For negative parameters (not shown) it has a rich structure, with singularities, and some probability values negative or exceeding 1.

probabilities plot
Figure: Probability that player i is alive at step n. The blue dots are for integer values of the parameters, which are physical. The probability decreases with step number. Visually, it is as if contestants start from the plateau at top-left, then slide down a hill of death 🙁

An alternative derivation of the probabilities is based on where the previous player died (if at all). If that player i – 1 died on step n′ < n, their follower must make n – n′ correct guesses in a row to reach step n safely. Now sum the result from n′ = i – 1, which is the earliest step upon which they may conceivably die, up to n′ = n – 1. Add to this the chance the player was still alive at step n – 1 (which is one minus the sum of chances they died on step n′) as this guarantees the following player i is alive at n. Numerical testing shows the result is indeed equivalent. Hence rather than summing over players for a fixed step, one may instead sum over steps for a fixed player.
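The numerical test mentioned above might look something like the following Python sketch. It is not the author's own test; the function names are hypothetical, and the check is limited to players i ≥ 2 since it refers to a previous player.

```python
from fractions import Fraction
from math import comb

def death(i, n):
    """b_{i,n}: chance player i dies on step n."""
    return Fraction(comb(n - 1, i - 1), 2 ** n)

def alive_by_players(i, n):
    """P(i, n) summed over the number of earlier deaths (fixed step)."""
    return Fraction(sum(comb(n, k) for k in range(i)), 2 ** n)

def alive_by_steps(i, n):
    """P(i, n) summed over where player i - 1 died (fixed player), for i >= 2."""
    steps = range(i - 1, n)                       # n' = i-1, ..., n-1
    died = [death(i - 1, m) for m in steps]
    # Previous player died on n': follower needs n - n' correct guesses in a row.
    follow = sum(d / 2 ** (n - m) for d, m in zip(died, steps))
    # Previous player still alive at step n - 1: follower is certainly alive at n.
    return follow + (1 - sum(died))

equivalent = all(alive_by_players(i, n) == alive_by_steps(i, n)
                 for i in range(2, 8) for n in range(0, 13))
print(equivalent)  # True
```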


Squid Game bridge probabilities

Difficulty:   ★★★☆☆   undergraduate

In the popular Korean series Squid Game, one episode features a bridge-crossing game, whose probabilities are a fun challenge to calculate. (Warning: partial spoilers ahead.) In this cruel fictional scenario, called “Glass Stepping Stones” in the English subtitles, glass panels are suspended above a long fall. Contestants must leap between them. At each step the leading player is forced to choose left or right, knowing only that one panel is made of ordinary glass which will shatter, and the other is strengthened (“tempered”) glass which will hold. Later contestants cross the same bridge, and watch all previous attempts, so can learn the successes and failures.

The odds are simple for the first player. On each leap forward, there is a 50% chance they will fall to their death. Hence the chance of surviving N steps is 1/2^N, an exponential decrease. In the show (~30 minute mark), one player actually calculates this: 15 untested steps remain ahead of him, for a horrifyingly low 1/32768 chance of survival from that point. (Actually this is the third player, but more on that later.)

Squid Game bridge
The third contestant on the Squid Game bridge accurately calculates his chances as 1 in 32768

But suppose we do not know the outcome of earlier players. At the start, before anyone has moved, what is the probability a_{i,n} say, that player number i will still be alive on step number n? We showed a_{1,n} = 1/2^n. For player 2, it is certain they will survive step 1, by copying the first player if they were successful, or switching to the opposite pane if not. By extension player i is certain to survive the first i – 1 steps, hence a_{i,n} = 1 for all n \le i - 1.

In general, we set up a recurrence relation. But consider firstly the case i = 2. What is the chance they are alive at step n? If the first player died on step 1 (I mean, they leaped from the starting platform to an ordinary glass panel at step 1), then their successor must guess n – 1 tiles to reach step n successfully (I mean, to still be alive on panel n, and not fall through it). The probability of this combination of events is (1 - a_{1,1})/2^{n-1}. Similar reasoning applies to any step up to n – 1. However if the first player is still alive on n – 1, their follower is guaranteed to reach step n successfully. (Any later performance of the first player is irrelevant to their successors at step n.) The overall probability is the sum over these possibilities, which for an arbitrary player is:

    \[a_{i,n} = a_{i-1,n-1} + \sum_{k=1}^{n-1} ( a_{i-1,k-1} - a_{i-1,k} ) / 2^{n-k}.\]

This gives the probability in terms of the previous player. (Note the term in parentheses is the chance the previous player will die on step k precisely.) Hence starting from the initial conditions given earlier, we may build up an array of values using a spreadsheet, computer program, or computer algebra system. The latter choice preserves exact fractions, which feels very satisfying. Also we define a_{i,0} = 1 for convenience, where “step 0” may be interpreted as the ledge contestants safely start from. The Table below gives the first few values.

Table: Probability player i is still alive by step n, given no foreknowledge

player \ step   n = 1   2       3       4       5       6       7       8
i = 1           1/2     1/4     1/8     1/16    1/32    1/64    1/128   1/256
2               1       3/4     1/2     5/16    3/16    7/64    1/16    9/256
3               1       1       7/8     11/16   1/2     11/32   29/128  37/256
4               1       1       1       15/16   13/16   21/32   1/2     93/256
5               1       1       1       1       31/32   57/64   99/128  163/256
6               1       1       1       1       1       63/64   15/16   219/256
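For readers who prefer a program to a spreadsheet, here is one way the Table could be built from the recurrence relation above while keeping exact fractions. It is a sketch only: the memoised function `alive` and its layout are illustrative choices, not the tool actually used for this post.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def alive(i, n):
    """a_{i,n} from the sum-form recurrence over the previous player's fate."""
    if n == 0:
        return Fraction(1)              # step 0 is the safe starting ledge
    if i == 1:
        return Fraction(1, 2 ** n)      # first player must guess every step
    # Previous player still alive on step n - 1 guarantees step n for player i.
    total = alive(i - 1, n - 1)
    for k in range(1, n):
        died_on_k = alive(i - 1, k - 1) - alive(i - 1, k)
        total += died_on_k / 2 ** (n - k)   # then n - k fresh guesses are needed
    return total

for i in range(1, 7):
    print(i, *(alive(i, n) for n in range(1, 9)))
```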

In Squid Game the bridge has 18 (pairs of) steps. The probability of crossing the entire bridge is the probability of being alive on step 18, as the next leap is to safety. In theory the 9th player has nearly even odds of making it: a_{9,18} = 53381/131072 \approx 0.41, and the next player likely will: a_{10,18} = 77691/131072 \approx 0.59. In the show, 16 players compete in this challenge, so the last player has excellent odds, supposedly: a_{16,18} = 65493/65536 \approx 0.9993.  However our analysis does not account for human behaviour! In the show, time pressure, rivalries, and imperfect memory compete with logical decision making and the interests of the group as a whole. On the other hand, some players claim to distinguish the glass types by sight or sound, which would give an advantage. These make interesting plot elements, but would spoil the simplicity and purity of a mathematical analysis.

Returning to the recurrence relation, it simplifies to:

    \[a_{i,n} = \frac{a_{i,n-1} + a_{i-1,n-1}}{2}.\]

Hence each term is just the average of two previous terms. However I wanted to derive this via direct physical interpretation, not algebraic manipulation alone. This is intuitively satisfying. With the end result in mind, we relate P(i,n) \equiv a_{i,n} to the previous step and previous player. [Update, 21st April: A simpler way is to consider step 1. If player 1 guesses wrongly and breaks it, there are i – 1 remaining players for the next n – 1 steps. If player 1 instead guesses correctly, there are i players for the next n – 1 steps. This gives the recurrence relation.] Consider the three cases for player i – 1: they (A) died before step n – 1, (B) died on step n – 1, or (C) made it safely to step n – 1 or further. The total probability is the sum of these cases:

    \[P(i,n) = P(i,n|A)P(A) + P(i,n|B)P(B) + P(i,n|C)P(C).\]

The first term for example is the “conditional probability” that i is alive at step n, given that case A occurred; times the probability of case A itself occurring. There is a similar decomposition to the above for P(i,n-1). Now most parts of the expression are straightforward. If i – 1 died at step n – 1, then the next player is definitely safe at that step, but may only guess at the following step, so P(i,n-1|B) = 1 and P(i,n|B) = 1/2. If i – 1 was safe at step n – 1 or further, then the next player is safe for an extra step: P(i,n-1|C) = 1 = P(i,n|C). For case A the conditional probabilities are more difficult, but we do not need to calculate them. Observe that if the previous player died before n – 1, then steps n – 1 and n are uncharted territory. Hence the chance the following player makes it to n safely, is half of whatever it was for them to reach n – 1 safely: P(i,n|A) = \frac{1}{2}P(i,n-1|A). Hence the decomposition becomes:

    \[P(i,n) = \frac{1}{2}P(i,n-1|A)P(A) + \frac{1}{2}P(B) + P(C).\]

Comparing with the analogous decomposition P(i,n-1) = P(i,n-1|A)P(A) + P(B) + P(C), this is just \frac{1}{2}P(i,n-1) plus an extra \frac{1}{2}P(C). Now P(C) = P(i-1,n-1) \equiv a_{i-1,n-1}. It follows P(i,n) = \frac{1}{2}(P(i,n-1) + P(i-1,n-1)) as before. We did not need to evaluate P(A) or P(B), though this is straightforward.
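As a sanity check, here is a minimal Python sketch which iterates the averaging recurrence with exact fractions and reproduces the numbers quoted for the 18-step bridge. The function names are my own. The boundary values (everyone is alive "on step 0", and there is no "player 0") are assumptions I have made so the recurrence can start. The binomial sum is included only as an independent cross-check: it counts the outcomes with fewer than i breakages in n steps.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def alive(i: int, n: int) -> Fraction:
    """Chance player i is alive on step n, via the averaging recurrence."""
    if i <= 0:
        return Fraction(0)   # assumed boundary: there is no "player 0"
    if n == 0:
        return Fraction(1)   # assumed boundary: everyone is alive before step 1
    return (alive(i, n - 1) + alive(i - 1, n - 1)) / 2

def alive_binomial(i: int, n: int) -> Fraction:
    """Cross-check: fewer than i breakages among n fair guesses."""
    return Fraction(sum(comb(n, k) for k in range(i)), 2**n)

for i in (3, 9, 10, 16):
    a = alive(i, 18)
    assert a == alive_binomial(i, 18)
    print(f"a_({i},18) = {a} = {float(a):.4f}")
```

Exact fractions avoid any rounding doubts when comparing against values like 53381/131072.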

Now that the reader (and author!) have more experience with conditional probability, let’s return to the third player in the Squid Game episode. Before anyone moved, he had chance a_{3,18} = 43/65536 \approx 1/1500 of surviving the bridge. This would seem to contradict the earlier calculation, which gave a lower chance by a factor of 21½, a surprising contrast! The black-masked “Front Man” said to the VIP observers, “I believe this next game will exceed your expectations” (~12:30 mark), but in this sense it did not 😆 . The distinction is the information learned. Conditional probability is a subtle and beautiful thing. If we know nothing about the previous attempts, nor the state of the bridge, the probabilities are our variables a_{i,n}. But if we are given the information player I died on step N for instance, then the following player has no information about later steps, and the bridge scenario is essentially reset from that point onwards. Hence P(3,18|2\textrm{ died on }3) = a_{3-2,18-3} = 1/2^{15}.
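The "reset" claim can be tested by brute force. The sketch below assumes the model used throughout: each step is guessed wrongly with probability 1/2 by whoever reaches it first, independently of other steps, and later players copy the known safe path. Under that assumption every pattern of breakages over N steps is equally likely, so a conditional probability is a ratio of counts. The encoding of an outcome as the bits of an integer, and the function names, are my own.

```python
from fractions import Fraction

def breakage_steps(outcome: int, steps: int) -> list[int]:
    """Steps where a pane broke; bit k-1 of `outcome` set means step k was guessed wrongly."""
    return [k for k in range(1, steps + 1) if (outcome >> (k - 1)) & 1]

def conditional_alive(I: int, N: int, i: int, n: int) -> Fraction:
    """P(player I alive on step N | player i died on step n), by enumeration."""
    given = both = 0
    for outcome in range(2**N):
        breaks = breakage_steps(outcome, N)
        # Player i died on step n: the i-th breakage happened on step n.
        if len(breaks) >= i and breaks[i - 1] == n:
            given += 1
            # Player I is alive on step N: fewer than I breakages in total.
            if len(breaks) < I:
                both += 1
    return Fraction(both, given)

p_cond = conditional_alive(3, 18, 2, 3)   # loops over 2^18 outcomes
p_prior = Fraction(43, 65536)             # a_{3,18} quoted in the text
print(p_cond, p_prior / p_cond)
```

The last line prints the conditional probability and the ratio of the unconditional chance to it, which can be compared with 1/2^{15} and the factor of 21½.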

This scenario has been a valuable learning experience, as I had not worked with conditional probabilities before. Probability is very important in physics, particularly quantum physics where it is intrinsic (it is usually assumed). I originally came up with an incorrect recurrence relation, but realised this upon comparison with an article in Medium, which uses an elegant combinatorial argument. The scenario had already captured my attention, but realising my flaw drove further my need to understand. A related article is also helpful; I recommend these if you find my discussion hard to follow. There is even a draft paper on the Squid Game bridge probabilities! Presumably this is all little more than a specific application of textbook combinatorics. Still, it is fun to rediscover things for oneself.


Coordinates adapted to observer 4-velocity field

Difficulty:   ★★★☆☆   undergraduate

Suppose you have a 4-velocity field \mathbf u, which might be interpreted physically as observers or a fluid. It may be useful to derive a time coordinate T which both coincides with proper time for the observers, and synchronises them in the usual way. Here we consider only the geodesic and vorticity-free case. Define:

    \[dT := -\mathbf u^\flat.\]

The “flat” symbol is just a fancy way to denote lowering the index, so the RHS is just -u_\mu. On the LHS, dT is the gradient of a scalar, which may be expressed using the familiar chain rule:

    \[dT = \frac{\partial T}{\partial x^0}dx^0 + \frac{\partial T}{\partial x^1}dx^1 + \cdots,\]

where the x^\mu are the coordinates. Technically dT is a covector, with components (dT)_\mu = \partial T/\partial x^\mu in the cobasis dx^\mu. Similarly -\mathbf u^\flat = -u_0dx^0 -u_1dx^1 -\cdots, so we must match the components: \partial T/\partial x^\mu = -u_\mu. For our purposes we do not need to integrate explicitly; it is sufficient to know the original equation is well-defined. (No such time coordinate exists if there is acceleration or vorticity, which is a corollary of the Frobenius theorem; see Ellis+ 2012 §4.6.2.)

The new coordinate is timelike, since \langle dT,dT\rangle = \langle -\mathbf u^\flat,-\mathbf u^\flat\rangle = -1. One can show its change with proper time is dT/d\tau = \langle dT,\mathbf u\rangle = 1. Further, the T = \textrm{const} hypersurfaces are orthogonal to \mathbf u, since the normal vector (dT)^\sharp is parallel to \mathbf u. This orthogonality means that at each point, the hypersurface agrees with the usual simultaneity defined locally by the observer at that point. (Orthogonality corresponds to the Poincaré-Einstein convention, so named by H. Brown 2005 §4.6).

We want to replace the x^0-coordinate by T, and keep the others. What are the resulting metric components for this new coordinate? (Of course it’s the same metric, just a different expression of this tensor.) Notice the original components of the inverse metric satisfy g^{\mu\nu} = \langle dx^\mu,dx^\nu\rangle. Similarly one new component is g'^{TT} = \langle dT,dT\rangle = -1. Also g'^{Ti} = \langle dT,dx^i\rangle = -u^i, where i = 1,2,3. The g'^{iT} are the same by symmetry, and the remaining components are unchanged. Hence the new components in terms of original components are:

    \[g'^{\mu\nu} = \begin{pmatrix} -1 & -u^1 & -u^2 & -u^3 \\ -u^1 & g^{11} & g^{12} & g^{13} \\ -u^2 & g^{21} & g^{22} & g^{23} \\ -u^3 & g^{31} & g^{32} & g^{33} \end{pmatrix}.\]

The matrix inverse gives the new metric components g'_{\mu\nu}. The 4-velocity components are: u'_\mu = (-1,0,0,0) by the original equation. Also u'^T = \langle dT,\mathbf u\rangle = 1, and the u'^i = \langle dx^i,\mathbf u\rangle = u^i are unchanged. Hence u'^\mu = (1,u^1,u^2,u^3).
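To make the recipe concrete, here is a small numerical sketch for one example field that I have chosen myself: radial "raindrops" in Schwarzschild spacetime, falling from rest at infinity, with u^r = -\sqrt{2M/r} and no angular motion. Since u^\theta = u^\phi = 0 the angular part of the metric is untouched, so only the 2×2 block in (T, r) needs inverting. The function name and sample values are hypothetical.

```python
import math

def painleve_block(M: float, r: float):
    """Return (g_TT, g_Tr, g_rr) by inverting the new inverse metric in (T, r)."""
    f = 1 - 2 * M / r                  # Schwarzschild g^{rr}
    u_r = -math.sqrt(2 * M / r)        # u^r for the assumed raindrop field
    # (T, r) block of the new inverse metric: [[-1, -u^r], [-u^r, g^{rr}]].
    a, b, d = -1.0, -u_r, f
    det = a * d - b * b                # equals -1 for this field
    # Inverse of the symmetric 2x2 matrix [[a, b], [b, d]].
    return d / det, -b / det, a / det

M, r = 1.0, 5.0                        # sample values, geometric units
g_TT, g_Tr, g_rr = painleve_block(M, r)
print(g_TT, g_Tr, g_rr)
print(-(1 - 2 * M / r), math.sqrt(2 * M / r), 1.0)   # Gullstrand-Painlevé values
```

The two printed rows should agree up to rounding, matching the Gullstrand-Painlevé line element -(1-2M/r)dT^2 + 2\sqrt{2M/r}\,dT\,dr + dr^2 in the (T, r) sector.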

Anecdote: I used to write out dT = -u_0dx^0 - u_1dx^1 - \cdots, rearrange for dx^0, and substitute it into the original line element. This works but is clunky. My original inspiration was Taylor & Wheeler 2000 §B4, and I was thrilled to discover their derivation of Gullstrand-Painlevé coordinates from Schwarzschild coordinates plus certain radial velocities. (I give more references in MacLaurin 2019 §3.) I imagine that if a textbook presented the material above, given limited space and more formality, it may seem as if the more elegant approach were obvious. However I only (re?)-discovered it today by accident, using a specific 4-velocity from the previous post, and noticing the inverse metric components looked simple and familiar…

Total angular momentum in Schwarzschild spacetime

Difficulty:   ★★★☆☆   undergraduate

In relativity, distances and times are relative to an observer’s velocity. Hence one should be careful when defining an angular momentum. Speaking generally, a natural parametrisation of 4-velocities uses Killing vector fields, if the spacetime has any. In Schwarzschild spacetime, Hartle (2003 §9.3) defines the Killing energy per mass and Killing angular momentum per mass as:

    \[e := -\langle\mathbf u,\partial_t\rangle, \qquad \ell_z := \langle\mathbf u,\partial_\phi\rangle.\]

The angle brackets are the metric scalar product, \phi has range [0,2\pi), and we will take \mathbf u to be a 4-velocity.  I have relabeled Hartle’s \ell as \ell_z. While \partial_t and \partial_\phi are just coordinate basis vectors for Schwarzschild coordinates, as Killing vector fields (KVFs) they have geometric significance beyond this convenient description. [\partial_t is the unique KVF which as r \rightarrow \infty in “our universe” (region I), is future-pointing with squared-norm -1. On the other hand \partial_\phi has squared-norm r^2\sin^2\theta, so is partly determined by having maximum squared-norm r^2 amongst points at any given r, which implies it is orthogonal to \partial_t, although the specific orientation is not otherwise determined geometrically.]

In fact \ell_z is the portion of angular momentum (per mass) about the z-axis. In Cartesian coordinates (t,x,y,z), the KVF \mathbf Z := \partial_\phi has components (0,-y,x,0). Similarly, we can define angular momentum about the x-axis using the KVF X^\mu := (0,0,z,-y), which in spherical coordinates is (0,0,\sin\phi,\cot\theta\cos\phi). For the y-axis we use Y^\mu := (0,-z,0,x), which is (0,0,-\cos\phi,\cot\theta\sin\phi) in the original coordinates. Then:

    \[\ell_x := \langle\mathbf u,\mathbf X\rangle, \qquad \ell_y := \langle\mathbf u,\mathbf Y\rangle, \qquad \ell_z = \langle\mathbf u,\mathbf Z\rangle.\]

Hence we can define the total angular momentum as the Pythagorean relation \ell^2 := \ell_x^2 + \ell_y^2 + \ell_z^2, that is:

    \[\ell^2 := \langle\mathbf u,\mathbf X\rangle^2 + \langle\mathbf u,\mathbf Y\rangle^2 + \langle\mathbf u,\mathbf Z\rangle^2.\]

This is a natural quantity determined from the geometry alone, unlike the individual \ell_z etc. which rely on an arbitrary choice of axes. It is non-negative. I came up with this independently, but do not claim originality, and the general idea could be centuries old. Similarly quantum mechanics uses J^2 and J_z, which I first encountered in a 3rd year course, although these are operators on flat space.

One 4-velocity field which conveniently implements the total angular momentum is:

    \[u^\mu = \bigg( \frac{e}{1-\frac{2M}{r}}, \pm\sqrt{e^2-\Big(1-\frac{2M}{r}\Big)\Big(1+\frac{\ell^2}{r^2}\Big)},\frac{\ell}{r^2},0 \bigg).\]

In this case the axial momenta are \ell_x = \ell\sin\phi, \ell_y = -\ell\cos\phi, and \ell_z = 0, for a total Killing angular momentum \ell as claimed. There are restrictions on the parameters, in particular the “\pm” must be a minus in the black hole interior. Incidentally this field is geodesic since \nabla_{\mathbf u}\mathbf u = 0. It also has zero vorticity (I wrote a technical post on the kinematic decomposition previously), so we might say it has macroscopic rotation but no microscopic rotation. Another possibility is in terms of \ell_z and \ell:

    \[u^\mu = \bigg( \cdots, \pm\frac{\sqrt{\ell^2-\ell_z^2\csc^2\theta}}{r^2}, \frac{\ell_z}{r^2\sin^2\theta} \bigg),\]

where the first two components are the same as the previous vector. The expressions are simpler with a lowered index u_\mu.
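The claimed momenta are easy to check numerically. The sketch below evaluates \langle\mathbf u,\mathbf X\rangle, \langle\mathbf u,\mathbf Y\rangle and \langle\mathbf u,\mathbf Z\rangle using only the angular part of the Schwarzschild metric, since the three Killing vectors have no t or r components. The function name and the sample point are my own choices, and the "+" sign is taken for the \theta-component of the second field.

```python
import math

def killing_momenta(r, theta, phi, u_theta, u_phi):
    """Return (l_x, l_y, l_z) for a 4-velocity with angular components u^theta, u^phi."""
    g_thth = r**2
    g_phph = r**2 * math.sin(theta)**2
    cot = math.cos(theta) / math.sin(theta)
    # (theta, phi) components of the Killing vectors X, Y, Z quoted in the text.
    X = (math.sin(phi), cot * math.cos(phi))
    Y = (-math.cos(phi), cot * math.sin(phi))
    Z = (0.0, 1.0)

    def dot(K):
        return g_thth * u_theta * K[0] + g_phph * u_phi * K[1]

    return dot(X), dot(Y), dot(Z)

r, theta, phi = 6.0, 1.1, 0.7          # sample point
ell, ell_z = 2.0, 1.2                  # sample parameters

# First field: u^theta = ell / r^2, u^phi = 0.
print(killing_momenta(r, theta, phi, ell / r**2, 0.0))
print(ell * math.sin(phi), -ell * math.cos(phi), 0.0)

# Second field, parametrised by ell_z and ell.
u_theta = math.sqrt(ell**2 - ell_z**2 / math.sin(theta)**2) / r**2
u_phi = ell_z / (r**2 * math.sin(theta)**2)
lx, ly, lz = killing_momenta(r, theta, phi, u_theta, u_phi)
print(lz, lx**2 + ly**2 + lz**2)       # compare with ell_z and ell^2
```

For the second field the square root requires \ell^2 \ge \ell_z^2\csc^2\theta, which holds at the sample point chosen here.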

Swing boats and pump tracks

Difficulty:   ★★☆☆☆   high school

Possibly my favourite amusement park ride I have ever tried was called a Schiffschaukel, which is German for “ship swing”. It was powered only by the rider, with no motor of any kind. With enough skill, you could make a complete 360° vertical loop! In place of the rope or chain links in an ordinary swing, it was supported by rigid struts, which helped achieve a complete rotation. I loved the physical challenge of it. It was easy to start it rocking. I also managed to get my body above the horizontal, but not to make a complete loop. However one man I watched could not only achieve a loop, but gauged his speed so as to hang upside down for a few seconds, neck craned forward to watch the ground, before he gradually tipped over.

Figure: Schiffschaukel or ship swing, from Oktoberfest.net. The one I rode was in a local Volksfest (“people’s festival”) in a town outside of Munich, in 2007. It looked a lot like the one in this photo, with a few cosmetic differences: as I recall I had no waist harness but just one foot strapped in, there were two supporting struts rather than four, no “boat” decoration, and the swing was shorter.

In my experience most people, including Germans, have never heard of it. Wikipedia calls it a “ship swing”, in contrast to a “pirate ship” ride which is motorised. I have been on a couple of the latter rides in Australia: huge structures which seat dozens of people, and are quite different to the self-propelled swing. The unpowered type date from the 1800s apparently; many held two people, and had ropes to pull on. The German language Wikipedia has more information: here is an automatic translation. Also it turns out there is a modern Estonian sport kiiking (meaning “swinging”) which is the same concept. The current world record for a revolution is a swing with radius just over 7 metres, achieved by an Olympic medal-winning rower.

But how is it even possible to make a loop? I found it counter-intuitive, like pulling yourself up by the bootstraps, as the proverbial saying goes. Similarly, I recently learned of “pump tracks” for bicycles, from my brother. It is possible to propel yourself around such a course, which consists of mounds and banked corners, without pedalling! The energy comes from raising or lowering your body with the correct timing. In fact you can even propel yourself on flat ground this way, by making appropriate turns and body maneuvers.

Figure: Conservation of angular momentum along a circular arc. The rider travels from left to right, with their centre of mass initially following the middle arc. After standing up, their centre of mass follows the inner arc, and their velocity is increased. (The solid arc is the ground or base of the ship, say.)

Conservation of angular momentum explains both scenarios. Consider a circular, concave segment of track, as with the ship swing. Approximate the person, plus bike or “ship”, as a point object with mass m. Suppose this point, their centre-of-mass, moves on a circle with radius R (note this is less than the radius of the track arc). The angular momentum, as determined from the centre of curvature, is R\times mV, where V is the speed. (Technically this is a vector cross-product, but in this simple example where the vectors are at 90°, we can more or less treat it as an ordinary multiplication of numbers.) Now suppose the rider stands up straighter, so their centre of mass moves a “height” h towards the centre of curvature. The angular momentum is (R-h)\times mV', where V' is the new speed. But since angular momentum is conserved, this must match the previous expression, hence:

    \[\frac{V'}{V} = \frac{R}{R-h} = 1 + \frac{h}{R-h} > 1.\]

The speed has increased! Note the rider put work in, by not only resisting the centrifugal acceleration V^2/R, but moving against it, in the opposite direction. The forces can be severe. For a swing to barely reach the top, it must have a speed V = \sqrt{2g\cdot 2R} at the bottom of the arc, by conservation of kinetic plus gravitational potential energy. Here g \approx 9.8\,\textrm{m/s}^2 is the “acceleration” due to gravity. The centripetal acceleration at the lowest part is V^2/R = 4g, which is independent of the radius. Including the weight due to gravity gives a total of 5g, that is, a g-force of 5!

The swing rider should probably bend their knees when reaching the maximum height of their arc, to reverse the process and complete the cycle. If their speed is zero at this point (so a full revolution is not achieved), crouching has no effect on the speed, which is zero after all. In this case the maximum speed — measured when at the lowest part of the circle — grows by the fixed proportion R/(R-h) with each swing. This is exponential growth! Once revolution is achieved, the rider can gain further speed by crouching at the top of the circle. While this reduces their speed by the same proportion R/(R-h), over one revolution there is a net gain, since the speed at the bottom is greater due to gravitational fall.
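To get a feel for the exponential phase, here is a rough sketch that counts how many swings it takes for the speed at the bottom to reach the loop threshold \sqrt{4gR}, taking at face value the idealised rule that the speed there is multiplied by R/(R-h) on every swing. The function name, the starting speed and the dimensions are hypothetical, and friction is ignored.

```python
import math

def swings_to_loop(R: float, h: float, v_start: float, g: float = 9.8) -> int:
    """Swings needed before the bottom speed first reaches sqrt(4 g R)."""
    ratio = R / (R - h)                 # speed gain per swing (idealised)
    v_loop = math.sqrt(4 * g * R)       # bottom speed that barely reaches the top
    v, swings = v_start, 0
    while v < v_loop:
        v *= ratio
        swings += 1
    return swings

# Example: 3 m radius, centre of mass raised by 0.3 m, starting from 1 m/s.
print(swings_to_loop(3.0, 0.3, 1.0))
```

Because the growth is geometric, the count depends only logarithmically on the starting speed.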

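To quantify the gain per revolution, write \rho := R/(R-h) for the speed ratio, and suppose the rider has speed V at the bottom, just before standing. After standing the speed is \rho V. Taking the height gained between bottom and top to be 2R, conservation of energy gives the speed at the top, before crouching:

    \[V_\textrm{top}^2 = \rho^2V^2 - 4gR.\]

Crouching divides the speed by \rho, so the squared speed becomes V_\textrm{top}^2/\rho^2 = V^2 - 4gR/\rho^2. Falling back to the bottom adds 4gR again, so just before standing:

    \[V_\textrm{low}^2 = V^2 + 4gR\Big(1-\frac{1}{\rho^2}\Big).\]

Now 1/\rho = (R-h)/R = 1 - h/R, so 1 - 1/\rho^2 = 2h/R - h^2/R^2, and hence:

    \[V_\textrm{low}^2 = V^2 + 4gh\Big(2-\frac{h}{R}\Big).\]

So with each revolution, the kinetic energy increases by a fixed amount (not a fixed proportion). This is linear growth. Hence the speed increases indefinitely, but at a slowing rate. Of course this ignores friction and the rider’s strength, and assumes instantaneous repositioning. It also uses the same height 2R for the climb and the fall, neglecting the shift h of the centre of mass; allowing for this changes the size of the increment, but it remains a fixed amount per revolution.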
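The linear growth can be seen by stepping through a few revolutions numerically. This is a sketch under the same approximations as the estimate above: the speed ratio is R/(R-h), the climb and fall each change the squared speed by 4gR, repositioning is instantaneous and there is no friction. The function name and the sample numbers are my own.

```python
import math

def revolve(v_bottom: float, R: float, h: float, g: float = 9.8) -> float:
    """Bottom speed (before standing) one revolution later, in the rough model above."""
    ratio = R / (R - h)
    v_top_sq = (v_bottom * ratio)**2 - 4 * g * R   # stand at the bottom, climb to the top
    v_top_sq /= ratio**2                            # crouch at the top
    return math.sqrt(v_top_sq + 4 * g * R)          # fall back to the bottom

R, h, g = 3.0, 0.3, 9.8
v = 11.0                                            # fast enough to complete a loop
expected_gain = 4 * g * h * (2 - h / R)             # predicted increase in V^2
for _ in range(4):
    v_next = revolve(v, R, h, g)
    print(round(v_next**2 - v**2, 6), round(expected_gain, 6))
    v = v_next
```

Each row compares the actual increase in V^2 over one revolution with the predicted fixed increment 4gh(2-h/R).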

On a pump track the strategy is analogous: in a valley you should stand up, and over a bump you should crouch down, as one webpage explains. In both cases you move closer to the centre of curvature. In internet forums, people say similar motions arise naturally in skateboarding, surfing, snowboarding, and other sports. Returning to the ship swing, I was not told any strategy at the time. On certain swings I sensed my speed diminish, so knew I had made a mistake. At the bottom of the swing, when the sum of centrifugal force and gravity is maximum, it felt safer to “go with the flow”, and unnatural to resist. But resist is exactly what I should have done.


Affine connection for axial symmetry

Difficulty:   ★★★★☆   graduate

Suppose you have an axially symmetric vector field. Can we define an affine connection which keeps the vectors “parallel”, under rotation about the axis? For example, we wish the vectors illustrated below to get parallel-transported around the circle:

Figure: An axially symmetric vector field. We seek an affine connection which declares vectors at a given radius “parallel”, for any vector field with circular symmetry / cylindrical symmetry / rotational symmetry.

Take Minkowski spacetime in cylindrical coordinates (t,r,\phi,z), with metric -dt^2+dr^2+r^2d\phi^2+dz^2, and consider a vector field \mathbf u whose components are independent of \phi:

    \[u^\mu = (A{\scriptstyle(t,r,z)},B{\scriptstyle(t,r,z)},C{\scriptstyle(t,r,z)},D{\scriptstyle(t,r,z)}).\]

The covariant derivative in the tangential direction \partial_\phi has components:

    \[(\nabla_{\partial_\phi}\mathbf u)^\alpha = \Big(0,-rC{\scriptstyle(t,r,z)},\frac{B{\scriptstyle(t,r,z)}}{r},0\Big).\]

We want this to vanish, but first a quick recap (Lee §4, 5). Recall a connection is defined by \nabla_{\partial_\mu}\partial_\nu = \Gamma_{\mu\nu}^\alpha \partial_\alpha, in terms of our coordinate vector frame (\partial_t,\partial_r,\partial_\phi,\partial_z). This extends to a covariant derivative of arbitrary vectors and tensors, also denoted “\nabla”. The derivative of \mathbf u above assumed the Levi–Civita connection, which is inherited from the metric: it is the unique symmetric, metric-compatible connection. In that case the set of \Gamma are called Christoffel symbols, but in general they are called connection coefficients.

The offending Christoffel symbols which prevent our vector field from being parallel-transported are: \Gamma_{\phi\phi}^r = -r and \Gamma_{\phi r}^\phi = 1/r. But we are free to simply define a new connection for which these vanish: \tilde\Gamma_{\phi\phi}^r := 0 =: \tilde\Gamma_{\phi r}^\phi! Given a frame, any set of smooth functions \tilde\Gamma yields a valid connection (Lee, Lemma 4.10). It is natural to hold on to the other Christoffel symbols, to accord whatever respect remains for the metric. In fact only one is nonzero, \Gamma_{r\phi}^\phi = 1/r. To set this to zero would essentially deny the increase in circumference with the radius. Incidentally, even with keeping \Gamma_{r\phi}^\phi, the new connection is flat, meaning its associated Riemann curvature tensor vanishes.

The new connection may be expressed as the Levi–Civita one with a bilinear correction:

    \[\tilde\nabla_{\mathbf v}\mathbf u = \nabla_{\mathbf v}\mathbf u - \big( \frac{1}{r}\partial_\phi\otimes d\phi\otimes dr - r\partial_r\otimes d\phi\otimes d\phi \big) (\mathbf v,\mathbf u),\]

where \mathbf v and \mathbf u are arbitrary vectors, to be substituted into the 2nd and 3rd slots respectively of the (1,2)-tensor in parentheses. This is much simpler than it looks, as the terms just pick out r and \phi-components, and return basis vectors. Equivalently, the correction may be written -\langle d\phi,\mathbf v\rangle \big( r^{-1}\langle dr,\mathbf u\rangle \partial_\phi - r\langle d\phi,\mathbf u\rangle \partial_r \big), where the angle brackets mean contraction of a 1-form and vector. Notice from here and the two “offending” Christoffel symbols mentioned earlier, that only (the component of) the derivative in the \phi–direction is affected.

These expressions obscure some beautiful symmetry. Let’s raise one index and lower another, in the correction term:

    \[\tilde\nabla_{\mathbf v}\mathbf u = \nabla_{\mathbf v}\mathbf u - \frac{1}{r}\langle d\phi,\mathbf v\rangle \cdot (2\partial_r\wedge\partial_\phi)\lrcorner\mathbf u^\flat.\]

Here 2\partial_r\wedge\partial_\phi := \partial_r\otimes \partial_\phi - \partial_\phi\otimes\partial_r is a wedge product, \mathbf u^\flat is just the 1-form with components u_\mu, and “\boldsymbol\lrcorner” is a contraction. The correction’s components are simply -r^{-1}v^\phi\cdot(0,-u_\phi,u_r,0). This is a vector, even though some lowered indices appear in the expression. The correction is just a rotation in the r\phi–plane! From inspection of the diagram, this is unsurprising.
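Here is a small numerical sketch of this cancellation at a single point. It uses the fact that when the components of \mathbf u do not depend on \phi, the derivative along \partial_\phi reduces to \Gamma_{\phi\nu}^\alpha u^\nu. The dictionary layout, function name and sample numbers are my own, and only the two coefficients that enter this derivative are stored.

```python
T, R, PHI, Z = range(4)   # index labels for (t, r, phi, z)

def phi_derivative(u, gamma):
    """Derivative of u along d/dphi when the components of u do not depend on phi.

    gamma[(alpha, nu)] stores Gamma_{phi nu}^alpha; missing entries are zero.
    """
    return tuple(
        sum(gamma.get((alpha, nu), 0.0) * u[nu] for nu in range(4))
        for alpha in range(4)
    )

r = 2.0
u = (1.5, 0.4, -0.3, 0.2)                # sample values of (A, B, C, D) at one point

levi_civita = {(R, PHI): -r, (PHI, R): 1 / r}
modified = {}                            # both offending coefficients set to zero

lc = phi_derivative(u, levi_civita)      # expect (0, -r C, B / r, 0)
new = phi_derivative(u, modified)        # expect all zeros

# Rotation form of the correction for v = d/dphi: -(1/r) (0, -u_phi, u_r, 0),
# with lowered components u_r = B and u_phi = r^2 C.
u_r, u_phi = u[R], r**2 * u[PHI]
correction = (0.0, u_phi / r, -u_r / r, 0.0)

print(lc)
print(new)
print(tuple(a + b for a, b in zip(lc, correction)))
```

The last row adds the rotation to the Levi–Civita derivative, and should vanish up to rounding, in agreement with the modified connection.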

This is analogous to Fermi–Walker transport. Given a worldline, this corrects the (Levi–Civita connection) time-derivative \nabla_{\mathbf u} by a rotation in the plane spanned by \mathbf u and the 4-acceleration vector \nabla_{\mathbf u}\mathbf u. Under Fermi–Walker transport, orthonormal frames stay orthonormal over time, and their orientation agrees with gyroscopes. For both our connection and the Fermi-Walker derivative, there is a preferred differentiation direction, along which a rotation is added to the Levi-Civita derivative.

I previously wrote about a connection for a spherically symmetric vector field. This has been a good learning experience about connections other than Levi-Civita. Many of us completed general relativity courses in which the curvature quantities were merely formulae with no intuitive understanding. However questions from mathematicians like: “Which connection are you using?” prompted me to learn more. (At least I have never been asked which differential structure I am using, nor which point-set topology, which is fortunate for all involved. 🙂 ) There are various physically-motivated connections defined in research paper §2. I intend to apply this to the rotating disc, and to an observer field in Schwarzschild spacetime. Also, I accidentally came across Rothman+ 2001 about parallel transport in Schwarzschild spacetime, with numerous followup papers by various authors. All of this struck me again with a sense of fascination about curvature: how rich and deep it is.
