Euler and spinors

Difficulty:   ★★★☆☆   undergraduate

The discovery of spinors is most often credited to quantum physicists in the 1920s, and to Élie Cartan in the prior decade for an abstract mathematical approach. But it turns out the legendary mathematician Leonhard Euler discovered certain algebraic properties for the usual (2-component, Pauli) spinors, back in the 1700s! He gave a parametrisation for rotations in 3D, using essentially what were later known as Cayley-Klein parameters. There was not even the insight that each set of parameter values forms an interesting object in its own right. But we can recognise one key conceptual aspect of spinors in this accidental discovery: the association with rotations.

In 1771, Euler published a paper on orthogonal transformations, whose title translates to: “An algebraic problem that is notable for some quite extraordinary relations”. Euler scholars index it as “E407”, and the Latin original is available from the Euler Archive website. I found an English translation online, which also transcribes the original language.

Euler commences with the aim to find “nine numbers… arranged… in a square” which satisfy certain conditions. In modern notation this is a matrix M say, satisfying M^TM = 1 = MM^T, which describes an orthogonal matrix. While admittedly the paper is mostly abstract algebra, he is also motivated by geometry. In §3 he mentions that the equation for a surface is “transformed” under a change of [Cartesian] coordinates, including the case where the coordinate origins coincide. We recognise this (today, at least) as a rotation, possibly combined with a reflection. Euler also mentions “angles” (§4 and later), which is clearly geometric language.

He goes on to analyse orthogonal transformations in various dimensions. [I was impressed with the description of rotations about n(n – 1)/2 planes, in n dimensions, because I only first learned this in the technical context of higher-dimensional rotating black holes. It is only in 3D that rotations are specified by axis vectors.] Then near the end of the paper, Euler seeks orthogonal matrices containing only rational entries, a “Diophantine” problem. Recall rotation matrices typically contain many trigonometric terms like sin(θ) and cos(θ), which are irrational numbers for most values of the parameter θ. But using some free parameters “p, q, r, s”, Euler presents:

    \[\begin{tabular}{|c|c|c|}\hline $\frac{p^2+q^2-r^2-s^2}{u}$ & $\frac{2(qr+ps)}{u}$ & $\frac{2(qs-pr)}{u}$ \\ \hline $\frac{2(qr-ps)}{u}$ & $\frac{p^2-q^2+r^2-s^2}{u}$ & $\frac{2(pq+rs)}{u}$ \\ \hline $\frac{2(qs+pr)}{u}$ & $\frac{2(rs-pq)}{u}$ & $\frac{p^2-q^2-r^2+s^2}{u}$ \\ \hline\end{tabular},\]

where u := p^2 + q^2 + r^2 + s^2. (I have copied the style which Euler uses in some subsequent examples.) By choosing rational values of the parameters, the matrix entries will also be rational; however, this is not our concern here. The matrix has determinant +1, so we know it represents a rotation. It turns out the parameters form the components of a spinor!! (p,q,r,s)/\sqrt u are the real components of a normalised spinor. We allow all real values, but will ignore some trivial cases. One aspect of spinors is clear from inspection: in the matrix the parameters occur only in pairs, hence the sets of values (p,q,r,s)/\sqrt u and -(p,q,r,s)/\sqrt u give rise to the same rotation matrix. (Those familiar with spinors will recall the spin group is the “double cover” of the rotation group.)
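As a quick sanity check of these claims, here is a minimal Python sketch using exact rational arithmetic. The function names and the sample parameters (1, 2, 3, 4) are my own choices for illustration, not anything from Euler's paper. It builds the matrix with top-left entry (p^2+q^2-r^2-s^2)/u, then tests orthogonality, the determinant, and the sign ambiguity.

```python
from fractions import Fraction

def euler_matrix(p, q, r, s):
    """Euler's 3x3 matrix built from four parameters, as exact rationals."""
    p, q, r, s = (Fraction(v) for v in (p, q, r, s))
    u = p*p + q*q + r*r + s*s
    rows = [
        [p*p + q*q - r*r - s*s, 2*(q*r + p*s),         2*(q*s - p*r)],
        [2*(q*r - p*s),         p*p - q*q + r*r - s*s, 2*(p*q + r*s)],
        [2*(q*s + p*r),         2*(r*s - p*q),         p*p - q*q - r*r + s*s],
    ]
    return [[entry / u for entry in row] for row in rows]

def times_transpose(m):
    """Return the product M M^T for a 3x3 matrix M."""
    return [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

def determinant(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1]*m[2][2] - m[1][2]*m[2][1])
          - m[0][1] * (m[1][0]*m[2][2] - m[1][2]*m[2][0])
          + m[0][2] * (m[1][0]*m[2][1] - m[1][1]*m[2][0]))

identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
M = euler_matrix(1, 2, 3, 4)              # arbitrary sample parameters

print(times_transpose(M) == identity)     # orthogonality
print(determinant(M))                     # +1 for a rotation
print(euler_matrix(-1, -2, -3, -4) == M)  # (p,q,r,s) and -(p,q,r,s) agree
```

Because the arithmetic is exact, the comparisons are genuine equalities rather than floating-point approximations, which suits Euler's rational-entries theme.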

The standard approach is to combine the parameters into two complex numbers. But in the geometric algebra (or Clifford algebra) interpretation, a spinor is a rotation of sorts, or we might say a “half-rotation”. It is about the following plane:

    \[q\hat{\mathbf y}\wedge\hat{\mathbf z} + r\hat{\mathbf z}\wedge\hat{\mathbf x} + s\hat{\mathbf x}\wedge\hat{\mathbf y}.\]

(For those who haven’t seen the wedge product nor bivectors, you can visualise \hat{\mathbf x}\wedge\hat{\mathbf y} for example, as the parallelogram or plane spanned by those vectors. It also has a magnitude and handedness/orientation.) The sum is itself a plane, because we are in 3D. Dividing by \sqrt{q^2 + r^2 + s^2} gives a unit bivector. For the spinor, the angle of rotation θ/2 say, is given by (cf. Doran & Lasenby 2003 §2.7.1):

    \[\cos(\theta/2) = p/\sqrt{u},\qquad \sin(\theta/2) = \sqrt{(q^2+r^2+s^2)/u}.\]

This determines θ/2 to within a range of 2π (if we include also the orientation of the plane). In contrast, the matrix given earlier effects a rotation by θ — twice the angle — about the same plane. This is because geometric algebra formulates rotations using two copies of the spinor. The matrix loses information about the sign of the spinor, and hence also any distinction between one or two full revolutions.
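The angle doubling can be checked numerically. The sketch below is my own illustration with arbitrary sample parameters. It relies on the standard fact that a 3D rotation by angle θ has trace 1 + 2cos(θ), and on the trace of Euler's matrix being (3p^2 - q^2 - r^2 - s^2)/u, the sum of its diagonal entries.

```python
import math

def half_angle(p, q, r, s):
    """Spinor angle theta/2 in [0, pi], from the cosine and sine formulas above."""
    u = p*p + q*q + r*r + s*s
    cosine = p / math.sqrt(u)
    sine = math.sqrt((q*q + r*r + s*s) / u)
    return math.atan2(sine, cosine)

def matrix_trace(p, q, r, s):
    """Trace of Euler's matrix: the sum of its three diagonal entries."""
    u = p*p + q*q + r*r + s*s
    return (3*p*p - q*q - r*r - s*s) / u

p, q, r, s = 1.0, 2.0, 3.0, 4.0          # arbitrary sample parameters
theta_half = half_angle(p, q, r, s)

# A rotation by theta has trace 1 + 2 cos(theta).
cos_theta_from_matrix = (matrix_trace(p, q, r, s) - 1) / 2
cos_theta_from_spinor = math.cos(2 * theta_half)

print(cos_theta_from_matrix, cos_theta_from_spinor)
```

The two printed cosines should agree to rounding error, so the matrix rotates by twice the spinor's angle.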

Euler extends the challenge of finding orthogonal matrices with rational entries to 4D. In §34 he parametrises matrices using “eight numbers at will a, b, c, d, p, q, r, s”. However the determinant of this matrix is -1, so it is not a rotation, and the parameters cannot form a spinor. Two of its eigenvalues are -1 and +1. Now the eigenvectors corresponding to distinct eigenvalues are orthogonal (a property most familiar for symmetric matrices, but it holds for orthogonal matrices also). It follows the matrix causes reflection along one axis, fixes an orthogonal axis, and rotates about the remaining plane. So it does not “include every possible solution” (§36). But I guess the parameters might form a subgroup of the Pin(4) group, the double cover of the 4-dimensional orthogonal group O(4).

Euler provides another 4×4 orthogonal matrix satisfying additional properties, in §36. This one has determinant +1, hence represents a rotation. It would appear no eigenvalues are +1 in general, so it may represent an arbitrary rotation. I guess the parameters (a,b,c,d,p,q,r,s)/\sqrt u, where I label by u the quantity (a^2+b^2+c^2+d^2)(p^2+q^2+r^2+s^2) mentioned by Euler, might form spinors of 4D space (not 3+1-dimensional spacetime). If so, these are members of Spin(4), the double cover of the 4D rotation group SO(4).

Euler was certainly unaware of his implicit discovery of spinors. His motive was to represent rotations using rational numbers, asserting these are “most suitable for use” (§11). Probably more significant today is that rotations are described by rational functions of spinor components. But the fact spinors would be rediscovered repeatedly in different applications suggests there is something very natural or Platonic about them. Euler says his 4D “solution deserves the more attention”, and that with a general procedure for this and higher dimensions, “Algebra… would be seen to grow very much.” (§36) He could not have anticipated how deserving of attention spinors are, nor their importance in algebra and elsewhere!

Spin-1/2 and a rotating mirror

Difficulty:   ★★☆☆☆   high school

Imagine a light ray reflecting off a mirror. If the mirror is rotating, the direction of the reflected beam will also rotate, but at twice the rate of the mirror! This follows from the way the angles work, if you recall for example “the angle of incidence equals the angle of reflection” and think about it carefully… Or, just play around with an animation until it looks right 😉 . In quantum physics this angle-doubling property turns up in the description of electrons for example, where it seems very mysterious and exotic (keywords: “spin-1/2” and “spinors”). So its appearance in an “ordinary” and intuitive setting is reassuring.

angles for the mirror and ray
Figure 1: A two-sided mirror and light ray (green arrows). The light arrives from the left, bounces off the mirror in the centre, and exits towards the upper-right. In this example the light beam has angle b = 165°, and the mirror has angle m = 15°.

For simplicity let’s use a two-dimensional plane, as shown in Figure 1. We measure angles from the positive direction of the x-axis, as usual in polar coordinates. I choose to measure from the centre outwards, so for the incoming ray the angle assigned is the opposite of what the arrow might suggest. Label the incoming ray angle b, and mirror rotation angle m. Now if you increase m by some given amount, the outgoing ray angle increases by twice as much. But if you increase b instead, the outgoing ray angle decreases by the same amount. We also need an “initial condition” of sorts:  when the mirror is horizontal (m = 0°), and the ray arrives from directly above (b = 90°), the reflected beam is also at 90°. It follows:

reflected angle  =  2m – b + 180°.

Now if the mirror rotates by 180°, the reflected ray completes a full 360° rotation, so is back to its original position. (We suppose the mirror is 2-sided.) If you hadn’t watched the rotation, you wouldn’t know anything had changed. But now suppose we make one side of the mirror red and the other blue, so the reflected ray takes on the colour of the closest side. Now the ray must make two complete revolutions, 720°, to get back to its original state! After one revolution it is back to the same position, but has a different colour, as the opposite side of the mirror faces the beam. Similarly, if the reflected ray is rotated by 180° in one direction, this is not the same as rotating by 180° in the opposite direction, as the colour is different. “Spinors” have these same features, except in place of red/blue their mathematical description picks up a factor of ±1.
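Here is a small Python sketch of the angle rule and the colour bookkeeping, in degrees. The function names are hypothetical. I assume the reflected angle is 2m – b + 180°, and I take the convention (the same as in the Mathematica snippet in this post) that the ray is blue when sin(b – m) > 0, and red otherwise.

```python
import math

def reflected_angle(m, b):
    """Outgoing ray angle in degrees, reduced to the range [0, 360)."""
    return (2*m - b + 180) % 360

def facing_colour(m, b):
    """Colour of the mirror side hit by the incoming ray (assumed convention)."""
    return "blue" if math.sin(math.radians(b - m)) > 0 else "red"

b = 165                        # incoming ray angle, as in Figure 1
for m in (15, 15 + 180, 15 + 360):
    print(m, reflected_angle(m, b), facing_colour(m, b))
```

Turning the mirror by 180° leaves the reflected angle unchanged but flips the colour. Only after 360° of mirror rotation, which is 720° for the ray, are both angle and colour restored.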

two mirrors rotated by differing amounts
Figure 2: A two-sided coloured mirror. The incoming ray is depicted as grey, but think of it as white so its reflection is free to take on the appropriate colour. In the left image, the blue side of the mirror is facing upwards. Now slowly rotate the mirror by a half-revolution. (Actually I have drawn a slightly different angle just for variety.) The reflected ray changes angle by a full revolution, but has since turned red!

You might try animating this yourself. If you draw the rays with unit length, then the arrow for the incoming beam points from (cos b, sin b) to (0,0). The outgoing arrow points from (0,0) to -(cos(2m – b), sin(2m – b)), where a minus sign replaces the 180° term from earlier. The colour depends on whether the incoming ray is from 0 to 180° ahead of the mirror, or 0 to 180° behind. This is determined by the sign of sin(b – m). It is convenient to allow angle parameters beyond 360°, which makes at most only a change in colour, as we have learned 😀 . Below is Mathematica code I wrote, which uses slider controls for the angle parameters. The result is fun to play around with, and it helps make the angle-doubling more intuitive.

width = 0.3;
height = 0.02;
mirror = Polygon[{{-width,-height},{width,-height},{width,height},{-width,height}}, VertexColors->{Red,Red,Blue,Blue}];
Manipulate[ Graphics[ {Rotate[mirror,m], {Gray,Arrow[{{Cos[b],Sin[b]},{0,0}}]}, {If[Sin[b-m]>0,Blue,Red],Arrow[{{0,0},-{Cos[2m-b],Sin[2m-b]}}]}}, PlotRange->{{-1,1},{-1,1}} ], {{m,0},-2\[Pi],2\[Pi]}, {{b,3\[Pi]/4},0,2\[Pi]} ]

More on “covariant” versus “contravariant”

Difficulty:   ★★★☆☆   undergraduate

I have written on the topic of “covariant” and “contravariant” vectors (and higher-rank tensors) previously, and have been intending to write an update for a number of years. It must be noted some authors recommend avoiding these terms completely, including Schutz 2009  §3.3:

Most of these names are old-fashioned; ‘vectors’ and ‘dual vectors’ or ‘one-forms’ are the modern names.

Let’s return to Schutz’ reason soon. I have followed this naming practice myself, except I prefer to say “covector” or “1-form”, rather than “dual vector” which can be clumsy. (What would you call the vector which is the (metric) dual to a given 1-form: the “dual of a dual vector”, or a “dual-dual-vector”!?) People also talk about “up[stairs] indices” and “down[stairs] indices”, which seems alright.

But if you want to be cheeky, you might say a vector is covariant, while a 1-form is contravariant — the exact opposite of usual terminology! I remember a maths graduate student at some online school stating this. Similar sentiments are expressed by Spivak 1999 vol. 1  §4:

Nowadays such situations are always distinguished by calling the things which go in the same direction “covariant” and the things which go in the opposite direction “contravariant”. Classical terminology used these same words, and it just happens to have reversed this: a vector field is called a contravariant vector field, while a section of T*M is called a covariant vector field. And no one has had the gall or authority to reverse terminology so sanctified by years of usage. So it’s very easy to remember which kind of vector field is covariant, and which is contravariant — it’s just the opposite of what it logically ought to be.

While I love material which challenges my conceptual understanding, and Spivak’s humorous prose is fun, trying to be too “clever” with terms can hamper clear communication. Back to Schutz, who clarifies:

The reason that ‘co’ and ‘contra’ have been abandoned is that they mix up two very different things: the transformation of a basis is the expression of new vectors in terms of old ones; the transformation of components is the expression of the same object in terms of the new basis.

If you take the components of a fixed vector in a given basis, they transform contravariantly when the basis changes. But if you consider the vector as a whole (a single geometric object) and ask how a basis vector (specifically) is mapped to a new basis vector, the change is “covariant”. (Recall, as Schutz explains: “The property of transforming with basis vectors gives rise to the co in ‘covariant vector’ and its shorter form ‘covector’.”) In general, if you fix a set of components, by which I mean fixing an ordered set of numbers like (0,1,0,½) say, and then change the basis vectors these numbers refer to, then the change of a vector (as a whole entity) is “covariant”, so-called. For 1-forms, the converse of these statements applies. Some diagrams would make this paragraph clearer, but I leave this as an exercise, sorry.

However, it seems to me the most accurate description is that vectors don’t change at all, when you change a basis! Picture a vector as an arrow in space, then the arrow does not move. In this sense vectors are neither contravariant nor covariant, but invariant! (We could also say generally covariant, since they are geometric entities independent of any coordinate system. This is the usual modern meaning of the word “covariant”, but it’s a bit different to the covariant–contravariant distinction, so for clarity I avoid this language here.) In conclusion, one of the clearest descriptions is to simply say: vector, or 1-form / covector. Or, given historical usage, it is especially clear to say a vector’s “components transform contravariantly”. See the Table.

Table: transformation under basis change

| object | clarification | vector | 1-form |
| components | same (co)vector, but components in a new basis | contravariant | covariant |
| (co)vector | same (co)vector, treated as a whole | invariant | invariant |
| basis (co)vector | transform to different (co)vector | covariant | contravariant |
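To make the first table row concrete, here is a small Python sketch in two dimensions. The matrix A and the sample components are arbitrary choices of mine. I assume the columns of A hold the new basis vectors written in the old basis, so vector components pick up the inverse of A while 1-form components pick up A itself (transposed).

```python
from fractions import Fraction

def inverse_2x2(a):
    """Inverse of a 2x2 matrix, via the adjugate divided by the determinant."""
    det = a[0][0]*a[1][1] - a[0][1]*a[1][0]
    return [[ a[1][1]/det, -a[0][1]/det],
            [-a[1][0]/det,  a[0][0]/det]]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def mat_vec(a, v):
    return [sum(a[i][k]*v[k] for k in range(2)) for i in range(2)]

# Assumption: column j of A is the new basis vector e'_j in the old basis.
A = [[Fraction(2), Fraction(1)],
     [Fraction(0), Fraction(1)]]
v_old = [Fraction(3), Fraction(4)]     # components of a fixed vector
w_old = [Fraction(5), Fraction(-1)]    # components of a fixed 1-form

v_new = mat_vec(inverse_2x2(A), v_old)   # contravariant: inverse of A
w_new = mat_vec(transpose(A), w_old)     # covariant: A itself, as for the basis

pairing_old = sum(w*v for w, v in zip(w_old, v_old))
pairing_new = sum(w*v for w, v in zip(w_new, v_new))

# Rebuild the arrow from its new components and the new basis vectors.
rebuilt = [sum(v_new[j]*A[i][j] for j in range(2)) for i in range(2)]

print(v_new, w_new, pairing_old, pairing_new, rebuilt)
```

The pairing w(v) and the rebuilt arrow are unchanged, which is the “invariant” row of the table; only the components differ.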

Addendum: I recently learned a dual basis need not be made up of 1-forms, as in the usual formulation in differential geometry, but of vectors instead! Recall the defining relation between a coordinate basis and cobasis: dx^\mu(\partial_\nu) = \delta^\mu_\nu, or \mathbf e^\mu(\mathbf e_\nu) = \delta^\mu_\nu for an arbitrary frame. In particular, each cobasis element is orthogonal to the “other” 3 vectors. But we can take the duals (dx^\mu)^\sharp, which are vectors but obey the same orthogonality relations, via the metric scalar product: \langle(dx^\mu)^\sharp,\partial_\nu\rangle = \delta^\mu_\nu. (By relating to the standard approach I may have made things look complicated, but this should be visualised as simply finding new vectors orthogonal to existing vectors.) (On a separate note, “dual” here means as an individual vector, not dual as a basis.) It seems this vector approach to a dual basis was the original one. In 1820 the Italian mathematician Giorgini distinguished between projezioni oblique (parallel projections) and projezioni ortogonali (orthogonal projections) of line segments, now termed contravariant and covariant. According to one historian (Caparrini 2003), this was “one of the first clear-cut distinctions between the two types of projections in analytic geometry.” However priority goes to Hachette 1809. Today in geometric algebra, also known as Clifford algebra, a dual basis is also defined as vectors not 1-forms, via \langle\mathbf e^\mu,\mathbf e_\nu\rangle = \delta^\mu_\nu (Doran & Lasenby 2003 §4.3).
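For a feel of the vector version of a dual basis, here is a Python sketch for ordinary Euclidean 3D space (my simplifying assumption; the text above has 4D spacetime in mind). It uses the textbook cross-product construction of the reciprocal basis, with a skewed sample basis of my own choosing.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a*b for a, b in zip(u, v))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def reciprocal_basis(e1, e2, e3):
    """Vectors e^1, e^2, e^3 satisfying <e^i, e_j> = delta^i_j (Euclidean metric)."""
    volume = Fraction(dot(e1, cross(e2, e3)))   # signed volume of the basis
    return [tuple(c / volume for c in cross(e2, e3)),
            tuple(c / volume for c in cross(e3, e1)),
            tuple(c / volume for c in cross(e1, e2))]

basis = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]       # a skewed sample basis
dual = reciprocal_basis(*basis)

for i in range(3):
    print([dot(dual[i], basis[j]) for j in range(3)])
```

Each dual vector is orthogonal to the two “other” basis vectors, and has unit scalar product with its own partner.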

Squid Game conditional probabilities

Difficulty:   ★★★☆☆   undergraduate

Earlier we analysed the probabilities for the bridge-crossing scenario in the Squid Game episode “VIPs”, which has “deadly high stakes” according to the Netflix blurb for the series. 🙂 So far, we made the assumption of no foreknowledge. This means our results for the players’ progress describe their chances as they stand before the game begins. Equivalently, if the game has started, our results assume the analyst knows nothing about prior contestants, and cannot view the state of the bridge.

But now, suppose we are told only that a specific player numbered i died on step number n. (That is, they stood safely on column n – 1, but chose wrongly amongst the next pair of glass panels on column n, breaking a pane and plummeting downwards.) Then the next player is definitely safe on step n, but has no information about later steps, so the game is essentially reset from that point. Hence the “conditional probability” that player I > i is still alive on step N > n is simply:

    \[P(I,N|i\textrm{ died on }n) = a_{I-i,N-n}.\]

Recall a_{i',n'} \equiv P(i',n') is the chance player i′ is alive on step n′ (given no information nor conditions). We labelled as b_{i',n'} = \binom{n'-1}{i'-1}2^{-n'} the chance they died on step n′ specifically, so analogously:

    \[P(I\textrm{ dies on }N|i\textrm{ died on }n) = b_{I-i,N-n}.\]

Now, suppose we are told only that a specific player I will die on step N. What is the probability for an earlier player’s progress? Bayes’ theorem says that given two events A and B, the conditional probabilities are related by P(A|B) = P(B|A)P(A)/P(B), which in our case is:

    \[\begin{aligned}c_{i,n} &:= P(i\textrm{ died on }n|I\textrm{ dies on }N) \\ &= \frac{b_{I-i,N-n}b_{i,n}}{b_{I,N}} \\ &= \frac{\binom{N-n-1}{I-i-1}\binom{n-1}{i-1}}{\binom{N-1}{I-1}}.\end{aligned}\]

The powers of 2 cancelled. The Table below shows some example numbers.

Table: Probability player i died on step n, given player I = 5 will die on step N = 8

| player \ step | n = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| i = 1 | 4/7 | 2/7 | 4/35 | 1/35 | 0 | 0 | 0 | 0 |
| 2 | 0 | 2/7 | 12/35 | 9/35 | 4/35 | 0 | 0 | 0 |
| 3 | 0 | 0 | 4/35 | 9/35 | 12/35 | 2/7 | 0 | 0 |
| 4 | 0 | 0 | 0 | 1/35 | 4/35 | 2/7 | 4/7 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |

In general, on any given row (fixed player i) the entries are nonzero only for n between i and N – I + i inclusive. This forms a diamond shape. For the row sum \sum_{n=i}^{N-I+i}c_{i,n} computer algebra returns a hypergeometric function times two binomial coefficients, which appears to simplify to 1 (for integer parameters) as expected, since player i must die somewhere. On any given column \sum_i c_{i,n} = (I-1)/(N-1) which is independent of n, meaning each step has equal chance that some player will die there. In particular the first entry itself takes this value: c_{1,1} = (I-1)/(N-1).
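The table and its row and column sums can be reproduced with a few lines of Python, using exact fractions. This is only a sketch: the function name and the conventions for the boundary entries are my own, following the Bayes formula above.

```python
from fractions import Fraction
from math import comb

def c(i, n, I, N):
    """P(player i died on step n | player I dies on step N), by the Bayes formula."""
    if (i, n) == (I, N):
        return Fraction(1)               # the given event itself
    if not (1 <= i < I and 1 <= n < N):
        return Fraction(0)               # last row and column are otherwise zero
    return Fraction(comb(N-n-1, I-i-1) * comb(n-1, i-1), comb(N-1, I-1))

I, N = 5, 8
table = [[c(i, n, I, N) for n in range(1, N+1)] for i in range(1, I+1)]
for row in table:
    print("  ".join(str(entry) for entry in row))

row_sums = [sum(row) for row in table]
column_sums = [sum(table[i][n] for i in range(I)) for n in range(N-1)]
print(row_sums)
print(column_sums)
```

For these parameters every row sums to 1, and each of the first N – 1 columns sums to (I – 1)/(N – 1) = 4/7.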

We examine other properties and special cases. By construction the last row and column are zeroes apart from c_{I,N} := 1; our general formula does not apply for n = N. If we are told where the second player I = 2 died, then player i = 1 has an equal chance 1/(N – 1) of dying on any earlier step. Also from the definition it is clear:

    \[c_{i,n} \equiv c_{I-i,N-n},\]

so the table is symmetric about its central point. The ratio of adjacent entries follows from the binomial coefficients:

    \[\begin{aligned}\frac{c_{i-1,n}}{c_{i,n}} &= \frac{(i-1)(N-I-n+i)}{(I-i)(n-i+1)}, \\ \frac{c_{i,n-1}}{c_{i,n}} &= \frac{(N-n)(n-i)}{(n-1)(N-I-n+i+1)}.\end{aligned}\]

It follows that at step n, player i = (I – 1)n/N and the subsequent player have the same “fail” chance. Presumably the maximum lies within this range. Physically we require the indices i and n to be integers. For the chosen Table parameters above, the relation just given is simply i = n/2, so every second column contains an adjacent pair of equal values. For the steps (columns) on the other hand, on n = (i – 1)(N – 1) / (I – 2) and the following step the “elimination” chance is equal. Note these special index values are linear functions of the other index (i or n respectively), where we regard I and N as fixed.

By rearranging terms we can write equivalent expressions for the chance to be eliminated, such as:

    \[c_{i,n} \equiv \frac{i\binom{N-I}{n-i}\binom{I-1}{i}}{n\binom{N-1}{n}}.\]
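A brute-force check that this rearranged expression agrees with the Bayes form, and that the table is symmetric about its centre, might look like the following Python sketch. The function names are mine, and I = 9, N = 25 are just sample parameters.

```python
from fractions import Fraction
from math import comb

def c_bayes(i, n, I, N):
    """The form obtained from Bayes' theorem."""
    return Fraction(comb(N-n-1, I-i-1) * comb(n-1, i-1), comb(N-1, I-1))

def c_rearranged(i, n, I, N):
    """The equivalent rearranged form."""
    return Fraction(i * comb(N-I, n-i) * comb(I-1, i), n * comb(N-1, n))

I, N = 9, 25
# Index pairs where the probability is nonzero: i <= n <= N - I + i.
support = [(i, n) for i in range(1, I) for n in range(i, N - I + i + 1)]

forms_agree = all(c_bayes(i, n, I, N) == c_rearranged(i, n, I, N)
                  for i, n in support)
is_symmetric = all(c_bayes(i, n, I, N) == c_bayes(I-i, N-n, I, N)
                   for i, n in support)
print(forms_agree, is_symmetric)
```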

conditional probability in Squid Game
The probability Squid Game bridge contestants will expire on a given step, given the condition: player I = 9 will die on step N = 25. It forms a sort of boat- or saddle-shape. The blue dots are for integer index values, which are physical. In previous results the “ridge line” of high probability was at roughly n = 2i, but for the conditional probability it is spread out between the endpoint events, so in this case is roughly n = 3i.

For suitably large parameters, the probability resembles a gaussian curve. We can apply the de Moivre-Laplace approximation (with parameter p := ½ say) to the binomial coefficients. This gives a gaussian for a fixed step number n, as a function of the player number. I omit the height, but its centre and width are determined from the exponent which is:

    \[-\frac{\Big(i-\frac{N+nI-I-2n}{N-2}\Big)^2}{(n-1)(N-n-1)/2(N-2)}.\]

The spread is maximum at n = N/2, in this approximation. Now to obtain a gaussian approximation for a fixed player i, apply the results of the previous blog post using the substitutions x \rightarrow n-1, a \rightarrow i-1, b \rightarrow I-i-1, and X \rightarrow N-2. The centre is n_0 := x_0 + 1 = (i-1)(N-1)/(I-2)+1/2. One option for the height of the gaussians — when looking for a simple expression — is to use the sums 1 and (I – 1)/(N – 1) determined before. Recall for a normalised gaussian, the height 1/\sqrt{2\pi}\sigma is inversely proportional to the standard deviation.
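As a rough check of the fixed-n gaussian, the Python sketch below compares its centre and width against the exact binomial weights. The names are hypothetical, and the parameters I = 9, N = 25, n = 12 are a sample of my own. The variance is read off the exponent above, taking the denominator there to be 2σ².

```python
import math
from math import comb

def weight(i, n, I, N):
    """The i-dependent numerator of c_{i,n}: a product of two binomial coefficients."""
    return comb(N-n-1, I-i-1) * comb(n-1, i-1)

def gaussian_centre(n, I, N):
    return (N + n*I - I - 2*n) / (N - 2)

def gaussian_variance(n, N):
    # The exponent's denominator is 2 sigma^2 = (n-1)(N-n-1) / (2(N-2)).
    return (n - 1) * (N - n - 1) / (4 * (N - 2))

I, N, n = 9, 25, 12
centre = gaussian_centre(n, I, N)
variance = gaussian_variance(n, N)

# Player with the highest exact conditional probability on this step.
exact_peak = max(range(1, I), key=lambda i: weight(i, n, I, N))

# Ratio of neighbouring players 5 and 4: exact versus gaussian.
exact_ratio = weight(5, n, I, N) / weight(4, n, I, N)
gauss_ratio = math.exp(-((5 - centre)**2 - (4 - centre)**2) / (2 * variance))

print(centre, exact_peak, exact_ratio, gauss_ratio)
```

Even at these modest parameter values the estimated centre lands within one player of the exact peak, and the neighbouring-player ratios are close.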

There are other conditional probability questions one could pose. Suppose we are given a window, bounded by the events that player J died on column L, and later player K dies on column M? Inside this window, the probabilities reduce to our above analysis: the chance i dies on n is just c_{i-J,n-L}, where we also substitute I \rightarrow K-J and N \rightarrow M-L. As another possible scenario to analyse, we might be informed that player I is alive on step N. Then we would not know how far they progressed, just that it was at least that far. Or, we might be told player I died on or before step N.

A concluding thought: Bayes’ theorem is deceptively simple-looking. I tried harder ways beforehand, trying to puzzle through the subtlety of conditional probability on my own. But with Bayes, the main result followed easily from our previous work.


Gaussian approximation to a certain product of binomial coefficients

Difficulty:   ★★★☆☆   undergraduate

Consider the following function, which is the product of a certain pair of binomial coefficients:

    \[f(x) := \binom{x}{a}\binom{X-x}{b}.\]

We take a, b, X \gg 1 to be constants, and x to have domain [a – 1, X – b + 1], which implies X ≥ a + b – 2 at least. As usual \binom{x}{a} := x!/a!(x-a)!, and this is extended beyond integer values by replacing each factorial with a Gamma function. Note the independent variable x appears in the upper entries of the binomial coefficients. Curiously, from inspection f is well-approximated by a gaussian curve. To gain some insight, for integer values of the parameters f is the polynomial:

    \[(a! b!)^{-1}x(x-1)\cdots(x-a+1)\cdot(X-b+1-x)\cdots(X-x).\]

This has many zeroes, and sometimes oscillates wildly in between them, hence the domain of x specified earlier.

plot of function and a gaussian curve approximation
Figure: The function for a = 7, b = 10, and X = 20. It is shown beyond our stated domain, which is bounded by the roots at x = 6 and 11. The gaussian uses our estimated centre of x = 277/34 or approx. 8.147, whereas f’s actual maximum occurs at around x = 8.139. The variance is from our approximate harmonic number formula, evaluated at the estimated centre point. Alternatively, the “finite difference” derivatives give a poor estimate in this case. In general, the gaussian fit looks best for high parameter values with a near b, etc.

Now the usual approximations to a single binomial coefficient (actually, binomial distribution) are not helpful here. For example the de Moivre–Laplace approximation is a gaussian in terms of the lower entry in the binomial coefficient, whereas our x is in the upper entries. More promising is the approximation as a Poisson distribution, which leads to a polynomial which is itself gaussian-like, and motivated the previous post incidentally. However we proceed from first principles, by estimating the centre point and the second derivative there.

At the (central) maximum of f, the slope is zero. In general the derivative is f'(x) = f(x)(H_x-H_{x-a}-H_{X-x}+H_{X-b-x}), where the H’s are called harmonic numbers. There may not exist any simple explicit expression for the turning points. Instead, the ratio of nearby points is comparatively simple:

    \[\frac{f(x-1/2)}{f(x+1/2)} = \frac{(x-a+1/2)(X+1/2-x)}{(x+1/2)(X-b+1/2-x)},\]

using the properties of the binomial coefficient. The derivative is approximately zero where this ratio is unity, which occurs at:

    \[x_0 := \frac{2aX+a-b}{2(a+b)}.\]
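To see the defining property of x_0 in action, here is a small Python sketch with hypothetical helper names. It extends the binomial coefficient with the Gamma function, and uses the Figure's parameters a = 7, b = 10, X = 20.

```python
import math

def binom(x, k):
    """Binomial coefficient extended to real x via the Gamma function."""
    return math.gamma(x + 1) / (math.gamma(k + 1) * math.gamma(x - k + 1))

def f(x, a, b, X):
    return binom(x, a) * binom(X - x, b)

def centre_estimate(a, b, X):
    return (2*a*X + a - b) / (2*(a + b))

a, b, X = 7, 10, 20
x0 = centre_estimate(a, b, X)

# By construction the function takes equal values half a step either side of x0.
left, right = f(x0 - 0.5, a, b, X), f(x0 + 0.5, a, b, X)
print(x0, left, right)
```

The equal values either side are what make x_0 a sensible estimate of the turning point, without it being the exact maximum.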

This should be a close estimate for the central turning point. [To do better, substitute specific numbers for the parameters, and solve numerically.] It is typically not an integer. Our sought-for gaussian has form C\operatorname{exp}(-(x-x_0)^2/2\sigma^2). We set the height C := f(x_0). Only the width remains to be determined. The gaussian’s second derivative evaluated at its centre point is -C/\sigma^2. On the other hand:

    \[f''(x) = f'(x)^2/f(x) - (H_x^{(2)}-H_{x-a}^{(2)}+H_{X-x}^{(2)}-H_{X-b-x}^{(2)})f(x),\]

which uses the so-called harmonic numbers of order 2, and I incorporate the function and its derivative (both given earlier) for brevity of the expression. Matching the results at x_0 yields the variance parameter \sigma^2:

    \[\sigma^{-2} := H_{x_0}^{(2)}-H_{x_0-a}^{(2)}+H_{X-x_0}^{(2)}-H_{X-b-x_0}^{(2)},\]

using f'(x_0) \approx 0. (At large values the series H_x^{(2)} \approx \pi^2/6 -1/x +1/2x^2\cdots may give insight into the above.) But alternatively, we can approximate the second derivative using elementary operations. By sampling the function at x_0-1, x_0, and x_0+1 say, a “finite differences” approach gives approximate derivatives. We can use the simple ratio formula obtained earlier to reduce the sampling to one or two points only, which might gain some insight along the way (though I currently wonder if this is a dead end…).
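For integer a and b the harmonic-number variance formula can be evaluated with finite sums only. This follows from the recurrence H_x^{(2)} = H_{x-1}^{(2)} + 1/x^2, which gives H_x^{(2)} - H_{x-a}^{(2)} = \sum_{j=0}^{a-1} (x-j)^{-2}, and similarly for the other pair. The Python sketch below uses that shortcut, which is my own and not from the derivation above, with the Figure's parameters.

```python
import math

def binom(x, k):
    """Binomial coefficient extended to real x via the Gamma function."""
    return math.gamma(x + 1) / (math.gamma(k + 1) * math.gamma(x - k + 1))

def f(x, a, b, X):
    return binom(x, a) * binom(X - x, b)

def inverse_variance(x0, a, b, X):
    """sigma^-2 from the order-2 harmonic numbers, for integer a and b."""
    return (sum(1 / (x0 - j)**2 for j in range(a))
            + sum(1 / (X - x0 - j)**2 for j in range(b)))

a, b, X = 7, 10, 20
x0 = (2*a*X + a - b) / (2*(a + b))
sigma2 = 1 / inverse_variance(x0, a, b, X)
C = f(x0, a, b, X)                     # height of the gaussian

def gaussian(x):
    return C * math.exp(-(x - x0)**2 / (2 * sigma2))

x = x0 + 0.5
relative_error = abs(gaussian(x) - f(x, a, b, X)) / f(x, a, b, X)
print(sigma2, relative_error)
```

Near the centre the relative error is small; further out, towards the roots bounding the domain, the fit degrades.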

Now f'(x_0-1/2) \approx f(x_0) - f(x_0-1), which becomes:

    \[\frac{2C(a+b)^3}{(2aX+a-b)(2bX-2b^2-2ab+a+3b)},\]

after using the ratio formula to obtain f(x_0-1) in terms of C. Similarly it turns out f'(x_0+1/2) is the negative of the above expression, but with a and b interchanged. Then a second derivative is: f''(x_0) \approx f'(x_0+1/2)-f'(x_0-1/2), but the combined expression does not simplify further so I won’t write it out. The last step is to set \tilde\sigma^2 := -C/f''(x_0), which is different to the earlier choice.

A slightly different approach uses f'(x_0-1/2) \approx (f(x_0+1/2)-f(x_0-3/2))/2, which may be expressed in terms of another sampled point E := f(x_0-1/2) = f(x_0+1/2). Similarly f'(x_0+1/2) \approx (f(x_0+3/2)-f(x_0-1/2))/2. The estimate for the second derivative follows, then later:

    \[\hat\sigma^2 := \frac{-2C(aX-b)(bX+a+2b)(bX-b^2-ab+a+2b)(aX-a^2-ab-b)}{E(a+b)^6}.\]

The expression is a little simpler in this approach, but at the cost of a second sample point. The use of f'(x_0-1) \approx f(x_0-1/2)-f(x_0-3/2) and f'(x_0+1) \approx f(x_0+3/2)-f(x_0+1/2) instead leads to the same result.

Gaussian approximation to a certain polynomial

Difficulty:   ★★★☆☆   undergraduate

Consider the function:

    \[x^A(X-x)^B,\]

where the independent variable x ranges between 0 and X, and the exponents are large: A, B \gg 1. [We could call it a “polynomial”, though the exponents need not be integers. Specifically it is the product of “monomials” in x and X – x, so might possibly be called a “sparse” polynomial in this sense.] Surprisingly, it closely resembles a gaussian curve, over our specified domain x \in [0,X].

approximation to a certain polynomial using a gaussian curve
Figure: The polynomial with parameters A = 10, B = 13, and X = 11. Our gaussian approximation is visually indistinguishable near the centre. Outside our specified domain the polynomial tends to \pm\infty, and for each non-integer exponent the tail on one side becomes imaginary.

The turning point is where the derivative equals zero. This occurs when x is the surprisingly simple expression:

    \[\tilde x := \frac{X}{1+B/A},\]

at which the function has value:

    \[A^A B^B \Big(\frac{X}{A+B}\Big)^{A+B} \equiv (B/A)^B \tilde x^{A+B}.\]

An arbitrary gaussian, not necessarily normalised, has form: Ce^{-(x-D)^2/2\sigma^2}. This has centre D which we equate with \tilde x, and maximum height C which we set to the above expression. We can fix the final parameter, the standard deviation, by matching the second derivatives at the turning point. Hence the variance is:

    \[\sigma^2 = \frac{AB}{(A+B)^3}X^2 \equiv \frac{B}{A^2X}\tilde x^3.\]

Hence our gaussian approximation may be expressed:

    \[\boxed{(B/A)^B \tilde x^{A+B} \operatorname{exp}\Big( -\frac{(x-\tilde x)^2}{2B\tilde x^3/A^2X} \Big).}\]
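A numerical comparison is easy to set up. The Python sketch below (function names are mine) uses the Figure's parameters A = 10, B = 13, X = 11, and evaluates both curves at the peak and half a unit to its right.

```python
import math

def polynomial(x, A, B, X):
    return x**A * (X - x)**B

def gaussian_fit(A, B, X):
    """Return (centre, height, variance) of the gaussian approximation."""
    x_peak = X / (1 + B / A)
    height = (B / A)**B * x_peak**(A + B)
    variance = A * B * X**2 / (A + B)**3
    return x_peak, height, variance

A, B, X = 10, 13, 11
x_peak, height, variance = gaussian_fit(A, B, X)

def gaussian(x):
    return height * math.exp(-(x - x_peak)**2 / (2 * variance))

x = x_peak + 0.5
exact = polynomial(x, A, B, X)
relative_error = abs(gaussian(x) - exact) / exact
print(x_peak, variance, relative_error)
```

The heights agree at the peak by construction, so the interesting number is the relative error away from it.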

The integral of the original curve turns out to be:

    \[\int_0^X x^A(X-x)^Bdx = \frac{X^{A+B+1}}{(A+B+1)\binom{A+B}{A}}.\]

This uses the binomial coefficient \binom{A+B}{A} := (A+B)!/A!B!, which is extended to non-integer values by replacing the factorials with Gamma functions. We could then apply Stirling’s approximation A! \approx \sqrt{2\pi A}(A/e)^A to each factorial, to obtain:

    \[\int\cdots \approx \frac{\sqrt{2\pi(A+B)}}{A+B+1}(B/A)^{B+1/2}\tilde x^{A+B+1},\]

though this is more messy to write out. On the other hand, the integral of the gaussian approximation is:

    \[\int_{-\infty}^\infty \operatorname{exp}\cdots = \sqrt\frac{2\pi}{A+B}(B/A)^{B+1/2}\tilde x^{A+B+1}.\]

We evaluated this integral over all real numbers, because the expression is simpler and still approximately the same. The ratio of the above two expressions is (A+B)/(A+B+1) \approx 1.
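The areas can also be compared directly against the exact integral, rather than its Stirling approximation. A minimal Python sketch, again with my own function names and the sample values A = 10, B = 13, X = 11:

```python
import math

def exact_integral(A, B, X):
    """X^(A+B+1) / ((A+B+1) * binom(A+B, A)), written with Gamma functions."""
    return X**(A + B + 1) * math.gamma(A + 1) * math.gamma(B + 1) / math.gamma(A + B + 2)

def gaussian_integral(A, B, X):
    """Area under the gaussian approximation, over the whole real line."""
    x_peak = X / (1 + B / A)
    return math.sqrt(2 * math.pi / (A + B)) * (B / A)**(B + 0.5) * x_peak**(A + B + 1)

A, B, X = 10, 13, 11
ratio = gaussian_integral(A, B, X) / exact_integral(A, B, X)
print(ratio)
```

This ratio differs slightly from (A+B)/(A+B+1), because it also absorbs the error of Stirling's approximation, but it should still be close to 1.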

Squid Game asymptotic probability

Difficulty:   ★★★☆☆   undergraduate

We continue with the bridge-crossing scenario from Squid Game called “Glass Stepping Stones”. Here I analyse the probabilities for late contestants on a very long bridge, and the expectation value for a player’s progress. Last time I found an exact expression for the probability P(i,n) that player number i is still alive on the nth step. Now seems a good place to mention there are equivalent expressions, such as:

    \[P(i,n) = 1 - \binom{n}{i}\cdot{_2F_1}(i,n+1,i+1;-1),\]

where {_2F_1} is called the hypergeometric function, and the other term is the binomial coefficient which is read as “n choose i”. The factor 2^{-n} seen previously has been absorbed. We listed several special cases of the probabilities last time. Another is:

    \[P(i,2i-1) = \frac{1}{2}.\]

So remarkably, the chance player i will be alive on step 2i – 1 is precisely 50%! For fixed large i, if we plot the probability distribution as a function of n it looks smooth, remaining near 1 for early steps before rolling down to near 0. Qualitatively this looks like a \tan^{-1}, tanh, or erf (“error function”). We reflect these curves, centre them on the value 1/2 at n = 2i – 1, and scale them linearly: so they have the appropriate bounds and match the slope at the centre point. See the Figure below.

probability for player 400
Figure: The probability player number i = 400 is still alive at step n. The scaled tanh function is close to the exact curve, while the scaled erf function nearly overlays it.

In fact the slope used in the Figure is only an approximation as described next, but this is a deliberate choice to show it still gives a good fit. The exact slope \partial P/\partial n evaluated at n = 2i – 1 seems a little too complicated to be useful. It contains a derivative of the hypergeometric function, which appears to approach -1/2 in the limit of large i, hence the slope at the centre point is asymptotic to -1/\sqrt{4\pi i}. Another approach is to consider the subsequent bridge step, for which:

    \[P(i,2i) = \frac{1}{2} - \frac{\Gamma(i+1/2)}{2\sqrt\pi\,\Gamma(i+1)},\]

which uses the Gamma function. The difference P(i,2i) – P(i,2i-1) approximates the slope, and is also asymptotic to -1/\sqrt{4\pi i} as i \rightarrow \infty. Hence our approximation for late players is:

    \[P(i,n) \approx \frac{1}{2}\Big( 1-\operatorname{erf}\frac{n+1-2i}{2\sqrt i} \Big).\]
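The quality of this approximation can be probed numerically. The sketch below is an illustration only: it uses player i = 100 rather than 400 to keep the sums small, and it computes the exact probability as the chance that at most i – 1 of the first n steps were guessed wrongly. The function names are made up for this example.

```python
import math

def alive_exact(i, n):
    """P(i,n): chance that at most i-1 of the first n steps were guessed wrongly."""
    return sum(math.comb(n, k) for k in range(i)) / 2 ** n

def alive_erf(i, n):
    """The erf approximation for late players."""
    return 0.5 * (1 - math.erf((n + 1 - 2 * i) / (2 * math.sqrt(i))))

i = 100
# Largest disagreement between the exact curve and the approximation.
worst = max(abs(alive_exact(i, n) - alive_erf(i, n)) for n in range(1, 4 * i))
# Drop in survival chance between steps 2i-1 and 2i, versus the asymptotic slope.
step_drop = alive_exact(i, 2 * i) - alive_exact(i, 2 * i - 1)
print(worst, step_drop, -1 / math.sqrt(4 * math.pi * i))
```

The worst-case gap shrinks as i grows, which fits the description of this as an approximation for late players.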

Now the error function by definition is the integral of a gaussian curve. Minus the derivative with respect to n of our approximation is precisely the righthand side below, which itself approximates the chance a late player dies on that step:

    \[P(i\textrm{ dies on }n) \approx \frac{1}{\sqrt{4\pi i}}e^{-(n-2i+1)^2/4i}.\]

For fixed i this is a gaussian distribution with centre n = 2i – 1 and standard deviation \sqrt{2i}. It is normalised in the sense its integral over n \in \mathbb R is exactly 1, but physically we want the discrete sum over n \in \mathbb N^+. For the 10th player this is approx. 0.9991, which is already close. The exact chance for dying on a given step was determined in the previous article to be b_{i,n} = \binom{n-1}{i-1} \cdot 2^{-n}. The Figure below shows some early values. As before, we can extend the function beyond integer parameters.

probability of death
Figure: The probability player i will die on step n, given no foreknowledge (that is, before the game begins). The blue dots correspond to integer values of the parameters, which are physical. Contestants face a near-gaussian “hill of death” so to speak, which peaks at n = 2i – 2 and 2i – 1. I have included the “spires” at the back for sake of interest, as a peek into the rich structure for negative n, though this is unphysical.

The ratio b_{i,n}/b_{i,n+1} = 2(n-i+1)/n precisely. Hence for a given player, the adjacent steps n = 2i – 2 and 2i – 1 are equally likely locations their game will be “discontinued”. This is surely the maximum assuming integer parameters, apart from the first player for whom step 0 is safe but step 1 is their most likely “resting place”. Hence the reader might prefer to translate our gaussian approximation by half a step or so; apparently there are various approximations to a binomial coefficient. The subsequent step n = 2i is a more likely endpoint than the earlier step 2i – 3.
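These statements about neighbouring steps are easy to confirm with exact fractions. The following is a small sketch for one arbitrarily chosen player (i = 7); the helper name death_chance is just a label for b_{i,n}.

```python
from fractions import Fraction
from math import comb

def death_chance(i, n):
    """b_{i,n}: exact chance that player i dies on step n."""
    return Fraction(comb(n - 1, i - 1), 2 ** n)

i = 7  # arbitrary example player
# Ratio of neighbouring steps, compared against 2(n-i+1)/n.
ratios_match = all(
    death_chance(i, n) / death_chance(i, n + 1) == Fraction(2 * (n - i + 1), n)
    for n in range(i, 40)
)
# Steps sharing the maximum chance; expected to be 2i-2 and 2i-1.
peak = max(death_chance(i, n) for n in range(1, 40))
peak_steps = [n for n in range(1, 40) if death_chance(i, n) == peak]
print(ratios_match, peak_steps)
```

For i = 7 the peak steps are 12 and 13, and step 14 is more likely than step 11.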

The expectation value for a given player’s death is:

    \[\sum_{n=1}^\infty n \cdot b_{i,n} = \binom{0}{i-1} \cdot {_2F_1}(1,-i,2-i;-1).\]

This function is very close to 2i, apart from a small oscillatory wiggle. At integer i it is singular, but from inspection of its plot it may be extended to a continuous function with value precisely 2i on physical parameter values (that is, integer i). Finally, for a given step n, the probability that some player breaks a tile is:

    \[\sum_{i=1}^n b_{i,n} = \frac{1}{2}\]

precisely, which is unsurprising. (Better terminology would be the nth “column”, as Henle+  use.) This assumes that n or more players have finished their run, otherwise the step is less likely to be broken.
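Both sums can be checked directly from b_{i,n}, without the hypergeometric function. The sketch below truncates the infinite sum at an arbitrary cutoff, which is an assumption of the example; the neglected tail is tiny for small i.

```python
from fractions import Fraction
from math import comb

def death_chance(i, n):
    """b_{i,n}: exact chance that player i dies on step n."""
    return Fraction(comb(n - 1, i - 1), 2 ** n)

def column_total(n):
    """Chance that some player breaks step n, assuming at least n players."""
    return sum(death_chance(i, n) for i in range(1, n + 1))

def mean_death_step(i, cutoff=400):
    """Expectation of the step on which player i dies, truncated at `cutoff`."""
    return float(sum(n * death_chance(i, n) for n in range(1, cutoff + 1)))

print(column_total(12), mean_death_step(5))
```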

Update, May 19: The death chance b_{i,n} is ½ times a binomial distribution in n. We previously found a gaussian curve for a given player i. Now, for a fixed step n, the de Moivre-Laplace approximation is a gaussian over the player number i:

    \[\frac{1}{\sqrt{2\pi(n-1)}}e^{-\frac{(i - n/2 - 1/2)^2}{(n - 1)/2}}.\]
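As a rough check of this gaussian over the player number, one can compare it with the exact values for a single step. The step n = 101 below is an arbitrary choice for illustration.

```python
import math

def death_chance(i, n):
    """b_{i,n}: exact chance that player i dies on step n (as a float)."""
    return math.comb(n - 1, i - 1) / 2 ** n

def death_gaussian(i, n):
    """Gaussian over the player number i, for a fixed step n >= 2."""
    return (math.exp(-(i - n / 2 - 0.5) ** 2 / ((n - 1) / 2))
            / math.sqrt(2 * math.pi * (n - 1)))

n = 101  # arbitrary example step
worst = max(abs(death_chance(i, n) - death_gaussian(i, n)) for i in range(1, n + 1))
print(worst)
```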


More Squid Game probabilities

Difficulty:   ★★★☆☆   undergraduate

Last time I analysed the bridge-crossing scenario in the series Squid Game. In this fictional challenge called “Glass Stepping Stones”, the front contestant must leap forward along glass panels, choosing left or right each time, knowing only that one side is strengthened glass while the other will shatter. At least later players may learn from the choices of their forerunners. Here I use combinatorial arguments, derive a recurrence relation for the chance to die on a given step, and obtain an analytic solution with a hypergeometric function.

Again, write a_{i,n} or equivalently P(i,n) for the probability the ith player is still alive on the nth step. We showed these probabilities satisfy the recurrence relation a_{i,n} = \frac{1}{2}(a_{i-1,n-1}+a_{i,n-1}), along with initial conditions a_{1,n} = 1/2^n, and a_{i,1} = 1 for all players after the first. Equivalently, we can start from a_{0,n} := 0, and a_{i,0} := 1 for i \ge 1. This is a bit like Pascal’s triangle. Rather than adding the previous two terms, we take their average — which of course is the sum divided by two. And rather than 1’s at the sides, we have 0’s and 1’s.

Let’s write b_{i,n} for the likelihood the ith player will die upon landing on the nth step. Then b_{i,n} = a_{i,n-1} - a_{i,n}. These values satisfy the same recurrence relation as before:

    \[\begin{aligned} b_{i,n} &=a_{i,n-1} - a_{i,n} \\ &= \frac{1}{2}(a_{i-1,n-2}+a_{i,n-2}-a_{i,n-1}-a_{i-1,n-1}) \\ &= \frac{b_{i,n-1}+b_{i-1,n-1}}{2}.\end{aligned}\]

Only the initial conditions are different: b_{1,n} = 1/2^n, and b_{i,1} = 0 for all players after the first. It is aesthetic to begin a step earlier: b_{i,0} := 0 =: b_{0,n}, except for b_{0,0} = 1. The Table below shows a few early entries.

Table: Probability player i will die on step n itself, given no foreknowledge

step:      n = 1   2       3       4       5       6       7       8
player:
  i = 1    1/2     1/4     1/8     1/16    1/32    1/64    1/128   1/256
      2    0       1/4     1/4     3/16    1/8     5/64    3/64    7/256
      3    0       0       1/8     3/16    3/16    5/32    15/128  21/256
      4    0       0       0       1/16    1/8     5/32    5/32    35/256
      5    0       0       0       0       1/32    5/64    15/128  35/256
      6    0       0       0       0       0       1/64    3/64    21/256
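The table can be regenerated from the recurrence relation and the "step 0" initial conditions. The sketch below uses exact fractions; the function name and the list-of-lists layout are choices made for this example only.

```python
from fractions import Fraction

def death_table(players, steps):
    """b[i][n] from b_{i,n} = (b_{i,n-1} + b_{i-1,n-1}) / 2, seeded at step 0."""
    b = [[Fraction(0)] * (steps + 1) for _ in range(players + 1)]
    b[0][0] = Fraction(1)  # the single non-zero initial value
    for n in range(1, steps + 1):
        for i in range(1, players + 1):
            b[i][n] = (b[i][n - 1] + b[i - 1][n - 1]) / 2
    return b

b = death_table(6, 8)
for i in range(1, 7):
    print(i, [str(b[i][n]) for n in range(1, 9)])
```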

Alternatively, there are elegant combinatorial arguments, for which I was initially inspired by another blog. For player i to die on step n, the previous i – 1 players must have died somewhere amongst the n – 1 prior steps. There are “n – 1 choose i – 1” ways to arrange these mistaken steps, out of 2^{n-1} total combinations of equal probability. Given any such arrangement, the next player has a 50% chance their following leap is a misstep, hence:

    \[b_{i,n} := P(i\textrm{ dies on }n) = \binom{n-1}{i-1}2^{-n}.\]

(I originally found this simple formula in a much more roundabout way, as often happens!) If i > n, the probability is zero. By similar reasoning, the chance that precisely i players have died by step n (inclusive) is:

    \[P(i\textrm{ players died by }n) = \binom{n}{i}2^{-n}.\]

A draft paper (Henle+ 2021) gives this result. It may also be obtained by summing over the previous formula: \sum_{n' = i}^n b_{i,n'}/2^{n-n'}. Note if player i died on n′, the next player must make n – n′ correct guesses in a row, so that no-one else dies by the nth step.
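The agreement between the binomial formula and the sum over death steps can be confirmed with exact arithmetic. This is a minimal sketch over a small range of i and n; the function names are invented for the example.

```python
from fractions import Fraction
from math import comb

def death_chance(i, n):
    """b_{i,n}: chance that player i dies on step n."""
    return Fraction(comb(n - 1, i - 1), 2 ** n)

def exactly_died_by(i, n):
    """Chance that precisely i players have died by step n (inclusive)."""
    return Fraction(comb(n, i), 2 ** n)

def exactly_died_by_sum(i, n):
    """Same quantity, summing over the step m on which player i died."""
    return sum(death_chance(i, m) / 2 ** (n - m) for m in range(i, n + 1))

agree = all(
    exactly_died_by(i, n) == exactly_died_by_sum(i, n)
    for i in range(1, 7) for n in range(i, 13)
)
print(agree)
```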

Now the probability the ith player is alive at the nth step or further, is the probability any number of previous players died by step n or before. (So what is ruled out is i or more dying by this stage.) This is just a sum over the previous displayed formula: \sum_{i'=0}^{i-1}P(i'\textrm{ players died by }n), which computer algebra simplifies to:

    \[P(i,n) = 1-\binom{n}{i}2^{-n}\cdot{_2F_1}(1,i-n,i+1;-1).\]

Here _2F_1 is called the “(ordinary) hypergeometric function”. I gave a limited table of these probability values in the previous blog. For fixed integer i \ge 0, the entire expression reduces to a polynomial in n of order i – 1 with rational coefficients, all times 2^{-n}. For example the likelihood the 5th player is alive at step n is:

    \[P(5,n) = 2^{-n}\frac{1}{24}\big( n^4-2n^3+11n^2+14n+24 \big).\]
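The polynomial for the 5th player can be compared against the underlying sum of binomial coefficients, which avoids needing a hypergeometric routine. A short sketch, with function names chosen for this example:

```python
from fractions import Fraction
from math import comb

def alive(i, n):
    """P(i,n): chance that at most i-1 players have died by step n."""
    return Fraction(sum(comb(n, k) for k in range(i)), 2 ** n)

def alive_5_poly(n):
    """The quoted polynomial form of P(5,n)."""
    return Fraction(n**4 - 2 * n**3 + 11 * n**2 + 14 * n + 24, 24 * 2 ** n)

agree = all(alive(5, n) == alive_5_poly(n) for n in range(0, 30))
print(agree)
```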

In general, for n < i we have a_{i,n} = 1, so players get some steps for free. The diagonal terms are a_{i,i} = 1-2^{-i}. This makes sense because for player i to not be alive on the ith step, every leap by previous players must also have been a misstep. Some results like these may also be shown using induction and the recurrence relation. I give more special cases in the next blog post. Yet the general formula works even for non-integers, though this is not physical, as the Figure below shows. For negative parameters (not shown) it has a rich structure, with singularities, and some probability values negative or exceeding 1.

probabilities plot
Figure: Probability that player i is alive at step n. The blue dots are for integer values of the parameters, which are physical. The probability decreases with step number. Visually, it is as if contestants start from the plateau at top-left, then slide down a hill of death 🙁

An alternative derivation of the probabilities is based on where the previous player died (if at all). If that player i – 1 died on step n′ < n, their follower must make n – n′ correct guesses in a row to reach step n safely. Now sum the result from n′ = i – 1, which is the earliest step upon which they may conceivably die, up to n′ = n – 1. Add to this the chance the player was still alive at step n – 1 (which is one minus the sum of chances they died on step n′) as this guarantees the following player i is alive at n. Numerical testing shows the result is indeed equivalent. Hence rather than summing over players for a fixed step, one may instead sum over steps for a fixed player.
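One way to carry out such a test is sketched below, using exact fractions over a small range of players and steps. It is an illustration only, not necessarily the test the text refers to, and the function names are made up for the example.

```python
from fractions import Fraction
from math import comb

def death_chance(i, n):
    """b_{i,n}: chance that player i dies on step n."""
    return Fraction(comb(n - 1, i - 1), 2 ** n)

def alive(i, n):
    """P(i,n) as a sum over how many players died by step n."""
    return Fraction(sum(comb(n, k) for k in range(i)), 2 ** n)

def alive_via_previous_player(i, n):
    """P(i,n) from where player i-1 died, or whether they reached step n-1."""
    prev_alive = alive(i - 1, n - 1)
    prev_died = sum(death_chance(i - 1, m) / 2 ** (n - m) for m in range(i - 1, n))
    return prev_alive + prev_died

agree = all(
    alive(i, n) == alive_via_previous_player(i, n)
    for i in range(2, 9) for n in range(1, 17)
)
print(agree)
```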


Squid Game bridge probabilities

Difficulty:   ★★★☆☆   undergraduate

In the popular Korean series Squid Game, one episode features a bridge-crossing game, whose probabilities are a fun challenge to calculate. (Warning: partial spoilers ahead.) In this cruel fictional scenario, called “Glass Stepping Stones” in the English subtitles, glass panels are suspended above a long fall. Contestants must leap between them. At each step the leading player is forced to choose left or right, knowing only that one panel is made of ordinary glass which will shatter, and the other is strengthened (“tempered”) glass which will hold. Later contestants cross the same bridge, and watch all previous attempts, so can learn the successes and failures.

The odds are simple for the first player. On each leap forward, there is a 50% chance they will fall to their death. Hence the chance of surviving N steps is 1/2^N, an exponential decrease. In the show (~30 minute mark), one player actually calculates this: 15 untested steps remain ahead of him, for a horrifyingly low 1/32768 chance of survival from that point. (Actually this is the third player, but more on that later.)

Squid Game bridge
The third contestant on the Squid Game bridge accurately calculates his chances as 1 in 32768

But suppose we do not know the outcome of earlier players. At the start, before anyone has moved, what is the probability a_{i,n} say, that player number i will still be alive on step number n? We showed a_{1,n} = 1/2^n. For player 2, it is certain they will survive step 1, by copying the first player if they were successful, or switching to the opposite pane if not. By extension player i is certain to survive the first i – 1 steps, hence a_{i,n} = 1 for all n \le i - 1.

In general, we set up a recurrence relation. But consider firstly the case i = 2. What is the chance they are alive at step n? If the first player died on step 1 (I mean, they leaped from the starting platform to an ordinary glass panel at step 1), then their successor must guess n – 1 tiles to reach step n successfully (I mean, to still be alive on panel n, and not fall through it). The probability of this combination of events is (1 - a_{1,1})/2^{n-1}. Similar reasoning applies to any step up to n – 1. However if the first player is still alive on n – 1, their follower is guaranteed to reach step n successfully. (Any later performance of the first player is irrelevant to their successors at step n.) The overall probability is the sum over these possibilities, which for an arbitrary player is:

    \[a_{i,n} = a_{i-1,n-1} + \sum_{k=1}^{n-1} ( a_{i-1,k-1} - a_{i-1,k} ) / 2^{n-k}.\]

This gives the probability in terms of the previous player. (Note the term in parentheses is the chance the previous player will die on step k precisely.) Hence starting from the initial conditions given earlier, we may build up an array of values using a spreadsheet, computer program, or computer algebra system. The latter choice preserves exact fractions, which feels very satisfying. Also we define a_{i,0} = 1 for convenience, where “step 0” may be interpreted as the ledge contestants safely start from. The Table below gives the first few values.

Table: Probability player i is still alive by step n, given no foreknowledge

step:      n = 1   2       3       4       5       6       7       8
player:
  i = 1    1/2     1/4     1/8     1/16    1/32    1/64    1/128   1/256
      2    1       3/4     1/2     5/16    3/16    7/64    1/16    9/256
      3    1       1       7/8     11/16   1/2     11/32   29/128  37/256
      4    1       1       1       15/16   13/16   21/32   1/2     93/256
      5    1       1       1       1       31/32   57/64   99/128  163/256
      6    1       1       1       1       1       63/64   15/16   219/256
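For readers who prefer a program to a spreadsheet, here is a minimal sketch that builds the table from the sum over where the previous player died. It uses Python's exact fractions in place of a computer algebra system, which is a substitution made for this example; the function name is likewise hypothetical.

```python
from fractions import Fraction

def alive_table(players, steps):
    """a[i][n], built player by player from the sum over the predecessor's death step."""
    a = [[Fraction(1)] * (steps + 1) for _ in range(players + 1)]  # a_{i,0} = 1
    for n in range(steps + 1):
        a[1][n] = Fraction(1, 2 ** n)  # first player: pure guessing
    for i in range(2, players + 1):
        for n in range(1, steps + 1):
            a[i][n] = a[i - 1][n - 1] + sum(
                (a[i - 1][k - 1] - a[i - 1][k]) / 2 ** (n - k) for k in range(1, n)
            )
    return a

a = alive_table(6, 8)  # row 0 is unused padding
for i in range(1, 7):
    print(i, [str(a[i][n]) for n in range(1, 9)])
```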

In Squid Game the bridge has 18 (pairs of) steps. The probability of crossing the entire bridge is the probability of being alive on step 18, as the next leap is to safety. In theory the 9th player has nearly even odds of making it: a_{9,18} = 53381/131072 \approx 0.41, and the next player likely will: a_{10,18} = 77691/131072 \approx 0.59. In the show, 16 players compete in this challenge, so the last player has excellent odds, supposedly: a_{16,18} = 65493/65536 \approx 0.9993.  However our analysis does not account for human behaviour! In the show, time pressure, rivalries, and imperfect memory compete with logical decision making and the interests of the group as a whole. On the other hand, some players claim to distinguish the glass types by sight or sound, which would give an advantage. These make interesting plot elements, but would spoil the simplicity and purity of a mathematical analysis.

Returning to the recurrence relation, it simplifies to:

    \[a_{i,n} = \frac{a_{i,n-1} + a_{i-1,n-1}}{2}.\]
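The simplified relation makes the bridge odds quick to compute. The sketch below assumes the conventions a_{i,0} = 1 and, for a fictitious "player 0", a_{0,n} = 0; the latter is only a device to make the recursion reproduce a_{1,n} = 1/2^n.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def alive(i, n):
    """a_{i,n} from the averaging recurrence, with exact fractions."""
    if i == 0:
        return Fraction(0)  # fictitious player 0 (assumed convention)
    if n == 0:
        return Fraction(1)  # everyone is safe on the starting ledge
    return (alive(i, n - 1) + alive(i - 1, n - 1)) / 2

for player in (9, 10, 16):
    print(player, alive(player, 18), float(alive(player, 18)))
```

This reproduces the 18-step values quoted earlier for players 9, 10 and 16.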

Hence each term is just the average of two previous terms. However I wanted to derive this via direct physical interpretation, not algebraic manipulation alone. This is intuitively satisfying. With the end result in mind, we relate P(i,n) \equiv a_{i,n} to the previous step and previous player. [Update, 21st April: A simpler way is to consider step 1. If player 1 breaks it, there are i – 1 remaining players for the next n – 1 steps. If player 1 instead guesses correctly, there are i players for the next n – 1 steps. This gives the recurrence relation.] Consider the three cases for player i – 1: they (A) died before step n – 1, (B) died on step n – 1, or (C) made it safely to step n – 1 or further. The total probability is the sum of these cases:

    \[P(i,n) = P(i,n|A)P(A) + P(i,n|B)P(B) + P(i,n|C)P(C).\]

The first term for example is the “conditional probability” that i is alive at step n, given that case A occurred; times the probability of case A itself occurring. There is a similar decomposition to the above for P(i,n-1). Now most parts of the expression are straightforward. If i – 1 died at step n – 1, then the next player is definitely safe at that step, but may only guess at the following step, so P(i,n-1|B) = 1 and P(i,n|B) = 1/2. If i – 1 was safe at step n – 1 or further, then the next player is safe for an extra step: P(i,n-1|C) = 1 = P(i,n|C). For case A the conditional probabilities are more difficult, but we do not need to calculate them. Observe that if the previous player died before n – 1, then steps n – 1 and n are uncharted territory. Hence the chance the following player makes it to n safely, is half of whatever it was for them to reach n – 1 safely: P(i,n|A) = \frac{1}{2}P(i,n-1|A). Hence the decomposition becomes:

    \[P(i,n) = \frac{1}{2}P(i,n-1|A)P(A) + \frac{1}{2}P(B) + P(C).\]

But this is just \frac{1}{2}P(i,n-1) apart from the C term, as seen from expanding out the conditional cases. Now P(C) = P(i-1,n-1) \equiv a_{i-1,n-1}. It follows P(i,n) = \frac{1}{2}(P(i,n-1) + P(i-1,n-1)) as before. We did not need to evaluate P(A) or P(B), though this is straightforward.

Now that the reader (and author!) have more experience with conditional probability, let’s return to the third player in the Squid Game episode. Before anyone moved, he had chance a_{3,18} = 43/65536 \approx 1/1500 of surviving the bridge. This would seem to contradict the earlier calculation, which gave a lower chance by a factor of 21½, a surprising contrast! The black-masked “Front Man” said to the VIP observers, “I believe this next game will exceed your expectations” (~12:30 mark), but in this sense it did not 😆 . The distinction is the information learned. Conditional probability is a subtle and beautiful thing. If we know nothing about the previous attempts, nor the state of the bridge, the probabilities are our variables a_{i,n}. But if we are given the information player I died on step N for instance, then the following player has no information about later steps, and the bridge scenario is essentially reset from that point onwards. Hence P(3,18|2\textrm{ died on }3) = a_{3-2,18-3} = 1/2^{15}.
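The contrast between the two probabilities can be seen by brute force. The sketch below enumerates all 2^18 equally likely outcomes, where each entry records whether the first leap onto that step is a misstep; this encoding is an assumption of the example, in line with the idealised analysis above (perfect memory, no other information).

```python
from fractions import Fraction
from itertools import product

STEPS = 18

def death_steps(outcome):
    """Steps on which successive players die: the j-th misstep kills player j."""
    return [k + 1 for k, wrong in enumerate(outcome) if wrong]

survive = given = both = 0
for outcome in product((False, True), repeat=STEPS):
    deaths = death_steps(outcome)
    third_survives = len(deaths) <= 2               # player 3 never falls
    second_died_on_3 = len(deaths) >= 2 and deaths[1] == 3
    survive += third_survives
    given += second_died_on_3
    both += third_survives and second_died_on_3

unconditional = Fraction(survive, 2 ** STEPS)  # a_{3,18}
conditional = Fraction(both, given)            # P(3,18 | 2 died on 3)
print(unconditional, conditional)
```

The enumeration gives 43/65536 before anyone moves, and 1/32768 once it is known that player 2 died on step 3.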

This scenario has been a valuable learning experience, as I had not worked with conditional probabilities before. Probability is very important in physics, particularly quantum physics where it is intrinsic (it is usually assumed). I originally came up with an incorrect recurrence relation, but realised this upon comparison with an article in Medium, which uses an elegant combinatorial argument. The scenario had already captured my attention, but realising my flaw drove further my need to understand. A related article is also helpful; I recommend these if you find my discussion hard to follow. There is even a draft paper on the Squid Game bridge probabilities! Presumably this is all little more than a specific application of textbook combinatorics. Still, it is fun to rediscover things for oneself.


Coordinates adapted to observer 4-velocity field

Difficulty:   ★★★☆☆   undergraduate

Suppose you have a 4-velocity field \mathbf u, which might be interpreted physically as observers or a fluid. It may be useful to derive a time coordinate T which both coincides with proper time for the observers, and synchronises them in the usual way. Here we consider only the geodesic and vorticity-free case. Define:

    \[dT := -\mathbf u^\flat.\]

The “flat” symbol is just a fancy way to denote lowering the index, so the RHS is just -u_\mu. On the LHS, dT is the gradient of a scalar, which may be expressed using the familiar chain rule:

    \[dT = \frac{\partial T}{\partial x^0}dx^0 + \frac{\partial T}{\partial x^1}dx^1 + \cdots,\]

where x^\mu is a coordinate basis. Technically dT is a covector, with components (dT)_\mu = \partial T/\partial x^\mu in the cobasis dx^\mu. Similarly -\mathbf u^\flat = -u_0dx^0 -u_1dx^1 -\cdots, so we must match the components: \partial T/\partial x^\mu = -u_\mu. For our purposes we do not need to integrate explicitly, it is sufficient to know the original equation is well-defined. (No such time coordinate exists if there is acceleration or vorticity, which is a corollary of the Frobenius theorem, see Ellis+ 2012 §4.6.2.)

The new coordinate is timelike, since \langle dT,dT\rangle = \langle -\mathbf u^\flat,-\mathbf u^\flat\rangle = -1. One can show its change with proper time is dT/d\tau = \langle dT,\mathbf u\rangle = 1. Further, the T = \textrm{const} hypersurfaces are orthogonal to \mathbf u, since the normal vector (dT)^\sharp is parallel to \mathbf u. This orthogonality means that at each point, the hypersurface agrees with the usual simultaneity defined locally by the observer at that point. (Orthogonality corresponds to the Poincaré-Einstein convention, so named by H. Brown 2005 §4.6).

We want to replace the x^0-coordinate by T, and keep the others. What are the resulting metric components for this new coordinate? (Of course it’s the same metric, just a different expression of this tensor.) Notice the original components of the inverse metric satisfy g^{\mu\nu} = \langle dx^\mu,dx^\nu\rangle. Similarly one new component is g'^{TT} = \langle dT,dT\rangle = -1. Also g'^{Ti} = \langle dT,dx^i\rangle = -u^i, where i = 1,2,3. The g'^{iT} are the same by symmetry, and the remaining components are unchanged. Hence the new components in terms of original components are:

    \[g'^{\mu\nu} = \begin{pmatrix} -1 & -u^1 & -u^2 & -u^3 \\ -u^1 & g^{11} & g^{12} & g^{13} \\ -u^2 & g^{21} & g^{22} & g^{23} \\ -u^3 & g^{31} & g^{32} & g^{33} \end{pmatrix}.\]

The matrix inverse gives the new metric components g'_{\mu\nu}. The 4-velocity components are: u'_\mu = (-1,0,0,0) by the original equation. Also u'^T = \langle dT,\mathbf u\rangle = 1, and the u'^i = \langle dx^i,\mathbf u\rangle = u^i are unchanged. Hence u'^\mu = (1,u^1,u^2,u^3).
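As a concrete illustration, the recipe can be tried on Schwarzschild spacetime with radially infalling "raindrop" observers, which start from rest at infinity. This particular example is an assumption of the sketch, not worked in the text above: in Schwarzschild coordinates, with f = 1 - 2M/r, the raindrop 4-velocity has u^t = 1/f and u^r = -\sqrt{2M/r}. Only the (t, r) block is needed, since the angular components are unchanged and the metric is block diagonal.

```python
import math

def adapted_metric_block(M, r):
    """(T, r) block of the metric after replacing t by T, for raindrop observers."""
    f = 1 - 2 * M / r
    g_inv_rr = f                     # original inverse component g^{rr}
    u_up_r = -math.sqrt(2 * M / r)   # assumed raindrop 4-velocity component u^r
    # New inverse metric block: g'^{TT} = -1, g'^{Tr} = -u^r, g'^{rr} unchanged.
    G_TT, G_Tr, G_rr = -1.0, -u_up_r, g_inv_rr
    det = G_TT * G_rr - G_Tr ** 2
    # Invert the symmetric 2x2 block to get the new metric components.
    return G_rr / det, -G_Tr / det, G_TT / det

g_TT, g_Tr, g_rr = adapted_metric_block(M=1.0, r=5.0)
print(g_TT, g_Tr, g_rr)
```

The result has g_TT = -(1 - 2M/r), g_Tr = \sqrt{2M/r} and g_rr = 1, which is the Gullstrand-Painlevé form of the line element.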

Anecdote: I used to write out dT = -u_0dx^0 - u_1dx^1 - \cdots, rearrange for dx^0, and substitute it into the original line element. This works but is clunky. My original inspiration was Taylor & Wheeler 2000 §B4, and I was thrilled to discover their derivation of Gullstrand-Painlevé coordinates from Schwarzschild coordinates plus certain radial velocities. (I give more references in MacLaurin 2019  §3.) I imagine that if a textbook presented the material above — given limited space and more formality — it may seem as if the more elegant approach were obvious. However I only (re?)-discovered it today by accident, using a specific 4-velocity from the previous post, and noticing the inverse metric components looked simple and familiar…