Raising and lowering indices on the Christoffel symbols

Given a connection \nabla on a manifold, along with any vector basis \mathbf e_\nu, the connection coefficients are:

    \[ \Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma} := \langle\nabla_\mu\mathbf e_\nu,\mathbf e^\sigma\rangle. \]

[For the usual connection based on a metric, these are called Christoffel symbols. On rare occasions some add “…of the second kind”, which we will interpret to mean the component of the output is the only raised index. Also the coefficients may be packaged into Cartan’s connection 1-forms, with components \omega_{\nu\hphantom\sigma\mu}^{\hphantom\nu\sigma} = \Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}, as introduced in the previous article. Hence raising or lowering indices on the connection forms is the same problem.]

In our convention, the first index specifies the direction \mathbf e_\mu of differentiation, the second is the basis vector (field) being differentiated, and the last index is for the component of the resulting vector. We use the same angle bracket notation for the metric scalar product, inverse metric scalar product, and the contraction of a vector and covector (as above; and this does not require a metric).

This unified notation is very convenient for generalising the above connection coefficients to any variant of raised or lowered indices, as we will demonstrate by examples. We will take such variants on the original equation as a definition, then the relation to the original \Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma} variant will be derived from that.

[Regarding index placement and their raising and lowering, I was formerly confused by this issue, in the case of non-tensorial objects, or using different vector bases for different indices. For example, to express an arbitrary frame in terms of a coordinate basis, some references write the components as e_a^{\hphantom a\mu}, as blogged previously. The Latin index is raised and lowered using the metric components in the general frame \mathbf e_a, whereas for the Greek index the metric components in a coordinate frame \partial_\mu are used. However while these textbook(s) gave useful practical formulae, I did not find them clear on what was definition vs. what was derived. I eventually concluded the various indices and their placements are best treated as a definition of components, with any formulae for swapping / raising / lowering being obtained from that.]

The last index is the most straightforward. We define: \Gamma_{\mu\nu\tau} := \langle\nabla_\mu\mathbf e_\nu,\mathbf e_\tau\rangle, with all indices lowered. These quantities are the overlaps between the vectors \nabla_\mu\mathbf e_\nu and \mathbf e_\tau. [Incidentally we could call them the “components” of \nabla_\mu\mathbf e_\nu if this vector is decomposed in the alternate vector basis (\mathbf e^\tau)^\sharp. After all, for any vector, \mathbf X = \langle\mathbf X,\mathbf e_\tau\rangle(\mathbf e^\tau)^\sharp, which may be checked by contracting both sides with \mathbf e_\sigma.] To relate to the original \Gammas, note:

    \[ \Gamma_{\mu\nu\tau} = \langle\nabla_\mu\mathbf e_\nu,\mathbf e^\sigma g_{\sigma\tau}\rangle = \langle\nabla_\mu\mathbf e_\nu,\mathbf e^\sigma\rangle g_{\sigma\tau} = \Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}g_{\sigma\tau}, \]

using linearity. Hence this index is raised or lowered using the metric, as familiar for an index of a tensor. We could say it is a “tensorial” index. The output \nabla_\mu\mathbf e_\nu of the derivative is just a vector, after all. The \Gamma_{\mu\nu\tau} have been called “Christoffel symbols of the first kind”.
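
As a quick concrete check of this last relation, here is a small Mathematica snippet for the metric connection of the 2D polar metric in its coordinate basis: it builds the first-kind symbols directly from the metric, raises the last index with the inverse metric, then lowers it again.

coords = {r, \[Theta]};
g = {{1, 0}, {0, r^2}};   (* metric components g_{mu nu} in polar coordinates *)
gInv = Inverse[g];
(* first kind, in our index order: Gamma_{mu nu tau} = (1/2)(g_{nu tau,mu} + g_{mu tau,nu} - g_{mu nu,tau}) *)
gamma1 = Table[1/2 (D[g[[nu, tau]], coords[[mu]]] + D[g[[mu, tau]], coords[[nu]]] - D[g[[mu, nu]], coords[[tau]]]), {mu, 2}, {nu, 2}, {tau, 2}];
(* second kind: raise the last index with the inverse metric *)
gamma2 = Table[Sum[gamma1[[mu, nu, sigma]] gInv[[sigma, tau]], {sigma, 2}], {mu, 2}, {nu, 2}, {tau, 2}];
(* lowering it again recovers the first kind *)
Simplify[Flatten[Table[Sum[gamma2[[mu, nu, sigma]] g[[sigma, tau]], {sigma, 2}], {mu, 2}, {nu, 2}, {tau, 2}] - gamma1]]
(* all zeroes; the nonzero first-kind entries are gamma1[[2,2,1]] = -r and gamma1[[1,2,2]] = gamma1[[2,1,2]] = r *)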

The first index is also fairly straightforward. We define a raised version by: \Gamma^{\tau\hphantom\nu\sigma}_{\hphantom\tau\nu} := \langle\nabla^\tau\mathbf e_\nu,\mathbf e^\sigma\rangle. However, this raises the question of what the notation ‘\nabla^\tau’ means. The raised index suggests \mathbf e^\tau, which is a covector, but we can take its dual using the metric (assuming there is one). Hence \nabla^\tau \equiv \nabla_{\mathbf e^\tau} := \nabla_{(\mathbf e^\tau)^\sharp} is a sensible definition, or for an arbitrary covector, \nabla_{\boldsymbol\alpha} := \nabla_{\boldsymbol\alpha^\sharp}. (For those not familiar with the notation, \boldsymbol\alpha^\sharp is simply the vector with components \alpha^\tau = \alpha_\mu g^{\mu\tau}.)

These duals are related to the vectors of the original basis by: (\mathbf e^\tau)^\sharp = g^{\tau\mu}\mathbf e_\mu, where g^{\tau\mu} = \langle\mathbf e^\tau,\mathbf e^\mu\rangle are still the inverse metric components for the original basis. Hence \nabla^\tau = \nabla_{g^{\tau\mu}\mathbf e_\mu} = g^{\tau\mu}\nabla_\mu, by the linearity of this slot. In terms of the coefficients, \Gamma^{\tau\hphantom\nu\sigma}_{\hphantom\tau\nu} = g^{\tau\mu}\Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}, hence the first index is also “tensorial”.

Finally, the middle index has the most interesting properties for raising or lowering. Let’s start with a raised middle index and lowered last index: \Gamma_{\mu\hphantom\sigma\nu}^{\hphantom\mu\sigma} := \langle\nabla_\mu\mathbf e^\sigma,\mathbf e_\nu\rangle. This means it is now a covector field which is being differentiated, and the components of the result are peeled off. For their relation to the original coefficients, note firstly \langle\mathbf e^\sigma,\mathbf e_\nu\rangle = \delta_\nu^\sigma, since these bases are dual. These are constants, hence their gradients vanish:

    \[ 0 = \nabla_\mu \langle\mathbf e^\sigma,\mathbf e_\nu\rangle = \langle\nabla_\mu\mathbf e^\sigma,\mathbf e_\nu\rangle + \langle\mathbf e^\sigma,\nabla_\mu\mathbf e_\nu\rangle. \]

[The second equality is not the metric-compatibility property of some connections, despite looking extremely similar in our notation. No metric is used here. Rather, it is a defining property of the covariant derivative; see e.g. Lee 2018  Prop. 4.15. But no doubt these defining properties are chosen for their “compatibility” with the vector–covector relationship.]

Hence \Gamma_{\mu\hphantom\sigma\nu}^{\hphantom\mu\sigma} = -\Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}. Note the order of the last two indices is swapped in this expression, and their “heights” are also changed. (I mean, it is not just that \sigma and \nu are interchanged.) This relation is antisymmetric in some sense, and looks more striking in notation which suppresses the first index. Connection 1-forms (can) do exactly that: \boldsymbol\omega^\sigma_{\hphantom\sigma\nu} = -\boldsymbol\omega_\nu^{\hphantom\nu\sigma}.

Note this is not the well-known property for an orthonormal basis, which is \boldsymbol\omega_\sigma^{\hphantom\sigma\nu} = -\boldsymbol\omega_\nu^{\hphantom\nu\sigma} using our conventions. Both equalities follow from conveniently chosen properties of bases: the dual basis relation in the former case, and orthonormality of a single basis in the latter. Our relation holds in any basis, and does not require a metric. But both relations are useful — they just apply to different contexts.

To isolate the middle index as the only one changed, raise the last index again: \Gamma_\mu^{\hphantom\mu\sigma\tau} = \Gamma_{\mu\hphantom\sigma\nu}^{\hphantom\mu\sigma}g^{\nu\tau} = -\Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}g^{\nu\tau}. Only now have we invoked the metric, in our discussion of the middle index.

The formula \Gamma_{\mu\hphantom\sigma\nu}^{\hphantom\mu\sigma} = -\Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma} has an elegant simplicity. However at first it didn’t feel right to me as the sought-after relation. We are accustomed to tensors, where a single index is raised or lowered independently of the others. However in this case the middle index describes a vector which is differentiated, and differentiation is not linear (for our purposes here), but obeys a Leibniz rule. (To be precise, it is linear over sums, and multiplication by a constant scalar. It is not linear under multiplication by an arbitrary scalar. Mathematicians call this \mathbb R-linear but not C^\infty-linear.)

The formula could be much worse. Suppose we replaced the original equation with an arbitrary operator, and defined raised and lowered coefficients similarly. Assuming the different variants are related somehow, then in general each could depend on all coefficients from the original variant, for example: A_{\mu\hphantom\nu\sigma}^{\hphantom\mu\nu} = f_{\mu\hphantom\nu\sigma}^{\hphantom\mu\nu\hphantom\sigma\alpha\beta\gamma}A_{\alpha\beta\gamma}, for some functions f with 3 × 2 = 6 indices.

Much of the material here is fairly standard. In differential geometry it is well known that the connection coefficients are not tensors. And it is not rare to hear that, in fact, two of their indices do behave tensorially. But I do not remember seeing in the literature a clear definition of what it means to raise or lower an arbitrary index. (Though the approach in geometric algebra, of using two distinct vector bases \mathbf e_\mu and (\mathbf e^\mu)^\sharp, is similar. This requires a metric, and gives a unique interpretation to raising and lowering indices. But in fairness, most transformations we have described also require a metric.) Most textbooks probably do not define different variants of the connection forms, although I have not investigated this much at present.

Finally, it is easy to look back and say something is straightforward. But it does not feel that way when your attention is preoccupied with grasping the core aspects of a new topic, rather than its more peripheral details. When I was first learning about connection forms, it was not at all clear whether you could raise or lower their indices. Because the few references I consulted didn’t mention this at all, it felt to me like it should be obvious. But it is not obvious, and demands careful consideration, even though some results are familiar ideas in a new context.

Cartan’s connection 1-forms

Difficulty:   ★★★★☆   undergraduate / graduate

The connection 1-forms \boldsymbol\omega{_\mu^{\hphantom\mu\nu}} are one way to express a connection \nabla on a manifold. The connection coefficients \Gamma_{\sigma\mu}^{\hphantom{\sigma\mu}\nu} are more familiar and achieve the same purpose, but package the information differently. Connection forms are part of Cartan’s efficient and elegant “moving frames” approach to derivatives and curvature.

[I am only just learning this material, so this article is for my own notes, consolidation of understanding, and checking of conventions. It is a work in progress. There is not much actual derivation in what follows, so don’t be intimidated by its length, and most formulae here are just notation.]

Write \mathbf e_\mu for a vector basis at each point, and \mathbf e^\nu for its dual basis. For now, we do not assume these frames are orthonormal (in fact, we don’t even need a metric, for now). The connection forms for this basis are: \boldsymbol\omega_\mu^{\hphantom\mu\nu}(\mathbf X) := \langle\nabla_{\mathbf X}\mathbf e_\mu,\mathbf e^\nu\rangle, where \mathbf X is any input vector. (I will sometimes write \langle\cdot,\cdot\rangle for the contraction between a vector and covector, because the unified notation with the metric scalar product is convenient, although it is sometimes worth reminding oneself that no metric is needed in this case.) To find the components, substitute basis vectors \mathbf X \rightarrow \mathbf e_\sigma:

    \[ \omega_{\mu\hphantom\nu\sigma}^{\hphantom\mu\nu} := \boldsymbol\omega_\mu^{\hphantom\mu\nu}(\mathbf e_\sigma) = \langle\nabla_\sigma\mathbf e_\mu,\mathbf e^\nu\rangle =: \Gamma_{\sigma\mu}^{\hphantom{\sigma\mu}\nu}, \]

where \nabla_\sigma := \nabla_{\mathbf e_\sigma} as usual. Hence with our conventions, the \mu-index specifies which basis vector field is being differentiated, \sigma specifies the direction it is being differentiated in, and \nu specifies the component of the resulting vector. (Lee 2018  Problem 4-14 uses the same convention. MTW  §14.5, Frankel 2012  §9.3b, and Tu 2017  §11.1 would write \boldsymbol\omega^\nu_{\hphantom\nu\mu} for our expression — which swaps the index order.)

We could define other connection 1-forms \boldsymbol\omega^\nu_{\hphantom\nu\mu} for the dual basis. Note the different index placement. These are:

    \[ \omega^\nu_{\hphantom\nu\mu\sigma} := \boldsymbol\omega^\nu_{\hphantom\nu\mu}(\mathbf e_\sigma) := \langle\nabla_\sigma\mathbf e^\nu,\mathbf e_\mu\rangle = -\langle\nabla_\sigma\mathbf e_\mu,\mathbf e^\nu\rangle = -\omega_{\mu\hphantom\nu\sigma}^{\hphantom\mu\nu}. \]

Hence the two sets of connection forms are related:

    \[ \boldsymbol\omega_\mu^{\hphantom\mu\nu} = -\boldsymbol\omega^\nu_{\hphantom\nu\mu}. \]

Caution: This is not the same antisymmetric relation which most texts refer to. These texts refer to a single set of connection forms for an orthonormal basis. In that case, we have additionally \boldsymbol\omega_\mu^{\hphantom\mu\nu} = -\boldsymbol\omega_\nu^{\hphantom\nu\mu}, and similarly for our alternate connection forms.

This used:

    \[ 0 = \nabla_\sigma\langle\mathbf e_\mu,\mathbf e^\nu\rangle = \langle\nabla_\sigma\mathbf e_\mu,\mathbf e^\nu\rangle + \langle\mathbf e_\mu,\nabla_\sigma\mathbf e^\nu\rangle. \]

For the first equality, \langle\mathbf e_\mu,\mathbf e^\nu\rangle = \delta_\mu^{\hphantom\mu\nu} is constant, so its gradient vanishes. The second equality follows from the defining properties of the covariant derivative, i.e. the extension of the connection to covectors and other tensors (e.g. Lee 2018  Prop. 4.15).

[Regarding index placement, and their raising and lowering, I was formerly confused by this issue in the context of vector bases, for a previous blog article. Specifically, to express an arbitrary frame in terms of a coordinate basis, some references write the components as \mathbf e_a^{\hphantom a\mu}. The Latin index is raised and lowered using the metric components in the arbitrary frame, whereas the Greek index uses the metric components in the coordinate frame. However textbooks were not clear on what was definition vs. what was derived. I eventually concluded the various indices and their placements are best treated as a definition of components, with any formulae for swapping/raising/lowering being obtained from that.]

It is interesting to express various covariant derivatives in terms of the connection forms. But firstly:

    \[ \boldsymbol\omega_\mu^{\hphantom\mu\nu} = \omega_{\mu\hphantom\nu\tau}^{\hphantom\mu\nu}\mathbf e^\tau, \qquad\qquad \boldsymbol\omega^\nu_{\hphantom\nu\mu} = \omega^\nu_{\hphantom\nu\mu\tau}\mathbf e^\tau, \]

simply from the definition of covector components. But to check anyway, contract both sides of either equality with \mathbf e_\sigma, to recover the defining formulae. It follows:

    \[ \nabla_\sigma\mathbf e_\mu = \langle\nabla_\sigma\mathbf e_\mu,\mathbf e^\tau\rangle\,\mathbf e_\tau = \omega_{\mu\hphantom\tau\sigma}^{\hphantom\mu\tau}\mathbf e_\tau. \]

To check: the first equality just expresses the components of the vector \nabla_\sigma\mathbf e_\mu. But one can verify it holds by contracting both sides with \mathbf e^\nu. Similarly for the covector gradients,

    \[ \nabla_\sigma\mathbf e^\nu = \langle\nabla_\sigma\mathbf e^\nu,\mathbf e_\tau\rangle\,\mathbf e^\tau = \omega^\nu_{\hphantom\nu\tau\sigma}\mathbf e^\tau = -\omega_{\tau\hphantom\nu\sigma}^{\hphantom\tau\nu}\mathbf e^\tau. \]

Now, \nabla = \mathbf e^\sigma\otimes\nabla_\sigma. To see this, substitute \mathbf X into the left slot (in our convention) of both sides. The RHS becomes: \langle\mathbf e^\sigma,\mathbf X\rangle\nabla_\sigma = X^\sigma\nabla_\sigma = \nabla_{\mathbf X}, by linearity of this slot. Next, apply this identity to \mathbf e_\mu:

    \[ \nabla\mathbf e_\mu = \mathbf e^\sigma\otimes\nabla_\sigma\mathbf e_\mu = \omega_{\mu\hphantom\tau\sigma}^{\hphantom\mu\tau}\mathbf e^\sigma\otimes\mathbf e_\tau = \boldsymbol\omega_\mu^{\hphantom\mu\tau}\otimes\mathbf e_\tau. \]

And:

    \[\begin{split} \nabla\mathbf e^\nu &= \mathbf e^\sigma\otimes\nabla_\sigma\mathbf e^\nu = \omega^\nu_{\hphantom\nu\tau\sigma}\mathbf e^\sigma\otimes\mathbf e^\tau = -\omega_{\tau\hphantom\nu\sigma}^{\hphantom\tau\nu}\mathbf e^\sigma\otimes\mathbf e^\tau \\ &= \boldsymbol\omega^\nu_{\hphantom\nu\tau}\otimes\mathbf e^\tau = -\boldsymbol\omega_\tau^{\hphantom\tau\nu}\otimes\mathbf e^\tau. \end{split}\]

The antisymmetric part of the covariant derivative is (one half times) the exterior derivative:

    \[\begin{split} d(\mathbf e^\nu) &= \boldsymbol\omega^\nu_{\hphantom\nu\sigma}\wedge\mathbf e^\sigma = \omega^\nu_{\hphantom\nu\sigma\tau}\mathbf e^\tau\wedge\mathbf e^\sigma = -\omega^\nu_{\hphantom\nu\sigma\tau}\mathbf e^\sigma\wedge\mathbf e^\tau \\ &= \omega_{\sigma\hphantom\nu\tau}^{\hphantom\sigma\nu}\mathbf e^\sigma\wedge\mathbf e^\tau. \end{split}\]

This is Cartan’s first structural equation! We have assumed a connection with no torsion.
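
Using the relation \boldsymbol\omega^\nu_{\hphantom\nu\sigma} = -\boldsymbol\omega_\sigma^{\hphantom\sigma\nu} between the two sets of forms, this can be rearranged (if I have matched conventions correctly) into the shape usually quoted for the torsion-free case:

    \[ d(\mathbf e^\nu) + \boldsymbol\omega_\sigma^{\hphantom\sigma\nu}\wedge\mathbf e^\sigma = 0. \]

With torsion present, a torsion 2-form would appear on the right-hand side in place of zero.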

Vector with a lowered component index?

Difficulty:   ★★★☆☆   undergraduate

Normally we recognise a vector as having a raised (component) index: u^\mu, whereas a covector has a lowered index: \omega_\nu. Similarly the dual to \mathbf u, using the metric, is the covector with components denoted u_\nu say; while the dual to \boldsymbol\omega is the vector with components \omega^\mu.

Recall what this notation means. It presupposes a vector basis \mathbf e_\mu say, where in this case the \mu labels entire vectors — the different vectors in the basis — rather than components. Hence we have the decomposition: \mathbf u = u^\mu\mathbf e_\mu. Similarly, the component notation \omega_\nu implies a basis of covectors \mathbf e^\nu, so: \boldsymbol\omega = \omega_\nu\mathbf e^\nu. These bases are taken to be dual to one another (in the sense of bases), meaning: \mathbf e^\nu(\mathbf e_\mu) = \delta^\nu_\mu. We also have \mathbf u^\flat = u_\nu\mathbf e^\nu and \boldsymbol\omega^\sharp = \omega^\mu\mathbf e_\mu, where as usual u_\nu = g_{\mu\nu}u^\mu and \omega^\mu = g^{\mu\nu}\omega_\nu. (The “sharp” and “flat” symbols are just a fancy way to denote the dual. This is called the musical isomorphism.)

However since \mathbf u = u^\mu\mathbf e_\mu, we may instead take the dual of both sides of this expression to get: \mathbf u^\flat = u^\mu(\mathbf e_\mu)^\flat. This gives a different decomposition than in the previous paragraph. Curiously, this expression contains a raised component index, even though it describes a covector. For each index value \mu, the component u^\mu is the same number as usual. But here we have paired them with different basis elements. Similarly \boldsymbol\omega^\sharp = \omega_\nu(\mathbf e^\nu)^\sharp is a different decomposition of the vector \boldsymbol\omega^\sharp. It describes a vector, despite using a lowered component index. Using the metric, the two vector bases are related by: \langle(\mathbf e^\nu)^\sharp,\mathbf e_\mu\rangle = \delta^\nu_\mu.
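
As a toy example in the Euclidean plane with the usual dot product, take the oblique basis \mathbf e_1 = (1,0), \mathbf e_2 = (1,1). The corresponding “reciprocal” vectors are:

    \[ (\mathbf e^1)^\sharp = (1,-1), \qquad (\mathbf e^2)^\sharp = (0,1), \]

since for instance (1,-1)\cdot(1,0) = 1 while (1,-1)\cdot(1,1) = 0, exactly as the relation \langle(\mathbf e^\nu)^\sharp,\mathbf e_\mu\rangle = \delta^\nu_\mu requires.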

A good portion of the content here is just reviewing notation. However this article does not seem as accessible as I envisioned. The comparison with covectors is better suited to readers who already know the standard approach. (And I feel a pressure to demonstrate I understand the standard approach before challenging it a little, lest some readers dismiss this exposition.) However for newer students, it would seem better to start afresh by defining a vector basis \mathbf e^\nu to satisfy: \langle\mathbf e^\nu,\mathbf e_\mu\rangle = \delta^\nu_\mu. (While \mathbf e^\nu is identical notation to covectors, there need not be confusion, if no covectors are present anywhere.) This relation is intuitive, as the below diagram shows.

reciprocal bases
Figure: reciprocal bases in \mathbb R^3. In the diagram we index the vectors by colour, rather than the typical 1, 2, and 3. Rather than using a co-basis of covectors as in the standard modern approach, we take both bases to be made up of vectors. Note for example, the red vector on the right is orthogonal to the blue and green vectors on the left. (I made up the directions when plotting, so don’t expect everything to be quantifiably correct.) Arguably “inverse basis” would be the best name. Finally I would prefer to start from this picture, as it is simple and intuitive, rather than start with covectors and musical isomorphisms. Maybe next time I will be brave enough.

I learned this approach (of defining a second, “reciprocal basis” of vectors) from geometric algebra references. However, it is really just a revival of the traditional view. I used to think old physics textbooks which taught this way were unsophisticated, and unaware of the modern machinery (covectors). I no longer think this way. The alternate approach does require a metric, so is less general. However all topics I have worked on personally, in relativity, do have a metric present. But even for contexts with no metric, this approach could still serve as a concrete intuition, to motivate the introduction of covectors, which are more abstract. The alternate approach also challenges the usual distinction offered between contravariant and covariant transformation of (co)vectors and higher-rank tensors. It shows this is not about vectors vs covectors at all, but more generally about the basis chosen. I write about these topics at significant length in my MPhil thesis (2024, forthcoming, §2.3), and intend to write more later.

Euler and spinors

Difficulty:   ★★★☆☆   undergraduate

The discovery of spinors is most often credited to quantum physicists in the 1920s, and to Élie Cartan in the prior decade for an abstract mathematical approach. But it turns out the legendary mathematician Leonhard Euler discovered certain algebraic properties for the usual (2-component, Pauli) spinors, back in the 1700s! He gave a parametrisation for rotations in 3D, using essentially what were later known as Cayley-Klein parameters. There was not even the insight that each set of parameter values forms an interesting object in its own right. But we can recognise one key conceptual aspect of spinors in this accidental discovery: the association with rotations.

In 1771, Euler published a paper on orthogonal transformations, whose title translates to: “An algebraic problem that is notable for some quite extraordinary relations”. Euler scholars index it as “E407”, and the Latin original  is available from the Euler Archive website. I found an English translation  online, which also transcribes the original language.

Euler commences with the aim to find “nine numbers… arranged… in a square” which satisfy certain conditions. In modern notation this is a matrix M say, satisfying M^TM = 1 = MM^T, which describes an orthogonal matrix. While admittedly the paper is mostly abstract algebra, he is also motivated by geometry. In §3 he mentions that the equation for a surface is “transformed” under a change of [Cartesian] coordinates, including the case where the coordinate origins coincide. We recognise this (today, at least) as a rotation, possibly combined with a reflection. Euler also mentions “angles” (§4 and later), which is clearly geometric language.

He goes on to analyse orthogonal transformations in various dimensions. [I was impressed with the description of rotations about n(n – 1)/2 planes, in n dimensions, because I only first learned this in the technical context of higher-dimensional rotating black holes. It is only in 3D that rotations are specified by axis vectors.] Then near the end of the paper, Euler seeks orthogonal matrices containing only rational entries, a “Diophantine” problem. Recall rotation matrices typically contain many trigonometric terms like sin(θ) and cos(θ), which are irrational numbers for most values of the parameter θ. But using some free parameters “p, q, r, s”, Euler presents:

    \[\begin{tabular}{|c|c|c|}\hline $\frac{p^2+q^2-r^2-s^2}{u}$ & $\frac{2(qr+ps)}{u}$ & $\frac{2(qs-pr)}{u}$ \\ \hline $\frac{2(qr-ps)}{u}$ & $\frac{p^2-q^2+r^2-s^2}{u}$ & $\frac{2(pq+rs)}{u}$ \\ \hline $\frac{2(qs+pr)}{u}$ & $\frac{2(rs-pq)}{u}$ & $\frac{p^2-q^2-r^2+s^2}{u}$ \\ \hline\end{tabular},\]

where u := p^2 + q^2 + r^2 + s^2. (I have copied the style which Euler uses in some subsequent examples.) By choosing rational values of the parameters, the matrix entries will also be rational, however this is not our concern here. The matrix has determinant +1, so we know it represents a rotation. It turns out the parameters form the components of a spinor!! (p,q,r,s)/\sqrt u are the real components of a normalised spinor. We allow all real values, but will ignore some trivial cases. One aspect of spinors is clear from inspection: in the matrix the parameters occur only in pairs, hence the sets of values (p,q,r,s)/\sqrt u and -(p,q,r,s)/\sqrt u give rise to the same rotation matrix. (Those familiar with spinors will recall the spin group is the “double cover” of the rotation group.)
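
As a quick Mathematica check that the parametrised matrix really is orthogonal with determinant +1, for arbitrary real parameters:

u = p^2 + q^2 + r^2 + s^2;
M = {{p^2 + q^2 - r^2 - s^2, 2 (q r + p s), 2 (q s - p r)},
     {2 (q r - p s), p^2 - q^2 + r^2 - s^2, 2 (p q + r s)},
     {2 (q s + p r), 2 (r s - p q), p^2 - q^2 - r^2 + s^2}}/u;
{Simplify[Transpose[M].M] == IdentityMatrix[3], Simplify[Det[M]] == 1}
(* {True, True} *)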

The standard approach is to combine the parameters into two complex numbers. But in the geometric algebra (or Clifford algebra) interpretation, a spinor is a rotation of sorts, or we might say a “half-rotation”. It is about the following plane:

    \[q\hat{\mathbf y}\wedge\hat{\mathbf z} + r\hat{\mathbf z}\wedge\hat{\mathbf x} + s\hat{\mathbf x}\wedge\hat{\mathbf y}.\]

(For those who haven’t seen the wedge product nor bivectors, you can visualise \hat{\mathbf x}\wedge\hat{\mathbf y} for example, as the parallelogram or plane spanned by those vectors. It also has a magnitude and handedness/orientation.) The sum is itself a plane, because we are in 3D. Dividing by \sqrt{q^2 + r^2 + s^2} gives a unit bivector. For the spinor, the angle of rotation θ/2 say, is given by (c.f. Doran & Lasenby 2003  §2.7.1):

    \[\cos(\theta/2) = p/\sqrt{u},\qquad \sin(\theta/2) = \sqrt{(q^2+r^2+s^2)/u}.\]

This determines θ/2 to within a range of 2π (if we include also the orientation of the plane). In contrast, the matrix given earlier effects a rotation by θ — twice the angle — about the same plane. This is because geometric algebra formulates rotations using two copies of the spinor. The matrix loses information about the sign of the spinor, and hence also any distinction between one or two full revolutions.
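
One quick way to see the angle doubling directly from the matrix: the trace of a 3D rotation matrix through angle \theta equals 1 + 2\cos\theta, and summing the diagonal entries given earlier:

    \[ \frac{p^2+q^2-r^2-s^2}{u} + \frac{p^2-q^2+r^2-s^2}{u} + \frac{p^2-q^2-r^2+s^2}{u} = \frac{4p^2}{u}-1 = 4\cos^2(\theta/2)-1 = 1+2\cos\theta. \]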

Euler extends the challenge of finding orthogonal matrices with rational entries to 4D. In §34 he parametrises matrices using “eight numbers at will a, b, c, d, p, q, r, s”. However the determinant of this matrix is -1, so it is not a rotation, and the parameters cannot form a spinor. Two of its eigenvalues are -1 and +1. Now the eigenvectors corresponding to distinct eigenvalues are orthogonal (a property most familiar for symmetric matrices, but it holds for orthogonal matrices also). It follows the matrix causes reflection along one axis, fixes an orthogonal axis, and rotates about the remaining plane. So it does not “include every possible solution” (§36). But I guess the parameters might form a subgroup of the Pin(4) group, the double cover of the 4-dimensional orthogonal group O(4).

Euler provides another 4×4 orthogonal matrix satisfying additional properties, in §36. This one has determinant +1, hence represents a rotation. It would appear no eigenvalues are +1 in general, so it may represent an arbitrary rotation. I guess the parameters (a,b,c,d,p,q,r,s)/\sqrt u, where I label by u the quantity (a^2+b^2+c^2+d^2)(p^2+q^2+r^2+s^2) mentioned by Euler, might form spinors of 4D space (not 3+1-dimensional spacetime). If so, these are members of Spin(4), the double cover of the 4D rotation group SO(4).

Euler was certainly unaware of his implicit discovery of spinors. His motive was to represent rotations using rational numbers, asserting these are “most suitable for use” (§11). Probably more significant today is that rotations are described by rational functions of spinor components. But the fact spinors would be rediscovered repeatedly in different applications suggests there is something very natural or Platonic about them. Euler says his 4D “solution deserves the more attention”, and that with a general procedure for this and higher dimensions, “Algebra… would be seen to grow very much.” (§36) He could not have anticipated how deserving of attention spinors are, nor their importance in algebra and elsewhere!

Spin-1/2 and a rotating mirror

Difficulty:   ★★☆☆☆   high school

Imagine a light ray reflecting off a mirror. If the mirror is rotating, the direction of the reflected beam will also rotate, but at twice the rate of the mirror! This follows from the way the angles work, if you recall for example “the angle of incidence equals the angle of reflection” and think about it carefully… Or, just play around with an animation until it looks right 😉 . In quantum physics this angle-doubling property turns up in the description of electrons for example, where it seems very mysterious and exotic (keywords: “spin-1/2” and “spinors”). So its appearance in an “ordinary” and intuitive setting is reassuring.

angles for the mirror and ray
Figure 1: A two-sided mirror and light ray (green arrows). The light arrives from the left, bounces off the mirror in the centre, and exits towards the upper-right. In this example the light beam has angle b = 165°, and the mirror 15°.

For simplicity let’s use a two-dimensional plane, as shown in Figure 1. We measure angles from the positive direction of the x-axis, as usual in polar coordinates. I choose to measure from the centre outwards, so for the incoming ray the angle assigned is the opposite of what the arrow might suggest. Label the incoming ray angle b, and mirror rotation angle m. Now if you increase m by some given amount, the outgoing ray angle increases by twice as much. But if you increase b instead, the outgoing ray angle decreases by the same amount. We also need an “initial condition” of sorts:  when the mirror is horizontal (m = 0°), and the ray arrives from directly above (b = 90°), the reflected beam is also at 90°. It follows:

reflected angle  =  2m – b + 180°.
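
For the values in Figure 1, this gives 2(15°) – 165° + 180° = 45°: an outgoing ray towards the upper-right, matching the picture.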

Now if the mirror rotates by 180°, the reflected ray completes a full 360° rotation, so is back to its original position. (We suppose the mirror is 2-sided.) If you hadn’t watched the rotation, you wouldn’t know anything had changed. But now suppose we make one side of the mirror red and the other blue, so the reflected ray takes on the colour of the closest side. Now the ray must make two complete revolutions, 720°, to get back to its original state! After one revolution it is back to the same position, but has a different colour, as the opposite side of the mirror faces the beam. Similarly, if the reflected ray is rotated by 180° in one direction, this is not the same as rotating by 180° in the opposite direction, as the colour is different. “Spinors” have these same features, except in place of red/blue their mathematical description picks up a factor of ±1.

two mirrors rotated by differing amounts
Figure 2: A two-sided coloured mirror. The incoming ray is depicted as grey, but think of it as white so its reflection is free to take on the appropriate colour. In the left image, the blue side of the mirror is facing upwards. Now slowly rotate the mirror by a half-revolution. (Actually I have drawn a slightly different angle just for variety.) The reflected ray changes angle by a full revolution, but has since turned red!

You might try animating this yourself. If you draw the rays with unit length, then the arrow for the incoming beam points from (cos b, sin b) to (0,0). The outgoing arrow points from (0,0) to -(cos(2m – b), sin(2m – b)), where a minus sign replaces the 180° term from earlier. The colour depends on whether the incoming ray is from 0 to 180° ahead of the mirror, or 0 to 180° behind. This is determined by the sign of sin(m – b). It is convenient to allow angle parameters beyond 360°, which makes no physical difference, or at most only a change in colour, as we have learned 😀 . Below is Mathematica code I wrote, which uses slider controls for the angle parameters. The result is fun to play around with, and it helps make the angle-doubling more intuitive.

width = 0.3;
height = 0.02;
mirror = Polygon[
   {{-width, -height}, {width, -height}, {width, height}, {-width, height}},
   VertexColors -> {Red, Red, Blue, Blue}];
Manipulate[
 Graphics[{
   Rotate[mirror, m],
   {Gray, Arrow[{{Cos[b], Sin[b]}, {0, 0}}]},
   {If[Sin[b - m] > 0, Blue, Red],
    Arrow[{{0, 0}, -{Cos[2 m - b], Sin[2 m - b]}}]}},
  PlotRange -> {{-1, 1}, {-1, 1}}],
 {{m, 0}, -2 \[Pi], 2 \[Pi]},
 {{b, 3 \[Pi]/4}, 0, 2 \[Pi]}]

More on “covariant” versus “contravariant”

Difficulty:   ★★★☆☆   undergraduate

I have written on the topic of “covariant” and “contravariant” vectors (and higher-rank tensors) previously, and have been intending to write an update for a number of years. It must be noted some authors recommend avoiding these terms completely, including Schutz 2009  §3.3:

Most of these names are old-fashioned; ‘vectors’ and ‘dual vectors’ or ‘one-forms’ are the modern names.

Let’s return to Schutz’ reason soon. I have followed this naming practice myself, except I prefer to say “covector” or “1-form”, rather than “dual vector” which can be clumsy. (What would you call the vector which is the (metric) dual to a given 1-form: the “dual of a dual vector”, or a “dual-dual-vector”!?) People also talk about “up[stairs] indices” and “down[stairs] indices”, which seems alright.

But if you want to be cheeky, you might say a vector is covariant, while a 1-form is contravariant — the exact opposite of usual terminology! I remember a maths graduate student at some online school stating this. Similar sentiments are expressed by Spivak 1999 vol. 1  §4:

Nowadays such situations are always distinguished by calling the things which go in the same direction “covariant” and the things which go in the opposite direction “contravariant”. Classical terminology used these same words, and it just happens to have reversed this: a vector field is called a contravariant vector field, while a section of T*M is called a covariant vector field. And no one has had the gall or authority to reverse terminology so sanctified by years of usage. So it’s very easy to remember which kind of vector field is covariant, and which is contravariant — it’s just the opposite of what it logically ought to be.

While I love material which challenges my conceptual understanding, and Spivak’s humorous prose is fun, trying to be too “clever” with terms can hamper clear communication. Back to Schutz, who clarifies:

The reason that ‘co’ and ‘contra’ have been abandoned is that they mix up two very different things: the transformation of a basis is the expression of new vectors in terms of old ones; the transformation of components is the expression of the same object in terms of the new basis.

If you take the components of a fixed vector in a given basis, they transform contravariantly when the basis changes. But if you consider the vector as a whole — a single geometric object — and ask how a basis vector (specifically) is mapped to a new basis vector, the change is “covariant”. (Recall, as Schutz explains: “The property of transforming with basis vectors gives rise to the co in ‘covariant vector’ and its shorter form ‘covector’.”) In general, if you fix a set of components, by which I mean fixing an ordered set of numbers like (0,1,0,½) say, and then change the basis vectors these numbers refer to, then the change of a vector (as a whole entity) is “covariant”, so-called. For 1-forms, the reverse of these statements applies. Some diagrams would make this paragraph clearer, but I leave this as an exercise, sorry.
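
In symbols, as a sketch of the usual transformation rules: write \Lambda^{\mu}_{\hphantom\mu\mu'} for the matrix expressing the new (primed) basis in terms of the old, and \Lambda^{\mu'}_{\hphantom{\mu'}\mu} for its inverse. Then:

    \[ \mathbf e_{\mu'} = \Lambda^{\mu}_{\hphantom\mu\mu'}\mathbf e_{\mu}, \qquad u^{\mu'} = \Lambda^{\mu'}_{\hphantom{\mu'}\mu}u^{\mu}, \qquad \omega_{\mu'} = \Lambda^{\mu}_{\hphantom\mu\mu'}\omega_{\mu}, \]

so the basis vectors and the 1-form components transform with the same matrix (“covariantly”), vector components use the inverse matrix (“contravariantly”), and the assembled object u^{\mu'}\mathbf e_{\mu'} = u^\mu\mathbf e_\mu is unchanged.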

However, it seems to me the most accurate description is that vectors don’t change at all, when you change a basis! Picture a vector as an arrow in space, then the arrow does not move. In this sense vectors are neither contravariant nor covariant, but invariant! (We could also say generally covariant, since they are geometric entities independent of any coordinate system. This is the usual modern meaning of the word “covariant”, but it’s a bit different to the covariant–contravariant distinction, so for clarity I avoid this language here.) In conclusion, one of the clearest descriptions is to simply say: vector, or 1-form / covector. Or, given historical usage, it is especially clear to say a vector’s “components transform contravariantly”. See the Table.

Table: transformation under basis change
object | clarification | vector | 1-form
components | same (co)vector, but components in a new basis | contravariant | covariant
(co)vector | same (co)vector, treated as a whole | invariant | invariant
basis (co)vector | transform to different (co)vector | covariant | contravariant

Addendum: I recently learned a dual basis need not be made up of 1-forms, as in the usual formulation in differential geometry, but of vectors instead! Recall the defining relation between a coordinate basis and cobasis: dx^\mu(\partial_\nu) = \delta^\mu_\nu, or \mathbf e^\mu(\mathbf e_\nu) = \delta^\mu_\nu for an arbitrary frame. In particular, each cobasis element is orthogonal to the “other” 3 vectors. But we can take the duals (dx^\mu)^\sharp, which are vectors but obey the same orthogonality relations, via the metric scalar product: \langle(dx^\mu)^\sharp,\partial_\nu\rangle = \delta^\mu_\nu. (By relating to the standard approach I may have made things look complicated, but this should be visualised as simply finding new vectors orthogonal to existing vectors.) (On a separate note, “dual” here means as an individual vector, not dual as a basis.) It seems this vector approach to a dual basis was the original one. In 1820 an Italian mathematician Giorgini distinguished between projezione oblique (parallel projections) and projezioni ortogonali (orthogonal projections) of line segments, now termed contravariant and covariant. According to one historian (Caparrini 2003 ), this was “one of the first clear-cut distinctions between the two types of projections in analytic geometry.” However priority goes to Hachette 1809 . Today in geometric algebra, also known as Clifford algebra, a dual basis is also defined as vectors not 1-forms, via \langle\mathbf e^\mu,\mathbf e_\nu\rangle = \delta^\mu_\nu (Doran & Lasenby 2003  §4.3).

Squid Game conditional probabilities

Difficulty:   ★★★☆☆   undergraduate

Earlier we analysed the probabilities for the bridge-crossing scenario in the Squid Game episode “VIPs”, which has “deadly high stakes” according to the Netflix blurb for the series. 🙂 So far, we made the assumption of no foreknowledge. This means our results for the players’ progress describe their chances as they stand before the game begins. Equivalently, if the game has started, our results assume the analyst knows nothing about prior contestants, and cannot view the state of the bridge.

But now, suppose we are told only that a specific player numbered i died on step number n. (That is, they stood safely on column n – 1, but chose wrongly amongst the next pair of glass panels on column n, breaking a pane and plummeting downwards.) Then the next player is definitely safe on step n, but has no information about later steps, so the game is essentially reset from that point. Hence the “conditional probability” that player I > i is still alive on step N > n is simply:

    \[P(I,N|i\textrm{ died on }n) = a_{I-i,N-n}.\]

Recall a_{i',n'} \equiv P(i',n') is the chance player i′ is alive on step n′ (given no information nor conditions). We labelled as b_{i',n'} = \binom{n'-1}{i'-1}2^{-n'} the chance they died on step n′ specifically, so analogously:

    \[P(I\textrm{ dies on }N|i\textrm{ died on }n) = b_{I-i,N-n}.\]

Now, suppose we are told only that a specific player I will die on step N. What is the probability for an earlier player’s progress? Bayes’ theorem says that given two events A and B, the conditional probabilities are related by P(A|B) = P(B|A)P(A)/P(B), which in our case is:

    \[\begin{aligned}c_{i,n} &:= P(i\textrm{ died on }n|I\textrm{ dies on }N) \\ &= \frac{b_{I-i,N-n}b_{i,n}}{b_{I,N}} \\ &= \frac{\binom{N-n-1}{I-i-1}\binom{n-1}{i-1}}{\binom{N-1}{I-1}}.\end{aligned}\]

The powers of 2 cancelled. The Table below shows some example numbers.

Table: Probability player i died on step n, given player I = 5 will die on step N = 8

player \ step | n = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
i = 1 | 4/7 | 2/7 | 4/35 | 1/35 | 0 | 0 | 0 | 0
2 | 0 | 2/7 | 12/35 | 9/35 | 4/35 | 0 | 0 | 0
3 | 0 | 0 | 4/35 | 9/35 | 12/35 | 2/7 | 0 | 0
4 | 0 | 0 | 0 | 1/35 | 4/35 | 2/7 | 4/7 | 0
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
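
For instance, the entry for player i = 2 perishing on step n = 3 works out as:

    \[ c_{2,3} = \frac{\binom{8-3-1}{5-2-1}\binom{3-1}{2-1}}{\binom{8-1}{5-1}} = \frac{\binom{4}{2}\binom{2}{1}}{\binom{7}{4}} = \frac{6\cdot 2}{35} = \frac{12}{35}. \]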

In general, on any given row (fixed player i) the entries are nonzero only for n between i and N – I + i inclusive. This forms a diamond shape. For the row sum \sum_{n=i}^{N-I+i}c_{i,n} computer algebra returns a hypergeometric function times two binomial coefficients, which appears to simplify to 1 (for integer parameters) as expected, since player i must die somewhere. On any given column \sum_i c_{i,n} = (I-1)/(N-1) which is independent of n, meaning each step has equal chance that some player will die there. In particular the first entry itself takes this value: c_{1,1} = (I-1)/(N-1).

We examine other properties and special cases. By construction the last row and column are zeroes apart from c_{I,N} := 1; our general formula does not apply for n = N. If we are told where the second player I = 2 died, then player i = 1 has an equal chance 1/(N – 1) of dying on any earlier step. Also from the definition it is clear:

    \[c_{i,n} \equiv c_{I-i,N-n},\]

so the table is symmetric about its central point. The ratio of adjacent entries follows from the binomial coefficients:

    \[\begin{aligned}\frac{c_{i-1,n}}{c_{i,n}} &= \frac{(i-1)(N-I-n+i)}{(I-i)(n-i+1)}, \\ \frac{c_{i,n-1}}{c_{i,n}} &= \frac{(N-n)(n-i)}{(n-1)(N-I-n+i+1)}.\end{aligned}\]

It follows that at step n, player i = (I – 1)n/N and the subsequent player have the same “fail” chance. Presumably the maximum lies within this range. Physically we require the indices i and n to be integers. For the chosen Table parameters above, the relation just given is simply i = n/2, so every second column contains an adjacent pair of equal values. For the steps (columns) on the other hand, on n = (i – 1)(N – 1) / (I – 2) and the following step the “elimination” chance is equal. Note these special index values are linear functions of the other index (i or n respectively), where we regard I and N as fixed.
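
(For example, in the Table above, at step n = 4 players 2 and 3 share the value 9/35, while at n = 6 players 3 and 4 share 2/7.)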

By rearranging terms we can write equivalent expressions for the chance to be eliminated, such as:

    \[c_{i,n} \equiv \frac{i\binom{N-I}{n-i}\binom{I-1}{i}}{n\binom{N-1}{n}}.\]

conditional probability in Squid Game
The probability Squid Game bridge contestants will expire on a given step, given the condition: player I = 9 will die on step N = 25. It forms a sort of boat- or saddle-shape. The blue dots are for integer index values, which are physical. In previous results the “ridge line” of high probability was at roughly n = 2i, but for the conditional probability it is spread out between the endpoint events, so in this case is roughly n = 3i.

For suitably large parameters, the probability resembles a gaussian curve. We can apply the de Moivre-Laplace approximation (with parameter p := ½ say) to the binomial coefficients. This gives a gaussian for a fixed step number n, as a function of the player number. I omit the height, but its centre and width are determined from the exponent which is:

    \[-\frac{\Big(i-\frac{N+nI-I-2n}{N-2}\Big)^2}{(n-1)(N-n-1)/2(N-2)}.\]

The spread is maximum at n = N/2, in this approximation. Now to obtain a gaussian approximation for a fixed player i, apply the results of the previous blog post using the substitutions x \rightarrow n-1, a \rightarrow i-1, b \rightarrow I-i-1, and X \rightarrow N-2. The centre is n_0 := x_0 + 1 = (i-1)(N-1)/(I-2)+1/2. One option for the height of the gaussians — when looking for a simple expression — is to use the sums 1 and (I – 1)/(N – 1) determined before. Recall for a normalised gaussian, the height 1/(\sqrt{2\pi}\sigma) is inversely proportional to the standard deviation.

There are other conditional probability questions one could pose. Suppose we are given a window, bounded by the events that player J died on column L, and a later player K dies on column M. Inside this window, the probabilities reduce to our above analysis: the chance i dies on n is just c_{i-J,n-L}, where we also substitute I \rightarrow K-J and N \rightarrow M-L. As another possible scenario to analyse, we might be informed that player I is alive on step N. Then we would not know how far they progressed, just that it was at least that far. Or, we might be told player I died on or before step N.

A concluding thought: Bayes’ theorem is deceptively simple-looking. I tried harder ways beforehand, trying to puzzle through the subtlety of conditional probability on my own. But with Bayes, the main result followed easily from our previous work.


Gaussian approximation to a certain product of binomial coefficients

Difficulty:   ★★★☆☆   undergraduate

Consider the following function, which is the product of a certain pair of binomial coefficients:

    \[f(x) := \binom{x}{a}\binom{X-x}{b}.\]

We take a, b, X >> 1 to be constants, and x to have domain [a – 1, X – b + 1], which implies X ≥ a + b – 2 at least. As usual \binom{x}{a} := x!/a!(x-a)!, and this is extended beyond integer values by replacing each factorial with a Gamma function. Note the independent variable x appears in the upper entries of the binomial coefficients. Curiously, from inspection f is well-approximated by a gaussian curve. To gain some insight, for integer values of the parameters f is the polynomial:

    \[(a! b!)^{-1}x(x-1)\cdots(x-a+1)\cdot(X-b+1-x)\cdots(X-x).\]

This has many zeroes, and sometimes oscillates wildly in between them, hence the domain of x specified earlier.

plot of function and a gaussian curve approximation
Figure: The function for a = 7, b = 10, and X = 20. It is shown beyond our stated domain, which is bounded by the roots at x = 6 and 11. The gaussian uses our estimated centre of x = 277/34 or approx. 8.147, whereas f’s actual maximum occurs at around x = 8.139. The variance is from our approximate harmonic number formula, evaluated at the estimated centre point. Alternatively, the “finite difference” derivatives give a poor estimate in this case. In general, the gaussian fit looks best for high parameter values with a near b, etc.

Now the usual approximations to a single binomial coefficient (actually, binomial distribution) are not helpful here. For example the de Moivre–Laplace approximation is a gaussian in terms of the lower entry in the binomial coefficient, whereas our x is in the upper entries. More promising is the approximation as a Poisson distribution, which leads to a polynomial which is itself gaussian-like, and motivated the previous post incidentally. However we proceed from first principles, by estimating the centre point and the second derivative there.

At the (central) maximum of f, the slope is zero. In general the derivative is f'(x) = f(x)(H_x-H_{x-a}-H_{X-x}+H_{X-b-x}), where the H’s are called harmonic numbers. There may not exist any simple explicit expression for the turning points. Instead, the ratio of nearby points is comparatively simple:

    \[\frac{f(x-1/2)}{f(x+1/2)} = \frac{(x-a+1/2)(X+1/2-x)}{(x+1/2)(X-b+1/2-x)},\]

using the properties of the binomial coefficient. The derivative is approximately zero where this ratio is unity, which occurs at:

    \[x_0 := \frac{2aX+a-b}{2(a+b)}.\]

This should be a close estimate for the central turning point. [To do better, substitute specific numbers for the parameters, and solve numerically.] It is typically not an integer. Our sought-for gaussian has form C\operatorname{exp}(-(x-x_0)^2/2\sigma^2). We set the height C := f(x_0). Only the width remains to be determined. The gaussian’s second derivative evaluated at its centre point is -C/\sigma^2. On the other hand:

    \[f''(x) = f'(x)^2/f(x) - (H_x^{(2)}-H_{x-a}^{(2)}+H_{X-x}^{(2)}-H_{X-b-x}^{(2)})f(x),\]

which uses the so-called harmonic numbers of order 2, and I incorporate the function and its derivative (both given earlier) for brevity of the expression. Matching the results at x_0 yields the variance parameter \sigma^2:

    \[\sigma^{-2} := H_{x_0}^{(2)}-H_{x_0-a}^{(2)}+H_{X-x_0}^{(2)}-H_{X-b-x_0}^{(2)},\]

using f'(x_0) \approx 0. (At large values the series H_x^{(2)} \approx \pi^2/6 -1/x +1/2x^2\cdots may give insight into the above.) But alternatively, we can approximate the second derivative using elementary operations. By sampling the function at x_0-1, x_0, and x_0+1 say, a “finite differences” approach gives approximate derivatives. We can use the simple ratio formula obtained earlier to reduce the sampling to one or two points only, which might gain some insight along the way (though I currently wonder if this is a dead end…).

Now f'(x_0-1/2) \approx f(x_0) - f(x_0-1), which becomes:

    \[\frac{2C(a+b)^3}{(2aX+a-b)(2bX-2b^2-2ab+a+3b)},\]

after using the ratio formula to obtain f(x_0-1) in terms of C. Similarly it turns out f'(x_0+1/2) is the negative of the above expression, but with a and b interchanged. Then a second derivative is: f''(x_0) \approx f'(x_0+1/2)-f'(x_0-1/2), but the combined expression does not simplify further so I won’t write it out. The last step is to set \tilde\sigma^2 := -C/f''(x_0), which is different to the earlier choice.

A slightly different approach uses f'(x_0-1/2) \approx (f(x_0+1/2)-f(x_0-3/2))/2, which may be expressed in terms of another sampled point E := f(x_0-1/2) = f(x_0+1/2). Similarly f'(x_0+1/2) \approx (f(x_0+3/2)-f(x_0-1/2))/2. The estimate for the second derivative follows, then later:

    \[\hat\sigma^2 := \frac{-2C(aX-b)(bX+a+2b)(bX-b^2-ab+a+2b)(aX-a^2-ab-b)}{E(a+b)^6}.\]

The expression is a little simpler in this approach, but at the cost of a second sample point. The use of f'(x_0-1) \approx f(x_0-1/2)-f(x_0-3/2) and f'(x_0+1) \approx f(x_0+3/2)-f(x_0+1/2) instead leads to the same result.
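
As a quick numerical sketch of this recipe (the estimated centre plus the harmonic-number variance) in Mathematica, using the same parameters as the Figure:

a = 7; b = 10; X = 20;    (* the Figure's parameters *)
f[x_] := Binomial[x, a] Binomial[X - x, b];
x0 = (2 a X + a - b)/(2 (a + b));    (* estimated centre, 277/34 *)
var = 1/(HarmonicNumber[x0, 2] - HarmonicNumber[x0 - a, 2] + HarmonicNumber[X - x0, 2] - HarmonicNumber[X - b - x0, 2]);
gauss[x_] := f[x0] Exp[-(x - x0)^2/(2 var)];
{N[x0], x /. Last[NMaximize[{f[x], 6 < x < 11}, x]]}    (* approx. 8.147 and 8.139 *)
Plot[{f[x], gauss[x]}, {x, 6, 11}]    (* compare the two curves *)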

Gaussian approximation to a certain polynomial

Difficulty:   ★★★☆☆   undergraduate

Consider the function:

    \[x^A(X-x)^B,\]

where the independent variable x ranges between 0 and X, and the exponents are large: A, B \gg 1. [We could call it a “polynomial”, though the exponents need not be integers. Specifically it is the product of “monomials” in x and Xx, so might possibly be called a “sparse” polynomial in this sense.] Surprisingly, it closely resembles a gaussian curve, over our specified domain x \in [0,X].

approximation to a certain polynomial using a gaussian curve
Figure: The polynomial with parameters A = 10, B = 13, and X = 11. Our gaussian approximation is visually indistinguishable near the centre. Outside our specified domain the polynomial tends to \pm\infty, and for each non-integer exponent the tail on one side becomes imaginary.

The turning point is where the derivative equals zero. This occurs when x is the surprisingly simple expression:

    \[\tilde x := \frac{X}{1+B/A},\]

at which the function has value:

    \[A^A B^B \Big(\frac{X}{A+B}\Big)^{A+B} \equiv (B/A)^B \tilde x^{A+B}.\]

An arbitrary gaussian, not necessarily normalised, has form: Ce^{-(x-D)^2/2\sigma^2}. This has centre D which we equate with \tilde x, and maximum height C which we set to the above expression. We can fix the final parameter, the standard deviation, by matching the second derivatives at the turning point. Hence the variance is:

    \[\sigma^2 = \frac{AB}{(A+B)^3}X^2 \equiv \frac{B}{A^2X}\tilde x^3.\]

Hence our gaussian approximation may be expressed:

    \[\boxed{(B/A)^B \tilde x^{A+B} \operatorname{exp}\Big( -\frac{(x-\tilde x)^2}{2B\tilde x^3/A^2X} \Big).}\]
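
As a quick Mathematica comparison, using the Figure’s parameters:

A0 = 10; B0 = 13; X0 = 11;    (* the Figure's parameters *)
f[x_] := x^A0 (X0 - x)^B0;
center = X0/(1 + B0/A0);
height = (B0/A0)^B0 center^(A0 + B0);
var = B0 center^3/(A0^2 X0);
gauss[x_] := height Exp[-(x - center)^2/(2 var)];
{f[center] == height, f[4.]/gauss[4.], f[6.]/gauss[6.]}
(* the first entry is True (peak values agree exactly), and the ratios near the centre are close to 1 *)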

The integral of the original curve turns out to be:

    \[\int_0^X x^A(X-x)^Bdx = \frac{X^{A+B+1}}{(A+B+1)\binom{A+B}{A}}.\]

This uses the binomial coefficient \binom{A+B}{A} := (A+B)!/A!B!, which is extended to non-integer values by replacing the factorials with Gamma functions. We could then apply Stirling’s approximation A! \approx \sqrt{2\pi A}(A/e)^A to each factorial, to obtain:

    \[\int\cdots \approx \frac{\sqrt{2\pi(A+B)}}{A+B+1}(B/A)^{B+1/2}\tilde x^{A+B+1},\]

though this is more messy to write out. On the other hand, the integral of the gaussian approximation is:

    \[\int_{-\infty}^\infty \operatorname{exp}\cdots = \sqrt\frac{2\pi}{A+B}(B/A)^{B+1/2}\tilde x^{A+B+1}.\]

We evaluated this integral over all real numbers, because the expression is simpler and still approximately the same. The ratio of the above two expressions is (A+B)/(A+B+1) \approx 1.

Squid Game asymptotic probability

Difficulty:   ★★★☆☆   undergraduate

We continue with the bridge-crossing scenario from Squid Game called “Glass Stepping Stones”. Here I analyse the probabilities for late contestants on a very long bridge, and the expectation value for a player’s progress. Last time I found an exact expression for the probability P(i,n) that player number i is still alive on the nth step. Now seems a good place to mention there are equivalent expressions, such as:

    \[P(i,n) = 1 - \binom{n}{i}\cdot{_2F_1}(i,n+1,i+1;-1),\]

where {_2F_1} is called the hypergeometric function, and the other term is the binomial coefficient which is read as “n choose i”. The factor 2^{-n} seen previously has been absorbed. We listed several special cases of the probabilities last time. Another is:

    \[P(i,2i-1) = \frac{1}{2}.\]

So remarkably, the chance player i will be alive on step 2i – 1 is precisely 50%! For fixed large i, if we plot the probability distribution as a function of n it looks smooth, remaining near 1 for early steps before rolling down to near 0. Qualitatively this looks like a \tan^{-1}, tanh, or erf (“error function”). We reflect these curves, centre them on the value 1/2 at n = 2i – 1, and scale them linearly: so they have the appropriate bounds and match the slope at the centre point. See the Figure below.

probability for player 400
Figure: The probability player number i = 400 is still alive at step n. The scaled tanh function is close to the exact curve, while the scaled erf function nearly overlays it.

In fact the slope used in the Figure is only an approximation as described next, but this is a deliberate choice to show it still gives a good fit. The exact slope \partial P/\partial n evaluated at n = 2i – 1 seems a little too complicated to be useful. It contains a derivative of the hypergeometric function, which appears to approach -1/2 in the limit of large i, hence the slope at the centre point is asymptotic to -1/\sqrt{4\pi i}. Another approach is to consider the subsequent bridge step, for which:

    \[P(i,2i) = \frac{1}{2} - \frac{\Gamma(i+1/2)}{2\sqrt\pi\,\Gamma(i+1)},\]

which uses the Gamma function. The difference P(i,2i) – P(i,2i-1) approximates the slope, and is also asymptotic to -1/\sqrt{4\pi i} as i \rightarrow \infty. Hence our approximation for late players is:

    \[P(i,n) \approx \frac{1}{2}\Big( 1-\operatorname{erf}\frac{n+1-2i}{2\sqrt i} \Big).\]

Now the error function by definition is the integral of a gaussian curve. Minus the derivative with respect to n of our approximation is precisely the right-hand side below, which itself approximates the chance a late player dies on that step:

    \[P(i\textrm{ dies on }n) \approx \frac{1}{\sqrt{4\pi i}}e^{-(n-2i+1)^2/4i}.\]

For fixed i this is a gaussian distribution with centre n = 2i – 1 and standard deviation \sqrt{2i}. It is normalised in the sense its integral over n \in \mathbb R is exactly 1, but physically we want the discrete sum over n \in \mathbb N^+. For the 10th player this is approx. 0.9991, which is already close. The exact chance for dying on a given step was determined in the previous article to be b_{i,n} = \binom{n-1}{i-1} \cdot 2^{-n}. The Figure below shows some early values. As before, we can extend the function beyond integer parameters.

probability of death
Figure: The probability player i will die on step n, given no foreknowledge (that is, before the game begins). The blue dots correspond to integer values of the parameters, which are physical. Contestants face a near-gaussian “hill of death” so to speak, which peaks at n = 2i – 2 and 2i – 1. I have included the “spires” at the back for sake of interest, as a peek into the rich structure for negative n, though this is unphysical.

The ratio b_{i,n}/b_{i,n+1} = 2(n-i+1)/n precisely. Hence for a given player, the adjacent steps n = 2i – 2 and 2i – 1 are equally likely locations their game will be “discontinued”. This is surely the maximum assuming integer parameters, apart from the first player for whom step 0 is safe but step 1 is their most likely “resting place”. Hence the reader might prefer to translate our gaussian approximation by half a step or so; apparently there are various approximations to a binomial coefficient. The subsequent step n = 2i is a more likely endpoint than the earlier step 2i – 3.
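
A quick Mathematica check of these statements, for player i = 10:

b[i_, n_] := Binomial[n - 1, i - 1] 2^-n;    (* exact death chance *)
gauss[i_, n_] := Exp[-(n - 2 i + 1)^2/(4 i)]/Sqrt[4 Pi i];    (* gaussian approximation *)
{b[10, 18] == b[10, 19], N[b[10, 19]], N[gauss[10, 19]]}
(* True, 0.0927, 0.0892 (to 3 significant figures): the two peak steps are exactly equal, and the gaussian is close *)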

The expectation value for a given player’s death is:

    \[\sum_{n=1}^\infty n \cdot b_{i,n} = \binom{0}{i-1} \cdot {_2F_1}(1,-i,2-i;-1).\]

This function is very close to 2i, apart from a small oscillatory wiggle. At integer i it is singular, but from inspection of its plot it may be extended to a continuous function with value precisely 2i on physical parameter values (that is, integer i). Finally, for a given step n, the probability that some player breaks a tile is:

    \[\sum_{i=1}^n b_{i,n} = \frac{1}{2}\]

precisely, which is unsurprising. (Better terminology would be the nth “column”, as Henle+  use.) This assumes that n or more players have finished their run, otherwise the step is less likely to be broken.

Update, May 19: The death chance b_{i,n} is ½ times a binomial distribution in the player number i, for fixed step n. We previously found a gaussian curve for a given player i. Now, for a fixed step n instead, the de Moivre-Laplace approximation gives a gaussian over the player number i:

    \[\frac{1}{\sqrt{2\pi(n-1)}}\,e^{-\frac{(i - n/2 - 1/2)^2}{(n-1)/2}}.\]
