Metric connection as a rotating frame

Difficulty:   ★★★☆☆   undergraduate

[This very rough outline is just a placeholder. I have been putting off writing this article, until I can draw a good diagram.]

“Recall” that a connection \nabla specifies how to compare vectors at nearby points on a manifold. It is essential for defining what “parallel” means, and for taking certain derivatives. In physics, it is typically obtained from a metric, in which case it is called a metric connection or Levi-Civita connection. The most familiar expression of a connection is the connection coefficients \Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}, which are called Christoffel symbols in the special case of a metric connection.

One intuition is the gradient of the basis vector (fields): \nabla_\mu\mathbf e_\nu = \Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}\mathbf e_\sigma. This decomposes the gradient of one basis vector field in terms of all the basis vectors. In fact, this gradient is expressed in totality by the total covariant derivative \nabla\mathbf e_\nu. For each fixed value of \nu (and choice of basis), this is actually a tensor, of rank 2.

However an even better intuition is a map from vectors to rotations. The input vector is the direction the frame is moved along the manifold (to be slightly more precise, for an “infinitesimal” distance, though one could certainly be clearer still). The output is the rotation the frame undergoes, relative to the frame (field) already defined there.
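
A standard example makes this concrete: the flat plane in polar coordinates, with orthonormal frame \hat{\mathbf e}_r, \hat{\mathbf e}_\theta. The Levi-Civita connection gives

    \[ \nabla_{\partial_\theta}\hat{\mathbf e}_r = \hat{\mathbf e}_\theta, \qquad \nabla_{\partial_\theta}\hat{\mathbf e}_\theta = -\hat{\mathbf e}_r, \qquad \nabla_{\partial_r}\hat{\mathbf e}_r = \nabla_{\partial_r}\hat{\mathbf e}_\theta = 0. \]

So for an input displacement in the \theta-direction, the output is an infinitesimal rotation of the frame (by the matching angle d\theta), while for a radial displacement there is no rotation at all.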

Cartan knew this interpretation. For example his book Riemannian Geometry in an Orthogonal Frame makes it clear. It is not incidental, but infused throughout, starting with section 1 of chapter 1! I am thoroughly impressed. (While I was well aware Cartan is a famous geometer, naturally I had not read much of his work carefully in the original sources firsthand.)

I am deeply passionate about geometric intuition, because I value deep conceptual understanding. Conversely I feel very unsatisfied when equations are presented without meaning behind them — especially things like connections or curvature tensors, in a pedagogical setting. (It must be hard to teach a course on general relativity, where even the requisite differential geometry must be crammed in first. Besides, it may not even be the lecturer’s specialty. Hence I am not criticising courses… much. But even most textbooks don’t present the degree of intuition which I would prefer. At least one piece of good news, for me personally, is that this gives me an avenue where I can contribute. And I feel enthusiastic about that. Now this won’t usually be original understanding; in many cases it is more about drawing attention to under-represented insights.)

It seems intuition sometimes gets lost, historically. One cause seems to be abstraction and generalisation. Now, I have absolutely no problem with those: they are valuable and insightful, and it is good that specialists work on them. But I also like to see clear motivation for introducing a quantity. And knowing the original historical motives of the discoverer(s) can make it more relatable. Covectors are a good example. Even a strange, circuitous route (like a recent video about optics and Lagrangians, iirc) helps, because it connects to existing insights. It can also add confidence: “I could do that too!” [The opposite is refining and polishing results. That is certainly valuable, in fact it is absolutely essential (pragmatically speaking), because it is too much work to read original sources in most cases. But this already receives tremendous emphasis.]

Is gravity a force, curvature, torsion, or inertia?

Difficulty:   ★★☆☆☆   no equations, but some deep concepts

When you first learn some physics of gravity, it is introduced as a force. This is the usual phrasing for Newton’s theory of gravity, whose equations were published in 1687.

But if you then study general relativity (GR) — Einstein’s theory of gravity from 1915 — many textbooks correct you that gravity is not a force, but curvature of spacetime. Congratulations, you have now attained the sophisticated understanding. You have arrived!

…Or have you? When Newtonian gravity is formulated in modern geometric language, it is also interpreted as curvature of space–time. (This uses a manifold with metric(s) and connection, called Newton-Cartan theory.) Hence, the distinction between force and curvature is not about Newton vs Einstein, nor classical vs relativistic, but more about “geometric” formulations vs others.

To throw another spanner in the works, there is a niche approach named teleparallel gravity. Here, gravity is torsion, and there is no curvature at all! The theory uses a different mathematical description than GR, but ends up with the same physical predictions. [Anecdotes: I realised (or re-realised) this interpretation possibility during a lecture at a gravitational waves school at the Physikzentrum in Bad Honnef, Germany in 2019, I think. After the lecture, a Portuguese PhD student and I confirmed with one another what we had heard, and the above consequence. Also in an earlier year, I sat on a streetside kerb in Europe while a young guy (Martin Krššák?) talked passionately to me about teleparallel gravity. We were waiting for some aspect of a conference program to start, perhaps a tour or dinner.]

Yet another concept is inertia. Here I mean a gauge or convention about a “background” velocity or acceleration, for some chosen system at hand. (I am not talking about the more common usage, where inertial motion means geodesic. Regarding another issue, I do not promote an absolute background velocity, as in the aether.) For example, are you sitting still right now, moving at ≈1000 km/h due to Earth’s rotation, or moving at ≈600 km/s as determined by the cosmic microwave background? One may ask related questions about acceleration (again, in the sense of an inertial background gauge, not the 4-acceleration \nabla_{\mathbf u}\mathbf u from the connection in the usual sense). I will write more on this in the future.

The modern view is that gravitation is curvature alone, since only that is physically measurable, locally. (In a ship floating in space, with no windows so you can’t compare yourself to the distant stars, then uniform inertial motion is not detectable, even in principle.) However I have been rethinking the sufficiency of curvature lately. In Newtonian theory, you can add a constant to the gravitational potential, which has no effect on physical predictions. You can also add a term which effects a background/inertial acceleration; this is constant over space but may vary over time. (Malament 2012  §4.2 has plenty of detail.) I speculate similar gauge choices would apply to relativistic spacetimes too, with appropriate generalisation. I further speculate this is essential for defining gravitational energy, at least in a physical and general way. Another inspiration for going beyond curvature is the Equivalence Principle, a concept which has morphed and diverged quite significantly over the past century. But Einstein’s version of it stressed the role of inertia, as Norton (1985)  evaluates in detail. I certainly appreciate the modern curvature-only view for what it affirms, just not what it denies.

In conclusion, there is a curious variety of terminology for gravity: force / curvature / torsion / inertia. Personally I continue to conflate “gravity” with “curvature” in communication, but this is mostly just a reflection of standard usage. In particular I do not imply a rejection of the torsion perspective of teleparallel gravity, but merely my unfamiliarity with it. I also wonder if the “force” interpretation might remain useful when frames are specified, including when gravity is described in a similar way to electromagnetism (called gravitomagnetism). As I understand the usage, we say “force” to suggest deviation from the natural trajectory, so the question becomes, which trajectories will one consider natural? Finally, I tentatively suggest an inertial gauge should be absorbed into the concept of “gravity”, in addition to curvature. However, to compromise with the usual terminology, where gravity means curvature, I’ll settle for the hybrid term “inertia–gravity” instead.

Laniakea and cosmic filaments

Difficulty:   ★★☆☆☆   high school

The following video shows the paths taken by galaxies and larger structures, within our region of the universe. This cosmic “bulk flow” is based on a huge amount of astrophysical data, analysed in Tully+ 2014 . These researchers dubbed this supercluster of galaxies Laniakea, which is Hawaiian for “open skies” or “immense heaven”. It is about 500 million light-years across, which is 5000 times the length of our Milky Way galaxy! The incredible image in the still frame below has stuck in my mind since I watched the video years ago. Its (human) artist is credited here.

The red dot shows the location of the Milky Way. Now, matter in the universe is distributed fairly evenly (it is “homogeneous” and “isotropic”) on very large scales. However gravity causes it to clump together on smaller scales, into stars, galaxies, clusters of galaxies, and so on. A ball-like shape tends to collapse along one of its axes first, into a sheet or “pancake”; then along another axis, into a line or thread-like “filament”; and finally into a more compact blob, as shown theoretically by Zeldovich . In our universe, matter forms a “cosmic web”, which includes filaments with a heavy “node” at the end — a cluster or supercluster. On the other hand, cosmic “voids” are vast regions containing less matter than average. Over time, the matter clumps further, while the voids grow larger and sparser.

Laniakea and Perseus-Pisces
Figure: Two nearby superclusters, and the flow of galaxies (and larger structures like clusters) within them. The image is analogous to a drainage basin for rainfall on Earth. The “Local Void” is in the middle. The Milky Way would appear near the centre of the image, near the void, but barely within Laniakea. Sources: I took the image from Wikimedia, and the description is based on another video by the researchers.

The universe is also expanding (which means nothing more than matter moving apart, arguably). For everything discussed here, this expansion has already been subtracted off, so only the “peculiar velocity” remains. In fact Laniakea as a whole is not gravitationally bound. In the distant future it will split into smaller clusters, which will separate as the universe expands. (At least, based on the current understanding of dark energy.) By the way there is no universal agreement on the name Laniakea, nor on its precise boundary.

I remembered this research because I have been wondering about cosmic filaments. (I hoped they might provide a real-world basis for a certain idealised gravitational scenario I have in mind, for some theoretical work. But even if they don’t fulfil this, it is no loss to experience wonder at the beauty in the universe.)

While filaments are vast, they are mostly empty space, hence their gravity is “weak” and Newtonian theory works well. If you model one as cylindrically symmetric, most of the gravitational force it exerts is sideways, pulling matter onto the filament. However filaments tend to end in a heavy cluster or supercluster, hence even a galaxy inside a filament will get pulled along its length. The gravitational acceleration of our galaxy and its neighbours (the Local Group) is only about 10^{-12} m/s^2 apparently, which is 10 trillion times less than Earth’s surface gravity! However the effect is cumulative, so over vast time scales this is consistent with the ≈600 km/s speed of the Local Group today.
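
As a rough check of the orders of magnitude: if that acceleration had acted steadily over the age of the universe, T ≈ 4.4\times10^{17} s, the accumulated speed would be

    \[ v \sim aT \approx (10^{-12}\ \mathrm{m/s^2})(4.4\times10^{17}\ \mathrm{s}) \approx 4\times10^5\ \mathrm{m/s} = 400\ \mathrm{km/s}, \]

which is indeed the same order as the observed ≈600 km/s. (The acceleration was not constant over cosmic history, growing as structure formed, so only the order of magnitude is meaningful here.)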

SU(3) as spacetime structure

Difficulty:   ★★★★★   research

Quarks have “colour” charge, which has symmetry group SU(3). This is normally understood as an “internal symmetry”. But Hestenes has a remarkable proposal to interpret it geometrically in spacetime itself, using tangent vectors and bivectors formed from them, in a 1982 paper  §8.

To begin, SU(3) acts on a 3-dimensional complex vector space \mathbb C^3. So we seek an analogue of this structure in Minkowski spacetime. Choose an orthonormal basis \mathbf t, \mathbf x, \mathbf y, \mathbf z. Hestenes defines the bivectors R := \mathbf{tx} + \mathbf{yz}, G := \mathbf{ty} + \mathbf{zx} and B := \mathbf{tz} + \mathbf{xy}. This notation uses the geometric product, but in this case the vectors are orthogonal, so the results are just wedge products, e.g. \mathbf{tx} = \mathbf t\wedge\mathbf x. Together, the scalar and pseudoscalar parts from the geometric algebra form a subalgebra which is isomorphic to \mathbb C, the underlying field for the vector space \mathbb C^3. You can also define the equivalent of complex conjugation. I completed an answer on Physics StackExchange today which has more details on this bit.
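
To spell out that complex structure: in Lorentzian signature the unit pseudoscalar squares to -1,

    \[ I := \mathbf{txyz}, \qquad I^2 = -1, \]

so multivectors of the form a + bI (with a, b real) add and multiply exactly like complex numbers a + bi, and “complex conjugation” is the sign flip I \mapsto -I.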

For intuition behind the bivectors, recall bivectors generate rotations. Here my use of the word “rotation” is contextual: the metric signature is Lorentzian, so I mean Lorentz transformations SO(1,3) (or maybe some subgroup), hence these spacetime “rotations” include both boosts and spatial rotations. Then R encodes a boost in the x-direction, combined with a rotation about the x-axis. This suggests to me a screw or helix picture… but with some strong cautions: the screws are not rigid bodies under many SU(3) transformations, and some transformations even swap handedness.

Now SU(3) consists of “complex rotations”. This may be interpreted as a subgroup of SO(6), the rotations on a 6-dimensional real space. Specifically, those which preserve the structure of the complex axes, relative to one another. However in our case these are not spacetime rotations, but act on the space of bivectors. This is quite abstract, and I envision making illustrations in future work. But just as a 2D rotation changes the x and y-components of a vector, these SO(6) operators change the components of a bivector (the coefficients when decomposed in the 6 basis bivectors \mathbf{tx} and so on). Returning to SU(3) specifically, one example is rotating R and G around (or “into”) one another. Another example is changing the “phase” of R in one direction, while changing G the opposite way. In turn, changing the phase of R means keeping its overall (complex) magnitude the same, while redistributing the magnitude between the timelike and spacelike blades (the boost and the spatial rotation).

This suggests quark colour is a bivector. Or as Hestenes proposes, “we associate quark states with even multivectors”. I am unconcerned with the distinction, for the purposes of this brief sketch (though I do lean towards Hestenes, tbh). Both characterise rotations, with even multivectors including an extra phase term, essentially.

The gluon potential field may be interpreted as a map from vectors to \mathfrak{su}(3). This is the Lie algebra of SU(3), so its elements generate the SU(3) transformations, and in geometric algebra these generators can themselves be characterised as bivectors, since elements of the algebra can also act as operators. (I am little concerned about the difference between rotations and rotation generators, for the purposes of this current article, which is conceptual and a brief sketch.) The intuition is that as you move in a given direction, there is a “rotation” of colour space. This is very similar to one visualisation of a connection. Similarly the gluon field strength can be taken as a map from bivectors to \mathfrak{su}(3). As you travel around a small loop (specified by the bivector), the output of the map describes the resulting rotation in colour space. This is very similar to one visualisation of Riemann curvature. The colour current is also a map from vectors to \mathfrak{su}(3).

What is the interpretation of a “white” or colour-neutral quark combination? Apparently the totally antisymmetric tensor \epsilon appears in the definition, which suggests a rewrite as a wedge product and possibly a Hodge dual. It is simply required that the result be nonzero; for example R\wedge_{\mathbb C}G\wedge_{\mathbb C}B is valid. The wedge here is not the one acting on tangent vectors, but on “complex vectors”, which for us are basically bivectors. Intuitively, the “complex trivector” fills all directions of (real) bivector space, just like the 4-volume element formed from vectors fills all spacetime directions (within a tangent space at a point, that is).

I have plenty of questions myself, and topics to ponder on:

  • Quarks also satisfy the Dirac equation, which leads to a 4-velocity vector. But a quark’s colour bivector would also seem to give rise to a 4-velocity vector. It does not make sense for one particle to have two different velocities! Perhaps we can split them into two separate fields/particles, or else just force the two velocities to coincide, although I’ll assume for now that both those wild ideas are wrong. [I will take the conservative (cautious) approach for now, and treat Hestenes’ idea as just a reformulation. But certainly I have hopes it will be more than that!]
  • Since the quarks (within a single hadron) are separate particles, they should be housed in different copies of space
  • Individual quarks are unobservable
  • Hestenes’ construction relies on choice of a timelike vector, which suggests the meaning of red/green/blue is frame-dependent. It would be very interesting to boost between frames.

Raising and lowering indices on the Christoffel symbols

Given a connection \nabla on a manifold, along with any vector basis \mathbf e_\nu, the connection coefficients are:

    \[ \Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma} := \langle\nabla_\mu\mathbf e_\nu,\mathbf e^\sigma\rangle. \]

[For the usual connection based on a metric, these are called Christoffel symbols. On rare occasions some add “…of the second kind”, which we will interpret to mean the component of the output is the only raised index. Also the coefficients may be packaged into Cartan’s connection 1-forms, with components \omega_{\nu\hphantom\sigma\mu}^{\hphantom\nu\sigma} = \Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}, as introduced in the previous article. Hence raising or lowering indices on the connection forms is the same problem.]

In our convention, the first index specifies the direction \mathbf e_\mu of differentiation, the second is the basis vector (field) being differentiated, and the last index is for the component of the resulting vector. We use the same angle bracket notation for the metric scalar product, inverse metric scalar product, and the contraction of a vector and covector (as above; and this does not require a metric).

This unified notation is very convenient for generalising the above connection coefficients to any variant of raised or lowered indices, as we will demonstrate by examples. We will take such variants on the original equation as a definition, then the relation to the original \Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma} variant will be derived from that.

[Regarding index placement and their raising and lowering, I was formerly confused by this issue, in the case of non-tensorial objects, or using different vector bases for different indices. For example, to express an arbitrary frame in terms of a coordinate basis, some references write the components as e_a^{\hphantom a\mu}, as blogged previously. The Latin index is raised and lowered using the metric components in the general frame \mathbf e_a, whereas for the Greek index the metric components in a coordinate frame \partial_\mu are used. However while these textbook(s) gave useful practical formulae, I did not find them clear on what was definition vs. what was derived. I eventually concluded the various indices and their placements are best treated as a definition of components, with any formulae for swapping / raising / lowering being obtained from that.]

The last index is the most straightforward. We define: \Gamma_{\mu\nu\tau} := \langle\nabla_\mu\mathbf e_\nu,\mathbf e_\tau\rangle, with all indices lowered. These quantities are the overlaps between the vectors \nabla_\mu\mathbf e_\nu and \mathbf e_\tau. [Incidentally we could call them the “components” of \nabla_\mu\mathbf e_\nu if this vector is decomposed in the alternate vector basis (\mathbf e^\tau)^\sharp. After all, for any vector, \mathbf X = \langle\mathbf X,\mathbf e_\tau\rangle(\mathbf e^\tau)^\sharp, which may be checked by contracting both sides with \mathbf e_\sigma.] To relate to the original \Gammas, note:

    \[ \Gamma_{\mu\nu\tau} = \langle\nabla_\mu\mathbf e_\nu,\mathbf e^\sigma g_{\sigma\tau}\rangle = \langle\nabla_\mu\mathbf e_\nu,\mathbf e^\sigma\rangle g_{\sigma\tau} = \Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}g_{\sigma\tau}, \]

using linearity. Hence this index is raised or lowered using the metric, as familiar for an index of a tensor. We could say it is a “tensorial” index. The output \nabla_\mu\mathbf e_\nu of the derivative is just a vector, after all. The \Gamma_{\mu\nu\tau} have been called “Christoffel symbols of the first kind”.
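
As a quick concrete check, here is a minimal sketch in Mathematica (the flat plane in polar coordinates; the variable names and layout are my own), verifying that the first kind is obtained from the second by lowering the last index with the metric:

coords = {r, th};
g = {{1, 0}, {0, r^2}}; gInv = Inverse[g];  (* flat plane, polar coordinates *)
(* Second kind, in our index order: gamma2[[mu,nu,sigma]] = \Gamma_{\mu\nu}{}^\sigma *)
gamma2 = Table[Sum[gInv[[s,l]]/2 (D[g[[l,n]],coords[[m]]] + D[g[[l,m]],coords[[n]]] - D[g[[m,n]],coords[[l]]]), {l,2}], {m,2}, {n,2}, {s,2}] // Simplify;
(* First kind: lower the last index, gamma1[[mu,nu,tau]] = \Gamma_{\mu\nu\tau} *)
gamma1 = Table[Sum[gamma2[[m,n,s]] g[[s,t]], {s,2}], {m,2}, {n,2}, {t,2}] // Simplify;
{gamma2[[2,2,1]], gamma1[[2,2,1]], gamma2[[2,1,2]], gamma1[[2,1,2]]}  (* -> {-r, -r, 1/r, r} *)

The first pair agree only because g_{rr} = 1; the second pair shows the lowering genuinely rescaling a component.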

The first index is also fairly straightforward. We define a raised version by: \Gamma^{\tau\hphantom\nu\sigma}_{\hphantom\tau\nu} := \langle\nabla^\tau\mathbf e_\nu,\mathbf e^\sigma\rangle. However, this raises the question of what the notation ‘\nabla^\tau’ means. The raised index suggests \mathbf e^\tau, which is a covector, but we can take its dual using the metric (assuming there is one). Hence \nabla^\tau \equiv \nabla_{\mathbf e^\tau} := \nabla_{(\mathbf e^\tau)^\sharp} is a sensible definition, or for an arbitrary covector, \nabla_{\boldsymbol\alpha} := \nabla_{\boldsymbol\alpha^\sharp}. (For those not familiar with the notation, \boldsymbol\alpha^\sharp is simply the vector with components \alpha^\tau = \alpha_\mu g^{\mu\tau}.)

These duals are related to the vectors of the original basis by: (\mathbf e^\tau)^\sharp = g^{\tau\mu}\mathbf e_\mu, where g^{\tau\mu} = \langle\mathbf e^\tau,\mathbf e^\mu\rangle are still the inverse metric components for the original basis. Hence \nabla^\tau = \nabla_{g^{\tau\mu}\mathbf e_\mu} = g^{\tau\mu}\nabla_\mu, by the linearity of this slot. In terms of the coefficients, \Gamma^{\tau\hphantom\nu\sigma}_{\hphantom\tau\nu} = g^{\tau\mu}\Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}, hence the first index is also “tensorial”.

Finally, the middle index has the most interesting properties for raising or lowering. Let’s start with a raised middle index and lowered last index: \Gamma_{\mu\hphantom\sigma\nu}^{\hphantom\mu\sigma} := \langle\nabla_\mu\mathbf e^\sigma,\mathbf e_\nu\rangle. This means it is now a covector field which is being differentiated, and the components of the result are peeled off. For their relation to the original coefficients, note firstly \langle\mathbf e^\sigma,\mathbf e_\nu\rangle = \delta_\nu^\sigma, since these bases are dual. These are constants, hence their gradients vanish:

    \[ 0 = \nabla_\mu \langle\mathbf e^\sigma,\mathbf e_\nu\rangle = \langle\nabla_\mu\mathbf e^\sigma,\mathbf e_\nu\rangle + \langle\mathbf e^\sigma,\nabla_\mu\mathbf e_\nu\rangle. \]

[The second equality is not the metric-compatibility property of some connections, despite looking extremely similar in our notation. No metric is used here. Rather, it is a defining property of the covariant derivative; see e.g. Lee 2018  Prop. 4.15. But no doubt these defining properties are chosen for their “compatibility” with the vector–covector relationship.]

Hence \Gamma_{\mu\hphantom\sigma\nu}^{\hphantom\mu\sigma} = -\Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}. Note the order of the last two indices is swapped in this expression, and their “heights” are also changed. (I mean, it is not just that \sigma and \nu are interchanged.) This relation is antisymmetric in some sense, and looks more striking in notation which suppresses the first index. Connection 1-forms (can) do exactly that: \boldsymbol\omega^\sigma_{\hphantom\sigma\nu} = -\boldsymbol\omega_\nu^{\hphantom\nu\sigma}.
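
A concrete instance, continuing the polar-coordinate example from earlier: since \Gamma_{\theta r}^{\hphantom{\theta r}\theta} = 1/r and \Gamma_{\theta\theta}^{\hphantom{\theta\theta}\theta} = 0, the relation gives

    \[ \nabla_\theta\mathbf e^\theta = -\Gamma_{\theta\nu}^{\hphantom{\theta\nu}\theta}\,\mathbf e^\nu = -\frac{1}{r}\,\mathbf e^r. \]

So the coefficients for differentiating the covector basis are read off from the same \Gammas, with a sign flip and the roles of the last two indices swapped.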

Note this is not the well-known property for an orthonormal basis, which is \boldsymbol\omega_\sigma^{\hphantom\sigma\nu} = -\boldsymbol\omega_\nu^{\hphantom\nu\sigma} using our conventions. Both equalities follow from conveniently chosen properties of bases: the dual basis relation in the former case, and orthonormality of a single basis in the latter. Our relation holds in any basis, and does not require a metric. But both relations are useful — they just apply to different contexts.

To isolate the middle index as the only one changed, raise the last index again: \Gamma_\mu^{\hphantom\mu\sigma\tau} = \Gamma_{\mu\hphantom\sigma\nu}^{\hphantom\mu\sigma}g^{\nu\tau} = -\Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma}g^{\nu\tau}. Only now have we invoked the metric, in our discussion of the middle index.

The formula \Gamma_{\mu\hphantom\sigma\nu}^{\hphantom\mu\sigma} = -\Gamma_{\mu\nu}^{\hphantom{\mu\nu}\sigma} has an elegant simplicity. However at first it didn’t feel right to me as the sought-after relation. We are accustomed to tensors, where a single index is raised or lowered independently of the others. However in this case the middle index describes a vector which is differentiated, and differentiation is not linear (for our purposes here), but obeys a Leibniz rule. (To be precise, it is linear over sums, and multiplication by a constant scalar. It is not linear under multiplication by an arbitrary scalar. Mathematicians call this \mathbb R-linear but not C^\infty-linear.)

The formula could be much worse. Suppose we replaced the original equation with an arbitrary operator, and defined raised and lowered coefficients similarly. Assuming the different variants are related somehow, then in general each could depend on all coefficients from the original variant, for example: A_{\mu\hphantom\nu\sigma}^{\hphantom\mu\nu} = f_{\mu\hphantom\nu\sigma}^{\hphantom\mu\nu\hphantom\sigma\alpha\beta\gamma}A_{\alpha\beta\gamma}, for some functions f with 3 × 2 = 6 indices.

Much of the material here is not uncommon. In differential geometry it is well-known the connection coefficients are not tensors. And it is not rare to hear that in fact two of their indices do behave tensorially. But I do not remember seeing a clear definition of what it means to raise or lower any index, in the literature. (Though the approach in geometric algebra, of using two distinct vector bases \mathbf e_\mu and (\mathbf e^\mu)^\sharp, is similar. This requires a metric, and gives a unique interpretation to raising and lowering indices. But in fairness, most transformations we have described also require a metric.) Most textbooks probably do not define different variants of the connection forms, although I have not investigated this much at present.

Finally, it is easy to look back and say something is straightforward. But it is not so easy when your attention is preoccupied with grasping the core aspects of a new thing, rather than its more peripheral details. When I was first learning about connection forms, it was not at all clear whether you could raise or lower their indices. Because the few references I consulted didn’t mention this at all, to me it felt like it should be obvious. But it is not obvious, and demands careful consideration, even though some results are familiar ideas in a new context.

Connection forms and covariant derivatives

In the previous article I introduced Cartan’s connection 1-forms. It is interesting to express various covariant derivatives in terms of them. But firstly:

    \[ \boldsymbol\omega_\mu^{\hphantom\mu\nu} = \omega_{\mu\hphantom\nu\tau}^{\hphantom\mu\nu}\mathbf e^\tau, \qquad\qquad \boldsymbol\omega^\nu_{\hphantom\nu\mu} = \omega^\nu_{\hphantom\nu\mu\tau}\mathbf e^\tau, \]

simply from the definition of covector components. But to check anyway, contract both sides of either equality with \mathbf e_\sigma, to recover the defining formulae. It follows:

    \[ \nabla_\sigma\mathbf e_\mu = \langle\nabla_\sigma\mathbf e_\mu,\mathbf e^\tau\rangle\,\mathbf e_\tau = \omega_{\mu\hphantom\tau\sigma}^{\hphantom\mu\tau}\mathbf e_\tau. \]

To check: the first equality just expresses the components of the vector \nabla_\sigma\mathbf e_\mu, which can be verified by contracting both sides with \mathbf e^\nu. Similarly for the covector gradients,

    \[ \nabla_\sigma\mathbf e^\nu = \langle\nabla_\sigma\mathbf e^\nu,\mathbf e_\tau\rangle\,\mathbf e^\tau = \omega^\nu_{\hphantom\nu\tau\sigma}\mathbf e^\tau = -\omega_{\tau\hphantom\nu\sigma}^{\hphantom\tau\nu}\mathbf e^\tau. \]

Now, \nabla = \mathbf e^\sigma\otimes\nabla_\sigma. To see this, substitute \mathbf X into the left slot (in our convention) of both sides. The RHS becomes: \langle\mathbf e^\sigma,\mathbf X\rangle\nabla_\sigma = X^\sigma\nabla_\sigma = \nabla_{\mathbf X}, by linearity of this slot. Now, apply this identity to \mathbf e_\mu:

    \[ \nabla\mathbf e_\mu = \mathbf e^\sigma\otimes\nabla_\sigma\mathbf e_\mu = \omega_{\mu\hphantom\tau\sigma}^{\hphantom\mu\tau}\mathbf e^\sigma\otimes\mathbf e_\tau = \boldsymbol\omega_\mu^{\hphantom\mu\tau}\otimes\mathbf e_\tau. \]

And:

    \[\begin{split} \nabla\mathbf e^\nu &= \mathbf e^\sigma\otimes\nabla_\sigma\mathbf e^\nu = \omega^\nu_{\hphantom\nu\tau\sigma}\mathbf e^\sigma\otimes\mathbf e^\tau = -\omega_{\tau\hphantom\nu\sigma}^{\hphantom\tau\nu}\mathbf e^\sigma\otimes\mathbf e^\tau \\ &= \boldsymbol\omega^\nu_{\hphantom\nu\tau}\otimes\mathbf e^\tau = -\boldsymbol\omega_\tau^{\hphantom\tau\nu}\otimes\mathbf e^\tau. \end{split}\]

The antisymmetric part of the covariant derivative is (one half times) the exterior derivative:

    \[\begin{split} d(\mathbf e^\nu) &= \boldsymbol\omega^\nu_{\hphantom\nu\sigma}\wedge\mathbf e^\sigma = \omega^\nu_{\hphantom\nu\sigma\tau}\mathbf e^\tau\wedge\mathbf e^\sigma = -\omega^\nu_{\hphantom\nu\sigma\tau}\mathbf e^\sigma\wedge\mathbf e^\tau \\ &= \omega_{\sigma\hphantom\nu\tau}^{\hphantom\sigma\nu}\mathbf e^\sigma\wedge\mathbf e^\tau. \end{split}\]

This is Cartan’s first structural equation! We have assumed a connection with no torsion.
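
A quick example: take the orthonormal frame for polar coordinates on the flat plane, \mathbf e^{\hat r} = dr and \mathbf e^{\hat\theta} = r\,d\theta. The only nonvanishing connection forms are \boldsymbol\omega_{\hat r}^{\hphantom{\hat r}\hat\theta} = -\boldsymbol\omega_{\hat\theta}^{\hphantom{\hat\theta}\hat r} = d\theta, so \boldsymbol\omega^{\hat\theta}_{\hphantom{\hat\theta}\hat r} = -d\theta, and the structural equation checks out:

    \[ d\mathbf e^{\hat\theta} = d(r\,d\theta) = dr\wedge d\theta = (-d\theta)\wedge dr = \boldsymbol\omega^{\hat\theta}_{\hphantom{\hat\theta}\hat r}\wedge\mathbf e^{\hat r}, \qquad d\mathbf e^{\hat r} = 0 = \boldsymbol\omega^{\hat r}_{\hphantom{\hat r}\hat\theta}\wedge\mathbf e^{\hat\theta}. \]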

Cartan’s connection 1-forms

Difficulty:   ★★★★☆   undergraduate / graduate

The connection 1-forms \boldsymbol\omega{_\mu^{\hphantom\mu\nu}} are one way to express a connection \nabla on a manifold. The connection coefficients \Gamma_{\sigma\mu}^{\hphantom{\sigma\mu}\nu} are more familiar and achieve the same purpose, but package the information differently. Connection forms are part of Cartan’s efficient and elegant “moving frames” approach to derivatives and curvature.

[I am only just learning this material, so this article is for my own notes, consolidation of understanding, and checking of conventions. It is a work in progress. There is limited actual derivation in what follows, so don’t be intimidated by the formulae, as they really just introduce notation and a couple of basic properties.]

Write \mathbf e_\mu for a vector basis at each point, and \mathbf e^\nu for its dual basis. For now, we do not assume these frames are orthonormal (in fact, we don’t even need a metric, for now). The connection forms for this basis are: \boldsymbol\omega_\mu^{\hphantom\mu\nu}(\mathbf X) := \langle\nabla_{\mathbf X}\mathbf e_\mu,\mathbf e^\nu\rangle, where \mathbf X is any input vector. (I will sometimes write \langle\cdot,\cdot\rangle for the contraction between a vector and covector, which is not uncommon in the literature. The unified notation with the metric scalar product is convenient, although it is sometimes worth reminding oneself that no metric is needed in this particular case.) To find the components, substitute basis vectors \mathbf X \rightarrow \mathbf e_\sigma:

    \[ \omega_{\mu\hphantom\nu\sigma}^{\hphantom\mu\nu} := \boldsymbol\omega_\mu^{\hphantom\mu\nu}(\mathbf e_\sigma) = \langle\nabla_\sigma\mathbf e_\mu,\mathbf e^\nu\rangle =: \Gamma_{\sigma\mu}^{\hphantom{\sigma\mu}\nu}, \]

where \nabla_\sigma := \nabla_{\mathbf e_\sigma} as usual. Hence with our conventions, the \mu-index specifies which basis vector field is being differentiated, \sigma specifies the direction it is being differentiated in, and \nu specifies the component of the resulting vector. (Lee 2018  Problem 4-14 uses the same convention. MTW  §14.5, Frankel 2012  §9.3b, and Tu 2017  §11.1 would write \boldsymbol\omega^\nu_{\hphantom\nu\mu} for our expression — which swaps the index order.)

We could define separate connection 1-forms \boldsymbol\omega^\nu_{\hphantom\nu\mu} for the dual basis. Note the different index placement. These are:

    \[ \omega^\nu_{\hphantom\nu\mu\sigma} := \boldsymbol\omega^\nu_{\hphantom\nu\mu}(\mathbf e_\sigma) := \langle\nabla_\sigma\mathbf e^\nu,\mathbf e_\mu\rangle = -\langle\nabla_\sigma\mathbf e_\mu,\mathbf e^\nu\rangle = -\omega_{\mu\hphantom\nu\sigma}^{\hphantom\mu\nu}. \]

Hence the two sets of connection forms are related:

    \[ \boldsymbol\omega_\mu^{\hphantom\mu\nu} = -\boldsymbol\omega^\nu_{\hphantom\nu\mu}. \]

Caution: This is not the skew-symmetric relation for an orthonormal basis, which is much more common I think. Here we have not even used a metric, so far. The above uses only the natural duality between vectors and covectors. And it compares two different types of connection forms. The equation used:

    \[ 0 = \nabla_\sigma\langle\mathbf e_\mu,\mathbf e^\nu\rangle = \langle\nabla_\sigma\mathbf e_\mu,\mathbf e^\nu\rangle + \langle\mathbf e_\mu,\nabla_\sigma\mathbf e^\nu\rangle. \]

For the first equality, \langle\mathbf e_\mu,\mathbf e^\nu\rangle = \delta_\mu^{\hphantom\mu\nu} is constant, so its gradient vanishes. The second equality follows from the defining properties of the covariant derivative, i.e. the extension of the connection to covectors and other tensors (e.g. Lee 2018  Prop. 4.15).

[Regarding index placement, and their raising and lowering, I was formerly confused by this issue in the context of vector bases, for a previous blog article. Specifically, to express an arbitrary frame in terms of a coordinate basis, some references write the components as \mathbf e_a^{\hphantom a\mu}. The Latin index is raised and lowered using the metric components in the arbitrary frame, whereas the Greek index uses the metric components in the coordinate frame. However textbooks were not clear on what was definition vs. what was derived, I thought. I eventually concluded the various indices and their placements are best treated as a definition of components, with any formulae for swapping/raising/lowering being obtained from that.]

But let’s now suppose there is a metric, with compatible connection, and an orthonormal basis. Then a common result is: \boldsymbol\omega_\mu^{\hphantom\mu\nu} = -\boldsymbol\omega_\nu^{\hphantom\nu\mu}, in terms of our notation. I did not see how to prove this, so initially I just copied and affirmed it. But I am now updating this 6 months later, and I realise it is only true for a metric of Riemannian signature. Sorry about that.

Instead, considering the gradient of \langle\mathbf e_\mu,\mathbf e_\alpha\rangle leads to \omega_{\mu\alpha\sigma} = -\omega_{\alpha\mu\sigma}. Multiply both sides by g^{\alpha\nu} and sum, which gives \omega_{\mu\hphantom\nu\sigma}^{\hphantom\mu\nu} = -\omega_{\alpha\mu\sigma}g^{\alpha\nu}. On the LHS, this raised the second index, which is tensorial or “linear”. But on the RHS, the first index does not obey the tensorial rule (see the separate article on raising and lowering these indices). The RHS is equal to: -\omega_{\alpha\hphantom\tau\sigma}^{\hphantom\alpha\tau}g^{\alpha\nu}g_{\tau\mu}. Now apply orthonormality again, so the sum collapses to -\omega_{\nu\hphantom\mu\sigma}^{\hphantom\nu\mu}\eta^{\mu\mu}\eta_{\nu\nu}, hence:

    \[ \boldsymbol\omega_\mu^{\hphantom\mu\nu} = -\eta_{\mu\mu}\eta_{\nu\nu}\,\boldsymbol\omega_\nu^{\hphantom\nu\mu}. \]

This generalises the orthonormal basis case to Lorentzian signature. Similarly for our alternate connection forms: \boldsymbol\omega^\mu_{\hphantom\mu\nu} = -\eta_{\mu\mu}\eta_{\nu\nu}\,\boldsymbol\omega^\nu_{\hphantom\nu\mu}.
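
For example, take \mu = \hat t timelike and \nu = \hat x spacelike, so that \eta_{\hat t\hat t}\eta_{\hat x\hat x} = -1 (in either signature convention). Then \boldsymbol\omega_{\hat t}^{\hphantom{\hat t}\hat x} = +\boldsymbol\omega_{\hat x}^{\hphantom{\hat x}\hat t}: the connection forms mixing a timelike and a spacelike frame vector (the boost parts) are symmetric under swapping the indices, while purely spatial pairs retain the familiar antisymmetry.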

It is interesting to relate the connection forms to various covariant derivative expressions, but I’ll spin that off into a separate article. I also recently learned (or clarified) that a metric connection may be interpreted as rotations of a frame, a beautiful geometric insight.

Vector with a lowered component index?

Difficulty:   ★★★☆☆   undergraduate

Normally we recognise a vector as having a raised (component) index: u^\mu, whereas a covector has a lowered index: \omega_\nu. Similarly the dual to \mathbf u, using the metric, is the covector with components denoted u_\nu say; while the dual to \boldsymbol\omega is the vector with components \omega^\mu.

Recall what this notation means. It presupposes a vector basis \mathbf e_\mu say, where in this case the \mu labels entire vectors — the different vectors in the basis — rather than components. Hence we have the decomposition: \mathbf u = u^\mu\mathbf e_\mu. Similarly, the component notation \omega_\nu implies a basis of covectors \mathbf e^\nu, so: \boldsymbol\omega = \omega_\nu\mathbf e^\nu. These bases are taken to be dual to one another (in the sense of bases), meaning: \mathbf e^\nu(\mathbf e_\mu) = \delta^\nu_\mu. We also have \mathbf u^\flat = u_\nu\mathbf e^\nu and \boldsymbol\omega^\sharp = \omega^\mu\mathbf e_\mu, where as usual u_\nu = g_{\mu\nu}u^\mu and \omega^\mu = g^{\mu\nu}\omega_\nu. (The “sharp” and “flat” symbols are just a fancy way to denote the dual. This is called the musical isomorphism.)

However since \mathbf u = u^\mu\mathbf e_\mu, we may instead take the dual of both sides of this expression to get: \mathbf u^\flat = u^\mu(\mathbf e_\mu)^\flat. This gives a different decomposition than in the previous paragraph. Curiously, this expression contains a raised component index, even though it describes a covector. For each index value \mu, the component u^\mu is the same number as usual. But here we have paired them with different basis elements. Similarly \boldsymbol\omega^\sharp = \omega_\nu(\mathbf e^\nu)^\sharp is a different decomposition of the vector \boldsymbol\omega^\sharp. It describes a vector, despite using a lowered component index. Using the metric, the two vector bases are related by: \langle(\mathbf e^\nu)^\sharp,\mathbf e_\mu\rangle = \delta^\nu_\mu.
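
Here is a tiny numerical illustration of the relation \langle(\mathbf e^\nu)^\sharp,\mathbf e_\mu\rangle = \delta^\nu_\mu (a sketch in Mathematica, with a made-up basis for the Euclidean plane):

basis = Transpose[{{1, 0}, {1, 1}}];  (* columns are e_1 and e_2 *)
recip = Inverse[basis];  (* rows are the reciprocal vectors (e^1)# and (e^2)# *)
recip.basis  (* -> {{1,0},{0,1}}, the Kronecker delta *)

The reciprocal vectors come out as (1,-1) and (0,1): each is orthogonal to the other basis vector, and has unit overlap with its own partner.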

A good portion of the content here is just reviewing notation. However this article does not seem as accessible as I envisioned. The comparison with covectors is better suited to readers who already know the standard approach. (And I feel a pressure to demonstrate I understand the standard approach before challenging it a little, lest some readers dismiss this exposition.) However for newer students, it would seem better to start afresh by defining a vector basis \mathbf e^\nu to satisfy: \langle\mathbf e^\nu,\mathbf e_\mu\rangle = \delta^\nu_\mu. (While \mathbf e^\nu is identical notation to covectors, there need not be confusion, if no covectors are present anywhere.) This relation is intuitive, as the below diagram shows.

reciprocal bases
Figure: reciprocal bases in \mathbb R^3. In the diagram we index the vectors by colour, rather than the typical 1, 2, and 3. Rather than using a co-basis of covectors as in the standard modern approach, we take both bases to be made up of vectors. Note for example, the red vector on the right is orthogonal to the blue and green vectors on the left. (I made up the directions when plotting, so don’t expect everything to be quantifiably correct.) Arguably “inverse basis” would be the best name. Finally I would prefer to start from this picture, as it is simple and intuitive, rather than start with covectors and musical isomorphisms. Maybe next time I will be brave enough.

I learned this approach (of defining a second, “reciprocal basis” of vectors) from geometric algebra references. However, it is really just a revival of the traditional view. I used to think old physics textbooks which taught this way were unsophisticated, and unaware of the modern machinery (covectors). I no longer think this way. The alternate approach does require a metric, so is less general. However all topics I have worked on personally, in relativity, do have a metric present. But even for contexts with no metric, this approach could still serve as a concrete intuition, to motivate the introduction of covectors, which are more abstract. The alternate approach also challenges the usual distinction offered between contravariant and covariant transformation of (co)vectors and higher-rank tensors. It shows this is not about vectors vs covectors at all, but more generally about the basis chosen. I write about these topics at significant length in my MPhil thesis (2024, forthcoming, §2.3), and intend to write more later.

Euler and spinors

Difficulty:   ★★★☆☆   undergraduate

The discovery of spinors is most often credited to quantum physicists in the 1920s, and to Élie Cartan in the prior decade for an abstract mathematical approach. But it turns out the legendary mathematician Leonhard Euler discovered certain algebraic properties of the usual (2-component, Pauli) spinors, back in the 1700s! He gave a parametrisation for rotations in 3D, using essentially what were later known as Cayley-Klein parameters. There was not yet the insight that each set of parameter values forms an interesting object in its own right. But we can recognise one key conceptual aspect of spinors in this accidental discovery: the association with rotations.

In 1771, Euler published a paper on orthogonal transformations, whose title translates to: “An algebraic problem that is notable for some quite extraordinary relations”. Euler scholars index it as “E407”, and the Latin original  is available from the Euler Archive website. I found an English translation  online, which also transcribes the original language.

Euler commences with the aim to find “nine numbers… arranged… in a square” which satisfy certain conditions. In modern notation this is a matrix M say, satisfying M^TM = 1 = MM^T, which describes an orthogonal matrix. While admittedly the paper is mostly abstract algebra, he is also motivated by geometry. In §3 he mentions that the equation for a surface is “transformed” under a change of [Cartesian] coordinates, including the case where the coordinate origins coincide. We recognise this (today, at least) as a rotation, possibly combined with a reflection. Euler also mentions “angles” (§4 and later), which is clearly geometric language.

He goes on to analyse orthogonal transformations in various dimensions. [I was impressed with the description of rotations about n(n – 1)/2 planes, in n dimensions, because I only first learned this in the technical context of higher-dimensional rotating black holes. It is only in 3D that rotations are specified by axis vectors.] Then near the end of the paper, Euler seeks orthogonal matrices containing only rational entries, a “Diophantine” problem. Recall rotation matrices typically contain many trigonometric terms like sin(θ) and cos(θ), which are irrational numbers for most values of the parameter θ. But using some free parameters “p, q, r, s”, Euler presents:

    \[\begin{tabular}{|c|c|c|}\hline $\frac{p^2+q^2-r^2-s^2}{u}$ & $\frac{2(qr+ps)}{u}$ & $\frac{2(qs-pr)}{u}$ \\ \hline $\frac{2(qr-ps)}{u}$ & $\frac{p^2-q^2+r^2-s^2}{u}$ & $\frac{2(pq+rs)}{u}$ \\ \hline $\frac{2(qs+pr)}{u}$ & $\frac{2(rs-pq)}{u}$ & $\frac{p^2-q^2-r^2+s^2}{u}$ \\ \hline\end{tabular},\]

where u := p^2 + q^2 + r^2 + s^2. (I have copied the style which Euler uses in some subsequent examples.) By choosing rational values of the parameters, the matrix entries will also be rational, however this is not our concern here. The matrix has determinant +1, so we know it represents a rotation. It turns out the parameters form the components of a spinor!! (p,q,r,s)/\sqrt u are the real components of a normalised spinor. We allow all real values, but will ignore some trivial cases. One aspect of spinors is clear from inspection: in the matrix the parameters occur only in pairs, hence the sets of values (p,q,r,s)/\sqrt u and -(p,q,r,s)/\sqrt u give rise to the same rotation matrix. (Those familiar with spinors will recall the spin group is the “double cover” of the rotation group.)
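
The claimed properties are easy to verify symbolically; here is a minimal Mathematica check (my own code and variable names, not Euler’s notation):

u = p^2 + q^2 + r^2 + s^2;
mat = {{p^2 + q^2 - r^2 - s^2, 2(q r + p s), 2(q s - p r)},
  {2(q r - p s), p^2 - q^2 + r^2 - s^2, 2(p q + r s)},
  {2(q s + p r), 2(r s - p q), p^2 - q^2 - r^2 + s^2}}/u;
Simplify[Transpose[mat].mat]  (* -> the identity matrix: Euler's "extraordinary relations" *)
Simplify[Det[mat]]  (* -> 1, so the matrix is a rotation *)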

The standard approach is to combine the parameters into two complex numbers. But in the geometric algebra (or Clifford algebra) interpretation, a spinor is a rotation of sorts, or we might say a “half-rotation”. It is about the following plane:

    \[q\hat{\mathbf y}\wedge\hat{\mathbf z} + r\hat{\mathbf z}\wedge\hat{\mathbf x} + s\hat{\mathbf x}\wedge\hat{\mathbf y}.\]

(For those who haven’t seen the wedge product nor bivectors, you can visualise \hat{\mathbf x}\wedge\hat{\mathbf y} for example, as the parallelogram or plane spanned by those vectors. It also has a magnitude and handedness/orientation.) The sum is itself a plane, because we are in 3D. Dividing by \sqrt{q^2 + r^2 + s^2} gives a unit bivector. For the spinor, the angle of rotation θ/2 say, is given by (c.f. Doran & Lasenby 2003  §2.7.1):

    \[\cos(\theta/2) = p/\sqrt{u},\qquad \sin(\theta/2) = \sqrt{(q^2+r^2+s^2)/u}.\]

This determines θ/2 to within a range of 2π (if we include also the orientation of the plane). In contrast, the matrix given earlier effects a rotation by θ — twice the angle — about the same plane. This is because geometric algebra formulates rotations using two copies of the spinor. The matrix loses information about the sign of the spinor, and hence also any distinction between one or two full revolutions.
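
A worked example: take (p,q,r,s) = (1,1,0,0), so u = 2. The matrix given earlier reduces to

    \[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}, \]

a rotation by \theta = 90° about the plane \hat{\mathbf y}\wedge\hat{\mathbf z} (equivalently, about the x-axis). This is consistent with the formulae above: \cos(\theta/2) = 1/\sqrt2 and \sin(\theta/2) = 1/\sqrt2, so \theta/2 = 45°.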

Euler extends the challenge of finding orthogonal matrices with rational entries to 4D. In §34 he parametrises matrices using “eight numbers at will a, b, c, d, p, q, r, s”. However the determinant of this matrix is -1, so it is not a rotation, and the parameters cannot form a spinor. Two of its eigenvalues are -1 and +1. Now the eigenvectors corresponding to distinct eigenvalues are orthogonal (a property most familiar for symmetric matrices, but it holds for orthogonal matrices also). It follows the matrix causes reflection along one axis, fixes an orthogonal axis, and rotates about the remaining plane. So it does not “include every possible solution” (§36). But I guess the parameters might form a subgroup of the Pin(4) group, the double cover of the 4-dimensional orthogonal group O(4).

Euler provides another 4×4 orthogonal matrix satisfying additional properties, in §36. This one has determinant +1, hence represents a rotation. It would appear no eigenvalues are +1 in general, so it may represent an arbitrary rotation. I guess the parameters (a,b,c,d,p,q,r,s)/\sqrt u, where I label by u the quantity (a^2+b^2+c^2+d^2)(p^2+q^2+r^2+s^2) mentioned by Euler, might form spinors of 4D space (not 3+1-dimensional spacetime). If so, these are members of Spin(4), the double cover of the 4D rotation group SO(4).

Euler was certainly unaware of his implicit discovery of spinors. His motive was to represent rotations using rational numbers, asserting these are “most suitable for use” (§11). Probably more significant today is that rotations are described by rational functions of spinor components. But the fact spinors would be rediscovered repeatedly in different applications suggests there is something very natural or Platonic about them. Euler says his 4D “solution deserves the more attention”, and that with a general procedure for this and higher dimensions, “Algebra… would be seen to grow very much.” (§36) He could not have anticipated how deserving of attention spinors are, nor their importance in algebra and elsewhere!

Spin-1/2 and a rotating mirror

Difficulty:   ★★☆☆☆   high school

Imagine a light ray reflecting off a mirror. If the mirror is rotating, the direction of the reflected beam will also rotate, but at twice the rate of the mirror! This follows from the way the angles work, if you recall for example “the angle of incidence equals the angle of reflection” and think about it carefully… Or, just play around with an animation until it looks right 😉 . In quantum physics this angle-doubling property turns up in the description of electrons for example, where it seems very mysterious and exotic (keywords: “spin-1/2” and “spinors”). So its appearance in an “ordinary” and intuitive setting is reassuring.

angles for the mirror and ray
Figure 1: A two-sided mirror and light ray (green arrows). The light arrives from the left, bounces off the mirror in the centre, and exits towards the upper-right. In this example the light beam has angle b = 165°, and the mirror 15°.

For simplicity let’s use a two-dimensional plane, as shown in Figure 1. We measure angles from the positive direction of the x-axis, as usual in polar coordinates. I choose to measure from the centre outwards, so for the incoming ray the angle assigned is the opposite of what the arrow might suggest. Label the incoming ray angle b, and mirror rotation angle m. Now if you increase m by some given amount, the outgoing ray angle increases by twice as much. But if you increase b instead, the outgoing ray angle decreases by the same amount. We also need an “initial condition” of sorts:  when the mirror is horizontal (m = 0°), and the ray arrives from directly above (b = 90°), the reflected beam is also at 90°. It follows:

reflected angle  =  2m − b + 180°.
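
For example, with the values of Figure 1: 2(15°) − 165° + 180° = 45°, so the reflected ray exits towards the upper-right, as drawn.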

Now if the mirror rotates by 180°, the reflected ray completes a full 360° rotation, so is back to its original position. (We suppose the mirror is 2-sided.) If you hadn’t watched the rotation, you wouldn’t know anything had changed. But now suppose we make one side of the mirror red and the other blue, so the reflected ray takes on the colour of the closest side. Now the ray must make two complete revolutions, 720°, to get back to its original state! After one revolution it is back to the same position, but has a different colour, as the opposite side of the mirror faces the beam. Similarly, if the reflected ray is rotated by 180° in one direction, this is not the same as rotating by 180° in the opposite direction, as the colour is different. “Spinors” have these same features, except in place of red/blue their mathematical description picks up a factor of ±1.

two mirrors rotated by differing amounts
Figure 2: A two-sided coloured mirror. The incoming ray is depicted as grey, but think of it as white so its reflection is free to take on the appropriate colour. In the left image, the blue side of the mirror is facing upwards. Now slowly rotate the mirror by a half-revolution. (Actually I have drawn a slightly different angle just for variety.) The reflected ray changes angle by a full revolution, but has since turned red!

You might try animating this yourself. If you draw the rays with unit length, then the arrow for the incoming beam points from (cos b, sin b) to (0,0). The outgoing arrow points from (0,0) to -(cos(2m − b), sin(2m − b)), where a minus sign replaces the 180° term from earlier. The colour depends on whether the incoming ray is from 0 to 180° ahead of the mirror, or 0 to 180° behind. This is determined by the sign of sin(b − m). It is convenient to allow angle parameters beyond 360°, which makes no physical difference, or rather at most a change in colour, as we have learned 😀 . Below is Mathematica code I wrote, which uses slider controls for the angle parameters. The result is fun to play around with, and it helps make the angle-doubling more intuitive.

width = 0.3; height = 0.02;
(* Two-sided mirror: red underside, blue top *)
mirror = Polygon[{{-width,-height},{width,-height},{width,height},{-width,height}}, VertexColors->{Red,Red,Blue,Blue}];
(* Sliders: m = mirror angle, b = incoming ray angle; reflected ray at 2m - b + 180 degrees *)
Manipulate[Graphics[{Rotate[mirror,m],
   {Gray, Arrow[{{Cos[b],Sin[b]},{0,0}}]},  (* incoming ray *)
   {If[Sin[b-m]>0, Blue, Red], Arrow[{{0,0}, -{Cos[2m-b],Sin[2m-b]}}]}},  (* reflected ray, coloured by facing side *)
  PlotRange->{{-1,1},{-1,1}}], {{m,0},-2\[Pi],2\[Pi]}, {{b,3\[Pi]/4},0,2\[Pi]}]