Spatial gradient examples

Difficulty:   ★★★☆☆   undergraduate general relativity

Last time we discussed the “spatial gradient” or “3-gradient”, and here we follow up with two examples. Recall from before that a scalar field \Phi has gradient d\Phi, and the part of this which is orthogonal to an observer 4-velocity \mathbf u is, as a vector:

    \[^{(3)}(d\Phi)^\sharp := (d\Phi)^\sharp + \langle d\Phi,\mathbf u\rangle \mathbf u.\]

This direction has the greatest increase of \Phi, for any vector in \mathbf u’s 3-space (that is, orthogonal to \mathbf u), per length of the vector.

As an example, suppose the 4-gradient vector (d\Phi)^\sharp is a null, future-pointing vector. It can be decomposed E(\mathbf u+\boldsymbol\xi), where E := -\langle d\Phi,\mathbf u\rangle, and \boldsymbol\xi is a unit spatial vector orthogonal to \mathbf u. Physically, this gradient may be interpreted as a null wave or photon, which the observer determines to have energy (or related quantity, such as frequency) E, and to move in the spatial direction \boldsymbol\xi. The 3-gradient vector is E\boldsymbol\xi, hence the direction of relative velocity also has the steepest increase of \Phi, within the observer’s 3-space.

Suppose now (d\Phi)^\sharp is a unit, timelike, future-pointing vector, so that we may interpret it as the 4-velocity \mathbf v of a second observer. Then ^{(3)}(d\Phi)^\sharp = \mathbf v - \gamma\mathbf u, where \gamma = - \langle\mathbf u,\mathbf v\rangle is the Lorentz factor between the pair. But we also have the “relative velocity” decomposition \mathbf v = \gamma(\mathbf u + \mathbf V), where \mathbf V is the relative velocity of \mathbf v as determined in \mathbf u’s frame, as I discussed previously. Combining these, ^{(3)}(d\Phi)^\sharp = \gamma\mathbf V. Hence within the observer’s 3-space, \Phi again increases most sharply in the direction of the relative velocity.
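Both examples can be checked numerically. The following is a minimal sketch, assuming flat Minkowski spacetime with signature −+++, an observer at rest in the chosen coordinates, and sample values for E and \beta; the helper names `inner` and `three_gradient` are made up for illustration.

```python
import math

# Minkowski metric with signature (-,+,+,+), components ordered (t, x, y, z).
ETA = (-1.0, 1.0, 1.0, 1.0)


def inner(a, b):
    """Metric inner product <a, b> of two 4-vectors."""
    return sum(g * x * y for g, x, y in zip(ETA, a, b))


def three_gradient(grad, u):
    """Spatial part of a gradient vector: grad + <grad, u> u."""
    s = inner(grad, u)
    return tuple(g + s * ui for g, ui in zip(grad, u))


u = (1.0, 0.0, 0.0, 0.0)  # observer at rest in these coordinates (assumption)

# Null case: grad = E (u + xi), with xi the unit vector along x.
E = 2.0  # sample value
null_grad = (E, E, 0.0, 0.0)
null_3grad = three_gradient(null_grad, u)

# Timelike case: grad = v = gamma (u + V), with V of magnitude beta along x.
beta = 0.5  # sample value
gamma = 1.0 / math.sqrt(1.0 - beta**2)
v = (gamma, gamma * beta, 0.0, 0.0)
timelike_3grad = three_gradient(v, u)

print(null_3grad)      # components of E * xi
print(timelike_3grad)  # components of gamma * V
```

In both cases the time component cancels, leaving a vector along the x-axis: E\boldsymbol\xi in the null case and \gamma\mathbf V in the timelike case.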

timelike 1-form
Spacetime diagram, from the observer \mathbf u’s perspective. The timelike 1-form d\Phi \equiv \mathbf v^\flat is suggested by dotted blue lines, given at intervals of 1/4 for more resolution. These are orthogonal to the vector \mathbf v, in the Lorentzian sense.

The figure shows the single tangent space — think of this as the linearisation of what is happening locally over the manifold itself. The hyperplanes are numbered by \Phi, where only the differences between them are relevant, as an overall constant was not specified. Observe \mathbf v crosses four of them, spanning an interval \Delta\Phi = \langle d\Phi,\mathbf v\rangle = -1, so \Phi is the negative of \mathbf v’s proper time; see a previous post for more background. In both our examples, the scalar decreases towards the future (or can vanish in the null case), even though the gradient vectors are future-pointing. That is, the gradient vectors actually point “down” the slope! This quirk is due to our −+++ metric signature, and would apply to spacelike gradients if +−−− were used instead. This really hurt my brain, until I drew the diagram. 🙁

To construct it, consider the action of d\Phi on the axes. The horizontal axis is the relative velocity direction, with unit vector \hat{\mathbf V} := \mathbf V/\beta. One can show \langle d\Phi,\hat{\mathbf V}\rangle = \beta\gamma. Also \langle d\Phi,\mathbf u\rangle = -\gamma, but I find it easier to think of: \langle d\Phi,-\mathbf u\rangle = \gamma. These give the number of hyperplanes crossed by the unit axis vectors; you can then literally “connect the dots” since the 1-form is linear. In the figure \beta = 1/2, so \gamma \approx 1.15. (As for the 3-gradient, it vanishes in the \mathbf u direction, hence \mathbf u must cross no contours of ^{(3)}d\Phi. It would be drawn as vertical lines, with corresponding vector pointing to the right.)
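The value \beta\gamma follows from d\Phi = \mathbf v^\flat and \mathbf v = \gamma(\mathbf u + \mathbf V): since \hat{\mathbf V} is orthogonal to \mathbf u, we get \langle d\Phi,\hat{\mathbf V}\rangle = \gamma\langle\mathbf V,\hat{\mathbf V}\rangle = \beta\gamma. As a rough sketch of the numbers behind the figure, assuming \beta = 1/2 and hyperplanes drawn at intervals of 1/4 as stated in the caption (the variable names are illustrative only):

```python
import math

beta = 0.5                      # relative speed used in the figure
gamma = 1.0 / math.sqrt(1.0 - beta**2)
spacing = 0.25                  # interval of Phi between drawn hyperplanes

# Contractions of dPhi = v^flat with the unit axis vectors and with v itself.
along_V_hat = beta * gamma      # <dPhi, V_hat>
along_minus_u = gamma           # <dPhi, -u>
along_v = -1.0                  # <dPhi, v> = <v, v>

for name, value in [("V_hat", along_V_hat), ("-u", along_minus_u), ("v", along_v)]:
    print(f"{name}: change in Phi = {value:+.3f}, "
          f"hyperplanes crossed = {abs(value) / spacing:.2f}")
```

With these inputs the unit vector \hat{\mathbf V} crosses roughly 2.3 of the drawn hyperplanes, -\mathbf u crosses roughly 4.6, and \mathbf v crosses exactly four.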

Most of our discussion applies to arbitrary 1-forms, not just gradients which are termed exact 1-forms. I derived the work here independently, but the literature contains some similar material. It turns out Jantzen, Carini & Bini 1992  §2 explicitly define the “spatial gradient”, as they most appropriately call it. A few textbooks discuss scalar waves, for which the 3-gradient vector is the wave 3-vector, which is orthogonal to the wavefronts within a given frame, as discussed shortly.

Spatial gradient of a scalar

Difficulty:   ★★★☆☆   undergraduate general relativity

Suppose you have a scalar field \Phi, and at a given point in spacetime: a 4-velocity vector interpreted as an “observer”. In which direction does \Phi increase most steeply, when restricted to the observer’s local 3-dimensional space?

Last time I reviewed the gradient 1-form or covector d\Phi, and its associated gradient vector (d\Phi)^\sharp obtained by raising the index as usual. The gradient vector has been described as the direction of greatest increase in \Phi per unit length (Schutz 2009  §3.3). However this is only guaranteed when the metric is positive definite, meaning a Riemannian manifold, rather than a Lorentzian manifold as used to model spacetime.

The observer’s 4-velocity splits vectors and 1-forms into purely “time” parts parallel to \mathbf u, and purely “space” parts orthogonal to it. (Intuitively, it may help to think of a basis \{\mathbf e_\alpha\} adapted to the observer, meaning \mathbf e_0 := \mathbf u, and the \mathbf e_i vectors are orthogonal to \mathbf u, where i = 1,2,3. Then a purely spatial vector is spanned by the \mathbf e_i. Since vectors and covectors are linear, we need only specify their values on a basis set.)

Consider the tangent space at the specified point. Imagine working within the observer’s local 3-space, by which I mean the 3-dimensional subspace consisting of vectors orthogonal to \mathbf u. Label the gradient as restricted to this subspace by ^{(3)}d\Phi. On the subspace the metric has Riemannian signature, hence the corresponding vector ^{(3)}(d\Phi)^\sharp is the direction of steepest increase. We can mimic this mathematically by staying in 4 dimensions, but setting the “time” part to zero:

    \[^{(3)}d\Phi := d\Phi + \langle d\Phi,\mathbf u\rangle \mathbf u^\flat.\]

This is a 4-dimensional object, but I reuse the notation “^{(3)}” to imply it vanishes in the observer’s time direction. This “3-gradient” is the projection of d\Phi orthogonal to \mathbf u^\flat. The angle brackets signify contraction of the 1-form and vector, and the “flat” symbol denotes the 1-form obtained from \mathbf u by “lowering the index” using the metric. The vector 3-gradient is:

    \[^{(3)}(d\Phi)^\sharp = (d\Phi)^\sharp + \langle d\Phi,\mathbf u\rangle \mathbf u.\]

This follows from “raising the index” using the inverse metric g^{\mu\nu} as usual. Note that on the subspace, the inverse metric coincides with the inverse 3-metric which has components (g_{ij})^{-1}, for i,j=1,2,3. Equivalently, one can apply the spatial projector g^{\mu\nu}+u^\mu u^\nu to either d\Phi or ^{(3)}d\Phi, with the same result. This projector agrees with the inverse metric on the 3-space, and is zero on purely timelike covectors. Either way, the essential part of the process is to remove the “time” component of the gradient. I will give examples in the following post.
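The equivalence of these routes is easy to test numerically. Below is a minimal sketch, assuming Minkowski spacetime with signature −+++, a boosted observer, and arbitrary sample components for d\Phi; the function names are hypothetical.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of the Minkowski metric, signature -+++


def contract(form, vec):
    """Contraction <form, vec> of a 1-form (lower index) with a vector."""
    return sum(f * x for f, x in zip(form, vec))


def flat(vec):
    """Lower the index: vector -> 1-form."""
    return tuple(g * x for g, x in zip(ETA, vec))


def sharp(form):
    """Raise the index: 1-form -> vector (this diagonal metric is its own inverse)."""
    return tuple(g * f for g, f in zip(ETA, form))


def three_form(form, u):
    """3-gradient 1-form: form + <form, u> u^flat."""
    s = contract(form, u)
    return tuple(f + s * uf for f, uf in zip(form, flat(u)))


def project(form, u):
    """Apply the spatial projector g^{mu nu} + u^mu u^nu to a 1-form."""
    s = contract(form, u)
    return tuple(x + s * ui for x, ui in zip(sharp(form), u))


# Sample data (assumptions): a boosted observer and an arbitrary gradient.
beta = 0.6
gamma = 1.0 / math.sqrt(1.0 - beta**2)
u = (gamma, gamma * beta, 0.0, 0.0)
dPhi = (0.3, -1.2, 0.7, 2.0)

route_a = sharp(three_form(dPhi, u))        # remove the time part, then raise the index
route_b = project(dPhi, u)                  # projector applied to dPhi
route_c = project(three_form(dPhi, u), u)   # projector applied to the 3-gradient

print(route_a)
print(route_b)
print(route_c)
```

All three routes agree up to rounding, and the resulting 1-form contracts to zero with \mathbf u.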

Gradient of a scalar

Difficulty:   ★★★☆☆   undergraduate general relativity

Suppose a scalar field \Phi is defined on some region of spacetime. Its gradient d\Phi\equiv \nabla\Phi expresses the change in \Phi (that is, its derivative) in each direction. In a coordinate system, it has components:

    \[\nabla_\mu\Phi = (d\Phi)_\mu = \Phi_{,\mu} := \frac{\partial\Phi}{\partial x^\mu}.\]

d\Phi is a 1-form or covector. [Recall a 1-form is just a (0,1)-tensor. Schutz 2009 also uses the term dual vector, though I find this can lead to clumsy wording, such as the hypothetical phrase: “the vector [which is] dual to a dual vector”. Traditionally the term covariant vector has been used, meaning its components transform “covariantly” with a change of basis. 1-forms are a rigorous version of differentials, superseding the older idea of infinitesimals but using similar notation (Schutz 1980 §2.19; Spivak vol. 1 §4).] Above, “d” is called the exterior derivative, and \nabla is the covariant derivative, but when acting on a scalar these coincide. Recall a 1-form accepts a vector and returns a number. In this case, the vector is the direction of differentiation, and the output is the derivative of \Phi in that direction (where the vector’s magnitude matters also).

The 1-form d\Phi may be visualised as a set of hypersurfaces or level sets \Phi = \textrm{const}, on the manifold (MTW  §2.5–2.7, Box 4.4; Schutz 2009 §3.3). Ideally these could be spaced at intervals \Delta\Phi = 1. Given some vector \mathbf Y, the contraction:

    \[d\Phi(\mathbf Y) \equiv \langle d\Phi,\mathbf Y\rangle = Y^\mu\frac{\partial\Phi}{\partial x^\mu}\]

is visualised as the number of hypersurfaces the vector pierces, or “bongs of [a] bell” in MTW’s colourful terminology. Technically however, vectors and 1-forms exist in the (co-)tangent spaces, not extended along the manifold. At any given point, d\Phi is the linear approximation to \Phi, ignoring the constant term (MTW §2.6). Hence d\Phi is more accurately visualised as hyperplanes within the tangent space there. The diagram below shows both artistic choices. Note in two dimensions, hypersurfaces and hyperplanes are just curves and straight lines, respectively.

sphere with dtheta field
A 2-sphere with visualisations of the 1-form field d\theta, both over the entire manifold and within a single tangent space. The spacing is \Delta\theta = \pi/10.

The gradient vector is the dual to d\Phi, with components obtained by raising the index in the usual way: g^{\mu\nu}(d\Phi)_\nu. This may be elegantly written (d\Phi)^\sharp, where the “sharp” symbol is part of the “musical isomorphism” notation. While the gradient is usually first encountered as a vector, it is most naturally a 1-form, as this does not require a metric (MTW §9.4). As Schutz 2009 §3.3 explains:

…we in general cannot call a gradient a vector. We would like to identify the vector gradient as that vector pointing ‘up’ the slope, i.e. in such a way that it crosses the greatest number of contours per unit length. The key phrase is ‘per unit length’. If there is a metric, a measure of distance in the space, then a vector can be associated with a gradient. But the metric must intervene here in order to produce a vector. Geometrically, on its own, the gradient is a one-form.

But if one does not know how to compare the lengths of vectors that point in different directions, one cannot define a direction of steepest ascent…

The last line is from Schutz 1980 §2.19, where the discussion is similar. These textbooks give a superb introductory account of 1-forms, however the steepness comments are only valid for a Riemannian metric, with positive-definite signature. Consider Minkowski spacetime with coordinates (t,x,y,z). By linearity, we need only consider unit vectors. The 1-form dx has components (0,1,0,0), with (dx)^\sharp just \partial_x. These contract to give unity. If we restrict to vectors spanned by \partial_x, \partial_y and \partial_z, Schutz’ steepness comments apply. However \pm(\beta\gamma,\gamma,0,0) is also a unit spacelike vector, where \gamma = (1-\beta^2)^{-1/2}, but combines with dx to give \pm\gamma, hence crosses more intervals x = \textrm{const} than the gradient vector does. Similarly for dt, the contraction with (dt)^\sharp = -\partial_t returns -1, but with the unit timelike vector \pm(\gamma,\beta\gamma,0,0) yields \pm\gamma. Hence for a timelike 1-form, its gradient vector crosses the fewest contours (taking the absolute value) per unit length, compared to other timelike vectors only. For a null 1-form, its gradient vector lies along the hyperplanes, so crosses zero of them (MTW Figure 2.7)!
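The Minkowski counterexample can be verified in a few lines. This is a small sketch with a sample value of \beta; the helper names `norm_squared` and `contract` are illustrative only.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # Minkowski metric, signature -+++, coordinates (t, x, y, z)


def norm_squared(vec):
    """Squared norm <vec, vec> of a 4-vector."""
    return sum(g * x * x for g, x in zip(ETA, vec))


def contract(form, vec):
    """Contraction of a 1-form with a vector: the number of contours crossed."""
    return sum(f * x for f, x in zip(form, vec))


beta = 0.8  # sample value
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Spacelike 1-form dx, its gradient vector, and a boosted unit spacelike vector.
dx = (0.0, 1.0, 0.0, 0.0)
grad_x = (0.0, 1.0, 0.0, 0.0)                  # (dx)^sharp = d/dx
boosted_spacelike = (beta * gamma, gamma, 0.0, 0.0)

# Timelike 1-form dt, its gradient vector, and a boosted unit timelike vector.
dt = (1.0, 0.0, 0.0, 0.0)
grad_t = (-1.0, 0.0, 0.0, 0.0)                 # (dt)^sharp = -d/dt
boosted_timelike = (gamma, beta * gamma, 0.0, 0.0)

print(norm_squared(boosted_spacelike), contract(dx, grad_x), contract(dx, boosted_spacelike))
print(norm_squared(boosted_timelike), contract(dt, grad_t), contract(dt, boosted_timelike))
```

Both boosted vectors have unit length (squared norms +1 and −1 respectively), yet each crosses \gamma > 1 contours, compared with a magnitude of 1 for the gradient vector.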

Instead, we are left with saying the gradient vector is orthogonal to all vectors \mathbf Y on which the 1-form vanishes: \langle (d\Phi)^\sharp,\mathbf Y\rangle = 0 whenever \langle d\Phi,\mathbf Y\rangle = 0. The angle brackets mean contraction using the metric, with indices appropriately raised or lowered. Another property is the gradient vector’s squared-norm equals the 1-form’s squared-norm, which also matches the number of contours crossed:

    \[\langle (d\Phi)^\sharp,(d\Phi)^\sharp\rangle = \langle d\Phi,d\Phi\rangle = \langle d\Phi,(d\Phi)^\sharp\rangle.\]

The above statements are basically tautologies, but they help clarify what metric duality means. Incidentally, not all 1-forms arise as the “d” of a scalar, but only those termed exact (Wald  §B1). Most of this post applies also to arbitrary 1-forms \boldsymbol\alpha, for which the hyperplanes are spanned by vectors satisfying \langle\boldsymbol\alpha,\mathbf Y\rangle = 0. For many creative illustrations see MTW, including their “honeycomb” and “egg crate” analogies for 2-forms and 3-forms, and their Figure 4.5 for the 2-form d\theta\wedge d\phi. Finally, I previously reviewed contractions like d\Phi(\mathbf u) = d\Phi/d\tau, which give the rate of change of the scalar by proper time along a worldline.

Affine connection for spherical symmetry

Difficulty:   ★★★★☆   graduate

Suppose you have a spherically symmetric vector field, as in the diagram. Can we find an affine connection which transports the vectors into one-another? That is, a geometry in which they are all “parallel”?

Portion of a sphere, with vectors orthogonal to its surface
The vectors (red arrows) are clearly not parallel in the usual sense. But can we define a new connection in which they are transported into one-another?

Take Schwarzschild spacetime, in the usual coordinates (t,r,\theta,\phi). The coordinate basis vectors are \partial_t, \partial_r, \partial_\theta, and \partial_\phi. I will write these as \mathbf e_\mu, so for \mu = 1 for example, this is the vector \mathbf e_r with components (e_r)^\nu = (0,1,0,0). Recall a connection \nabla is defined by:

    \[\nabla_{\mathbf e_\mu}\mathbf e_\nu = \Gamma^\alpha_{\mu\nu}\mathbf e_\alpha,\]

where the \Gamma are the connection coefficients, also called Christoffel symbols in the specific case of the Levi-Civita connection. (Recall the Levi-Civita connection is the one inherited from the metric: it is the unique symmetric and metric-compatible connection.) For each pair (\mu,\nu), this definition is interpreted as the derivative of the \mathbf e_\nu field, in the direction \mathbf e_\mu.

Now consider an arbitrary vector field of the form:

    \[u^\mu = (A(t,r),B(t,r),0,0).\]

We would not expect the sought-for parallel transport to work for vectors with components in the \theta or \phi-directions — at least, not without imposing extra choices. In particular, the “hairy ball theorem” states no smooth, non-vanishing vector field along the 2-sphere exists: that is, within its 2-dimensional tangent bundle. For Schwarzschild spacetime, we move around a 2-sphere of constant t and r, by taking “directional derivatives” along the \theta\phi-plane. As expected, \nabla\mathbf u does not vanish, even in these directions:

    \[\nabla_{C\partial_\theta + D\partial_\phi}\mathbf u = \Big(0,0,\frac{C B(t,r)}{r},\frac{D B(t,r)}{r}\Big).\]

The offending Christoffel symbols turn out to be \Gamma_{\theta r}^\theta = 1/r and \Gamma_{\phi r}^\phi = 1/r. These arise from \nabla_{\partial_\theta}\partial_r = r^{-1}\partial_\theta and \nabla_{\partial_\phi}\partial_r = r^{-1}\partial_\phi. These quantify how the radial coordinate vector changes as you move around on a sphere.

One option is to simply define new connection coefficients for which these vanish: \tilde\Gamma_{\theta r}^\theta := 0 and \tilde\Gamma_{\phi r}^\phi := 0, and keep the remaining Christoffel symbols, in order to remain as close as possible to the metric connection. This procedure is justified, because given a frame field, any choice of smooth functions \tilde\Gamma^\alpha_{\mu\nu} yields a valid connection (Lee 2018 , Introduction to Riemannian manifolds, Lemma 4.10). We can also write this new connection as the usual (Levi-Civita) covariant derivative plus a bilinear correction:

    \[\tilde\nabla_{\mathbf v}\mathbf u := \nabla_{\mathbf v}\mathbf u - \frac{1}{r} \big(\partial_\theta\otimes d\theta\otimes dr + \partial_\phi\otimes d\phi\otimes dr\big)(\mathbf v,\mathbf u).\]

The parenthetical term is a (1,2)-tensor we interpret as accepting the vectors in the last two slots (\mathbf v in the second slot, and \mathbf u into the last), returning another vector. The correction term may also be written -\frac{1}{r}\langle dr,\mathbf u\rangle \big(\langle d\theta,\mathbf v\rangle\partial_\theta + \langle d\phi,\mathbf v\rangle\partial_\phi\big), where the angle brackets mean contraction of a 1-form and vector in this case. Intuitively, the parenthetical term just above is also a projection, returning only the angular part of the differentiation direction \mathbf v. This is the blue arrow in the original diagram. For large r, the basis vectors \partial_\theta and \partial_\phi grow very large, but the red \mathbf u vectors must adjust only by the angle rotated through, hence the 1/r multiplier. \langle dr,\mathbf u\rangle returns the radial component u^r.

As a check, \tilde\nabla_{C\partial_\theta+D\partial_\phi}\mathbf u = 0 as required. The new connection is not symmetric, because \tilde\Gamma_{r\theta}^\theta and \tilde\Gamma_{r\phi}^\phi remain non-vanishing. Hence the connection has “torsion”. I won’t write out its Riemann and Ricci tensors, but the scalar curvature is 2/r^2! At face value this violates the Einstein field equations, for which the Ricci tensor (and hence the scalar curvature) always vanishes in a vacuum. However, Einstein’s equations use the Levi-Civita connection. Curiously, the value is precisely the scalar curvature for a 2-sphere.
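The angular derivatives can also be reproduced numerically. The sketch below is one way to do it, not necessarily how the results above were obtained: it estimates the Levi-Civita coefficients of the Schwarzschild metric by finite differences, with the index order G[a][m][n] = \Gamma^a_{mn} where m is the differentiation direction, then zeroes the two offending coefficients. The mass, evaluation point and field values are sample choices.

```python
import math

M = 1.0  # mass parameter in geometric units (sample value)


def metric_diag(x):
    """Diagonal Schwarzschild metric components at x = (t, r, theta, phi)."""
    _, r, theta, _ = x
    f = 1.0 - 2.0 * M / r
    return (-f, 1.0 / f, r**2, (r * math.sin(theta))**2)


def d_metric(x, a, k, h=1e-6):
    """Central-difference estimate of d(g_aa)/d(x^k)."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    return (metric_diag(xp)[a] - metric_diag(xm)[a]) / (2.0 * h)


def christoffel(x):
    """Levi-Civita coefficients G[a][m][n] = Gamma^a_{mn} for a diagonal metric."""
    g = metric_diag(x)
    G = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    for a in range(4):
        for m in range(4):
            for n in range(4):
                total = 0.0
                if a == n:
                    total += d_metric(x, a, m)   # d_m g_{an}
                if a == m:
                    total += d_metric(x, a, n)   # d_n g_{am}
                if m == n:
                    total -= d_metric(x, m, a)   # d_a g_{mn}
                G[a][m][n] = total / (2.0 * g[a])
    return G


def angular_derivative(G, u, C, D):
    """Components of nabla_{C d_theta + D d_phi} u, for u independent of the angles."""
    v = (0.0, 0.0, C, D)
    return [sum(v[m] * G[a][m][n] * u[n] for m in range(4) for n in range(4))
            for a in range(4)]


x = (0.0, 3.0, 1.0, 0.5)   # sample point with r = 3 > 2M
u = (1.3, 0.7, 0.0, 0.0)   # sample values of (A, B, 0, 0) at that point
C, D = 2.0, -1.5           # sample direction coefficients

G = christoffel(x)
levi_civita_result = angular_derivative(G, u, C, D)

# Modified connection: zero the two offending coefficients, keep the rest.
G_tilde = [[row[:] for row in block] for block in G]
G_tilde[2][2][1] = 0.0     # tilde Gamma^theta_{theta r}
G_tilde[3][3][1] = 0.0     # tilde Gamma^phi_{phi r}
tilde_result = angular_derivative(G_tilde, u, C, D)

# Torsion component Gamma^theta_{r theta} - Gamma^theta_{theta r} of the new connection.
torsion = G_tilde[2][1][2] - G_tilde[2][2][1]

print(levi_civita_result)  # approximately (0, 0, C*B/r, D*B/r)
print(tilde_result)        # all components vanish
print(torsion)             # approximately 1/r
```

The partial-derivative term of the covariant derivative is omitted because the components A and B do not depend on \theta or \phi.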

We can also construct a symmetric connection \bar\nabla for which additionally \bar\Gamma_{r\theta}^\theta := 0 =: \bar\Gamma_{r\phi}^\phi. In the (somewhat) index-free expression:

    \[\bar\nabla_{\mathbf v}\mathbf u := \nabla_{\mathbf v}\mathbf u - \frac{2}{r} \big(\partial_\theta\otimes dr\,d\theta + \partial_\phi\otimes dr\,d\phi\big)(\mathbf v,\mathbf u),\]

where 2dr\,d\theta := dr\otimes d\theta + d\theta\otimes dr is the symmetric product, and analogously for dr\,d\phi. This connection has Ricci tensor equal to the metric in the t and r components, apart from a scalar factor 2M/r^3, and vanishing elsewhere. Its scalar curvature is 4M/r^3.

Hence we have constructed connections which parallel transport our spherically symmetric vector field around a sphere, and deviate as little as possible from the Levi-Civita connection. Neither of the new connections is “metric-compatible”: for instance 0 = \tilde\nabla_{\partial_\theta}\langle\partial_r,\partial_\theta\rangle \ne \langle\tilde\nabla_{\partial_\theta}\partial_r,\partial_\theta\rangle + \langle\partial_r,\tilde\nabla_{\partial_\theta}\partial_\theta\rangle = -r. Hence \tilde\nabla\mathbf g \ne 0. The same holds for \bar\nabla.
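The value -r can be reproduced with a short calculation. This sketch assumes the standard Schwarzschild metric components and the Levi-Civita coefficient \Gamma^r_{\theta\theta} = -(r-2M), which is not quoted above; M and r are sample values.

```python
M = 1.0   # sample mass parameter (geometric units)
r = 3.0   # sample radius with r > 2M

f = 1.0 - 2.0 * M / r
g_rr = 1.0 / f        # <d_r, d_r>
g_thth = r**2         # <d_theta, d_theta>

# Levi-Civita coefficients needed here (standard Schwarzschild values).
Gamma_th_th_r = 1.0 / r          # Gamma^theta_{theta r}
Gamma_r_th_th = -(r - 2.0 * M)   # Gamma^r_{theta theta}

# Left-hand side: derivative along theta of <d_r, d_theta> = g_{r theta} = 0.
lhs = 0.0

# Right-hand side: <nabla_theta d_r, d_theta> + <d_r, nabla_theta d_theta>.
levi_civita_rhs = g_thth * Gamma_th_th_r + g_rr * Gamma_r_th_th
tilde_rhs = g_thth * 0.0 + g_rr * Gamma_r_th_th   # tilde Gamma^theta_{theta r} := 0

print(lhs, levi_civita_rhs, tilde_rhs)
```

For the Levi-Civita connection the two terms cancel, as metric compatibility requires. Once the first coefficient is set to zero, only the second term survives, which simplifies to -r for any M.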

If you find some formulae here do not work for you, compare your convention for the connection coefficient index order, or try swapping \mathbf u and \mathbf v in the correction terms. I had problems myself, so undertook a painstaking review of my own conventions, and wrote a new page describing them. Finally, beware of coordinate basis vectors! The “vectors” \partial_\theta and \partial_\phi actually depend on all four coordinates, which is related to the so-called “second fundamental confusion of calculus”! In case of ambiguity, perhaps some should be replaced with (d\theta)^\sharp and (d\phi)^\sharp, or scalar multiples thereof. I avoided this technicality in the interests of readability. This concern only applies to coordinate systems in which the metric is non-diagonal.

Multiple images of the same supernova

Difficulty:   ★☆☆☆☆   no science: just travel, diary or musings

One of my favourite astrophysical phenomena was the appearance of the same supernova at various different times and places in the sky. The light from “Supernova Refsdal” was bent by a cluster of galaxies, whose gravity acted like a magnifying glass. This “gravitational lensing” meant that multiple images appeared.

Imagine rays of light heading out in all directions from the exploding star. Normally we would expect only one of those rays or directions to intercept a given telescope, hence the star would appear at a single location. However according to general relativity, heavy masses curve spacetime. As the rays pass through the complicated gravitational field of the galaxy cluster (which incidentally is much closer to us than the supernova), their paths are deflected.

The supernova was first observed in late 2014, as four separate images surrounding a single galaxy. They formed a very rare “Einstein cross” pattern. The galaxy had deflected the passing light. Amazingly, astrophysicists predicted another image of the same supernova would appear about a year later, in a precise location. This is because they had already mapped out the mass distribution in the cluster of galaxies, and already observed multiple copies (up to 7) of various objects. This gave the possible paths in space and time. Indeed, in late 2015 the Hubble telescope detected the supernova as predicted. Researchers also predict (rather “postdict”) the supernova would have appeared decades earlier in a different spot.

Note the video embedded above uses a little artistic license for the supernova. Another video tries to illustrate the various paths that reached us. The supernova was also featured in NASA’s Astronomy Picture of the Day, which has an excellent description.

Coronavirus and online seminars

Difficulty:   ☆☆☆☆☆   

It has been a great year for online research talks. I have listened to mathematical relativity talks from places like Poland, Vienna, and Tübingen, including by top experts, while at home in Brisbane, Australia. This is thanks to centralised coordination by Piotr Chruściel and others. I have “been to” relativity conferences in Taiwan and Belarus, and a summer school in Vienna. One talk I randomly discovered was particularly unique, by 79-year-old Yvette Kosmann-Schwarzbach on the history of the Noether theorems. (One takeaway message: many authors claimed generalisations but hadn’t read the original papers; only since the 1970s have legitimate generalisations appeared.)

University of Queensland, empty campus
The unusually empty central Great Court at the University of Queensland, on Tuesday 14th April 2020

At my university, lectures switched to online in March. Many research groups followed suit, soon afterwards. The usual stream of PhD progress talks has continued. For physics, exams have been online, with no supervision: you download the exam questions PDF when it becomes available, then upload your answers a few hours later. Overall, the research world seemed to adapt quickly. After all, people are used to working on their computers, and even video conferencing. The photo shows how empty my campus was in April, but many people have returned now.

Of course covid19 has been difficult (or devastating) for many. I don’t deny this, but am focusing on one positive outcome here. It is a rare opportunity to hear niche research talks without flying around the world and filling up the atmosphere with CO2. I hope that both this availability, and the environmental friendliness, continue.

Roger Penrose wins Nobel Prize

Roger Penrose has been awarded (half) the 2020 Nobel Prize for Physics, “for the discovery that black hole formation is a robust prediction of the general theory of relativity”. Penrose is a remarkable figure, known for his technical brilliance, communication of science to the general public, and having unique views. His main expertise is general relativity:

More than any other individual, it was Roger Penrose who originated the concepts, insights and techniques that have shaped Einstein’s general relativity as we understand and practise it today.

That is from a “biographical sketch” by Werner Israel (which appears as the last 2 pages of research paper ). It accompanies a republication of Penrose’ classic 1969 review on gravitational collapse. At that time, consensus had been building that black holes really are a thing, see for example §7.9 of   for history. I will write more on this another time.

Penrose is artistic. The review cited above contains a full-page drawing of people on rigid platforms, lowering ropes toward an event horizon. It makes a fun background slide during a talk! He also drew some optical illusion “impossible figures”, and corresponded with artist M. C. Escher. He worked on tiling problems, accessible to anyone, yet important mathematically and with physical applications too (quasicrystals). Penrose’ “conformal diagrams” depict spacetime in a way which clearly illustrates its overall structure. His graphical notation for tensors has spread into other fields including quantum computing.

Penrose is a mathematician, so it is natural to wonder how many other mathematicians have won the physics Nobel Prize. Max Born is one example apparently, though he was also a physicist. On the other hand, string theorist Ed Witten is said to be the only physicist to win a Fields Medal, the preeminent prize in mathematics. Penrose has broad interests, and seems to know a lot of physics, based on the topics in his 1100 page The Road to Reality (2007).

I do hear critique of Penrose from quantum physicists, specifically about his model with Lajos Diósi. Indeed one recent paper, coauthored by Diósi curiously enough, says the theory is largely ruled out experimentally. I will trust the consensus of specialists over the lone genius outside their main field. A different claim by Penrose and collaborators concerns circles in the cosmic microwave background, as evidence for his “conformal cyclic cosmology” model. But astrophysicists are skeptical (I heard John Barrow asked about this at one conference). I’ll lean towards Penrose when it comes to a theoretical general relativistic cosmology, but not for statistical analysis of observational data from our universe. Penrose is also criticised for his views on consciousness. Israel is euphemistic: “His views on gravitational interactions as a trigger for quantum state reduction, and on the non-algorithmic character of human intelligence have generated much discussion.”

Yet, with the above acknowledged, we should focus on Penrose’ strengths and main areas of expertise. Israel calls him a “wholly original non-conformist”, which I would not have picked from his demeanour, but helps explain the combination of his towering strengths along with more speculative ideas. One of the many things I have omitted is Penrose’ twistor theory. Personally, I am still trying to understand spinors, a prior concept, but the fact an individual can come up with their own quantum gravity theory which is admired by their peers is a huge achievement. I look forward to reading part of The Road to Reality sometime.

Notes on Bricmont 2016, “Making sense of quantum mechanics”

The book Making sense of quantum mechanics (2016) overviews de Broglie-Bohmian mechanics, and examines its broader implications for quantum mechanics as a whole. I found it a gripping read. The author, philosopher-physicist Jean Bricmont, makes clear and mostly-convincing arguments, refuting many misconceptions about the meaning of quantum mechanics. I was led to the book by the online Stanford Encyclopedia of Philosophy, which recommended it as “a very good discussion”.

The de Broglie-Bohm (dBB) theory uses the wavefunction determined by Schrödinger’s equation, as in ordinary quantum mechanics (QM). It also assumes particles have definite positions and velocities at all times. These trajectories follow the probability current determined from the wavefunction. This contrasts with the standard interpretation of QM, where particles have no definite properties until a measurement or observation is made (except for eigenstates). Both interpretations make identical predictions about the outcomes of experiments, hence there is no experimental test that can distinguish between them. However, the conceptual implications are very important.

In §7, Bricmont sets up a populist history of QM, where Einstein and Schrödinger are dismissed as out of touch, and von Neumann and John Bell prove ‘hidden variables’ theories cannot exist. Then a cheeky dismissal: “all of the above is historically wrong.” Rather, Einstein was more concerned with non-locality than indeterminism. The view that QM is not complete should not be dismissed, but is a respectable position. Also von Neumann’s proof is overstated, and the community didn’t check it (see Pinch 1977 for the “sorry history”). Yet Bell “saw the impossible done” in the dBB model, and became its strongest proponent. Hence he clearly didn’t think it contradicted his theorem: Bell’s theorem doesn’t rule out hidden variables, only local hidden variables, or something weirder (?).

The Heisenberg uncertainty principle only concerns measurement outcomes, hence does not conflict with dBB (§5.1.8). Bricmont points out “all measurements can in the end be reduced to position measurements.” For example, a Stern-Gerlach device measures spin by whether particles move up or down. Similarly, momentum can be measured by comparing the position at two different times (§5.1.4). Yet even in dBB, many properties including spin cannot have hidden variables (§5.3.4).

Challenges for the theory include uniqueness, locality, and relativity. Concerning uniqueness, there are stochastic theories with random trajectories, and also an infinite number of theories with deterministic trajectories. While all concur with experiment, dBB is claimed as the most natural (§5.4.1). Concerning locality, Bricmont states that since Bell showed “the world is nonlocal, then the nonlocality of the de Broglie–Bohm theory is a quality, not a defect.” (§7.8) “Moreover, the nonlocality is of the right type… to reproduce Bell’s results, but not more, where ‘more’ might be a nonlocal theory allowing the transmission of messages.” (§5.2.1) But non-locality is a problem for relativity. The “nonlocal causal connections proven by Bell” occur instantaneously in QM, but in relativity simultaneity is relative, so which instant should be used? However this is a problem for quantum physics generally, not just for dBB: (§5.2.2)

…the problem of a genuine Lorentz invariance… in the face of EPR–Bell experiments is probably the biggest problem that theoretical physics faces today…

It is “the deepest unrecognized problem”, at least (§5.4.1; c.f. §8.4). One attempt at a solution is to introduce a preferred foliation (§5.2.2). [I have an idea on this, but it is early days…] At least ‘delayed choice’ experiments are not an issue, because in dBB “there is no sense in which our present choices affect the past.” (§5.1.4) There do exist Bohmian quantum field theories, though uniqueness is a challenge (§5.2.2).

Bricmont provocatively claims dBB “is a theory, while ordinary quantum mechanics is not” (§5.1.9); ordinary QM is “not a physical theory” since it only predicts measurement outcomes (§5.3.5). Apparently, many philosophers require a scientific theory to be explanatory as well as descriptive. This includes realists (§3.3). Hence Bricmont calls dBB “the missing theory behind the quantum algorithm.” (§5.3.5) The Copenhagen interpretation, which emphasises the outcome of experiments, is influenced by positivism. However a strong “version of ‘logical positivism’… is almost universally rejected by philosophers of science nowadays (in part, because of the imprecision of the word ‘observable’)…” (§7.8; c.f. §8.4). This was news to me. Personally, I put much effort into conceptual understanding, so it is affirming to learn that trends are at least not opposed to this. I appreciate that dBB offers an underlying mechanism behind QM. But this does not imply it is the only deep explanation of the principles of QM, most obviously if ‘reality’ does not in fact work this way. 🙂

The mere existence of dBB refutes some popular claims about QM, says Bricmont: that QM ends determinism, that observers are special, and that QM can’t be understood. Yet “its main virtue is to clarify our ideas.” (§5.4.2) Bell wrote:

Should it not be taught, not as the only way, but as an antidote to the prevailing complacency? To show us that vagueness, subjectivity, and indeterminism, are not forced on us by experimental facts, but by deliberate theoretical choice?

Personally, my biggest motivation for learning de Broglie-Bohm theory was its definite velocities. In relativity, physical measurements depend on the observer. An observer’s velocity determines how to split spacetime, along with any tensors on it, into separate space and time parts. However the quantum hydrodynamics formulation also involves velocities, and is closer to the mainstream interpretation of QM than dBB (§5.4.1 cites some references). I also wonder how gravity might couple to each approach. dBB naturally suggests a particle’s exact location might gravitate, whereas the hydrodynamics view might suggest the entire wavefunction gravitates. Then, the theories would predict different experimental outcomes after all. Either way, quantum mechanics is now feeling less mysterious and more accessible, so Bell and Bricmont would be pleased.

Quantum and relativity are not completely dissimilar

The historical connections between relativity and quantum mechanics are stronger than I had realised.

Most physicists have heard of Louis de Broglie, the Nobel Prize winner who was an inspiration for the celebrated Schrödinger equation, the basic equation in quantum mechanics. However, he achieved more than just the de Broglie wavelength (which one lecturer described as “the distance at which the wavelike nature of particles becomes apparent”). Requiring consistency with relativity (in technical terms, Lorentz covariance) implies that de Broglie waves with a group velocity v must have a phase velocity c^2/v, where c is the speed of light. Hence de Broglie waves, which are a precursor of Schrödinger’s wavefunction, are solidly grounded in relativity. Tsamparlis 2019 §18.11.2 motivates them clearly.
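As a rough numerical check of the c^2/v statement, here is a minimal sketch. It assumes the standard relations E = \hbar\omega and p = \hbar k, so that the phase velocity \omega/k equals E/p and the group velocity d\omega/dk equals dE/dp. The function names, the sample speed 0.6 and the units (c = 1, m = 1) are my own illustrative choices, not taken from Tsamparlis.

```python
import math

def energy(p, m=1.0, c=1.0):
    """Relativistic energy E = sqrt((p c)^2 + (m c^2)^2)."""
    return math.hypot(p * c, m * c**2)

def de_broglie_velocities(v, m=1.0, c=1.0):
    """Return (phase velocity, group velocity) for a particle of speed v.

    Assumes E = hbar*omega and p = hbar*k, so omega/k = E/p and
    d(omega)/dk = dE/dp. The derivative is estimated numerically.
    """
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    p = gamma * m * v
    v_phase = energy(p, m, c) / p
    dp = 1e-6 * p  # small step for a central finite difference
    v_group = (energy(p + dp, m, c) - energy(p - dp, m, c)) / (2.0 * dp)
    return v_phase, v_group

v_phase, v_group = de_broglie_velocities(0.6)
print(v_phase, v_group)   # phase velocity near 1/0.6, group velocity near 0.6
print(v_phase * v_group)  # close to c^2 = 1
```

The group velocity reproduces the particle speed, while the phase velocity exceeds c. This is not a problem for causality, since the phase carries no signal.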

Another point, which is well known, is that Schrödinger first tried a relativistic wave equation: the Klein-Gordon equation. Not having success, he settled for the Schrödinger equation, which is its non-relativistic (slow-speed) limit. Today, the Klein-Gordon equation is viewed as an accurate description of some particles, but subject to strong limitations. Greiner 1990 §1 is an oft-cited textbook here. [Edit: I am referring to the single-particle interpretation.] (Personally, I wonder if some of those limitations can be pushed back…)

Another connection is that Einstein made greater contributions to quantum mechanics than most people realise. It is relativity that Einstein is justly famous for, but apparently he should also be credited as one of the founders of quantum physics. (I must read the popular history book Einstein and the quantum.) OK, so his Nobel prize was actually for a quantum effect. But the typical view has been that Einstein didn’t really understand quantum physics, and that he was out of touch: clinging to an obsolete view of reality. But now some are revising this view, claiming Einstein’s concerns about non-locality, determinism, and whether quantum mechanics is complete or needs additions, are respectable intellectual positions. Irrespective of whether one agrees with him, Einstein had a unique and insightful perspective.

Historically, special relativity was published in 1905, whereas quantum mechanics was developed in the 1920s, so it is not unexpected that the former influenced the latter. (Today, the influence should be a two-way street, of course.) The historical connections mentioned above suggest the two theories are more similar than I had realised, or at least, less dissimilar. This gives me increased hope that quantum physics and general relativity can be more fully reconciled, which has been the physics dream for a century now.

Relative velocity in general relativity

Suppose we have two 4-velocity vectors \mathbf u and \mathbf v at the same point in curved spacetime. (This avoids complications such as parallel transport. Physically, think of the two objects as not necessarily overlapping, but close enough that we can neglect curvature etc.) We can calculate the relative velocity each determines of the other, which is not \mathbf u-\mathbf v.

Consider firstly inertial frames in Minkowski spacetime. In coordinates (t,x,y,z) adapted to some observer \mathbf v, the 4-velocity components of a different observer \mathbf u satisfy:

    \[u^\mu = \frac{dx^\mu}{d\tau} = \frac{dt}{d\tau}\frac{dx^\mu}{dt} = \gamma(1,\beta_x,\beta_y,\beta_z).\]

Here \tau is \mathbf u’s proper time, \gamma := -\mathbf u\cdot\mathbf v := -g_{\mu\nu}u^\mu v^\nu is the Lorentz factor as I have discussed previously, and the \beta_i are the components of the relative velocity along the coordinate directions (in units where c = 1). This calculation is inspired by Tsamparlis 2019 §6.2.
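To make the component formula concrete, here is a small sketch in plain Python. It assumes signature (-,+,+,+) and units with c = 1, consistent with \gamma = -\mathbf u\cdot\mathbf v above. The helper names and the sample \beta values are hypothetical, chosen only for illustration.

```python
import math

def minkowski_dot(a, b):
    """Inner product with signature (-,+,+,+), in units where c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def four_velocity(beta):
    """Components gamma*(1, beta_x, beta_y, beta_z) in v's rest coordinates."""
    speed_sq = sum(b * b for b in beta)
    if speed_sq >= 1.0:
        raise ValueError("relative speed must be below 1 (the speed of light)")
    gamma = 1.0 / math.sqrt(1.0 - speed_sq)
    return tuple(gamma * comp for comp in (1.0, *beta))

v = (1.0, 0.0, 0.0, 0.0)            # the observer whose coordinates we use
u = four_velocity((0.3, 0.4, 0.0))  # hypothetical sample with overall speed 0.5

print(minkowski_dot(u, u))   # close to -1: u is a unit timelike vector
print(-minkowski_dot(u, v))  # Lorentz factor, close to 1/sqrt(0.75)
print(u[0])                  # dt/dtau, the same number
```

The last two lines agree because, in these coordinates, contracting with -\mathbf v simply picks out the time component of \mathbf u.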

With a view to generalisation, we re-express the displayed formula above using vectors in place of coordinate components: \mathbf u = \gamma(\mathbf v+\mathbf u_\textrm{rel}). This is more elegant, and explicitly tensorial. The reader may find better notation than \mathbf u_\textrm{rel}, but this is the relative velocity of \mathbf u as determined in \mathbf v’s frame. Rearranging,

    \[\boxed{\mathbf u_\textrm{rel} = \gamma^{-1}\mathbf u - \mathbf v.}\]

This vector lies in the local 3-space of \mathbf v, since \mathbf v\cdot\mathbf u_\textrm{rel} = 0, so in particular \mathbf u_\textrm{rel} is spatial. It has length \beta, which is the overall relative speed, and satisfies \gamma = (1-\beta^2)^{-1/2}. If you want, for \beta\ne 0 there is also a decomposition \mathbf u_\textrm{rel} = \beta\hat{\mathbf n}, where \hat{\mathbf n} is a unit vector. Conversely, the relative velocity of \mathbf v with respect to \mathbf u is \mathbf v_\textrm{rel} = \gamma^{-1}\mathbf v - \mathbf u. This also has length \beta, but lies in \mathbf u’s 3-space. Why is \mathbf u_\textrm{rel} \ne -\mathbf v_\textrm{rel} unlike in Newtonian physics, aside from the trivial case \mathbf u = \mathbf v? The two vectors lie in different 3-spaces, those of \mathbf v and \mathbf u respectively (Tsamparlis §6.4). But with the appropriate Lorentz boost map, such an identity is recovered (Jantzen+ 1992 §4).
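The properties just listed can be checked numerically. The sketch below assumes Minkowski components with signature (-,+,+,+) and c = 1. The two sample 4-velocities are hypothetical (speeds 0.6 along x and 0.8 along y relative to some third frame), and the function names are mine.

```python
import math

def mdot(a, b):
    """Minkowski inner product, signature (-,+,+,+)."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def relative_velocity(a, b):
    """Relative velocity of 4-velocity a as determined by observer b: a/gamma - b."""
    gamma = -mdot(a, b)
    return tuple(ai / gamma - bi for ai, bi in zip(a, b))

def boosted(beta, axis):
    """Unit timelike vector moving with speed beta along spatial axis 1, 2 or 3."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    comps = [g, 0.0, 0.0, 0.0]
    comps[axis] = g * beta
    return tuple(comps)

u = boosted(0.6, axis=1)  # hypothetical sample observer
v = boosted(0.8, axis=2)  # hypothetical sample observer

gamma = -mdot(u, v)
u_rel = relative_velocity(u, v)  # lies in v's 3-space
v_rel = relative_velocity(v, u)  # lies in u's 3-space

print(mdot(v, u_rel), mdot(u, v_rel))              # both close to 0
print(mdot(u_rel, u_rel), 1.0 - 1.0 / gamma**2)    # both equal beta^2
print(tuple(a + b for a, b in zip(u_rel, v_rel)))  # not the zero vector
```

The last line shows that the sum \mathbf u_\textrm{rel} + \mathbf v_\textrm{rel} = (\gamma^{-1}-1)(\mathbf u+\mathbf v) does not vanish unless \gamma = 1.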

All the vector formulae above transfer unchanged to curved spacetime, for 4-velocities at the same event, including worldlines with acceleration. This can be justified using local inertial coordinates. While the formulae do appear in the literature (Jantzen+ 1992; Bini 2014 §6, etc.), the topic of observer measurements in general is not widely promoted. I recall two separate conversations with senior relativists who were unfamiliar with use of the Lorentz factor in a curved spacetime context.

In contrast, one quantity which should not be naively ported across from special relativity is acceleration. In curved spacetime, the 4-acceleration \nabla_{\mathbf u}\mathbf u of a particle requires the covariant derivative, which depends on the connection. Relative acceleration between observers is more complicated, as it depends on one’s choice of affine connection, for which there are various natural options, for instance Fermi-Walker transport or co-rotation with an observer’s frame; see e.g. Jantzen, Carini & Bini (Jantzen+ 1992; Jantzen+ 1995; Jantzen+ 2013 draft).

[Finally, the expression dt/d\tau = \gamma given near the start bothered me at first, because time-dilation is mutual, so one might equally argue a case for \gamma^{-1}. But the key point is, the derivative occurs along the direction of \mathbf u, not \mathbf v. Another way to check the expression is to write dt/d\tau = dt(\mathbf u) = -\mathbf v^\flat(\mathbf u) = \gamma. This is a contraction of the 1-form dt with the vector \mathbf u, as I explained previously. The “flat” symbol just means dt is the 1-form dual to -\mathbf v. Conversely, along \mathbf v we have d\tau/dt = -\mathbf u^\flat(\mathbf v) = \gamma, and this is not a contradiction!]