Reflections On Relativity

Contents

Preface

1. First Principles
   1.1 From Experience to Spacetime
   1.2 Systems of Reference
   1.3 Inertia and Relativity
   1.4 The Relativity of Light
   1.5 Corresponding States
   1.6 A More Practical Arrangement
   1.7 Staircase Wit
   1.8 Another Symmetry
   1.9 Null Coordinates

2. A Complex of Phenomena
   2.1 The Spacetime Interval
   2.2 Force Laws and Maxwell's Equations
   2.3 The Inertia of Energy
   2.4 Doppler Shift for Sound and Light
   2.5 Stellar Aberration
   2.6 Möbius Transformations of the Night Sky
   2.7 The Sagnac Effect
   2.8 Refraction Between Moving Media
   2.9 Accelerated Travels
   2.10 The Starry Messenger
   2.11 Thomas Precession

3. Several Valuable Suggestions
   3.1 Postulates and Principles
   3.2 Natural and Violent Motions
   3.3 De Mora Luminis
   3.4 Stationary Paths
   3.5 A Quintessence of So Subtle a Nature
   3.6 The End of My Latin
   3.7 Zeno and the Paradox of Motion
   3.8 A Very Beautiful Day
   3.9 Constructing the Principles

4. Weighty Arguments
   4.1 Immovable Spacetime
   4.2 Inertial and Gravitational Separations
   4.3 Free-Fall Equations
   4.4 Force, Curvature, and Uncertainty
   4.5 Conventional Wisdom
   4.6 The Field of All Fields
   4.7 The Inertia of Twins
   4.8 The Breakdown of Simultaneity

5. Extending the Principle
   5.1 Vis Inertiae
   5.2 Tensors, Contravariant and Covariant
   5.3 Curvature, Intrinsic and Extrinsic
   5.4 Relatively Straight
   5.5 Schwarzschild Metric from Kepler's 3rd Law
   5.6 The Equivalence Principle
   5.7 Riemannian Geometry
   5.8 The Field Equations

6. Ist Das Wirklich So?
   6.1 An Exact Solution
   6.2 Anomalous Precession
   6.3 Bending Light
   6.4 Radial Paths in a Spherically Symmetrical Field
   6.5 Intersecting Orbits
   6.6 Ideal Clocks in Arbitrary Motion
   6.7 Acceleration in Schwarzschild Coordinates
   6.8 Sources in Motion

7. Cosmology
   7.1 Is the Universe Closed?
   7.2 The Formation and Growth of Black Holes
   7.3 Falling Into and Hovering Near A Black Hole
   7.4 Curled-Up Dimensions
   7.5 Packing Universes In Spacetime
   7.6 Cosmological Coherence
   7.7 Boundaries and Symmetries
   7.8 Global Interpretations of Local Experience

8. The Secret Confidence of Nature
   8.1 Kepler, Napier, and the Third Law
   8.2 Newton's Cosmological Queries
   8.3 The Helen of Geometers
   8.4 Refractions On Relativity
   8.5 Scholium
   8.6 On Gauss' Mountains
   8.7 Strange Meeting
   8.8 Who Invented Relativity?
   8.9 Paths Not Taken

9. The Relativistic Topology
   9.1 In The Neighborhood
   9.2 Up To Diffeomorphism
   9.3 Higher-Order Metrics
   9.4 Spin and Polarization
   9.5 Entangled Events
   9.6 Von Neumann's Postulate and Bell's Freedom
   9.7 The Gestalt of Determinism
   9.8 Quaedam Tertia Natura Abscondita
   9.9 Locality and Temporal Asymmetry
   9.10 Spacetime Mediation of Quantum Interactions

Conclusion

Appendix: Mathematical Miscellany

Bibliography

1.1 From Experience to Spacetime

   I might revel in the world of intelligibility which still remains to me, but although I have an idea of this world, yet I have not the least knowledge of it, nor can I ever attain to such knowledge with all the efforts of my natural faculty of reason. It is only a something that remains when I have eliminated everything belonging to the senses… but this something I know no further… There must here be a total absence of motive - unless this idea of an intelligible world is itself the motive… but to make this intelligible is precisely the problem that we cannot solve.
                                                                Immanuel Kant

We ordinarily take for granted the existence through time of objects moving according to fixed laws in three-dimensional space, but this is a highly abstract model of the objective world, far removed from the raw sense impressions that comprise our actual experience. This model may be consistent with our sense impressions, but it certainly is not uniquely determined by them. For example, Ptolemy and Copernicus constructed two very different conceptual models of the heavens based on essentially the same set of raw sense impressions. Likewise Weber and Maxwell synthesized two very different conceptual models of electromagnetism to account for a single set of observed phenomena. The fact that our raw sense impressions and experiences are (at least nominally) compatible with widely differing concepts of the world has led some philosophers to suggest that we should dispense with the idea of an "objective world" altogether, and base our physical theories on nothing but direct sense impressions, all else being merely the products of our imaginations. Berkeley expressed the positivist identification of sense impressions with objective existence by the famous phrase "esse est percipi" (to be is to be perceived).

However, all attempts to base physical theories on nothing but raw sense impressions, avoiding arbitrary conceptual elements, invariably founder at the very start, because we have no sure means of distinguishing sense impressions from our thoughts and ideas. In fact, even the decision to make such a distinction represents a significant conceptual choice, one that is not strictly necessary on the basis of experience. The process by which we, as individuals, learn to recognize sense impressions induced by an external world, and to distinguish them from our own internal thoughts and ideas, is highly complicated, and perhaps ultimately inexplicable. As Einstein put it (paraphrasing Kant), “the eternal mystery of the world is its comprehensibility”. Nevertheless, in order to examine the epistemological foundations of any physical theory, we must give some consideration to how the elements of the theory are actually derived from our raw sense impressions, without automatically interpreting them in conventional terms. On the other hand, if we suppress every pre-conceived notion, including ordinary rules of reasoning, we can hardly hope to make any progress. We must choose a level of abstraction deep enough to give a meaningful perspective, but not so deep that it can never be connected to conventional ideas.

As an example of a moderately abstract model of experience, we might represent an idealized observer as a linearly ordered sequence of states, each of which is a function of the preceding states and of a set of raw sense impressions from external sources. This already entails two profound choices. First, it is a purely passive model, in the sense that it does not invoke volition or free will. As a result, all conditional statements in this model must be interpreted only as correlations (as discussed more fully in section 3.2), because without freedom it is meaningless to talk about the different consequences of alternate hypothetical actions. Second, by stipulating that the states are functions of the preceding but not the subsequent states we introduce an inherent directional asymmetry to experience, even though the justification for this is far from clear. Still another choice must be made as to whether the sequence of states and experiences is continuous or discrete. In either case we can parameterize the sequence by a variable λ, and for the sake of definiteness we might represent each state S(λ) and the corresponding sense impressions E(λ) by strings of binary bits.

Now, because of the mysterious comprehensibility of the world, it may happen that some functions of S are correlated with some functions of E. (Since this is a passive model by assumption, we cannot assert anything more than statistical correlations, because we do not have the freedom to arbitrarily vary S and determine the resulting E, but in principle we could still passively encounter enough variety of states and experiences to infer the most prominent correlations.) These most primitive correlations are presumably “hard-wired” into higher-level categories of senses and concepts (i.e., state variables), rather than being sorted out cognitively. In terms of these higher-level variables we might find that over some range of λ the sense impressions E(λ) are strictly correlated with three functions θ, ϕ, α of the state S(λ), which change only incrementally from one state to the next. Also, we may find that E is only incrementally different for incremental differences in θ, ϕ, α (independent of the prior values of those functions), and that this is the smallest and simplest set of functions with this property. Finally, suppose the sense impressions corresponding to a given set of values of the state functions are identical if the values of those functions are increased or decreased by some constant. This describes roughly how an abstract observer might infer an orientation space along with the associated modes of interaction. In conventional terms, the observer infers the existence of external objects which induce a particular set of sense impressions depending on the observer’s orientation. (Of course, this interpretation is necessarily conjectural; there may be other, perhaps more complex, interpretations that correspond as well or better with the observer’s actual sequence of experiences.) At some point the observer may begin to perceive deviations from the simple three-variable orientation model, and find it necessary to adopt a more complicated conceptual model in order to accommodate the sequence of sense impressions.
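The passive model just described can be made concrete with a small sketch (illustrative Python; the bit widths and the particular update rule are arbitrary placeholders, not drawn from the text): the observer is nothing but a sequence of states, each a deterministic function of the preceding state and the current raw sense impressions.

    # A toy observer: states S(lambda) and impressions E(lambda) are bit strings
    # (here, 16-bit integers); each state depends only on the preceding state
    # and the current impressions, never on subsequent ones.
    def next_state(prev_state: int, impressions: int) -> int:
        # Any deterministic function of (past state, impressions) fits the model;
        # this shift-and-XOR rule is purely illustrative.
        return ((prev_state << 1) ^ impressions) & 0xFFFF

    state = 0
    for impressions in [0b1010, 0b1100, 0b0011]:   # E(lambda) for successive lambda
        state = next_state(state, impressions)      # S(lambda) is a function of the past
        print(f"{state:016b}")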
It remains true that the simple orientation model applies over sufficiently small ranges of states, but the sense impressions corresponding to each orientation may vary as a function of three additional state variables, which in conventional terms represent the spatial position of the observer. Like the orientation variables, these translation variables, which we might label x, y, and z, change only incrementally from one state to the next, but unlike the orientation variables there is no apparent periodicity. Note that the success of this process of induction relies on a stratification of experiences, allowing the orientation effects to be discerned first, more or less independent of the translation effects. Then, once the orientation model has been established, the relatively small deviations from it (over small ranges of the state variable) could be interpreted as the effects of translatory motion. If not for this stratification (either in magnitude or in some other attribute), it might never be possible to infer the distinct sources of variation in our sense impressions. (On a more subtle level, the detailed metrical aspects of these translation variables will also be found to differ from those of the orientation variables, but only after quantitative units of measure and coordinates have been established.)

Another stage in the development of our hypothetical observer might be prompted by the detection of still more complicated variations in the experiential attributes of successive states. The observer may notice that while most of the orientation space is consistent with a fixed position, some particular features of their sense impressions do not maintain their expected relations to the other features, and no combination of the observer’s translation and orientation variables can restore consistency. The inferred external objects of perception can no longer be modeled on the premise that their relations with respect to each other are unchanging. Significantly, the observer may notice that some features vary as would be expected if the observer’s own positional state had changed in one way, whereas other features vary as would be expected if the observer’s position had changed in a different way. From this recognition the observer concludes that, just as he himself can translate through the space, so also can individual external objects, and the relations are reciprocal. Thus, to each object we now assign an independent set of translation coordinates for each state of the observer.

In so doing we have made another important conceptual choice, namely, to regard "external objects" as having individual identities that persist from one state to the next. Other interpretations are possible. For example, we could account for the apparent motion of objects by supposing that one external entity simply ceases to exist, and another similar entity in a slightly different position comes into existence. According to this view, there would be no such thing as motion, but simply a sequence of arrangements of objects with some similarities. This may seem obtuse, but according to quantum mechanics it actually is not possible to unambiguously map the identities of individual elementary particles (such as electrons) from one event to another (because their wave functions overlap). Thus the seemingly innocuous assumption of continuous and persistent identities for material objects through time is actually, on some level, demonstrably false.
However, on the macroscopic level, physical objects do seem to maintain individual identities, or at least it is possible to successfully model our sense impressions based on the assumption of persistent identities (because the overlaps between wave functions are negligible), and this success is the justification for introducing the concept of motion for the objects of experience.

The conceptual model of our hypothetical observer now involves something that we may call distance, related to the translational state variables, but it’s worth noting that we have no direct perception of distances between ourselves and the assumed external objects, and even less between one external object and another. We have only our immediate sense impressions, which are understood to be purely local interactions, involving signals of some kind impinging on our senses. We infer from these signals a conceptual model of space and time within which external objects reside and move. This model actually entails two distinct kinds of extent, which we may call distance and length. An object, consisting of a locus of sense impressions that maintains a degree of coherence over time, has a spatial length, as do the paths that objects may follow in their motions, but the conceptual model of space also allows us to conceive of a distance between two objects, defined as the length of the shortest possible path between them. The task of quantifying these distances, and of relating the orientation variables with the translation variables, then involves further assumptions.

Since this is a passive model, all changes are strictly known only as a function of the single state variable, but we imagine other pseudo-independent variables based on the observed correlations. We have two means of quantifying spatial distances. One is by observing the near coincidence of one or more stable entities (measuring rods) with the interval to be quantified, and the other is to observe the change in the internal state variable as an object of stable speed moves from one end of the interval to the other. Thus we can quantify a spatial interval in terms of some reference spatial interval, or in terms of the associated temporal interval based on some reference state of motion. We identify these references purely by induction based on experience.

Combining the rotational symmetries and the apparent translational distances that we infer from our primary sense impressions, we conventionally arrive at a conception of the external world that is, in some sense, the dual of our subjective experience. In other words, we interpret our subjective experience as a one-dimensional temporally-ordered sequence of events, whereas we conceive of "the objective world now" corresponding to a single perceived event as a three-dimensional expanse of space, as illustrated below:

In this way we intuitively conceive of time and space as inherently perpendicular dimensions, but complications arise if we posit that each event along our subjective path resides in, and is an element of, an objective world. If the events along any path are discrete, then we might imagine a simple sequence of discrete "instantaneous worlds":

One difficulty with this arrangement is that it isn't clear how (or whether) these worlds interact with each other. If we regard each "instant" as a complete copy of the spatial universe, separate from every other instant, then there seems to be no definite way to identify an object in one world with "the same" object in another, particularly considering qualitatively identical objects such as electrons. If we have two electrons assigned the labels A and B in one instant of time, and if we find two electrons in the next instant of time, we have no certain way of deciding which of them was the "A" electron from the previous instant. (In fact, we cannot even map the spatial locations of one instant to "the same" locations in any other instant.) This illustrates how the classical concept of motion is necessarily based on the assumption of persistent identities of objects from one instant to another.

Since it does seem possible (at least in the classical realm) to organize our experiences in terms of individual objects with persistent and unambiguous identities over time, we may be led to suspect that the existence of an individual object in any one instant must be, in some sense, connected to or contiguous with its existence in neighboring instants. If these objects are the constituents of "the world", this suggests that space itself at any "instant" is continuous with the spaces of neighboring instants. This is important because it implies a definite connectivity between neighboring world-spaces, and this, as we'll see, places a crucial constraint on the relativity of motion.

Another complication concerns the relative orderings of world-instants along different paths. Our schematic above implied that the "instantaneous worlds" are well-ordered in the sense that they are encountered in the same order along every individual's path, but of course this need not be the case. For example, we could equally well imagine an arrangement in which the "instantaneous worlds" are skewed, so that different individuals encounter them in different orders, as illustrated below.

The concept of motion assumes the world can be analyzed in two different ways, first as the union of a set of mutually exclusive "events", and second as a set of "objects" each of which participates in an ordered sequence of events. In addition to this ordering of events encountered by each individual object, we must also assume both a co-lateral ordering of the events associated with different objects, and a transverse ordering of events from one object to another. These three kinds of orderings are illustrated schematically below.

This diagram suggests that the idea of motion is actually quite complex, even in this simple abstract model. Intuitively we regard motion as something like the derivative of the spatial "position" with respect to "time", but we can't even unambiguously define the distance between two worldlines, because it depends on how we correlate the temporal ordering along one line to the temporal ordering along the other. Essentially our concept of motion is overly ambitious, because we want it to express the spatial distance from the observer to the object for each event along the observer's worldline, but the intervals from one worldline to another are not confined to the worldlines themselves, so we have no definite way of assigning those intervals to events along our worldline. The best we can do is correlate all the intervals from a particular point on the observer's worldline to the object's worldline.

When we considered everything in terms of the sense impressions of just a single observer this was not an issue, since only one parameterization was needed to map the experiences of that observer, interpreted solipsistically. Any convenient parameterization was suitable. When we go on to consider multiple observers and objects we can still allow each observer to map his experiences and internal states using the most convenient terms of reference (which will presumably include his own state-index as the temporal coordinate), but now the question arises as to how all these private coordinate systems are related to each other. To answer this question we need to formalize our parameterizations into abstract systems of coordinates, and then consider how the coordinates of any given event with respect to one system are related to the coordinates of the same event with respect to another system. This is discussed in the next section.

Considering how far removed from our raw sense impressions is our conceptual model of the external world, and how many unjustified assumptions and interpolations are involved in its construction, it’s easy to see why some philosophers have advocated the rejection of all conceptual models. However, the fact remains that the imperative to reconcile our experience with some model of an objective external world has been one of the most important factors guiding the development of physical theories. Even in quantum mechanics, arguably the field of physics most resistant to complete realistic reconciliation, we still rely on the "correspondence principle", according to which the observables of the theory must conform to the observables of classical realistic models in the appropriate limits. Naturally our interpretations of experience are always provisional, being necessarily based on incomplete induction, but conceptual models of an objective world have proven (so far) to be indispensable.

1.2 Systems of Reference

   Any one who will try to imagine the state of a mind conscious of knowing the absolute position of a point will ever after be content with our relative knowledge.
                                                                James Clerk Maxwell, 1877

There are many theories of relativity, each of which can be associated with some arbitrariness in our descriptions of events. For example, suppose we describe the spatial relations between stationary particles on a line by assigning a real-valued coordinate to each particle, such that the distance between any two particles equals the difference between their coordinates. There is a degree of arbitrariness in this description due to the fact that all the coordinates could be increased by some arbitrary constant without affecting any of the relations between the particles. Symbolically this translational relativity can be expressed by saying that if x is a suitable system of coordinates for describing the relations between the particles, then so is x + k for any constant k.

Likewise if we describe the spatial relations between stationary particles on a plane by assigning an ordered pair of real-valued coordinates to each particle, such that the squared distance between any two particles equals the sum of the squares of the differences between their respective coordinates, then there is a degree of arbitrariness in the description (in addition to the translational relativity of each individual coordinate) due to the fact that we could rotate the coordinates of every particle by an arbitrary constant angle without affecting any of the relations between the particles. This relativity of orientation is expressed symbolically by saying that if (x,y) is a suitable system of coordinates for describing the positions of particles on a plane, then so is (ax − by, bx + ay) where a² + b² = 1.

These relativities are purely formal, in the sense that they are tautological consequences of the premises, regardless of whether they have any physical applicability. Our first premise was that it’s possible to assign a single real-valued coordinate to each particle on a line such that the distance between any two particles equals the difference between their coordinates. If this premise is satisfied, the invariance of relations under coordinate transformations from x to x + k follows trivially, but if the pairwise distances between three given particles were, say, 5, 3, and 12 units, then no three numbers could be assigned to the particles such that the pairwise differences equal the distances. This shows that the n(n−1)/2 pairwise distances between n particles cannot be independent of each other if those distances can be encoded unambiguously by just n coordinates in one dimension or, more generally, by kn coordinates in k dimensions. A suitable system of coordinates in one dimension exists only if the distances between particles satisfy a very restrictive condition. Letting d(A,B) denote the signed distance from A to B, the condition that must be satisfied is that for every three particles A, B, C we have d(A,B) + d(B,C) + d(C,A) = 0. Of course, this is essentially the definition of co-linearity, but we have no a priori reason to expect this definition to have any applicability in the world of physical objects. The fact that it has wide applicability is a non-trivial aspect of our experience, albeit one that we ordinarily take for granted.

Likewise for particles in a region of three dimensional space the premise that we can assign three numbers to each particle such that the squared distance between any two particles equals the sum of the squares of the differences between their respective coordinates is true only under a very restrictive condition, because there are only 3n degrees of freedom in the n(n−1)/2 pairwise distances between n particles. Just as we found relativity of orientation for the pair of spatial coordinates x and y, we also find the same relativity for each of the pairs x,z and y,z in three dimensional space. Thus we have translational relativity for each of the four coordinates x, y, z, t, and we have rotational relativity for each pair of spatial coordinates (x,y), (x,z), and (y,z). This leaves the pairs of coordinates (x,t), (y,t) and (z,t). Not surprisingly we find that there is an analogous arbitrariness in these coordinate pairs, which can be expressed (for the x,t pair) by saying that the relations between the instances of particles on a line as a function of time are unaffected if we replace the x and t coordinates with ax − bt and −bx + at respectively, where a² − b² = 1. These transformations (rotations in the x,t plane through an imaginary angle), which characterize the theory of special relativity, are based on the premise that it is possible to assign pairs of values, x and t, to each instance of each particle on the x axis such that the squared spacetime distance equals the difference between the squares of the differences between the respective coordinates.

Each of the above examples represents an invariance of physically measurable relations under certain classes of linear transformations. Extending this idea, Einstein’s general theory of relativity shows how the laws of physics, suitably formulated, are invariant under an even larger class of transformations of space and time coordinates, including non-linear transformations, and how these transformations subsume the phenomena of gravity. In general relativity the metrical properties of space and time are not constant, so the simple premises on which we based the primitive relativities described above turn out not to be satisfied globally. However, it remains true that those simple premises are satisfied locally, i.e., over sufficiently small regions of space and time, so they continue to be of fundamental importance.
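These formal relativities can be checked numerically. The sketch below (illustrative Python, not from the text; the coordinate values and angles are arbitrary choices) parameterizes a rotation by a = cos θ, b = sin θ and a boost by a = cosh ϕ, b = sinh ϕ, and verifies that the corresponding quadratic forms are preserved.

    import math

    def rotate(x, y, theta):
        a, b = math.cos(theta), math.sin(theta)      # a^2 + b^2 = 1
        return a*x - b*y, b*x + a*y

    def boost(x, t, phi):
        a, b = math.cosh(phi), math.sinh(phi)        # a^2 - b^2 = 1
        return a*x - b*t, -b*x + a*t

    # Two points/events
    (x1, y1, t1), (x2, y2, t2) = (2.0, 5.0, 1.0), (-3.0, 1.0, 4.0)
    dx, dy, dt = x2 - x1, y2 - y1, t2 - t1

    # A rotation preserves the squared Euclidean distance dx^2 + dy^2 ...
    (x1r, y1r), (x2r, y2r) = rotate(x1, y1, 0.7), rotate(x2, y2, 0.7)
    print(dx**2 + dy**2, (x2r - x1r)**2 + (y2r - y1r)**2)   # equal

    # ... while a boost preserves the squared spacetime interval dx^2 - dt^2.
    (x1b, t1b), (x2b, t2b) = boost(x1, t1, 0.3), boost(x2, t2, 0.3)
    print(dx**2 - dt**2, (x2b - x1b)**2 - (t2b - t1b)**2)   # equal

    # The one-dimensional premise is genuinely restrictive: no coordinates give
    # pairwise distances 5, 3, 12, since d(A,B) + d(B,C) + d(C,A) = 0 cannot
    # hold with magnitudes 5, 3, 12 (note 5 + 3 < 12).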
As mentioned previously, the relativities described above are purely formal and tautological, but it turns out that each of them is closely related to a non-trivial physical symmetry. There exists a large class of identifiable objects whose lengths maintain a fixed proportion to each other under the very same set of transformations that characterize the relativities of the coordinates. In other words, just as we can translate the coordinates on the x axis without affecting the length of any object, we also find a large class of objects that can be individually translated along the x axis without affecting their lengths. The same applies to rotations and boosts. Such changes are physically distinct from purely formal shifts of the entire coordinate system, because when we move individual objects we are actually changing the relations between objects, since we are moving only a subset of all the coordinated objects. (Also, moving an object from one stationary position to another requires acceleration.) Thus for each formal arbitrariness in the system of coordinates there exists a physical symmetry, i.e., a large class of entities whose extents remain in constant proportions to each other when subjected individually to the same transformations.

We refer to these relations as physical symmetries rather than physical invariances, because (for example) we have no basis for asserting that the length of a solid object or the duration of a physical process is invariant under changes in position, orientation or state of motion. We have no way of assessing the truth of such a statement, because our measures of length and duration are all comparative. We can say only that the spatial and temporal extents of all the “stable” physical entities and processes are affected (if at all) in exactly the same proportion by changes in position, orientation, and state of motion. Of course, given this empirical fact, it is often convenient to speak as if the spatial and temporal extents are invariant, but we shouldn’t forget that, from an epistemological standpoint, we can assert only symmetry, not invariance.

In his original presentation of special relativity in 1905 Einstein took measuring rods and clocks as primitive elements, even though he realized the weakness of this approach. He later wrote of the special theory:

   It is striking that the theory introduces two kinds of physical things, i.e., (1) measuring rods and clocks, and (2) all other things, e.g., the electromagnetic field, the material point, etc. This, in a certain sense, is inconsistent; strictly speaking, measuring rods and clocks should emerge as solutions of the basic equations (objects consisting of moving atomic configurations), not, as it were, as theoretically self-sufficient entities. The procedure was justified, however, because it was clear from the very beginning that the postulates of the theory are not strong enough to deduce from them equations for physical events sufficiently complete and sufficiently free from arbitrariness to form the basis of a theory of measuring rods and clocks.

This is quite similar to the view he expressed many years earlier:

   …the solid body and the clock do not in the conceptual edifice of physics play the part of irreducible elements, but that of composite structures, which may not play any independent part in theoretical physics. But it is my conviction that in the present stage of development of theoretical physics these ideas must still be employed as independent ideas; for we are still far from possessing such certain knowledge of theoretical principles as to be able to give exact theoretical constructions of solid bodies and clocks.

The first quote is from his Autobiographical Notes in 1949, whereas the second is from his essay on Geometry and Experience published in 1921. It’s interesting how little his views had changed during the intervening 28 years, despite the fact that those years saw the advent of quantum mechanics, which many would say provided the very theoretical principles underlying the construction of solid bodies and clocks that Einstein felt had been lacking. Whether or not the principles of quantum mechanics are adequate to justify our conceptions of reference lengths and time intervals, the characteristic spatial and temporal extents of quantum phenomena are used today as the basis for all such references.

Considering the arbitrariness of absolute coordinates, one might think our spatiotemporal descriptions could be better expressed in purely relational terms, such as by specifying only the mutual distances (minimum path lengths) between objects. Nevertheless, the most common method of description is to assign absolute coordinates (three spatial and one temporal) to each object, with reference to an established system of coordinates, while recognizing that the choice of coordinate systems is to some extent arbitrary. The relations between objects are then inferred from these absolute (though somewhat arbitrary) coordinates. This may seem to be a round-about process, but there are several reasons for using absolute coordinate systems to encode the relations between objects, rather than explicitly specifying the relations themselves.

One reason is that this approach enables us to take advantage of the efficiency made possible by the finite dimensionality of space. As discussed above, if there were no limit to the dimensionality of space, then we would expect a set of n particles to have n(n−1)/2 independent pairwise spatial relations, so to explicitly specify all the distances between particles would require n−1 numbers for each particle, representing the distances to each of the other particles. For a large number of particles (to say nothing of a potentially infinite number) this would be impractical. Fortunately the spatial relations between the objects of our experience are not mutually independent. The nth particle essentially adds only three (rather than n−1) degrees of freedom to the relational configuration. In physical terms this restriction can be clearly seen from the fact that the maximum number of mutually equidistant particles in D-dimensional space is D+1. Experience teaches us that in our physical space we can arrange four, but not five or more, particles such that they are all mutually equidistant, so we conclude that our space has three dimensions.
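The link between pairwise distances and dimensionality can be made concrete. A standard way to recover the minimal Euclidean embedding dimension from a distance matrix is the classical multidimensional-scaling construction: double-center the squared distances and count the positive eigenvalues of the resulting Gram matrix. The sketch below (illustrative Python using numpy, not from the text) confirms that four mutually equidistant particles require three dimensions, while five would require four.

    import numpy as np

    def embedding_dimension(D, tol=1e-9):
        """Minimal Euclidean embedding dimension of a pairwise-distance matrix,
        via classical multidimensional scaling: double-center the squared
        distances and count the positive eigenvalues of the Gram matrix."""
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
        G = -0.5 * J @ (D**2) @ J                # Gram matrix of centered points
        return int(np.sum(np.linalg.eigvalsh(G) > tol))

    # Four mutually equidistant particles (a regular tetrahedron) need 3 dimensions
    D4 = np.ones((4, 4)) - np.eye(4)
    print(embedding_dimension(D4))   # 3

    # Five mutually equidistant particles would need 4 dimensions
    D5 = np.ones((5, 5)) - np.eye(5)
    print(embedding_dimension(D5))   # 4 -- not realizable in physical space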
Historically the use of absolute coordinates rather than explicit relations may also have been partly due to the fact that analytic geometry and Cartesian coordinates were invented (by Fermat, Descartes and others) at almost the same time that the new science of mechanics needed them, just as tensor analysis was invented, three hundred years later, at the very moment when it was needed to facilitate the development of general relativity. (Of course, such coincidences are not accidental; contrivances requiring new materials tend to be invented soon after the material becomes available.) The coordinate systems of Descartes were not merely efficient, they were also consistent with the ancient Aristotelian belief (also held by Descartes) that there is no such thing as empty space or vacuum, and that continuous substance permeates the universe. In this context we cannot even contemplate explicitly specifying each individual distance between substantial points, because space is regarded as a continuum of substance. For Aristotle and Descartes, every spatial extent is a measure of the length of some substance, not a pure distance between particles as contemplated by atomists. In this sense we can say that the continuous absolute coordinate systems inherited by modern science from Aristotle and Descartes are a remnant of the Cartesian natural philosophy.

Another, perhaps more compelling, reason for the adoption of abstract coordinate systems in the descriptions of physical phenomena was the need to account for acceleration. As Newton explained with the example of a “spinning pail”, the mutual relations between a set of material particles in an instant are not adequate to fully characterize a physical situation – at least not if we are considering only a small subset of all the particles in the universe. (Whether the mutual relations would be adequate if all the matter in the universe was taken into account is an open question.) In retrospect, there were other possible alternatives, such as characterizing not just the relations between particles at a specific instant, but over some temporal span of existence, but this would have required the unification of spatial and temporal measures, which did not occur until much later. Originally the motions of objects were represented simply by allowing the spatial coordinates of each persistent object to be continuous single-valued functions of one real variable, the time coordinate.

Incidentally, one consequence of the use of absolute coordinates is that it automatically entails a breaking of the alleged translational symmetry. We said previously that the coordinate system x could be replaced by x + k for any real number k, implying that every real value of k is in some sense equally suitable. However, from a strictly mathematical point of view there does not exist a uniform distribution over the real numbers, so this form of representation does not exactly entail the perfect symmetry of position in an infinite space, even if the space is completely empty.

The set of all combinations of values for the three spatial coordinates and one time coordinate is assumed to give a complete coordination not only of the spatial positions of each entity at each time, but of all possible spatial positions at all possible times. Any definite set of space and time coordinates constitutes a system of reference. There are infinitely many distinct ways in which such coordinates can be assigned, but they are not entirely arbitrary, because we limit the range of possibilities by requiring contiguous physical entities to be assigned contiguous coordinates. This imposes a definite structure on the system, so it is more than merely a set of labels; it represents the most primitive laws of physics.

One way of specifying an entire model of a world consisting of n (classical) particles would be to explicitly give the 3n functions xj(t), yj(t), zj(t) for j = 1 to n. In this form, the un-occupied points of space would be irrelevant, since only the actual paths of actual physical entities have any meaning. In fact, it could be argued that only the intersections of these particles have physical significance, so the paths followed by the particles in between their mutual intersections could be regarded as merely hypothetical. Following this approach we might end up with a purely combinatorial specification of discrete interactions, with no need for the notion of a continuous physical space within which entities reside and move. However, the hypothesis that physical objects have continuous positions as functions of time with respect to a specified system of reference has proven to be extremely useful, especially for purposes of describing simple laws by which the observable interactions can be efficiently described and predicted.

An important class of physical laws that make use of the full spatio-temporal framework consists of laws that are expressed in terms of fields. A field is regarded as existing at each point within the system of coordinates, even those points that are not occupied by a material particle. Therefore, each continuous field existing throughout time has, potentially, far more degrees of freedom than does a discrete particle, or even infinitely many discrete particles. Arguably, we never actually observe fields; we merely observe effects attributed to fields. It’s ironic that we can simplify the descriptions of particles by introducing hypothetical entities (fields) with far more degrees of freedom, but the laws governing the behavior of these fields (e.g., Maxwell’s equations for the electromagnetic field) along with symmetries and simple boundary conditions suffice to constrain the fields so that they actually do provide a simplification. (Fields also provide a way of maintaining conservation laws for interactions “at a distance”.) Whether the usefulness of the concepts of continuous space, time, and fields suggests that they possess some ontological status is debatable, but the concepts are undeniably useful.

These systems of reference are more than simple labeling. The numerical values of the coordinates are intended to connote physical properties of order and measure. In fact, we might even suppose that the sequences of states of all particles are uniformly parameterized by the time coordinate of our system of reference, but therein lies an ambiguity, because it isn't clear how the temporal states of one particle are to be placed in correspondence with the temporal states of another. Here we must make an important decision about how our model of the world is to be constructed. We might choose to regard the totality of all entities as comprising a single element in a succession of universal temporal states, in which case the temporal correspondence between entities is unambiguous. In such a universe the temporal coordinate induces a total ordering of events, which is to say, if we let the symbol ≼ denote temporal precedence or equality, then for every three events a, b, c we have

   (i)   a ≼ a
   (ii)  if a ≼ b and b ≼ a, then a = b
   (iii) if a ≼ b and b ≼ c, then a ≼ c
   (iv)  either a ≼ b or b ≼ a

However, this is not the only possible choice. We might choose instead to regard the temporal state of each individual particle as an independent quantity, bearing in mind that orderings of the elements of a set are not necessarily total. For example, consider the subsets of a flat plane, and the ordering induced by the inclusion relation ⊆. Obviously the first three axioms of a total ordering are satisfied, because for any three subsets a, b, c of the plane we have (i) a ⊆ a, (ii) if a ⊆ b and b ⊆ a, then a = b, and (iii) if a ⊆ b and b ⊆ c, then a ⊆ c. However, the fourth axiom is not satisfied, because it's entirely possible to have two sets neither of which is included in the other. An ordering of this type is called a partial ordering, and we should allow for the possibility that the temporal relations between events induce a partial rather than a total ordering.
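The failure of axiom (iv) is easy to exhibit concretely (an illustrative check in Python, whose set type implements inclusion as its comparison operators):

    # Subset inclusion is reflexive, antisymmetric, and transitive, but not total:
    a, b = {1, 2}, {2, 3}
    print(a <= a)             # axiom (i) holds: True
    print(a <= b or b <= a)   # axiom (iv) fails: False -- a and b are incomparable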

In fact, we have no a priori reason to expect that temporal relations induce even a partial ordering. It is safest to assume that each entity possesses its own temporal state, and let our observations teach us how those states are mutually related, if at all. (Similar caution should be applied when modeling the relations between the spatial states of particles.)

Given any system of space and time coordinates we can define infinitely many others such that speeds are preserved. This represents an equivalence relation, and we can then define a reference frame as an equivalence class of coordinate systems such that the speed of each object has the same value in terms of each coordinate system in that class. Thus within a reference frame we can speak of the speed of an object, without needing to specify any particular coordinate system. Of course, just as our coordinate systems are generally valid only locally, so too are the reference frames.

Purely kinematic relativity contains enough degrees of freedom that we can simply define our systems of reference (i.e., coordinate systems) to satisfy the additivity of velocity. In other words, we can adopt velocity additivity as a principle, and this is essentially what scientists had tacitly done since ancient times. The great insight of Galileo and his successors was that this principle is inadequate to single out the physically meaningful reference systems. A new principle was necessary, namely, the principle of inertia, to be discussed in the next section.

1.3 Inertia and Relativity

   These or none must serve for reasons, and it is my great happiness that examples prove not rules, for to confirm this opinion, the world yields not one example.
                                                                John Donne

In his treatise "On the Revolution of Heavenly Spheres" Copernicus argued for the conceivability of a moving Earth by noting that

   ...every apparent change in place occurs on account of the movement either of the thing seen or of the spectator, or on account of the necessarily unequal movement of both. No movement is perceptible relatively to things moved equally in the same direction - I mean relatively to the thing seen and the spectator.

This is a purely kinematical conception of relativity, like that of Aristarchus, based on the idea that we judge the positions (and changes in position) of objects only in relation to the positions of other objects. Many of Copernicus’s contemporaries rejected the idea of a moving Earth, because we do not directly “sense” any such motion. To answer this objection, Galileo developed the concept of inertia, which he illustrated by a “thought experiment” involving the behavior of objects inside a ship which is moving at some constant speed in a straight line. He pointed out that

   ... among things which all share equally in any motion, [that motion] does not act, and is as if it did not exist... in throwing something to your friend, you need throw it no more strongly in one direction than in another, the distances being equal... jumping with your feet together, you pass equal spaces in every direction...

Thus Galileo's approach was based on a dynamical rather than a merely kinematic analysis, because he refers to forces acting on bodies, asserting that the dynamic behavior of bodies is homogeneous and isotropic in terms of (suitably defined) measures in any uniform state of motion. This soon led to the modern principle of inertial relativity, although Galileo himself seems never to have fully grasped the distinction between accelerated and unaccelerated motion. He believed, for example, that circular motion was a natural state that would persist unless acted upon by some external agent. This shows that the resolution of dynamical behavior into inertial and non-inertial components - which we generally take for granted today - is more subtle than it may appear. As Newton wrote:

   ...the whole burden of philosophy seems to consist in this: from the phenomena of motions to infer the forces of nature, and then from these forces to deduce other phenomena...

Newton’s doctrine implicitly assumes that forces can be inferred from the motions of objects, but establishing the correspondence between forces and motions is not trivial, because the doctrine is, in a sense, circular. We infer “the forces of nature” from observed motions, and then we account for observed motions in terms of those forces. This assumes we can distinguish between forced and unforced motion, but there is no a priori way of making such a distinction. For example, the roughly circular motion of the Moon around the Earth might suggest the existence of a force (universal gravitation) acting between these two bodies, but it could also be taken as an indication that circular motion is a natural form of unforced motion, as Galileo believed. Different definitions of unforced motion lead to different sets of implied “forces of nature”. The task is to choose a definition of unforced motion that leads to the identification of a set of physical forces that gives the most intelligible decomposition of phenomena.

By indirect reasoning, the natural philosophers of the seventeenth century eventually arrived at the idea that, in the complete absence of external forces, an object would move uniformly in a straight line, and that, therefore, whenever we observe an object whose speed or direction of motion is changing, we can infer that an external force – proportional to the rate of change of motion – is acting upon that object. This is the principle of inertia, the most successful principle ever proposed for organizing our knowledge of the natural world. Notice that it refers to how a free object “would” move, because no object is completely free from all external forces. Thus the conditions of this fundamental principle, as stated, are never actually met, which highlights the subtlety of Newton’s doctrine, and the aptness of his assertion that it comprises “the whole burden of philosophy”.

Also, notice that the principle of inertia does not discriminate between different states of uniform motion in straight lines, so it automatically entails a principle of relativity of dynamics, and in fact the two are essentially synonymous.

The first explicit statement of the modern principle of inertial relativity was apparently made by Pierre Gassendi, who is most often remembered today for reviving the ancient Greek doctrine of atomism. In the 1630's Gassendi repeated many of Galileo's experiments with motion, and interpreted them from a more abstract point of view, consciously separating out gravity as an external influence, and recognizing that the remaining "natural states of motion" were characterized not only by uniform speeds (as Galileo had said) but also by rectilinear paths. In order to conceive of inertial motion, it is necessary to review the whole range of observable motions of material objects and imagine those motions if the effects of all known external influences were removed. From this resulting set of ideal states of motion, it is necessary to identify the largest possible "equivalence class" of relatively uniform and rectilinear motions. These motions and configurations then constitute the basis for inertial measurements of space and time, i.e., inertial coordinate systems. Naturally inertial motions will then necessarily be uniform and rectilinear with respect to these coordinate systems, by definition.

Shortly thereafter (1644), Descartes presented the concept of inertial motion in his "Principles of Philosophy":

   Each thing...continues always in the same state, and that which is once moved always continues to move...and never changes unless caused by an external agent... all motion is of itself in a straight line...every part of a body, left to itself, continues to move, never in a curved line, but only along a straight line.

Similarly, in Huygens' "The Motion of Colliding Bodies" (composed in the mid 1650's but not published until 1703), the first hypothesis was that

   Any body already in motion will continue to move perpetually with the same speed in a straight line unless it is impeded.

Ultimately Newton incorporated this principle into his masterpiece, "Philosophiae Naturalis Principia Mathematica" (The Mathematical Principles of Natural Philosophy), as the first of his three “laws of motion”:

   1) Every body continues in its state of rest, or of uniform motion in a right line, unless it is compelled to change that state by the forces impressed upon it.

   2) The change of motion is proportional to the motive force impressed, and is made in the direction of the right line in which that force is impressed.

   3) To every action there is always opposed an equal and opposite reaction; or, the mutual actions of two bodies upon each other are always equal, and directed to contrary parts.

These “laws” express the classical mechanical principle of relativity, asserting equivalence between the conditions of "rest" and "uniform motion in a right line". Since no distinction is made between the various possible directions of uniform motion, the principle also implies the equivalence of uniform motion in all directions in space. Thus, if everything in the universe is a "body" in the sense of this law, and if we stipulate rules of force (such as Newton's second and third laws) that likewise do not distinguish between bodies at rest and bodies in uniform motion, then we arrive at a complete system of dynamics in which, as Newton said, "absolute rest cannot be determined from the positions of bodies in our regions". Corollary 5 of Newton’s Principia states:

   The motions of bodies included in a given space are the same among themselves, whether that space is at rest or moves uniformly forwards in a straight line without circular motion.

Of course, this presupposes that the words "uniformly" and "straight" have unambiguous meanings. Our concepts of uniform speed and straight paths are ultimately derived from observations of inertial motions, so the “laws of motion” are to some extent circular. These laws were historically expressed in terms of inertial coordinate systems, which are defined as the coordinate systems in terms of which these laws are valid. In other words, we define an inertial coordinate system as a system of space and time coordinates in terms of which inertia is homogeneous and isotropic, and then we announce the “laws of motion”, which consist of the assertion that inertia is homogeneous and isotropic with respect to inertial coordinate systems. Thus the “laws of motion” are true by definition. Their significance lies not in their truth, which is trivial, but in their applicability. The empirical fact that there exist systems of inertial coordinates is what makes the concept significant. We have no a priori reason to expect that such coordinate systems exist, i.e., that the forces of nature would resolve themselves so coherently on this (or any other finite) basis, but they evidently do. In fact, it appears that not just one such coordinate system exists (which would be remarkable enough), but that infinitely many of them exist, in all possible states of relative motion. To be precise, the principle of relativity asserts that for any material particle in any state of motion there exists an inertial coordinate system in terms of which the particle is (at least momentarily) at rest.

It’s important to recognize that Newton’s first law, by itself, is not sufficient to identify the systems of coordinates in terms of which all three laws of motion are satisfied. The first law serves to determine the shape of the coordinate axes and inertial paths, but it does not fully define a system of inertial coordinates, because the first law is satisfied in infinitely many systems of coordinates that are not inertial. The system of oblique xt coordinates illustrated below is an example of such a system.

The two dashed lines indicate the paths of two identical objects, both initially at rest with respect to these coordinates and propelled outward from the origin by impulsive forces of equal magnitude (acting against each other). Every object not subject to external forces moves with uniform speed in a straight line with respect to this coordinate system, so Newton's First Law of motion is satisfied, but the second law clearly is not, because the speeds imparted to these identical objects by equal forces are not equal. In other words, inertia is not isotropic with respect to these coordinates. In order for Newton's Second Law to be satisfied, we not only need the coordinate axes to be straight and uniformly graduated relative to freely moving objects, we need the space axes to be aligned in time such that mechanical inertia is the same in all spatial directions (so that, for example, the objects whose paths are represented by the two dashed lines in the above figure have the same speeds). This effectively establishes the planes of simultaneity of inertial coordinate systems.

In an operational sense, Newton's Third Law is also involved in establishing the planes of simultaneity for an inertial coordinate system, because it is only by means of the Third Law that we can actually define "equal forces" as the forces necessary to impart equal "quantities of motion" (to use Newton’s phrase). Of course, this doesn't imply that inertial coordinate systems are the "true" systems of reference. They are simply the most intuitive, convenient, and readily accessible systems, based on the inertial behavior of material objects. In addition to contributing to the definition of an inertial coordinate system, the third law also serves to establish a fundamental aspect of the relationships between relatively moving inertial coordinate systems. Specifically, the third law implies (requires) that if the spatial origin of one inertial coordinate system is moving at velocity v with respect to a second inertial coordinate system, then the spatial origin of the second system is moving at velocity −v with respect to the first. This property is sometimes called reciprocity, and is important for the various derivations of the Lorentz transformation to be presented in subsequent sections.

Based on the definition of an inertial coordinate system, and the isotropy of inertia with respect to such coordinates, it follows that two identical objects, initially at rest with respect to those coordinates and exerting a mutual force on each other, recoil by equal distances in equal times (in accord with Newton’s third law). Assuming the lengths of stable material objects are independent of their spatial positions and orientations (spatial homogeneity and isotropy), it follows that we can synchronize distant clocks with identical particles ejected with equal forces from the mid-point between the clocks. Of course, this operational definition of simultaneity is not new. It is precisely what Galileo described in his illustration of inertial motion onboard a moving ship. When he wrote that an object thrown with equal force will reach equal distances [in the same time], he was implicitly defining simultaneity at separate locations on the basis of inertial isotropy. This is crucial to understanding the significance of inertial coordinate systems. The requirement for a particular object to be at rest with respect to the system suffices only to determine the direction of the "time axis", i.e., the loci of constant spatial position.
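To see numerically why the first law alone is too weak, consider re-labeling the time coordinate of an inertial system as t′ = t + kx with x′ = x, which is one convenient way of producing such oblique coordinates (an illustrative Python sketch; the skew parameter k and the speed v are arbitrary choices, not from the text). Straight worldlines remain straight, so the first law survives, but the two recoiling objects no longer have equal coordinate speeds:

    # Two identical objects recoil from the origin with speeds +v and -v in
    # inertial coordinates. Skewing the simultaneity (t' = t + k*x, x' = x)
    # keeps every worldline straight (first law intact) but makes the equal
    # and opposite impulses yield unequal coordinate speeds, i.e., inertia
    # is no longer isotropic and the second law fails.
    def skewed_speed(u, k):
        # the worldline x = u*t becomes x = (u / (1 + k*u)) * t'
        return u / (1 + k * u)

    v, k = 0.5, 0.3
    print(skewed_speed(+v, k))   # 0.4348...
    print(skewed_speed(-v, k))   # -0.5882...  (unequal magnitudes)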
Galileo and his successors realized (although they did not always explicitly state) that it is also necessary to specify the loci of constant temporal position, and this is achieved by choosing coordinates in such a way that mechanical inertia is isotropic. (This means the
inertia of an object does not depend on any absolute reference direction in space, although it may depend on the velocity of the object. It is sufficient to say the resistance to acceleration of a resting object is the same in all spatial directions.)

Conceptually, to establish a complete system of space and time coordinates based on inertial isotropy, imagine that at each point in space there is an identically constructed cannon, and all these cannons are at rest with respect to each other. At one particular point, which we designate as the origin of our coordinates, is a clock and numerous identical cannons, each pointed at one of the other cannons out in space. The cannons are fired from the origin, and when a cannonball passes one of the external cannons it triggers that external cannon to fire a reply back to the origin. Each cannonball has identifying marks so we can correlate each reply with the shot that triggered it, and with the identity of the replying cannon. The ith reply event is assigned the time coordinate ti = [treturn(i) + tsend(i)]/2 seconds, and it is assigned space coordinates xi, yi, zi based on the angular direction of the sending cannon and the radial distance ri = [treturn(i) - tsend(i)]/2 cannon-seconds. This procedure would have been perfectly intelligible to Newton, and he would have agreed that it yields an inertial coordinate system, suitable for the application of his three laws of motion.

Naturally, given one such system of coordinates, we can construct infinitely many others by simple spatial re-orientation of the space axes and/or translation of the spatial or temporal axes. All such transformations leave the speed of every object unchanged. An equivalence class of all such inertial coordinate systems is called an inertial reference frame. For characterizing the mutual dynamical states of two material bodies, the associated inertial rest frames of the bodies are more meaningful than the mere distance between the bodies, because any inertial coordinate system possesses a fixed spatial orientation with respect to any other inertial coordinate system, enabling us to take account of tangential motion between bodies whose mutual distance is not changing. For this reason, the physically meaningful "relative velocity of two material bodies" is best defined as their reciprocal states of motion with respect to each other's associated inertial rest frame coordinates.

The principle of relativity does not tell us how two relatively moving systems of inertial coordinates are related to each other, but it does imply that this relationship can be determined empirically. We need only construct two relatively moving systems of inertial coordinates and compare them. Based on observations of coordinate systems with relatively low mutual speeds, and with the limited precision available at the time, Galileo and Newton surmised that if (x,t) is an inertial coordinate system then so is (x',t'), where
    x' = x - vt,    t' = t
and v is the mutual speed between the origins of the two systems. This implies that relative speeds are simply additive. In other words, if a material object B is moving at the speed v in terms of inertial rest frame coordinates of A, and if an object C is moving in the same direction at the speed u in terms of inertial rest frame coordinates of B, then C is moving at the speed v + u in terms of inertial rest frame coordinates of A. This
conclusion may seem plausible, but it's important to realize that we are not free to arbitrarily adopt this or any other transformation and speed composition rule for the set of inertial coordinate systems, because those systems are already fully defined (up to insignificant scale factors) by the requirements for inertia to be homogeneous and isotropic and for momentum to be conserved. These properties suffice to determine the set of inertial coordinate systems and (therefore) the relationships between them. Given these conditions, the relationship between relatively moving inertial coordinate systems, whatever it may be, is a matter of empirical fact.

Of course, inertial isotropy is not the only possible basis for constructing spacetime coordinate systems. We could impose a different constraint to determine the loci of constant temporal position, such as a total temporal ordering of events. However, if we do this, we will find that mechanical inertia is generally not isotropic in terms of the resulting coordinate systems, so the usual symmetrical laws of mechanics will not be valid in terms of those coordinate systems (at least not if restricted to ponderable matter). Indeed this was the case for the ether theories developed in the late 19th century, as discussed in subsequent sections. Such coordinate systems, while extremely awkward, would not be logically inconsistent. The choices we make to specify a coordinate system and to resolve spacetime intervals into separate spatial and temporal components are to some extent conventional, provided we are willing to disregard the manifest symmetry of physical phenomena. But since physics consists of identifying and understanding the symmetries of nature, the option of disregarding those symmetries does not appeal to most physicists.

By the end of the nineteenth century a new class of phenomena involving electric and magnetic fields had been incorporated into physics, and the concept of inertia was found to be applicable to these phenomena as well. For example, Maxwell's equations imply that a pulse of light conveys momentum. Hence the principle of inertia ought to apply to electromagnetism as well as to the motions of material bodies. In his 1905 paper "On the Electrodynamics of Moving Bodies" Einstein adopted this more comprehensive interpretation of inertia, basing the special theory of relativity on the proposition that

The laws by which the states of physical systems undergo changes are not affected, whether these changes of state be referred to the one or the other of two systems of [inertial] coordinates in uniform translatory motion.

This is nearly identical to Newton's Corollary 5. It's unfortunate that the word "inertial" was omitted, because, as noted above, uniform translatory motion is not sufficient to ensure that a system of coordinates is actually an inertial coordinate system. However, Einstein made it clear that he was indeed talking about inertial coordinate systems when he previously characterized them as coordinate systems "in which the equations of Newtonian mechanics hold good". Admittedly this is a somewhat awkward assertion in the context of Einstein's paper, because one of the main conclusions of the paper is that the equations of Newtonian mechanics do not precisely "hold good" with respect to inertial coordinate systems. Recognizing this inconsistency, Sommerfeld added a footnote in subsequent published editions of Einstein's paper, qualifying the statement about
Newtonian mechanics holding good "to the first approximation", but this footnote does not really clarify the situation. Fundamentally, the class of coordinate systems that Einstein was trying to identify (the inertial coordinate systems) are those in terms of which inertia is homogeneous and isotropic, so that free objects move at constant speed in straight lines, and the force required to accelerate an object from rest to a given speed is the same in all directions. As discussed above, these conditions are just sufficient to determine a coordinate system in terms of which the symmetrical equations of mechanics hold good, but without pre-supposing the exact form of those equations.

Since light (i.e., an electromagnetic wave) carries momentum, and the procedure for constructing an inertial coordinate system described previously was based on the isotropy of momentum, it is reasonable to expect that pulses of light could be used in place of cannonballs, and we should arrive at essentially the same class of coordinate systems. In his 1905 paper this is how Einstein described the construction of inertial coordinate systems, implicitly asserting that the propagation of light is isotropic with respect to the same class of coordinate systems in terms of which mechanical inertia is isotropic. In this respect it might seem as if he was treating light as a stream of inertial particles, and indeed his paper on special relativity was written just after the paper in which he introduced the concept of photons. However, we know that light is not exactly like a stream of material particles, especially because we cannot conceive of light being at rest with respect to any system of inertial coordinates. The way in which light fits into the framework of inertial coordinate systems is considered in the next section. We will find that although the principle of relativity continues to apply, and the definition of inertial coordinate systems remains unchanged, the relationship between relatively moving systems of inertial coordinates must be different from what Galileo and Newton surmised.

1.4 The Relativity of Light

According to the theory of emission, the transmission of energy [of light] is effected by the actual transference of light-corpuscles… According to the theory of undulation, there is a material medium which fills the space between two bodies, and it is by the action of contiguous parts of this medium that the energy is passed on…
James Clerk Maxwell

Light is arguably the phenomenon of nature with which we have the most conscious experience, by means of our sense of vision, and yet throughout most of human history very little seems to have been known about how vision works. Interestingly, from the very beginning there were at least two distinct concepts of light, existing side by side, as can be seen in some of the earliest known writings. For example, the description of creation in the biblical book of Genesis says light was created on the first day, and yet the sun, moon, and stars were not created until the fourth day "to give light upon the earth". Evidently the word "light" is being used to signify two different things on the first and
fourth days. For another example, Plato argued in Timaeus that there are two kinds of "fire" involved in our sense of vision, one coming from inside ourselves, emanating as visual rays from our eyes to make contact with distant objects, and another, which he called "daylight", that (when present) surrounds the visual rays from our eyes and facilitates the conveyance of the visual images. These two kinds of "fire" correspond roughly with the later scholastic concepts of lux and lumen. The word lux was used to signify our visual sensations, whereas the word lumen referred to an external agent (such as light from the sun) that somehow participates in our sense of vision.

There was also, in ancient times, a competing theory of vision, according to which all objects naturally emit whole "images" (eidola) of themselves in small packets, and these enter our souls by way of our eyes. To account for our inability to see at night, it was thought that light from the sun or moon struck the objects and caused them to emit their images. This model of vision still entailed two distinct kinds of light: the facilitating illumination from the sun or moon, and the eidola emitted by ordinary objects. This somewhat awkward conception of vision was improved by Ibn al-Haitham and later by Kepler, who argued that it is not necessary to assume whole objects emit multiple copies of themselves; we can simply consider each tiny part of an object as the source of rays emanating in all directions, and a sub-set of these rays intersecting in the eye can be reassembled into an image of the object.

Until the end of the 17th century there was no evidence to indicate that rays of light propagated at a finite speed, and they were often assumed to be instantaneous. Only in 1676 with Roemer's observations of the moons of Jupiter, and even more convincingly in 1728 with Bradley's discovery of stellar aberration, did it become clear that the rays of lumen propagate through space with a characteristic finite speed. This suggested that light, and the energy it conveys, must have some mode of existence during the interval of time between its emission and its absorption. Hence light became an entity or process in itself, rather than just a relation between entities, but again there were two competing notions as to the mode of existence. Two different analogies were conceived, based on the behavior of ordinary material substances. Some thought light could be regarded as a stream of material corpuscles moving through empty space, whereas others believed light consists of undulations or waves in a pervasive material medium. Each of these analogies was consistent with some of the attributes of light, but neither could be reconciled fully with all the attributes.

For example, if light consists of material corpuscles, then according to Galilean relativity there should be an inertial reference frame with respect to which light is at rest in a vacuum, whereas in fact we never observe light in a vacuum to be at rest, nor even noticeably slow, with respect to any inertial reference frame. On the other hand, if light is a wave propagating through a material medium, then the constituent parts of that medium should, according to Galilean relativity, behave inertially, and in particular should have a definite rest frame, whereas we find that light propagates best through regions (vacuum) in which there is no detectable material with a definite rest frame, and again we cannot conceive of light at rest in any inertial frame.
Thus the behavior of light defies realistic representation in terms of the behavior of material substances within the framework of Galilean space and time, even if we consider just the classical attributes, let alone quantum phenomena.

By the end of the 19th century the inadequacy of both of the materialistic analogies for explaining the behavior of light had become acute, because there was strong evidence that light exhibits two seemingly mutually exclusive properties. First, Maxwell showed how light can be regarded as a propagating electromagnetic wave, and as such the speed of propagation is obviously independent of the speed of the source. Second, numerous experiments showed that light propagates at the same speed in all directions relative to the source, just as we would expect for streams of inertial corpuscles. Hence some of the attributes of light seemed to unequivocally support an emission theory, while others seemed just as unequivocally to support a wave theory. In retrospect it's clear that there was an underlying confusion regarding the terms of description, i.e., the systems of inertial coordinates, but this was far from clear at the time.

One of the first clues to unraveling the mystery was found in 1887, when Woldemar Voigt made a remarkable discovery concerning the ordinary wave equation. Recall that the wave equation for a time-dependent scalar field ϕ(x,t) in one dimension is
    ∂²ϕ/∂x² = (1/u²)(∂²ϕ/∂t²)
where u is the propagation speed of the wave. This equation was first studied by Jean d'Alembert in the 18th century, and it applies to a wide range of physical phenomena. In fact it seems to represent a fundamental aspect of the relationship between space, time, and motion, transcending any particular application. Traditionally it was considered to be valid only for a coordinate system x,t with respect to which the wave medium (presumed to be an inertial substance) is at rest and has isotropic properties, because if we apply a Galilean transformation to these coordinates, the wave equation is not satisfied with respect to the transformed coordinates. However, Galilean transformations are not the most general possible linear transformations. Voigt considered the question of whether there is any linear transformation that leaves the wave equation unchanged. The general linear transformation between (X,T) and (x,t) is of the form
    x = AX + BT,    t = CX + DT
for constants A,B,C,D. If we choose units of space and time so that the acoustic speed u equals 1, the wave equation in terms of (X,T) is simply ∂²ϕ/∂X² = ∂²ϕ/∂T². To express this equation in terms of the transformed (x,t) coordinates, recall that the total differential of ϕ can be written in the form
    dϕ = (∂ϕ/∂x)dx + (∂ϕ/∂t)dt
Also, at any constant T, the value of ϕ is purely a function of X, so we can divide through the above equation by dX to give
    ∂ϕ/∂X = A(∂ϕ/∂x) + C(∂ϕ/∂t)
Taking the partial derivative of this with respect to X then gives
    ∂²ϕ/∂X² = A ∂(∂ϕ/∂x)/∂X + C ∂(∂ϕ/∂t)/∂X
Since partial differentiation is commutative, this can be written as
    ∂²ϕ/∂X² = A ∂(∂ϕ/∂X)/∂x + C ∂(∂ϕ/∂X)/∂t
Substituting the prior expression for ∂ϕ/∂X and carrying out the partial differentiations gives an expression for ∂²ϕ/∂X² in terms of partials of ϕ with respect to x and t. Likewise we can derive an expression for ∂²ϕ/∂T². Substituting into the wave equation gives
    (A² - B²)(∂²ϕ/∂x²) + 2(AC - BD)(∂²ϕ/∂x∂t) + (C² - D²)(∂²ϕ/∂t²) = 0
This is equivalent to the condition that ϕ(X,T) is a solution of the wave equation with respect to the X,T coordinates. Since the mixed partial generally varies along a path of constant second partial with respect to x or t, it follows that a necessary and sufficient condition for ϕ(x,t) to also be a solution of the wave equation in terms of the x,t coordinates is that the constants A,B,C,D of our linear transformation satisfy the relations
    AC = BD,    A² - B² = D² - C²
Furthermore, the differential of the space transformation is dx = AdX + BdT, so an increment with dx = 0 satisfies dX/dT = -B/A. This represents the velocity at which the spatial origin of the x,t coordinates is moving relative to the X,T coordinates. We will refer to this velocity as v. We also have the inverse transformation from (X,T) to (x,t):
    X = (Dx - Bt)/(AD - BC),    T = (At - Cx)/(AD - BC)
Proceeding as before, the differential of this space transformation gives dx/dt = B/D for the velocity of the spatial origin of the X,T coordinates with respect to the x,t coordinates, and this must equal -v. Therefore we have B = -Av = -Dv, and so A = D. It follows from the condition imposed by the wave equation that B = C, so both of these equal -Av. Our transformation can then be written in the form
    x = A(X - vT),    t = A(T - vX)
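As a quick consistency check (a sketch assuming the sympy library is available), we can verify symbolically that this transformation preserves the wave equation for any value of A, by applying it to a general solution F(x - t) + G(x + t):

    import sympy as sp

    X, T, v, A = sp.symbols('X T v A')
    x = A*(X - v*T)                  # the transformation derived above
    t = A*(T - v*X)

    F, G = sp.Function('F'), sp.Function('G')
    phi = F(x - t) + G(x + t)        # general solution of the wave equation in x,t

    # the wave operator in the X,T coordinates annihilates phi as well
    print(sp.simplify(sp.diff(phi, X, 2) - sp.diff(phi, T, 2)))   # prints 0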
The same analysis shows that the perpendicular coordinates y and z of the transformed system must be given by
    y = A(1 - v²)^(1/2) Y,    z = A(1 - v²)^(1/2) Z
In order to make the transformation formula for x agree with the Galilean transformation, Voigt chose A = 1, so he did not actually arrive at the Lorentz transformation, but nevertheless he had shown roughly how the wave equation could actually be relativistic – just like the dynamic behavior of inertial particles – provided we are willing to consider a transformation of the space and time coordinates that differs from the Galilean transformation. Had he considered the inverse transformation
    X = (x + vt)/[A(1 - v²)],    T = (t + vx)/[A(1 - v²)]
he might have noticed that the determinant is A²(1 - v²), so to make this equal to 1 we must have A = 1/(1 - v²)^(1/2), which not only implies y = Y and z = Z, but also makes the transformation formally identical to its inverse. In other words, he would have arrived at a completely relativistic framework for the wave equation. However, this was not Voigt's objective, and he evidently regarded the transformed coordinates x, y, z and t as merely a convenient parameterization for purposes of calculation, without attaching any greater significance to them.

Voigt's transformation was the first hint of how a wavelike phenomenon could be compatible with the principle of relativity, which (as summarized in the preceding section) is that there exist inertial coordinate systems in terms of which free motions are linear, inertia is isotropic, and every material object is instantaneously at rest with respect to one of these systems. None of this conflicts with the observed behavior of light, because the motion of light is observed to be both linear and isotropic with respect to inertial coordinate systems. The fact that light is not at rest with respect to any system of inertial coordinates does not conflict with the principle of relativity if we agree that light is not a material object. The incompatibility of light with the Galilean framework arises not from any conflict with the principle of relativity, but from the tacitly adopted empirical conclusion that two relatively moving systems of inertial coordinates are related to each other by Galilean transformations, so that the composition of co-linear speeds is simply additive. As
discussed in the previous section, we aren't free to impose this assumption on the class of inertial coordinate systems, because they are fully determined by the requirement for inertia to be homogeneous and isotropic. There are no more adjustable parameters (aside from insignificant scale factors), so the composition of velocities with respect to relatively moving inertial coordinate systems is a matter to be determined empirically.

Recall from the previous section that, on the basis of slowly moving reference frames, Galileo and Newton had inferred that the composition of speeds was simply additive. In other words, if a material object B is moving at the speed v in terms of inertial rest frame coordinates of a material object A, and if an object C is moving in the same direction at the speed u in terms of inertial rest frame coordinates of B, then Newton found that object C has the speed v + u in terms of the inertial rest frame coordinates of A. Toward the end of the nineteenth century, more precise observations revealed that this is not quite correct. It was found that the speed of object C in terms of inertial rest frame coordinates of A is not v + u, but rather (v + u)/(1 + uv/c²), where c is the speed of light in a vacuum.

Obviously these conclusions would be identical if the speed of light was infinitely great, which was still considered a real possibility in Galileo's day. Many people, including Descartes, regarded rays of light as instantaneous. Even Newton's Opticks, published in 1704, made allowances for the possibility that "light be propagated in an instant" (although Newton himself was persuaded by Roemer's observations that light has a finite speed). Hence it can be argued that the principles of Galileo and Einstein are essentially identical in both form and content. The only difference is that Galileo assessed the propagation of light to be "if not instantaneous then extraordinarily fast", and thus could neglect the term uv/c², especially since he restricted his considerations to the movements of material objects, whereas subsequently it became clear that the speed of light has a finite value, and it was necessary to take account of the uv/c² term when attempting to incorporate the motions of light and high-speed particles into the framework of mechanics.

The empirical correspondence between inertial isotropy and lightspeed isotropy can be illustrated by a simple experiment. Three objects, A, B, and C, at rest with respect to each other can be arranged so that one of them is at the midpoint between the other two (the midpoint having been determined using standard measuring rods at rest with respect to those objects). The two outer objects, A and C, are equipped with identical clocks, and the central object, B, is equipped with two identical cannons. Let the two cannons in the center be fired simultaneously in opposite directions toward the two outer objects, and then at a subsequent time let object B emit a flash of light. If the arrivals of the cannonball and light coincide at A, then they also coincide at C, signifying that the propagation of light is isotropic with respect to the same system of coordinates in terms of which mechanical inertia is isotropic, as illustrated in the figure below.
[Figure: spacetime diagram of the experiment, showing the cannonballs and the light flash emitted from B arriving in coincidence at both A and C]
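Incidentally, the composition rule quoted above is easy to explore numerically. A minimal sketch (units with c = 1) shows that composed speeds never exceed the speed of light, and that composing any speed with c yields c again:

    def compose(v, u, c=1.0):
        # relativistic composition of co-linear speeds: (v + u)/(1 + uv/c^2)
        return (v + u) / (1 + u*v/c**2)

    print(compose(0.5, 0.5))   # 0.8, not 1.0
    print(compose(0.9, 0.9))   # ~0.9945, still less than 1
    print(compose(0.9, 1.0))   # 1.0: light moves at c in both frames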
The fact that light emitted from object B propagates isotropically with respect to B's inertial rest frame might seem to suggest that light can be treated as an inertial object within the Galilean framework, just like cannon-balls. However, we also find that if the light is emitted at the same time and place from an object D that is moving with respect to B (as shown in the figure above), the light's speed is still isotropic with respect to B's inertial rest frame. Now, this might seem to suggest that light is a disturbance in a material medium in which the objects A,B,C just happen to be at rest, but this is ruled out by the fact that it applies regardless of the state of (uniform) motion of those objects. Naturally this implies that the flash of light propagates isotropically with respect to the inertial rest coordinates of object D as well. To demonstrate this, we could arrange for two other bodies, denoted by E and F, to be moving at the same speed as D, and located an equal distance from D in opposite directions. Then we could fire two identically constructed cannons (at rest with respect to D) in opposite directions, toward E and F. The results are illustrated below.
[Figure: spacetime diagram for the moving emitter D, showing the cannonballs striking E and F at the events α and β, coincident with the arrival of the light pulse from D]
The cannons are fired from D when it crosses the x axis, and the cannon-balls strike E and F at the events marked α and β, coincident with the arrival of the light pulse from D. Obviously the time axis for the inertial rest frame coordinates of object D is the worldline of D itself (rather than the original "t" axis shown on the figure). In addition, since inertial coordinates are defined such that mechanical inertia is isotropic, it follows that the cannon-balls fired from identical cannons at rest with D are moving with equal and opposite speeds with respect to D's inertial rest coordinates, and since E and F are at equal distances from D, it also follows that the events α and β are simultaneous with respect to the inertial rest coordinates of D. Hence, not only is the time axis of D's rest frame slanted with respect to B's time axis, the spatial axis of D's rest frame is equally slanted with respect to B's spatial axis.

Several other important conclusions can be deduced from this figure. For example, with respect to the original x,t coordinate system, the speeds of the cannon-balls from D are not given by simply adding (or subtracting) the speed of the cannon-balls with respect to D's rest frame to (or from) the speed of D with respect to the x,t coordinates. Since momentum is explicitly conserved, this implies that the inertia of a body increases with its velocity (i.e., kinetic energy), as is discussed in more detail in Section 2.3.

We should also note that although the speed of light is isotropic with respect to any inertial spacetime coordinates, independent of the motion of the source, it is not correct to say that the light itself is isotropic. The relationship between the frequency (and energy) of the light with respect to the rest frame of the emitting body and the frequency (and energy) of the light with respect to the rest frame of the receiving body does depend on the relative velocity between those two massive bodies (as discussed in Chapter 2.4).

Incidentally, notice that we can rule out the possibility of objects B and D dragging the light medium along with them, because they are moving through the same region of space at the same time, and they can't both be dragging the same medium in opposite directions. This is in contrast to the case of (for example) acoustic pressure waves in a material substance, because in that case a recognizable material substance determines the unique isotropic frame, whereas in the case of light we're unable to identify any definite material medium, so the medium has no definite rest frame.

The first person to discern the true relationship between relatively moving systems of inertial coordinates was Hendrik Antoon Lorentz. Not surprisingly, he arrived at this conception in a rather indirect and laborious way, and didn't immediately recognize that the class of coordinate systems he had discovered (and which he called "local coordinate" systems) were none other than Galileo's inertial coordinate systems. Incidentally, although Lorentz and Voigt knew and corresponded with each other, Lorentz apparently was not aware of Voigt's earlier work on coordinate transformations that leave the wave equation invariant, and so that work had no influence on Lorentz's search for coordinate systems in terms of which Maxwell's equations are invariant. Unlike Voigt, Lorentz derived the transformation in two separate stages.
He first developed the "local time" coordinate, and only years later came to the conclusion (after, but independently of, Fitzgerald) that a "contraction" of spatial length was also necessary in order to account
for the absence of second-order effects in Michelson's experiment. Lorentz began with the absolute ether frame coordinates t and x, in terms of which every event can be assigned a unique space-time position (t,x), and then he considered a system moving with the velocity v in the positive x direction. He applied the traditional Galilean transformation to assign a new set of coordinates to every event. Thus an event with ether-frame coordinates t,x is assigned the new coordinates x″ = x - vt and t″ = t. Then he tentatively proposed an additional transformation that must be applied to x″,t″ in order to give coordinates in terms of which Maxwell's equations apply in their standard form. Lorentz was not entirely clear about the physical significance of these "local" coordinates, but it turns out that all physical phenomena conform to the same isotropic laws of physics when described in terms of these coordinates. (Lorentz's notation made use of the parameter β = 1/(1 - v²)^(1/2) and another constant, which he later determines to be 1.) Taking units such that c = 1, his equations for the local coordinates x' and t' in terms of the Galilean coordinates which we are calling x″ and t″ are
    x' = βx″,    t' = t″/β - βvx″
Recall that the traditional Galilean transformation is x″ = x - vt and t″ = t, so we can make these substitutions to give the complete transformation from the original ether rest frame coordinates x,t to the local coordinates moving with speed v
    x' = (x - vt)/(1 - v²)^(1/2),    t' = (t - vx)/(1 - v²)^(1/2)
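The central property of this transformation can be checked with a minimal numerical sketch: events on an expanding light sphere (|x| = t, with c = 1) map to events with |x'| = t', so each observer remains at the midpoint of the wavefronts by his own measures:

    import math

    def to_local(x, t, v):
        # Lorentz's transformation from ether coordinates to local coordinates
        g = 1.0 / math.sqrt(1.0 - v*v)
        return g*(x - v*t), g*(t - v*x)

    v = 0.6
    for t in (1.0, 2.0):
        for x in (t, -t):                     # events on the light wavefronts
            xp, tp = to_local(x, t, v)
            print(round(abs(xp) - tp, 12))    # prints 0.0 in every case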
These effective coordinates enabled Lorentz to explain how two relatively moving observers, each using his own local system of coordinates, both seem to remain at the center of expanding spherical light waves originating at their point of intersection, as illustrated below
[Figure: spacetime diagram of two observers whose worldlines cross at event O, each remaining centered, by his own coordinates, between the light wavefronts emitted at O]
The x and x' axes represent the respective spatial coordinates (say, in the east/west
direction), and the t and t' axes represent the respective time coordinates. One observer is moving through time along the t axis, and the other has some relative westward velocity as he moves through time along the t' axis. The two observers intersect at the event labeled O, where they each emit a pulse of light. Those light pulses emanate away from O along the dotted lines. Subsequently the observer moving along the t axis finds himself at C, and according to his measures of space and time the outward going light waves are at E and W at that same instant, which places him at the midpoint between them. On the other hand, the observer moving along the t' axis finds himself at point c, and according to his measures of space and time the outward going light waves are at e and w at this instant, which implies that he is at the midpoint between them.

Thus Lorentz discovered that by means of the "fictitious" coordinates x',t' it was possible to conceive of a class of relatively moving coordinate systems with respect to which the speed of light is invariant. He went beyond Voigt in the realization that the existence of this class of coordinate systems ensures the appearance of relativity, at least for optical phenomena, and yet, like Voigt, he still tended to regard the "local coordinates" as artificial. Having been derived specifically for electromagnetism, it was not clear that the same transformations should apply to all physical phenomena, including inertia, gravity, and whatever forces are responsible for the stability of matter – at least not without simply hypothesizing this to be the case.

However, Lorentz was dissatisfied with the proliferation of hypotheses that he had made in order to arrive at this theory. The same criticism was made in a contemporary review of Lorentz's work by Poincare, who chided him with the remark "hypotheses are what we lack least". The most glaring of these was the hypothesis of contraction, which seemed distinctly "ad hoc" to most people, including Lorentz himself originally, but gradually he came to realize that the contraction hypothesis was not as unnatural as it might seem.

Surprising as this hypothesis may appear at first sight, yet we shall have to admit that it is by no means far-fetched, as soon as we assume that molecular forces are also transmitted through the ether, like the electric and magnetic forces…

He set about trying to show (admittedly after the fact) that the Fitzgerald contraction was to be expected based on what he called the Molecular Force Hypothesis and his theorem of Corresponding States, as discussed in the next section.

1.5 Corresponding States

It would be more satisfactory if it were possible to show by means of certain fundamental assumptions - and without neglecting terms of any order - that many electromagnetic actions are entirely independent of the motion of the system. Some years ago I already sought to frame a theory of this kind. I believe it is now possible to treat the subject with a better result.
H. A. Lorentz

In 1889 Oliver Heaviside deduced from Maxwell’s equations that the electric and magnetic fields on a spherical surface of radius r surrounding a uniformly moving electric charge e are radial and circumferential respectively, with magnitudes
    E = e(1 - v²)/[r²(1 - v²sin²θ)^(3/2)],    B = vE sinθ
where θ is the angle relative to the direction of motion with respect to the stationary frame of reference. (We have set c = 1 for clarity.) The left hand equation implies that, in comparison with a stationary charge, the electric field strength at a distance r from a moving charge is less by a factor of 1 - v² in the direction of motion, and greater by a factor of 1/(1 - v²)^(1/2) in the perpendicular directions. Thus the strength of the electric field of a moving charge is anisotropic. These equations imply that
    ψ = e(1 - v²)/[r(1 - v²sin²θ)^(1/2)]
which Heaviside recognized as the convection potential, i.e., the scalar field whose gradient is the total electromagnetic force on a co-moving charge at that relative position. This scalar is invariant under Lorentz transformations, and it follows from the above formula that the cross-sections of the surfaces of constant potential are described by
    x² + (1 - v²)y² = constant
This is the equation of an ellipse, so Heaviside's formulas imply that the surfaces of constant potential are ellipsoids, shortened in the direction of motion by the factor (1 - v²)^(1/2). From the modern perspective the contraction of characteristic lengths in the direction of motion is an immediate corollary of the fact that Maxwell's equations are Lorentz covariant, but at the time the idea of anisotropic changes in length due to motion was regarded as a distinct and somewhat unexpected attribute of electromagnetic fields.

It wasn't until 1896 that Searle explicitly pointed out that Heaviside's formulas imply the contraction of surfaces of constant potential into ellipsoids, but already in 1889 it seems that Heaviside's findings had prompted an interesting speculation as to the deformation of stable material objects in uniform motion. George Fitzgerald corresponded with Heaviside, and learned of the anisotropic variations in field strengths for a moving charge, and this was at the very time when he was struggling to understand the null result of the latest Michelson and Morley ether drift experiment (performed in 1887). It occurred to Fitzgerald that the null result would be explained if the material comprising Michelson's apparatus contracts in the direction of
motion by the factor (1 - v²)^(1/2), and moreover that this contraction was not entirely implausible, because, as he wrote in a brief letter to the American journal Science in 1889

We know that electric forces are affected by the motion of the electrified bodies relative to the ether and it seems a not improbable supposition that the molecular forces are affected by the motion and that the size of the body alters consequently.

A few years later (1892) Lorentz independently came to the same conclusion, and proceeded to explain in detail how the variations in the electromagnetic field implied by Maxwell's equations actually result in a proportional contraction of matter – at least if we assume the forces responsible for the stability of matter are affected by motion in the same way as the forces of electromagnetism. This latter assumption Lorentz called the "molecular force hypothesis", admitting that he had no real justification for it (other than the fact that it accounted for Michelson's null result). On the basis of this hypothesis, Lorentz showed that the description of the equilibrium configuration of a uniformly moving material object in terms of its "local coordinates" is identical to the description of the same object at absolute rest in terms of the ether rest frame coordinates. He called this the theorem of corresponding states.

To illustrate, consider a small bound spherical configuration of matter at rest in the ether. We assume the forces responsible for maintaining the spherical structure of this particle are affected by uniform motion through the ether in exactly the same way as are electromagnetic forces, which is to say, they are covariant with respect to Lorentz transformations. These forces may propagate at any speed (at or below the speed of light), but it is most convenient for descriptive purposes to consider forces that propagate at precisely the speed of light (in terms of the fixed rest frame coordinates of the ether), because this automatically ensures Lorentz covariance. A wave emanating from the geometric center of the particle at the speed c would expand spherically until reaching the radius of the configuration, where we can imagine that it is reflected and then contracts spherically back to a point (like a spatial filter) and re-expands on the next cycle. This is illustrated by the left-hand cycle below.
[Figure: spacetime plots of the expanding and re-contracting light shells for a configuration at rest (left) and for one in uniform motion (right)]
Only two spatial dimensions are shown in this figure. (In four-dimensional spacetime each shell is actually a sphere.) Now, if we consider an intrinsically identical
configuration of matter in uniform motion relative to the putative rest frame of the ether, and if the equilibrium shape is maintained by forces that are Lorentz covariant, just as is the propagation of electromagnetic waves, then it must still be the case that an electromagnetic wave can expand from the center of the configuration to the perimeter, and be reflected back to the center in a coherent pattern, just as for the stationary configuration. This implies that the absolute shape of the configuration must change from a sphere to an ellipsoid, as illustrated by the right-hand figure above. The spatial size of the particle in terms of the ether rest frame coordinates is just the intersection of a horizontal time slice with the region swept out by the perimeter of the configuration. For any given characteristic particle, since there is no motion relative to the ether in the transverse direction, the size in the transverse direction must be unaffected by the motion. Thus the widths of the configurations in the "y" direction in the above figures are equal. The figure below shows more detailed side and top views of one cycle of a stationary and a moving particle (with motions referenced to the rest frame of the putative ether).
[Figure: side and top views of one cycle of the stationary particle and the moving particle, each progressing from point A to point B]
It's understood that these represent corresponding states, i.e., intrinsically identical equilibrium configurations of matter, whose spatial shapes are maintained by Lorentz covariant forces. In each case the geometric center of the configuration progresses from point A to point B in the respective figure. The right-hand configuration is moving with a speed v in the positive x direction. It can be shown that the transverse sizes of the configurations are equal if the projected areas of the cross-sectional side views (the lower figures) are equal. Thus, light emanating from point A of the moving particle extends a distance 1/λ to the left and a distance λ to the right, where λ is a constant function of v. Specifically, we must have
    λ = [(1 + v)/(1 - v)]^(1/2)
where we have set c = 1 for clarity. The leading edge of the shaft swept out by the
moving shell crosses the x axis at a distance λ(1 - v) from the center point A, which implies that the object's instantaneous spatial extent from the center to the leading edge is only
    λ(1 - v) = (1 - v²)^(1/2)
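For a concrete check, a minimal numerical sketch (taking v = 0.6 for illustration) confirms this value, and also anticipates the cycle-time result derived next:

    import math

    v = 0.6
    lam = math.sqrt((1 + v)/(1 - v))            # the constant defined above

    print(lam*(1 - v), math.sqrt(1 - v*v))      # 0.8 and 0.8: contracted extent
    print(lam + 1/lam, 2/math.sqrt(1 - v*v))    # 2.5 and 2.5: dilated cycle time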
Likewise it's easy to see that the elapsed time (according to the putative ether rest frame coordinates) for one cycle of the moving particle, i.e., from point A to point B, is simply
    λ + 1/λ = 2/(1 - v²)^(1/2)
compared with an elapsed time of 2 for the same particle at rest. Hence we unavoidably arrive at Fitzgerald's length contraction and Lorentz's local time dilation for objects in motion with respect to the x,y,t coordinates, provided only that all characteristic spatial and temporal intervals associated with physical entities are maintained by forces that are Lorentz covariant.

The above discussion did not invoke Maxwell's equations at all, except to the extent that those equations suggested the idea that all the fundamental forces are Lorentz covariant. Furthermore, we have so far omitted consideration of one very important force, namely, the force of inertia. We assumed the equilibrium configurations of matter were maintained by certain forces, but if we consider oscillating configurations, we see that the periodic shapes of such configurations depend not only on the binding force(s) but also on the inertia of the particles. Therefore, in order to arrive at a fully coherent theorem of corresponding states, we must assume that inertia itself is Lorentz covariant. As Lorentz wrote in his 1904 paper

…the proper relation between the forces and the accelerations will exist… if we suppose that the masses of all particles are influenced by a translation to the same degree as the electromagnetic masses of the electrons.

In other words, we must assume the inertial mass (resistance to acceleration) of every particle is Lorentz covariant, which implies that the mass has transverse and longitudinal components that vary in a specific way when the particle is in motion. Now, it was known that some portion of a charged object's resistance to acceleration is due to self-induction, because a moving charge constitutes an electric current, which produces a magnetic field, which resists changes in the current. Not surprisingly, this resistance to acceleration is Lorentz covariant, because it is a purely electromagnetic effect. At one time it was thought that perhaps all mass (even of electrically neutral particles) might be electromagnetic in origin, and some even hoped that gravity and the unknown forces governing the stability of matter would also someday be shown to be electromagnetic, leading to a totally electromagnetic world view. (Ironically, at this same time, others were trying to maintain the mechanical world view, by seeking to explain the phenomena of
electromagnetism in terms of mechanical models.) If in fact all physical effects are ultimately electromagnetic, one could plausibly argue that Lorentz had succeeded in developing a constructive account of relativity, based on the known properties of electromagnetism. Essentially this would have resolved the apparent conflict between the Galilean relativity of mechanics and the Lorentzian relativity of electromagnetism, by asserting that there is no such thing as mechanics, there is only electromagnetism. Then, since electromagnetism is Lorentz covariant, it would follow that everything is Lorentz covariant.

However, it was already known (though perhaps not well known) when Lorentz wrote his paper in 1904 that the electromagnetic world view is not tenable. Poincare pointed this out in his 1905 Palermo paper, in which he showed that the assumption of a purely electromagnetic electron was self-consistent only with the degenerate solution of no charge density at all. Essentially, the linearity of Maxwell's equations implies that they cannot possibly yield stable bound configurations of charge. Poincare wrote

We must then admit that, in addition to electromagnetic forces, there are also nonelectromagnetic forces or bonds. Therefore, we need to identify the conditions that these forces or bonds must satisfy for electron equilibrium to be undisturbed by the [Lorentz] transformation.

In the remainder of this remarkable paper, Poincare derives general conditions that Lorentz covariant forces must satisfy, and considers in particular the force of gravity. The most significant point is that Poincare had recognized that Lorentz had reached the limit of his constructive approach, and instead he (Poincare) was proceeding not to deduce the necessity of relativity from the phenomena of electromagnetism or gravity, but rather to deduce the necessary attributes of electromagnetism and gravity from the principle of relativity. In this sense it is fair to say that Poincare originated a theory of relativity in 1905 (simultaneously with Einstein).

On the other hand, both Poincare and Lorentz continued to espouse the view that relativity was only an apparent fact, resulting from the circumstance that our measuring instruments are necessarily affected by absolute motion in the same way as are the things being measured. Thus they believed that the speed of light was actually isotropic only with respect to one single inertial frame of reference, and it merely appeared to be isotropic with respect to all the others. Of course, Poincare realized full well (and indeed was the first to point out) that the Lorentz transformations form a group, and the symmetry of this group makes it impossible, even in principle, to single out one particular frame of reference as the true absolute frame (in which light actually does propagate isotropically). Nevertheless, he and Lorentz both argued that there was value in maintaining the belief in a true absolute rest frame, and this point of view has continued to find adherents down to the present day.

As a historical aside, Oliver Lodge claimed that Fitzgerald originally suggested the deformation of bodies as an explanation of Michelson's null result

…while sitting in my study at Liverpool and discussing the matter with me. The suggestion bore the impress of truth from the first.

Interestingly, Lodge interpreted Fitzgerald as saying not that objects contract in the direction of motion but that they expand in the transverse direction. We saw in the previous section how Voigt's derivation of the Lorentz transformation left the scale factor undetermined, and the evaluation of this factor occupied a surprisingly large place in the later writings of Lorentz, Poincare, and Einstein. In his book The Ether of Space (1909) Lodge provided an explanation for why he believed the effect of motion should be a transverse expansion rather than a longitudinal contraction. He wrote

When a block of matter is moving through the ether of space its cohesive forces across the line of motion are diminished, and consequently in that direction it expands…

Lodge's reliability is suspect, since he presents this as an "explanation" not only of Fitzgerald's suggestion but also of Lorentz's theory, which it definitely is not. But more importantly, Lodge's misunderstanding highlights one of the drawbacks of conceiving of the deformation effect as arising from variations in electromagnetic forces. In order to give a coherent account of phenomena, the lengths of objects must vary in exactly the same proportion as the distances between objects. It would be quite strange to suppose that the transverse distances between (neutral and widely separated) objects would increase by virtue of being set in motion along parallel lines. In fact, it is not clear what this would even mean. If three or more objects were set in parallel motion, in which direction would they be deflected? And what could be the cause of such a deflection? Neutral objects at rest exert a small attractive force on each other (due to gravity), but diminishing this net force of cohesion would obviously not cause the objects to repel each other.

Oddly enough, if Lodge had focused on the temporal instead of the spatial effects of motion, his reasoning would have approximated a valid justification for time dilation. This justification is often illustrated in terms of two mirrors in parallel motion, with a pulse of light bouncing between them. In this case the motion of the mirrors actually does diminish the frequency of bounces, relative to the stationary ether frame, because the light must travel further between each reflection. Thus the time intervals "expand" (i.e., dilate). Given this time dilation of the local moving coordinates, it's fairly obvious that there must be a corresponding change in the effective space coordinate (since spatial lengths are directly related to time intervals by dx = vdt). In other words, if an observer moves at speed v relative to the ground, and passes over an object of length L at rest on the ground, the length of the object as assessed by the moving observer is affected by his measure of time. Since he is moving at speed v, the length of the object is vdt, where dt is the time it takes him to traverse the length of the object – but which "dt" will he use? Naturally if he bases his length estimate on the measure of the time interval recorded on a ground clock, he will have dt = L/v, so he will judge the object to be v(L/v) = L units in length. However, if he uses his own effective time as indicated on his own co-moving transverse light clock, he will have dt' = dt(1 - v²)^(1/2), so the effective length is v[(L/v)(1 - v²)^(1/2)] = L(1 - v²)^(1/2). Thus, effective length contraction (and no transverse expansion) is logically unavoidable given the effective time dilation.
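This little computation can be written out explicitly. The following minimal sketch, with illustrative values L = 1 and v = 0.6, reproduces the two length estimates described above:

    import math

    L, v = 1.0, 0.6
    dt_ground = L/v                             # traversal time on a ground clock
    dt_local = dt_ground*math.sqrt(1 - v*v)     # on his transverse light clock

    print(v*dt_ground)   # 1.0: ground time reproduces the rest length L
    print(v*dt_local)    # 0.8 = L*(1 - v^2)^(1/2): effective contracted length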

It might be argued that we glossed over an ambiguity in the above argument by considering only light clocks with pulses moving transversely to the motion of the mirrors, giving the relation dt' = dt(1 - v²)^(1/2). If, instead, we align the axis between the mirrors with the direction of travel, we get dt' = dt(1 - v²), so it might seem we have an ambiguous measure of local time, and therefore an ambiguous prediction of length contraction since, by the reasoning given above, we would conclude that an object of rest-length L has the effective length L(1 - v²). However, this fails to account for the contraction of the longitudinal distance between the mirrors (when they are arranged along the axis of motion). Since by construction the speed of light is c in terms of the local coordinates for the clock, the very same analysis that implies length contraction for objects moving relative to the ether rest frame coordinates also implies the same contraction for objects moving relative to the new local coordinates. Thus the clock is contracted in the longitudinal direction relative to the ground's coordinates by the same factor that objects on the ground are contracted in terms of the moving coordinates.

The amount of spatial contraction depends on the amount of time dilation, which depends on the amount of spatial contraction, so it might seem as if the situation is indeterminate. However, all but one of the possible combinations are logically inconsistent. For example, if we decided that the clock was shortened by the full longitudinal factor of (1 - v²), then there would be no time dilation at all, but with no time dilation there would be no length contraction, so this is self-contradictory. The only self-consistent arrangement that reconciles each reference frame's local measures of longitudinal time and length is with the factor (1 - v²)^(1/2) applied to both. This also agrees with the transverse time dilation, so we have isotropic clocks with respect to the local (i.e., inertial) coordinates of any uniformly moving frame, and by construction the speed of light is c with respect to each of these systems of coordinates. This is illustrated by the figures below, showing how the spacetime pattern of reflecting light rays imposes a skew in both the time and the space axes of relatively moving systems of coordinates.
[Figure: spacetime patterns of light rays reflecting between moving mirrors, showing the skewed time and space axes of the relatively moving coordinate systems]
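The self-consistency argument can also be checked numerically with a minimal sketch (unit mirror separation at rest, v = 0.6, c = 1): only a longitudinal contraction by the factor (1 - v²)^(1/2) makes the longitudinal light clock tick at the same dilated rate as the transverse one:

    import math

    def longitudinal_period(length, v):
        # ether-frame time for light to bounce forward and back between
        # mirrors aligned with the motion, separated by the given length
        return length/(1 - v) + length/(1 + v)

    v = 0.6
    print(longitudinal_period(1.0, v))                 # 3.125: uncontracted, disagrees
    print(longitudinal_period(math.sqrt(1 - v*v), v))  # 2.5: contracted clock
    print(2/math.sqrt(1 - v*v))                        # 2.5: transverse clock period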
A slightly different approach is to notice that, according to a "transverse" light clock, we have the partial derivative ∂t/∂T = 1/(1 - v²)^(1/2) along the absolute time axis, i.e., the line X = 0. Integrating gives t = (T - f(X))/(1 - v²)^(1/2) where f(X) is an arbitrary function of X. The question is: Does there exist a function f(X) that will yield physical relativity? If such a function exists, then obviously the resulting coordinates are the ones that will be adopted
as the rest frame by any observer at rest with respect to them. Such a function does indeed exist, namely, f(X) = vX, which gives t = (T - vX)/(1 - v²)^(1/2). To show reciprocity, note that X = vT along the t axis, so we have t = T(1 - v²)/(1 - v²)^(1/2) = T(1 - v²)^(1/2), which gives T = t/(1 - v²)^(1/2) and so ∂T/∂t = 1/(1 - v²)^(1/2). As we've seen, this same transformation yields relativity in the longitudinal direction as well, so there does indeed exist, for any object in any state of motion, a coordinate system with respect to which all optical phenomena are isotropic, and as a matter of empirical fact this is precisely the same class of systems invoked by Galileo's principle of mechanical relativity, the inertial systems, i.e., coordinate systems with respect to which mechanical inertia is isotropic.

Lorentz noted that the complete reciprocity and symmetry between the "true" rest frame coordinates and each of the local effective coordinate systems may seem surprising at first. As he said in his Leiden lectures in 1910

The behavior of measuring rods and clocks in translational motion, when viewed superficially, gives rise to a remarkable paradox, which on closer examination, however, vanishes.

The seeming paradox arises because the Lorentz transformation between two relatively moving systems of inertial coordinates (x,t) and (X,T) implies ∂t/∂T = ∂T/∂t, and there is a temptation to think this implies (dt)² = (dT)². Of course, this "paradox" is based on a confusion between total and partial derivatives. The parameter t is a function of both X and T, and the expression ∂t/∂T represents the partial derivative of t with respect to T at constant X. Likewise T is a function of both x and t, and the expression ∂T/∂t represents the partial derivative of T with respect to t at constant x. Needless to say, there is nothing logically inconsistent about a transformation between (x,t) and (X,T) such that (∂t/∂T) at constant X equals (∂T/∂t) at constant x, so the "paradox" (as Lorentz says) vanishes.

The writings of Lorentz and Poincare by 1905 can be assembled into a theory of relativity that is operationally equivalent to the modern theory of special relativity, although lacking the conceptual clarity and coherence of the modern theory. Lorentz was justifiably proud of his success in developing a theory of electrodynamics that accounted for all the known phenomena, explaining the apparent relativity of these phenomena, but he was also honest enough to acknowledge that the success of his program relied on unjustified hypotheses, the most significant of which was the hypothesis that inertial mass is Lorentz covariant.

To place Lorentz's achievement in context, recall that toward the end of the 19th century it appeared electromagnetism was not relativistic, because the property of being relativistic was equated with being invariant under Galilean transformations, and it was known that Maxwell's equations (unlike Newton's laws of mechanics) do not possess this invariance. Lorentz, prompted by experimental results, discovered that Maxwell's equations actually are relativistic, in the sense of his theorem of corresponding states, meaning that there are relatively moving coordinate systems in terms of which Maxwell's equations are still valid. But these systems are not related by Galilean transformations, so it still appeared that mechanics (presumed to be Galilean covariant) and electrodynamics were not mutually relativistic, which meant it ought to be possible to discern second-order effects of absolute motion by exploiting the difference
between the Galilean covariance of mechanics and the Lorentz covariance of electromagnetism. However, all experiments refuted this expectation. In other words, it was found empirically that electromagnetism and mechanics are mutually relativistic (at least to second order). Hence the only possible conclusion is that either the known laws of electromagnetism or the known laws of mechanics must be subtly wrong. Either the correct laws of electromagnetism must really be Galilean covariant, or else the correct laws of inertial mechanics must really be Lorentz covariant.

At this point, in order to "save the phenomena", Lorentz simply assumed that inertial mass is Lorentz covariant. Of course, he had before him the example of self-induction of charged objects, leading to the concept of electromagnetic mass, which is manifestly Lorentz covariant, but, as Poincare observed, it is not possible (and doesn't even make sense) for the intrinsic mass of elementary particles to be electromagnetic in origin. Hence the hypothesis of Lorentz covariance for inertia (and therefore inertial mechanics) is not a "constructive" deduction; it is not even implied by the molecular force hypothesis (because there is no reason to suppose that anything analogous to "self-induction" of the unknown molecular forces is ultimately responsible for inertia); it is simply a hypothesis, motivated by empirical facts. This does not diminish Lorentz's achievement, but it does undercut his comment that "Einstein simply postulates what we have deduced… from the fundamental equations of the electromagnetic field". In saying this, Lorentz overlooked the fact that the Lorentz covariance of mechanical inertia cannot be deduced from the equations of electromagnetism. He simply postulated it, no less than Einstein did.

Much of the confusion over whether Lorentz deduced or postulated his results is due to confusion between the two aspects of the problem. First, it was necessary to determine that Maxwell's equations are Lorentz covariant. This was in fact deduced by Lorentz from the laws themselves, consistent with his claim. But in order to arrive at a complete theory of relativity (and in particular to account for the second-order null results) it is also necessary to determine that mechanical inertia (and molecular forces, and gravity) are all Lorentz covariant. This proposition was not deduced by Lorentz (or anyone else) from the laws of electromagnetism, nor could it be, because it does not follow from those laws. It is merely postulated, just as we postulate the conservation of energy, as an organizing principle, justified by its logical cogency and empirical success. As Poincare clearly explained in his Palermo paper, the principle of relativity itself emerges as the only reliable guide, and this is as true for Lorentz's approach as it is for Einstein's, the main difference being that Einstein recognized this principle was not only necessary, but also that it obviated the detailed assumptions as to the structure of matter. Hence, even with regard to electromagnetism (let alone mechanics) Lorentz could write in the 1915 edition of his Theory of Electrons that

If I had to write the last chapter now, I should certainly have given a more prominent place to Einstein's theory of relativity, by which the theory of electromagnetic phenomena in moving systems gains a simplicity that I had not been able to attain.

Nevertheless, as mentioned previously, Lorentz and Poincare both continued to espouse the merits of the absolute interpretation of relativity, although Poincare seemed to regard the distinction as merely conventional. For example, in a 1912 lecture he said:

    The new conception … according to which space and time are no longer two separate entities, but two parts of the same whole, which are so intimately bound together that they cannot be easily separated… is a new convention [that some physicists have adopted]… Not that they are constrained to do so; they feel that this new convention is more comfortable, that’s all; and those who do not share their opinion may legitimately retain the old one, to avoid disturbing their ancient habits. Between ourselves, let me say that I feel they will continue to do so for a long time still.

Sadly, Poincare died just two months later, but his prediction has held true, because to this day the “ancient habits” regarding absolute space and time persist. There are today scientists and philosophers who argue in favor of what they see as Lorentz’s constructive approach, especially as a way of explaining the appearance of relativity, rather than merely accepting relativity in the same way we accept (for example) the principle of energy conservation. However, as noted above, the constructiveness of Lorentz’s approach begins and ends with electromagnetism, the rest being conjecture and hypothesis, so this argument in favor of the Lorentzian view is misguided. But setting this aside, is there any merit in the idea that the absolutist approach effectively explains the appearance of relativity?

To answer this question, we must first clearly understand what precisely is to be explained when one seeks to “explain” relativity. As discussed in section 1.2, we are presented with many relativities in nature, such as the relativity of spatial orientation. It’s important to bear in mind that this relativity does not assert that the equilibrium lengths of solid objects are unaffected by orientation; it merely asserts that all such lengths are affected by orientation in exactly the same proportion. It’s conceivable that all solid objects are actually twice as long when oriented toward (say) the Andromeda galaxy as when oriented perpendicular to that direction, but we have no way of knowing this. Hence if we begin with the supposition that all objects are twice as long when pointed toward Andromeda, we could deduce that all lengths will appear to be independent of orientation, because they are all affected equally. But have we thereby “explained” the apparent isotropy of spatial lengths? Not at all, because the thing to be explained is the symmetry, i.e., why the lengths of all solid configurations, whether consisting of gold or wood, maintain exactly the same proportions, independent of their spatial orientations. The Andromeda axis theory does not explain this physical symmetry. Instead, it explains something different, namely, why the Andromeda axis theory appears to be false even though it is (by supposition) true. This is certainly a useful (indeed, essential) explanation for anyone who accepts, a priori, the truth of the Andromeda axis theory, but otherwise it is of very limited value. Likewise if we accept absolute Galilean space and time as true concepts, a priori, then it is useful to understand why nature may appear to be Minkowskian, even though it is really (by supposition) Galilean.

But what is the basis for the belief in the Galilean concept of space and time, as distinct from the Minkowskian concept, especially considering that the world appears to be Minkowskian? Most physicists have concluded that there is no good answer to this question, and that it’s preferable to study the world as it appears to be, rather than trying to rationalize “ancient habits”. This does not imply a lack of interest in a deeper explanation for the effective symmetries of nature, but it does suggest that such explanations are most likely to come from studying those effective symmetries themselves, rather than from rationalizing why certain pre-conceived universal asymmetries would be undetectable.

1.6 A More Practical Arrangement

    It is known that Maxwell’s electrodynamics – as usually understood at the present time – when applied to moving bodies, leads to asymmetries which do not appear to be inherent in the phenomena.
                                                A. Einstein, 1905

It's often overlooked that Einstein began his 1905 paper "On the Electrodynamics of Moving Bodies" by describing a system of coordinates based on a single absolute measure of time. He pointed out that we could assign time coordinates to each event

    ...by using an observer located at the origin of the coordinate system, equipped with a clock, who coordinates the arrival of the light signal originating from the event to be timed and traveling to his position through empty space.

This is equivalent to Lorentz's conception of "true" time, provided the origin of the coordinate system is at "true" rest. However, for every frame of reference except the one at rest with respect to the origin, these coordinates would not constitute an inertial coordinate system, because inertia would not be isotropic in terms of these coordinates, so Newton's laws of motion would not even be quasi-statically valid. Furthermore, the selection of the origin is operationally arbitrary, and, even if the origin were agreed upon, there would be significant logistical difficulties in actually carrying out a coordination based on such a network of signals. Einstein says "We arrive at a much more practical arrangement by means of the following considerations".

In his original presentation of special relativity Einstein proposed two basic principles, derived from experience. The first is nothing other than Galileo's classical principle of inertial relativity, which asserts that for any material object in any state of motion there exists a system of space and time coordinates, called inertial coordinates, with respect to which the object is instantaneously at rest and inertia is homogeneous and isotropic (the latter being necessary for Newton's laws of motion to hold at least quasi-statically). However, as discussed in previous sections, this principle alone is not sufficient to give a useful basis for evaluating physical phenomena. We must also have knowledge of how the description of events with respect to one system of inertial coordinates is related to the description of those same events with respect to another, relatively moving, system of coordinates. Rather than simply assuming a relationship based on some prior metaphysical conception of space and time, Einstein realized that the correct relationship between relatively moving systems of inertial coordinates could only be determined empirically. He noted "the unsuccessful attempts to discover any motion of the earth relatively to the 'light medium'", and since we define motion in terms of inertial coordinates, these experiments imply that the propagation of light is isotropic in terms of the very same class of coordinate systems for which mechanical inertia is isotropic. On the other hand, all the experimental results that are consolidated into Maxwell's equations imply that the propagation speed of light (with respect to any inertial coordinate system) is independent of the state of motion of the emitting source. Einstein’s achievement was to explain clearly how these seemingly contradictory facts of experience may be reconciled.

As an aside, notice that isotropy with respect to inertial coordinates is what we would expect if light were a stream of inertial corpuscles (as suggested by Newton), whereas the independence of the speed of light from the motion of its source is what we would expect if light were a wave phenomenon. This is the same dichotomy that we encounter in quantum mechanics, and it's not coincidental that Einstein wrote his seminal paper on light quanta almost simultaneously with his paper on the electrodynamics of moving bodies. He might actually have chosen to combine the two into a single paper discussing general heuristic considerations arising from the observed properties of light, and the reconciliation of the apparent dichotomy in the nature of light as it is usually understood.

From the empirical facts that (a) light propagates isotropically with respect to every system of inertial coordinates (which is essentially just an extension of Galileo's principle of relativity), and that (b) the speed of propagation of light with respect to any system of inertial coordinates is independent of the motion of the emitting source, it follows that the speed of light is invariant with respect to every system of inertial coordinates. From these facts we can deduce the correct relationship between relatively moving systems of inertial coordinates.

To establish the form of the relationships between this "more practical" class of coordinate systems (i.e., the class of inertial coordinate systems), Einstein notes that if x,y,z,t is a system of inertial coordinates, and a pulse of light is emitted from location x0 along the x axis at time t0 toward a distant location x1, where it arrives and is reflected at time t1, and if this reflected pulse is received back at location x2 (the same as x0) at time t2, then t1 = (t0 + t2)/2. In other words, since light is isotropic with respect to the same class of coordinate systems in which mechanical inertia is isotropic, the light pulse takes the same amount of time, (t2 − t0)/2, to travel each way when expressed in terms of any system of inertial coordinates. By the same reasoning the spatial distance between the emission and reflection events is x1 − x0 = c(t2 − t0)/2.

Naturally the invariance of light speed with respect to inertial coordinates is implicit in the principles on which special relativity is based, but we must not make the mistake of thinking that this invariance is therefore tautological, or merely an arbitrary definition. Inertial coordinates are not arbitrary, and they are definable without explicit reference to the phenomenon of light. The real content of Einstein's principles is that light is an inertial phenomenon (despite its wavelike attributes). The stationary ether posited by Lorentz did not interact mechanically with ordinary matter at all, and yet we know that light conveys momentum to material objects. The coupling between the supposed ether and ordinary matter was always problematic for ether theories, and indeed for any classical wavelike theory of light. Einstein’s paper on the photo-electric effect was a crucial step in recognizing the localized ballistic aspects of electromagnetic radiation, and this theme persists, just under the surface, in his paper on electrodynamics. Oddly enough, the clearest statement of this insight came only as an afterthought, appearing in Einstein's second paper on relativity in 1905, in which he explicitly concluded that "radiation carries inertia between emitting and absorbing bodies". The point is that light conveys not only momentum, but inertia. For example, after a body has absorbed an elementary pulse of light, it has not only received a “kick” from the momentum of the light, but the internal inertia (i.e., the inertial mass) of the body has actually increased.

Once it is posited that light is inertial, Galileo's principle of relativity automatically implies that light propagates isotropically from the source, regardless of the source's state of uniform motion. Consequently, if we elect to use space and time coordinates in terms of which light speed is not isotropic (which we are certainly free to do), we will necessarily find that no inertial processes are isotropic. For example, we will find that two identical marbles expelled from a tube in opposite directions by an explosive charge located between them will not fly away at equal speeds, i.e., momentum will not be conserved. Conversely, if we use ordinary mechanical inertial processes together with the conservation of momentum (and if we decline to assign any momentum or reaction to unobservable and/or immovable entities), we will necessarily arrive at clock synchronizations that are identical with those given by Einstein's light rays. Thus, Einstein's "more practical arrangement" is based on (and ensures) isotropy not just for light propagation, but for all inertial phenomena. If a uniformly moving observer uses pairs of identical material objects thrown with equal force in opposite directions to establish spaces of simultaneity, he will find that his synchronization agrees with that produced by Einstein's assumed isotropic light rays. The special attribute of light in this regard is due to the fact that, although light is inertial, it has no mass of its own, and therefore no rest frame. It can be regarded entirely as nothing but an interaction along a null interval between two massive bodies, the emitter and absorber. From this follows the indefinite metric of spacetime, and light's seemingly paradoxical combination of wavelike and inertial properties. (This is discussed more fully in Section 9.10.)

It's also worth noting that when Einstein invoked the operational definitions of time and distance based on light propagation, he commented that "we assume this definition of synchronization is free from contradictions, and possible for any number of points". A simple numeric check of the synchronization procedure is sketched below.
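The following minimal numeric sketch (our own illustration, not from Einstein's paper) applies the midpoint rule t1 = (t0 + t2)/2 in a concrete case, using units with c = 1, a frame k moving at v = 0.6 relative to K, and variable names of our own choosing.

import math

# Radar synchronization per Einstein's "more practical arrangement", with c = 1.
# An emitter rides at the spatial origin of frame k (moving at speed v in K);
# a reflector rides at a fixed K-distance d ahead of it.
v, d = 0.6, 1.0
gamma = 1.0 / math.sqrt(1.0 - v*v)

t0 = 0.0                     # K-time of emission (emitter at x = 0)
t1 = d / (1.0 - v)           # K-time at which the pulse overtakes the reflector
x1 = v*t1 + d                # K-position of the reflection event
t2 = t1 + d / (1.0 + v)      # K-time at which the return pulse meets the emitter

# Proper times shown by the emitter's clock at emission and reception:
tau0 = t0 * math.sqrt(1.0 - v*v)
tau2 = t2 * math.sqrt(1.0 - v*v)

tau1_radar   = (tau0 + tau2) / 2     # Einstein's midpoint assignment
tau1_lorentz = gamma * (t1 - v*x1)   # time coordinate of the reflection in k

print(tau1_radar, tau1_lorentz)      # both 1.25, i.e. d/(1 - v^2)^{1/2}

The two values agree for any v and d, which is just the statement that the midpoint rule reproduces the simultaneity of the inertial rest frame of the emitter.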
This assumption is crucial for understanding why a set of definitions based on the propagation of light is tenable, in contrast with a similar set of definitions based on non-inertial signals, such as acoustical waves or postal messages. A set of definitions based on any non-inertial signal can't possibly preserve inertial isotropy. Of course, a signal requiring an ordinary material medium for its propagation would obviously not be suitable for a universal definition of time, because it would be inapplicable across regions devoid of that substance. Moreover, even if we posited an omni-present substance, a signal consisting of (or carried by) any material substance would be unsuitable because such objects do not exhibit any particular fixed characteristic of motion, as shown by the fact that they can be brought to rest with respect to some inertial system of reference. Furthermore, if there exist any signals faster than those on which we base our definitions of temporal synchronization, those definitions will be easily falsified. The fact that Einstein's principles are empirically viable at all, far from being vacuous or tautological, is actually somewhat miraculous. In fact, if we were to describe the kind of physical phenomenon that would be required in order for us to have a consistent capability of defining a coherent basis of temporal synchronization for spatially separate events, clearly it could be neither a material object, nor a disturbance in a material medium, and yet it must exhibit some fixed characteristic quality of motion that exceeds the motion of any other object or signal. We hardly have any right to expect, a priori, that such a phenomenon exists.

On the other hand, it could be argued that Einstein's second principle is just as classical as his first, because sight has always been the de facto arbiter of simultaneity (as well as of straightness, as in "uniform motion in a straight line"). Even in Galileo's day it was widely presumed that vision was instantaneous, so it automatically was taken to define simultaneity. (We review the historical progress of understanding the speed of light in Section 3.3.) The difference between this and the modern view is not so much the treatment of light as the means of defining simultaneity, but simply the realization that light propagates at a finite speed, and therefore the spacetime manifold is only partially ordered.

The derivation of the Lorentz transformation presented in Einstein's 1905 paper is formally based on two empirically-based propositions, which he expressed as follows:

1. The laws by which the conditions of physical systems change are independent of which of two coordinate systems in homogeneous translational movement relative to each other these changes in status are referred.

2. Each ray of light moves in "the resting" coordinate system with the definite speed c, independently of whether this ray of light is emitted from a resting or moving body. Here speed = (optical path)/(length of time), where "length of time" is to be understood in the sense of the definition in §1.

In the first of these propositions we are to understand that the “coordinate systems” are all such that Newton’s laws of motion hold good (in a suitable limiting sense), as alluded to at the beginning of the paper’s §1. This is crucial, because without this stipulation, the proposition is false. For example, coordinate systems related by Galilean transformations are “in homogeneous translational movement relative to each other”, and yet the laws by which physical systems change (e.g., Maxwell’s equations) are manifestly not independent of the choice of such coordinate systems. So the restriction to coordinate systems in terms of which the laws of mechanics hold good is crucial. However, once we have imposed this restriction, the proposition becomes tautological, at least for the laws of mechanics. The real content of Einstein’s first “principle” is therefore the assertion that the other laws of physics (e.g., the laws of electrodynamics) hold good in precisely the same set of coordinate systems in terms of which the laws of mechanics hold good. (This is also the empirical content of the failure of the attempts to detect the Earth’s absolute motion through the electromagnetic ether.) Thus Einstein’s first principle simply reasserts Galileo’s claim that all effects of uniform rectilinear motion can be “transformed away” by a suitable choice of coordinate systems.

It might seem that Einstein’s second principle is implied by the first, at least if Maxwell's equations are regarded as laws governing the changes of physical systems, because Maxwell's equations prescribe the speed of light propagation independent of the source's motion. (Indeed, Einstein alluded to this very point at the beginning of his 1905 paper on the inertia of energy.) However, it’s not clear a priori whether Maxwell’s equations are valid in terms of relatively moving systems of coordinates, nor whether the permittivity of the vacuum is independent of the frame of reference in terms of which it is evaluated. Moreover, as discussed above, by 1905 Einstein already doubted the absolute validity of Maxwell's equations, having recently completed his paper on the photo-electric effect which introduced the idea of photons, i.e., light propagating as discrete packets of energy, a concept which cannot be represented as a solution of Maxwell's linear equations. Einstein also realized that a purely electromagnetic theory of matter based on Maxwell's equations was impossible, because those equations by themselves could never explain the equilibrium of electric charge that constitutes a charged particle. "Only different, nonlinear field equations could possibly accomplish such a thing." This observation shows how unjustified was the "molecular force hypothesis" of Lorentz, according to which all the forces of nature were assumed to transform exactly as do electromagnetic forces as described by Maxwell's linear equations. Knowing that the molecular forces responsible for the equilibrium of charged particles must necessarily be of a fundamentally different character than the forces of electromagnetism, and aware that the stability of matter may not even have a description in the form of a continuous field theory at all, it's clear that Lorentz's hypothesis has no constructive basis, and is simply tantamount to the adoption of Einstein’s two principles.

Thus, Einstein's contribution was to recognize that "the bearing of the Lorentz transformation transcended its connection with Maxwell's equations and was concerned with the nature of space and time in general". Instead of basing special relativity on an assumption of the absolute validity of Maxwell's equations, Einstein based it on the particular characteristic exhibited by those equations, namely Lorentz invariance, which he intuited was the more fundamental principle, one that could serve as an organizing principle analogous to the conservation of energy in thermodynamics, and one that could encompass all physical laws, even if they turned out to be completely dissimilar to Maxwell's equations. Remarkably, this has turned out to be the case.
Lorentz invariance is a key aspect of the modern theory of quantum electrodynamics, which replaced Maxwell’s equations.

Of course, just as Einstein’s first principle relies on the restriction to coordinate systems in which the laws of mechanics hold good, his second principle relies crucially on the requirement that time intervals are “to be understood in the sense of the definition given in §1”. And, again, once this condition is recognized, the principle itself becomes tautological, although in this case the tautology is complete. The second principle states that light always propagates at the speed c, assuming we define the time intervals in accord with §1, which defines time intervals as whatever they must be in order for the speed of light to be c. This unfortunately has led some critics to assert that special relativity is purely tautological, merely a different choice of conventions. Einstein’s presentation somewhat obscures the real physical content of the theory, which is that mechanical inertia and the propagation speed of light are isotropic and invariant with respect to precisely the same set of coordinate systems. This is a non-trivial fact. It then remains to determine how these distinguished coordinate systems are related to each other.

Although Einstein explicitly highlighted just two principles as the basis of special relativity in his 1905 paper (consciously patterned after the two principles of thermodynamics), his derivation of the Lorentz transformation also invoked “the properties of homogeneity that we attribute to space and time” to establish the linearity of the transformations. In addition, he tacitly assumed spatial isotropy, i.e., that there is no preferred direction in space, so the intrinsic properties of ideal rods and clocks do not depend on their spatial orientations. Lastly, he assumed memorylessness, i.e., that the extrinsic properties of rods and clocks may be functions of their current positions and states of motion, but not of their previous positions or states of motion. This last assumption is needed to exclude the possibility that every elementary particle may somehow "remember" its entire history of accelerations, and thereby "know" its present absolute velocity relative to a common fixed reference. (Einstein explicitly listed these extra assumptions in an exposition written in 1920. He may have gained an appreciation of the importance of the independence of measuring rods and clocks from their past history after considering Weyl’s unified field theory, which Einstein rejected precisely because it violated this premise.)

The actual detailed derivation of the Lorentz transformation presented in Einstein’s 1905 paper is somewhat obscure and circuitous, but it’s worthwhile to follow his reasoning, partly for historical interest, and partly to contrast it with the more direct and compelling derivations that will be presented in subsequent sections. Following Einstein’s original derivation, we begin with an inertial (and Cartesian) coordinate system called K, with the coordinates x, y, z, t, and we posit another system of inertial coordinates denoted as k, with the coordinates ξ, η, ζ, τ. The spatial axes of these two systems are aligned, and the spatial origin of k is moving in the positive x direction with speed v in terms of K. We then consider a particle at rest in the k system, and note that for such a particle the x and t coordinates (i.e., the coordinates in terms of the K system) are related by x′ = x − vt for some constant x′. We also know the y and z coordinates of such a particle are constant. Hence each stationary spatial position in the k system corresponds to a set of three constants (x′,y,z), and we can also assign the time coordinate t to each event. Interestingly, the system of variables x′,y,z,t constitutes a complete coordinate system, related to the original system K by a Galilean transformation x′ = x − vt, y′ = y, z′ = z, t′ = t. Thus, just as Lorentz did in 1892, Einstein began by essentially applying a Galilean transformation to the original “rest frame” coordinates to give an intermediate system of coordinates, although Einstein’s paper makes it clear that this is not an inertial coordinate system.

Now we consider the values of the τ coordinate of the k system as a function of x′,y,z,t for any stationary point in the k system. Suppose a pulse of light is emitted from the origin of the k system in the positive x direction at time τ0; it reaches the point corresponding to x′,y,z at time τ1, where it is reflected, arriving back at the origin of the k system at time τ2. This is depicted in the figure below.

[Figure: the worldlines of the origin of k and of the point x′, with a light pulse emitted at τ0, reflected at τ1, and received back at the origin at τ2]
Recall that the ξηζτ coordinates are defined as inertial coordinates, meaning that inertia is homogeneous and isotropic in terms of these coordinates. Also, all experimental evidence (such as all "the unsuccessful attempts to discover any motion of the earth relatively to the 'light medium'") indicates that the speed of light is isotropic in terms of any inertial coordinate system. Therefore, we have τ1 = (τ0 + τ2)/2, so the τ coordinate as a function of x′,y,z,t satisfies the relation

    (1/2)[τ(0,0,0,t) + τ(0,0,0, t + x′/(c−v) + x′/(c+v))] = τ(x′,0,0, t + x′/(c−v))

Differentiating both sides with respect to the parameter x′, we get (using the chain rule)

    (1/2)[1/(c−v) + 1/(c+v)] ∂τ/∂t = ∂τ/∂x′ + [1/(c−v)] ∂τ/∂t

Now, it should be noted here that the partial derivatives are being evaluated at different points, so we would not, in general, be justified in treating them interchangeably. However, Einstein has stipulated that the transformation equations are linear (due to homogeneity of space and time), so the partial derivatives are all constants and unique (for any given v). Simplifying the above equation gives

    ∂τ/∂x′ + [v/(c² − v²)] ∂τ/∂t = 0

At this point, Einstein alludes to analogous reasoning for the y and z directions, but doesn’t give the details. Presumably we are to consider a pulse of light emanating from the origin and reflecting at a point with x′ = 0 and z = 0 at height y, and returning to the origin. In this case the isotropy of light propagation in terms of inertial coordinates implies

    (1/2)[τ(0,0,0,t) + τ(0,0,0, t + 2y/(c² − v²)^{1/2})] = τ(0,y,0, t + y/(c² − v²)^{1/2})

In this equation we have made use of the fact that the y component of the speed of the light pulse (in terms of the K system) as it travels in either direction between these points, which are stationary in the k system, is (c² − v²)^{1/2}. Differentiating both sides with respect to y, we get

    [1/(c² − v²)^{1/2}] ∂τ/∂t = ∂τ/∂y + [1/(c² − v²)^{1/2}] ∂τ/∂t

and therefore ∂τ/∂y = 0. The same reasoning shows that ∂τ/∂z = 0. Now the total differential of τ(x′,y,z,t) is, by definition

    dτ = (∂τ/∂x′)dx′ + (∂τ/∂y)dy + (∂τ/∂z)dz + (∂τ/∂t)dt

and we know the partial derivatives with respect to y and z are zero, and the partial derivatives with respect to x′ and t are in a known ratio, so for any given v we can write

    dτ = a(v)[dt − (v/(c² − v²))dx′]

where a(v) is as yet an undetermined function. Incidentally, Einstein didn’t write this expression in terms of differentials, but he did state that he was “letting x′ be infinitesimally small”, so he was essentially dealing with differentials. On the other hand, the distinction between differentials and finite quantities matters little in this context, because the relations are linear, and hence the partial derivatives are constants, so the differentials can be trivially integrated. Thus we have

    τ = a(v)[t − (v/(c² − v²))x′]

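As a quick check (one we add here; it is not spelled out in the paper), this expression does satisfy the condition on the partial derivatives found above:

    \[
      \frac{\partial \tau}{\partial x'} + \frac{v}{c^2 - v^2}\,\frac{\partial \tau}{\partial t}
      \;=\; -\,a(v)\,\frac{v}{c^2 - v^2} + \frac{v}{c^2 - v^2}\,a(v) \;=\; 0 .
    \]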
Einstein then used this result to determine the transformation equations for the spatial coordinates. The ξ coordinate of a pulse of light emitted from the origin in the positive x direction is related to the τ coordinate by ξ = cτ (since experience has shown that light propagates with the speed c in all directions when expressed in terms of any system of inertial coordinates). Substituting for τ from the preceding formula gives, for the ξ coordinate of this light pulse, the expression

    ξ = a(v)c[t − (v/(c² − v²))x′]

We also know that, for this light pulse, the parameters t and x′ are related by t = x′/(c − v), so we can substitute for t in the above expression and simplify to give the relation between ξ and x′ (both of which, we remember, are constants for any point at rest in k)

    ξ = a(v)[c²/(c² − v²)]x′

We can choose x′ to be anything we like, so this represents the general relation between these two parameters. Similarly the η coordinate of a pulse of light emanating from the origin in the η direction is

    η = cτ = a(v)c[t − (v/(c² − v²))x′]

but in this case we have x′ = 0 and, as noted previously, t = y/(c² − v²)^{1/2}, so we have

    η = a(v)[c/(c² − v²)^{1/2}]y

and by the same token

    ζ = a(v)[c/(c² − v²)^{1/2}]z

If we define the function

    ϕ(v) = a(v)c/(c² − v²)^{1/2} = a(v)/(1 − (v/c)²)^{1/2}

and substitute x − vt for x′, the preceding results can be summarized as

    τ = ϕ(v)β(t − vx/c²),    ξ = ϕ(v)β(x − vt),    η = ϕ(v)y,    ζ = ϕ(v)z

where β = 1/(1 − (v/c)²)^{1/2}.

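It may be worth noting explicitly (a step we interpolate here) that these expressions scale the entire quadratic form by the single factor ϕ(v)²:

    \[
      \xi^2 + \eta^2 + \zeta^2 - c^2\tau^2
      = \phi(v)^2\Big[\beta^2 (x - vt)^2 + y^2 + z^2 - c^2\beta^2 (t - vx/c^2)^2\Big]
      = \phi(v)^2\big(x^2 + y^2 + z^2 - c^2 t^2\big),
    \]

since β²[(x − vt)² − c²(t − vx/c²)²] = β²(1 − v²/c²)(x² − c²t²) = x² − c²t². In particular a null cone maps to a null cone for any ϕ(v), which is why a separate argument is needed to pin down ϕ(v) itself.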
At this point Einstein observes that a sphere of light expanding with the speed c in terms of the unprimed coordinates transforms to a sphere of light expanding with speed c in terms of the double-primed coordinates. In other words,

    if x² + y² + z² = c²t², then ξ² + η² + ζ² = c²τ²

As Einstein says, this “shows that our two fundamental principles are compatible”, i.e., it is possible for light to propagate isotropically with respect to two relatively moving systems of inertial coordinates, provided we allow the possibility that the transformation from one inertial coordinate system to another is not exactly as Galileo and Newton surmised. To complete the derivation of the Lorentz transformation, it remains to determine the function ϕ(v). To do this, Einstein considers a two-fold application of the transformation, once with the speed v in the positive x direction, and then again with the speed v in the negative x direction. The result should be the identity transformation, i.e., we should get back to the original coordinate system. (Strictly speaking, this assumes the property of “memorylessness”.) It’s easy to show that if we apply the above transformation twice, once with parameter v and once with parameter −v, each coordinate is ϕ(v)ϕ(−v) times the original coordinate, so we must have

    ϕ(v)ϕ(−v) = 1

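For instance (filling in the "easy to show" step for two of the coordinates), applying the transformation with parameter −v to the system ξ, η, ζ, τ, and noting that β(−v) = β(v), we get

    \[
      \eta'' = \phi(-v)\,\eta = \phi(-v)\phi(v)\,y, \qquad
      \xi'' = \phi(-v)\beta\,(\xi + v\tau)
            = \phi(-v)\phi(v)\,\beta^2\big[(x - vt) + v(t - vx/c^2)\big]
            = \phi(-v)\phi(v)\,x ,
    \]

and similarly for ζ and τ, so the composite transformation is the identity just if ϕ(v)ϕ(−v) = 1.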
Finally, Einstein concludes by “inquiring into the signification of ϕ(v)”. He notes that a segment of the η axis moving with speed v perpendicular to its length (i.e., in the positive x direction) has the length y = η/ϕ(v) in terms of the K system coordinates, and by “reasons of symmetry” (i.e., spatial isotropy) this must equal η/ϕ(−v), because it doesn’t matter whether this segment of the y axis is moving in the positive or the negative x direction. Consequently we have ϕ(v) = ϕ(−v), and therefore ϕ(v) = 1, so he arrives at the Lorentz transformation

    τ = β(t − vx/c²),    ξ = β(x − vt),    η = y,    ζ = z

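These equations are easily checked by machine. The following few lines of symbolic algebra (our own sanity check, not part of the original derivation) confirm both the invariance of the quadratic form and the fact that boosting by +v and then by −v restores the original coordinates:

import sympy as sp

x, y, z, t = sp.symbols('x y z t', real=True)
v, c = sp.symbols('v c', positive=True)

def boost(x_, t_, v_):
    # Lorentz boost of speed v_ along the x axis (y and z are unchanged).
    b = 1 / sp.sqrt(1 - v_**2 / c**2)
    return b*(x_ - v_*t_), b*(t_ - v_*x_/c**2)

xi, tau = boost(x, t, v)

# The quadratic form is invariant...
q1 = x**2 + y**2 + z**2 - c**2*t**2
q2 = xi**2 + y**2 + z**2 - c**2*tau**2
assert sp.simplify(q2 - q1) == 0

# ...and the two-fold application with parameters +v and -v is the identity.
x2, t2 = boost(xi, tau, -v)
assert sp.simplify(x2 - x) == 0 and sp.simplify(t2 - t) == 0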
This somewhat laborious and awkward derivation is interesting in several respects. For one thing, one gets the impression that Einstein must have been experimenting with various methods of presentation, and changed his nomenclature during the drafting of the paper. For example, at one point he says “a is a function ϕ(v) at present unknown”, but subsequently a(v) and ϕ(v) are defined as different functions. At another point he defines x′ as a Galilean transform of x (without explicitly identifying it as such), but subsequently uses the symbol x′ as part of the inertial coordinate system resulting from the two-fold application of the Lorentz transformation. In addition, he somewhat tacitly makes use of the invariance of the light-like relation x² + y² = c²t² in his derivation of the transformation equations for the y coordinate, but doesn’t seem to realize that he could just as well have invoked the invariance of x² + y² + z² = c²t² to make short work of the entire derivation. Instead, he presents this invariance as a consequence of the transformation equations – despite the fact that he has tacitly used the invariance as the basis of the derivation (which of course he was entitled to do, since that invariance simply expresses his “light principle”).

Perhaps not surprisingly, some readers have been confused as to the significance of the functions a(v) and ϕ(v). For example, in a review of Einstein’s paper, A. I. Miller writes:

    Then, without prior warning Einstein replaced a(v) with ϕ(v)/(1 − (v/c)²)^{−1/2}… But why did Einstein make this replacement? It seems as if he knew beforehand the correct form of the set of relativistic transformations… How did Einstein know that he had to make [this substitution] in order to arrive at those space and time transformations in agreement with the postulates of relativity?

This suggests a misunderstanding, because the substitution in question is purely formal, and has no effect on the content of the equations. The transformations that Einstein had derived by that point, prior to replacing a(v), were already consistent with the postulates of relativity (as can be verified by substituting them into the Minkowski invariant). It is simply more convenient to express the equations in terms of ϕ(v), which is the entire coefficient of the transformations for y and z. One naturally expects this coefficient to equal unity.

Even aside from the inadvertent changes in nomenclature, Einstein’s derivation is undeniably clumsy, especially in first applying what amounts to a Galilean transformation, and then deriving the further transformation needed to arrive at a system of inertial coordinates. It’s clear that he was influenced by Lorentz’s writings, even to the point of using the same symbol β for the quantity 1/(1 − (v/c)²)^{1/2}, which Lorentz used in his 1904 paper. (Oddly enough, many years later Einstein wrote to Carl Seelig that in 1905 he had known only of Lorentz’s 1895 paper, but not his subsequent papers, and none of Poincare’s papers on the subject.) In a review article published in 1907 Einstein had already adopted a more economical derivation, dispensing with the intermediate Galilean system of coordinates, and making direct use of the lightlike invariant expression, similar to the standard derivation presented in most introductory texts today.

To review this now standard derivation, consider (again) Einstein’s two systems of inertial coordinates K and k, with coordinates denoted by (x,y,z,t) and (ξ,η,ζ,τ) respectively, and oriented so that the x and ξ axes coincide, and the xy plane coincides with the ξη plane. Also, as before, the system k is moving in the positive x direction with fixed speed v relative to the system K, and the origins of the two systems momentarily coincide at time t = τ = 0. According to the principle of homogeneity, the relationship between the two sets of coordinates must be linear, so there must be constants A1 and A2 (for a given v) such that ξ = A1x + A2t. Furthermore, if an object is stationary relative to k, and if it passes through the point (x,t) = (0,0), then its position in general satisfies x = vt, from the definition of velocity, and the ξ coordinate of that point with respect to the k system is 0. Therefore we have ξ = A1(vt) + A2t = 0. Since this must be true for non-zero t, we must have A1v + A2 = 0, and so A2 = −A1v. Consequently, there is a single constant A (for any given v) such that ξ = A(x − vt). Similarly there must be constants B and C such that η = By and ζ = Cz. Also, invoking isotropy and homogeneity, we know that τ is independent of y and z, so it must be of the form τ = Dx + Et for some constants D and E (for a given v). It only remains to determine the values of the constants A, B, C, D, and E in these expressions.

Suppose at the instant when the spatial origins of K and k coincide a spherical wave of light is emitted from their common origin. At a subsequent time t in the first frame of reference the sphere of light must be the locus of points satisfying the equation

    x² + y² + z² = c²t²                                                  (1)

and likewise, according to our principles, in the second frame of reference the spherical wave at time τ must be the locus of points described by

    ξ² + η² + ζ² = c²τ²                                                  (2)

Substituting from the previous expressions for the k coordinates into this equation, we get

    A²(x − vt)² + B²y² + C²z² = c²(Dx + Et)²

Expanding these terms and rearranging gives

    (A² − c²D²)x² + B²y² + C²z² − 2(A²v + c²DE)xt = (c²E² − A²v²)t²      (3)

The assumption that light propagates at the same speed in both frames of reference implies that a simultaneous spherical shell of light in one frame is also a simultaneous spherical shell of light in the other frame, so the coefficients of equation (3) must be proportional to the coefficients of equation (1). Strictly speaking, the constant of proportionality is arbitrary, representing a simple re-scaling, so we are free to impose an additional condition, namely, that the transformation with parameter +v followed by the transformation with parameter −v yields the original coordinates, and by the isotropy of space these two transformations, which differ only in direction, must have the same constant of proportionality. Thus the corresponding coefficients of equations (1) and (3) must not only be proportional, they must be equal, so we have

    A² − c²D² = 1,    B² = 1,    C² = 1,    2(A²v + c²DE) = 0,    c²E² − A²v² = c²

Clearly we can take B = C = 1 (rather than −1, since we choose not to reflect the y and z directions). Dividing the 4th of these equations by 2, we're left with the three equations in the three unknowns A, D, and E:

    A² − c²D² = 1,    A²v + c²DE = 0,    c²E² − A²v² = c²

Solving the first equation for A² and substituting this into the 2nd and 3rd equations gives

    (1 + c²D²)v + c²DE = 0,    c²E² − (1 + c²D²)v² = c²

Solving the first for E and substituting into the 2nd gives a single quadratic equation in D, with the roots

    D = ±(v/c²)/(1 − (v/c)²)^{1/2}

Substituting this into either of the previous equations and solving the resulting quadratic for E gives

    E = ∓1/(1 − (v/c)²)^{1/2}

Note that the equations require opposite signs for D and E. Now, for small values of v/c we expect to find E approaching +1 (as in Galilean relativity), so we choose the positive root for E and the negative root for D. Finally, from the relation A² − c²D² = 1 we get

    A = ±1/(1 − (v/c)²)^{1/2}

and again we select the positive root. Consequently we have the Lorentz transformation

    ξ = (x − vt)/(1 − (v/c)²)^{1/2},    η = y,    ζ = z,    τ = (t − vx/c²)/(1 − (v/c)²)^{1/2}

Naturally with this transformation we can easily verify that

    x² + y² + z² − c²t² = ξ² + η² + ζ² − c²τ²

so this quantity is the squared "absolute distance" from the origin to the point with K coordinates (x,y,z,t) and the corresponding k coordinates (ξ,η,ζ,τ), which confirms that the absolute spacetime interval between two points is the same in both frames. Notice that equations (1) and (2) already implied this relation for null intervals. In other words, the original premise was that if x² + y² + z² − c²t² equals zero, then ξ² + η² + ζ² − c²τ² also equals zero. The above reasoning shows that a consequence of this premise is that, for any arbitrary real number s², if x² + y² + z² − c²t² equals s², then ξ² + η² + ζ² − c²τ² also equals s². Therefore, this quadratic form represents an absolute invariant quantity associated with the interval from the origin to the event (x,y,z,t).

1.7 Staircase Wit

    Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.
                                                H. Minkowski, 1908

In retrospect, it's easy to see that the Galilean notion of space and time was not free of conceptual difficulties. In 1908 Minkowski delivered a famous lecture in which he argued that the relativistic phenomena described by Lorentz and clarified by Einstein might have been inferred from first principles long before, if only more careful thought had been given to the foundations of classical geometry and mechanics. He pointed out that special relativity arises naturally from the reconciliation of two physical symmetries that we individually take for granted. One is spatial isotropy, which asserts the equivalence of all physical phenomena under linear transformations such as x′ = ax − by, y′ = bx + ay, z′ = z, t′ = t, where a² + b² = 1. It’s easy to verify that transformations of this type leave all quantities of the form x² + y² + z² invariant. The other is Galilean relativity, which asserts the equivalence of all physical phenomena under transformations such as x′ = x − vt, y′ = y, z′ = z, t′ = t, where v is a constant. However, these transformations obviously do not leave the quantity x² + y² + z² invariant, because they involve the time coordinate as well as the space coordinates. In addition, we notice that the rotational transformations maintain the orthogonality of the coordinate axes, whereas the lack of an invariant measure for the Galilean transformations prevents us from even assigning a definite meaning to “orthogonality” between the time and space coordinates.

Since the velocity transformations leave the laws of physics unchanged, Minkowski reasoned, they ought to correspond to some invariant physical quantity, and their determinants ought to be unity. Clearly the invariant must involve the time coordinate, and hence the units of space and time must be in some fixed non-singular relation to each other, with a conversion factor that we can normalize to unity. Also, since we cannot go backwards in time, the space axis must not be rotated in the same direction as the time axis by a velocity transformation, so the velocity transformations ought to be of the form x′ = ax − bt, y′ = y, z′ = z, t′ = −bx + at, where a² − b² = 1. Combining this with the requirement b/a = v, we arrive at the transformation

    x′ = (x − vt)/(1 − v²)^{1/2},    y′ = y,    z′ = z,    t′ = (t − vx)/(1 − v²)^{1/2}

which leaves invariant the quantity x² + y² + z² − t². The rotational transformations also leave this same quantity invariant, so this appears to be the most natural (and almost the only) way of reconciling the observed symmetries of physical phenomena. Hence from simple requirements of rational consistency we could have arrived at the Lorentz transformation. As Minkowski said:

    Such a premonition would have been an extraordinary triumph for pure mathematics. Well, mathematics, though it now can display only staircase wit, has the satisfaction of being wise after the event... to grasp the far-reaching consequences of such a metamorphosis of our concept of nature.

Needless to say, the above discussion is just a rough sketch, intended to show only the outline of an argument. It seems likely that Minkowski was influenced by Klein’s Erlanger program, which sought to interpret various kinds of geometry in terms of the invariants under a specific group of transformations. It is certainly true that we are led toward the Lorentz transformations as soon as we consider the group of velocity transformations and attempt to identify a physically meaningful invariant corresponding to these transformations. However, the preceding discussion glossed over several important considerations, and contains several unstated assumptions. In the following, we will examine Minkowski’s argument in more detail, paying special attention to the physical significance of each assertion along the way, and elaborating more fully the rational basis for concluding that there must be a definite relationship between the measures of space and time.

For any system of mutually orthogonal spatial coordinates x, y, z (assumed linear and homogeneous), let the positions of the two ends of a given spatially extended physical entity be denoted by x1,y1,z1 and x2,y2,z2, and let s² denote the sum of the squares of the component differences. In other words

    s² = (x2 − x1)² + (y2 − y1)² + (z2 − z1)²                            (1)

Experience teaches us that, for a large class of physical entities (“solids”), we can shift and/or re-orient the entity (relative to the system of coordinates), changing the individual components, but the sum of the squares of the component differences remains unchanged. The invariance of this quantity under re-orientations is called spatial isotropy. It’s worth emphasizing that the invariance of s² under these operations applies only if the x, y, and z coordinates are mutually orthogonal. The spatial isotropy of physical entities implies a non-trivial unification of orthogonal measures. Strictly speaking, each of the three terms on the right side of (1) should be multiplied by a coefficient whose units are the squared units of s divided by the squared units of x, y, or z respectively. In writing the equation without coefficients, we have tacitly chosen units of measure for x, y, and z such that the respective coefficients are 1.

In addition, we tacitly assumed the spatial coordinates of the two ends of the physical entity had constant values (for a given position and orientation), but of course this assumption is valid only if the entities are stationary. If an object is in motion (relative to the system of coordinates), then the coordinates of its endpoints are variable functions of time, so instead of the constant x1 we have a function x1(t), and likewise for the other coordinates. It’s natural to ask whether the symmetry of equation (1) is still applicable to objects in motion. Clearly if we allow the individual coordinate functions to be evaluated at unequal times then the symmetry does not apply. However, if all the coordinate functions are evaluated for the same time, experience teaches us that equation (1) does apply to objects in motion. This is the second of our two commonplace symmetries, the apparent fact that the sum of the squares of the orthogonal components of the spatial interval between the two ends of a solid entity is invariant for all states of uniform motion, with the understanding that the coordinates are all evaluated at the same time. To express this symmetry more precisely, let x1,y1,z1 denote the spatial coordinates of one end of a solid physical entity at time t1, and let x2,y2,z2 denote the spatial coordinates of the other end at time t2. Then the quantity expressed by equation (1) is invariant for any position, orientation, and state of uniform motion provided t1 = t2.

However, just as the spatial part of the symmetry is not valid for arbitrary spatial coordinate systems, the temporal part is not valid for arbitrary time coordinates. Recall that the spatial isotropy of the quantity expressed by equation (1) is valid only if the space coordinates x,y,z are mutually orthogonal. Likewise, the combined symmetry covering states of uniform motion is valid only if the time coordinate t is mutually orthogonal to each of the space coordinates. The question then arises as to how we determine whether coordinate axes are mutually orthogonal. We didn’t pause to consider this question when we were dealing only with the three spatial coordinates, but even for the three space axes the question is not as trivial as it might seem. The answer relies on the concept of “distance” defined by the quantity s in equation (1). According to Euclid, two lines intersecting at the point P are perpendicular if and only if each point of one line is equidistant from the two points on the other line that are equidistant from P. Unfortunately, this reasoning involves a circular argument, because in order to determine whether two lines are orthogonal, we must evaluate distances between points on those lines using an equation that is valid only if our coordinate axes are orthogonal. By this reasoning, we could conjecture that any two obliquely intersecting lines are orthogonal, and then use equation (1) with coordinates based on those lines to confirm that they are indeed orthogonal according to Euclid’s definition. But of course the physical objects of our experience would not exhibit spatial isotropy in terms of these coordinates. This illustrates that we can only establish the physical orthogonality of coordinate axes based on physical phenomena. In other words, we construct orthogonal coordinate axes operationally, based on the properties of physical entities. For example, we define an orthogonal system of coordinates in such a way that a certain spatially extended physical entity is isotropic. Then, by definition, this physical entity is isotropic with respect to these coordinates, so again the reasoning is circular. However, the physical significance of these coordinates and the associated spatial isotropy lies in the empirical fact that all other physical entities (in the class of “solids”) exhibit spatial isotropy in terms of this same system of coordinates.

Next we need to determine a time axis that is orthogonal to each of the space axes. In common terms, this amounts to synchronizing the times at spatially separate locations. Just as in the case of the spatial axes, we can establish physically meaningful orthogonality for the time axis only operationally, based on some reference physical phenomena. As we’ve seen, orthogonality between two lines is determined by the distances between points on those lines, so in order to determine a time axis orthogonal to a space axis we need to evaluate “distances” between points that are separated in time as well as in space. Unfortunately, equation (1) defines distances only between points at the same time. Evidently to establish orthogonality between space and time axes we need a physically meaningful measure of space-time distance, rather than merely spatial distance.

Another physical symmetry that we observe in nature is the symmetry of temporal translation. This refers to the fact that for a certain class of physical processes the duration of the process is independent of the absolute starting time. In other words, letting t1 and t2 denote the times of the two ends of the process, the quantity

    τ² = (t2 − t1)²                                                      (2)

is invariant under translation of the starting time t1. This is exactly analogous to the symmetry of a class of physical objects under spatial translations. However, we have seen that the spatial symmetries are valid only if the time coordinates t1 and t2 are the same, so we should recognize the possibility that the physical symmetry expressed by the invariance of (2) is valid only when the spatial coordinates of events 1 and 2 are the same. Of course, this can only be determined empirically. Somewhat surprisingly, common experience suggests that the values of τ² for a certain class of physical processes actually are invariant even if the spatial positions of events 1 and 2 are different… at least to within the accuracy of common observation and for differences in positions that are not too great. Likewise we find that, for just about any time axis we choose, such that some material object is at rest in terms of the coordinate system, the spatial symmetries indicated by equation (1) apply, at least within the accuracy of common observation and for objects that are not moving too rapidly. This all implies that the ratio of spatial to temporal units of distance is extremely great, if not infinite. If the ratio is infinite, then every time axis is orthogonal to every space axis, whereas if it is finite, any change of the direction of the time axis requires a corresponding change of the spatial axes in order for them to remain mutually perpendicular. The same is true of the relation between the space axes themselves, i.e., if the scale factor between (say) the x and the y coordinates were infinite, then those axes would always be perpendicular, but since it is finite, any rotation of the x axis (about the z axis) requires a corresponding rotation of the y axis in order for them to remain orthogonal.

It is perhaps conceivable that the scale factor between space and time could be infinite, but it would be very incongruous, considering that the time axis can have spatial components. Also, taking equations (1) and (2) separately, we have no means of quantifying the absolute separation between two non-simultaneous events. The spatial separation between non-simultaneous events separated by a time increment Δt is totally undefined, because there exist perfectly valid reference frames in which two non-simultaneous events are at precisely the same spatial location, and other frames in which they are arbitrarily far apart. Still, in all of those frames (according to Galilean relativity), the time interval remains Δt. Thus, there is no definite combined spatial and temporal separation – despite the fact that we clearly intuit a definite physical difference between our distance from "the office tomorrow" and our distance from "the Andromeda galaxy tomorrow". Admittedly we could postulate a universal preferred reference frame for the purpose of assessing the complete separations between events, but such a postulate is entirely foreign to the logical structure of Galilean space and time, and has no operational significance.

So, we are led to suspect that there is a finite (though perhaps very large) scale factor c between the units of space and time, and that the physical symmetries we’ve been discussing are parts of a larger symmetry, comprehending the spatial symmetries expressed by (1) and the temporal symmetries expressed by (2). On the other hand, we do not expect spacelike intervals and timelike intervals to be directly conformable, because we cannot turn around in time as we can in space. The most natural supposition is that the squared spacelike intervals and the squared timelike intervals have opposite signs, so that they are mutually “imaginary” (in the numerical sense). Hence our proposed invariant quantity for a suitable class of repeatable physical processes extending uniformly from event 1 to event 2 is

    s² = (x2 − x1)² + (y2 − y1)² + (z2 − z1)² − c²(t2 − t1)²             (3)

(This is the conventional form for spacelike intervals, whereas the negative of this quantity, denoted by τ², is used to signify timelike intervals.) This quantity is invariant under any combination of spatial rotations and changes in the state of uniform motion, as well as simple translations of the origin in space and/or time. The algebraic group of all transformations (not counting reflections) that leave this quantity invariant is called the Poincare group, in recognition of the fact that it was first described in Poincare’s famous “Palermo” paper, dated July 1905.

Equation (3) is not positive-definite, which means that even though it is a squared quantity it may have a negative value, and of course it vanishes along the path of a light pulse. Noting that squared times and squared distances have opposite signs, Minkowski remarked that

    Thus the essence of this postulate may be clothed mathematically in a very pregnant manner in the mystic formula

        3·10⁵ km = √(−1) secs

On this basis equation (3) can be re-written in a way that is formally symmetrical in the space and time coordinates, but of course the invariant quantity remains non-positive-definite. The significance of this “mystic formula” continues to be debated, but it does provide an interesting connection to quantum mechanics, to be discussed in Section 9.9.

As an aside, note that measurements of physical objects in various orientations are not sufficient to determine the “true” lengths in any metaphysical absolute sense. If all physical objects were, say, twice as long when oriented in one particular absolute direction as in the perpendicular directions, and if this anisotropy affected all physical phenomena equally, we could never detect it, because our rulers would be affected as well. Thus, when we refer to a physical symmetry (such as the isotropy of space), we are referring to the fact that all physical phenomena are affected by some variable (such as spatial orientation) in exactly the same way, not that the phenomena bear any particular relationship with some metaphysical standard. From this perspective we can see that the Lorentzian approach to “explaining” the (apparent) symmetries of space-time does nothing to actually explain those symmetries; it is simply a rationalization of the discrepancy between those empirical symmetries and an a priori metaphysical standard that does not possess those symmetries.

In any case, we’ve seen how a slight (for most purposes) modification of the relationship between inertial coordinate systems leads to the invariant quantity

    (ds)² = (dx)² + (dy)² + (dz)² − c²(dt)²

For any fixed value of the constant c, we will denote by Gc the group of transformations that leave this quantity unchanged. If we let c go to infinity, the temporal increment dt must be invariant, leaving just the original Euclidean group for the spatial increments. Thus the space and time components are de-coupled, in accord with Galilean relativity. Minkowski called this limiting case G∞, and remarked that

    Since Gc is mathematically much more intelligible than G∞, it looks as though the thought might have struck some mathematician, fancy-free, that after all, as a matter of fact, natural phenomena do not possess invariance with the group G∞, but rather with the group Gc, with c being finite and determinate, but in ordinary units of measure extremely great.
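To spell out the limiting claim (a step we add here for explicitness), restore the factors of c in the boost that leaves (ds)² invariant and let c grow without bound:

    \[
      \lim_{c \to \infty} \frac{t - vx/c^2}{\sqrt{1 - (v/c)^2}} = t ,
      \qquad
      \lim_{c \to \infty} \frac{x - vt}{\sqrt{1 - (v/c)^2}} = x - vt ,
    \]

so the transformations of G∞ reduce to the Galilean form, with dt invariant by itself and the Euclidean distance (1) invariant among simultaneous events.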

In this passage Minkowski is clearly suggesting that Lorentz invariance might have been deduced from a priori considerations, appealing to mathematical "intelligibility" as a criterion for the laws of nature. Einstein himself eschewed the temptation to retroactively deduce Lorentz invariance from first principles, choosing instead to base his original presentation of special relativity on two empirically-founded principles, the first being none other than the classical principle of relativity, and the second being the proposition that the speed of light is the same with respect to any system of inertial coordinates, independent of the motion of the source.

This second principle often strikes people as arbitrary and unwarranted (rather like Euclid's "fifth postulate", as discussed in Section 3.1), and there have been numerous attempts to deduce it from some more fundamental principle. For example, it's been argued that the light speed postulate is actually redundant to the relativity principle itself, since if we regard Maxwell's equations as fundamental laws of physics, and we regard the permeability µ0 and permittivity ε0 of the vacuum as invariant constants of those laws in any uniformly moving frame of reference, then it follows that the speed of light in a vacuum is c = 1/(µ0ε0)^{1/2} with respect to every uniformly moving system of coordinates. The problem with this line of reasoning is that Maxwell's equations are not valid when expressed in terms of an arbitrary uniformly moving system of coordinates. In particular, they are not invariant under a Galilean transformation, despite the fact that systems of coordinates related by such a transformation are uniformly moving with respect to each other. (Maxwell himself recognized that the equations of electromagnetism, unlike Newton's equations of mechanics, were not invariant under Galilean "boosts"; in fact he proposed various experiments to exploit this lack of invariance in order to measure the "absolute velocity" of the Earth relative to the luminiferous ether. See Section 3.3 for one example.)

Furthermore, we cannot assume, a priori, that µ0 and ε0 are invariant with respect to changes in reference frame. Actually µ0 is an assigned value, but ε0 must be measured, and the usual means of empirically determining ε0 involve observations of the force between charged plates. Maxwell clearly believed these measurements must be made with the apparatus "at rest" with respect to the ether in order to yield the true and isotropic value of ε0. In sections 768 and 769 of Maxwell’s Treatise he discussed the ratio of electrostatic to electromagnetic units, and predicted that two parallel sheets of electric charge, both moving in their own planes in the same direction with velocity c (supposing this to be possible) would exert no net force on each other. If Maxwell imagined himself moving along with these charged plates and observing no force between them, he obviously did not expect the laws of electrostatics to be applicable. (This is analogous to Einstein’s famous thought experiment in which he imagined moving alongside a relatively “stationary” pulse of light.) According to Maxwell's conception, if measurements of ε0 are performed with an apparatus traveling at some significant fraction of the speed of light, the results would not only differ from the result at rest, they would also vary depending on the orientation of the plates relative to the direction of the absolute velocity of the apparatus.
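Numerically, the relation c = 1/(µ0ε0)^{1/2} is easy to confirm with the familiar SI values (a quick illustration of ours, not an argument from the text):

import math

mu0  = 4e-7 * math.pi      # permeability of the vacuum in H/m (the classical assigned value)
eps0 = 8.8541878128e-12    # permittivity of the vacuum in F/m (measured)

c = 1.0 / math.sqrt(mu0 * eps0)
print(c)                   # approximately 2.9979e8 m/s, the observed speed of light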
Of course, the efforts of Maxwell and others to devise empirical methods for measuring the absolute rest frame (either by measuring anisotropies in the speed of light or by detecting variations in the electromagnetic properties of the vacuum) were doomed to failure, because even though it's true that the equations of electromagnetism are not invariant under Galilean transformations, it is also true that those equations are invariant with respect to every system of inertial coordinates. Maxwell (along with everyone else before Einstein) would have regarded those two propositions as logically contradictory, because he assumed inertial coordinate systems are related by Galilean transformations. Einstein was the first to recognize that this is not so, i.e., that relatively moving inertial coordinate systems are actually related by Lorentz transformations. Maxwell's equations are suggestive of the invariance of c only because of the added circumstance that we are unable to physically identify any particular frame of reference for the application of those equations. (Needless to say, the same is not true of, for example, the Navier-Stokes equation for a material fluid medium.) The most readily observed instance of this inability to single out a unique reference frame for Maxwell's equations is the empirical invariance of light speed with respect to every inertial system of coordinates, from which we can infer the invariance of ε0. Hence attempts to deduce the invariance of light speed from Maxwell's equations are fundamentally misguided. Furthermore, as discussed in Section 1.6, we know (as did Einstein) that Maxwell's equations are not fundamental, since they don't encompass quantum photo-electric effects (for example), whereas the Minkowski structure of spacetime (representing the invariance of the local characteristic speed of light) evidently is fundamental, even in the context of quantum electrodynamics. This strongly supports Einstein's decision to base his kinematics on the light speed principle itself. (As in the case of Euclid's decision to specify a "fifth postulate" for his theory of geometry, we can only marvel in retrospect at the underlying insight and maturity that this decision reveals.)

Another argument that is sometimes advanced in support of the second postulate is based on the notion of causality. If the future is to be determined by (and only by) the past, then (the argument goes) no object or information can move infinitely fast, and from this restriction people have tried to infer the existence of a finite upper bound on speeds, which would then lead to the Lorentz transformations. One problem with this line of reasoning is that it's based on a principle (causality) that is not unambiguously self-evident. Indeed, if certain objects could move infinitely fast, we might expect to find the universe populated with large sets of indistinguishable particles, all of which are really instances of a small number of prototypes moving infinitely fast from place to place, so that they each occupy numerous locations at all times. This may sound implausible until we recall that the universe actually is populated by apparently indistinguishable electrons and protons, and in fact according to quantum mechanics the individual identities of those particles are ambiguous in many circumstances. John Wheeler once seriously toyed with the idea that there is only a single electron in the universe, weaving its way back and forth through time. Admittedly there are problems with such theories, but the point is that causality and the directionality of time are far from being straightforward principles.
Moreover, even if we agree to exclude infinite speeds, i.e., that the composition of any two finite speeds must yield a finite speed, we haven't really accomplished anything, because the Galilean composition law has this same property. Every real number is finite, but it does not follow that there must be some finite upper bound on the real numbers. More fundamentally, it's important to recognize that the Minkowski structure of spacetime doesn't, by itself, automatically rule out speeds above the characteristic speed c (nor does it imply temporal asymmetry). Strictly speaking, a separate assumption is required to rule out "tachyons". Thus, we can't really say that Minkowskian spacetime is prima facie any more consistent with causality than is Galilean spacetime.

A more persuasive argument for a finite upper bound on speeds can be based on the idea of locality, as mentioned in our review of the shortcomings of the Galilean transformation rule. If the spatial ordering of events is to have any absolute significance, in spite of the fact that distance can be transformed away by motion, it seems that there must be some definite limit on speeds. Also, the continuity and identity of objects from one instant to the next (ignoring the lessons of quantum mechanics) is most intelligible in the context of a unified spacetime manifold with a definite non-singular connection, which implies a finite upper bound on speeds. This is in the spirit of Minkowski's 1908 lecture in which he urged the greater "mathematical intelligibility" of the Lorentzian group as opposed to the Galilean group of transformations.

For a typical derivation of the Lorentz transformation in this axiomatic spirit, we may begin with the basic Galilean program of seeking to identify coordinate systems with respect to which physical phenomena are optimally simple. We have the fundamental principle that for any material object in any state of motion there exists a system of space and time coordinates with respect to which the object is instantaneously at rest and Newton's laws of inertial motion hold good (at least quasi-statically). Such a system is called an inertial rest frame coordinate system of the object. Let x,t denote inertial rest frame coordinates of one object, and let x′,t′ denote inertial rest frame coordinates of another object moving with a speed v in the positive x direction relative to the x,t coordinates. How are these two coordinate systems related? We can arrange for the origins of the coordinate systems to coincide. Also, since these coordinate systems are defined such that an object in uniform motion with respect to one such system must be in uniform motion with respect to all such systems, and such that inertia is isotropic, it follows that they must be linearly related by the general form x′ = Ax + Bt and t′ = Cx + Dt, where A,B,C,D are constants for a given value of v. The differential form of these equations is dx′ = Adx + Bdt and dt′ = Cdx + Ddt. Now, since the second object is stationary at the origin of the x′,t′ coordinates, its position is always x′ = 0, so the first transformation equation gives 0 = Adx + Bdt, which implies dx/dt = −B/A = v and hence B = −Av. Also, if we solve the two transformation equations for x and t we get (AD−BC)x = Dx′ − Bt′, (AD−BC)t = −Cx′ + At′. Since the first object is moving with velocity −v relative to the x′,t′ coordinates we have −v = dx′/dt′ = B/D, which implies B = −Dv and hence A = D. Furthermore, reciprocity demands that the determinant AD − BC = A² + vAC of the transformation must equal unity, so we have C = (1 − A²)/(vA). Combining all these facts, a linear, reciprocal, unitary transformation from one system of inertial coordinates to another must be of the form
x′ = A(x − vt)        t′ = A( t − [(A² − 1)/(A²v²)] vx )
It only remains to determine the value of A (as a function of v), which we can do by fixing the quantity in the square brackets. Letting k denote this quantity for a given v, we have A = 1/√(1 − kv²), and the transformation can be written in the form
x′ = (x − vt)/√(1 − kv²)        t′ = (t − kvx)/√(1 − kv²)
Any two inertial coordinate systems must be related by a transformation of this form, where v is the mutual speed between them. Also, note that
(dt′)² − k(dx′)² = (dt)² − k(dx)²
Given three systems of inertial coordinates with the mutual speed v between the first two and u between the second two, the transformation from the first to the third is the composition of transformations with parameters k(v) and k(u), since at this stage k might still depend on the speed. Letting x″,t″ denote the third system of coordinates, we have by direct substitution

The coefficient of t in the denominator of the right side must be unity, so we have k(u) = k(v), and therefore k is a constant for all v, with units of an inverse squared speed. Also, the coefficient of t in the numerator must be the mutual speed between the first and third coordinate systems. Thus, letting w denote this speed, we have
w = (u + v)/(1 + kuv)
It's easy to show that this is the necessary and sufficient condition for the composite transformation to have the required form. Now, if the value of the constant k is non-zero, we can normalize its magnitude by a suitable choice of space and time units, so that the only three fundamentally distinct possibilities to consider are k = −1, 0, and +1. Setting k = 0 gives the familiar Galilean transformation x′ = x − vt, t′ = t. This is highly asymmetrical between the time and space parameters, in the sense that it makes the transformed space parameter a function of both the space coordinate and the time coordinate of the original system, whereas the transformed time coordinate is dependent only on the time coordinate of the original system. Alternatively, for the case k = −1 we have the transformation

x′ = (x − vt)/√(1 + v²)        t′ = (t + vx)/√(1 + v²)
Letting θ denote the angle that the line from the origin to the point (x,t) makes with the t axis, then tan(θ) = v = dx/dt, and we have the trigonometric identities cos(θ) = 1/√(1 + v²) and sin(θ) = v/√(1 + v²). Therefore, this transformation can be written in the form

x′ = cos(θ) x − sin(θ) t        t′ = sin(θ) x + cos(θ) t
which is just a Euclidean rotation in the xt plane. Under this transformation the quantity (dx)² + (dt)² = (dx′)² + (dt′)² is invariant. This transformation is clearly too symmetrical between x and t, because we know from experience that we cannot turn around in time as easily as we can turn around in space. The only remaining alternative is to set k = +1, which gives the transformation

x′ = (x − vt)/√(1 − v²)        t′ = (t − vx)/√(1 − v²)
Although perfectly symmetrical, this maintains the absolute distinction between spatial and temporal intervals. This can be parameterized as a hyperbolic rotation

x′ = cosh(q) x − sinh(q) t        t′ = −sinh(q) x + cosh(q) t        where tanh(q) = v
and we have the invariant quantity (dx)² − (dt)² = (dx′)² − (dt′)² for any given interval. It's hardly surprising that this transformation, rather than either the Galilean transformation or the Euclidean transformation, gives the actual relationship between space and time coordinate systems with respect to which inertia is directionally symmetrical and inertial motion is linear. From purely formal considerations we can see that the Galilean transformation, given by setting k = 0, is incomplete and has no spacetime invariant, whereas the Euclidean transformation, given by setting k = −1, makes no distinction at all between space and time. Only the Lorentzian transformation, given by setting k = +1, has completely satisfactory properties from an abstract point of view, which is presumably why Minkowski referred to it as "more intelligible".

As plausible as such arguments may be, they don't amount to a logical deduction, and one is left with the impression that we have not succeeded in identifying any fundamental principle or symmetry that uniquely selects Lorentzian spacetime rather than Galilean space and time. Accordingly, most writers on the subject have concluded (reluctantly) that Einstein's light speed postulate, or something like it, is indispensable for deriving special relativity, and that we can be persuaded to adopt such a postulate only by empirical facts. Indeed, later in the same paper where Minkowski exercised his staircase wit, he admitted that "the impulse and true motivation for assuming the group Gc came from the fact that the differential equation for the propagation of light [i.e., the wave equation] in empty space possesses the group Gc", and he referred back to Voigt's 1887 paper (see Section 1.4). Nevertheless, it's still interesting to explore the various rational "intelligibility" arguments that can be put forward as to why space and time must be Minkowskian.

A typical approach is to begin with three speeds u,v,w representing the pairwise speeds between three co-linear particles, and to seek a composition law of the form Q(u,v,w) = 0 relating these speeds. It's easy to make the case that it should be possible to uniquely solve this function explicitly for any of the speeds in terms of the other two, which implies that Q must be linear in each of its three arguments. The most general such function of three variables is Q(u,v,w) = Auvw + Buv + Cuw + Dvw + Eu + Fv + Gw + H where A,B,...,H are constants. Treating the speeds symmetrically requires B = C = D and E = F = G. Also, if any two of the speeds are 0 we require the third speed to be 0 (transitivity), so we have H = 0. Also, if any one of the speeds, say u, is 0, then we require v = −w (reciprocity), but with u = 0 and v = −w the formula reduces to −Dv² + Fv − Gv = 0, and since F = G (= E) this is just Dv² = 0, so it follows that B = C = D = 0. Hence the most general function that satisfies our requirements of linearity, 3-way symmetry, transitivity, and reciprocity is Q(u,v,w) = Auvw + E(u+v+w) = 0. It's clear that E must be non-zero (since otherwise general reciprocity would not be imposed when any one of the variables vanished), so we can divide this function by E, and let k denote A/E, to give
kuvw + u + v + w = 0
We see that this k is the same as the one discussed previously. As before, the only three distinct cases are k = -1, 0, and +1. If k = 0 we have the Galilean composition law, and if k = 1 we have the Einsteinian composition law. How are we to decide? In the next section we consider the problem from a slightly different perspective, and focus on a unique symmetry that arises only with k = 1.
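Since the whole derivation reduces to elementary algebra, it can be checked symbolically. The following sketch (illustrative code added here, not part of the original text) uses Python with sympy to confirm that the k-parameterized transformation derived above preserves t² − kx², and that composing two boosts yields the speed w = (u + v)/(1 + kuv):

    # Symbolic check of the k-parameterized transformation (illustrative).
    import sympy as sp

    x, t, u, v, k = sp.symbols('x t u v k', real=True)

    def boost(x, t, v, k):
        # x' = (x - v t)/sqrt(1 - k v^2),  t' = (t - k v x)/sqrt(1 - k v^2)
        g = sp.sqrt(1 - k*v**2)
        return (x - v*t)/g, (t - k*v*x)/g

    x1, t1 = boost(x, t, v, k)      # first boost, speed v
    x2, t2 = boost(x1, t1, u, k)    # second boost, speed u

    # the quantity t^2 - k x^2 is invariant under a single boost
    assert sp.simplify((t1**2 - k*x1**2) - (t**2 - k*x**2)) == 0

    # the original spatial origin (x = 0) moves with the composite speed
    # -(u + v)/(1 + k u v) with respect to the doubly boosted coordinates
    w = sp.simplify((x2/t2).subs(x, 0))
    assert sp.simplify(w + (u + v)/(1 + k*u*v)) == 0

Setting k to 0, −1, or +1 in this sketch reproduces the Galilean, Euclidean, and Lorentzian cases discussed above.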

1.8 Another Symmetry

I cannot quite imagine it possible that any physical meaning be afforded to substitutions of reciprocal radii… It does seem to me that you are very much over-estimating the value of purely formal approaches…
                Albert Einstein to Felix Klein in 1916

We saw in previous sections that Maxwell's equations are invariant under Lorentz transformations, as well as translations and spatial rotations. Together these transformations comprise the Poincaré group. Of course, Maxwell's equations are also invariant under spatial and temporal reflections, but it is often overlooked that in addition to all these linear transformations, Maxwell's equations possess still another symmetry, namely, the symmetry of spacetime inversion. In a sense, an inversion is a kind of reflection about a surface in spacetime, analogous to inversions about circles in projective geometry, the only difference being that the Minkowski interval is used instead of the Euclidean line element.

Consider two events E1 and E2 that are null-separated from each other, meaning that the absolute Minkowski interval between them is zero in terms of an inertial coordinate system x,y,z,t. Let s1 and s2 denote the absolute intervals from the origin to these two events (respectively). Under an inversion of the coordinate system about the surface at an absolute interval R from the origin (which may be chosen arbitrarily), each event located on a given ray through the origin is moved to another point on that ray such that its absolute interval from the origin is changed from s to R²/s. Thus the hyperbolic surfaces outside of R are mapped to surfaces inside R, and vice versa. To prove that two events originally separated by a null Minkowski interval are still null-separated after the coordinates have been inverted, note that the ray from the origin to the event Ej can be characterized by constants αj, βj, γj defined by
αj = xj/tj        βj = yj/tj        γj = zj/tj
In terms of these parameters the magnitude of the interval from the origin to Ej can be written as
sj² = tj²(1 − αj² − βj² − γj²)
The squared interval between E1 and E2 can then be expressed as
(Δs)² = s1² + s2² − 2K12 t1t2
where
K12 = 1 − α1α2 − β1β2 − γ1γ2
Since inversion leaves each event on its respective ray, the value of K12 for the inverted coordinates is the same as for the original coordinates, so the only effect on the Minkowski interval between E1 and E2 is to replace s1 and s2 with R²/s1 and R²/s2 respectively (the values of t1 and t2, being proportional to s1 and s2 along their fixed rays, are scaled by the corresponding factors R²/s1² and R²/s2²). Therefore, the squared Minkowski interval between the two events in terms of the inverted coordinates is
(Δs′)² = (R⁴/(s1²s2²)) (s1² + s2² − 2K12 t1t2)
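Numerically, this null-preserving property can be illustrated with a small sketch (the sample events below are assumptions chosen for the example, not taken from the text):

    # Inversion about the unit-interval surface sends an event E at absolute
    # interval s from the origin to the event (R^2/s^2) * E on the same ray.
    # Null separations between events are preserved. (Illustrative sketch.)

    def minkowski_sq(e):
        t, x, y, z = e
        return t*t - x*x - y*y - z*z

    def invert(e, R=1.0):
        s2 = minkowski_sq(e)            # squared interval from the origin
        return tuple((R*R/s2) * c for c in e)

    E1 = (2.0, 1.0, 0.0, 0.0)           # timelike event, s1^2 = 3
    E2 = (5.0, 4.0, 0.0, 0.0)           # timelike event, s2^2 = 9
    diff = tuple(a - b for a, b in zip(E1, E2))
    assert abs(minkowski_sq(diff)) < 1e-12      # E1, E2 are null-separated

    d_inv = tuple(a - b for a, b in zip(invert(E1), invert(E2)))
    assert abs(minkowski_sq(d_inv)) < 1e-12     # still null-separated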
The quantity in parentheses on the right side of this expression is just the original squared interval, so if the interval was zero in terms of the original coordinates, it is zero in terms of the inverted coordinates. Thus inversion of a system of inertial coordinates yields a system of coordinates in which all the null intervals are preserved. It was shown in 1910 by Bateman and (independently) Cunningham that this is the necessary and sufficient condition for Maxwell's equations to be invariant.

Incidentally, Einstein was dismissive of this invariance when Felix Klein asked him about it. He wrote:

I am convinced that the covariance of Maxwell's formulas under transformation according to reciprocal radii can have no deeper significance; although this transformation retains the form of the equations, it does not uphold the correlation between coordinates and the measurement results from measuring rods and clocks.

Einstein was similarly dismissive of Minkowski's "formal approach" to spacetime at first, but later came to appreciate the profound significance of it. In any case, it's interesting to note that straight lines in inertial coordinate systems map to straight or hyperbolic paths under inversion. This partly accounts for the fact that, according to the Lorentz-Dirac equations of classical electrodynamics, perfect hyperbolic motion is inertial motion, in the sense that there are free-body solutions describing particles in hyperbolic motion, and a charged particle in hyperbolic motion does not radiate.

It's also interesting that the relativistic formula for composition of two speeds is invariant under inversion of the arguments about the speed c, i.e., replacing each speed v with c²/v. Letting f(u,v) denote the composition of the (co-linear) speeds u and v, and choosing units so that c = 1, we can impose the three requirements
f(u,0) = u        f(u,v) = f(v,u)        f(1/u, 1/v) = f(u,v)
The first two requirements are satisfied by both the Galilean and the Lorentzian composition formulas, but the third requirement is not satisfied by the Galilean formula, because that gives
f(1/u, 1/v) = 1/u + 1/v = (u + v)/(uv)
However, somewhat surprisingly, the relativistic composition function gives
f(1/u, 1/v) = (1/u + 1/v)/(1 + (1/u)(1/v)) = (u + v)/(uv + 1) = f(u,v)
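This invariance is easy to confirm symbolically; here is a one-line check in Python/sympy (an illustrative sketch, not from the text):

    # The relativistic composition (u + v)/(1 + u v) is unchanged when both
    # arguments are replaced by their reciprocals (units with c = 1).
    import sympy as sp

    u, v = sp.symbols('u v', positive=True)
    f = (u + v)/(1 + u*v)
    f_inverted = f.subs({u: 1/u, v: 1/v}, simultaneous=True)
    assert sp.simplify(f_inverted - f) == 0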
So the relativistic composition function does comply with all three requirements, singling out the composition law with k = 1 from the previous section. As indicated by Einstein's reply to Klein, the physical significance of such inversion symmetries is obscure, and we should also note that the spacetime inversion is not equivalent to the speed inversion, although they are formally very similar. To clarify how this symmetry arises in the relativistic context, recall that we derived at the end of the previous section the relation
u + v + w + kuvw = 0                                        (1)
where u = v12, v = v23, and w = v31. The symbol vij signifies the speed of the ith particle in terms of the inertial rest frame coordinates of the jth particle. With k = 0 this corresponds to the Galilean speed composition formula, which clearly is not invariant under inversion of any or all of the speeds. For any non-zero value of k, equation (1) can be re-written in the form

Squaring both sides of this equation gives the equality

If we replace each speed with its inversion in this formula, and then multiply through by (uvw)²/k³ we get

which is equivalent to the preceding formula if and only if
k² = 1
Hence the speed composition formula is invariant under inversion if k = ±1. The case k = −1 is equivalent to the case k = +1 if each speed is taken to be imaginary (corresponding to the use of an imaginary time axis), so without loss of generality we can choose k = +1 with real speeds. There remains, however, the ambiguity introduced by squaring both sides of equation (2), suppressing the signs of the factors. Equation (2) itself, without squaring, is invariant under inversion of any two of the speeds, but the inversion of all three speeds changes the sign of the right side. Thus by squaring both sides of (2) we make it consistent with either of the two complementary relations

The left hand relation is invariant under inversion of any two of the speeds, whereas the right hand relation is invariant under inversion of one or all three of the speeds. The question, then, is why the first formula applies rather than the second. To answer this, we should first point out that, despite the formal symmetry of the quantities u,v,w in these equations, they are not conceptually symmetrical. Two of the quantities are implicitly defined in terms of one inertial coordinate system, and the third quantity is defined in terms of a different inertial coordinate system. In general, there are nine conceptually distinct speeds for three co-linear particles in terms of the three rest frame coordinate systems, namely
v11   v12   v13
v21   v22   v23
v31   v32   v33
where vij is the speed of the ith particle in terms of the inertial rest frame coordinates of the jth particle. By definition we have vii = 0 and by reciprocity we have vij = −vji, so the speeds comprise an anti-symmetric array. Thus, although the three speeds v12, v23, v31 are nominally defined in terms of three different systems of coordinates, any two of them can be expressed in terms of a single coordinate system by invoking the reciprocity relation. For example, the three quantities v12, v23, v31 can be expressed in the form v12, −v32, v31, which signifies that the first two speeds are both defined in terms of the rest frame coordinates of frame 2. However, the remaining speed does not have a direct expression in terms of that frame, so a composition formula is needed to relate all three quantities. We've seen that the relativistic composition formula yields the same value for the third speed (e.g., the speed defined in terms of frame 1) regardless of whether we use the two other speeds (e.g., the speeds defined in terms of frame 2) or their reciprocals. To more clearly exhibit the peculiar 2+1 symmetry of this velocity composition law, note that it can be expressed in multiplicative form as
[(1 + v12)/(1 − v12)] · [(1 + v23)/(1 − v23)] · [(1 + v31)/(1 − v31)] = 1
where vij denotes the speed of object j with respect to object i. Clearly if we replace any two of the speeds with their reciprocals, the relation remains unchanged. On the other hand, if we replace just one or all three of the speeds with their reciprocals, the magnitude of their product is still unity, but the sign is negated. Thus, one way of expressing the full symmetry of this relation would be to square both sides, giving the result
[(1 + v12)/(1 − v12)]² · [(1 + v23)/(1 − v23)]² · [(1 + v31)/(1 − v31)]² = 1
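A direct numeric check (an illustrative sketch with arbitrarily chosen sample speeds, not from the text) confirms both the unit product for cyclically consistent speeds and the sign flip under a single inversion:

    # For co-linear speeds with v31 fixed by the relativistic composition
    # law, the cyclic product of (1 + v)/(1 - v) factors equals 1; inverting
    # a single speed flips the sign of the product.
    v12, v23 = 0.5, 0.25
    v31 = -(v12 + v23)/(1 + v12*v23)        # composition law (v31 = -v13)

    def factor(v):
        return (1 + v)/(1 - v)

    product = factor(v12) * factor(v23) * factor(v31)
    assert abs(product - 1.0) < 1e-12

    flipped = factor(1/v12) * factor(v23) * factor(v31)   # invert one speed
    assert abs(flipped + 1.0) < 1e-12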
The squared relation is completely invariant under any replacement of one or more speeds with their respective reciprocals. Naturally we can extend the product of factors of the form (1+vij)/(1−vij) to any cyclical sequence of relative speeds between any number of co-linear points. It's interesting to note the progression of relations between the speeds involving one, two, and three particles. The relativity of position is expressed by the identity
vii = 0
for any one particle, and the relativity of velocity can be expressed by the skew symmetry
vij + vji = 0
for any two particles. (This was referred to earlier as the reciprocity condition vij = −vji.) The next step is to consider the cyclic sum involving three particles and their respective inertial rest frame coordinate systems. This is the key relation, because all higher-order relations can be reduced to this. If acceleration were relative (like position and velocity), we would expect the cyclic symmetry vij + vjk + vki = 0, which is a linear function of all three components. Indeed, this is the Galilean composition formula. However, since acceleration is absolute, it's to be expected that the actual relation is non-linear in each of the three components. So, instead of vanishing, we need the right side of this sum to be a symmetric function of the terms. The only other odd elementary symmetric function of three quantities is the product of all three, so we're led (again) to the relation
vij + vjk + vki = −(vij)(vjk)(vki)
which can be regarded as the law of inertia. Since there is only one odd elementary symmetric function of one variable, and likewise for two variables, the case of three variables is the first for which there exists a non-tautological expression of this form. We may also note a formal correspondence with De Morgan's law for logical statements. Letting sums denote logical ORs (unions), products denote logical ANDs (intersections), and overbars denote logical negation, De Morgan’s law states that
(X + Y + Z)‾ = X̄ · Ȳ · Z̄
for any three logical variables X,Y,Z. Now, using the skew symmetry property, we can "negate" each velocity on the right hand side of the previous expression to give
vij + vjk + vki = (vji)(vkj)(vik)
From this standpoint the right hand side is analogous to the "logical negation" of the left hand side, which makes the relation analogous to setting the quantity equal to zero. The justification for regarding this relation as the source of inertia becomes clearer in Section 2.3, which describes how the relativistic composition law for velocities accounts for the increasing inertia of an accelerating object. This leads to the view that inertia itself is, in some sense, a consequence of the non-linearity of velocity compositions.

Given the composition law u′ = (u+v)/(1+uv) for co-linear speeds, what can we say about the transformation of the coordinates x and t themselves under the action of the velocity v? The composition law can be written in the form vuu′ + u′ − u = v, which has a natural factorization if we multiply through by v and subtract 1 from both sides, giving
(1 + uv)(1 − u′v) = 1 − v²                                        (3)
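Symbolically, one can confirm that (3) is an identity whenever u′ is the relativistic composition of u and v (a minimal sketch, not from the original text):

    # Verify (1 + u v)(1 - u' v) = 1 - v^2 when u' = (u + v)/(1 + u v).
    import sympy as sp

    u, v = sp.symbols('u v', real=True)
    u_prime = (u + v)/(1 + u*v)
    assert sp.simplify((1 + u*v)*(1 - u_prime*v) - (1 - v**2)) == 0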
If u and u′ are taken to be the spatio-temporal ratios x/t and x′/t′, relation (3) can be written in the form
[(t + vx)/t] · [(t′ − vx′)/t′] = 1 − v²
On the other hand, remembering that we can insert the reciprocals of any two of the quantities u, u′, v without disturbing the equality, we can take u and u′ to be the temporal-spatial ratios t/x and t′/x′ in (3) to give
[(x + vt)/x] · [(x′ − vt′)/x′] = 1 − v²
These last two equations immediately give
tx′ − xt′ = v(tt′ − xx′)                                        (4)
Treating the primed and unprimed frames equivalently, and recalling that v′ = −v, we see that (4) has a perfectly symmetrical factorization, so we exploit this factorization to give the transformation equations
x′ = (x + vt)/√(1 − v²)        t′ = (t + vx)/√(1 − v²)
These are the Lorentz transformations for velocity v in the x direction. The y and z coordinates are unaffected, so we have y′ = y and z′ = z. From this it follows that the quantity t² − x² − y² − z² is invariant under a general Lorentz transformation, so we have arrived at the full Minkowski spacetime metric.

Now, to determine the full velocity composition law for two systems of aligned coordinates k and K, the latter moving in the positive x direction with velocity v relative to the former, we can without loss of generality make the origins of the two systems both coincide with a point P0 on the subject worldline, and let P1 denote a subsequent point on that worldline with k system coordinates dt,dx,dy,dz. By definition the velocity components of that worldline with respect to k are ux = dx/dt, uy = dy/dt, and uz = dz/dt. The coordinates of P1 with respect to the K system are given by the Lorentz transformation for a simple boost v in the x direction:
dT = γ(dt − v dx)        dX = γ(dx − v dt)        dY = dy        dZ = dz
where γ = 1/√(1 − v²). Therefore, the velocity components of the worldline with respect to the K system are
Ux = dX/dT = (ux − v)/(1 − ux v)        Uy = dY/dT = uy/[γ(1 − ux v)]        Uz = dZ/dT = uz/[γ(1 − ux v)]
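As a sanity check on these component formulas, the following sketch (an illustrative helper, not from the text) verifies that a velocity of unit magnitude, i.e., light, still has unit speed after the boost:

    # Transform velocity components under a boost v along x (units c = 1),
    # using the component formulas above. (Illustrative sketch.)
    import math

    def boost_velocity(ux, uy, uz, v):
        g = 1/math.sqrt(1 - v*v)            # gamma factor
        d = 1 - ux*v
        return (ux - v)/d, uy/(g*d), uz/(g*d)

    ux, uy, uz = 0.6, 0.64, 0.48            # speed = 1 exactly
    Ux, Uy, Uz = boost_velocity(ux, uy, uz, v=0.9)
    assert abs(math.sqrt(Ux*Ux + Uy*Uy + Uz*Uz) - 1.0) < 1e-9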
1.9 Null Coordinates

Slight not what's near through aiming at what's far.
                Euripides, 455 BC

Initially the special theory of relativity was regarded as just a particularly simple and elegant interpretation of Lorentz's ether theory, but it soon became clear that there is a profound heuristic difference between the two theories, most evident when we consider the singularity implicit in the Lorentz transformation x′ = γ(x − vt), t′ = γ(t − vx), where γ = 1/√(1 − v²). As v approaches arbitrarily close to 1, the factor γ goes to infinity. If these relations are strictly valid (locally), as all our observations and experiments suggest, then according to Lorentz's view all configurations of objects moving through the absolute ether must be capable of infinite spatial "contractions" and temporal "dilations", without the slightest distortion. This is clearly unrealistic. Hence the only plausible justification for the Lorentzian view is a belief that the Lorentz transformation equations are not strictly valid, i.e., that they must break down at some point. Indeed, this was Lorentz's ultimate justification, as he held to the possibility that absolute speed might, after all, make some difference to the intrinsic relations between physical entities. However, one hundred years after Lorentz's time, there still is no evidence to support his suspicion. To the contrary, all the tremendous advances of the last century in testing the Lorentz transformation "to the nth degree" have consistently confirmed its exact validity.

At some point a reasonable person must ask himself "What if the Lorentz transformation really is exactly correct?" This is a possibility that a neo-etherist cannot permit himself to contemplate - because the absolute physical singularity along light-like intervals implied by the Lorentz transformation is plainly incompatible with any realistic ether - but it is precisely what special relativity requires us to consider, and this ultimately leads to a completely new and more powerful view of causality.

The singularity of the Lorentz transformation is most clearly expressed in terms of the underlying Minkowski pseudo-metric. Recall that the invariant spacetime interval dτ between the events (t,x) and (t+dt, x+dx) is given by (dτ)² = (dt)² − (dx)² where t and x are any set of inertial coordinates. This is called a pseudo-metric rather than a metric because, unlike a true metric, it doesn't satisfy the triangle inequality, and the interval between distinct points can be zero. This occurs for any interval such that dt = ±dx, in which case the invariant interval dτ is literally zero. Arguably, it is only in the context of Minkowski spacetime, with its null connections between distinct events, that phenomena involving quantum entanglement can be rationalized. Pictorially, the locus of points whose squared distance from the origin is ±1 consists of the two hyperbolas labeled +1 and −1 in the figure below.

The diagonal axes denoted by α and β represent the paths of light through the origin, and the magnitude of the squared spacetime interval along these axes is 0, i.e., the metric is degenerate along those lines. This is all expressed in terms of conventional space and time coordinates, but it's also possible to define the spacetime separations between events in terms of null coordinates along the light-line axes. Conceptually, we rotate the above figure by 45 degrees, and regard the α and β lines as our coordinate axes, as shown below:

In terms of a linear parameterization (α,β) of these "null coordinates" the locus of points at a squared "distance" (dτ)² from the origin is an orthogonal hyperbola satisfying the equation (dτ)² = (dα)(dβ). Since the light-lines α and β are degenerate, in the sense that the absolute spacetime intervals along those lines vanish, the absolute velocity of a worldline, given by the "slope" dβ/dα = 0/0, is strictly undefined. This indeterminacy, arising from the singular null intervals in spacetime, is at the heart of special relativity, allowing for infinitely many different scalings of the light-line coordinates. In particular, it is natural to define the rest frame coordinates α,β of any worldline in such a way that dα/dβ = 1. This expresses the principle of relativity, and also entails Einstein's second principle, i.e., that the (local) velocity of light with respect to the natural measures of space and time for any worldline is unity. The relationship between the natural null coordinates of any two worldlines is then expressed by the requirement that, for any given interval dτ, the components dα,dβ with respect to one frame are related to the components dα′,dβ′ with respect to another frame according to the equation (dα)(dβ) = (dα′)(dβ′). It follows that the scale factors of any two frames Si and Sj are related according to
dαi/dαj = √[(1 + vij)/(1 − vij)]        dβi/dβj = √[(1 − vij)/(1 + vij)]
where vij is the usual velocity parameter (in units such that c = 1) of the origin of Sj with respect to Si. Notice there is no absolute constraint on the scaling of the α and β axes, there is only a relative constraint, so the "gauge" of the light-lines really is indeterminate. Also, the scale factors are simply the relativistic Doppler shifts for approaching and receding sources. This accords with the view of the α,β coordinate "grid lines" as the network of light-lines emitted by a strobed source moving along the reference world-line. To illustrate how we can operate with these null coordinate scale relations, let us derive the addition rule for velocities. Given three co-linear unaccelerated particles with the pairwise relative velocity parameters v12, v23, and v13, we can solve the "α scale" relation for v13 to give
v13 = [(dα1/dα3)² − 1] / [(dα1/dα3)² + 1]                                        (1)
We also have
dα1/dα2 = √[(1 + v12)/(1 − v12)]        dα2/dα3 = √[(1 + v23)/(1 − v23)]
Multiplying these together gives an expression for dα1/dα3, which can be substituted into (1) to give the expected result
v13 = (v12 + v23)/(1 + v12 v23)
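The same computation in numeric form (an illustrative sketch, not from the text):

    # Compose velocities via null-coordinate (Doppler) scale factors:
    # D(v) = sqrt((1 + v)/(1 - v)) multiplies along a chain of frames.
    import math

    def D(v):
        return math.sqrt((1 + v)/(1 - v))

    def compose(v12, v23):
        r = (D(v12) * D(v23))**2        # squared ratio d_alpha1/d_alpha3
        return (r - 1)/(r + 1)          # solve the alpha-scale relation

    assert abs(compose(0.5, 0.5) - 0.8) < 1e-12   # (0.5+0.5)/(1+0.25) = 0.8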
Interestingly, although neither the velocity parameter v nor the quantity (1+v)/(1−v) is additive, it's easy to see that the parameter ln[(1+v)/(1−v)] is additive. In fact, half this parameter corresponds to the arc length of the "dτ = constant" hyperbola connecting the two world lines at unit distance from their intersection, as shown by integrating the differential distance along that curve
s = ∫ √((dx)² − (dt)²)
Since the equation of the hyperbola for τ = 1 is 1 = t² − x², we have
dt = (x/t) dx
Substituting this into the previous expression and performing the integration gives
s = ∫ √(1 − x²/t²) dx = ∫ dx/t = ∫ dx/√(1 + x²) = ln( x + √(1 + x²) )
Recalling that dτ² = dt² − dx², we have dt + dx = dτ²/(dt − dx), so the quantity dx + dt can be written as

Hence the absolute arc length along the dτ = 1 surface between two world lines that intersect at the origin with a mutual velocity v is
s = (1/2) ln[ (1 + v)/(1 − v) ]
Naturally the additivity of this logarithmic form implies that the argument is a multiplicative measure of mutual speeds. The absolute interval between the intersection points of the two worldlines with the dτ = 1 hyperbola is
Δs = √(2(γ − 1))        where γ = 1/√(1 − v²)
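The additivity of the logarithmic parameter is easily checked numerically (an illustrative sketch, not from the text):

    # The parameter ln[(1 + v)/(1 - v)] adds under relativistic composition
    # of co-linear speeds.
    import math

    def log_param(v):
        return math.log((1 + v)/(1 - v))

    v1, v2 = 0.3, 0.5
    v_composed = (v1 + v2)/(1 + v1*v2)
    assert abs(log_param(v1) + log_param(v2) - log_param(v_composed)) < 1e-12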
One strength of the conventional pseudo-metrical formalism is that (t,x) coordinates easily generalize to (t,x,y,z) coordinates, and the invariant interval generalizes to

(dτ)² = (dt)² − (dx)² − (dy)² − (dz)²

The generalization of the null (lightlike) coordinates and corresponding invariant is not as algebraically straightforward, but it conveys some interesting aspects of the spacetime structure. Intuitively, an observer can conceive of the absolute interval between himself and some distant future event P by first establishing a scale of radial measure outward on his forward light cone in all directions, and then for each direction evaluate the parameterized null measure along the light cone to the point of intersection with the backward null cone of P. This will assign, to each direction in space, a parameterized distance from the observer to the backward light cone of P, and there will be (in flat spacetime) two distinguished directions, along which the null measure is maximum or minimum. These are the principal directions for the interval from the observer to P, and the product of the null measures in these directions is invariant. In other words, if a second observer, momentarily coincident with the first but with some relative velocity, determines the null measures along the principal directions to the backward light cone of P, with respect to his own natural parameterization, the product will be the same as found by the first observer.

It's often convenient to take the interval to the point P as the time axis of inertial coordinates t,x,y,z, so the eigenvectors of the null cone intersections become singular, and we can simply define the null coordinates u = t + r, v = t − r, where r = √(x² + y² + z²). From this we have t = (u + v)/2 and r = (u − v)/2, along with the corresponding differentials dt = (du + dv)/2 and dr = (du − dv)/2. Making these substitutions into the usual Minkowski metric in terms of polar coordinates
(dτ)² = (dt)² − (dr)² − r²[(dθ)² + sin²(θ)(dφ)²]
we have the Minkowski line element in terms of angles and null coordinates
(dτ)² = (du)(dv) − [(u − v)²/4][(dθ)² + sin²(θ)(dφ)²]
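The substitution can be verified symbolically with a few lines (a minimal sketch, not from the text):

    # Check that dt^2 - dr^2 = du dv under u = t + r, v = t - r, i.e., with
    # t = (u + v)/2 and r = (u - v)/2.
    import sympy as sp

    u, v, du, dv = sp.symbols('u v du dv', real=True)
    t = (u + v)/2
    r = (u - v)/2
    dt = sp.diff(t, u)*du + sp.diff(t, v)*dv    # total differentials
    dr = sp.diff(r, u)*du + sp.diff(r, v)*dv
    assert sp.expand(dt**2 - dr**2 - du*dv) == 0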
These coordinates are often useful, but we can establish a more generic system of null coordinates in 3+1 dimensional spacetime by arbitrarily choosing four non-parallel directions in space from an observer at O, and then the coordinates of any timelike separated event are expressed as the four null measures radially in those directions along the forward null cone of O to the backward null cone of P. This provides enough information to fully specify the interval OP. In terms of the usual orthogonal spacetime coordinates, we specify the coordinates (T,X,Y,Z) of event P relative to the observer O at the origin in terms of the coordinates of four events I1, I2, I3, I4 on the intersection of the forward null cone of O and the backward null cone of P. If ti,xi,yi,zi denote the conventional coordinates of Ii, then we have

ti² = xi² + yi² + zi²

(T  ti)2 = (X  xi)2 + (Y  yi)2 + (Z  zi)2

for i = 1, 2, 3, 4. Expanding the right hand equations and canceling based on the left hand equalities, we have the system of equations
τ² = T² − X² − Y² − Z² = 2(ti T − xi X − yi Y − zi Z)
The left hand side of all four of these equations is the invariant squared proper time interval τ² from O to P, and we wish to express this in terms of just the four null measures in the four chosen directions. For a specified set of directions in space, this information can be conveyed by the four values t1, t2, t3, and t4, since the magnitudes of the spatial components are determined by the directions of the axes and the magnitude of the corresponding t. In general we can define the direction coefficients aij such that
xi = ai1 ti        yi = ai2 ti        zi = ai3 ti
with the condition ai1² + ai2² + ai3² = 1. Making these substitutions, the system of equations can be written in matrix form as
[τ²]       [ t1  −a11·t1  −a12·t1  −a13·t1 ] [T]
[τ²]  =  2 [ t2  −a21·t2  −a22·t2  −a23·t2 ] [X]
[τ²]       [ t3  −a31·t3  −a32·t3  −a33·t3 ] [Y]
[τ²]       [ t4  −a41·t4  −a42·t4  −a43·t4 ] [Z]
We can use any four directions for which the determinant of the coefficient matrix does not vanish. One natural choice is to use the vertices of a tetrahedron inscribed in a unit sphere, so that the four directions are perfectly symmetrical. We can take as the coordinates of the vertices
(a11, a12, a13) = (1, 1, 1)/√3
(a21, a22, a23) = (1, −1, −1)/√3
(a31, a32, a33) = (−1, 1, −1)/√3
(a41, a42, a43) = (−1, −1, 1)/√3
Inserting these values for the direction coefficients aij, we can solve the matrix equation for T, X, Y, and Z to give
T = (τ²/8)(1/t1 + 1/t2 + 1/t3 + 1/t4)
X = (√3 τ²/8)(−1/t1 − 1/t2 + 1/t3 + 1/t4)
Y = (√3 τ²/8)(−1/t1 + 1/t2 − 1/t3 + 1/t4)
Z = (√3 τ²/8)(−1/t1 + 1/t2 + 1/t3 − 1/t4)
Substituting into the relation τ² = T² − X² − Y² − Z² and solving for τ² gives
τ² = 16 / [ (1/t1 + 1/t2 + 1/t3 + 1/t4)² − 3(1/t1² + 1/t2² + 1/t3² + 1/t4²) ]
Naturally if t1 = t2 = t3 = t4 = t, then this gives τ = 2t. Also, notice that, as expected, this expression is perfectly symmetrical in the four lightlike coordinates. It's interesting that if the right hand term was absent, then τ would be simply the harmonic mean of the ti. More generally, in a spacetime of 1 + (D−1) dimensions, the invariant interval in terms of D perfectly symmetrical null measures t1, t2, ..., tD satisfies the equation
τ² = 4D / [ (1/t1 + 1/t2 + … + 1/tD)² − (D − 1)(1/t1² + 1/t2² + … + 1/tD²) ]
It can be verified that with D = 2 this expression reduces to τ² = 4t1t2, which agrees with our earlier hyperbolic formulation τ² = αβ with α = 2t1 and β = 2t2. In the particular case D = 4, if we define U = 2/τ and uj = 1/(2tj) this equation can be written in the form
U² = 4(ū² − 3σ)
where σ is the average squared difference of the individual u terms from the average, i.e.,
σ = (1/4) Σj (uj − ū)²        where ū = (u1 + u2 + u3 + u4)/4
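As a concrete check of the four-direction formula above, the following sketch (an assumed numeric example, not from the text) picks an arbitrary timelike event, computes the four null measures along the symmetric tetrahedral directions, and confirms the relation for τ²:

    # Numeric sanity check: for a timelike event P = (T, R), the four null
    # measures t_i along tetrahedral directions d_i satisfy
    # tau^2 = 16 / [ (sum 1/t_i)^2 - 3 * sum 1/t_i^2 ].
    import math

    dirs = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
    dirs = [tuple(c/math.sqrt(3) for c in d) for d in dirs]

    T, R = 10.0, (1.0, 2.0, 3.0)              # arbitrary timelike event P
    tau2 = T*T - sum(c*c for c in R)          # squared proper time, = 86

    # From the system of equations above, tau^2 = 2 t_i (T - d_i . R), so:
    ts = [tau2 / (2*(T - sum(dc*rc for dc, rc in zip(d, R)))) for d in dirs]

    s1 = sum(1/t for t in ts)
    s2 = sum(1/t**2 for t in ts)
    assert abs(tau2 - 16/(s1*s1 - 3*s2)) < 1e-9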
The quantity σ is just the statistical variance of the uj values.

Incidentally, we've seen that the usual representation s² = x² − t² of the invariant spacetime interval is a generalization of the familiar Pythagorean "sum-of-squares" equation of a circle, whereas the interval can also be expressed in the hyperbolic form s² = αβ. This reminds us of other fundamental relations of physics that have found expression as hyperbolic relations, such as the uncertainty relations
Δx Δp ≥ h/(4π)        ΔE Δt ≥ h/(4π)
in quantum mechanics, where h is Planck's constant. In general if the operators A,B corresponding to two observables do not commute (i.e., if AB − BA ≠ 0), then an uncertainty relation applies to those two observables, and they are said to be incompatible. Spatial position and momentum are maximally incompatible, as are energy and time. Such pairs of variables are called conjugates. This naturally raises the question of whether the variables parameterizing two oppositely directed null rays in spacetime can, in some sense, be regarded as conjugates, accounting for the invariance of their product.

Indeed the special theory of relativity can be interpreted in terms of a fundamental limitation on our ability to make measurements, just as can the theory of quantum mechanics. In quantum mechanics we say that it's not possible to simultaneously measure the values of two conjugate variables such that the product of the uncertainties of those two measurements is less than h/4π. Likewise in special relativity we could say that it's not possible to measure the time difference dt between two events separated by the spatial distance dx such that the ratio dt/dx of the variables is less than 1/c. In quantum mechanics we may imagine that the particle possesses a precise position and momentum, even though we are unable to determine it due to practical limitations of our measurement techniques. If only we had an infinitely weak signal, i.e., if only h were 0, we could measure things with infinite precision. Likewise in special relativity we may imagine that there is an absolute and precise relationship between the times of two distant events, but we are prevented from determining it due to the practical limitations. If only we had an infinitely fast signal, i.e., if only 1/c were zero, we could measure things with infinite precision. In other words, nature possesses structure and information that is inaccessible to us (hidden variables), due to the limitations of our measuring capabilities.

However, it's also possible to regard the limitations imposed by quantum mechanics (h ≠ 0) and special relativity (1/c ≠ 0) not as limitations of measurement, but as expressions of an actual ambiguity and "incompatibility" in the independent meanings of those variables. Einstein's central contribution to modern relativity was the idea that there is no one "true" simultaneity between spatially separate events, but rather spacetime events are only partially ordered, and the decomposition of space and time into separate variables contains an inherent ambiguity on the scale of 1/c. In other words, he rejected Lorentz's "hidden variable" approach, and insisted on treating the ambiguity in the spacetime decomposition as fundamental. This is interesting in part because, when it came to quantum mechanics, Einstein's instinct was to continue trying to find ways of measuring the "hidden variables", and he was never comfortable with the idea that the Heisenberg uncertainty relations express a fundamental ambiguity in the decomposition of conjugate variables on the scale of h. (Late in life, as Einstein continued arguing against Bohr's notion of complementarity in quantum mechanics, one of his younger colleagues said "But Professor Einstein, you yourself originated this kind of positivist reasoning about conjugate variables in the theory of space and time", to which Einstein replied "Well, perhaps I did, but it's nonsense all the same".)

Another model suggested by the relativistic interpretation of spacetime is to conceive of space and time as two superimposed waves, combining constructively in the directions of the space and time axes, but destructively (i.e., cancelling out) along light lines. For any given inertial coordinate system x,t, we can associate with each event an angle θ defined by tan(θ) = t/x. Thus the interval from the origin to the point x,t makes an angle θ with the positive x axis, and we have t = x tan(θ), so we can express the squared magnitude of a spacelike interval as
s² = x² − t² = x²(1 − tan²(θ))
Multiplying through by cos²(θ) gives
s² cos²(θ) = x²(cos²(θ) − sin²(θ)) = x² cos(2θ)
Substituting t²/tan²(θ) for x² gives the analogous expression
s² sin²(θ) = t² cos(2θ)
Adding these two expressions gives the result
s² = (x² + t²) cos(2θ)
Consequently the "circular" locus of events satisfying x² + t² = r² for any fixed r can be represented in polar coordinates (s,θ) by the equation
s² = r² cos(2θ)
which is the equation of two lemniscates, as illustrated below.

The lemniscate was first discussed by Jakob Bernoulli in 1694, as the locus of points satisfying the equation
(x² + t²)² = k²(x² − t²)
which is, in Bernoulli's words, "a lying eight-like figure, folded in a knot of a bundle, or of a lemniscus, a knot of a French ribbon". (The study of this curve led Fagnano, Euler, Legendre, Gauss, and others to the discovery of addition theorems for integrals, of which the relativistic velocity composition law is an example.) Notice that the lemniscate is the inverse (in the sense of inversive geometry) of the hyperbola relative to the circle of radius k. In other words, if we draw a line emanating from the origin and it strikes the lemniscate at the radius s, then it strikes the hyperbola at the radius R where sR = k². This follows from the fact that the equation for a hyperbola in polar coordinates is R² = k²/[E² cos²(θ) − 1] where E is the eccentricity, and for an orthogonal hyperbola we have E = √2. Hence the denominator is 2cos²(θ) − 1 = cos(2θ), and the equation of the hyperbola is R² = k²/cos(2θ). Since the polar equation for the lemniscate is s² = k² cos(2θ), we have sR = k².

2.1 The Spacetime Interval

…and then it was
There interposed a fly,
With blue, uncertain, stumbling buzz,
Between the light and me,

And then the windows failed, and then
I could not see to see.
                Emily Dickinson, 1879

The advance of the quantum wave function of any physical system as it passes uniformly from the event (t,x,y,z) to the event (t+dt, x+dx, y+dy, z+dz) is proportional to the value of dτ given by
(dτ)² = (dt)² − [(dx)² + (dy)² + (dz)²]/c²
where t,x,y,z are any system of inertial coordinates and c is a constant (the speed of light, equal to 300 meters per microsecond). The quantity dτ is called the elapsed proper time of the interval, and it is invariant with respect to any system of inertial coordinates. To illustrate, consider a muon particle, which has a radioactive mean life of roughly 2 µsec with respect to its inertial rest frame coordinates. In other words, between the appearance of a typical muon (arising from, say, the decay of a pion) and its decay there is an interval of about 2 µsec in terms of the time coordinate of the muon's inertial rest frame, so the components of this interval are {2,0,0,0}, and the quantum phase of the particle advances by an amount proportional to dτ, where
dτ = √( (2)² − (0² + 0² + 0²)/c² ) = 2 µsec
Now suppose we assess this same physical phenomenon with respect to a relatively moving system of inertial coordinates, e.g., a system with respect to which the muon moved from the spatial origin [0,0,0] all the way to the spatial position [980m, -750m, 1270m] before it decayed. With respect to these coordinates, the muon traveled a spatial distance of 1771 meters. Since the advance of the quantum wave function (i.e., the proper time) of a system or particle over any interval of its worldline is invariant, the corresponding time component of this physical interval with respect to these relatively moving inertial coordinates must be much greater than 2 µsec. If we let (dT,dX,dY,dZ) denote the components of this interval with respect to the relatively moving system of inertial coordinates, we must have
(2)² = (dT)² − [(dX)² + (dY)² + (dZ)²]/c²
Solving for dT and substituting for the spatial components noted above, we have
dT = √( (2)² + (1771/300)² ) ≈ 6.23 µsec
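The arithmetic of this example can be reproduced in a few lines (an illustrative sketch using the values quoted above):

    # Worked check of the muon example (values from the text; the code
    # itself is an added illustration).
    import math

    c = 300.0                              # speed of light, m/usec
    d_tau = 2.0                            # proper lifetime, usec
    dx, dy, dz = 980.0, -750.0, 1270.0     # displacement, meters

    dist = math.sqrt(dx*dx + dy*dy + dz*dz)     # ~1771 m
    dT = math.sqrt(d_tau**2 + (dist/c)**2)      # invariance of d_tau
    assert abs(dT - 6.23) < 0.01                # ~6.23 usec
    assert abs((dist/dT)/c - 0.947) < 0.001     # speed ~0.947c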
The resulting dT represents the time component of the muon decay interval with respect to the moving system of inertial coordinates. Since the muon has moved a spatial distance of 1771 meters in 6.23 µsec, we see that its velocity with respect to these coordinates is 284 m/µsec, which is 0.947c.

The identification of the spacetime interval with quantum phase applies to null intervals as well, consistent with the fact that the quantum phase of a photon does not advance at all between its emission and absorption. (For a further discussion of this, see Section 9.10.) Hence the physical significance of a null spacetime interval is that the quantum state of any system is constant along that interval. In other words, the interval represents a single quantum state of the system. It follows that the emission and absorption of a photon must be regarded as, in some sense, a single quantum event.

Note, however, that the quantum phase is path dependent. In other words, two particles at opposite ends of a lightlike (null) interval do not share the same quantum state unless the second particle reached that event by passing along that null interval. Hence the concept of the spacetime interval as a measure of the phase of the quantum wave function does not conflict with the exclusion principle for fermions such as electrons, because even though two electrons can be null-separated, they cannot have separated along that null path, because they have non-zero rest mass. Of course, it is possible for two photons at opposite ends of a null interval to have reached that condition by progressing along that interval, in which case they represent the same quantum phase (and in some sense may be regarded as "the same photon"), but photons are bosons, and hence not excluded from occupying the same state. In fact, the presence of one photon in a particular quantum state actually enhances the probability of another photon entering that state. (This is responsible for the phenomenon of stimulated emission, which is the basis of operation of lasers.)

In this regard it's interesting to consider neutrinos, which (like electrons) are fermions, meaning that they have anti-symmetric eigenfunctions, and hence are subject to the Pauli exclusion principle. On the other hand, neutrinos were traditionally regarded as massless, meaning they propagate along null intervals. This raises the prospect of two instances of a neutrino at opposite ends of a null interval, with the second occupying the same quantum state as the first, in violation of the exclusion principle for fermions. It might be argued that these two instances are really the same neutrino, and a particle obviously can't exclude itself from occupying its own state. However, this is somewhat problematic due to the indistinguishability and the lack of definite identities for individual particles. A different approach would be to argue that all fermions, including neutrinos, must have mass, and thus be excluded from traveling along null intervals. The idea that neutrinos actually do have mass seems to be supported by recent experimental observations, but the question remains open.

Based on the general identification of the invariant magnitude (proper time) of a timelike interval with quantum phase along that interval, it follows that all physical processes and characteristic sequences of events will evolve in proportion to this quantity. The name "proper time" is appropriate because this quantity represents the most meaningful known measure of elapsed time along that interval, based on the fact that the quantum state is the most complete possible description of physical reality. Since not all spacetime intervals are timelike, we conclude that the temporal relations between events induce only a partial ordering, rather than a total ordering (as discussed in Section 1.2), because a set of events can be totally ordered only if they are each inside the future or past null cone of each of the others. This doesn't hold if any of the pairwise intervals is spacelike. As a consequence of this partial ordering, between two fixed timelike separated events there exist timelike paths with different lapses of proper time.

Admittedly a partial ordering of events has been considered unacceptable by some people, basically because they regard total temporal ordering in a classical Cartesian setting as an inviolable first principle. Rather than accept partial ordering they prefer to (more or less arbitrarily) select one particular inertial reference system and declare it to be the "true" configuration, as in Lorentz's original theory, in an attempt to restore an unambiguous total temporal ordering to events. They then account for the apparent differences in elapsed time (as in muon observations) by regarding them as effects of absolute velocity relative to the "true" frame of reference, again following Lorentz. However, unlike Lorentz, we now have a theory of quantum mechanics, and the quantum state of a system gives (arguably) the most complete possible objective description of the system. Therefore, modern advocates of total temporal ordering face the daunting task of finding some mechanism underlying quantum mechanics (i.e., hidden variables) to provide a physical significance for their preferred total ordering. Unfortunately, the only prospects for a viable hidden-variable theory seem to be things like the explicitly nonlocal contrivances described by David Bohm, which must surely be anathema to those who seek a physics based on classical Cartesian mechanisms. So, although the theories of relativity and quantum mechanics are in some respects incongruent, it is nevertheless true that the (putative) validity and completeness of quantum mechanics constitutes one of the strongest arguments in favor of the relativistic interpretation of Lorentz invariance.

We should also mention that a tacit assumption has been made above, namely, the assumption of physical equivalence between instantaneously co-moving frames, regardless of acceleration. For example, we assume that two co-moving clocks will keep time at the same instantaneous rate, even if one is accelerating and the other is not. This is just a hypothesis - we have no a priori reason to rule out physical effects of the 2nd, 3rd, 4th, ... time derivatives. It just so happens that when we construct a theory on this basis, it works pretty well. (Similarly we have no a priori reason to think the field equations necessarily depend only on the metric and its 1st and 2nd derivatives; but it works.) Another way of expressing this "clock hypothesis" is to say that an ideal clock is unaffected by acceleration, and to regard this as the definition of an "ideal clock", i.e., one that compensates for any effects of 2nd or higher derivatives. Of course the physical significance of this definition arises from the hypothesized fact that acceleration is absolute, and therefore perfectly detectable (in principle). In contrast, we hypothesize that velocity is perfectly undetectable, which explains why we cannot define our "ideal clock" to compensate for velocity (or, for that matter, position).
The point is that these are both assumptions invoked by relativity: (1) the zeroth and first derivatives of position are perfectly relative and undetectable, and (2) the second and higher derivatives of position are perfectly absolute and detectable. Most treatments of relativity emphasize the first assumption, but the second is no less important. The notion of an ideal clock takes on even more physical significance from the fact that there exist physical entities (such as vibrating atoms, etc.) in which the intrinsic forces far exceed any accelerating forces we can apply, so that we have in fact (not just in principle) the ability to observe virtually ideal clocks. For example, in the Rebka and Pound experiments it was found that nuclear clocks were slowed by precisely the factor γ(v), even though subject to accelerations up to 10¹⁶ g (which is huge in normal terms, but of course still small relative to nuclear forces).

It was emphasized in Section 1 that a pulse of light has no inertial rest frame, but this may seem puzzling at first. The pulse has a well-defined spatial position versus time with respect to some inertial coordinate system, representing a fixed velocity c relative to that system, and we know that any system of orthogonal coordinates in uniform non-rotating motion relative to an inertial coordinate system is also inertial, so why can we not simply apply the velocity c to the base frame to arrive at the rest frame of the light pulse? How can an entity have a well-defined velocity and yet have no well-defined rest frame? The only answer can be that the transformation is singular, i.e., the coordinate system moving with a uniform speed c relative to an inertial frame is not well defined. The singular behavior of the transformation corresponds to the fact that the absolute magnitude of the spacetime intervals along lightlike paths is null. The transformation through a velocity v from the x,t to the x′,t′ coordinates is t′ = (t − vx)/γ and x′ = (x − vt)/γ where γ = √(1 − v²), so it's clear that for v = 1 the individual t′ and x′ components are undefined, but the ratio of dt′ over dx′ remains well-defined, with magnitude 1 and the opposite sign from v. The singularity of the Lorentz transformation for the speed c suggests that the conception of light as an entity in itself may be somewhat misleading, and it is often useful to regard light as simply an interaction between two massive bodies along a null spacetime interval.

Discussions of special relativity often refer to the use of clocks and reflected light signals for the evaluation of spacetime intervals. For example, suppose two identical clocks are moving uniformly with speeds +v and −v along the x axis of a given inertial coordinate system, and these clocks are set to zero at the intersection of their worldlines. When the leftward clock indicates the proper time τ1, it emits a pulse of light, which bounces off the rightward clock when that clock indicates τ2, and arrives back at the leftward clock when that clock reads τ3. This is illustrated in the drawing below.

By similar triangles we immediately have τ2/τ1 = τ3/τ2, and thus τ2² = τ1τ3. Of course, this same relation holds good in Galilean spacetime as well (not to mention Euclidean plane geometry, using distances instead of time intervals), and the reflected signal need not be a light pulse. Any object moving at the same speed (angle) in both directions with respect to this coordinate system would serve just as well, and would lead to the same result that τ2 is the geometric mean of τ1 and τ3. Naturally if we apply any Minkowskian, Galilean, or Euclidean transformation (respectively), the pictorial angles of the lines will differ, but the three absolute intervals will remain unchanged. It is, of course, possible to distinguish between the Galilean and Minkowskian cases based just on the values of the elapsed times, provided we know the relative speeds of the clocks and the signal. In Galilean spacetime each proper time τj equals the coordinate time tj, whereas in Minkowski spacetime it equals √(tj² − xj²) where xj = v tj. Hence the proper time τj in Minkowski spacetime is tj√(1 − v²). This might seem to imply that the ratios of proper times are the same in the Galilean and Minkowskian cases, but in fact we have not made a valid comparison for equal relative speeds between the clocks. In this example each clock is moving with speed v away from the midpoint, which implies that the relative speed is 2v in the Galilean case, but only 2v/(1 + v²) in the Minkowskian case. To give a valid comparison for equal relative speeds between the clocks, let's transform the events to a system of coordinates such that the left-hand clock is stationary and the right-hand clock is moving at the speed v. Now this v represents the magnitude of the actual relative speed between the two clocks. We now stipulate that the original signal is moving with speed u relative to the left-hand clock, and the reflected signal is moving with speed -u relative to the right-hand clock. The situation is illustrated in the figure below.

The speed, with respect to these coordinates, of the reflected signal is what distinguishes the Galilean from the Minkowskian case. Letting x2 and t2 denote the coordinates of the reflection event, and noting that τ1 = t1 and τ3 = t3, we have v = x2/t2 and u = x2/(t2 − τ1). We also have
    u − v = x2/(t3 − t2)   (Galilean)        (u − v)/(1 − uv) = x2/(t3 − t2)   (Minkowskian)
Dividing the numerator and denominator of the expression for u by t2, and replacing x2/t2 with v, gives u = v/[1 − (τ1/t2)]. Likewise the above expressions can be written as
    u − v = v/[(t3/t2) − 1]   (Galilean)        (u − v)/(1 − uv) = v/[(t3/t2) − 1]   (Minkowskian)
Solving these equations for the time ratios, we have
    t2/t1 = u/(u − v)        t3/t2 = u/(u − v)   (Galilean)        t3/t2 = u(1 − v²)/(u − v)   (Minkowskian)
Consequently, depending on whether the metric is Galilean or Minkowskian, the ratio of t3 over t1 is given by
    t3/t1 = u²/(u − v)²        and        t3/t1 = u²(1 − v²)/(u − v)²
respectively. If u happens to be unity (meaning that the signals propagate at the speed of light), these expressions reduce to the squares of the Galilean and relativistic Doppler shift factors, i.e., 1/(1 − v)² and (1 + v)/(1 − v), discussed more fully in Section 2.4.
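These limiting cases are easy to check numerically. The following minimal Python sketch (the function names are our own, chosen for this illustration) evaluates the two round-trip ratios for a light-like signal, u = 1, and compares them with the squared Doppler factors:

    # Round-trip ratios t3/t1 for a signal of speed u exchanged between two
    # clocks with relative speed v (units with c = 1).
    def galilean_ratio(u, v):
        return (u / (u - v))**2

    def minkowskian_ratio(u, v):
        return u**2 * (1 - v**2) / (u - v)**2

    v = 0.6
    print(galilean_ratio(1.0, v),    1 / (1 - v)**2)      # 6.25  6.25
    print(minkowskian_ratio(1.0, v), (1 + v) / (1 - v))   # 4.0   4.0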

Another distinguishing factor between the two metrics is that with the Minkowski metric the speed of light is invariant with respect to any system of inertial coordinates, so (arguably) we can even say that it represents the same "u" relative to a spacelike interval as it does relative to a timelike interval, in order to adhere to our stipulation that the reflected signal has the speed u relative to "the rest frame of the right-hand clock". Of course, a spacelike interval cannot actually be the worldline of a clock (or any other material object), but the invariance of the speed of light under Minkowskian transformations enables us to rationally apply the same "geometric mean" formula to determine the magnitudes of spacelike intervals, provided we use light-like signals, as illustrated below.

In this case we have τ1 = −τ3, so τ2² = −τ3², meaning that squared spacelike intervals are negative.

2.2 Force Laws and Maxwell's Equations

While speaking of this state, I must immediately call your attention to the curious fact that, although we never lose sight of it, we need by no means go far in attempting to form an image of it and, in fact, we cannot say much about it.
Hendrik Lorentz, 1909

Perhaps the most rudimentary scientific observation is that material objects exhibit a natural tendency to move in certain circumstances. For example, objects near the surface of the Earth tend to move in the local "downward" direction, i.e., toward the Earth's center. The Newtonian approach to describing such tendencies was to imagine a "force field" representing a vectorial force per unit charge that is applied to any particle at any given point, and then to postulate that the acceleration vector of each particle equals the applied force divided by the particle's inertial mass. Thus the "charge" of a particle determines how strongly that particle couples with a particular kind of force field, whereas the inertial mass determines how susceptible the particle's velocity is to arbitrary applied forces. In the case of gravity, the coupling charge happens to be the same as the
inertial mass, denoted by m, but for electric and magnetic forces the coupling charge q differs from m. Since the coupling charge and the response coefficient for gravity are identical, it follows that gravity can only operate in a single directional sense, because changing the sign of m for a particle would reverse the sense of both the coupling and the response, leaving the particle's overall behavior unchanged. In other words, if we considered gravitation to apply a repulsive force to a certain particle by setting the particle's coupling charge to -m, we would also set its inertial coefficient to -m, so the particle would still accelerate downward, against the direction of the applied force. Of course, the identity of the gravitational coupling and response coefficients not only implies a unique directional sense, it implies a unique quantitative response for all material particles, regardless of m. In contrast, the electric and magnetic coupling charge q is separately specifiable from the inertial coefficient m, so by changing the sign of q while leaving m constant we can represent either negative or positive response, and by changing the ratio of q/m we can scale the quantitative response. According to this classical picture, a small test particle with mass m and electric charge q at a given location in space is subject to a vectorial force f given by
    f = mg + qE + q(v × B)        (1a)
where g is the gravitational field vector, E is the electric field vector, and B is the magnetic field vector at the given location, and v is the velocity vector of the test particle. (See Part 1 of the Appendix for a review of vector products such as the cross product denoted by v × B.) As noted above, the acceleration vector a of the particle is simply f/m, so we have the equation of motion
    a = g + (q/m)(E + v × B)        (2)
Given the mass, charge, and initial position of a test particle, and the vectors g, E, B for every point in the vicinity of the particle, this equation enables us to compute the particle's subsequent motion. Notice that the acceleration of a test particle due to gravity is independent of the particle's properties and state of motion (to the first approximation), whereas the accelerations due to the electric and magnetic fields are both proportional to the particle's charge divided by its inertial mass. In addition, the contribution of the magnetic field is a function of the particle's velocity. This dependence on the state of motion has important consequences, and leads naturally to the unification of the electric and magnetic fields, but before describing these effects it's worthwhile to briefly review the effect of the classical gravitational field on the motion of a particle. The gravitational acceleration field g at a point p due to a distant particle of mass m was specified classically by Newton's law
    g = −(m/r³) r
where r is the displacement vector (of magnitude r) from the mass particle to the point p. Noting that r² = x² + y² + z² and r = ix + jy + kz, it's straightforward to verify that the divergence of the gravitational field g vanishes at any point p away from the mass, i.e., we have
    ∇·g = ∂gx/∂x + ∂gy/∂y + ∂gz/∂z = 0        (3)
(See Part 3 of the Appendix for a review of the ∇ differential operator notation.) The field due to multiple mass particles is just the sum of the individual fields, so the divergence of g due to any configuration of matter vanishes at every point in empty space. Of course, the field is singular (infinite) at any point containing a finite amount of mass, so we can't express the field due to a mass point precisely at the point. However, if we postulate a continuous distribution of gravitational charge (i.e., mass), with a density ρg specified at every point in a region, then it can be shown that the gravitational acceleration field at every point satisfies the equation
    ∇·g = −ρg        (4)
Incidentally, if we define the gravitational potential (a scalar field) due to any particle of mass m as φ = −m/r where r is the distance from the source particle (and noting that the potential due to multiple particles is simply additive), it's easy to show that
    g = −∇φ        and hence        ∇·g = −∇²φ
so equations (3) and (4) can be expressed equivalently in terms of the potential, in which case they are called Laplace's equation and Poisson's equation, respectively. The equation of motion for a test particle in the absence of any electromagnetic effects is simply a = g, so equation (2) gives the three components
    d²x/dt² = −mx/r³        d²y/dt² = −my/r³        d²z/dt² = −mz/r³
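Before turning to the circular-path example below, note that these equations of motion are straightforward to integrate numerically. The following Python sketch (our own illustration, in units where the gravitational constant is absorbed into m) steps a test particle around one circular orbit with a simple symplectic Euler method:

    import math

    m = 1.0
    x, y = 1.0, 0.0        # start on a circular orbit of radius r = 1
    vx, vy = 0.0, 1.0      # circular-orbit speed: v = sqrt(m/r) = 1
    dt = 0.001
    for _ in range(int(2 * math.pi / dt)):    # roughly one orbital period
        r3 = (x*x + y*y)**1.5
        vx -= m * x / r3 * dt                 # update velocity first ...
        vy -= m * y / r3 * dt
        x += vx * dt                          # ... then position
        y += vy * dt
    print(round(x, 3), round(y, 3))           # back near the start, (1.0, 0.0)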
To illustrate the use of these equations of motion, consider a circular path for our test particle, given by
    x(t) = r cos(ωt)        y(t) = r sin(ωt)        z(t) = 0
In this case we see that r is constant and the second derivatives of x and y are −rω²sin(ωt) and −rω²cos(ωt) respectively. The equation of motion for z is identically satisfied and the equations for x and y both reduce to r³ω² = m, which is Kepler's third law for circular orbits. Newton's analysis of gravity into a vectorial force field and a response was spectacularly successful in quantifying the effects of gravity, and by the beginning of the 20th century
this approach was able to account for nearly all astronomical phenomena in the solar system within the limits of observational accuracy (the only notable exception being a slightly anomalous precession in the orbit of the planet Mercury, as discussed in Section 6.2). Based on this success, it was natural that the other forces of nature would be formalized in a similar way. The next two most obvious forces that apply to material bodies are the electric and magnetic forces, represented by the last two terms in equation (1a). If we imagine that all of space is filled with a mist of tiny electrical charges qi with velocities vi, then we can define the classical charge density ρe and current density j as follows
    ρe = (Σi qi)/ΔV        j = (Σi qi vi)/ΔV
where ΔV is an incremental volume of space. For the remainder of this section we will omit the subscript "e" with the understanding that ρ signifies the electric charge density. If we let x,y,z denote the position of the incremental quantity of charge, we can write out the individual components of the current density as
    jx = ρ(dx/dt)        jy = ρ(dy/dt)        jz = ρ(dz/dt)
Maxwell's equations for the electro-magnetic fields are
    ∇·E = ρ        (5a)
    ∇·B = 0        (5b)
    ∇×E = −∂B/∂t        (5c)
    ∇×B = ∂E/∂t + j        (5d)
where E is the electric field and B is the magnetic field. Equations (5a) and (5b) suggest that the electric and magnetic fields are similar to the gravitational field g, since the divergences at each point equal the respective charge densities, with the difference being that the electric charge density may be positive or negative, and there does not exist (as far as we know) an isolated magnetic charge, i.e., no magnetic monopoles. Equations (5a) and (5b) are both static equations, in the sense that they do not involve the time parameter. By themselves they could be taken to indicate that the electric and magnetic fields are each individually similar to Newton's conception of the gravitational field, i.e., instantaneous "force-at-a-distance". (On this static basis we would presumably never have identified the magnetic field at all, assuming magnetic monopoles don't exist, and that the universe is not subject to any boundary conditions that cause B to be non-zero.) However, equations (5c) and (5d) reveal a completely different aspect of the E and B
fields, namely, that they are dynamically linked together, so the fields are not only functions of each other, but their definitions explicitly involve changes in time. Recall that the Newtonian gravitational field g was defined totally by the instantaneous spatial condition expressed by ∇·g = −ρg, so at any given instant the Newtonian gravitational field is totally determined by the spatial distribution of mass in that instant, consistent with the notion that simultaneity is absolute. In contrast, Maxwell's equations indicate that the fields E and B depend not only on the distribution of charge at a given putative "instant", but also on the movement of charge (i.e., the current density) and on the rates of change of the fields themselves at that "instant". Since these equations contain a mixture of partial derivatives of the fields E and B with respect to the temporal as well as the spatial coordinates, dimensional consistency requires that the effective units of space and time must have a fixed relation to each other, assuming the units of E and B have a fixed relation. Specifically, the ratio of space units to time units must equal the ratio of electrostatic and electromagnetic units (all with respect to any frame of reference in which the above equations are applicable). This is the reason we were able to write the above equations without constant coefficients, because the fixed absolute ratio between the effective units of measure of time and space enables us to specify all the variables x,y,z,t in the same units. Furthermore, this fixed ratio of space to time units has an extremely important physical significance for electromagnetic fields in empty space, where ρ and j are both zero. To see this, take the curl of both sides of (5c), which gives
    ∇×(∇×E) = −∇×(∂B/∂t)
Now, for any arbitrary vector S it's easy to verify the identity
    ∇×(∇×S) = ∇(∇·S) − ∇²S
Therefore, we can apply this to the left hand side of the preceding equation, and noting that ∇·E = 0 in empty space, we are left with
    ∇²E = ∇×(∂B/∂t)
Also, recall that the order of partial differentiation with respect to two parameters doesn't matter, so we can re-write the right-hand side of the above expression as
    ∇²E = (∂/∂t)(∇×B)
Finally, since (5d) gives ∇×B = ∂E/∂t in empty space, the above equation becomes
    ∇²E = ∂²E/∂t²        (6a)
Similarly we can show that
    ∇²B = ∂²B/∂t²        (6b)
Equations (6a) and (6b) are just the classical wave equation, which implies that electromagnetic changes propagate through empty space at a speed of 1 when using consistent units of space and time. In terms of conventional units this must equal the ratio of the electrostatic and electromagnetic units, which gives the speed
    c = 1/√(μ0 ε0)        (7)
where μ0 and ε0 are the permeability and permittivity of the vacuum. To some extent our choice of units is arbitrary, and in fact we conventionally define our units so that the permeability constant has the value μ0 = 4π × 10⁻⁷ (kilogram·meter)/(ampere²·second²). Since force has units of kg·m/sec² and charge has units of amp·sec, these conventions determine our units of force and charge, as well as distance, so we can then (theoretically) use Coulomb's law F = q1q2/(4πε0r²) to determine the permittivity constant by measuring the static force that exists between known electric charges at a certain distance. The best experimental value is ε0 = 8.854187818 × 10⁻¹² (ampere²·second⁴)/(kilogram·meter³). Substituting these values into equation (7) gives c = 2.997924579935 × 10⁸ meters/second.

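Incidentally, the arithmetic of equation (7) is easy to verify with the conventional values quoted above, as in this brief Python sketch:

    import math
    mu0  = 4 * math.pi * 1e-7      # permeability, kg·m/(amp²·sec²)
    eps0 = 8.854187818e-12         # permittivity, amp²·sec⁴/(kg·m³)
    print(1 / math.sqrt(mu0 * eps0))   # roughly 2.9979e8 meters/second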
This constant of proportionality between the units of space and time is based entirely on electrostatic and electromagnetic measurements, and it follows from Maxwell's equations that electromagnetic waves propagate at the speed c in a vacuum. In Section 3.3 we review the history of attempts to measure the speed of light (which of course for most of human history was not known to be an electromagnetic phenomenon), but suffice it to say here that the best measured value for the speed of light is 299792457.4 m/sec, which agrees with Maxwell's predicted propagation speed for electromagnetic waves to nine significant digits. This was Maxwell's greatest triumph, showing that electromagnetic waves propagate at the speed of light, from which we infer that light itself consists of electromagnetic waves,
thereby unifying optics and electromagnetism. However, this magnificent result also presented Maxwell, and other physicists of the late 19th century, with a puzzle that would baffle them for decades. Equation (7) implies that, assuming the permittivity and permeability of the vacuum are the same when evaluated at rest with respect to any inertial frame of reference, in accord with the classical principle of relativity, and assuming Maxwell's equations are strictly valid in all inertial frames of reference, then it follows that the speed of light must be independent of the frame of reference. This agrees with the Galilean principle of relativity, but flatly violates the Galilean transformation rules, because it does not yield simply additive composition of speeds. This was the conflict that vexed the young Einstein (age 16) when he was attending "prep school" in Aarau, Switzerland in 1895, preparing to re-take the entrance examination at the Zurich Polytechnic. Although he was deficient in the cultural subjects, he already knew enough mathematics and physics to realize that Maxwell's equations don't support the existence of a free wave at any speed other than c, which should be a fixed constant of nature according to the classical principle of relativity. But to admit an invariant speed seemed impossible to reconcile with the classical transformation rules. Writing out equations (5d) and (5a) explicitly, we have four partial differential equations
    ∂Bz/∂y − ∂By/∂z − ∂Ex/∂t = jx
    ∂Bx/∂z − ∂Bz/∂x − ∂Ey/∂t = jy
    ∂By/∂x − ∂Bx/∂y − ∂Ez/∂t = jz
    ∂Ex/∂x + ∂Ey/∂y + ∂Ez/∂z = ρ
The above equations strongly suggest that the three components of the current density j and the charge density ρ ought to be combined into a single four-vector, such that each component is the incremental charge per volume multiplied by the respective component of the four-velocity of the charge, as shown below
    [jx, jy, jz, jt] = ρ0 [dx/dτ, dy/dτ, dz/dτ, dt/dτ]
where the parameter τ is the proper time of the charge's rest frame. If the charge is stationary with respect to these x,y,z,t coordinates, then obviously the current density components vanish, and jt is simply our original charge density ρ. On the other hand, if the charge is moving with respect to the x,y,z,t coordinates, we acquire a non-vanishing
current density, and we find that the charge density is modified by the ratio dt/dτ. However, it's worth noting that the incremental volume elements with respect to a moving frame of reference are also modified by the same Lorentz transformation, which ensures that the electrical charge on a physical object is invariant for all frames of reference. We can also see from the four differential equations above that if the arguments of the partial derivatives on the left-hand side are arranged according to their denominators, they constitute a perfect anti-symmetric matrix
    P =  [   0     Bz   −By   −Ex ]
         [ −Bz     0     Bx   −Ey ]
         [  By    −Bx    0    −Ez ]
         [  Ex     Ey    Ez    0  ]

(with rows and columns ordered x, y, z, t)
If we let x1,x2,x3,x4 denote the coordinates x,y,z,t respectively, then equations (5a) and (5d) can be combined and expressed in the form
    Σν ∂Pμν/∂xν = jμ        (8a)

where the sum is taken over ν = 1, 2, 3, 4 and (j1, j2, j3, j4) = (jx, jy, jz, ρ).
In exactly the same way we can combine equations (5b) and (5c) and express them in the form
    Σν ∂Qμν/∂xν = 0        (8b)
where the matrix Q is an anti-symmetric matrix defined by
    Q =  [   0     Ez   −Ey    Bx ]
         [ −Ez     0     Ex    By ]
         [  Ey    −Ex    0     Bz ]
         [ −Bx    −By   −Bz    0  ]
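Since sign errors are easy to make in these arrangements, it is worth verifying symbolically that the rows of (8a) really do reproduce the four partial differential equations written out above. The following sketch (using the sympy library; the matrix layout and the index order x, y, z, t are the ones assumed in the reconstruction above) prints the four row sums:

    import sympy as sp

    x, y, z, t = sp.symbols('x y z t')
    Ex, Ey, Ez, Bx, By, Bz = [sp.Function(n)(x, y, z, t)
                              for n in ('Ex', 'Ey', 'Ez', 'Bx', 'By', 'Bz')]

    # field matrix P with rows and columns ordered x, y, z, t
    P = [[   0,  Bz, -By, -Ex],
         [ -Bz,   0,  Bx, -Ey],
         [  By, -Bx,   0, -Ez],
         [  Ex,  Ey,  Ez,   0]]
    coords = [x, y, z, t]

    # row mu of sum_nu dP[mu][nu]/dx_nu should match the mu-th of the four
    # equations above (components of (5d) for mu = x, y, z, and (5a) for mu = t)
    for mu in range(4):
        print(sum(sp.diff(P[mu][nu], coords[nu]) for nu in range(4)))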
Returning again to equation (1a), we see that in the absence of a gravitational field the force on a particle with q = m = 1 and velocity v at a point in space where the electric and magnetic field vectors are E and B is given by
    f = E + v × B
In component form this can be written as
    fx = Ex + vy Bz − vz By        fy = Ey + vz Bx − vx Bz        fz = Ez + vx By − vy Bx
Consequently the components of the acceleration are
    ax = Ex + vy Bz − vz By        ay = Ey + vz Bx − vx Bz        az = Ez + vx By − vy Bx
To simplify the expressions, suppose the velocity of the particle with respect to the original x,y,z,t coordinates is purely in the positive x direction, i.e., we have vy = vz = 0 and vx = v. Then the force on the particle has the components
    fx = Ex        fy = Ey − v Bz        fz = Ez + v By
Now consider the same physical situation, but with respect to a system of inertial coordinates x',y',z',t' in terms of which the particle's velocity is zero. To the first approximation we expect that the components of force are the same when evaluated with respect to the primed coordinate system, and in fact by symmetry it's clear that fx' = fx. However, for the components perpendicular to the velocity, the symmetry of the situation allows us to say only that (for any fixed speed v) fy' = kfy and fz' = kfz, where k is a constant that approaches 1 for small v. Hence the components of the electric field with respect to the primed and unprimed coordinate systems are related according to
    Ex' = Ex        Ey' = k(Ey − vBz)        Ez' = k(Ez + vBy)
By symmetry we can also write down the reciprocal transformation, replacing v with -v, which gives
    Ex = Ex'        Ey = k(Ey' + vBz')        Ez = k(Ez' − vBy')
Notice that we've used the same factor k for both transformations, because to the first order we know k(v) is simply 1, suggesting that the dependence of k on v is of the second order, which makes it likely that k(v) is an even function, i.e., we assume k(v) = k(-v). Substituting the expression for Ey' into the expression for Ey and solving the resulting equation for Bz' gives
    Bz' = [(1 − k²)/(kv)] Ey + k Bz
By the same token, substituting the expression for Ez' into the expression for Ez and solving for By' gives
    By' = [(k² − 1)/(kv)] Ez + k By
These last two expressions should look familiar, because they have the same form as the expression for the transformed time coordinate developed in Section 1.7. Writing the overall scale factor k(v) as ϕ(v)/√(1 − v²) for any given v, the general transformation equations for the electric and magnetic field components perpendicular to the velocity are
    Ey' = ϕ(v)(Ey − vBz)/√(1 − v²)        Ez' = ϕ(v)(Ez + vBy)/√(1 − v²)
    By' = ϕ(v)(By + vEz)/√(1 − v²)        Bz' = ϕ(v)(Bz − vEy)/√(1 − v²)
Comparing these equations with equation (1) in Section 1.7, it should come as no surprise that the actual transformations for the components of the electric and magnetic field are given by setting ϕ(v) = 1. Consequently we have the invariants
    (Ey')² − (Bz')² = Ey² − Bz²        (Ez')² − (By')² = Ez² − By²
Naturally we expect the field components parallel to the velocity to exhibit the corresponding invariance, i.e., we expect that
    (Ex')² − (Bx')² = Ex² − Bx²
from which we infer the final transformation equation Bx' = Bx. So, the complete set of transformation equations for the electric and magnetic field components from one system of inertial coordinates to another (with a relative velocity v in the positive x direction) is
    Ex' = Ex        Ey' = (Ey − vBz)/√(1 − v²)        Ez' = (Ez + vBy)/√(1 − v²)
    Bx' = Bx        By' = (By + vEz)/√(1 − v²)        Bz' = (Bz − vEy)/√(1 − v²)
Just as the Lorentz transformation for space and time intervals shows that those intervals are the components of a unified space-time interval, these transformation equations show that the electric and magnetic fields are components of a unified electro-magnetic field.
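A quick numerical check of these transformations (a Python sketch with an arbitrarily chosen field and boost, in units with c = 1) confirms the invariants discussed below:

    import math

    def boost_fields(E, B, v):
        # transform field components for a boost v along the x axis (c = 1)
        g = 1 / math.sqrt(1 - v*v)
        Ex, Ey, Ez = E
        Bx, By, Bz = B
        return ((Ex, g*(Ey - v*Bz), g*(Ez + v*By)),
                (Bx, g*(By + v*Ez), g*(Bz - v*Ey)))

    dot = lambda a, b: sum(p*q for p, q in zip(a, b))
    E, B = (0.3, -1.2, 0.5), (0.8, 0.1, -0.4)
    Ep, Bp = boost_fields(E, B, 0.75)
    print(dot(E, E) - dot(B, B), dot(Ep, Ep) - dot(Bp, Bp))   # equal values
    print(dot(E, B), dot(Ep, Bp))                             # equal values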

The decomposition of the electromagnetic field into electric and magnetic components depends on the frame of reference. From the invariants noted above we see that, letting E² and B² denote the squared magnitudes of the electric and magnetic field vectors at a given point, the quantity E² − B² is invariant (as is the dot product E·B), analogous to the invariant X² − T² for spacetime intervals. The combined electromagnetic field can be represented by the matrix P defined previously, which transforms as a tensor of rank 2 under Lorentz transformations. So too does the matrix Q, and since Maxwell's equations can be expressed in terms of P and Q (as shown by equations (8a) and (8b)), we see that Maxwell's equations are invariant under Lorentz transformations.

2.3 The Inertia of Energy

Please reveal who you are of such fearsome form... I wish to clearly know you, the primeval being, because I cannot fathom your intention. Lord Krsna said: I am terrible Time, destroyer of all beings in all worlds, here to destroy this world. Of those heroic soldiers now arrayed in the opposing army, even without you, none will be spared.
Bhagavad Gita

One of the first and most famous examples of the heuristic power of Einstein's relativistic interpretation of space and time was the suggestion that energy and inertial mass are, in a fundamental sense, equivalent. The word "suggestion" is used advisedly, because mass-energy equivalence is not a logically necessary consequence of special relativity (as explained below). In fact, when combined with the gravitational equivalence principle, it turns out that mass-energy equivalence is technically incompatible with special relativity. Indeed this was one of Einstein's main motivations for developing the general theory. Nevertheless, by showing that the kinematics of phenomena can best be described in terms of a unified four-dimensional continuum, with time as a fourth coordinate, distinct from the path parameter, the special theory did clearly suggest that energy be regarded as the fourth (viz., time-like) component of momentum, and hence that all energy has inertia and all inertia represents energy. It should also be mentioned that some kind of equivalence between mass and energy had long been recognized by physicists, even prior to 1905. Indeed Maxwell's equations already imply that the energy of an electromagnetic wave carries momentum, and Poincaré had noted that if Galilean relativity was applied to electrodynamics, the equivalence of mass and energy follows. Lorentz had attempted to describe the mass of an electron as a manifestation of electromagnetic energy. (It's interesting that while some people were trying to "explain" electromagnetism as a disturbance in a material medium, others were trying to explain material substances as manifestations of electromagnetism!) However, the fact that mass-energy equivalence emerges so naturally from Einstein's kinematics, applicable to all kinds of mass and energy (not just electrons and electromagnetism), was mainly responsible for the recognition of this equivalence as a general and fundamental aspect of nature. We'll first give a brief verbal explanation of
how this equivalence emerges from Einstein's kinematics, and then follow with a quantitative description. The basic principle of special relativity is that inertial measures of spatial and temporal intervals are such that the velocity of light with respect to those measures is invariant. It follows that relative velocities are not transitively additive from one reference frame to another, and, as a result, the acceleration of an object with respect to one inertial frame must differ from its acceleration with respect to another inertial frame. However, by symmetry, an impact force exerted by two objects (in one spatial dimension) upon one another is equal and opposite, regardless of their relative velocity. These simple considerations lead directly to the idea that inertia (as quantified by mass) is an attribute of energy. Given an object O of mass m, initially at rest, we apply a force F to the object, giving it an acceleration of F/m. After a while the object has achieved some velocity v, and we continue to apply the constant force F. But now imagine another inertial observer, this one momentarily co-moving with the object at this instant with a velocity v. This other observer sees a stationary object O of mass m subject to a force F, so, on the assumption that the laws of physics are the same in all inertial frames, we know that he will see the object respond with an acceleration of F/m (just as we did). However, due to nonadditivity of velocities, the acceleration with respect to our measures of time and space must now be different. Thus, even though we're still applying a force F to the object, its acceleration (relative to our frame) is no longer equal to F/m. In fact, it must be less, and this acceleration must go to zero as v approaches the speed of light. Hence the effective inertia of the object in the direction of its motion increases. During this experiment we can also integrate the force we exerted over the distance traveled by the object, and determine the amount of work (energy) that we imparted to the object in bringing it to the velocity v. With a little algebra we can show that the ratio of the amount of energy we put into the object to the amount by which the object's inertia (units of mass) increased is exactly c². To show this quantitatively, suppose the origin of a system of inertial coordinates K0 is moving with speed u0 relative to another system of inertial coordinates K. If a particle P is moving with speed u (in the same direction as u0) with respect to the K0 coordinates, then the speed of the particle relative to the K coordinates is given by the velocity composition law
    v = (u + u0)/(1 + u u0)
Differentiating with respect to u gives
    dv/du = (1 − u0²)/(1 + u u0)²
Hence, at the instant when P is momentarily co-moving with the K0 coordinates, we have
    dv = (1 − u0²) du = (1 − v²) du
If we let τ and t denote the time coordinates of K0 and K respectively, then from the metric (dτ)² = (dt)² − (dx)² (in units with c = 1) and the fact that v² = (dx/dt)², it follows that the incremental lapse of proper time dτ along the worldline of P as it advances from t to t + dt is dτ = dt√(1 − v²), so we can divide the above expression by this quantity to give
    dv/dt = (1 − v²)^(3/2) du/dτ        i.e.        a = (1 − v²)^(3/2) a0
The quantity a = dv/dt is the acceleration of P with respect to the K coordinates, whereas a0 = du/dτ is the "rest acceleration" of P with respect to the K0 coordinates (relative to which it is momentarily at rest). Now, by symmetry, a force F exerted (along the axis of motion) between a particle at rest in K and the particle P at rest in K0 must be of equal and opposite magnitude with respect to both frames of reference. Also, by definition, a force of magnitude F applied to a particle of "rest mass" m0 will result in an acceleration a0 = F/m0 with respect to the reference frame in which the particle is momentarily at rest. Therefore, using the preceding relation between the accelerations with respect to the K0 and K coordinates, we have
    F = m0 a0 = [m0/(1 − v²)^(3/2)] a        (1)
The coefficient of "a" in this expression has sometimes been called the "longitudinal mass", because it represents the effective proportionality between force and acceleration along the direction of action. Now let us define two quantities, p(v) and e(v), which we will call the momentum and kinetic energy of a particle of mass m0 at any relative speed v. These quantities are defined respectively by the integrals of Fdt and Fds over an interval in which the particle is accelerated by a force F from rest to velocity v. The results of these integrations are independent of the pattern of acceleration, so we can assume constant acceleration "a" throughout the interval. Hence the integral of Fdt is evaluated from t = 0 to t = v/a, and since s = (1/2)at², the integral of Fds is evaluated from s = 0 to s = v²/(2a). In addition, we will define the inertial mass m of the particle as the
ratio p/v. Therefore, the inertial mass and the kinetic energy of the particle at any speed v are given by
    m = p/v = m0/√(1 − v²)        e = m0 [1/√(1 − v²) − 1]
If the force F were equal to m0a (as in Newtonian mechanics) these two quantities would equal m0 and (1/2)m0v² respectively. However, we've seen that consistency with relativistic kinematics requires the force to be given by equation (1). As a result, the inertial mass is given by m = m0/√(1 − v²), so it exceeds the rest mass whenever the particle has non-zero velocity. This increase in inertial mass is exactly proportional to the kinetic energy of the particle, as shown by
    m − m0 = m0 [1/√(1 − v²) − 1] = e        (in units with c = 1)
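This proportionality can also be confirmed by brute-force integration of the work, as in the following sketch (our own numerical check, in units with c = 1), which accelerates a particle from rest to v = 0.8 using the force law (1):

    import math

    m0, a, dt = 1.0, 1.0, 1e-5
    v = work = 0.0
    while v < 0.8:
        F = m0 * a / (1 - v*v)**1.5   # force required for acceleration a
        work += F * v * dt            # accumulate F ds, with ds = v dt
        v += a * dt
    print(work)                          # about 0.6667, the kinetic energy e
    print(m0 / math.sqrt(1 - v*v) - m0)  # about 0.6667, the mass increase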
The exact proportionality between the extra inertia and the extra energy of a moving particle naturally suggests that it is the energy itself which has contributed the inertia, and this in turn suggests that all of the particle's inertia (including its rest inertia m0) corresponds to some form of energy. This leads us to hypothesize a very general and important relation, E = mc², which signifies a fundamental equivalence between energy and inertial mass. From this we might imagine that all inertia is potentially convertible to energy, although it's worth noting that this does not follow rigorously from the principles of special relativity. It is just a hypothesis suggested by special relativity (as it is also suggested by Maxwell's equations). In 1905 the only experimental test that Einstein could imagine was to see if a lump of "radium salt" loses weight as it gives off radiation, but of course that would never be a complete test, because the radium doesn't decay down to nothing. The same is true with a nuclear bomb, i.e., it's really only the binding energy of the nucleus that is being converted, so it doesn't demonstrate an entire proton (for example) being converted into energy. However, today we can observe electrons and positrons annihilating each other completely, and yielding amounts of energy precisely in accord with the predictions of special relativity. Incidentally, the above derivation followed Newton in adopting the Third Law (at least for impulse interactions along the line of motion) as a fundamental postulate, on the basis of symmetry. From this the conservation of momentum can be deduced. However, most modern treatments of relativity proceed in the opposite direction, postulating the conservation of momentum and then deducing something like the Third Law. (There are complications when applying the Third Law to extended interactions, and to interactions in which the forces are not parallel to the direction of motion, due to aberration effects
and the ambiguity of simultaneity relations, but the preceding derivation was based solely on interactions that can be modeled as mutual contact events at single points, with the forces parallel to the direction of motion, in which case the Third Law is unproblematic.) The typical modern approach to relativistic mechanics is to begin by defining momentum as the product of rest mass and velocity. One formal motivation for this definition is that the resulting 3-vector is well-behaved under Lorentz transformations, in the sense that if this quantity is conserved with respect to one inertial frame, it is automatically conserved with respect to all inertial frames (which would not be true if we defined momentum in terms of, say, longitudinal mass). On a more fundamental level, this definition is motivated by the fact that it agrees with non-relativistic momentum in the limit of low velocities. The heuristic technique of deducing the appropriate observable parameters of a theory from the requirement that they match classical observables in the classical limit was used extensively in the early development of relativity, but apparently no one dignified the technique with a name until Bohr (characteristically) elevated it to the status of a "principle" in quantum mechanics, where it is known as the "Correspondence Principle". Based on this definition, the modern approach then simply postulates that momentum is conserved. Then we define relativistic force as the rate of change of momentum. This is Newton's Second Law, and it's motivated largely by the fact that this "force", together with conservation of momentum, implies Newton's Third Law (at least in the case of contact forces). However, from a purely relativistic standpoint, the definition of momentum as a 3-vector seems incomplete. Its three components are proportional to the derivatives of the three spatial coordinates x,y,z of the object with respect to the proper time τ of the object, but what about the coordinate time t? If we let xj, j = 0, 1, 2, 3 denote the coordinates t,x,y,z, then it seems natural to consider the 4-vector
    pj = m (dxj/dτ),   j = 0, 1, 2, 3
where m is the rest mass. Then define the relativistic force 4-vector as the proper rate of change of momentum, i.e.,
    fj = dpj/dτ = m (d²xj/dτ²)
Our correspondence principle easily enables us to identify the three components p1, p2, p3 as just our original momentum 3-vector, but now we have an additional component, p0, equal to m(dt/dτ). Let's call this component the "energy" E of the object. In full four-dimensional spacetime the coordinate time t is related to the object's proper time τ according to
    dt/dτ = 1/√(1 − [(dx/dt)² + (dy/dt)² + (dz/dt)²])
In geometric units (c = 1) the quantity in the square brackets is just v². Substituting back into our energy definition, we have
    E = p0 = m(dt/dτ) = m/√(1 − v²) = m + (1/2)mv² + (3/8)mv⁴ + …        (2)
The first term is simply m (or mc² in normal units), so we interpret this as the rest energy of the mass. This is sometimes presented as a derivation of mass-energy equivalence, but at best it's really just a suggestive heuristic device. The key step in this "derivation" was when we blithely decided to call p0 the "energy" of the object. Strictly speaking, we violated our "correspondence principle" by making this definition, because by correspondence with the low-velocity limit, the energy E of a particle should be something like (1/2)mv², and clearly p0 does not reduce to this in the low-speed limit. Nevertheless, we defined p0 as the "energy" E, and since that component equals m when v = 0, we essentially just defined our result E = m (or E = mc² in ordinary units) for a mass at rest. From this reasoning it isn't clear that this is anything more than a bookkeeping convention, one that could just as well be applied in classical mechanics using some arbitrary squared velocity to convert from units of mass to units of energy. The assertion of physical equivalence between inertial mass and energy has significance only if it is actually possible for the entire mass of an object, including its rest mass, to manifestly exhibit the qualities of energy. Lacking this, the only equivalence between inertial mass and energy that special relativity strictly entails is the "extra" inertia that bodies exhibit when they acquire kinetic energy. As mentioned above, even the fact that nuclear reactors give off huge amounts of energy does not really substantiate the complete equivalence of energy and inertial mass, because the energy given off in such reactions represents just the binding energy holding the nucleons (protons and neutrons) together. The binding energy is the amount of energy required to pull a nucleus apart. (The terminology is slightly inapt, because a configuration with high binding energy is actually a low energy configuration, and vice versa.) Of course, protons are all positively charged, so they repel each other by the Coulomb force, but at very small distances the strong nuclear force binds them together. Since each nucleon is attracted to every other nucleon, we might expect the total binding energy of a nucleus comprised of N nucleons to be proportional to N(N−1)/2, which would imply that the binding energy per nucleon would increase linearly with N. However, saturation effects cause the binding energy per nucleon to reach a maximum for nuclei with N ≈ 60 (e.g., iron), then to decrease slightly as N increases further. As a result, if an atom with (say) N = 230 is split into two atoms, each with N = 115, the total binding energy per nucleon is increased, which means the resulting configuration is in a lower energy state than the original configuration. In such circumstances, the two small atoms have slightly less total rest mass than the original large atom, but at the instant of the split the overall "mass-like" quality is conserved, because those two smaller atoms have enormous
velocities, precisely such that the total relativistic mass is conserved. (This physical conservation is the main reason the old concept of relativistic mass has never been completely discarded.) If we then slow down those two smaller atoms by absorbing their energy, we end up with two atoms at rest, at which point a little bit of apparent rest mass has disappeared from the universe. On the other hand, it is also possible to fuse two light nuclei (e.g., N = 2) together to give a larger atom with more binding energy, in which case the rest mass of the resulting atom is less than the combined rest masses of the two original atoms. In either case (fission or fusion), a net reduction in rest mass occurs, accompanied by the appearance of an equivalent amount of kinetic energy and radiation. (The actual detailed mechanism by which binding energy, originally a "rest property" with isotropic inertia, becomes a kinetic property representing what we may call relativistic mass with anisotropic inertia, is not well understood.) Another derivation of mass-energy equivalence is based on consideration of a bound "swarm" of particles, buzzing around with some average velocity. If the swarm is heated (i.e., energy E is added) the particles move faster and thereby gain both longitudinal and transverse mass, so the inertia of the individual particles is anisotropic, but since they are all buzzing around in random directions, the net effect on the stationary swarm (bound together by some unspecified means) is that its resistance to acceleration is isotropic, and its "rest mass" has effectively been increased by E/c². Of course, such a composite object still consists of elementary particles with some irreducible rest mass, so even this picture doesn't imply complete mass-energy equivalence. To get complete equivalence we need to imagine something like photons bound together in a swarm. Now, it may appear that equation (2) fails to account for the energy of light, because it gives E proportional to the rest mass m, which is zero for a photon. However, the denominator of (2) is also zero for a photon (because v = 1), so we need to evaluate the expression in the limit as m goes to zero and v goes to 1. We know from the study of electro-magnetic radiation that although a photon has no rest mass, it does (according to Maxwell's equations) have momentum, equal to |p| = E (or E/c in conventional units). This suggests that we try to isolate the momentum component from the rest mass component of the energy. To do this, we square equation (2) and expand the simple geometric series as follows
    E² = m²/(1 − v²) = m²(1 + v² + v⁴ + v⁶ + …)
Excluding the first term, which is purely rest mass, all the remaining terms are divisible by (mv)², so we can write this as
    E² = m² + (mv)²(1 + v² + v⁴ + …) = m² + (mv)²/(1 − v²)
The right-most term is simply the squared magnitude of the momentum, so we have the apparently fundamental relation
    E² = m² + |p|²        (3)
consistent with our premise that E (or E/c in conventional units) equals the magnitude of the momentum |p| for a photon. Of course, electromagnetic waves are classically regarded as linear, meaning that photons don't ordinarily interfere with each other (directly). As Dirac said, "each photon interferes only with itself... interference between two different photons never occurs". However, the non-linear field equations of general relativity enable photons to interact gravitationally with each other. Wheeler coined the word "geon" to denote a swarm of massless particles bound together by the gravitational field associated with their energy, although he noted that such a configuration would be inherently unstable, viz., it would very rapidly either dissipate or shrink into complete gravitational collapse. Also, it's not clear that any physically realistic situation would lead to such a configuration in the first place, since it would require concentrating an amount of electromagnetic energy equivalent to the mass m within a radius of about r = Gm/c². For example, to make a geon from the energy equivalent of one electron, it would be necessary to concentrate that energy within a radius of about 6.7 × 10⁻⁵⁸ meters. An interesting alternative approach to deducing (3) is based directly on the Minkowski metric
    (dτ)² = (dt)² − (dx)² − (dy)² − (dz)²
This is applicable both to massive timelike particles and to light. In the case of light we know that the proper time dτ and the rest mass m are both zero, but we may postulate that the ratio m/dτ remains meaningful even when m and dτ individually vanish. Multiplying both sides of the Minkowski line element by the square of this ratio gives immediately
    m² = (m dt/dτ)² − (m dx/dτ)² − (m dy/dτ)² − (m dz/dτ)²
The first term on the right side is E² and the remaining three terms are px², py², and pz², so this equation can be written as
    m² = E² − px² − py² − pz²
Hence this expression is nothing but the Minkowski spacetime metric multiplied through by (m/dτ)², as illustrated in the figure below.

The kinetic energy of the particle with rest mass m along the indicated worldline is represented in this figure by the portion of the total energy E in excess of the rest energy. Returning to the question of how mass and energy can be regarded as different expressions of the same thing, recall that the energy of a particle with rest mass m0 and speed V is m0/√(1 − V²). We can also determine the energy of a particle whose motion is defined as the composition of two orthogonal speeds. Let t,x,y,z denote the inertial coordinates of system S, and let T,X,Y,Z denote the (aligned) inertial coordinates of system S'. In S the particle is moving with speed vy in the positive y direction so its coordinates are
    x(t) = 0        y(t) = vy t        z(t) = 0
The Lorentz transformation for a coordinate system S' whose spatial origin is moving with the speed vx in the positive x (and X) direction with respect to system S is
    T = (t − vx x)/√(1 − vx²)        X = (x − vx t)/√(1 − vx²)        Y = y        Z = z
so the coordinates of the particle with respect to the S' system are
    T = t/√(1 − vx²)        X = −vx t/√(1 − vx²)        Y = vy t        Z = 0
The first of these equations implies t = T√(1 − vx²), so we can substitute for t in the expressions for X and Y to give
    X = −vx T        Y = vy √(1 − vx²) T
The total squared speed V² with respect to these coordinates is given by
    V² = (dX/dT)² + (dY/dT)² = vx² + vy²(1 − vx²)
Subtracting 1 from both sides and factoring the right hand side, this relativistic composition rule for orthogonal speeds vx and vy can be written in the form
    1 − V² = (1 − vx²)(1 − vy²)
It follows that the total energy (neglecting stress and other forms of potential energy) of a ring of matter with a rest mass m0 spinning with an intrinsic circumferential speed u and translating with a speed v in the axial direction is
    E = m0/√((1 − u²)(1 − v²))
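A numerical spot-check of this result (a short Python sketch, with arbitrarily chosen speeds and c = 1) confirms that composing the orthogonal speeds and then computing the energy agrees with the factored form:

    import math

    u, v = 0.5, 0.8                       # circumferential and axial speeds
    V = math.sqrt(v*v + u*u - (u*v)**2)   # composed orthogonal speed
    m0 = 1.0
    print(m0 / math.sqrt(1 - V*V))                 # energy from composed speed
    print(m0 / math.sqrt((1 - u*u)*(1 - v*v)))     # same value, factored form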
A similar argument applies to translatory motions of the ring in any direction, not just the axial direction. For example, consider motions in the plane of the ring, and focus on the contributions of two diametrically opposed particles (each of rest mass m0/2) on the ring, as illustrated below.

If the circumferential motion of the two particles happens to be perpendicular to the translatory motion of the ring, as shown in the left-hand figure, then the preceding formula for E is applicable, and represents the total energy of the two particles. If, on the other hand, the circumferential motion of the two particles is parallel to the motion of the ring's center, as shown in the right-hand figure, then the two particles have the speeds (v + u)/(1 + vu) and (v − u)/(1 − vu) respectively, so the combined total energy (i.e., the relativistic mass) of the two particles is given by the sum
    (m0/2)/√(1 − [(v + u)/(1 + vu)]²) + (m0/2)/√(1 − [(v − u)/(1 − vu)]²) = m0/√((1 − u²)(1 − v²))
Thus each pair of diametrically opposed particles with equal and opposite intrinsic motions parallel to the extrinsic translatory motion contribute the same total amount of energy as if their intrinsic motions were both perpendicular to the extrinsic motion. Every bound system of particles can be decomposed into pairs of particles with equal and opposite intrinsic motions, and these motions are either parallel or perpendicular or some combination relative to the extrinsic motion of the system, so the preceding analysis shows that the relativistic mass of the bound system of particles is isotropic, and the system behaves just like an object whose rest mass equals the sum of the intrinsic relativistic masses of the constituent particles. (Note again that we are not considering internal stresses and other kinds of potential energy.) This nicely illustrates how, if the spinning ring was mounted inside a box, we would simply regard the angular kinetic energy of the ring as part of the rest mass M0 of the box with speed v, i.e.,
    E = M0/√(1 − v²)        where M0 = m0/√(1 − u²)
where the "rest mass" of the box is now explicitly dependent on its energy content. This naturally leads to the idea that each original particle might also be regarded as a "box" whose contents are in an excited energy state via some kinetic mode (possibly rotational), and so the "rest mass" m0 of the particle is actually just the relativistic mass of a lesser amount of "true" rest mass, leading to an infinite regress, and the idea that perhaps all matter is really some form of energy. But does it really make sense to imagine that all the mass (i.e., inertial resistance) is really just energy, and that there is no irreducible rest mass at all? If there is no original kernel of irreducible matter, then what ultimately possesses the energy? To picture how an aggregate of massless energy can have non-zero rest mass, first consider two identical massive particles connected by a massless spring, as illustrated below.

Suppose these particles are oscillating in a simple harmonic motion about their common center of mass, alternately expanding and compressing the spring. The total energy of the system is conserved, but part of the energy oscillates between kinetic energy of the moving particles and potential (stress) energy of the spring. At the point in the cycle when the spring has no tension, the speed of the particles (relative to their common center of mass) is a maximum. At this point the particles have equal and opposite speeds +u and −u, and we've seen that the combined rest mass of this configuration (corresponding to the amount of energy required to accelerate it to a given speed v) is m0/√(1 − u²). At other points in the cycle, the particles are at rest with respect to their common center of mass,
but the total amount of energy in the system with respect to any given inertial frame is constant, so the effective rest mass of the configuration is constant over the entire cycle. Since the combined rest mass of the two particles themselves (at this point in the cycle) is just m0, the additional rest mass to bring the total configuration up to m0/√(1 − u²) must be contributed by the stress energy stored in the "massless" spring. This is one example of a massless entity acquiring rest mass by virtue of its stored energy. Recall that the energy-momentum vector of a particle is defined as [E, px, py, pz] where E is the total energy and px, py, pz are the components of the momentum, all with respect to some fixed system of inertial coordinates t,x,y,z. The rest mass m0 of the particle is then defined as the Minkowskian "norm" of the energy-momentum vector, i.e.,
    m0 = √(E² − px² − py² − pz²)
If the particle has rest mass m0, then the components of its energy-momentum vector are
    [E, px, py, pz] = m0 [dt/dτ, dx/dτ, dy/dτ, dz/dτ]
If the object is moving with speed u, then dt/dτ = γ = 1/√(1 − u²), so the energy component is equal to the transverse relativistic mass. The rest mass of a configuration of arbitrarily moving particles is simply the norm of the sum of their individual energy-momentum vectors. The energy-momentum vectors of two particles with individual rest masses m0 moving with speeds dx/dt = u and dx/dt = −u are [γm0, γm0u, 0, 0] and [γm0, −γm0u, 0, 0], so the sum is [2γm0, 0, 0, 0], which has the norm 2γm0. This is consistent with the previous result, i.e., the rest mass of two particles in equal and opposite motion about the center of the configuration is simply the sum of their (transverse) relativistic masses, i.e., the sum of their energies. A photon has no rest mass, which implies that the Minkowskian norm of its energy-momentum vector is zero. However, it does not follow that the components of its energy-momentum vector are all zero, because the Minkowskian norm is not positive-definite. For a photon we have E² − px² − py² − pz² = 0 (where E = hν), so the energy-momentum vectors of two photons, one moving in the positive x direction and the other moving in the negative x direction, are of the form [E, E, 0, 0] and [E, −E, 0, 0] respectively. The Minkowski norms of each of these vectors individually are zero, but the sum of these two vectors is [2E, 0, 0, 0], which has a Minkowski norm of 2E. This shows that the rest mass of two identical photons moving in opposite directions is m0 = 2E = 2hν, even though the individual photons have no rest mass. If we could imagine a means of binding the two photons together, like the two particles attached to the massless spring, then we could conceive of a bound system with positive rest mass whose constituents have no rest mass. As mentioned previously, in normal circumstances photons do not interact with each other (i.e., they can be superimposed without affecting each other), but we can, in principle, imagine photons bound together
by the gravitational field of their energy (geons). The ability of electrons and antielectrons (positrons) to completely annihilate each other in a release of energy suggests that these actual massive particles are also, in some sense, bound states of pure energy, but the mechanisms or processes that hold an electron together, and that determine its characteristic mass, charge, etc., are not known. It's worth noting that the definition of "rest mass" is somewhat context-dependent when applied to complex accelerating configurations of entities, because the momentum of such entities depends on the space and time scales on which they are evaluated. For example, we may ask whether the rest mass of a spinning disk should include the kinetic energy associated with its spin. For another example, if the Earth is considered over just a small portion of its orbit around the Sun, we can say that it has linear momentum (with respect to the Sun's inertial rest frame), so the energy of its circumferential motion is excluded from the definition of its rest mass. However, if the Earth is considered as a bound particle during many complete orbits around the Sun, it has no net momentum with respect to the Sun's frame, and in this context the Earth's orbital kinetic energy is included in its "rest mass". Similarly the atoms comprising a "stationary" block of lead are not microscopically stationary, but in the aggregate, averaged over the characteristic time scale of the mean free oscillation time of the atoms, the block is stationary, and is treated as such. The temperature of the lead actually represents changes in the states of motion of the constituent particles, but over a suitable length of time the particles are still stationary. We can continue to smaller scales, down to sub-atomic particles comprising individual atoms, and we find that the position and momentum of a particle cannot even be precisely stipulated simultaneously. In each case we must choose a context in order to apply the definition of rest mass. Physical entities possess multiple modes of excitation (kinetic energy), and some of these modes we may choose (or be forced) to absorb into the definition of the object's "rest mass", because they do not vanish with respect to any inertial reference frame, whereas other modes we may choose (and be able) to exclude from the "rest mass". In order to assess the momentum of complex physical entities in various states of excitation, we must first decide how finely to decompose the entities, and the time intervals over which to make the assessment. The "rest mass" of an entity invariably includes some of what would be called energy or "relativistic mass" if we were working on a lower level of detail.

2.4 Doppler Shift for Sound and Light

I was much further out than you thought
And not waving but drowning.
Stevie Smith, 1957

For historical reasons, some older text books present two different versions of the
Doppler shift equations, one for acoustic phenomena based on traditional Newtonian kinematics, and another for optical and electromagnetic phenomena based on relativistic kinematics. This sometimes gives the impression that relativity requires us to apply a different set of kinematical rules to the propagation of sound than to the propagation of light, but of course that is not the case. The kinematics of relativity apply uniformly to the propagation of all kinds of signals, provided we give the exact formulae. The traditional acoustic formulas are inexact, tacitly based on Newtonian approximations, but when they are expressed exactly we find that they are perfectly consistent with the relativistic formulas. Consider a frame of reference in which the medium of signal propagation is assumed to be at rest, and suppose an emitter and absorber are located on the x axis, with the emitter moving to the left at a speed of ve and the absorber moving to the right, directly away from the emitter, at a speed of va. Let cs denote the speed at which the signal propagates with respect to the medium. Then, according to the classical (non-relativistic) treatment, the Doppler frequency shift is
    νa/νe = (cs − va)/(cs + ve)
(It's assumed here that va and ve are less than cs, because otherwise there may be shock waves and/or lack of communication between transmitter and receiver, in which case the Doppler effect does not apply.) The above formula is often quoted as the Doppler effect for sound, and then another formula is given for light, suggesting that relativity arbitrarily treats sound and light signals differently. In truth, relativity has just a single formula for the Doppler shift, which applies equally to both sound and light. This formula can basically be read directly off the spacetime diagram shown below

If an emitter on worldline OA turns a signal ON at event O and OFF at event A, the proper duration of the signal is the magnitude of OA, and if the signal propagates with
the speed of the worldline AB, then the proper duration of the pulse for a receiver on OB will equal the magnitude of OB. Thus we have
    |OA| = √(tA² − xA²) = tA √(1 − ve²)
and
    |OB| = √(tB² − xB²) = tB √(1 − va²)
Substituting xA = −ve tA and xB = va tB into the equation for cs and re-arranging terms gives
    tB(cs − va) = tA(cs + ve)
from which we get
    tA/tB = (cs − va)/(cs + ve)
Substituting this into the ratio of |OA| / |OB| gives the ratio of proper times for the signal, which is the inverse of the ratio of frequencies:
    νa/νe = |OA|/|OB| = [(cs − va)/(cs + ve)] √[(1 − (ve/c)²)/(1 − (va/c)²)]
Now, if va and ve are both small compared to c, it's clear that the relativistic correction factor (the square root quantity) will be indistinguishable from unity, and we can simply use the leading factor, which is the classical Doppler formula for both sound and light. However, if va and/or ve are fairly large (i.e., on the same order as c) we can't neglect the relativistic correction. It may seem surprising that the formula for sound waves in a fixed medium with absolute speeds for the emitter and absorber is also applicable to light, but notice that as the signal propagation speed cs goes to c, the above Doppler formula smoothly evolves into
    νa/νe = √[((1 − va/c)(1 − ve/c))/((1 + va/c)(1 + ve/c))]
which is very nice, because we immediately recognize the quantity inside the square root as the multiplicative form of the relativistic composition law for velocities (discussed in section 1.8). In other words, letting u denote the composition of the speeds va and ve
given by the formula
    u = (va + ve)/(1 + vave/c²)
it follows that
    (1 − u/c)/(1 + u/c) = ((1 − va/c)(1 − ve/c))/((1 + va/c)(1 + ve/c))
Consequently, as cs increases to c, the absolute speeds ve and va of the emitter and absorber relative to the fixed medium merge into a single relative speed u between the emitter and absorber, independent of any reference to a fixed medium, and we arrive at the relativistic Doppler formula for waves propagating at c for an emitter and absorber with a relative velocity of u:
    νa/νe = √[(1 − u/c)/(1 + u/c)] = √[(c − u)/(c + u)]
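Since it is easy to misread these formulas, it may be worth confirming numerically that the exact acoustic formula really does collapse into the relativistic light formula as cs approaches c. The following is a minimal sketch in Python (the function name and the chosen speeds are ours, purely for illustration):

    import math

    def doppler_ratio(ve, va, cs, c=1.0):
        # exact frequency ratio nu_a/nu_e derived above: the classical factor
        # for a signal of speed cs in the medium, with the emitter receding at
        # ve and the absorber receding at va, times the relativistic correction
        classical = (cs - va)/(cs + ve)
        correction = math.sqrt((1 - (ve/c)**2)/(1 - (va/c)**2))
        return classical*correction

    ve, va = 0.3, 0.5
    u = (va + ve)/(1 + va*ve)             # relativistic composition of the speeds
    print(doppler_ratio(ve, va, cs=1.0))  # signal propagating at c: 0.42374...
    print(math.sqrt((1 - u)/(1 + u)))     # Doppler for relative speed u: same value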
To clarify the relation between the classical and relativistic Doppler shift equations, recall that for a classical treatment of a wave with characteristic speed cs in a material medium the Doppler frequency shift depends on whether the emitter or the absorber is moving relative to the fixed medium. If the absorber is stationary and the emitter is receding at a speed of v (normalized so cs = 1), then the frequency shift is given by
    νa/νe = 1/(1 + v)
whereas if the emitter is stationary and the absorber is receding the frequency shift is
    νa/νe = 1 − v
To the first order these are the same, but they obviously differ significantly if v is close to 1. In contrast, the relativistic Doppler shift for light, with cs = c, does not distinguish between emitter and absorber motion, but simply predicts a frequency shift equal to the geometric mean of the two classical formulas, i.e.,
    νa/νe = √[(1 − v)/(1 + v)]
Naturally to first order this is the same as the classical Doppler formulas, but it differs from both of them in the second order, so we should be able to check for this difference,
provided we can arrange for emitters and/or absorbers to be moving with significant speeds. The Doppler effect has in fact been tested at speeds high enough to distinguish between these two formulas. The possibility of such a test, based on observing the Doppler shift for "canal rays" emitted from high-speed ions, had been considered by Stark in 1906, and Einstein published a short paper in 1907 deriving the relativistic prediction for such an experiment. However, it wasn't until 1938 that the experiment was actually performed with enough precision to discern the second order effect. In that year, Ives and Stilwell shot hydrogen atoms down a tube, with velocities (relative to the lab) ranging from about 0.8×10⁶ to 1.3×10⁶ m/sec. As the hydrogen atoms were in flight they emitted light in all directions. Looking into the end of the tube (with the atoms coming toward them), Ives and Stilwell measured a prominent characteristic spectral line in the light coming forward from the hydrogen. This characteristic frequency ν was Doppler shifted toward the blue by some amount dνapproach because the source was approaching them. They also placed a mirror at the opposite end of the tube, behind the hydrogen atoms, so they could look at the same light from behind, i.e., as the source was effectively moving away from them, red-shifted by some amount dνrecede. The following is a table of results from the original 1938 experiment for four different velocities of the hydrogen atom:

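To get a feeling for the size of the second-order effect involved in such measurements, the following rough sketch (in Python) computes the relativistically predicted wavelengths for approach and recession, and the displacement of their mean, which would vanish classically. The numbers are merely illustrative, chosen near the conditions of the experiment (the Hβ line of hydrogen at roughly 4861 angstroms), not taken from the table:

    import math

    c = 2.998e8
    v = 1.3e6/c                  # speed as a fraction of c, near the top of their range
    lam0 = 4861.0                # rest wavelength in angstroms (hydrogen H-beta line)

    lam_blue = lam0*math.sqrt((1 - v)/(1 + v))   # source approaching
    lam_red  = lam0*math.sqrt((1 + v)/(1 - v))   # source receding (seen via the mirror)
    mean_shift = (lam_blue + lam_red)/2 - lam0   # zero according to the classical formulas
    print(mean_shift)                            # about 0.046 angstrom
    print(lam0*v*v/2)                            # second-order estimate, lam0*(gamma - 1)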
Ironically, although the results of their experiment brilliantly confirmed Einstein's prediction based on the special theory of relativity, Ives and Stilwell were not advocates of relativity, and in fact gave a completely different theoretical model to account for their experimental results and the deviation from the classical prediction. This illustrates the fact that the results of an experiment can never uniquely identify the explanation. They can only split the range of available models into two groups, those that are consistent with the results and those that aren't. In this case it's clear that any model yielding the classical prediction is ruled out, while the Lorentz/Einstein model is found to be consistent with the observed results. All the above was based on the assumption that the emitter and absorber are moving relative to each other directly along their "line of sight". More generally, we can give the Doppler shift for the case when the (inertial) motions of the emitter and absorber are at any specified angles relative to the "line of sight". Without loss of generality we can assume the absorber is stationary at the origin of inertial coordinates and the emitter is moving at a speed v and at an angle ϕ relative to the direct line of sight, as illustrated below.

For two pulses of light emitted at coordinate times differing by Δte, arrival times at the receiver will differ by Δta = (1 − vr)Δte where vr = v cos(ϕ) is the radial component of the emitter's velocity. Also, the proper time interval along the emitter's worldline between the two emissions is Δτe = Δte√(1 − v²). Therefore, since the frequency of the transmissions with respect to the emitter's rest frame is proportional to 1/Δτe, and the frequency of receptions with respect to the absorber's rest frame is proportional to 1/Δta, the full frequency shift is
    νa/νe = √(1 − v²)/(1 − v cos(ϕ))
This differs in appearance from the Doppler shift equation given in Einstein’s 1905 paper, but only because, in Einstein’s equation, the angle ϕ is evaluated with respect to the emitter’s rest frame, whereas in our equation the angle is evaluated with respect to the absorber’s rest frame. These two angles differ because of the effect of aberration. If we let ϕ' denote the angle with respect to the emitter's rest frame, then ϕ' is related to ϕ by the aberration equation
    cos(ϕ) = (cos(ϕ') + v)/(1 + v cos(ϕ'))
(See Section 2.5 for a derivation of this expression.) Substituting for cos(ϕ) into the previous equation gives Einstein’s equation for the Doppler shift, i.e.,
    νa/νe = (1 + v cos(ϕ'))/√(1 − v²)
Naturally for the "linear" cases, when ϕ = ϕ' = 0 or ϕ = ϕ' = π we have
    νa/νe = √[(1 + v)/(1 − v)]   and   νa/νe = √[(1 − v)/(1 + v)],  respectively.
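These limiting cases, along with the purely transverse case ϕ = π/2 (in which the shift reduces to the bare time-dilation factor), are easy to check numerically. A minimal sketch, with an arbitrarily chosen speed:

    import math

    def freq_ratio(v, phi):
        # nu_a/nu_e for an emitter moving at speed v (units c = 1), where phi
        # is the angle from the line of sight in the absorber's frame, with
        # phi = 0 signifying direct approach
        return math.sqrt(1 - v*v)/(1 - v*math.cos(phi))

    v = 0.6
    print(freq_ratio(v, 0.0),     math.sqrt((1 + v)/(1 - v)))   # 2.0  2.0
    print(freq_ratio(v, math.pi), math.sqrt((1 - v)/(1 + v)))   # 0.5  0.5
    print(freq_ratio(v, math.pi/2))                             # 0.8 = sqrt(1 - v^2)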
This highlights the symmetry between emitter and absorber that is so characteristic of relativistic physics. Even more generally, consider an emitter moving with constant velocity u, an absorber moving with constant velocity v, and a signal propagating with velocity C in terms of an inertial coordinate system in which the signal's speed |C| is independent of direction. This would apply to a system of coordinates at rest with respect to the medium of the signal, and it would apply to any inertial coordinate system if the signal is light in a vacuum. It would also apply to the case of a signal emitted at a fixed speed relative to the emitter, but only if we take u = 0, because in this case the speed of the signal is independent of direction only in terms of the rest frame of the emitter. We immediately have the relation
    (ra − re)·(ra − re) = |C|²(ta − te)²
where re and ra are the position vectors of the emission and absorption events at the times te and ta respectively. Differentiating both sides with respect to ta and dividing through by 2(ta − te), and noting that (ra − re)/(ta − te) = C, we get
    C·v − (C·u)(dte/dta) = |C|²[1 − (dte/dta)]
where u and v are the velocity vectors of the emitter and absorber respectively. Solving for the ratio dte/dta, we arrive at the relation
    dte/dta = (|C|² − C·v)/(|C|² − C·u)
Making use of the dot product identity r·s = |r||s|cos(θr,s) where θr,s is the angle between the r and s vectors, these can be re-written as
    dte/dta = (|C| − |v|cos(θC,v))/(|C| − |u|cos(θC,u))
The frequency of any process is inversely proportional to the duration of the period, so the frequency at the absorber relative to the emitter, projected by means of the signal, is given by νa/νe = dte/dta. Therefore, the above expressions represent the classical Doppler effect for arbitrarily moving emitter and receiver. However, the elapsed proper time along
a worldline moving with speed v in terms of any given inertial coordinate system differs from the elapsed coordinate time by the factor
    √(1 − (v/c)²)
where c is the speed of light in vacuum. Consequently, the actual ratio of proper times – and therefore proper frequencies – for the emitter and absorber is
    νa/νe = [(|C| − |v|cos(θC,v))/(|C| − |u|cos(θC,u))] √[(1 − (u/c)²)/(1 − (v/c)²)]
The leading ratio is the classical Doppler effect, and the square root factor is the relativistic correction.
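The general expression can be exercised directly with explicit velocity vectors. The sketch below (our own function names, with arbitrarily chosen velocities) reproduces the simple receding-absorber case as a sanity check:

    import math

    def dot(a, b):
        return sum(x*y for x, y in zip(a, b))

    def freq_ratio(u, v, C, c=1.0):
        # nu_a/nu_e for emitter velocity u, absorber velocity v, and signal
        # velocity vector C, in coordinates where the signal speed |C| is the
        # same in all directions: the classical factor derived above times
        # the relativistic correction
        classical = (dot(C, C) - dot(C, v))/(dot(C, C) - dot(C, u))
        correction = math.sqrt((1 - dot(u, u)/c**2)/(1 - dot(v, v)/c**2))
        return classical*correction

    # light along +x, emitter at rest, absorber receding at half the speed of light
    print(freq_ratio(u=(0, 0, 0), v=(0.5, 0, 0), C=(1, 0, 0)))
    print(math.sqrt((1 - 0.5)/(1 + 0.5)))    # same value, 0.57735...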

2.5 Stellar Aberration

It was chiefly therefore Curiosity that tempted me (being then at Kew, where the Instrument was fixed) to prepare for observing the Star on December 17th, when having adjusted the Instrument as usual, I perceived that it passed a little more Southerly this Day than when it was observed before.
James Bradley, 1727

The aberration of starlight was discovered in 1727 by the astronomer James Bradley while he was searching for evidence of stellar parallax, which in principle ought to be observable if the Copernican theory of the solar system is correct. He succeeded in detecting an annual variation in the apparent positions of stars, but the variation was not consistent with parallax. The observed displacement was greatest for stars in the direction perpendicular to the orbital plane of the Earth, and most puzzling was the fact that the displacement was exactly three months (i.e., 90 degrees) out of phase with the effect that would result from parallax due to the annual change in the Earth's position in orbit around the Sun. It was as if he was expecting a sine function, but found instead a cosine function. Now, the cosine is the derivative of the sine, so this suggests that the effect he was seeing was not due to changes in the earth's position, but to changes in the Earth's (directional) velocity. Indeed Bradley was able to interpret the observed shift in the incident angle of starlight relative to the Earth's frame of reference as being due to the transverse velocity of the Earth relative to the incoming corpuscles of light, assuming the latter to be moving with a finite speed c. The velocity of the corpuscles relative to the Earth equals their velocity vector c with respect to the Sun's frame of reference plus the
negative of the orbital velocity vector v of the Earth, as shown below.

In this figure, θ1 is the apparent elevation of a star above the Earth’s orbital plane when the Earth’s velocity is most directly toward the star (say, in January), and θ2 is the apparent elevation six months later when the Earth’s velocity is in the opposite direction. The law of sines gives
    sin(α1)/v = sin(θ1)/c,   sin(α2)/v = sin(θ2)/c
Since the aberration angles α are quite small, we can closely approximate sin(α) with just α. Therefore, the apparent position of a star that is roughly θ above the ecliptic ought to describe a small circle (or ellipse) around its true position, and the "radius" of this path should be sin(θ)(v/c) where v is the Earth's orbital speed and c is the speed of light. When Bradley made his discovery he was examining the star γ Draconis, which has a declination of about 51.5 degrees above the Earth's equatorial plane, and about 75 degrees above the ecliptic plane. Incidentally, most historical accounts say Bradley chose this star simply because it passes directly overhead in Greenwich England, the site of his observatory, which happens to be at about 51.5 degrees latitude. Vertical observations minimize the effects of atmospheric refraction, but surely this is an incomplete explanation for choosing γ Draconis, because stars with this same declination range from 28 to 75 degrees above the ecliptic, due to the Earth's tilt of 23.5 degrees. Was it just a lucky coincidence that he chose (as Leibniz had previously) γ Draconis, a star with the maximum possible elevation above the ecliptic among stars that pass directly over Greenwich? Accidental or not, he focused on nearly the ideal star for detecting aberration. The orbital speed of the Earth is roughly v = 2.98×10⁴ m/sec, and the speed of light is c = 3.0×10⁸ m/sec, so the magnitude of the aberration for γ Draconis is (v/c)sin(75 deg) = 9.59×10⁻⁵ radians = 19.8 seconds of arc. Bradley subsequently confirmed the expected aberration for stars at other declinations.
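The arithmetic of the preceding paragraph can be reproduced in a few lines (a sketch, using the rounded values quoted above):

    import math

    v = 2.98e4                  # Earth's orbital speed, m/sec
    c = 3.0e8                   # speed of light, m/sec
    theta = math.radians(75.0)  # elevation of gamma Draconis above the ecliptic
    alpha = (v/c)*math.sin(theta)        # small-angle aberration "radius", radians
    print(math.degrees(alpha)*3600.0)    # about 19.8 seconds of arc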

Ironically, although it was not the effect Bradley had been seeking, the existence of stellar aberration was, after all, conclusive observational proof of the Earth's motion, and hence of the Copernican theory, which had been his underlying objective. Furthermore, the discovery of stellar aberration not only provided the first empirical proof of the Copernican theory, it also furnished a new and independent proof of the finite speed of light, and even enabled that speed to be estimated from knowledge of the orbital speed of the Earth. The result was consistent with the earlier estimate of the speed of light by Roemer based on observations of Jupiter's moons (see Section 3.3). Bradley's interpretation, based on the Newtonian corpuscular concept of light, accounted quite well for the basic phenomenon of stellar aberration. However, if light consists of ballistic corpuscles their speeds ought to depend on the relative motion between the source and observer, and these differences in speed ought to be detectable, whereas no such differences were found. For example, early in the 19th century Arago compared the focal length of light from a particular star at six-month intervals, when the Earth's motion should alternately add to and subtract from the speed of the incoming light a component equal to the Earth's orbital speed. According to the corpuscle theory, this should result in a slightly different focal length through the system of lenses, but Arago observed no difference at all. In another experiment he viewed the aberration of starlight through a normal lens and through a thick prism with a very different index of refraction, which ought to give a slightly different aberration angle according to the Newtonian corpuscular model, but he found no difference. Both these experiments suggest that the speed of light is independent of the motion of the source, so they tended to support the wave theory of light, rather than the corpuscular theory. Unfortunately, the phenomenon of stellar aberration is somewhat problematic for theories that regard electromagnetic radiation as waves propagating in a luminiferous ether. It's worthwhile to examine the situation in some detail, because it is a nice illustration of the clash between mechanical and electromagnetic phenomena within the context of Galilean relativity. If we conceive of the light emanating from a distant star reaching the Earth's location as a set of essentially parallel streams of particles normal to the Earth's orbit (as Bradley did), then we have the situation shown in the left-hand figure below, and if we apply the Galilean transformation to a system of coordinates moving with the Earth (in the positive x direction) we get the situation shown in the right-hand figure.

According to this model the aberration arises because each corpuscle has equations of
motion of the form y = -ct and x = x0, so the Galilean transformation x = x'+vt, y = y', t = t' leads to y' = -ct' and x'+vt = x0, which gives (after eliminating t) the path x' – v(y'/c) = x0. Thus we have dx'/dy' = v/c = tan(α). In contrast, if we conceive of the light as essentially a plane wave, the sequence of wave crests is as shown below.

In this case each wavecrest has the equation y = -ct, with no x specification, because the wave is uniform over the entire wavefront. Applying the same Galilean transformation as before, we get simply y' = -ct', so the plane wave looks the same in terms of both systems of coordinates. We might try to argue that the flow of energy follows definite streamlines, and if these streamlines are vertical with respect to the unprimed coordinates they would transform into slanted streamlines in the primed coordinates, but this would imply that the direction of propagation of the wave energy is not exactly normal to the wave fronts, in conflict with Maxwell's equations. This highlights the incompatibility between Maxwell's equations and Galilean relativity, because if we regard the primed coordinates as stationary and the distant star as moving transversely with speed –v, then the waves reaching the Earth at this moment should have the same form as if they were emitted from the star when it was to the right of its current position, and therefore the wave fronts ought to be slanted by an angle of v/c. Of course, we do actually observe aberration of this amount, so the wave fronts really must be tilted with respect to the primed coordinates, and we can fairly easily explain this in terms of the wave model, but the explanation leads to a new complication. According to the early 19th century wave model with a stationary ether, an observation of a distant star consists of focusing a set of parallel rays from that star down to a point, and this necessarily involves some propagation of light in the transverse direction (in order to bring the incoming rays together). Taking the focal point to be midway between two rays, and assuming the light propagates transversely at the same speed in both directions, we will align our optical device normal to the plane wave fronts. However, suppose the effective speed of light is slightly different in the two transverse directions. If that were the case, we would need to tilt our optical device, and this would introduce a time skew in our evaluation of the wave front, because our optical image would associate rays from different points on the wave front at slightly different times. As a result, what we regard as the wave front would actually be slanted. The proponents of the wave model argued that the speed of light is indeed different in the two transverse directions relative to a
telescope on the Earth pointed up at a star, because the Earth is moving sideways (through the ether) with respect to the incoming rays. Assuming light always propagates at the fixed speed c relative to the ether, and assuming the Earth is moving at a speed v relative to the ether, we could argue that the transverse speed of light inside our telescope is c + v in one direction and c − v in the other. To assess the effect of this asymmetry, consider for simplicity just two mirror elements of a reflecting telescope, focusing incoming rays as illustrated below.

The two incoming rays shown in this figure are from the same wavecrest, but they are not brought into focus at the midpoint of the telescope, due to the (putative) fact that the telescope is moving sideways through the ether with a speed v. Both pulses strike the mirrors at the same time, but the left hand pulse goes a distance proportional to c + v in the time it takes the right hand pulse to go a distance proportional to c − v. In order to bring the wave crest into focus, we need to increase the path length of the left hand ray by a distance proportional to v, and decrease the right hand path length by the same distance. This is done by tilting the telescope through a small angle whose tangent is roughly v/c, as shown below.

Thus the apparent optical wavefront is tilted by an angle θ given by tan(θ) = v/c, which is the same as the aberration angle for the rays, and also in agreement with the corpuscle model. However, this simple explanation assumes a total vacuum, and it raises questions about what would happen if the telescope was filled with some material medium such as air or water. It was already accepted in Fresnel’s day, for both the wave and the corpuscle models of light, that light propagates more slowly in a dense medium than in vacuum. Specifically, the speed of light in a medium with index of refraction n is c/n. Hence if we fill our reflecting telescope with such a medium, then the speed of light in the two transverse directions would be c/n + v and c/n – v, and the above analysis would lead us to expect an aberration angle given by tan(θ) = nv/c. The index of refraction of air is just 1.0003, so this doesn’t significantly affect the observed aberration angle for telescopes in air. However, the index of refraction of water is 1.33, so if we fill a telescope with water,
we ought to observe (according to this theory) significantly more stellar aberration. Such experiments have actually been carried out, but no effect on the aberration angle is observed. In 1818 Fresnel suggested a way around this problem. His hypothesis, which he admitted appeared extraordinary at first sight, was that although the luminiferous ether through which light propagates is nearly immobile, it is dragged along slightly by material objects, and the higher the refractive index of the object, the more it drags the ether along with its motion. If an object with refractive index n moves with speed v relative to the nominal rest frame of the ether, Fresnel hypothesized that the ether inside the object is dragged forward at a speed (1 − 1/n²)v. Thus for objects with n = 1 there is no dragging at all, but for n greater than 1 the ether is pulled along slightly. Fresnel gave a plausibility argument based on the relation between density and refractivity, making his hypothesis seem at least slightly less contrived, although it was soon pointed out that since the index of refraction of a given medium varies with frequency, Fresnel's model evidently requires a different ether for each frequency. Neglecting this second-order effect of chromatic dispersion, Fresnel was able on the basis of his partial dragging hypothesis to account for the absence of any change in stellar aberration for different media. He pointed out that, in the above analysis, the speed of light in the two directions has the values
    c/n + v/n²   and   c/n − v/n²
For the vacuum we have n = 1, and these expressions are the same as before. In the presence of a material medium with n greater than 1, the optical device must now be tilted through an angle whose tangent is approximately
    tan(θ) ≈ (v/n²)/(c/n) = v/(nc)
It might seem as if Fresnel’s hypothesis has simply resulted in exchanging one problem for another, but recall that our telescope is aligned normal to the apparent wave front, whereas it is at an angle of v/c to the normal of the actual wave front, so the wave will be refracted slightly (assuming n is not equal to 1). According to Snell’s law (which for small angles is n1θ1 = n2θ2), the refracted angle will be less than the incident angle by the factor 1/n. Hence we must orient our telescope at an angle of v/c in order for the rays within the medium to be at the required angle. This is how, on the basis of somewhat adventuresome hypotheses and assumptions, physicists of the 19th century were able to account for stellar aberration on the basis of the wave model of light. (Accommodating the lack of effect of differing indices of refraction proved to be even more challenging for the corpuscular model.) Fresnel’s remarkable hypothesis was directly confirmed (many years later) by Fizeau, and it is now recognized as a first-order approximation of the relativistic velocity addition law, composing the speed of light in a medium with the speed of the medium
    (c/n + v)/(1 + v/(nc)) ≈ c/n + (1 − 1/n²)v
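A quick numerical comparison (a sketch, with an arbitrary speed for the medium) shows how closely Fresnel's coefficient tracks the exact relativistic composition; the residual is of the second order in v/c:

    n = 1.33             # index of refraction of water
    v = 1.0e4            # speed of the medium, m/sec (arbitrary)
    c = 2.998e8

    exact = (c/n + v)/(1 + v/(n*c))   # relativistic composition of c/n with v
    fresnel = c/n + (1 - 1/n**2)*v    # Fresnel's partial-dragging value
    print(exact - fresnel)            # about -0.1 m/sec here, i.e., second order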
It’s worth noting that all the “speeds” discussed here are phase speeds, corresponding to the time parameter for a given wave. Lorentz later showed that Fresnel’s formula could also be interpreted in the context of a perfectly immobile ether along with the assumption of phase shifts in the incoming wave fronts so that the effective time parameter transformation was not the Galilean t’ = t but rather t’ = t – vx/c2. Despite the success of Fresnel’s hypothesis in matching all optical observations to the first order in v/c, many physicists considered his partially dragged ether model to be ad hoc and unphysical (especially the apparent need for a different ether for each frequency of light), so they sought other explanations for stellar aberration that would be consistent with a more mechanistically realistic wave model. As an alternative to Fresnel’s hypothesis, Lorentz evaluated a proposal of Stokes, who in 1846 had suggested that the ether is totally dragged along by material bodies (so the ether is co-moving with the body at the body’s surface), and is irrotational, incompressible, and inviscid, so that it supports a velocity potential. Under these assumptions it can be shown that the normal of a light wave incident on the Earth undergoes a total deflection during its approach such that (to first order) the apparent shift in the star’s position agrees with observation. Unfortunately, as Lorentz pointed out, the assumptions of Stokes’ theory are mutually contradictory, because the potential flow field around a sphere does not give zero velocity on the sphere’s surface. Instead, the velocity of the ether wind on the Earth’s surface would vary with position, and so too would the aberration of starlight. Planck suggested a way around this objection by supposing the luminiferous ether was compressible, and accumulated with greatly increased density around large objects. Lorentz admitted that this was conceivable, but only if we also assume the speed of light propagating through the ether is unaffected by the changes in density of the ether, an assumption that plainly contradicts the behavior of wave propagation in ordinary substances. He concluded In this branch of physics, in which we can make no progress without some hypothesis that looks somewhat startling at first sight, we must be careful not to rashly reject a new idea… yet I dare say that this assumption of an enormously condensed ether, combined, as it must be, with the hypothesis that the velocity of light is not in the least altered by it, is not very satisfactory. With the failure of Stoke’s theory, the only known way of reconciling stellar aberration with a wave theory of light was Fresnel’s “extraordinary” hypothesis of partial dragging, or Lorentz’s equivalent interpretation in terms of the effective phase time parameter t’. However, the Fresnel-Lorentz theory predicted a non-null result for the MichelsonMorley experiment, which was the first experiment accurate to the second order in v/c. To remedy this, Lorentz ultimately incorporated Fitzgerald’s length contraction into his theory, which amounts to replacing the Galilean transformation x’ = x  vt with the
relation x’ = (x – vt)/ (1 – (v/c)2)1/2, and then for consistency applying this same secondorder correction to the time transformation, giving t’ = (t – vx/c2)/(1 – (v/c)2)1/2, thereby arriving at the full Lorentz transformation. By this point the posited luminiferous ether had lost all of its mechanistic properties. Meanwhile, Einstein's 1905 paper on the electrodynamics of moving bodies included a greatly simplified derivation of the full Lorentz transformation, dispensing with the ether altogether, and analyzing a variety of phenomena, including stellar aberration, from a purely kinematical point of view. If a photon is emitted from object A at the origin of the xyt coordinates and an angle α relative to the x axis, then at time t1 it will have reached the point
    x1 = t1cos(α),   y1 = t1sin(α)
(Notice that the units have been scaled to make c = 1, so the Minkowski metric for a null interval gives x1² + y1² = t1².) Now consider an object B moving in the positive x direction with velocity v, and being struck by the photon at time t1 as shown below.

Naturally an observer riding along with B will not see the light ray arriving at an angle α from the x axis, because according to the system of coordinates co-moving with B the source object A has moved in the x direction (but not in the y direction) between the times of transmission and reception of the photon. Since the angle is just the arctangent of the ratio of Δy to Δx of the photon's path, and since the value of Δx is different with respect to B's co-moving inertial coordinates whereas Δy is the same, it's clear that the angle of the photon's path is different with respect to B's co-moving coordinates than with respect to A's co-moving coordinates. In general the transformation of the angles of the paths of moving objects from one system of inertial coordinates to another is called aberration. To determine the angle of the incoming ray with respect to the co-moving inertial coordinates of B, let x'y't' be an orthogonal coordinate system aligned with the xyt coordinates but moving in the positive x direction with velocity v, so that B is at rest in the primed coordinate system. Without loss of generality we can co-locate the origins of the primed and unprimed coordinates systems, so in both systems the photon is emitted at (0,0,0). The endpoint of the photon's path in the primed coordinates can be computed from the unprimed coordinates using the standard Lorentz transformation for a boost in the positive x direction:
    t1' = (t1 − vx1)/√(1 − v²),   x1' = (x1 − vt1)/√(1 − v²),   y1' = y1
Just as we have cos(α) = x1/t1, we also have cos(α') = x1'/t1', and so
    cos(α') = (cos(α) − v)/(1 − v cos(α))                (1)
which is the general formula for relativistic aberration, relating the angles of light rays with respect to relatively moving coordinate systems. Likewise we have sin(α') = y1'/t1', from which we get
    sin(α') = sin(α)√(1 − v²)/(1 − v cos(α))             (2)
Using these expressions for the sine and cosine of α' it follows that
    sin(α')/(1 + cos(α')) = √[(1 + v)/(1 − v)] · sin(α)/(1 + cos(α))
Recalling the trigonometric identity tan(z) = sin(2z)/[1+cos(2z)] this gives
    tan(α'/2) = √[(1 + v)/(1 − v)] tan(α/2)              (3)
which immediately shows that aberration can be represented by stereographic projection from a sphere to the tangent plane. (This is discussed more fully in Section 2.6.) To see the effect of equation (3), suppose that, with respect to the inertial rest frame of a given particle, the rays of starlight incident on the particle are uniformly distributed in all directions. Then suppose the particle is given some speed v in the positive x direction relative to this original isotropic frame, and we evaluate the angles of incidence of those same rays of starlight with respect to the particle's new rest frame. The results, for speeds ranging from 0 to 0.999, are shown in the figure below. (Note that the angles in equation (3) are evaluated between the positive x or x' axis and the positive direction of the light ray.)

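The figure can be regenerated from equation (1). The following sketch tabulates, for a few boost speeds, the transformed angle of a ray that arrives at right angles in the original frame; as the speed increases, the incoming rays are seen to arrive from directions ever closer to the direction of motion:

    import math

    def aberrated_angle(alpha, v):
        # propagation angle of a light ray with respect to the moving (primed)
        # frame, from equation (1); angles measured from the positive x axis
        ca = (math.cos(alpha) - v)/(1 - v*math.cos(alpha))
        return math.acos(ca)

    for v in (0.0, 0.5, 0.9, 0.999):
        a = aberrated_angle(math.pi/2, v)
        # the propagation angle opens toward pi, so the ray appears to come
        # from ever nearer the forward (+x) direction
        print(v, round(math.degrees(a), 2))   # 90.0, 120.0, 154.16, 177.44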
The preceding derivation applies to the case when the light is emitted from the unprimed coordinate system at a certain angle and evaluated with respect to the primed coordinate system, which is moving relative to the unprimed system. If instead the light was emitted from B and received at A, we can repeat the above derivation, except that the direction of the light ray is reversed, going now from B to A. The spatial coordinates are all the same but the emission event now occurs at -t1, because it is in the past of event (0,0,0). The result is simply to replace each occurrence of v in the above expressions with -v. Of course, we could reach the same result simply by transposing the primed and unprimed angles in the above expressions. Incidentally, the aberration formula used by astronomers to evaluate the shift in the apparent positions of stars resulting from the Earth's orbital motion is often expressed in terms of angles with respect to the y axis (instead of the x axis), as shown below

This configuration corresponds to a distant star at A sending starlight to the Earth at B, which is moving nearly perpendicular to the incoming ray. This gives the greatest aberration effect, which explains why the stars furthest from the ecliptic plane experience the greatest aberration. The formula can be found simply by making the substitution α = π − θ in equation (1), and noting the trigonometric identity tan(arccos(x)) = √(1 − x²)/x. This gives the equivalent form
    tan(θ') = √(1 − v²) sin(θ)/(cos(θ) + v)
Another interesting aspect of aberration is illustrated by considering two separate light sources S1 and S2, and two momentarily coincident observers A and B as shown below

If observer A is stationary with respect to the sources of light, he will see the incoming rays of light striking him from the negative x direction. Thus, the light will impart a small amount of momentum to observer A in the positive x direction. On the other hand, suppose observer B is moving to the right (away from the sources of light) at nearly the speed of light. According to our aberration formula, if B is traveling with a sufficiently great speed, he will see the light from S1 and S2 approaching from the positive x direction, which means that the photons are imparting momentum to B in the negative x direction even though the light sources are "behind" B. This may seem paradoxical, but the explanation becomes clear when we realize that the x component of the velocities of the incoming light rays is less than c (because vx² = c² − vy²), which means that it's possible for observer B to be moving to the right faster than the incoming photons are moving to the right. Of course, this effect relies only on the relative motion of the observer and the source, so it works just as well if we regard B as motionless and the light sources S1,S2 moving to the left at near the speed of light. Thus, it might seem that we could use light rays to "pull" an object from behind, and in a sense this is true. However, since the light rays are moving to the right more slowly than the object, they clearly cannot catch up with the object from behind, so they must have been emitted when the object was still to the left of the sources. This illustrates how careful one must be to correctly account for the effective aberration of non-uniformly moving objects, because the simple aberration formulas are based on the assumption that the light source has been in uniform motion for an indefinite period of time. To correctly describe the aberration of non-uniformly moving light sources it is necessary to return to the basic metrical relations. For example, consider a binary star system in which one large central star is roughly stationary (relative to our Sun), and a smaller companion star is orbiting around the central star with a large angular velocity in a plane normal to the direction to our Sun, as illustrated below.

It might seem that the periodic variations in the velocity of the smaller star relative to our Sun would result in significantly different amounts of aberration as viewed from the Earth, causing the two components of the binary star system to appear in separate locations in the sky - which of course is not what is observed. Fortunately, it's easy to show that the correct application of the principles of special relativity, accounting for the non-uniform variations in the orbiting star's velocity, leads to predictions that agree perfectly with observations of binary star systems. At any moment of observation on Earth we can consider ourselves to be at rest at the point P0 in the momentarily co-moving inertial frame, with respect to which our coordinates are
    x0(t) = 0,   y0(t) = 0,   z0(t) = 0
Suppose the large central star of a binary pair is at point P1 at a distance L from the Earth with the coordinates
    x1(t) = −vt,   y1(t) = 0,   z1(t) = L
The fundamental assertion of special relativity is that light travels along null paths, so if a pulse of light is emitted from the star at time t = T and arrives at Earth at time t = 0, we have
    x1(T)² + y1(T)² + z1(T)² = T²,  i.e., in units with c = 1,  (vT)² + L² = T²
and so
    T = −L/√(1 − v²)
from which it follows that x1/z1 at time T is v/√(1 − v²). Thus, for the central star we have the aberration angle

    tan(α) = v/√(1 − v²)
Now, what about the aberration of the other star in the binary pair, the one that is assumed to be much smaller and revolving at a radius R and angular speed w around the larger star in a plane perpendicular to the Earth? The coordinates of that revolving star at point P2 are
    x2(t) = Rcos(θ) − vt,   y2(t) = Rsin(θ),   z2(t) = L
where θ = wt is the angular position of the smaller star in its orbit. Again, since light travels along null paths, a pulse of light arriving on Earth at time t = 0 was emitted at time
t = T satisfying the relation
    (Rcos(θ) − vT)² + (Rsin(θ))² + L² = T²
Solving this quadratic for T (and noting that the phase θ depends entirely on the arbitrary initial conditions of the orbit) gives
    T = −(Rv cos(θ))/(1 − v²) − (L/√(1 − v²)) √[1 + (R/L)² + (R/L)²v²cos²(θ)/(1 − v²)]
If the radius R of the binary star's orbit is extremely small in comparison with the distance L from those stars to the Earth, and assuming v is not very close to the speed of light, then the quantity inside the square root is essentially equal to 1. Therefore, the tangents of the angles of incidence in the x and y directions are
    tan(αx) = x2(T)/z2(T) = (R/L)cos(θ) + v/√(1 − v²) + (R/L)v²cos(θ)/(1 − v²)
    tan(αy) = y2(T)/z2(T) = (R/L)sin(θ)
These expressions make it clear why Einstein emphasized in his 1905 treatment of aberration that the light source was at infinite distance, i.e., L goes to infinity, so all but the middle term of the x tangent vanish. Of course, the leading terms in these tangents are obviously just the inherent "static" angular separation between the two stars viewed from the Earth, and the last term in the x tangent is completely negligible assuming R/L and/or v are sufficiently small compared with 1, so the aberration angle is essentially
    tan(α) = v/√(1 − v²)
which of course is the same as the aberration of the central star. Indeed, binary stars have been carefully studied for over a century, and the aberrations of the components are consistent with the relativistic predictions for reasonable Keplerian orbits. (Incidentally, recall that Bradley's original formula for aberration was tan(α) = v, whereas the corresponding relativistic equation is sin(α) = v. The actual aberration angles for stars seen from Earth are small enough that the sine and tangent are virtually indistinguishable.) The experimental results of Michelson and Morley, based on beams of light pointed in various directions with respect to the Earth's motion around the Sun, can also be treated as aberration effects. Let the arm of Michelson's interferometer be of length L, and let it
make an angle α with the direction of motion in the rest frame of the arm. We can establish inertial coordinates t,x,y in this frame, in terms of which the light pulse is emitted at t1 = 0, x1 = 0, y1 = 0, reflected at t2 = L, x2 = Lcos(α), y2 = Lsin(α), and arrives back at the origin at t3 = 2L, x3 = 0, y3 = 0. The Lorentz transformation to a system x',y',t' moving with velocity v in the x direction is x' = (x − vt)/γ, y' = y, t' = (t − vx)/γ where γ² = 1 − v², so the coordinates of the three events are x1' = 0, y1' = 0, t1' = 0, and x2' = L(cos(α) − v)/γ, y2' = Lsin(α), t2' = L[1 − vcos(α)]/γ, and x3' = −2vL/γ, y3' = 0, t3' = 2L/γ. Hence the total elapsed time in the primed coordinates is 2L/γ. Also, the total spatial distance traveled is the sum of the outward distance
    L[1 − v cos(α)]/γ
and the return distance
    L[1 + v cos(α)]/γ
so the total distance is 2L/γ, giving a light speed of 1 regardless of the values of v and α. Of course, the angle of the interferometer arm cannot be α with respect to the primed coordinates. The tangent of the angle equals the arm's y extent divided by its x extent, which gives tan(α) = Lsin(α)/[Lcos(α)] in the arm's rest coordinates. In the primed coordinates the y' extent of the arm is the same as the y extent, Lsin(α), but the x' extent is Lcos(α)γ, so the tangent of the arm's angle is tan(α') = tan(α)/γ. However, this should not be confused with the angle (in the primed coordinates) of the light pulse as it travels along the arm, because the arm is in motion with respect to the primed coordinates. The outward direction of motion of the light pulse is given by evaluating the primed coordinates of the emission and absorption events at x1,y1 and x2,y2 respectively. Likewise the inward direction of the light pulse is based on the interval from x2,y2 to x3,y3. These give the tangents of the outward and inward angles
    tan(αout) = γ sin(α)/(cos(α) − v),   tan(αin) = γ sin(α)/(cos(α) + v)

Naturally these are consistent with the result of taking the ratio of equations (1) and (2).
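These transformations are simple enough to verify mechanically. The sketch below (with an arbitrary arm angle and speed) transforms the three events into the primed frame and confirms that the total path length divided by the total elapsed time is exactly 1, as required:

    import math

    L, alpha, v = 1.0, 0.7, 0.6
    g = math.sqrt(1 - v*v)                     # the gamma defined above
    events = [(0.0, 0.0, 0.0),                 # (t, x, y) of emission,
              (L, L*math.cos(alpha), L*math.sin(alpha)),  # reflection,
              (2*L, 0.0, 0.0)]                 # and return
    primed = [((t - v*x)/g, (x - v*t)/g, y) for t, x, y in events]
    (t1, x1, y1), (t2, x2, y2), (t3, x3, y3) = primed
    out  = math.hypot(x2 - x1, y2 - y1)
    back = math.hypot(x3 - x2, y3 - y2)
    print((out + back)/(t3 - t1))              # 1.0, independent of v and alpha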
2.6 Mobius Transformations and The Night Sky

What we are beginning to see here is the first step of a powerful correspondence between the spacetime geometry of relativity and the holomorphic geometry of complex spaces.
Roger Penrose, 1977

Any proper orthochronous Lorentz transformation (including ordinary rotations and relativistic boosts) can be represented by
    X' = Q X Q*                 (1)
where
    X = | t+z    x+iy |        Q = | a   b |
        | x−iy   t−z  |            | c   d |
and Q* is the transposed conjugate of Q. The coefficients a,b,c,d of Q are allowed to be complex numbers, normalized so that ad − bc = 1. Just to be explicit, this implies that if we define

then the Lorentz transformation (1) is

Two observers at the same point in spacetime but with different orientations and velocities will "see" incoming light rays arriving from different relative directions with respect to their own frames of reference, due partly to ordinary rotation, and partly to the aberration effect described in the previous section. This leads to the remarkable fact that the combined effect of any proper orthochronous (and homogeneous) Lorentz transformation on the incidence angles of light rays at a point corresponds precisely to the effect of a particular linear fractional transformation on the Riemann sphere via ordinary stereographic projection from the extended complex plane. The latter is illustrated below:

The complex number p in the extended complex plane is identified with the point p' on the unit sphere that is struck by a line from the "North Pole" through p. In this way we can identify each complex number uniquely with a point on the sphere, and vice versa. (The North Pole is identified with the "point at infinity" of the extended complex plane, for completeness.) Relative to an observer located at the center of the Riemann sphere, each point of the sphere lies in a certain direction, and these directions can be identified with the directions of incoming light rays at a point in spacetime. If we apply a Lorentz transformation of the form (1) to this observer, specified by the four complex coefficients a,b,c,d, the resulting change in the directions of the incoming rays of light is given exactly by applying the linear fractional transformation (also known as a Mobius transformation)
    w → (aw + b)/(cw + d)                 (2)
to the points of the extended complex plane. Of course, our normalization ad − bc = 1 implies the two conditions
    Re(ad − bc) = 1,   Im(ad − bc) = 0
so of the eight coefficients needed to specify the four complex numbers a,b,c,d, these two constraints reduce the degrees of freedom to six, which is precisely the number of degrees of freedom of Lorentz transformations (namely, three velocity components vx,vy,vz, and three angular specifications for the longitude and latitude of our line of sight and orientation about that line). To illustrate this correspondence, first consider the "identity" Mobius transformation w → w. In this case we have
    a = d = 1,   b = c = 0
so our Lorentz transformation reduces to t' = t, x' = x, y' = y, z' = z as expected. None of the points move on the complex plane, so none move on the Riemann sphere under stereographic projection, and nothing changes in the sky's appearance. Now let's consider
the Mobius transformation w → −1/w. In this case we have
    a = d = 0,   b = 1,   c = −1
and so the corresponding Lorentz transformation is t' = t, x' = −x, y' = y, z' = −z. Thus the x and z coordinates have been reflected. This is certainly a proper orthochronous Lorentz transformation, because the determinant is +1 and the coefficient of t is positive. But does reflecting the x and z coordinates agree with the stereographic effect on the Riemann sphere of the transformation w → −1/w? Note that the point w = r + 0i maps to −1/r + 0i. There's a nice little geometric demonstration that the stereographic projections of these points have coordinates (x,0,z) and (−x,0,−z) respectively, noting that the two projection lines have negative inverse slopes and so are perpendicular in the xz plane, which implies that they must strike the sphere on a common diameter (by Pythagoras' theorem). A similar analysis shows that points off the real axis with projected coordinates (x,y,z) in general map to points with projections (−x, y, −z). The two examples just covered were both trivial in the sense that they left t unchanged. For a more interesting example, consider the Mobius transformation w → w + p, which corresponds to the Lorentz transformation

If we denote our spacetime coordinates by the column vector X with components x0 = t, x1 = x, x2 = y, x3 = z, then the transformation can be written as
    X' = LX
where

To analyze this transformation it's worthwhile to note that we can decompose any Lorentz transformation into the product of a simple boost and a simple rotation. For a given relative velocity with magnitude |v| and components v1, v2, v3, let γ denote the "boost factor"
    γ = 1/√(1 − |v|²),   |v|² = v1² + v2² + v3²
It's clear that
    L00 = γ,   L10 = γv1,   L20 = γv2,   L30 = γv3
Thus, these four components of L are fixed purely by the boost. The remaining components depend on the rotational part of the transformation. If we define a "pure boost" as a Lorentz transformation such that the two frames see each other moving with velocities (v1,v2,v3) and (−v1,−v2,−v3) respectively, then there is a unique pure boost for any given relative velocity vector v1,v2,v3. This boost has the components
    B = | γ     γv1       γv2       γv3      |
        | γv1   1+Qv1²    Qv1v2     Qv1v3    |
        | γv2   Qv1v2     1+Qv2²    Qv2v3    |
        | γv3   Qv1v3     Qv2v3     1+Qv3²   |
where Q = (γ − 1)/|v|². From our expression for L we can identify the components to give the boost velocity in terms of the Mobius parameter p

and

From these we write the pure boost part of L as follows

We know that our Lorentz transformation L can be written as the product of this pure boost B times a pure rotation R, i.e., L = BR, so we can determine the rotation
    R = B⁻¹L
which in this case gives

In terms of Euler angles, this represents a rotation about the y axis through an angle of

The correspondence between the coefficients of the Mobius transformation and the Lorentz transformation described above assumes stereographic projection from the North pole to the equatorial plane. More generally, if we're projecting from the North Pole of the Riemann sphere to a complex plane parallel to (but not necessarily on) the equator, and if the North Pole is at a height h above the plane, then every point in the plane is a factor of h further away from the origin than in the case of equatorial projection (h=1), so the Mobius transformation corresponding to the above Lorentz transformation is w  (Aw+B)/(Cw+D) where
    A = a,   B = hb,   C = c/h,   D = d
It's also worth noting that the instantaneous aberration observed by an accelerating observer does not differ from that observed by a momentarily co-moving inertial observer. We're referring here to the null (light-like) rays incident on a point of zero extent, so this is not like a finite spinning body whose outer edges have significant velocities relative to their centers. We're just referring to different coordinate systems whose origins coincide at a given point in spacetime, and describing how the light rays pass through that point in terms of the different coordinate systems at that instant. In this context the acceleration (or spinning) of the systems make no difference to the answer. In other words, as long as our inertial coordinate system has the same velocity and orientation as the (ideal point-like) observer at the moment of the observation, it doesn't matter if the observer is in the process of changing his orientation or velocity. (This is a corollary of the "clock hypothesis" of special relativity, which asserts that a traveler's time dilation at a given instant depends only on his velocity and not his acceleration at that instant.) In general we can classify Mobius transformations (and the corresponding Lorentz transformations) according to their "squared trace", i.e., the quantity
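The correspondence is easy to check numerically in a special case. For a pure boost along the z axis (the polar axis of the projection), the associated Mobius transformation should be a simple scaling w → kw. The sketch below assumes projection from the north pole onto the equatorial plane and takes k = √((1 − v)/(1 + v)), which is what the half-angle formula of Section 2.5 suggests; it aberrates an arbitrary ray direction and confirms that its projected point is simply scaled by k:

    import math

    def project(d):
        # stereographic image of the unit direction vector d = (x, y, z),
        # projecting from the north pole onto the equatorial plane
        x, y, z = d
        return complex(x, y)/(1 - z)

    def boost_z(d, v):
        # aberrated propagation direction of a null ray under a boost v along +z
        g = 1/math.sqrt(1 - v*v)
        x, y, z = d
        tp = g*(1 - v*z)          # transformed t component of the null vector (1, x, y, z)
        zp = g*(z - v)
        return (x/tp, y/tp, zp/tp)

    v = 0.6
    k = math.sqrt((1 - v)/(1 + v))
    th, ph = 1.1, 0.4             # arbitrary direction angles
    d = (math.sin(th)*math.cos(ph), math.sin(th)*math.sin(ph), math.cos(th))
    print(project(boost_z(d, v))/project(d), k)   # the ratio equals k for every d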
    σ = (a + d)²/(ad − bc)
This is also the "conjugacy parameter", i.e., two linear fractional transformations are conjugate if and only if they have the same value of σ. The different kinds of transformations are listed below:

    0 ≤ σ < 4               elliptic
    σ = 4                   parabolic
    σ > 4                   hyperbolic
    σ < 0 or not real       loxodromic
For example, the class of pure rotations are a special case of elliptic transformations, having the form w → (aw + b)/(cw + d) with c = −b̄ and d = ā, where an overbar denotes complex conjugation. Also, it's not hard to show that the compositions of an arbitrary linear fractional transformation f(z) are cyclical with a period m if and only if σ = 4cos²(kπ/m) for some integer k. We've seen that the general finite transformation of the incoming null rays can be expressed naturally in the form of a finite Mobius transformation of the complex plane (under stereographic projection). This is a very simple algebraic operation, given by the function
    f(z) = (az + b)/(cz + d)                 (3)
for complex constants a,b,c,d. This generates the discrete sequence f1(z) = f(z), f2(z) = f(f(z)), f3(z) = f(f(f(z))), and so on for all fn(z) where n is a positive integer. It's also possible to parameterize a Mobius transformation to give the corresponding infinitesimal generator, which can be applied to give "fractional iterations" such as f1/2(z), or more generally the continuously parameterized transformation fp(z) for any real (or even complex) value of p. To accomplish this we must (in general) first map the discrete generator f(z) to a domain in which it has some convenient exponential form, then apply the pth-order transformation, and then map back to the original domain. There are several cases to consider, depending on the character of the discrete generator. In the degenerate case when ad = bc with c ≠ 0, the pth iterate of f(z) is simply the constant fp(z) = a/c. On the other hand, if c = 0 and a = d ≠ 0, then fp(z) = z + (b/d)p. The third case is with c = 0 and a ≠ d. The pth iterate of f(z) in this case is
    f^p(z) = (a/d)^p z + (b/d)[(a/d)^p − 1]/[(a/d) − 1]
Notice that the second and third cases are really linear transformations, since c = 0. The fourth case is with c ≠ 0 and (a+d)²/(ad − bc) = 4, which leads to the following closed form expression for the pth iterate
    f^p(z) = z0 + (z − z0)/(1 + pc(z − z0)),   z0 = (a − d)/(2c)

(taking the coefficients normalized so that a + d = 2, with z0 the double fixed point).
This corresponds to the case when the two fixed points of the Mobius transformation are co-incident. In this "parabolic" case, if a+d = 0 then the Mobius transformation reduces to the first case with ad − bc = 0. Finally, in the most general case we have c ≠ 0 and (a+d)²/(ad − bc) ≠ 4, and the pth iterate of f(z) is given by

where

This is the general case with two distinct fixed points. (If a+d = 0 then σ = 0 and K = −1.) The parameters A and B are the coefficients of the linear transformation that maps the real line to the locus of points with real part equal to 1/2. Notice that the pth composition of f satisfies the relation

so we have

where

Thus h(f(z)) = Kh(z), which shows that f(z) is conjugate to the simple function Kz. Since A+B is the complex conjugate of B, we see that h(z) can be expressed as

where

This enables us to express the pth composition of any linear fractional transformation with two fixed points, and therefore any corresponding Lorentz transformation, in the form

This shows that there is a particular oriented frame of reference, represented by h(z), with respect to which the relation between the oriented frames z and f(z) is purely exponential. (We must refer to oriented frames rather than merely frames because the Mobius transformation represented the effects of general orientation as well as velocity boost.) To show explicitly how the action of fp(z) on the complex plane varies with p, consider the relatively simple linear fractional transformation f(z) with fixed points at 0 and 1 on the real axis, which implies A = 1 and B = 0. In parameterized form the pth composition of this transformation is of the form
    f^p(z) = K^p z/[1 + (K^p − 1)z]                 (4)
for some complex constant K, and the similarity parameter for this transformation is σ = (1+K)²/K. For any given K and complex initial value z = x + iy, let
    R = |K|,   θ = arg(K)
Then the real and imaginary components of fp(z) are given by

This makes explicit how the action of fp(z) on the complex plane is entirely determined by the magnitude and phase angle of the constant K, which, as we saw previously, is given by
    K = [(σ − 2) ± √(σ(σ − 4))]/2
If a,b,c,d are all real, then σ is real, in which case either K is real (σ>4 or σ<0) or K is complex (0<σ<4) with a magnitude (norm) of 1. However, if a,b,c,d are allowed to be complex, then K can be complex with a magnitude other than 1. Of course, if K is real, we can set R = K and θ = 0, so pθ = 0 for all p, and the above equations reduce to

Clearly the computational complexity of the continuous parameterized transformation (4) exceeds that of the discrete transformation (3). This raises an interesting question, at least from a neo-Platonic perspective. Is it possible that nature prefers the simplicity of the discrete form over the continuous? In other words, are all physically realizable Lorentz transformations actually discrete? If so, what determines the "size" of the minimum transformation? What is the "size" of a Mobius transformation? We know that every Mobius transformation is conjugate to a pure exponential, whose effect is a rotation and a re-scaling. In addition, the conjugation itself may impose some kind of size. It's interesting that the elements now are not frames but differences between frames, including rotations. Thus, rather than the ontological objects of consideration being events of the form x,y,z,t, or even coordinate systems, the objects are the transformations with complex coefficients a,b,c,d. This again introduces the octonion space, though restricted by the fact that there are only three (complex) degrees of freedom.
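The fractional iteration described above is straightforward to implement for the generic case of two distinct fixed points. The following sketch (with arbitrarily chosen coefficients) conjugates f to the multiplier map w → Kw and checks that the half-iterate composed with itself reproduces f:

    import cmath

    def fractional_iterate(a, b, c, d, p):
        # p-th iterate of f(z) = (az + b)/(cz + d), assuming two distinct fixed points
        disc = cmath.sqrt((d - a)**2 + 4*b*c)
        z1 = ((a - d) + disc)/(2*c)       # fixed points: c z^2 + (d - a) z - b = 0
        z2 = ((a - d) - disc)/(2*c)
        h    = lambda z: (z - z1)/(z - z2)     # sends the fixed points to 0 and infinity
        hinv = lambda w: (z1 - z2*w)/(1 - w)
        f = lambda z: (a*z + b)/(c*z + d)
        z0 = 1.0 if f(1.0) != 1.0 else 2.0     # any convenient non-fixed test point
        K = h(f(z0))/h(z0)                     # multiplier of the conjugate map w -> K w
        return lambda z: hinv(K**p * h(z))

    a, b, c, d = 3, 1, 1, 2                    # arbitrary coefficients, sigma = 5
    f = lambda z: (a*z + b)/(c*z + d)
    half = fractional_iterate(a, b, c, d, 0.5)
    z = 0.3 + 0.7j
    print(half(half(z)), f(z))                 # the two values agree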

2.7 The Sagnac Effect

Blind unbelief is sure to err,
And scan his work in vain;
God is his own interpreter,
And he will make it plain.
William Cowper, 1780

If two pulses of light are sent in opposite directions around a stationary circular loop of radius R, they will travel the same inertial distance at the same speed, so they will arrive at the end point simultaneously. This is illustrated in the left-hand figure below.

The figure on the right indicates what happens if the loop itself is rotating during this procedure. The symbol α denotes the angular displacement of the loop during the time required for the pulses to travel once around the loop. For any positive value of α, the pulse traveling in the same direction as the rotation of the loop must travel a slightly greater distance than the pulse traveling in the opposite direction. As a result, the counter-rotating pulse arrives at the "end" point slightly earlier than the co-rotating pulse. Quantitatively, if we let ω denote the angular speed of the loop, then the circumferential tangent speed of the end point is v = ωR, and the sum of the speeds of the wave front and the receiver at the "end" point is c − v in the co-rotating direction and c + v in the counter-rotating direction. Both pulses begin with an initial separation of 2πR from the end point, so the difference between the travel times is
    Δt = 2πR/(c − v) − 2πR/(c + v) = 4πRv/(c² − v²) = 4Aω/(c² − v²)
where A = πR² is the area enclosed by the loop. This analysis is perfectly valid in both the classical and the relativistic contexts. Of course, the result represents the time difference with respect to the axis-centered inertial frame. A clock attached to the perimeter of the ring would, according to special relativity, record a lesser time, by the factor γ = √(1 − (v/c)²), so the Sagnac delay with respect to such a clock would be [4Aω/c²]/√(1 − (v/c)²). However, the characteristic frequency of a given light source co-moving with this clock
would be greater, compared to its reduced value in terms of the axis-centered frame, by precisely the same factor, so the actual phase difference of the beams arriving at the receiver is invariant. (It's also worth noting that there is no Doppler shift involved in a Sagnac device, because each successive wave crest in a given direction travels the same distance from transmitter to receiver, and clocks at those points show the same lapse of proper time, both classically and in the context of special relativity.) This phenomenon applies to any closed loop, not necessarily circular. For example, suppose a beam of light is split by a half-silvered mirror into two beams, and those beams are directed in a square path around a set of mirrors in opposite directions as shown below.

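To get a sense of the magnitudes involved, the following sketch evaluates the time difference derived above for a small circular loop rotating at the Earth's rate (the dimensions are merely illustrative; practical fiber gyros recover a measurable phase by winding thousands of turns, which multiplies the effective area):

    import math

    c = 2.998e8
    R = 0.1                          # loop radius, meters (a bench-top scale)
    omega = 7.292e-5                 # Earth's rotation rate, rad/sec
    A = math.pi*R**2
    v = omega*R
    dt = 4*A*omega/(c**2 - v**2)     # arrival-time difference of the two pulses
    print(dt)                        # about 1e-22 sec for these values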
Just as in the case of the circular loop, if the apparatus is unaccelerated, the two beams will travel equal distances around the loop, and arrive at the detector simultaneously and in phase. However, if the entire device (including source and detector) is rotating, the beam traveling around the loop in the direction of rotation will have farther to go than the beam traveling counter to the direction of rotation, because during the period of travel the mirrors and detector will all move (slightly) toward the counter-rotating beam and away from the co-rotating beam. Consequently the beams will reach the detector at slightly different times, and slightly out of phase, producing optical interference "fringes" that can be observed and measured. Michelson had proposed constructing such a device in 1904, but did not pursue it at the time, since he realized it would show only the absolute rotation of the device. The effect was first demonstrated in 1911 by Harress (unwittingly) and in 1913 by Georges Sagnac, who published two brief notes in the Comptes Rendus describing his apparatus and summarizing the results. He wrote:

The result of measurements shows that, in ambient space, the light is propagated with a speed V0, independent of the overall movement of the source of light O and optical system.

This rules out the ballistic theory of light propagation (as advocated by Ritz in 1909), according to which the speed of light is the vector sum of the velocity of the source plus a vector of magnitude c. Ironically, the original Michelson-Morley experiment was consistent with the ballistic theory, but inconsistent with the naïve ether theory, whereas the Sagnac effect is consistent with the naïve ether theory but inconsistent with the ballistic theory. Of course, both results are consistent with fully relativistic theories of

Lorentz and Einstein, since according to both theories light is propagated at a speed independent of the state of motion of the source. Because of the incredible precision of interferometric techniques, devices like this are capable of detecting and measuring extremely small amounts of absolute rotation. One of the first applications of this phenomenon was an experiment performed by Michelson and Gale in 1925 to measure the absolute rotation rate of the Earth by means of a rectangular optical loop 2/5 mile long and 1/5 mile wide. (See below for Michelson’s comments on this experiment.) More recently, the invention of lasers around 1963 has led to practical small-scale devices for measuring rotation by exploiting the Sagnac effect. There are two classes of such devices, namely, ring interometers and ring lasers. A ring interferometer typically consists of many windings of fiber optic lines, conducting light (of a fixed frequency) in opposite directions around a loop, and then recombining them to measure the phase difference, just as in the original Sagnac apparatus, but with greater efficiency and sensitivity. A ring laser, on the other hand, consists of a laser cavity in the shape of a ring, which allows light to circulate in both directions, producing two standing waves with the same number of nodes in each direction. Since the optical path lengths in the two directions are different, the resonant frequencies of the two standing waves are also different. (In practice it is typically necessary to “dither” the ring to prevent phase locking of the two modes.) The “beat” between the two frequencies is measured, giving a result proportional to the rotation rate of the device. Incidentally, it isn’t necessary for the actual laser cavity to circumscribe the entire loop; longitudinal pumping can be used, driven by feedback carried in opposite directions around the loop in ordinary optical fibers. (Needless to say, the difference in resonant frequency of the two stand waves in a ring laser due to the different optical path lengths is not to be confused with a Doppler shift.) Today such devices are routinely used in guidance and navigation systems for commercial airliners, nautical ships, spacecraft, and in many other applications, and are capable of detecting rotation rates as slight as 0.00001 degree per hour. We saw previously that the time delay (and therefore the difference in the optical path lengths) for a circular loop is proportional to the area enclosed by the loop. This interesting fact actually applies to arbitrary closed loops. To prove this, we will derive the difference in arrival times of the two pulses of light for an arbitrary polygonal loop inscribed in a circle. Let the (inertial) coordinates of two consecutive mirrors separated by a subtended angle θ be
x1(t) = R cos(ωt)    y1(t) = R sin(ωt)
x2(t) = R cos(ωt + θ)    y2(t) = R sin(ωt + θ)
where ω is the angular velocity of the device. Since light rays travel along null intervals, we have c²(dt)² = (dx)² + (dy)², so the coordinate time T required for a light pulse to travel from one mirror to the next in the forward and reverse directions satisfies the equations
c²T² = 2R²[1 − cos(θ + ωT)]    (co-rotating direction)
c²T² = 2R²[1 − cos(θ − ωT)]    (counter-rotating direction)
Typically ωT is extremely small, i.e., the polygon doesn't rotate through a very large angle in the time it takes light to go from one mirror to the next, so we can expand these equations in ωT (up to second order) and collect powers of T to give the quadratic
(c² − R²ω² cos θ)T² − 2R²ω sin(θ)T − 2R²(1 − cos θ) = 0
The two roots of this polynomial are the values of T, one positive and one negative, for the co-rotating and counter-rotating solutions, so the difference in the absolute times is the sum of these roots. Hence we have
ΔT = 2R²ω sin(θ)/(c² − R²ω² cos θ)
This is the net contribution of this edge to the total time increment. Recalling that the area of a regular n-sided polygon of radius R is nR² sin(2π/n)/2, the area of the triangle formed by the hub and the two mirrors is R² sin(θ)/2. It follows that each edge of an arbitrary polygonal loop inscribed in a circle contributes 4Aiω/(c² − v² cos(θ)) to the total time discrepancy, where Ai is the area of the ith triangular slice of the loop and v = Rω is the tangential speed of the mirrors. Therefore, the total discrepancy in travel times for the co-rotating and counter-rotating beams around the entire loop is simply
Δt = Σ 4Aiω/(c² − v² cos θ) = 4Aω/(c² − v² cos θ)
where A is the total area enclosed in the loop. This applies to polygons with any number of sides, including the limiting case of circular fiber-optic loops with virtually infinitely many edges (where the "mirrors" are simply the inner reflective lining of the fiber-optic cable), in which case θ goes to zero and the denominator of the phase difference is simply c² − v². For realistic values of v (i.e., very small compared with c), the phase difference reduces to the well-known result 4Aω/c². It's worth noting that nothing in this derivation is unique to special relativity, because the Sagnac effect is a purely "classical" effect. The apparatus is set up as a differential device, so the relativistic effects apply equally in both directions, and hence the higher-order corrections of special relativity cancel out of the phase difference. Despite the ease and clarity with which special relativity accounts for the Sagnac effect, one occasionally sees claims that this effect entails a conflict with the principles of special relativity. The usual claim is that the Sagnac effect somehow falsifies the invariance of light speed with respect to all inertial coordinate systems. Of course, it does no such thing, as is obvious from the fact that the simple description of an arbitrary Sagnac device given above is based on isotropic light speed with respect to one particular system of inertial coordinates, and all other inertial coordinate systems are related to this one by Lorentz transformations, which are defined as the transformations that preserve light speed. Hence no description of a Sagnac device in terms of any system of inertial
coordinates can possibly entail non-isotropic light speed, nor can any such description yield physically observable results different from those derived above (which are known to agree with experiment). Nevertheless, it remains a seminal tenet of anti-relativityism (for lack of a better term) that the trivial Sagnac effect somehow "disproves relativity". Those who espouse this view sometimes claim that the expressions "c+v" and "c−v" appearing in the derivation of the phase shift are prima facie proof that the speed of light is not c with respect to some inertial coordinate system. When it is pointed out that those quantities do not refer to the speed of light, but rather to the sum and difference of the speed of light and the speed of some other object, both with respect to a single inertial coordinate system, which can be as great as 2c according to special relativity, the anti-relativityists are undaunted, and merely proceed to construct progressively more convoluted and specious "objections". For example, they sometimes argue that each point on the perimeter of a rotating circular Sagnac device is always instantaneously at rest in some inertial coordinate system, and according to special relativity the speed of light is precisely c in all directions with respect to any inertial system of coordinates, so (they argue) the speed of light must be isotropic at every point around the entire circumference of the loop, and hence the light pulses must take an equal amount of time to traverse the loop in either direction. Needless to say, this "reasoning" is invalid, because the pulses of light are never (let alone always) at the same point in the loop at the same time during their respective trips around the loop in opposite directions. At any given instant the point of the loop where one pulse is located is necessarily accelerating with respect to the instantaneous inertial rest frame of the point on the loop where the other pulse is located (and vice versa). As noted above, it's self-evident that since the speed of light is isotropic with respect to at least one particular frame of reference, and since every other frame is related to that frame by a transformation that explicitly preserves light speed, no inconsistency with the invariance of the speed of light can arise. Having accepted that the observable effects predicted by special relativity for a Sagnac device are correct and entail no logical inconsistency, the dedicated opponents of special relativity sometimes resort to claims that there is nevertheless an inconsistency in the relativistic interpretation of what's really happening locally around the device in certain extreme circumstances. The fundamental fallacy underlying such claims is the idea that the beams of light are traveling the same, or at least congruent, inertial paths through space and time as they proceed from the source to the detector. If this were true, their inertial speeds would indeed need to differ in order for their arrival times at the detector to differ. However, the two pulses do not traverse congruent paths from emission to detector (assuming the device is absolutely rotating). The co-rotating beam is traveling slightly farther than the counter-rotating beam in the inertial sense, because the detector is moving away from the former and toward the latter while they are in transit. Naturally the ratio of optical path lengths is the same with respect to any fixed system of inertial coordinates.
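As a numerical sanity check on the polygonal derivation above, the exact (unexpanded) travel-time equations can be solved by fixed-point iteration and the summed edge contributions compared against the closed-form expression. The following Python sketch does this (the function names are merely illustrative, and units with c = 1 are assumed):

    import math

    def edge_time(R, omega, theta, sign, c=1.0):
        # Solve c*T = 2*R*sin((theta + sign*omega*T)/2) by fixed-point
        # iteration; sign = +1 for the co-rotating pulse (the target mirror
        # recedes), -1 for the counter-rotating pulse (it approaches).
        T = 2 * R * math.sin(theta / 2) / c
        for _ in range(50):
            T = 2 * R * math.sin((theta + sign * omega * T) / 2) / c
        return T

    def sagnac_delay(R=1.0, omega=1e-4, n=8, c=1.0):
        theta = 2 * math.pi / n               # angle subtended by each edge
        v = omega * R                         # tangential speed of the mirrors
        exact = n * (edge_time(R, omega, theta, +1, c)
                     - edge_time(R, omega, theta, -1, c))
        A = 0.5 * n * R**2 * math.sin(theta)  # area of the inscribed polygon
        closed = 4 * A * omega / (c**2 - v**2 * math.cos(theta))
        return exact, closed

    print(sagnac_delay())   # the two values agree closely

Letting n grow large reproduces the circular-loop result 4Aω/(c² − v²) described earlier.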
It's also obvious that the absolute difference in optical path lengths cannot be "transformed away", e.g., by analyzing the process with respect to coordinates rigidly attached to and rotating along with the device. We can, of course, define a system of coordinates in terms of which the position of a point fixed on the disk is independent of the time coordinate, but such coordinates are necessarily rotating (accelerating), and special relativity does not entail invariant or isotropic light speed with respect to non-inertial coordinates. (In fact, one need only consider the distant stars circumnavigating the entire galaxy every 24 hours with respect to the Earth's rotating system of reference to realize that the limiting speed of travel is generally not invariant and isotropic in terms of accelerating coordinates.) A detailed analysis of a Sagnac device in terms of non-inertial (i.e., rotating) coordinates is presented in Section 4.8, and discussed from a different point of view in Section 5.1. For the present, let's confine our attention to inertial coordinates, and demonstrate how a Sagnac device is described in terms of instantaneously co-moving inertial frames of an arbitrary point on the perimeter.

Suppose we've sent a sequence of momentary pulses around the loop, at one-second intervals, in both directions, and we have photo-detectors on each mirror to detect when they are struck by a co-rotating or counter-rotating pulse. Clearly the pulses will strike each mirror at one-second intervals from both directions (though not necessarily synchronized) because if they were arriving more frequently from one direction than from the other, the secular lag between corresponding pulses would be constantly increasing, which we know is not the case. So each mirror is receiving one pulse per second from both directions. Furthermore, a local measurement of light speed performed (over a sufficiently short period of time) by an observer riding along at a point on the perimeter will necessarily show the speed of light to be c in all directions with respect to his instantaneously co-moving inertial coordinates. However, this system of coordinates is co-moving with only one particular point on the rim. At other points on the rim these coordinates are not co-moving, and so the speed of light is not c at other points on the rim with respect to these coordinates. To describe this in detail, let's first analyze the Sagnac device from the hub-centered inertial frame. Throughout this discussion we assume an n-sided polygonal loop where n is very large, so the segment between any two adjacent mirrors subtends only a very small angle. With respect to the hub-centered frame each segment is moving with a velocity v parallel to the direction of travel of the light beams, so the situation on each segment is as plotted below in terms of hub-frame coordinates:

In this drawing, tf is the time required for light to cross this segment in the co-rotating direction, and tr is the time required for light to cross in the counter-rotating direction. The difference between these two times, denoted by dt, is the incremental Sagnac effect for a segment of length dp on the perimeter. Now, the ratio of dt/dp as a function of the rim velocity v can easily be read off this diagram, and we find that
dt/dp = 2v/(c² − v²)
This can be taken as a measure of the anisotropy over an incremental segment with respect to the hub frame. (Notice that this anisotropy with respect to the conventional relativistic spacetime decomposition for any inertial frame is actually in the distance traveled, not the speed of travel.) All the segments are symmetrical in this frame, so they all have this same anisotropy. Therefore, we can determine the total difference in travel times for co-rotating and counter-rotating beams of light making a complete trip around the loop by integrating dt around the perimeter. Thus we have
T = [2v/(c² − v²)](2πr) = 4πrv/(c² − v²)
Substituting ωr in place of v in the numerator, and noting that the enclosed area is A = πr², we again arrive at the result T = 4Aω/(c² − v²). Now let's analyze the loop with respect to one of our tangential frames of reference, i.e., an inertial frame that is momentarily co-moving with one of the segments on the rim. If we examine the situation on that particular segment in terms of its own co-moving inertial frame we find, not surprisingly, the situation shown below:

This shows that dt/dp = 0, meaning no anisotropy at all. Nevertheless, if the light beams are allowed to go all the way around the loop, their total travel times will differ by T as computed above, so how does that difference arise with respect to this tangential frame? Notice that although dt/dp equals zero at this tangent point with respect to the tangent frame, segments 90 degrees away from this point have the same anisotropy as we found for all the segments relative to the hub frame, namely, dt/dp = 2v/(c² − v²), because the velocity of those two segments relative to our tangential frame is exactly v along the direction of the light rays, just as it was with respect to the hub frame. Furthermore, the segment 180 degrees away from our tangent segment has twice the anisotropy it has with respect to the original hub-frame inertial coordinates, because that segment has a velocity of 2v with respect to our tangential frame. In general, the anisotropy dt/dp can be computed for any segment on the loop simply by determining the projection of that segment's velocity (with respect to our tangential frame) onto the axis of the light rays. This gives the results illustrated below, showing the ratio of the tangential frame anisotropy to the hub frame anisotropy:

It's easy to show that
dt/dp = [2v/(c² − v²)](1 − cos θ)
where θ is the angle relative to the tangent point. To assess the total difference in arrival times for light rays going around the loop in opposite directions, we need to integrate dt/dp with respect to p around the perimeter. Noting that θ equals p/r, we have
T = ∫ [2v/(c² − v²)][1 − cos(p/r)] dp (from p = 0 to 2πr) = [2v/(c² − v²)](2πr)
which again equals 4Aω/(c² − v²), in agreement with the hub frame analysis. Thus, although the anisotropy is zero at each point on the rim's surface when evaluated with respect to that point's co-moving inertial frame, we always arrive at the same overall nonzero anisotropy for the entire loop. This was to be expected, because the absolute physical situation and intervals are the same for all inertial frames. We're simply decomposing those absolute intervals into space and time components in different ways.

The union of all the "present" time slices of the sequence of instantaneous co-moving inertial coordinate systems for a point fixed on the rim of a rotating disk, with each time slice assigned a time coordinate equal to the proper time of the fixed point, constitutes a coherent and unambiguous coordinate system over a region of spacetime that includes the entire perimeter of the disk. The general relation for mapping the proper time of one worldline into another by means of the co-moving planes of simultaneity of the former is derived at the end of Section 2.9, where it is shown that the derivative of the mapped time from a point fixed on the rim to a point at the same radius fixed in the hub frame is positive provided the rim speed is less than c. Of course, for locations further from the center of rotation the planes of simultaneity of a revolving point fixed on the rim will become "retrograde", i.e., will backtrack, making the coordinate system ambiguous. This occurs for locations at a distance greater than 1/a from the hub, where a is the acceleration of the point fixed on the rim.

It's also worth noting that the amount of angular travel of the device during the time it takes for one pair of light pulses to circumnavigate a circular loop is directly proportional to the net "anisotropy" in the travel times. To prove this, note that in a circular Sagnac device of radius R the beam of light in the direction of rotation travels a distance of (2π + ωt1)R and the other beam goes a distance of (2π − ωt2)R where t1 and t2 are the travel times of the two beams, and ω is the angular velocity of the loop. The travel times of the beams are just these distances divided by c, so we have
t1 = (2π + ωt1)R/c    t2 = (2π − ωt2)R/c
Solving for the times gives
t1 = 2πR/(c − ωR)    t2 = 2πR/(c + ωR)
so the difference in times is
t1 − t2 = 4πR²ω/(c² − (ωR)²) = 4Aω/(c² − v²)
where A = πR² and v = ωR. The "anisotropic ratio" is the ratio of the travel times, which is
t1/t2 = (c + ωR)/(c − ωR)
Solving this for ωR gives
ωR = c[(t1/t2) − 1]/[(t1/t2) + 1]
Letting θ denote the angular travel of the loop during the travel of the two light beams, we have
θ = ω(t1 + t2) = 4πωRc/(c² − (ωR)²)
Substituting for ωR this reduces to
θ = π[(t1/t2) − (t2/t1)]
Therefore, the amount by which the ratio of travel times differs from 1 is exactly proportional to the angle through which the loop rotates during the transit of light, and this is true independent of R. (Of course, increasing the radius has the effect of increasing the difference between the travel times, but it doesn't alter the ratio.) It's worth emphasizing that the Sagnac effect is purely a classical, not a relativistic phenomenon, because it's a "differential device", i.e., by running the light rays around the loop in opposite directions and measuring the time difference, it effectively cancels out the "transverse" effects characteristic of truly relativistic phenomena. For example, the length of each incremental segment around the perimeter is shorter by a factor of √(1 − (v/c)²) in the hub-based frame than in its co-moving tangential frame, but this factor applies in both directions around the loop, so it doesn't affect the differential time. Likewise a clock on the perimeter moving at the speed v runs slow, in accord with special relativity, but the frequency of the light source is correspondingly slow, and this applies
equally in both directions, so this does not affect the phase difference at the receiver. Thus, a pure Sagnac apparatus does not discriminate between relativistic and pre-relativistic theories (although it does rule out ballistic theories, à la Ritz). Ironically, this is the main reason it comes up so often in discussions of relativity, because the effect can easily be computed on a non-relativistic basis (as we did above for a circular loop, taking the sums c+v and c−v to determine the transit times in the two directions). Of course, if the light traveling around the loop passes through moving media with indices of refraction differing significantly from unity, then the Fizeau effect must also be taken into account, and in this case the results, while again perfectly consistent with special relativity, are quite problematic for any non-relativistic ether-based interpretation.

As mentioned above, as early as 1904 Michelson had proposed using such a device to measure the rotation of the earth, but he hadn't pursued the idea, since measurements of absolute rotation are fairly commonplace (e.g., Foucault's pendulum). Nevertheless, he (along with Gale) agreed to perform the experiment in 1925 (at considerable cost) at the urging of "relativists", who wished him to verify the shift of 236/1000 of a fringe predicted by special relativity. This was intended mainly to refute the ballistic theory of light propagation, which predicts zero phase shift (for a circular device). Michelson was not enthusiastic, since classical optics on the assumption of a stationary ether predicted exactly the same shift as does special relativity (as explained above). He said

We will undertake this, although my conviction is strong that we shall prove only that the earth rotates on its axis, a conclusion which I think we may be said to be sure of already.

As Harvey Lemon wrote in his biographical sketch of Michelson, "The experiment, performed on the prairies west of Chicago, showed a displacement of 230/1000, in very close agreement with the prediction. The rotation of the Earth received another independent proof, the theory of relativity another verification. But neither fact had much significance." Michelson himself wrote that "this result may be considered as an additional evidence in favor of relativity - or equally as evidence of a stationary ether".

The only significance of the Sagnac effect for special relativity (aside from providing another refutation of ballistic theories) is that although the effect itself is of the first order in v/c, the qualitative description of the local conditions on the disk in terms of inertial coordinates depends on second-order effects. These effects have been confirmed empirically by, for example, the Michelson-Morley experiment. Considering the Earth as a particle on a large Sagnac device as it orbits around the Sun, the ether drift experiments demonstrate these second-order effects, confirming that the speed of light is indeed invariant in terms of relatively moving systems of inertial coordinates.

2.8 Refraction At A Plane Boundary Between Moving Media

Mathematicians usually consider the Rays of Light to be Lines reaching from the luminous Body to the Body illuminated, and the refraction of
those Rays to be the bending or breaking of those lines in their passing out of one Medium into another. And thus may Rays and Refractions be considered, if Light be propagated in an instant. But by an Argument taken from the Equations of the times of the Eclipses of Jupiter's Satellites, it seems that Light is propagated in time, spending in its passage from the Sun to us about seven Minutes of time: And therefore I have chosen to define Rays and Refractions in such general terms as may agree to Light in both cases.
Isaac Newton (Opticks), 1704

The ray angles θ1 and θ2 for incident and refracted optical rays at a plane boundary between regions of constant indices of refraction n1 and n2 are related according to Snell's law
n1 sin(θ1) = n2 sin(θ2)
However, this formula applies only if the media (which are assumed to have isotropic index of refraction with respect to their rest frames) are at rest relative to each other. If the media are in relative transverse motion, it is necessary to account for the effect of aberration on the ray angles relative to the rest frames of the respective media. The result is that the effective refraction is a function of the relative transverse velocity of the media. Thus, measurements of the optical refraction could (in principle) be used to determine the velocity of a moving volume of fluid. Unlike Doppler shift measurement techniques, this approach does not rely on the presence of discrete particles in the fluid, and involves only measurements of direct, rather than reflected, light signals. Since the amount of refraction at a boundary depends on the angle of incidence with respect to the rest frames of the media, it follows that if the media have different rest frames the simple form of Snell’s law does not apply directly, because it will be necessary to account for aberration. To derive the law of refraction for transversely moving media, consider the arrangement shown in Figure 1, drawn with respect to a system of coordinates (x,y,t) relative to which the medium with refractive index n1 is at rest.

In these coordinates the medium with index n2 is moving transversely with a speed v. By both Fermat's principle of "least time" and the principles of quantum electrodynamics, we know that the path of light from point P0 to point P2 is such that the travel time is stationary (which, in this case, means minimized), so if we express the total travel time as a function of the x coordinate of the "corner point" P1, we can differentiate to find the position that minimizes the time, and from this we can infer the angles of incidence and refraction. With respect to the xyt coordinates in which the n1 medium is at rest, the squared spatial distance from P0 to P1 is x1² + y1², so the time required for light to traverse that distance is
t1 − t0 = n1 √(x1² + y1²)
On the other hand, for the trip from point P1 to point P2 we need to know the distance traveled with respect to the coordinates x'y't' in which the n2 medium is at rest. If we define
Δx = x2 − x1    Δy = y2 − y1    Δt = t2 − t1
then the Lorentz transformation gives us the corresponding increments in the primed coordinates
Δx′ = (Δx − vΔt)/√(1 − v²)    Δy′ = Δy    Δt′ = (Δt − vΔx)/√(1 − v²)
Therefore, the squared spatial and temporal distances from P1 to P2 in the n2 rest coordinates are given by
(Δx′)² + (Δy′)² = (Δx − vΔt)²/(1 − v²) + (Δy)²    (Δt′)² = (Δt − vΔx)²/(1 − v²)
Since the ratio of these increments equals the square of the speed of light in the n2 medium, which is 1/n2², we have
[(Δx − vΔt)² + (1 − v²)(Δy)²] / (Δt − vΔx)² = 1/n2²
Solving this quadratic for Δt, which equals t2 − t1, gives
Δt = { −vΔx(n2² − 1) + √[ v²(Δx)²(n2² − 1)² + (1 − n2²v²)((n2² − v²)(Δx)² + n2²(1 − v²)(Δy)²) ] } / (1 − n2²v²)
Differentiating with respect to Δx, and noting that d(Δx)/dx1 = −1, we can minimize the total travel time t2 − t0 by adding the derivatives of Δt and t1 − t0 with respect to x1, and setting the result to zero. This leads to the condition

Making the substitutions

we arrive at the equation for refraction at the plane boundary between transversely moving media

As expected, this reduces to Snell’s law for stationary media if we set v = 0. Also, if the moving medium has a refractive index of n2 = 1, this equation again reduces to Snell’s law, regardless of the velocity, because the concept of speed doesn’t apply to the vacuum. If we define the parameter

then the refraction equation can be written more compactly as

This can be solved explicitly for sin(θ2) to give the result

with the appropriate sign for the square root. Taking n1 = 1.2 and n2 = 1.5, the figure below shows the angle of refraction θ2 as a function of the transverse speed v of the medium with various angles of incidence θ1 ranging from -3π/8 to +3π/8 radians.

Incidentally, when plotting these lines it is necessary to take the positive root when v is above the zero-crossing speed, and the negative root when v is below. The zero-crossing speed (i.e., the speed v when the refracted angle is zero) is

The figure shows that at high relative speeds and high angle of incidence we can achieve total internal reflection, even though the downstream medium is more dense than the upstream medium. The critical conditions occur when the squared quantity in parentheses in the preceding equation reaches 1, which implies

Solving these two quadratics for v (remembering that ψ2 is a function of v), we have the four distinguished speeds

The two speeds given by 1/n2 (which are just the speeds of light in the moving medium) generally correspond to removable singularities, because both the numerator and denominator of the expression for sin(θ2) vanish. At these speeds the values of θ2 can be assigned continuously as

It isn’t clear what, if any, optical effects would appear at these two removable singularities. The other two distinguished speeds represent the onset of total internal reflection if their values fall in the range from -1 to +1. For example, the figure above shows that total internal reflection for an incident angle of θ1 = 3π/8 with n1 = 1.2 and n2 = 1.5 begins when the speed v exceeds

Notice that for an incidence angle of zero, this speed is simply n2, which is ordinarily greater than 1, and thus outside the range of achievable speeds (since we assume the medium itself is moving through a vacuum). However, for non-zero angles of incidence it is possible for one of these two critical speeds to lie in the achievable range. In fact, for certain values of n1, n2, and θ1, it is possible for all four of the critical speeds to lie within the achievable range, leading to some interesting phenomena. For example, with n1 = n2 = 2.5 and with θ1 = 45 degrees, the refracted angle as a function of medium speed is as
shown below.

In this case the distinguished speeds are -0.4, +0.203, +0.4, and +0.783. This suggests that as the transverse speed of the medium increases from 0, the refracted ray becomes steeper until reaching 90 degrees at v = +0.203, at which point there is total internal reflection. This remains the case until achieving a speed of +0.783, at which point some refraction is re-introduced, and the refracted angle sweeps back from +90 to about +80 degrees (relative to the stationary frame), and then back to +90 degrees as speed continues to increase to 1. This can be explained in terms of the variations in the effective critical angle and the aberration angle. As speed increases, the effective critical angle for total internal reflection initially increases faster than the aberration angle, pushing the ray into total internal reflection. However, eventually (at close to the speed of light) the aberration effect brings the incident ray back into the refractive range. For an alternative derivation that leads to a different, but equivalent, relation, suppose the index of refraction of the stationary region is n1 = 1, which implies this region is a vacuum. If we let d1 denote the spatial distance from P0 to P1 with respect to the rest frame, then we have
x1 = d1 sin(θ1)    y1 = d1 cos(θ1)    t1 − t0 = d1
These are the components of the interval P0 to P1 with respect to the rest frame of n1, and they can be converted to the frame of n2 (denoted by upper case letters) using the Lorentz transformation
ΔX = (x1 − vt1)/√(1 − v²) = d1(sin θ1 − v)/√(1 − v²)
ΔY = y1 = d1 cos(θ1)
ΔT = (t1 − vx1)/√(1 − v²) = d1(1 − v sin θ1)/√(1 − v²)
Letting Θ1 denote the angle θ1 with respect to the moving n2 coordinate system, we can express the tangent of this angle as
tan(Θ1) = ΔX/ΔY = (sin θ1 − v)/[√(1 − v²) cos θ1]
Taking the sine of the inverse tangent of both sides gives the familiar aberration formula
sin(Θ1) = (sin θ1 − v)/(1 − v sin θ1)
Since we are assuming the n1 medium is a vacuum, we are free to treat the entire configuration as being at rest in the n2 coordinates, with the angle of incidence as defined above. Therefore, Snell’s law for stationary media can be applied to give the refracted angle relative to these coordinates
sin(Θ2) = sin(Θ1)/n2 = (sin θ1 − v)/[n2(1 − v sin θ1)]
Now, if D2 is the spatial distance from P1 to P2 with respect to the moving coordinates, we have
X2 − X1 = D2 sin(Θ2)    Y2 − Y1 = D2 cos(Θ2)    T2 − T1 = n2 D2
Also, the Lorentz transformation gives the coordinates of points P1 and P2 in the rest frame in terms of the coordinates in the moving frame as follows:
x2 − x1 = [(X2 − X1) + v(T2 − T1)]/√(1 − v²)    y2 − y1 = Y2 − Y1    t2 − t1 = [(T2 − T1) + v(X2 − X1)]/√(1 − v²)
From these we can construct the tangent of θ2 with respect to the rest coordinates
tan(θ2) = (x2 − x1)/(y2 − y1)
Substituting for the coordinate differences gives
tan(θ2) = [sin(Θ2) + n2 v]/[√(1 − v²) cos(Θ2)]
We saw previously that
sin(Θ2) = (sin θ1 − v)/[n2(1 − v sin θ1)]
so we can explicitly compute θ2 from θ1. It can be shown that this solution is identical to the solution (with n1 = 1) derived previously on the basis of Fermat's principle. Furthermore, we can solve these equations for sin(θ1) as a function of θ2 and then by equating this sin(θ1) with n3 sin(θ3) for a stationary medium neighboring the vacuum region, we again have the general solution for two refractive media in relative transverse motion. A plot of θ2 versus θ1 for various values of v is shown below:

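The chain of steps in this alternative derivation is easy to evaluate numerically. The Python sketch below (a minimal illustration, assuming the reconstructed relations above, with c = 1 and n1 = 1) aberrates the incident angle into the rest frame of the moving medium, applies ordinary Snell's law there, and transforms the refracted direction back:

    import math

    def refracted_angle(theta1, n2, v):
        # Aberrate the incidence angle into the medium's rest frame.
        s1 = (math.sin(theta1) - v) / (1 - v * math.sin(theta1))
        # Ordinary Snell's law in the medium's rest frame (n1 = 1).
        s2 = s1 / n2
        if abs(s2) > 1:
            return None                    # total internal reflection
        c2 = math.sqrt(1 - s2**2)
        # Transform the refracted ray direction back to the rest frame,
        # using dX = D2*sin(Theta2), dY = D2*cos(Theta2), dT = n2*D2.
        return math.atan((s2 + n2 * v) / (math.sqrt(1 - v**2) * c2))

    # With v = 0 this is just Snell's law: sin(theta2) = sin(30 deg)/1.5.
    print(math.degrees(refracted_angle(math.radians(30), 1.5, 0.0)))  # ~19.47
    # A transversely moving medium drags the refracted ray:
    print(math.degrees(refracted_angle(math.radians(30), 1.5, 0.2)))  # ~28.7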
2.9 Accelerated Travels

This yields the following peculiar consequence: If there are two synchronous clocks, and one of them is moved along a closed curve with constant [speed] until it has returned, then this clock will lag on its arrival behind the clock that has not been moved.
Albert Einstein, 1905

Suppose a particle accelerates in such a way that it is subjected to a constant proper acceleration a0 for some period of time. The proper acceleration of a particle is defined as the acceleration with respect to the particle's momentarily co-moving inertial coordinates at any given instant. The particle's velocity is v = 0 at the time t = 0, when it is located at x = 0, and at some infinitesimal time Δt later its velocity is a0Δt and its location is (1/2)a0(Δt)². The slope of its line of simultaneity is the inverse of the slope 1/v of its worldline, so its locus of simultaneity at t = Δt is the line given by
t = Δt + a0Δt [x − (1/2)a0(Δt)²]
This line intersects the particle's original locus of simultaneity at the point (x,0) where
x = (1/2)a0(Δt)² − 1/a0
At each instant the particle is accelerating relative to its current instantaneous frame of reference, so in the limit as Δt goes to zero we see that its locus of simultaneity constantly passes through the point (−1/a0, 0), and it maintains a constant absolute spacelike distance of 1/a0 from that point, as illustrated in the figure below.

This can be compared to a particle moving with a speed v tangentially to a center of attraction toward which it is drawn with a constant acceleration a0. The path of such a particle is a circle in space of radius v²/a0. Likewise in spacetime a particle moving with a speed c tangentially to a center of "repulsion" with a constant acceleration a0 traces out a hyperbola with a "radius" of c²/a0. (In this discussion we are using units with c=1, so the "radius" shown in the above figure is written as 1/a0.) Since the worldline of a particle with constant proper acceleration is a branch of a hyperbola with "radius" 1/a0, we can shift the x axis by 1/a0 to place the origin at the center of the hyperbola, and then write the equation of the worldline as
x² − t² = 1/a0²
Differentiating both sides with respect to t gives
2x(dx/dt) − 2t = 0
which shows that the velocity of the worldline at any point (x,t) is given by v = t/x. Consequently the line from the origin through any point on the hyperbolic path represents the space axis for the co-moving inertial coordinates of the accelerating worldline at that point. The same applies to any other hyperbolic path asymptotic to the same lightlines, so a line from the origin intersects any two such hyperbolas at points that are mutually simultaneous and separated by a constant proper distance (since they are both a fixed proper distance from the origin along their mutual space axis). It follows that in order for a slender "rigid" rod accelerating along its axis to maintain a constant proper length (with respect to its co-moving inertial frames), the parts of the rod must accelerate along a family of hyperbolas asymptotic to the same lightlines, as illustrated below.

The x',t' axes represent the mutual co-moving inertial frame of the hyperbolic worldlines
where they intersect with the x' axis. All the worldlines have constant proper distances from each other along this axis, and all have the same speed. The latter implies that they have each been accelerated by the same total amount at any instant of their mutual comoving inertial frame, but the accelerations have been distributed differently. The "innermost" worldline (i.e., the trailing end of the rod) has been subjected to a higher level of instantaneous acceleration but for a shorter time, whereas the "outer-most" worldline (i.e., the leading end of the rod) has been accelerated more mildly, but for a longer time. It's worth noting that this form of "coherent" acceleration would not occur if the rod were accelerated simply by pushing on one end. It would require the precisely coordinated application of distinct force profiles to each individual particle of the rod. Any deviation from these profiles would result in internal stresses of one part of the rod on another, and hence the rest length would not remain fixed. Furthermore, even if the coherent acceleration profiles are perfectly applied, there is still a sense in which the rod has not remained in complete physical equilibrium, because the elapsed proper times along the different hyperbolic worldlines as the rod is accelerated from a rest state in x,t to a rest state in some x',t' differ, and hence the quantum phases of the two ends of the rod are shifted with respect to each other. Thus we must assume memorylessness (as mentioned in Section 1.6) in order to assert the equivalence of the equilibrium states for two different frames of reference. We can then determine the lapse of proper time τ along any given hyperbolic worldline using the relation
(dτ)² = (dt)² − (dx)²
, which leads (for the hyperbola of unit "radius") to
dτ = dt/√(1 + t²)
Integrating this relation gives
τ = ln(t + √(1 + t²)) = arcsinh(t)
Solving this for t and substituting into the equation of the hyperbola to give x, we have the parametric equation of the hyperbola as a function of the proper time along the worldline. If we subtract 1/a0 from x to return to our original x coordinate (such that x = 0 at t = 0) these equations are
x(τ) = [cosh(a0τ) − 1]/a0    t(τ) = sinh(a0τ)/a0
Differentiating the above expressions gives
dx/dτ = sinh(a0τ)    dt/dτ = cosh(a0τ)
so the particle's velocity relative to the original inertial coordinates is
v = (dx/dτ)/(dt/dτ) = tanh(a0τ)
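These parametric relations make it easy to verify numerically that the accelerating particle's locus of simultaneity always passes through the fixed pivot event discussed earlier. A minimal Python sketch (units with c = 1, names illustrative):

    import math

    a0 = 2.0   # constant proper acceleration

    for tau in [0.1, 0.5, 1.0, 3.0]:
        x = (math.cosh(a0 * tau) - 1) / a0   # position at proper time tau
        t = math.sinh(a0 * tau) / a0         # coordinate time at proper time tau
        v = math.tanh(a0 * tau)              # velocity at proper time tau
        # The locus of simultaneity through (x, t) has slope v in the
        # (x, t) plane; find where it crosses the line t = 0.
        print(tau, x - t / v)                # always -1/a0 = -0.5

Each line of output shows the same intercept −1/a0, the pivot point through which all the loci of simultaneity pass.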
We're using "time units" throughout this section, which means that all times and distances are expressed in units of time. For example, if the proper acceleration of the particle is 1g (the acceleration of gravity at the Earth's surface), then g

= (3.27)10-8 sec-1

= 1.031 years-1

and all distances are in units of light-seconds. To show the implications of these formulas, suppose a space traveler moves away from the Earth with a constant proper acceleration of 1g for a period of T years as measured on Earth. He then reverses his acceleration, coming to rest after another T years has passed on Earth, and then continues his constant Earthward acceleration for another T Earthyears, at which point he reverses his acceleration again and comes to rest back at the Earth in another T Earth-years. The total journey is completed in 4T Earth-years, and it consists of 4 similar hyperbolic segments as illustrated below.

There are several questions we might ask about this journey. First, how far away from Earth does the traveler reach at his furthest point? This occurs at point C, which is at 2T according to Earth time, when the traveler's acceleration brings him momentarily to rest with respect to the Earth. To answer this question, recall that τ can be expressed as a function of t by
τ = arcsinh(gt)/g
Now, the maximum distance from Earth is twice the distance at point B, when t = T, so we have
x_max = 2[√(1 + (gT)²) − 1]/g
The maximum speed of the traveler in terms of the Earth's inertial coordinates occurs at point B, where t = T (and again at point D, where t = 3T), and so is given by
v_max = gT/√(1 + (gT)²)
The total elapsed proper time for the traveler during the entire journey out and back, which takes 4T years according to Earth time, is 4 times the lapse of proper time to point B at t = T, so it is given by
τ_total = 4 arcsinh(gT)/g
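To put numbers to these three results, here is a small Python sketch (variable names merely illustrative) using g = 1.031 years⁻¹:

    import math

    g = 1.031                  # 1g in units of 1/year (c = 1)

    def journey(T):
        # Out-and-back journey of four hyperbolic segments, each lasting
        # T years of Earth time, at constant proper acceleration g.
        max_distance = 2 * (math.sqrt(1 + (g * T)**2) - 1) / g   # light-years
        max_speed = g * T / math.sqrt(1 + (g * T)**2)            # fraction of c
        proper_time = 4 * math.asinh(g * T) / g                  # traveler years
        return max_distance, max_speed, proper_time

    print(journey(10.0))
    # => about 18.2 light-years out, a peak speed of about 0.9954c, and
    #    only about 11.8 years on the traveler's clock (versus 40 on Earth).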
So far we have focused mainly on a description of events in terms of the Earth's inertial coordinates x and t, but we can also describe the same events in terms of coordinate systems associated with the accelerating traveler. At any given instant the traveler is momentarily at rest with respect to a system of inertial coordinates, so we can define "proper" time and space measurements in terms of these coordinates. However, when we differentiate these time and space intervals as the traveler progresses along his worldline, we will find that new effects appear, due to the fact that the coordinate system itself is changing. As the traveler accelerates he continuously progresses from one system of momentarily co-moving inertial coordinates to another, and the effect of this change in the coordinates will show up in any derivatives that we take with respect to the time and space components. For example, suppose we ask how fast the Earth is moving relative to the traveler. This question can be interpreted in different ways. With respect to the traveler's momentarily co-moving inertial coordinates, the Earth's velocity is equal and opposite to the traveler's velocity with respect to the Earth's inertial coordinates. However, this quantity does not equal the derivative of the proper distance with respect to the proper time. The proper distance s from the Earth in terms of the traveler's momentarily co-moving inertial coordinates at the proper time τ is
s(τ) = (1/g)[1 − 1/cosh(gτ)]
which shows that the proper distance approaches a constant 1/g (about 1 light-year) as τ increases. This shouldn't be surprising, because we've already seen that the traveler's proper distance from a fixed point on the other side of the Earth actually is constant and equal to 1/g throughout the period of constant proper acceleration. The derivative of the proper distance of the Earth with respect to the proper time is
ds/dτ = sinh(gτ)/cosh²(gτ)
This can be regarded as a kind of velocity, since it represents the proper rate of change of the proper distance from the Earth as the traveler accelerates away. A plot of this function as τ varies from 0 to 6 years is shown below.

Initially the proper distance from the Earth increases as the traveler accelerates away, but eventually (if the constant proper acceleration is maintained for a sufficiently long time) the "length contraction" effect of his increasing velocity becomes great enough to cause the derivative to drop off to zero as the proper distance approaches a constant 1/g. To find the point of maximum ds/dτ we differentiate again with respect to τ to give
d²s/dτ² = g[1 − sinh²(gτ)]/cosh³(gτ)
Setting this to zero, we see that the maximum occurs at gτ = arcsinh(1) = ln(1 + √2), and substituting this into the expression for ds/dτ gives the maximum value of 1/2. Thus the derivative of proper distance from Earth with respect to proper time during a constant 1g acceleration away from the Earth reaches a maximum of half the speed of light at a proper time of about 0.856 years, after which it drops to zero.

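Taking the expression for s(τ) above at face value, this stated maximum is easy to confirm numerically; a brief sketch:

    import math

    g = 1.031   # 1g in 1/year (c = 1)

    def ds_dtau(tau):
        # derivative of s(tau) = (1/g)(1 - 1/cosh(g*tau))
        return math.sinh(g * tau) / math.cosh(g * tau)**2

    taus = [i * 0.001 for i in range(1, 6000)]       # scan 0 to 6 years
    tau_star = max(taus, key=ds_dtau)
    print(tau_star, ds_dtau(tau_star))               # ~0.855 years, ~0.500
    print(math.log(1 + math.sqrt(2)) / g)            # the predicted location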
Similarly, the traveler's proper distance S from the turnaround point is given by
S(τ) = (D + 1/g)/cosh(gτ) − 1/g, where D is the initial distance to the turnaround point
The derivative of this with respect to the traveler's proper time is
dS/dτ = −g(D + 1/g) sinh(gτ)/cosh²(gτ)
A plot of this "velocity" is shown below for the first quartile leg of a journey as described above with T = 20 years.

The magnitude of this "velocity" increases rapidly at the start of the acceleration, due to the combined effects of the traveler's motion and the onset of "length contraction", but if allowed to continue long enough the "velocity" drops off and approaches 2 (i.e., twice the speed of light) at the point where the traveler reverses his acceleration. Of course, the fact that this derivative exceeds c does not conflict with the fact that c is an upper limit on velocities with respect to inertial coordinate systems, because S and τ do not constitute inertial coordinates. To find the extreme point on this curve we differentiate again with respect to τ, which gives
d²S/dτ² = −g²(D + 1/g)[1 − sinh²(gτ)]/cosh³(gτ)
Consequently we see that the extreme value occurs (assuming the journey is long enough and the acceleration is great enough) at the proper time τ = ln(1 + √2)/g, where the value of dS/dτ is −g(D + 1/g)/2.
By symmetry, these same two characteristics apply to all four of the "quadrants" of the traveler's journey, with the appropriate changes of sign and direction. The figure below shows the proper distances s(t) and S(t) (i.e., the distances from the origin and the destination respectively) during the first two quadrants of a journey with T = 6.

By symmetry we see that the portions of these curves to the right of the mid-point can be generated from the relation s(τ) = S(τC − τ), where τC denotes the traveler's proper time at the mid-point. Also, it's obvious that
s(τ) + S(τ) = D/cosh(gτ)
If we consider journeys with non-constant proper accelerations, it's possible to construct some slightly peculiar-sounding scenarios. For example, suppose the traveler accelerates in such a way that his velocity is 1 − exp(−kt) for some constant k. It follows that the distance in the Earth's frame at time t is [kt + exp(−kt) − 1]/k, so the distance in the traveler's frame is
D′(t) = {[kt + exp(−kt) − 1]/k} √(2exp(−kt) − exp(−2kt))
This function initially increases, then reaches a maximum, and then asymptotically approaches zero. With k = 1 year⁻¹ the maximum occurs at roughly 3 years and a distance of about 0.65 light-years (relative to the traveler's frame). Thus we have the seemingly paradoxical situation that the Earth "becomes closer" to the traveler as he moves further away. This is not as strange as it may sound at first. Suppose we leave home and drive for 1 hour at a constant speed of 20 mph. We could then say that we are "1 hour from home". Now suppose we suddenly accelerate to 40 mph. How far (in time) are we away from home? If we extrapolate our current worldline back in time, we are only 1/2 hour from home. If we speed up some more, our "distance" (in terms of time) from home becomes less and less. Of course, we have to speed up at a rate that more than compensates for the increasing road distance, but that's not hard to do (in theory). The only difference between this scenario and the relativistic one is that when we accelerate to relativistic speeds both our time and our space axes are affected, so when we extrapolate our current frame of reference back to Earth we find that both the time and the distance are shortened.

Another interesting acceleration profile is the one that results from a constant nozzle velocity u and constant exhaust mass flow rate w = −dm0/dτ, where τ is the proper time of the rocket; the effective proper force is then uw throughout the acceleration. This does not result in constant proper acceleration, because the rest mass of the rocket is being reduced while the applied proper force remains constant. In this case we have
uw = m0(τ) (dv/dt)/(1 − v²)^(3/2)
where t is the time of the initial coordinates and v is the velocity of the rocket with respect to those coordinates. Also, we have m0(τ) = m0(0) − wτ, so we can integrate to get the speed
tanh⁻¹(v) = u ln[m0(0)/(m0(0) − wτ)]
Letting ρ(τ) denote the ratio [m0(0) − wτ]/m0(0), which is the ratio of the rest mass at proper time τ to the rest mass at the start of the acceleration, the result is
v = [1 − ρ^(2u)]/[1 + ρ^(2u)]
so we have
√(1 − v²) = 2ρ^u/(1 + ρ^(2u))
Also, since dt = dτ/√(1 − v²), we can integrate this to get the coordinate time t as a function of the rocket's proper time
t = [m0(0)/(2w)] { [1 − ρ^(1−u)]/(1 − u) + [1 − ρ^(1+u)]/(1 + u) }
In the limit as the nozzle velocity u approaches 1, this expression reduces to
t = [m0(0)/(2w)] [ ln(1/ρ) + (1 − ρ²)/2 ]
It's interesting that for photonic propulsion (u=1) the mass ratio r is identical to the Doppler frequency shift of the exhaust photons relative to the original rest frame, i.e., we have
ρ = √[(1 − v)/(1 + v)]
Thus if the rocket continues to convert its own mass to energy and eject it as photons of a fixed frequency, the energy of each photon as seen from the fixed point of origin is exactly proportional to the rest mass of the rocket at the moment when the photon was ejected. Also, since ρ(t) is the current rest mass m0(t) divided by the original rest mass m0(0), and since the inertial mass m(t) is related to the rest mass m0(t) by the equation m(t) = m0(t)/√(1 − v²), we find that the inertial mass m(t) of the rocket is given as a function of the rocket's velocity v by the equation
m(t) = m0(0)/(1 + v)
Thus we find that as the rocket's velocity goes to 1 at the moment when it is converting the last of its rest mass into energy, so its rest mass is going to zero, its inertial mass goes to m0(0)/2, i.e., exactly half of the rocket's original rest mass. This is to be expected, because momentum must be conserved, and all the photons except the very last have been ejected in the rearward direction at the speed of light, leaving only the last remaining photon (which has nothing to react against) moving in the forward direction, so it must have momentum equal to all the rearward momentum of the ejected photons. The momentum of a photon is p = hν/c = E/c, so in units with c = 1 we have p = E. The original energy content of the rocket was its rest mass, m0(0), which has been entirely
converted to energy, half in the forward direction (in the last remaining super-energetic photon) and half in the rearward direction (the progressively more redshifted stream of exhaust photons).

The preceding discussion focused on purely linear motion, but we can just as well consider arbitrary accelerated paths. It's trivial to determine the lapse of proper time along any given timelike path as a function of an inertial time coordinate simply by integrating dτ over the path, but it's a bit more challenging to express the lapse of proper time along one arbitrary worldline with respect to the lapse of proper time along another, because the appropriate correspondence is ambiguous. Perhaps the most natural correspondence is given by mapping the proper time along the reference worldline to the proper time along the subject worldline by means of the instantaneously co-moving planes of inertial simultaneity of the reference worldline. In other words, to each point along the reference worldline we can assign a locus of simultaneous points based on co-moving inertial coordinates at that point, and we can then find the intersections of these loci with the subject worldline. Quantitatively, suppose the reference worldline W1 is given parametrically by the functions x1(t), y1(t), z1(t) where x,y,z,t are inertial coordinates. From this we can determine the derivatives ẋ1 = dx1/dt, ẏ1 = dy1/dt, and ż1 = dz1/dt. These also represent the components of the gradient of the space of simultaneity of the instantaneously co-moving inertial frame of the object. In other words, the spaces of simultaneity for W1 have the partial derivatives
∂t/∂x = ẋ1    ∂t/∂y = ẏ1    ∂t/∂z = ż1
These enable us to express the total differential time as a function of the differentials of the spatial coordinates
dt = ẋ1 dx + ẏ1 dy + ż1 dz
If the subject worldline W2 is expressed parametrically by the functions x2(t), y2(t), z2(t), and if the inertial plane of simultaneity of the event at coordinate time t1 on W1 is intersected by W2 at the coordinate time t2, then the difference in coordinate times between these two events can be expressed in terms of the differences in their spatial coordinates by substituting into the above total differential the quantities dt = t2 − t1, dx = x2(t2) − x1(t1) and so on. The result is
t2 − t1 = ẋ1[x2(t2) − x1(t1)] + ẏ1[y2(t2) − y1(t1)] + ż1[z2(t2) − z1(t1)]
where the derivatives of x1, y1, and z1 are evaluated at t1. Rearranging terms and omitting the indications of functional dependence for the W1 coordinates, this can be written in the form
t2 − ẋ1 x2(t2) − ẏ1 y2(t2) − ż1 z2(t2) = t1 − ẋ1 x1 − ẏ1 y1 − ż1 z1
This is an implicit formula for the value of t2 on W2 corresponding to t1 on W1 based on the instantaneous inertial simultaneity of W1. Every quantity in this equation is an explicit function of either t1 or t2, so we can solve for t2 to give a function F1 such that t2 = F1(t1). We can also integrate the absolute intervals along the two worldlines to give the functions f1 and f2 which relate the proper times along W1 and W2 to the coordinate time, i.e., we have τ1 = f1(t) and τ2 = f2(t). With these substitutions we arrive at the general form of the expression for τ2 with respect to τ1:
τ2 = f2(F1(f1⁻¹(τ1)))
To illustrate, suppose W1 is the worldline of a particle moving along some arbitrary path and W2 is just the worldline of the spatial origin of the inertial coordinates. In this case we have x2 = y2 = z2 = 0 and τ2 = t2, so the above formula reduces to
τ2 = t2 = t1 − (ẋ1 x1 + ẏ1 y1 + ż1 z1) = t1 − r·v
where r and v are the position and velocity vectors of W1 with respect to the inertial rest coordinates of W2. Differentiating with respect to t1, and multiplying through by dt1/dτ1 = 1/√(1 − v²), we get
dτ2/dτ1 = [1 − v² − |r||a| cos(θ)]/√(1 − v²)
where a is the acceleration vector and θ is the angle between the r and a vectors. Thus if the acceleration of W1 is zero, we have dτ2/dτ1 = √(1 − v²). On the other hand, if W1 is moving in a circle at constant speed around the spatial origin, the acceleration has magnitude v²/r and points from the object toward the origin, so r·a = −v², giving the result dτ2/dτ1 = 1/√(1 − v²). This is consistent with the fact that, if the object is moving tangentially, the plane of simultaneity for its instantaneously co-moving inertial coordinate system intersects with the constant-t plane along the line from the object to the origin, and hence the time difference is entirely due to the transverse dilation (i.e., the square root of 1 − v² factor). If the speed v of W1 is constant, then we have the explicit equation
τ2 = τ1/√(1 − v²) − r·v
To illustrate, suppose the object whose worldline is W1 begins at the origin at t = 0 and thereafter moves counter-clockwise in a circle tangent to the origin in the xy plane with a constant angular velocity ω as illustrated below.

In this case the object's spatial coordinates and their derivatives as a function of coordinate time are
x(t) = R sin(ωt)    y(t) = R[1 − cos(ωt)]
ẋ(t) = Rω cos(ωt)    ẏ(t) = Rω sin(ωt)
Substituting into the equation for τ2 and replacing each appearance of t with τ1/√(1 − v²) gives the result
τ2 = τ1/√(1 − v²) − Rv sin(ωτ1/√(1 − v²))
This is the proper time of the spatial origin according to the instantaneous time slices of the moving object's proper time. This function is plotted below with R = 1 and v = 0.8. Also shown is the stable component τ1/√(1 − v²).

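The closed-form mapping can also be checked against the direct geometric construction, computing t2 = t1 − r·v from the parametric circle and converting to proper times. A minimal Python sketch (assuming the coordinates given above, with c = 1):

    import math

    R, v = 1.0, 0.8
    omega = v / R
    gamma = 1 / math.sqrt(1 - v**2)

    def tau2_geometric(tau1):
        t1 = gamma * tau1                       # W1 moves at constant speed v
        x = R * math.sin(omega * t1)
        y = R * (1 - math.cos(omega * t1))
        vx = v * math.cos(omega * t1)
        vy = v * math.sin(omega * t1)
        return t1 - (x * vx + y * vy)           # t2 = t1 - r.v, and tau2 = t2

    def tau2_formula(tau1):
        return gamma * tau1 - R * v * math.sin(omega * gamma * tau1)

    for tau1 in [0.5, 1.0, 2.0]:
        print(tau2_geometric(tau1), tau2_formula(tau1))   # the values agree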
Naturally if the circle radius R goes to infinity the value of the sine function approaches the argument, and so the above expression reduces to
τ2 = τ1 √(1 − v²)
This confirms the reciprocity between the two worldlines when both are inertial. We can also differentiate the full expression for τ2 as a function of τ1 to give the relation between the differentials
dτ2/dτ1 = [1 − v² cos(ωτ1/√(1 − v²))]/√(1 − v²)
This relation is plotted in the figure below, again for R = 1 and v = 0.8.

It's also clear from this expression that as R goes to infinity the cosine approaches 1, and we again have dτ2/dτ1 = √(1 − v²).

Incidentally, the above equation shows that the ratio of time rates equals 1 when the moving object is a circumferential distance of
R cos⁻¹[(1 − √(1 − v²))/v²]
from the point of tangency. Hence, for small velocities v the configuration of "equal time rates" occurs when the moving object is at π/3 radians from the point of tangency. On the other hand, as v approaches 1, the configuration of equal time rates occurs when the moving object approaches the point of tangency. This may seem surprising at first, because we might expect the proper time of the origin to be dilated with respect to the proper time of the tangentially moving object. However, the planes of simultaneity of the moving object are tilting very rapidly in this condition, and this offsets the usual time dilation factor. As v approaches 1, these two effects approach equal magnitude, and cancel out for a location approaching the point of tangency.
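The location of this "equal time rates" configuration follows from setting dτ2/dτ1 = 1 in the differential relation above; a short numerical sketch:

    import math

    def equal_rate_angle(v):
        # Angle from the point of tangency (circumferential distance
        # divided by R) at which d(tau2)/d(tau1) = 1, using
        # cos(theta) = (1 - sqrt(1 - v^2))/v^2.
        return math.acos((1 - math.sqrt(1 - v**2)) / v**2)

    print(equal_rate_angle(0.01))    # ~1.047 = pi/3 for small speeds
    print(equal_rate_angle(0.9999))  # ~0.17, approaching zero as v -> 1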

2.10 The Starry Messenger

"Let God look and judge!"
Cardinal Humbert, 1054 AD

Maxwell's equations are very successful at describing the propagation of light based on the model of electromagnetic waves, not only in material media but also in a vacuum, which is considered to be a region free of material substances. According to this model, light propagates in vacuum at a speed c = 1/√(µ0ε0), where µ0 is the permeability constant and ε0 is the permittivity of the vacuum, defined in terms of Coulomb's law for electrostatic force
F = q1q2/(4πε0r²)
The SI system of units is defined so that the permeability constant takes on the value µ0 = 4π×10⁻⁷ tesla meter per ampere, and we can measure the value of the permittivity (typically by measuring the capacitance C between parallel plates of area A separated by a distance d, using the relation ε0 = Cd/A) to have the value ε0 = (8.854187818)×10⁻¹² coulombs² per newton meter². This leads to the familiar value
c = 1/√(µ0ε0) = 2.998×10⁸ meters per second
for the speed of light in a vacuum. Of course, if we place some substance between our capacitors when determining ε0 we will generally get a different value, so the speed of light is different in various media. This leads to the index of refraction of various transparent media, defined as n = c_vacuum/c_medium. Thus Maxwell's theory of electromagnetism seems to clearly imply that the speed of propagation of such electromagnetic waves depends only on the medium, and is independent of the speed of the source. On the other hand, it also suggests that the speed of light depends on the motion of the medium, which is easy to imagine in the case of a material medium like glass, but not so easy if the "medium" is the vacuum of empty space. How can we even assign a state of motion to the vacuum? In struggling to answer this question, people tried to imagine that even the vacuum is permeated with some material-like substance, the ether, to which a definite state of motion could be assigned. On this basis it was natural to suppose that Maxwell's equations were strictly applicable (and the speed of light was exactly c) only with respect to the absolute rest frame of the ether. With respect to other frames of reference they expected to find that the speed of light differed, depending on the direction of travel. Likewise we would expect to find corresponding differences and anisotropies in the capacitance of the vacuum when measured with plates moving at high speed relative to the ether. However, when extremely precise interferometer measurements were carried out to find a directional variation in the speed of light on the Earth's surface (presumably moving through the ether at fairly high speed due to the Earth's rotation and its orbital motion around the Sun), essentially no directional variation in light speed was found that could be attributed to the motion of the apparatus through the ether. Of course, it had occurred to people that the ether might be "dragged along" by the Earth, so that objects on the Earth's surface are essentially at rest in the local ether. However, these "convection" hypotheses are inconsistent with other observed phenomena, notably the aberration of starlight, which can only be explained in an ether theory if it is assumed that an observer on the Earth's surface is not at rest with respect to the local ether. Also, careful terrestrial

measurements of the paths of light near rapidly moving massive objects showed no sign of any "convection". Considering all this, the situation was quite puzzling.

There is a completely different approach that could be taken to modeling the phenomena of light, provided we're willing to reject Maxwell's theory of electromagnetic waves, and adopt instead a model similar to the one that Newton often seemed to have in mind, namely, an "emission theory". One advocate of such a theory in the early 1900s was Walter Ritz, who rejected Maxwell's equations on the grounds that the advanced potentials allowed by those equations were unrealistic. Ritz debated this point with Albert Einstein, who argued that the observed asymmetry between advanced and retarded waves is essentially statistical in origin, due to the improbability of conditions needed to produce coherent advanced waves. Neither man persuaded the other. (Ironically, Einstein himself had already posited that Maxwell's equations were inadequate to fully represent the behavior of light, and suggested a model that contains certain attributes of an emission theory to account for the photo-electric effect, but this challenge to Maxwell's equations was on a more subtle and profound level than Ritz's objection to advanced potentials.)

In place of Maxwell's equations and the electromagnetic wave model of light, the advocates of "emission theories" generally assume a Galilean or Newtonian spacetime, and postulate that light is emitted and propagates away from the source (perhaps like Newtonian corpuscles) at a speed of c relative to the source. Thus, according to emission theories, if the source is moving directly toward or away from us with a speed v, then the light from that source is approaching us with a speed c+v or c−v respectively. Naturally this class of theories is compatible with experiments such as the one performed by Michelson and Morley, since the source of the light is moving along with the rest of the apparatus, so we wouldn't expect to find any directional variation in the speed of light in such experiments. Also, an emission theory of light is compatible with stellar aberration, at least up to the limits of observational resolution. In fact, James Bradley (the discoverer of aberration) originally explained it on this very basis.

Of course, even an emission theory must account for the variations in light speed in different media, which means it can't simply say that the speed of light depends only on the speed of the source. It must also be dependent on the medium through which it is traveling, and presumably it must have a "terminal velocity" in each medium, i.e., a certain characteristic speed that it can maintain indefinitely as it propagates through the medium. (Obviously we never see light come to rest, nor even do we observe noticeable "slowing" of light in a given medium, so it must always exhibit a characteristic speed.) Furthermore, based on the principles of an emission theory, the medium-dependent speed must be defined relative to the rest frame of the medium. For example, if the characteristic speed of light in water is cw, and a body of water is moving relative to us with a speed v, then (according to an emission theory) the light must move with a speed cw + v relative to us when it travels for some significant distance through that water, so that it has reached its "steady-state" speed in the water. In optics

this distance is called the "extinction distance", and it is known to be proportional to 1/(ρλ), where ρ is the density of the medium and λ is the wavelength of light. The extinction distance for most common media for optical light is extremely small, so essentially the light reaches its steady-state speed as soon as it enters the medium.

An experiment performed by Fizeau in 1851 to test for optical "convection" also sheds light on the viability of emission theories. Fizeau sent beams of light in both directions through a pipe of rapidly moving water to determine if the light was "dragged along" by the water. Since the refractive index of water is about n = c/cw = 1.33 where cw is the speed of light in water, we know that cw equals c/1.33, which is about 75% of the speed of light in a vacuum. The question is, if the water is in motion relative to us, what is the speed (relative to us) of the light in the water? If light propagates in an absolutely fixed background ether, and isn't dragged along by the water at all, we would expect the light speed to still be cw relative to the fixed ether, regardless of how the water moves. This is admittedly a rather odd hypothesis (i.e., that light has a characteristic speed in water, but that this speed is relative to a fixed background ether, independent of the speed of the water), but it is one possibility that can't be ruled out a priori. In this case the difference in travel times for the two directions would be proportional to

    1/cw − 1/cw = 0

which implies no phase shift in the interferometer. On the other hand, if emission theories are right, the speed of the light in the water (which is moving at the speed v) should be cw + v in the direction of the water's motion, and cw − v in the opposite direction. On this basis the difference in travel times would be proportional to

    1/(cw − v) − 1/(cw + v) = 2v/(cw² − v²)

This is a very small amount (remembering that cw is about 75% of the speed of light in a vacuum), but it is large enough that it would be measurable with delicate interferometry techniques. The results of Fizeau's experiment turned out to be consistent with neither of the above predictions. Instead, he found that the time difference (proportional to the phase shift) was a bit less than 43.5% of the prediction for an emission theory (i.e., of the prediction based on the assumption of complete convection). By varying the density of the fluid we can vary the refractive index and therefore cw, and we find that the measured phase shift always indicates a time difference of (1 − cw²) times the prediction of the emission theory. For water we have cw = 0.7518, so the time lag is (1 − cw²) = 0.4346 of the emission theory prediction.
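It is a useful check that the relativistic composition of speeds (derived just below) reproduces Fizeau's fraction exactly, for any water speed v. A minimal sketch in Python (speeds in units where c = 1; the flow speed is an arbitrary small value, since the ratio turns out to be independent of v):

    cw = 0.7518                       # speed of light in water, units of c (n = 1.33)
    v = 1e-6                          # water flow speed, units of c (any small value)

    emission = 1/(cw - v) - 1/(cw + v)          # emission-theory time-difference factor
    S = lambda u, w: (u + w)/(1 + u*w)          # relativistic composition of speeds
    relativistic = 1/S(cw, -v) - 1/S(cw, v)     # relativistic time-difference factor

    print(relativistic/emission)      # = 1 - cw**2 = 0.4348..., Fizeau's measured fraction
    print(1 - cw**2)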

This implies that if we let S(cw,v) and S(cw,−v) denote the speeds of light in the two directions, we have

    1/S(cw,−v) − 1/S(cw,v) = (1 − cw²) [1/(cw − v) − 1/(cw + v)]

By partial fraction decomposition this can be written in the form

    1/S(cw,−v) − 1/S(cw,v) = A/(cw − v) − B/(cw + v)

where

    cw(A − B) + v(A + B) = 2v(1 − cw²)

Also, in view of the symmetry S(u,v) = S(v,u), we can swap cw with v to give

    v(A − B) + cw(A + B) = 2cw(1 − v²)

Solving these last two equations for A and B gives A = 1 − vcw and B = 1 + vcw, so the function S is

    S(cw,v) = (cw + v)/(1 + cw v)

which of course is the relativistic formula for the composition of velocities. So, even if we rejected Maxwell's equations, it still appears that emission theories cannot be reconciled with Fizeau's experimental results.

More evidence ruling out simple emission theories comes from observations of a supernova made by Chinese astronomers in the year 1054 AD. When a star explodes as a supernova, the initial shock wave moves outward through the star's interior in just seconds, and elevates the temperature of the material to such a high level that fusion is initiated, and much of the lighter elements are fused into heavier elements, including some even heavier than iron. (This process yields most of the interesting elements that we find in the world around us.) Material is flung out at high speeds in all directions, and this material emits enormous amounts of radiation over a wide range of frequencies, including x-rays and gamma rays. Based on the broad range of spectral shifts (resulting from the Doppler effect), it's clear that the sources of this radiation have a range of speeds relative to the Earth of over 10000 km/sec. This is because we are receiving light emitted by some material that was flung out from the supernova in the direction away from the Earth, and by other material that was flung out in the direction toward the Earth. If the supernova was located a distance D from us, then the time for the "light" (i.e., EM radiation of all frequencies) to reach us should be roughly D/c, where c is the speed of

light. However, if we postulate that the actual speed of the light as it travels through interstellar space is affected by the speed of the source, and if the source was moving with a speed v relative to the Earth at the time of emission, then we would conclude that the light traveled at a speed of c+v on its journey to the Earth. Therefore, if the sources of light have velocities ranging from −v to +v, the first light from the initial explosion to reach the Earth would arrive at the time D/(c+v), whereas the last light from the initial explosion to reach the Earth would arrive at D/(c−v). This is illustrated in the figure below.

Hence the arrival times for light from the initial explosion event would be spread out over an interval of length D/(c−v) − D/(c+v), which equals (D/c)(2v/c) / (1 − (v/c)²). The denominator is virtually 1, so we can say the interval of arrival times for the light from the explosion event of a supernova at a distance D is about (D/c)(2v/c), where v is the maximum speed at which radiating material is flung out from the supernova.

However, in actual observations of supernovae we do not see this "spreading out" of the event. For example, the Crab supernova was about 6000 light years away, so we had D/c = 6000 years, and with a range of source speeds of 10000 km/sec (meaning v = 5000) we would expect a range of arrival times of 200 years, whereas in fact the Crab was only bright for less than a year, according to the observations recorded by Chinese astronomers in July of 1054 AD. For a few weeks the "guest star", as they called it, in the constellation Taurus was the brightest star in the sky, and was even visible in the daytime for twenty-six days. Within two years it had disappeared completely to the naked eye. (It was not visible in Europe or the Islamic countries, since Taurus is below the horizon of the night sky in July for northern latitudes.) In the time since the star went supernova the debris has expanded to its present dimensions of about 3 light years, which implies that this material was moving at only (!) about 1/300 the speed of light. Still, even with this value of v, the bright explosion event should have been visible on Earth for about 40 years (if the light really moved through space at c ± v). Hence we can conclude that the light actually propagated through space at a speed essentially independent of the speed of the sources.
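A quick check of these figures (a minimal sketch in Python, with source speeds expressed as fractions of c):

    D_over_c = 6000.0                    # light-travel time to the Crab, years

    def spread(v):
        # arrival-time window D/(c-v) - D/(c+v), for source speeds of +/- v
        return D_over_c*2*v/(1 - v*v)

    print(spread(5000/300000))           # +/- 5000 km/sec -> about 200 years
    print(spread(1/300))                 # +/- (1/300)c    -> about 40 years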

However, although this source independence of light speed is obviously consistent with Maxwell's equations and special relativity, we should be careful not to read too much into it. In particular, this isn't direct proof that the speed of light in a vacuum is independent of the speed of the source, because for visible light (which is all that was noted on Earth in July of 1054 AD) the extinction distance in the gas and dust of interstellar space is much less than the 6000 light year distance of the Crab nebula. In other words, for visible light, interstellar space is not a vacuum, at least not over distances of many light years. Hence it's possible to argue that even if the initial speed of light in a vacuum was c+v, it would have slowed to c for most of its journey to Earth. Admittedly, the details of such a counter-factual argument are lacking (because we don't really know the laws of propagation of light in a universe where the speed of light is dependent on the speed of the source, nor how the frequency and wavelength would be altered by interaction with a medium, so we don't know if the extinction distance is even relevant), but it's not totally implausible that the static interstellar dust might affect the propagation of light in such a way as to obscure the source dependence, and the extinction distance seems a reasonable way of quantifying this potential effect.

A better test of the source-independence of light speed based on astronomical observations is to use light from the high-energy end of the spectrum. As noted above, the extinction distance is proportional to 1/(ρλ). For some frequencies of x-rays and gamma rays the extinction distance in interstellar space is about 60000 light years, much greater than the distances to many supernova events, as well as binary stars and other configurations with identifiable properties. By observing these events and objects it has been found that the arrival times of light are essentially independent of frequency, e.g., the x-rays associated with a particular identifiable event arrive at the same time as the visible light for that event, even though the distance to the event is much less than the extinction distance for x-rays. This gives strong evidence that the speed of light in a vacuum is actually invariant and independent of the motion of the source.

With the aid of modern spectroscopy we can now examine supernova events in detail, and it has been found that they exhibit several characteristic emission lines, particularly the signature of atomic hydrogen at 6563 angstroms. Using this as a marker we can determine the Doppler shift of the radiation, from which we can infer the speed of the source. The energy emitted by a star going supernova is comparable to all the energy that it emitted during millions or even billions of years of stable evolution. Three main categories of supernovae have been identified, depending on the mass of the original star and how much of its "nuclear fuel" remains. In all cases the maximum luminosity occurs within just the first few days, and drops by 2 or 3 magnitudes within a month, and by 5 or 6 magnitudes within a year. Hence we can conclude that the light actually propagated through empty space at a speed essentially independent of the speed of the sources.

Another interesting observation involving the propagation of light was first proposed in 1913 by DeSitter. He wondered whether, if we assume the speed of light in a vacuum is always c with respect to the source, and if we assume a Galilean spacetime, we would notice anything different in the appearances of things. He considered the appearance of binary star systems, i.e., two stars that orbit around each other.
More than half of all the visible stars in the night sky are actually double stars, i.e., two stars orbiting each other, and the elements of their orbits may be inferred from spectroscopic measurements of their radial speeds as seen from the Earth. DeSitter's basic idea was that if two stars are orbiting each other and we are observing them from the plane of their mutual orbit, the

stars will be sometimes moving toward the Earth rapidly, and sometimes away. According to an emission theory this orbital component of velocity should be added to or subtracted from the speed of light. As a result, over the hundreds or thousands of years that it takes the light to reach the Earth, the arrival times of the light from approaching and receding sources would be very different.

Now, before we go any further, we should point out a potential difficulty for this kind of observation. The problem (again) is that the "vacuum" of empty space is not really a perfect vacuum, but contains small and sparse particles of dust and gas. Consequently it acts as a material and, as noted above, light will reach its steady-state velocity with respect to that interstellar dust after having traveled beyond the extinction distance. Since the extinction distance for visible light in interstellar space is quite short, the light will be moving at essentially c for almost its entire travel time, regardless of the original speed. For this reason, it's questionable whether visual observations of celestial objects can provide good tests of emission theory predictions. However, once again we can make use of the high-frequency end of the spectrum to strengthen the tests. If we focus on light in the frequency range of, say, x-rays and gamma rays, the extinction distance is much larger than the distances to many binary star systems, so we can carry out DeSitter's proposed observation (in principle) if we use x-rays, and this has actually been done by Brecher in 1977.

With the proviso that we will be focusing on light whose extinction distance is much greater than the distance from the binary star system to Earth (making the speed of the light simply c plus the speed of the star at the time of emission), how should we expect a binary star system to appear? Let's consider one of the stars in the binary system, and write its coordinates and their derivatives as

    x(t) = D + R cos(wt)        y(t) = R sin(wt)

    ẋ(t) = −Rw sin(wt)         ẏ(t) = Rw cos(wt)

where D is the distance from the Earth to the center of the binary star system, R is the radius of the star's orbit about the system's center, and w is the angular speed of the star. We also have the components of the emissive light speed, c² = cx² + cy². In these terms we can write the components of the absolute speed of the light emitted from the star at time t:

    cx + ẋ(t)        cy + ẏ(t)

Now, in order to reach the Earth at time T, the light emitted at time t must travel in the x direction from x(t) to 0 at the rate cx + ẋ(t) for a time Δt = T − t, and similarly for the y direction. Hence we have

    x(t) + [cx + ẋ(t)] Δt = 0        y(t) + [cy + ẏ(t)] Δt = 0

Substituting for x, y, and the light speed derivatives ẋ(t) = −Rw sin(wt), ẏ(t) = Rw cos(wt), we have

    [D + R cos(wt)] + [cx − Rw sin(wt)] Δt = 0

    R sin(wt) + [cy + Rw cos(wt)] Δt = 0

Squaring both sides of both equations, and adding the resulting equations together (noting that cx² + cy² = 1 in units where c = 1), gives

    [D + R cos(wt) − Rw sin(wt) Δt]² + [R sin(wt) + Rw cos(wt) Δt]² = Δt²

Re-arranging terms gives the quadratic in Δt

    (1 − R²w²) Δt² + 2DRw sin(wt) Δt − [D² + 2DR cos(wt) + R²] = 0
If we define the normalized parameters

then the quadratic in Δt becomes

Solving this quadratic for Δt = T − t and then adding t to both sides gives the arrival time T on Earth as a function of the emission time t on the star

If the star's speed v is much less than the speed of light, this can be expressed very nearly as

    T ≈ t + d + r cos(wt) − dv sin(wt)

The derivative of T with respect to t is

    dT/dt ≈ 1 − v sin(wt) − (dv²/r) cos(wt)

and this takes its minimum value when t = 0, where we have

    dT/dt = 1 − dv²/r

Consequently we find the DeSitter effect, i.e., dT/dt goes negative if d > r/v². Now, we know from Kepler's third law (which also applies in relativistic gravity with the appropriate choice of coordinates) that m = r³w² = rv², so we can substitute m/r for v² in our inequality to give the condition d > r²/m. Thus if the distance of the binary star system from Earth exceeds the square of the system's orbital radius divided by the system's mass (in geometric units) we would expect DeSitter's apparitions - assuming the speed of light is c ± v. As an example, for a binary star system a distance of d = 20000 light-years away, with an orbital radius of r = 0.00001 light-years, and an orbital speed of v = 0.00005, the arrival time of the light as a function of the emission time is as shown below:
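The curve can be regenerated numerically from the quadratic derived above. A minimal sketch in Python (distances in light-years, speeds in units of c; the solar-mass conversion factor of 1.56e-13 light-years is supplied here as an assumption, not a value from the text):

    import math

    D, R = 20000.0, 1e-5         # distance to the system and orbital radius, light-years
    v = 5e-5                     # orbital speed, units of c
    w = v/R                      # angular speed, radians per year

    def arrival_time(t):
        # emission theory: solve (1 - v^2) dt^2 + 2DRw sin(wt) dt - (D^2 + 2DR cos(wt) + R^2) = 0
        x, y = D + R*math.cos(w*t), R*math.sin(w*t)
        xd, yd = -R*w*math.sin(w*t), R*w*math.cos(w*t)
        a, b, c = 1 - v*v, -2*(x*xd + y*yd), -(x*x + y*y)
        return t + (-b + math.sqrt(b*b - 4*a*c))/(2*a)   # positive root

    period = 2*math.pi/w
    T = [arrival_time(i*period/1000) for i in range(1001)]
    print(any(t2 < t1 for t1, t2 in zip(T, T[1:])))      # True: arrival order reverses
    print(R*v*v/1.56e-13)                                # orbital mass, solar units (~0.16)

The reversals in arrival order are what produce the multiple simultaneous images discussed next, and the last line confirms that m = rv² corresponds to roughly 1/6 of a solar mass.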

This corresponds to a star system with only about 1/6 solar mass, and an orbital radius of about 1.5 million kilometers. At any given reception time on Earth we can typically "see" at least three separate emission events from the same star at different points in its orbit. These ghostly apparitions are the effect that DeSitter tried to find in photographs of many binary star systems, but none exhibited this effect. He wrote

    The observed velocities of spectroscopic doubles are as a matter of fact

    satisfactorily represented by a Keplerian motion. Moreover in many cases the orbit derived from the radial velocities is confirmed by visual observations (as for δ Equuli, ζ Herculis, etc.) or by eclipse observations (as in Algol variables). We can thus not avoid the conclusion [that] the velocity of light is independent of the motion of the source. Ritz's theory would force us to assume that the motion of the double stars is governed not by Newton's law, but by a much more complicated law, depending on the star's distance from the earth, which is evidently absurd.

Of course, he was looking in the frequency range of visible light, which we've noted is subject to extinction. However, in the x-ray range we can (in principle) perform the same basic test, and yet we still find no traces of these ghostly apparitions in binary stars, nor do we ever see the stellar components going in "reverse time" as we would according to the above profile. (Needless to say, for star systems at great distances it is not possible to distinguish the changes in transverse positions but, as noted above, by examining the Doppler shift of the radial components of their motions we can infer the motions of the individual bodies.) Hence these observations support the proposition that the speed of light in empty space is essentially independent of the speed of the source.

In comparison, if we take the relativistic approach with constant light speed c, independent of the speed of the source, an analysis similar to the above gives the approximate result

    T ≈ t + d + r cos(wt)

whose derivative is

    dT/dt = 1 − v sin(wt)

which is always positive for any v less than 1. This means we can't possibly have images arriving in reverse time, nor can we have any multiple appearances of the components of the binary star system.

Regarding this subject, Robert Shankland recalled Einstein telling him (in 1950) that

    he had himself considered an emission theory of light, similar to Ritz's theory, during the years before 1905, but he abandoned it because he could think of no form of differential equation which could have solutions representing waves whose velocity depended on the motion of the source. In this case the emission theory would lead to phase relations such that the propagated light would be all badly "mixed up" and might even "back up on itself". He asked me, "Do you understand that?" I said no, and he carefully repeated it all. When he came to the "mixed up" part, he waved his hands before his face and laughed, an open hearty laugh at the idea!

2.11 Thomas Precession

    At the first turning of the second stair
    I turned and saw below
    The same shape twisted on the banister
    Under the vapour in the fetid air
    Struggling with the devil of the stairs who wears
    The deceitful face of hope and of despair.
                T. S. Eliot, 1930

Consider a slanted rod AB in the xy plane moving at speed u in the positive y direction as indicated in the left-hand figure below. The A end of the rod crosses the x axis at time t = 0, whereas the B end does not cross until time t = 1. Hence we conclude that the rod is oriented at some non-zero angle with respect to the xyt coordinate system. However, suppose we view the same situation with respect to a system of inertial coordinates x'y't' (with x' parallel to x) moving in the positive x direction with speed v. In accord with special relativity, the x' and t' axes are skewed with respect to the x and t axes as shown in the right-hand figure below.

As a result of this skew, the B end of the rod crosses the x' axis at the same instant (i.e., the same t') as does the A end of the rod, which implies that the rod is parallel to the x' axis - and therefore to the x axis - based on the simultaneity of the x'y't' inertial frame. This implies that if a rod was parallel to the x axis and moving in the positive x direction with speed v, it would be perfectly aligned with the rod AB as the latter passed through the x' axis. Thus if a rod is initially aligned with the x axis and moving with speed v in the positive x direction relative to a given fixed inertial frame, and then at some instant with respect to the rod's inertial rest frame it instantaneously changes course and begins to move purely in the positive y direction, without ever changing its orientation, we find that its orientation does change with respect to the original fixed frame of reference. This is because the changes in the states of motion of the individual parts of the rod do not occur simultaneously with respect to the original rest frame. In general, whenever we transport a vector, always spatially parallel to itself in its own instantaneous rest frame, over an accelerated path, we find that its orientation changes relative to any given fixed inertial frame. This is the basic idea behind Thomas precession, named after Llewellyn Thomas, who first wrote about it in 1927. For a simple

application of this phenomenon, consider a particle moving around a circular path. The particle undergoes continuous acceleration, but at each instant it is at rest with respect to the momentarily co-moving inertial frame. If we consider the "parallel transport" of a vector around the continuous cycle of momentary inertial rest frames of the particle, we find that the vector does not remain fixed. Instead, it "precesses" as we follow it around the cycle. This relativistic precession (which has no counter-part in non-relativistic physics) actually has observable consequences in the behavior of sub-atomic particles (see below). To understand how the Thomas precession for simple circular motion can be deduced from the basic principles of special relativity, we can begin by supposing the circular path of a particle is approximated by an n-sided polygon, and consider the transition from one of these sides to the next, as illustrated below.

Let v denote the circumferential speed of the particle in the counter-clockwise direction, and note that α = 2π/n for an arbitrary n-sided regular polygon. (In the drawing above we have set n = 8). The dashed lines represent the loci of positions of the spatial origins of two inertial frames K' and K" that are co-moving with the particle on consecutive edges. Now suppose the vector ab at rest in K' makes an angle θ1 with respect to the x axis (in terms of frame K), and suppose the vector AB at rest in K" makes an angle of θ2 with respect to the x axis. The figure below shows the positions of these two vectors at several consecutive instants of the frame K.

Clearly if θ1 is not equal to θ2, the two vectors will not coincide at the instant when their origins coincide. However, this assumes we use the definition of simultaneity associated with the inertial coordinate system K (i.e., the rest system of the polygon). The system K' is moving in the positive x direction at the speed v, so its time-slices are skewed relative to those of the polygon's frame of reference. Because of this skew, it is possible for the vectors ab and AB to be parallel with respect to K' even though they are not parallel with respect to K. The equations of the moving vectors ab and AB are easily seen to be

This confirms that at t = 0 (or at any fixed t) these lines are not parallel unless θ1 = θ2. However, if we substitute from the Lorentz transformation between the frames K and K'

    x = γ(x′ + vt′)        y = y′        t = γ(t′ + vx′)

where γ = 1/√(1 − v²), the equations of the moving vectors become

At t' = 0 these equations reduce to

In the limit as the number n of sides of the polygon increases and the angle α approaches zero, the value of cos(α) approaches 1 (to the second order), and the value of sin(α) approaches α. Hence the equations of the two moving vectors approach

Setting these equal to each other, multiplying through by γ/x', and re-arranging, we get the condition

Recalling the trigonometric identity

    tan θ2 − tan θ1 = tan(θ2 − θ1) [1 + tan θ1 tan θ2]

and noting that θ1 approaches θ2 in the limit as α goes to zero, the right-hand factor on the right side can be taken as

    1 + tan² θ

where θ is the limiting value of both θ1 and θ2 as α goes to zero. Making use of these substitutions, and also noting that tan(θ2 − θ1) approaches θ2 − θ1, the condition for the two families of lines to be parallel with respect to frame K' (in the limit as α goes to zero) is

This is the amount by which the two vectors are skewed with respect to the K frame due to the transition around a single vertex of the polygon, given that the transported vector makes an angle θ with the edge leading into the vertex. The total precession resulting from one complete revolution around the n-sided polygon is n times the mean value of θ2 − θ1 for each of the n vertices of the polygon. Since n = 2π/α, we can express the total precession as

If the circumferential speed v is small compared with 1, the denominator of this expression is closely approximated by 1, and the transported vector changes its absolute orientation only very slightly on one revolution. In this case it follows that θ varies essentially uniformly from 0 to 2π as the vector is transported around the circle. Hence for small v the total precession for one revolution is given closely by

    πv² radians
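This small-v figure can be checked numerically by carrying a spin vector around the polygon, applying at each vertex the pure boost between successive co-moving frames. A minimal sketch in Python with numpy (c = 1; the boost and velocity-composition formulas are the standard special-relativistic ones, not specific to the construction in the text):

    import numpy as np

    def boost(u):
        # pure Lorentz boost of (t, x, y) components into the frame moving at 2-velocity u
        u2 = u @ u
        g = 1/np.sqrt(1 - u2)
        B = np.eye(3)
        B[0, 0] = g
        B[0, 1:] = B[1:, 0] = -g*u
        B[1:, 1:] += (g - 1)*np.outer(u, u)/u2
        return B

    def relative(u, w):
        # velocity u (lab components) as measured in the frame moving at velocity w
        g = 1/np.sqrt(1 - w @ w)
        return (u/g + w*((u @ w)*(g - 1)/((w @ w)*g) - 1))/(1 - u @ w)

    v, N = 0.1, 10000                        # circumferential speed, polygon sides
    s = np.array([0.0, 1.0, 0.0])            # spin vector, purely spatial in the co-moving frame
    for i in range(N):
        v1 = v*np.array([np.cos(2*np.pi*i/N), np.sin(2*np.pi*i/N)])
        v2 = v*np.array([np.cos(2*np.pi*(i + 1)/N), np.sin(2*np.pi*(i + 1)/N)])
        s = boost(relative(v2, v1)) @ s      # pure boost to the next co-moving frame

    print(abs(np.arctan2(s[2], s[1])))       # net rotation per revolution, ~0.0316 rad
    print(np.pi*v**2)                        # small-v approximation, ~0.0314 rad

For v = 0.1 the transported vector returns rotated by about 0.0316 radians, within one percent of the πv² approximation.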

On the other hand, if v is not small, we can consider the general situation illustrated below:

The variable ϕ signifies the absolute angular position of the transported vector at any given time, and β signifies the vector's orientation relative to the positive y axis. As before, θ denotes the angle of the vector relative to the local tangent "edge". We have the relations

We also have the following identifications involving the parameters ϕ and β:

Substituting dϕ + dθ for dβ and re-arranging, we get

This can be integrated explicitly to give ϕ as a function of θ. Since β equals ϕ + θ, we can also give β as a function of θ, leading to the parametric equations

where γ = 1/√(1 − v²). One complete "branch" is given by allowing θ to range from −π/2 to π/2, giving the angle ϕ from −γπ/2 to γπ/2, and the angle β from −(π/2)(1 − γ) to (π/2)(1 − γ). This is shown in the figure below.

Consequently, a full cycle of ϕ corresponds to 2/γ times the above range, and so the average change in β per revolution (i.e., per 2π increase in ϕ) is

    (2/γ) π (1 − γ) = 2π (1/γ − 1)

This function is plotted in the figure below, along with the "small v" approximation.

For all v less than 1 we can expand the general expression into a series

    2π(1/γ − 1) = 2π(√(1 − v²) − 1) = −πv² (1 + v²/4 + v⁴/8 + 5v⁶/64 + ···)

These expressions represent the average change per revolution, because the cycles of β do not in general coincide with the cycles of ϕ. Resonance occurs when the ratio of the change in ϕ to the change in β is rational. This is true if and only if there exist integers M,N such that

Adding 1 to both sides, we can set 1 + (M/N) equal to m/n for integers m and n, and we can then square both sides and re-arrange to find that the "resonant" values of v are given by

where m,n are integers with |n| less than |m|.

We previously derived the low-speed approximation of the amount of Thomas precession for a vector subjected to "parallel transport" around a circle with a constant circumferential speed v in the form πv² radians per revolution. Dividing this by 2π gives the average precession rate of v²/2 in units of radians per radian (of travel around the circle). We can also determine the average rate of Thomas precession in units of radians per second. Letting ωo denote the orbital angular velocity (i.e., the angular velocity with which the vector is transported around the circle of radius r), we have v = ωo r and a = v²/r, where a is the centripetal acceleration. Hence we have ωo = v/r = a/v, so multiplying v²/2 by ωo gives the average Thomas precession rate ωT = va/2 in units of rad/sec, which represents a frequency of νT = (v²/2)νo = va/(4π) cycles/sec.

Since the magnitude πv² of the Thomas precession is of the second order in v, we might be tempted to think it is insignificant for ordinary terrestrial phenomena, but the expression νT = (v²/2)νo shows that the precession frequency can be quite large in absolute terms, even if v is small, provided νo is sufficiently large. This occurs when the orbital radius r is very small, giving a very large acceleration for any given orbital velocity. Consider, for example, the orbit of an electron around the nucleus of an atom. An electron has intrinsic quantum "spin" which tends to maintain its absolute orientation much as does a spinning gyroscope, so it can be regarded as a vector undergoing parallel transport. Now, according to the original (naive) Bohr model, the classical orbit of an electron around the nucleus is given by equating the Coulomb and centripetal forces

    Ne²/(4πε0 r²) = mv²/r

where e is the charge of an electron, m is the mass, ε0 is the permittivity of the vacuum, and N is the atomic number of the nucleus, so the linear and angular speeds of the electron are

    v = [Ne²/(4πε0 m r)]^(1/2)        w = v/r

Bohr hypothesized that the angular momentum L = mvr can only be an integer multiple of h/(2π), so we have for some positive integer n

    mvr = nh/(2π)

Therefore, the linear velocity and orbital frequency of an electron (in this simplistic model) are

    v = Nα/n        νo = mv²/(nh)

where α = e²/(2hε0) is the dimensionless "fine structure constant", whose value is approximately 1/137. (Remember that we are using units such that c = 1, so all distances are expressed in units of seconds.) For the lowest energy state of a hydrogen atom we have n = N = 1, so the linear speed of the electron is about 1/137. Consequently the precession frequency is −(v²/2) = −0.00002664 times the orbital frequency, which is a very small fraction, but it is still a very large frequency in absolute terms (about 1.755×10¹¹ cycles/sec) because the orbital frequency is so large. (Note that these are not the frequencies of photons emitted from the atom, because those correspond to quanta of light given off due to transitions from one energy level to another, whereas these are the theoretical orbital frequencies of the electron itself in Bohr's simple model.)

Incidentally, there is a magnetic interaction between the electron and nucleus of some atoms that is predicted to cause the electron's spin axis to precess by +v² radians per orbital radian, but the actual observed precession rate of the spin axes of electrons in such atoms is only +(v²/2). For a while after its discovery, there was no known explanation for this discrepancy. Only in 1927 did Thomas point out that special relativity implies the purely kinematic relativistic effect that now bears his name, which (as we've seen) yields a precession of −(v²/2) radians per orbital radian. The sum of this purely kinematic effect due to special relativity with the predicted effect due to the magnetic interaction yields the total observed +(v²/2) precession rate. It's often said that the relativistic effect supplies a "factor of 2" (i.e., divides by 2) to the electron's precession rate. For example, Uhlenbeck wrote that

    ...when I first heard about [the Thomas precession], it seemed unbelievable that a relativistic effect could give a factor of 2 instead of something of order v/c... Even the cognoscenti of relativity theory (Einstein included!) were quite surprised.
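The orders of magnitude quoted above are easy to reproduce. A rough sketch in Python using SI values (the Bohr radius is supplied here for the n = N = 1 orbit; it is an assumed input, not a value stated in the text):

    import math

    alpha = 1/137.036              # fine structure constant
    c = 2.998e8                    # speed of light, meters per second
    r = 5.29e-11                   # Bohr radius for n = N = 1, meters (assumed)

    v = alpha                      # electron speed in units of c (ground state)
    nu_orb = v*c/(2*math.pi*r)     # orbital frequency, ~6.6e15 cycles/sec
    print(v**2/2)                  # fraction of orbital frequency, ~2.66e-5
    print((v**2/2)*nu_orb)         # Thomas precession frequency, ~1.75e11 cycles/sec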

(Uhlenbeck also told Pais that he didn't understand a word of Thomas's work when it first came out.) However, this description is somewhat misleading, because (as we've seen) the relativistic effect is actually additive, not multiplicative. It just so happens that a particular magnetic interaction yields a precession of twice the frequency, and the opposite sign, as the Thomas precession, so the sum of the two effects is half the size of the magnetic effect alone. Both of the effects are second-order in the linear speed v/c.

3.1 Postulates and Principles

    Complex ideas may perhaps be well known by definition, which is nothing but an enumeration of those parts or simple ideas that comprise them. But when we have pushed up definitions to the most simple ideas, and find still some ambiguity and obscurity, what resources are we then possessed of?
                David Hume, 1748

As discussed in Section 1, even after stipulating the existence of coordinate systems with respect to which inertia is homogeneous and isotropic, there remains a fundamental ambiguity as to the character of the relationship between relatively moving inertial coordinate systems, corresponding to three classes of possible metrical structures, with the k values −1, 0, and +1. There is a remarkably close historical analogy for this

situation, dating back to one of the first formal systems of thought ever proposed. In Book I of The Elements, Euclid consolidated and systematized plane geometry as it was known circa 300 BC into a formal deductive system. As it has come down to us, it is based on five postulates together with several definitions and common notions. (It's worth noting, however, that the classification of these premises was revised many times in various translations.) The first four of these postulates are stated very succinctly:

1. A straight line may be drawn from any point to any other point.
2. A straight line segment can be uniquely and indefinitely extended.
3. We may draw a circle of any radius about any point.
4. All right angles are equal to one another.

Each of these assertions actually entails a fairly complicated set of premises and ambiguities, but they were accepted as unobjectionable for two thousand years. However, Euclid's final postulate was regarded with suspicion from earliest times. It has a very different appearance from the others - a difference that neither Euclid nor his subsequent editors and translators attempted to disguise. The fifth postulate is expressed as follows:

5. If a straight line falling on two straight lines makes the [sum of the] interior angles on the same side less than two right angles, then the two straight lines, if produced indefinitely, meet on that side on which the angles are less than two right angles.

This postulate is equivalent to the statement that there's exactly one line through a given point P parallel to a given line L, as illustrated below

Although this proposition is fairly plausible (albeit somewhat awkward to state), many people suspected that it might be logically deducible from the other postulates, axioms, and common notions. There were also attempts to substitute for Euclid's fifth postulate a simpler or more self-evident proposition. However, we now understand that Euclid's fifth postulate is logically independent of the rest of Euclid's logical structure. In fact, it's possible to develop logically consistent geometries in which Euclid's fifth postulate is false. For example, we can assume that there are infinitely many lines through P that are parallel to (i.e., never intersect) the line L. It might seem (at first) that it would be impossible to reason with such an assumption, that it would either lead to contradictions or else cause the system to degenerate into a logical triviality about which nothing interesting could be said, but, remarkably, this turns out not to be the case. Suppose that although there are infinitely many lines through P that never intersect L, there are also infinitely many that do intersect L. This, combined with the other axioms and postulates of plane geometry, implies that there are two lines through P defining the boundary between lines that do intersect L and lines that don't, as shown below:

This leads to the original non-Euclidean geometry of Lobachevski, Bolyai, and Gauss, i.e., the hyperbolic plane. The analogy to Minkowski spacetime is obvious. The behavior of "straight lines" in a surface of negative curvature (although positive-definite) is nicely suggestive of how the light-lines in spacetime serve as the dividing lines between those lines through P that intersect with the future "L" and those that don't (distinguishing between spacelike and timelike intervals). This is also a nice illustration of the fact that even though Minkowski spacetime is "flat" in the Riemannian sense, it is nevertheless distinctly non-Euclidean. Of course, the possibility that spacetime might be curved as well as locally Minkowskian led to general relativity, but arguably the conceptual leap required to go from a positive-definite to a non-positive-definite metric is greater than that required to go from a flat to a curved metric. The former implies that the local geometrical structure of the effective spatio-temporal manifold of events is profoundly different than had been assumed for thousands of years, and this realization led naturally to a new set of principles with which to organize and interpret our experience.

It became clear in the 19th century that there are actually three classes of geometries consistent with Euclid's basic premises, depending on what we adopt as the "fifth postulate". The three types of geometry correspond to spaces of negative, positive, or zero curvature. The analogy to the three possible classes of spacetimes (Euclidean, Galilean, and Minkowskian) is obvious, and in both cases it came to be recognized that, insofar as these mathematical structures were supposed to represent physical properties, the choice between the alternatives was a matter for empirical investigation. Nevertheless, the superficially axiomatic way in which Einstein presented the special theory in his 1905 paper tended to encourage the idea that special relativity represented a closed formal system, like Euclid's geometry interpreted in the purely mathematical sense. For example, in 1907 Paul Ehrenfest wrote that

    In the formulation in which Mr Einstein published it, Lorentzian relativistic electrodynamics is rather generally viewed as a complete system. Accordingly it must be able to provide an answer purely deductively to the question [involving the shape of the moving electron]…

However, Einstein himself was quick to disavow this idea, answering

    The principle of relativity, or, more exactly, the principle of relativity together with the principle of the constancy of the velocity of light, is not to be conceived as a "complete system," in fact, not as a system at all, but merely as a heuristic principle which, when considered by itself, contains only statements about rigid bodies, clocks, and light signals. It is only by requiring relations between otherwise seemingly unrelated laws that the theory of relativity provides additional statements.

Just as the basic premises of Euclid's geometry were classified in many different ways (e.g., postulates, axioms, common notions, definitions), the premises on which Einstein based special relativity can be classified in many different ways. Indeed, in his 1905 paper, Einstein introduced the first of these premises as follows:

    ... the same laws of electrodynamics and optics will be valid for all coordinate systems in which the equations of mechanics hold good. We will raise this conjecture (hereafter called the "principle of relativity") to the status of a postulate...

Here, in a single sentence, we find a proposition referred to as a conjecture, a principle, and a postulate. The meanings of these three terms are quite distinct, but they are each arguably applicable. The assertion of the co-relativity of optics and mechanics was, and will always be, conjectural, because it can be empirically corroborated only up to a limited precision. Einstein formally adopted this conjecture as a postulate, but on a more fundamental level it serves as a principle, since it entails the decision to organize our knowledge in terms of coordinate systems with respect to which the equations of mechanics hold good, i.e., inertial coordinate systems. Einstein goes on to introduce a second proposition that he formally adopts as a postulate, namely,

    ... that light always propagates in empty space with a definite velocity c that is independent of the state of motion of the emitting body. These two postulates suffice for the attainment of a simple and consistent electrodynamics of moving bodies based on Maxwell's theory for bodies at rest.

Interestingly, in the paper "Does the Inertia of a Body Depend on Its Energy Content?" published later in the same year, Einstein commented that ... the principle of the constancy of the velocity of light... is of course contained in Maxwell's equations.

In view of this, some have wondered why he did not simply dispense with his "second postulate" and assert that the "laws of electrodynamics and optics" in the statement of the first principle are none other than Maxwell's equations. In other words, why didn't he simply base his theory on the single proposition that Maxwell's equations are valid for every system of coordinates in which the laws of mechanics hold good? Part of the answer is that he realized important parts of physics, such as the physics of elementary particles, cannot possibly be explained in terms of Maxwellian electrodynamics. In a note published in 1907 he wrote

    It should be noted that the laws that govern [the structure of the electron] cannot be derived from electrodynamics alone. After all, this structure necessarily results from the introduction of forces which balance the electrodynamic ones.

More fundamentally, by 1905 he was already aware of the fact that, although Maxwell's equations are empirically satisfactory in many respects, they cannot be regarded as fundamentally correct or valid. In his paper "On a Heuristic Point of View Concerning the Production and Transformation of Light" he wrote

    ... despite the complete confirmation of [Maxwell's theory] by experiment, the theory of light, operating with continuous spatial functions, leads to contradictions when applied to the phenomena of emission and transformation of light.

Thus it isn't surprising that he chose not to base the theory of relativity on Maxwell's equations. He needed to distill from electromagnetic phenomena the key feature whose significance "transcended its connection with Maxwell's equations", and which would serve as a viable principle for organizing our knowledge of all phenomena, including both optics and mechanics. The principle he selected was the existence of an invariant speed with respect to any local system of inertial coordinates, and then for definiteness he could identify this speed with the speed of propagation of electromagnetic energy. After reviewing the operational definition of inertial coordinates in section §1 (which he does by optical rather than mechanical means, thereby missing an opportunity to clarify the significance of inertial coordinates in establishing the connection between mechanical and optical phenomena), he gives more formal statements of his two principles:

    The following reflections are based on the principle of relativity and the principle of the constancy of the velocity of light. These two principles we define as follows:

    1. The laws by which the states of physical systems undergo change are not affected, whether these changes of state be referred to the one or the other of two systems of co-ordinates in uniform translatory motion.

    2. Any ray of light moves in the "stationary" system of co-ordinates with the determined velocity c, whether the ray is emitted by a stationary or by a moving body. Hence velocity equals [length of] light path divided by time interval [of light path], where time interval [and length are] to be taken in the sense of the definition in §1.

The first of these is nothing but the principle of inertial relativity, which had been accepted as a fundamental principle of physics since the time of Galileo (see section 1.3). Strictly speaking, Einstein's statement of the principle here is incorrect, because he assumes the coordinate systems in which the equations of mechanics hold good are fully characterized by being in uniform translatory motion, whereas in fact it is also necessary to specify an inertially isotropic simultaneity. Einstein chose to address this aspect of inertial coordinate systems by means of a separate and seemingly arbitrary definition of simultaneity based on optical phenomena, which unfortunately has invited much misguided philosophical debate about what should be considered "true" simultaneity. All this could have been avoided if, from the start, Einstein had merely stated that an inertial coordinate system is one in which mechanical inertia is homogeneous and isotropic (just as Galileo said), and then noted that this automatically entails the conventional choice of simultaneity. The content of his first principle (i.e., the relativity principle) is simply that the inertial simultaneity of mechanics and the optical simultaneity of electrodynamics are identical.

Despite the shortcomings of its statement, the principle of relativity was very familiar to the physicists of 1905, whether they wholeheartedly accepted it or not. Einstein's second principle, by itself, was also not regarded as particularly novel, because it conveys the usual understanding of how a wave propagates at a fixed speed through a medium, independent of the speed of the source. It was the combination of these two principles that was new, since they had previously been thought to be irreconcilable. In a sense, the

first principle arose from the "ballistic particles in a vacuum" view of physics, and the second arose from the "wave in a material medium" view of physics. Both of these views can trace their origins back to ancient times, and both seem to capture some fundamental truth about the world, and yet they had always been regarded as mutually exclusive. Einstein's achievement was to explain how they could be reconciled.

Of course, Einstein's second principle isn't a self-contained statement, because its entire meaning and significance depends on "the sense of" time intervals and (implicitly) spatial lengths given in §1, where we find that time intervals and spatial lengths are defined to be such that their ratio equals the fixed constant c for light paths. This has tempted some readers to conclude that "Einstein's second principle" was merely a tautology, with no substantial content. The source of this confusion is the fact that the essential axiomatic foundations underlying special relativity are contained not in the two famous propositions at the beginning of §2 of Einstein's paper (as quoted above), but rather in the sequence of assumptions and definitions explicitly spelled out in §1. Among these is the very first statement

    Let us take a system of co-ordinates in which the equations of Newtonian mechanics hold good.

In subsequent re-prints of this paper Sommerfeld added a footnote to this statement, to say "i.e., to the first approximation", meaning for motion with speeds small in comparison with the speed of light. (This illustrates the difficulty of writing a paper that results in a modification of the equations of Newtonian mechanics!) Of course, Einstein was aware of the epistemological shortcomings of the above statement, because while it tells us to begin with an inertial system of coordinates, it doesn't tell us how to identify such a system. This has always been a potential source of ambiguity for mechanics based on the principle of inertia. Strictly speaking, Newton's laws are epistemologically circular, so in practice we must apply them both inductively and deductively. First we use them inductively with our primitive observations to identify inertial coordinate systems by observing how things behave. Then at some point when we've gained confidence in the inertialness of our coordinates, we begin to apply the laws deductively, i.e., we begin to deduce how things will behave with respect to our inertial coordinates. Ultimately this is how all physical theories are applied, first inductively as an organizing principle for our observations, and then deductively as "laws" to make predictions. Neither Galilean nor special relativity is able to justify the privileged role given to a particular class of coordinate systems, nor to provide a non-circular means of identifying those systems. In practice we identify inertial systems by means of an incomplete induction. Although Einstein was aware of the deficiency of this approach (which he subsequently labored to eliminate from the general theory), in 1905 he judged it to be the only pragmatic way forward.

The next fundamental assertion in §1 of Einstein's paper is that lengths and time intervals can be measured by (and expressed in terms of) a set of primitive elements called "measuring rods" and "clocks". As discussed in Section 1.2, Einstein was fully aware of the weakness in this approach, noting that "strictly speaking, measuring rods and clocks should emerge as solutions of the basic equations", not as primitive conceptions. Nevertheless

it was better to admit such inconsistency - with the obligation, however, of eliminating it at a later stage of the theory...

Thus the introduction of clocks and rulers as primitive entities was another pragmatic concession, and one that Einstein realized was not strictly justifiable on any other grounds than provisional expediency. Next Einstein acknowledges that we could content ourselves to time events by using an observer located at the origin of the coordinate system, which corresponds to the absolute time of Lorentz, as discussed in Section 1.6. Following this he describes the "much more practical arrangement" based on the reciprocal operational definition of simultaneity. He says

    We assume this definition of synchronization to be free of any possible contradictions, applicable to arbitrarily many points, and that the following relations are universally valid:

    1. If the clock at B synchronizes with the clock at A, the clock at A synchronizes with the clock at B.

    2. If the clock at A synchronizes with the clock at B and also with the clock at C, the clocks at B and C also synchronize with each other.

These are important and non-trivial assumptions about the viability of the proposed operational procedure for synchronizing clocks, but they are only indirectly invoked by the reference to "the sense of time intervals" in the statement of Einstein's second principle. Furthermore, as mentioned in Section 1.6, Einstein himself subsequently identified at least three more assumptions (homogeneity, spatial isotropy, memorylessness) that are tacitly invoked in the formal development of special relativity. The list of unstated assumptions would actually be even longer if we were to construct a theory beginning from nothing but an individual's primitive sense perceptions. The justification for leaving them out of a scientific paper is that these can mostly be classified as what Euclid called "common notions", i.e., axioms that are common to all fields of thought.

In many respects Einstein modeled his presentation of special relativity not on Euclid's Elements (as Newton had done in the Principia), but on the formal theory of thermodynamics, which is founded on the principle of the conservation of energy. There are different kinds of energy, with formally different units, e.g., mechanical and gravitational potential energy are typically measured in terms of joules (a force times a distance, or equivalently a mass times a squared velocity), whereas heat energy is measured in calories (the amount of heat required to raise the temperature of 1 gram of water by one degree C). It's far from obvious that these two things can be treated as different aspects of the same thing, i.e., energy. However, through careful experiments and observations we find that whenever mechanical energy is dissipated by friction (or any other dissipative process), the amount of heat produced is proportional to the amount of mechanical energy dissipated. Conversely, whenever heat is involved in a process that yields mechanical work, the heat content is reduced in proportion to the amount of work produced. In both cases the constant of proportionality is found to be 4.1833 joules per calorie.

Now, the First Law of thermodynamics asserts that the total energy of any physical process is always conserved, provided we "correctly" account for everything. Of course, in order for this assertion to even make sense we need to define the proportionality constants between different kinds of energy, and those constants are naturally defined so as to make the First Law true. In other words, we determine the proportionality between heat and mechanical work by observing these quantities and assuming that those two changes represent equal quantities of something called "energy". But this assumption is essentially equivalent to the First Law, so if we apply these operational definitions and constants of proportionality, the conservation of energy can be regarded as a tautology or a convention. This shows clearly that, just as in the case of Newton's laws, these propositions are actually principles rather than postulates, meaning that they first serve as organizing principles for our measurements and observations, and only subsequently do they serve as "laws" from which we may deduce further consequences. This is the sense in which fundamental physical principles always operate. Wien's letter of 1912 nominating Einstein and Lorentz for the Nobel prize commented on this same point, saying that "the confirmation of [special relativity] by experiment... resembles the experimental confirmation of the conservation of energy".

Einstein himself acknowledged that he consciously modeled the formal structure of special relativity on thermodynamics. He wrote in his autobiographical notes

    Gradually I despaired of the possibility of discovering the true laws by means of constructive efforts based on known facts. The longer and the more desperately I tried, the more I came to the conviction that only the discovery of a universal formal principle could lead us to assured results. The example I saw before me was thermodynamics. The general principle was there given in the proposition: The laws of nature are such that it is impossible to construct a perpetuum mobile (of the first and second kinds).

This principle is a meta-law, i.e., it does not express a particular law of nature, but rather a general principle to which all the laws of nature conform. In 1907 Ehrenfest suggested that special relativity constituted a closed axiomatic system, but Einstein quickly replied that this was not the case. He explained that the relativity principle combined with the principle of invariant light speed is not a closed system at all, but rather it provides a coherent framework within which to conduct physical investigations. As he put it, the principles of special relativity "permit certain laws to be traced back to one another (like the second law of thermodynamics)." Not only is there a close formal similarity between the axiomatic structures of thermodynamics and special relativity, each based on two fundamental principles, but these two theories are also, in substance, extensions of each other. The first law of thermodynamics can be placed in correspondence with the basic principle of relativity, which suggests the famous relation E = mc², thereby enlarging the realm of applicability of the first law. The second law of thermodynamics, like Einstein's second principle of invariant light speed, is more sophisticated and more subtle. A physical process whose
net effect is to remove heat from a body and produce an equivalent amount of work is called perpetual motion of the second kind. It isn't obvious from the first law that such a process is impossible, and indeed there were many attempts to find such a process - just as there were attempts to identify the rest frame of the electromagnetic ether - but all such attempts failed. Moreover, they failed in such a way as to make it clear that the failures were not accidental, but that a fundamental principle was involved. In the case of thermodynamics this was ultimately formulated as the second law, one statement of which (as alluded to by Einstein in the quote above) is simply that perpetual motion of the second kind is impossible - provided the various kinds of energy are defined and measured in the prescribed way. (This theory was Einstein's bread and butter, not only because most of his scientific work prior to 1905 had been in the field of thermodynamics, but also because a patent examiner inevitably is called upon to apply the first and second laws to the analysis of hopeful patent applications.) Compare this with Einstein's second principle, which essentially asserts that it's impossible to measure a speed in excess of the constant c - provided the space and time intervals are defined and measured in the prescribed way. The strength of both principles is due ultimately to the consistency and coherence of the ways in which they propose to analyze the processes of nature. Needless to say, our physical principles are not arbitrarily selected assumptions; they are hard-won distillations of a wide range of empirical facts.

Regarding the justification for the principles on which Einstein based special relativity, many popular accounts give a prominent place to the famous experiments of Michelson and Morley, especially the crucial version performed in 1887, often presenting this as the "brute fact" that precipitated relativity. Why, then, does Einstein's 1905 paper fail to cite this famous experiment? It does mention at one point "the various unsuccessful attempts to measure the Earth's motion with respect to the ether", but never refers to Michelson's results specifically. The conspicuous absence of any reference to this important experimental result has puzzled biographers and historians of science. Clearly Einstein's intent was to present the most persuasive possible case for the relativity of space and time, and Michelson's results would (it seems) have been a very strong piece of evidence in his favor. Could he simply have been unaware of the experiment at the time of writing the paper? Einstein's own recollections on this point were not entirely consistent. He sometimes said he couldn't remember if he had been aware in 1905 of Michelson's experiments, but at other times he acknowledged that he had known of them from having read the works of Lorentz. Indeed, considering Einstein's obvious familiarity with Lorentz's works, and given all the attention that Lorentz paid to Michelson's ether drift experiments over the years, it's difficult to imagine that Einstein never absorbed any reference to those experiments. Assuming he was aware of Michelson's results prior to 1905, why did he choose not to cite them in support of his second principle?
Of course, his paper includes no formal “references” at all (which in itself seems peculiar, especially to modern readers accustomed to extensive citations in scholarly works), but it does refer to some other experiments and theories by name, so an explicit reference to Michelson’s result would
not have been out of place. One possible explanation for Einstein's reluctance to cite Michelson, both in 1905 and subsequently, is that he was sophisticated enough to know that his "theory" was technically just a re-interpretation of Lorentz's theory - making identical predictions - so it could not be preferred on the basis of agreement with experiment. To Einstein the most important quality of his interpretation was not its consistency with experiment, but its inherent philosophical soundness. In other words, conflict with experiment was bad, but agreement with experiment by means of ad hoc assumptions was hardly any better. His critique of Lorentz's theory (or what he knew of it at the time) was not so much that it was empirically "wrong" (which it wasn't), but that the length contraction and time dilation effects had been inserted ad hoc to match the null results of Michelson's experiments. (It's debatable whether this critique was justified, in view of the discussion in Section 1.5.) Therefore, Einstein would naturally have been concerned to avoid giving the impression that his relativistic theory had been contrived specifically to conform with Michelson's results. He may well have realized that any appeal to the Michelson-Morley experiment in order to justify his theory would diminish rather than enhance its persuasiveness.

This is not to suggest that Einstein was being disingenuous, because it's clear that the principles of special relativity actually do emerge very naturally from just the first-order effects of magnetic induction (for example), and even from more basic considerations of the mathematical intelligibility of Galilean versus Lorentzian transformations (as stressed by Minkowski in his famous 1908 lecture). It seems clear that Einstein's explanations for how he arrived at special relativity were sincere expressions of his beliefs about the origins of special relativity in his own mind. He was focused on the phenomenon of magnetic induction and the unphysical asymmetry of the pre-relativistic explanations. This was combined with a strong instinctive belief in the complete relativity of physics. He told Shankland in 1950 that the experimental results which had influenced him the most were stellar aberration and Fizeau's measurements on the speed of light in moving water. "They were enough," he said.

3.2 Natural and Violent Motions

Mr Spenser in the course of his remarks regretted that so many members of the Section were in the habit of employing the word Force in a sense too limited and definite to be of any use in a complete theory. He had himself always been careful to preserve that largeness of meaning which was too often lost sight of in elementary works. This was best done by using the word sometimes in one sense and sometimes in another, and in this way he trusted that he had made the word occupy a sufficiently large field of thought.

James Clerk Maxwell

The concept of force is one of the most peculiar in all of physics. It is, in one sense, the most viscerally immediate concept in classical mechanics, and seems to serve as the essential "agent of causality" in all interactions, and yet the ontological status of force has always been highly suspect. We sometimes regard force as the cause of changes in motion, and imagine that those changes would not occur in the absence of the forces, but this causative aspect of force is an independent assumption that does not follow from any quantifiable definition, since we could equally well regard force as being caused by changes in motion, or even as merely a descriptive parameter with no independent ontological standing at all. In addition, there is an inherent ambiguity in the idea of changes in motion, because it isn't obvious what constitutes unchanging (i.e., unforced) motion.

Aristotle believed it was necessary to distinguish between two fundamentally distinct kinds of motion, which he called natural motions and violent motions. The natural motions included the apparent movements of celestial objects, the falling of leaves to the ground, the upward movement of flames and hot gases in the atmosphere, or of air bubbles in water, and so on. According to Aristotle, the cause of such motions is that all objects and substances have a natural place or level (such as air above, water below), and they proceed in the most direct way, along straight vertical paths, to their natural places. The motion of the celestial bodies is circular because this is the most perfect kind of unchanging eternal motion, whereas the necessarily transitory motions of sublunary objects are rectilinear. It may not be too misleading to characterize Aristotle's concept of sublunary motion as a theory of buoyancy, since the natural place of light elements is above, and the natural place of heavy elements is below. If an object is out of place, it naturally moves up or down as appropriate to reach its proper place.

Aristotle has often been criticized for saying (or seeming to say) that the speed at which an object falls (through the air) is proportional to its weight. To the modern reader this seems absurd, as it is contradicted by the simplest observations of falling objects. However, it's conceivable that we misinterpret Aristotle's meaning, partly because we're so accustomed to regarding the concept of force as the cause of motion, rather than as an effect or concomitant attribute of motion. If we consider the downward force (which Aristotle would call the weight) of an object to be the force that would be required to keep it at its current height, then the "weight" of an object really is substantially greater the faster it falls. More strength is required to catch a falling object than to hold the same object at rest. Some Aristotelian scholars have speculated that this was Aristotle's actual meaning, although his writings on the subject are so sketchy that we can't know for certain. In any case, it illustrates that the concept and significance of force in a physical theory is often murky, and it also shows how thoroughly our understanding of physical phenomena is shaped by the distinction between forces (such as gravity) that we consider to be causes of motion, and those (such as impact forces) that we consider to be caused by motion. Aristotle also held that the speed of motion was not only proportional to the "weight" (whatever that means) but inversely proportional to the resistance of the medium.
Thus his proposed law of motion could be expressed roughly as V = W/R, and he used this to
argue against the possibility of empty space, i.e., regions in which R = 0, because the velocity of any object in such a region would be infinite. This doesn't seem like a very compelling argument, since we could easily counter that the putative vacuum would not be the natural place of any object, so it would have no "weight" in that direction either. Nevertheless, perhaps to avoid wrestling with the mysterious fraction 0/0, Aristotle surrounded the four sublunary elements of Earth, Water, Air, and Fire with a fifth element (quintessence), the lightest of all, called aether. This aether filled the superlunary region, ensuring that we would never need to divide by zero.

In addition to natural motions, Aristotle also considered violent motions, which were any motions resulting from acts of volition of living beings. Although his writings are somewhat obscure and inconsistent in this area, it seems that he believed such beings were capable of self-motion, i.e., of initiating motion in the first instance, without having been compelled to motion by some external agent. Such self-movers are capable of inducing composite motions in other objects, such as when we skip a stone on the surface of a pond. The stone's motion is compounded of a violent component imparted by our hand, and the natural component of motion compelling it toward its natural place (below the air and water). However, as always, we must be careful not to assume that this motion is to be interpreted as the causative result of the composition of two different kinds of forces. It was, for Aristotle, simply the kinematic composition of two different kinds of motion. The bifurcation of motion into two fundamentally different types, one for natural motions of non-living objects and another for acts of human volition – and the attention that Aristotle gave to the question of unmoved movers, etc. – is obviously related to the issue of free will, and demonstrates the strong tendency of scientists in all ages to exempt human behavior from the natural laws of physics, and to regard motions resulting from human actions as original, in the sense that they need not be attributed to other motions. We'll see in Section 9 that Aristotle's distinction between natural and violent motions plays a key role in the analysis of certain puzzling aspects of quantum theory.

We can also see that the ontological status of "force" in Aristotle's physics is ambiguous. In some circumstances it seems to be more an attribute of motion than a cause of motion. Even if we consider the quantitative physics of Galileo, Newton, and beyond, it remains true that "force", while playing a central role in the formulation, serves mainly as an intermediate quantity in the calculations. In fact, the concept of 'force' could almost be eliminated entirely from classical mechanics. (See Section 4 for further discussion of this.) Newton wrestled with the question of whether force should be regarded as an observable or simply a relation between observables. Interestingly, Ernst Mach regarded the third law as Newton's most important contribution to mechanics, even though others have criticized it as being more a definition than a law. Newton's struggle to find the "right" axiomatization of mechanics can be seen by reading the preliminary works he wrote leading up to The Principia, such as "De motu corporum in gyrum" (On the motion of bodies in an orbit).
At one point he conceived of a system with five Laws of Motion, but what finally appeared in Principia were eight Definitions
followed by three Laws. He defined the "quantity of matter" as the measure arising conjointly from the density and the volume. In his critical review of Newtonian mechanics, Mach remarked that this definition is patently circular, noting that "density" is nothing but the quantity of matter per volume. However, all definitions ultimately rely on undefined (irreducible) terms, so perhaps Newton was entitled to take density and volume as two such elements of his axiomatization. Furthermore, by basing the quantity of matter on explicitly finite density and volume, Newton deftly precluded point-like objects with finite quantities of matter, which would imply the existence of infinite forces and infinite potential energy according to his proposed inverse-square law of gravity.

The next basic definition in Principia is of the "quantity of motion", defined as the measure arising conjointly from the velocity and the quantity of matter. Here we see that "velocity" is taken as another irreducible element, like density and volume. Thus, Newton's ontology consists of one irreducible entity, called matter, possessing three primitive attributes, called density, volume, and velocity, and in these terms he defines two secondary attributes, the "quantity of matter" (which we call "mass") as the product of density and volume, and the "quantity of motion" (which we call "momentum") as the product of velocity and mass, meaning it is the product of velocity, density, and volume. Although the term "quantity of motion" suggests a scalar, we know that velocity is a vector (i.e., it has a magnitude and a direction), so it's clear that momentum as Newton defined it is also a vector.

After going on to define various kinds of forces and the attributes of those forces, Newton then, as we saw in Section 1.3, took the law of inertia and relativity as his First Law of Motion, just as Descartes and Huygens had done. Following this we have the "force law", i.e., Newton's Second Law of Motion:

The change of motion is proportional to the motive force impressed; and is made in the direction of the right line in which the force is impressed.

Notice that this statement doesn't agree precisely with either of the two forms in which the Second Law is commonly given today, namely, as F = dp/dt or F = ma. The former is perhaps closer to Newton's actual statement, since he expressed the law in terms of momentum rather than acceleration, but he didn't refer to the rate of change of momentum. No time parameter appears in the statement at all. This is symptomatic of a lack of clarity (as in Aristotle's writings) over the distinction between "impulse force" and "continuous force". Recall that our speculative interpretation of Aristotle's downward "weight" was based on the idea that he actually had in mind something like the impulse force that would be exerted by the object if it were abruptly brought to a halt. Newton's Second Law, as expressed in the Principia, seems to refer to such an impulse, and this is how Newton used it in the first few Propositions, but he soon began to invoke the Second Law with respect to continuous forces of finite magnitude applied over a finite length of time – more in keeping with a continuous force of gravity, for example. This shows that even in the final version of the axioms and definitions laid down by Newton, he did not completely succeed in clearly delineating the concept of force that he had in mind.
Of course, in each of his applications of the Second Law, Newton made the necessary dimensional adjustments to appropriately account for the temporal aspect that was missing from the statement of the Law itself, but this was done ad hoc, with no clear
explanation. (His ability to reliably incorporate these factors in each context testifies to his solid grasp of the new dynamics, despite the imperfections of his formal articulation of it.) Subsequent physicists clarified the quantitative meaning of Newton's second law, explicitly recognizing the significance of time, by expressing the law either in the form F = d(mv)/dt or else in what they thought was the equivalent form F = m(dv/dt). Of course, in the context of special relativity these two are not equivalent, and only the former leads to a coherent formulation of mechanics. (It's also worth noting that, in the context of special relativity, the concept of force is largely an anachronism, and it is introduced mainly for the purpose of relating relativistic descriptions to their classical counterparts.)

The third Law of Motion in the Principia is regarded by many people as one of Newton's greatest and most original contributions to physics. This law states that:

To every action there is always opposed an equal reaction: or, the mutual actions of two bodies upon each other are always equal, and directed to contrary parts.

Unfortunately the word "action" is not found among the previously defined terms, but in the subsequent text Newton clarifies the intended meaning. He says "If a body impinge upon another, and by its force change the motion of the other, that body also... will undergo an equal change in its own motion towards the contrary part." In other words, the net change in the "quantity of motion" (i.e., the sum of the momentum vectors) is zero, so momentum is conserved. More subtly, Newton observes that "If a horse draws a stone tied to a rope, the horse will be equally drawn back towards the stone". This is true even if neither the horse nor the stone is moving (which of course implies that they are each subject to other forces as well, tending to hold them in place). This illustrates how the concept of force enables us to conceptually decompose a null net force into non-null components, each representing the contributions of different physical interactions.

In retrospect we can see that Newton's three "laws of motion" actually represent the definition of an inertial coordinate system. For example, the first law imposes the requirement that the spatial coordinates of any material object free of external forces are linear functions of the time coordinate, which is to say, free objects move with a uniform speed in a straight line with respect to an inertial coordinate system. Rather than seeing this as a law governing the motions of free objects with respect to a given system of coordinates, it is more correct to regard it as defining a class of coordinate systems in terms of which a recognizable class of motions have particularly simple descriptions. It is then an empirical question as to whether the phenomena of nature possess the attributes necessary for such coordinate systems to exist.

The significance of "force" was already obscure in Newton's three laws of mechanics, but it became even more obscure when he proposed the law of universal gravitation, according to which every particle of matter exerts a force of attraction on every other particle of matter, with a strength proportional to its mass and inversely proportional to the square of the distance.
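In modern notation this is the familiar inverse-square law F = Gm₁m₂/r². As a concrete illustration, here is a minimal sketch in Python; the Earth-Moon figures are rough textbook values, not taken from the text above:

    # Newton's law of universal gravitation: F = G * m1 * m2 / r^2
    G = 6.674e-11          # gravitational constant, N m^2 / kg^2

    def gravity(m1, m2, r):
        """Magnitude of the mutual attraction between two masses (newtons)."""
        return G * m1 * m2 / r**2

    # Rough Earth-Moon values: masses in kg, mean separation in meters.
    print(gravity(5.97e24, 7.35e22, 3.84e8))   # ~2.0e20 N

By the third law just discussed, this single magnitude is the force experienced by the Earth and by the Moon alike.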
The rival Cartesians expected all forces to be the result of local contact between bodies, as when two objects press directly against each other, but Newton’s conception of instantaneous gravity between distant objects seems to defy
representation in those terms. In an effort to reconcile universal gravitation with semi-Cartesian ideas of force, Newton's young friend Nicolas Fatio hypothesized an omnidirectional flux of small "ultra-mundane" particles, and argued that the mutual shadowing effect could explain why massive bodies are forced together. The same idea was later taken up by Le Sage, but many inconsistencies were pointed out, making it clear that no such theory could accurately account for the phenomena. The simple notion of force at a distance was so successful that it became the model for all mutual forces between objects, and the early theories of electricity and magnetism were expressed in those terms. However, reservations about the intelligibility of instantaneous action at a distance remained. Eventually Faraday and Maxwell introduced the concept of disembodied "lines of force", which later came to be regarded as fields of force, almost as if force were an entity in its own right, capable of flowing from place to place. In this way the Maxwellians (perhaps inadvertently) restored the Cartesian ideas that all space must be occupied and that all forces must be due to direct local contact. They accomplished this by positing a new class of entity, namely the field. Admittedly our knowledge of the electromagnetic field is only inferred from the behavior of matter, but it was argued that explanations in terms of fields are more intelligible than explanations in terms of instantaneous forces at a distance, mainly because fields were considered necessary for strict conservation of energy and momentum once it was recognized that electromagnetic effects propagate at a finite speed.

However, the explanation of phenomena in terms of fields, characterized by partial differential equations, was incomplete, because it was not possible to represent stable configurations of matter in these terms. Maxwell's field equations are linear, so there was no hope of them possessing solutions corresponding to discrete electrical charges or particles of matter. Hence it was still necessary to retain the laws of mechanics of discrete entities, characterized by total differential equations. The conceptual dichotomy between Newton's physics of particles and Maxwell's physics of fields is clearly shown by the contrast between total and partial differential equations, and this contrast was seen (by some people at least) as evidence of a fundamental flaw. In a 1936 retrospective essay Einstein wrote:

This is the basis on which H. A. Lorentz obtained his synthesis of Newton's mechanics and Maxwell's field theory. The weakness of this theory lies in the fact that it tried to determine the phenomena by a combination of partial differential equations (Maxwell's field equations for empty space) and total differential equations (equations of motions of points), which procedure was obviously unnatural.

The difference between total and partial differential equations is actually more profound than it may appear at first glance, because (as alluded to in Section 1.1) it entails different assumptions about the existence of free-will and acts of volition. If we consider a pointlike particle whose spatial position x(t) is strictly a function of time, and we likewise consider the forces F(t) to which this particle is subjected as strictly a function of time, then the behavior of this particle can be expressed in the form of total differential equations, because there is just a single independent variable, namely the time coordinate.

Every physically meaningful variable exists as one of a countable number of explicit functions of time, and each of the values is realized at its respective time. Thus the total derivatives are evaluated over actualized values of the variables. In contrast, the partial derivatives over immaterial fields are inherently hypothetical, because they represent the variations in some variable of a particle not as a function of time along the particle's actual path, but transversely to the particle's path. For example, rather than asking how the force experienced by a particle changes over time, we ask how the force would change if at this instant of time the particle were in a slightly different position. Such hypotheticals have meaning only assuming an element of contingency in events, i.e., only if we assume the paths of material objects could be different from what they are. Of course, if we were to postulate a substantial continuous field, we could have non-hypothetical partial derivatives, which would simply express the facts implicit in the total derivatives for each substantial part of the field. However, the intelligibility of a truly continuous extended substance is questionable, and we know of no examples of such a thing in nature. Given that the elementary force fields envisaged by the Maxwellians were eventually conceded to be immaterial, and their properties could only be inferred from the state variables of material entities, it's clear that the partial derivatives over the field variables are not only hypothetical, but entail the assumption of freedom of action. In the absence of freedom, any hypothetical transverse variations in a field (i.e., transverse to the actual paths of material entities) would be meaningless. Only actual variations in the state variables of material entities would have meaning. Thus the contrast between total and partial differential equations reflects two fundamentally different conceptual frameworks, the former based on determinism and the latter based on the possibility of free acts. This is closely analogous to Aristotle's dichotomy between natural and violent motions.

As noted above, Einstein regarded this dualism as unnatural, and his intuition led him to expect that the field concept, governed by partial differential equations, would ultimately prove to be sufficient for a complete description of phenomena. In the same essay mentioned above he wrote:

What appears certain to me, however, is that, in the foundations of any consistent field theory, there should not be, in addition to the concept of the field, any concept concerning particles. The whole theory must be based solely on partial differential equations and their singularity-free solutions.

It may seem ironic that he took this view, considering that Einstein was such a staunch defender of strict causality and determinism, but by this time he was wholly committed to the concept of a continuous field as the ultimate ontological entity, more fundamental even than matter, and possessing a kind of relativistic substantiality, subject to deterministic laws. In a sense, he seems to have come to believe that the field was not a hypothetical entity inferred from the observed behavior of material bodies, but rather that material bodies were hypothetical entities inferred from the observed behavior of fields. An important first step in this program was to eliminate the concept of forces acting between bodies, and to replace this with a field-theoretic model. He (arguably)
accomplished this for gravitation with the general theory of relativity, which completely dispenses with the concept of a "force of gravity", and instead interprets objects under the influence of gravity as simply proceeding, unforced, along the most natural (geodesic) paths. Thus the concept of force, and particularly gravitational force, which was so central to Newton's synthesis, was simply discarded as having no absolute significance. However, the concept of force is still very important in physics, partly because we continue to employ the classical formulation of mechanics in the limit of low speeds and weak gravity, but more importantly because it has not proven possible (despite the best efforts of Einstein and others) to do for the other forces of nature what general relativity did for gravity, i.e., to express the apparently forced (violent) motions as natural paths through a modified geometry of space and time.

3.3 De Mora Luminis

I see my light come shining,
From the west unto the east.
Any day now, any day now,
I shall be released.

Bob Dylan, 1967

We are usually not aware of any delay between the occurrence of an event and its visual appearance in the eye of a distant observer. In fact, a single visual "snapshot" is probably the basis for most people's intuitive notion of an "instant". However, the causal direction of an instantaneous interaction is inherently ambiguous, so it's perhaps not surprising that ancient scholars considered two competing models of vision, one based on the idea that every object is the source of images of itself, emanating outwards to the eye of the observer, and the other claiming that the observer's eye is the source of visual rays emanating outwards to "feel" distant objects. An interesting synthesis of these two concepts is the idea, adopted by Descartes, of light as a kind of pressure in an ideal incompressible medium that conveys forces and pressures instantaneously from one location to another. However, even with Descartes we find the medium described as "incompressible, or nearly incompressible", revealing the difficulty of reconciling instantaneous force at a distance with our intuitive idea of causality. Fermat raised this very objection when he noted (in a letter on Descartes' Dioptrics) that if we assume instantaneous transmission of light we are hardly justified in analyzing such transmissions by means of analogies with motion through time. Perhaps urged by the sense that any causal action imposed from one location on another must involve a progression in time, many people throughout history have speculated that light may propagate at a finite speed, but all efforts to discern a delay in the passage of light (mora luminis) failed. One of the earliest such attempts of which we have a written account is the experiment proposed by Galileo, who suggested (in his Dialogue Concerning Two New Sciences) relaying a signal back and forth with lamps and shutters
located on separate hilltops. Based on the negative results from this type of crude experiment, Galileo could only confirm what everyone already knew, namely, that the propagation of light is "if not instantaneous, then extraordinarily fast". He went on to suggest that it might be possible to discern, in distant clouds, some propagation time for the light emitted by a lightning flash:

We see the beginning of this light – I might say its head and source – located at a particular place among the clouds; but it immediately spreads to the surrounding ones, which seems to be an argument that at least some time is required for propagation. For if the illumination were instantaneous and not gradual, we should not be able to distinguish its origin – its center, so to speak – from its outlying portions.

The idea of using the clouds in the night sky as a giant bubble chamber was characteristic of Galileo's talent for identifying opportunities in natural phenomena for testing ideas, as well as his attentiveness to subtle qualitative impressions, such as the sense of being able to distinguish the "center" of the illumination given off by a flash of lightning, even though we can't quantify the delay time. It also shows that Galileo was inclined to think light propagated at a finite speed, but of course he rightly qualified this lightning-cloud argument by admitting that "really these matters lie far beyond our grasp". Today we would say the perceived "spreading out" of a lightning strike through the clouds is due to propagation of the electrical discharge process. Even for clouds located ten miles apart, the time for light itself to propagate from one cloud to the other is only one 18,600th of a second, presumably much too short to give any impression of delay to human senses.

Interestingly, Galileo also contributed (posthumously) to the first successful attempt to actually observe a delay attributable to the propagation of light at a finite speed. In 1610, soon after the invention of the telescope, he discovered the four largest moons of Jupiter, illustrated below:
[Figure: Jupiter and its four largest moons]
In hopes of gaining the patronage of the Grand Duke Cosimo II, Galileo named Jupiter's
four largest moons the "Medicean Stars", but today they're more commonly called the Galilean satellites. At their brightest these moons would be just bright enough (with magnitudes between 5 and 6) to be visible from Earth with the naked eye - except that they are normally obscured by the brightness of Jupiter itself. (Interestingly, there is some controversial evidence suggesting that an ancient Chinese astronomer may actually have glimpsed one of these moons 2000 years before Galileo.) Of course, from our vantage point on Earth, we must view the Jupiter system edgewise, so the moons appear as small stars that oscillate from side to side along the equatorial plane of Jupiter. If they were all perpendicular to the Earth's line of sight, and all on the same side of Jupiter, simultaneously, they would look like this:
[Figure: the four moons aligned on one side of Jupiter, perpendicular to the Earth's line of sight]
By the 1660's, detailed tables of the movements of these moons had been developed by Borelli (1665) and Cassini (1668). Naturally these tables were based mainly on observations taken around the time when Jupiter is nearly "in opposition", which is to say, when the Earth passes directly between Jupiter and the Sun, because this is when Jupiter appears high in the night sky. The mean orbital periods of Jupiter's four largest moons were found to be 1.769 days, 3.551 days, 7.155 days, and 16.689 days, and these are very constant and predictable (especially for the two inner moons), like a giant clockwork. (In fact, there were serious attempts in the 18th century to develop a system of tables and optical instruments so that the "Jupiter clock" could be used by sailors at sea to determine Greenwich Meridian time, from which they could infer their longitude.) Based on these figures it was possible to predict within minutes the times of eclipses and passages (i.e., the passings behind and in front of Jupiter) that would occur during the viewing opportunities in future "oppositions". In particular, the innermost satellite, Io (which is just slightly larger than our own Moon), completes one revolution around Jupiter every 42.456 hours. Therefore, when viewed from the Earth, we expect to see Io pass behind Jupiter once every 42 hours, 27 minutes, and 21 seconds - assuming the light from each such eclipse takes the same amount of time to reach the Earth.

By the 1670's people began to make observations of Jupiter's moons from the opposite side of the Earth's orbit, i.e., when the Earth was on the opposite side of the Sun from Jupiter, and they observed a puzzling phenomenon. Obviously it's more difficult to make measurements at these times, because the Jovian system is nearly in conjunction with the Sun, but at dawn and dusk it is possible to observe Jupiter even when it is fairly close to conjunction. These observations, taken about 6 months away from the optimum viewing times, reveal that the eclipses and passages of Jupiter's innermost moon, Io, which could be predicted so precisely when Jupiter is in opposition, are consistently late by about 17 minutes relative to their predicted times of occurrence. (Actually the first such estimate, made by the Danish astronomer Ole Roemer in 1675, was 22 minutes.) This is not to say that the time intervals between successive eclipses are increased by 17 minutes, but that the absolute time of occurrence is 17 minutes later than was predicted six months earlier based on the observed orbital period at that time. Since Io has a period of 1.769 days, it
completes about 103 orbits in six months, and it appears to lose a total of 17 minutes during those 103 orbits, which is an average of about 9.9 seconds per orbit. Nevertheless, at the subsequent "opposition" viewing six months later, Io is found to be back on schedule! It's as if a clock runs slow in the mornings and fast in the afternoons, so that on average it never loses any time from day to day. While mulling over this data in 1675, it occurred to Roemer that he could account for the observations perfectly if it is assumed that light propagates at a finite speed. At last someone had observed the mora luminis. Light travels at a finite speed, which implies that when we see things we are really seeing how they were at some time in the past. The further away we are from an object, the greater the time delay in our view of that object. Applying this hypothesis to the observations of Jupiter's moons, Roemer considered the case when Jupiter was in opposition on, say, January 1, so the light from the Jovian eclipses was traveling from the orbit of Jupiter to the orbit of the Earth, as shown in the figure below.
[Figure: light from Io's eclipses traveling from Jupiter's orbit to the Earth's orbit, with Jupiter in opposition on January 1 and in conjunction about six and a half months later]
The intervals between successive eclipses around this time will be very uniform near the opposition point, because the eclipses themselves are uniform and the distance from Jupiter to the Earth is fairly constant during this time. However, after about six and a half months (denoted by July 18 in the figure), Jupiter is in conjunction, which means the Earth is on the opposite side of its orbit from Jupiter. The light from the "July 18" eclipse will still cross the Earth's orbit (on the near side) at the expected time, but it must then travel an additional distance, equal to the diameter of the Earth's orbit, in order to reach the Earth. Hence we should expect it to be "late" by the amount of time required for light to travel the Earth's orbital diameter. Combining this with a rough estimate of the distance from the Earth to the Sun, Huygens reckoned that light must travel at about 209,000 km/sec. A subsequent estimate by Newton gave a value around 241,000 km/sec. In fact, the Scholium to Proposition 96 of Newton's Principia includes the statement:

For it is now certain from the phenomenon of Jupiter's satellites, confirmed by the observations of different astronomers, that light is propagated in succession, and requires about seven or eight minutes to travel from the sun to the earth.

The early quantitative estimates of the speed of light were obviously impaired by the lack
of precise knowledge of the Earth-Sun distance. Using modern techniques, the Earth's orbital diameter is estimated to be about 2.98 × 10¹¹ meters, and the observed time delay in the eclipses and passages of Jupiter's moons when viewed from the Earth with Jupiter in conjunction is about 16.55 minutes = 993 seconds, so we can deduce from these observations that the speed of light is about 2.98 × 10¹¹ / 993 ≈ 3 × 10⁸ meters/sec. Of course, Roemer's hypothesis implies a specific time delay for each point of the orbit, so it can be corroborated by making observations throughout the year. We find that most of the discrepancy occurs during the times when the distance between Jupiter and the Earth is changing most rapidly, which is when the Earth-Sun axis is nearly perpendicular to the Jupiter-Sun axis. At one of these positions the Earth is moving almost directly toward Jupiter, and at the other it is moving almost directly away from Jupiter, as shown in the figure below.
[Figure: points of the Earth's orbit where it moves directly toward or away from Jupiter, with the Earth-Sun axis nearly perpendicular to the Jupiter-Sun axis]
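The arithmetic in this and the following paragraphs is easy to check; here is a minimal sketch in Python, using the round figures quoted in the text:

    import math

    D = 2.98e11                 # diameter of Earth's orbit, meters
    delay = 993.0               # eclipse delay at conjunction, seconds
    c = D / delay               # Roemer's inference: ~3.0e8 m/s
    print(c)

    # Earth's orbital speed: circumference (pi * diameter) over one year.
    year = 365.25 * 24 * 3600
    v = math.pi * D / year      # ~3.0e4 m/s, so v/c ~ 0.0001
    print(v, v / c)

    # Doppler-shifted eclipse intervals for Io (discussed below):
    T = 1.769 * 24 * 60         # Io's period in minutes (2547.36)
    print(T * (1 - v / c))      # ~2547.1 min with Earth approaching Jupiter
    print(T * (1 + v / c))      # ~2547.6 min with Earth receding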
The Earth's speed relative to Jupiter at these points is essentially just its orbital speed, which is the circumference of its orbit divided by one year. Thus we have
v = (orbital circumference)/(1 year) = 2π(1.496 × 10¹¹ m)/(3.156 × 10⁷ sec)
which is equivalent to about 3 × 10⁴ meters/sec. If we choose units so that c = 1, then we have v = 0.0001. From this point of view the situation can be seen as a simple application of the Doppler effect, and the frequency of the eclipses as viewed on Earth can be related to the actual frequency (which is what we observe at conjunction and opposition) according to the formulas
ν_toward = ν / (1 − v)          ν_away = ν / (1 + v)
The frequencies are inversely proportional to the time intervals between eclipses. These formulas imply that, for the moon Io, whose orbital period is 1.769 days = 2547.3600 minutes, the time interval between consecutive observed eclipses when the Earth is
moving directly toward Jupiter (indicated as "Jan" in the above figure) is 2547.1052 minutes, and the time interval between successive observed eclipses six months later is 2547.6147 minutes. Thus the interval between observed eclipses is 15.2 seconds shorter than nominal in the former case, and it is 15.3 seconds longer than nominal in the latter case, making a total difference of 30.5 seconds between the inter-arrival times at the two extremes, separated by six months. It would have been difficult to keep time this accurately in Roemer's day, but differences of this size are easily measured with modern clocks. By the way, the other moons of Jupiter do not conform so nicely to Roemer's hypothesis, but this is because their orbital motions are inherently more irregular due to their mutual gravitational interactions.

Despite the force of Roemer's analysis, and the early support of both Huygens and Newton, most scientists remained skeptical of the idea of a finite speed of light. It was not until 50 years later, when the speed of light was evaluated in a completely different way, arriving at nearly the same value, that the idea became widely accepted. This occurred in 1729, when the velocity of light was estimated by James Bradley based on observations of the aberration of starlight, which he argued must depend on the ratio of the speed of light to the orbital speed of the Earth. Based on the best measurements of the limiting starlight aberration 20.4" ± 0.1" by Otto Struve, and taking the speed of the Earth to be about 30.56 km/sec from Encke's solar parallax estimate of 8.57" ± 0.04", this implied a light speed of about 308,000 km/sec. Unfortunately, Encke's parallax estimates had serious problems, and he greatly underestimated his error band.

The determination of the Earth-Sun distance was a major challenge for scientists in the 18th century. Interestingly, the primary mission of Captain James Cook when he embarked in the ship Endeavour on his famous voyage in 1768 was to observe the transit of the planet Venus across the disk of the Sun from the vantage point of Tahiti in the South seas, with the aim of determining the distance from the Sun to the Earth by parallax. Roughly once each century two such transits are visible from the Earth, occurring eight years apart. Edmund Halley had urged that when the next opportunities arose on June 6, 1761, and June 3, 1769, the transit be observed from as many vantage points as possible on the Earth's surface to make the best possible determination. The project was undertaken by people from many countries. Le Gentil traveled to India for the 1761 transit, but since England and France were antagonists at the time, he had to dodge the English war ships, causing him to reach India just after June 6, missing the first transit of Venus. Determined not to miss the second one, he remained in India for the next eight years (!) "doing various useful work" (according to Pannekoek) until June 3, 1769. Alas, when the day arrived, it was too cloudy to see anything.

Cook's observations fared somewhat better. The French government actually issued orders to its war ships to leave Cook alone, since he was "on a mission for the benefit of all mankind". The Endeavour arrived in Tahiti on April 13, 1769, and the scientists were able to make observations in the clear on June 3. Unfortunately, the results were disappointing, not only in Tahiti, but all over the world.
It turned out to be extremely difficult to judge precisely (to within, say, 10 seconds) when one edge of Venus passed the border of the Sun. The black disk of the planet appeared to "remain connected like a
droplet" to the border of the Sun, until suddenly the connection was broken and the planet was seen to be well past the border. Observers standing right next to each other recorded times differing by tens of seconds. Consequently the observations failed to yield an improved estimate of the Earth-Sun distance.

The first successful quantification of c based solely on terrestrial measurements was probably Fizeau's in 1849, using a toothed wheel, followed by Foucault's experiment in 1862 using rotating mirrors. The toothed wheel didn't work very well, and it was hard to say how accurate it was, but the rotating mirrors led to a value of about 298,000 ± 500 km/sec, significantly below the earlier estimates. Foucault was confident the discrepancy with earlier results couldn't be explained by an error in the aberration angle, so he inferred (correctly) that Encke's solar parallax estimate (and therefore the orbital velocity of the Earth) was in error, and proposed a value of 8.8", which was subsequently confirmed and refined by new observations, as well as a re-analysis of Encke's 1769 data using better longitudes and yielding an estimate of 8.83". In later repetitions of such experiments, Kerr cells were used to increase the speed of switching the light signal. These rely on the fact that the refractivity of certain substances can be made to vary with an applied electric voltage. Further refinements led to large-baseline devices, called geodimeters, originally intended for use in geodetic surveying. Here is a summary of the major published determinations of the speed of optical light based on one or another of these techniques:
[Table: major published determinations of the speed of optical light]
Measurements of the speed of electromagnetic waves in the radio frequency range have also
been made, with the results summarized below:
[Table: measurements of the speed of electromagnetic waves at radio frequencies]
In addition, the speed of light can be determined indirectly by measuring the ratio of electric to magnetic units, which amounts to measuring the permittivity of the vacuum. Some results given by this method are summarized below:
[Table: determinations of the speed of light from the ratio of electric to magnetic units]
(Several of the above values include corrections for various group-velocity indices.) A plot of the common logarithm of the tolerance versus the year for the 19 optical light speed measurements is shown below:
[Plot: common logarithm of the published tolerance versus year for the 19 optical light speed measurements]
Interestingly, comparing each of the measured values with Evenson's 1973 value, we find that more than half of them were in error by more than their published tolerances. This is not so surprising when we note that most of the tolerances were quoted as "one sigma" error bands rather than as absolute limits. Indeed, if we consider the two-sigma band, there were only four cases of over-optimism, and of those, all but Foucault's 1862 result are within three sigma, and even Foucault is within four sigma. This is roughly in agreement with what one would expect, especially for delicate and/or indirect measurements. Also, the aggressive error estimates in this field have had the beneficial effect of spurring controversies between different researchers, forcing them to repeat experiments and refine their techniques in order to resolve the disagreements. In this way,
knowledge of the speed of light progressed in less than 400 years from Galileo's assessment, "extraordinarily fast", to the best modern value, 299,792.4574 ± 0.0012 km/sec. Today the unit of length is actually defined in terms of the speed of light (the meter being the distance traveled by light in a specified fraction of a second), so in effect we now define the meter such that the speed of light is exactly 299,792.458 km/sec.

Incidentally, Maxwell once suggested (in his article on Ether for the ninth edition of the Encyclopedia Britannica) that Roemer's method could be used to test for the isotropy of light speed, i.e., to test whether the speed of light is the same in all directions. After noting that any purely terrestrial measurement would yield an effect only of the second order in v/c, which he regarded as "quite insensible" (a remark that spurred Albert Michelson to successfully measure just such a quantity only two years later), he wrote:

The only practicable method of determining directly the relative velocity of the aether with respect to the solar system is to compare the values of the velocity of light deduced from the observation of the eclipses of Jupiter's satellites when Jupiter is seen from the earth at nearly opposite points of the ecliptic.

Notice that, for this type of observation, the relevant speed is not the speed of the earth in its orbit around the sun, but rather the speed of the entire solar system. Roemer's method can be regarded as a means of measuring the speed of light in the direction from Jupiter to the Earth, and since Jupiter has an orbital period of about 12 years, we can use this method to evaluate the speed of light several times over a 12 year period, and thus evaluate the speed in all possible directions (in the plane of the ecliptic). If the sun were stationary, we would not expect to find any differences, but it was already suspected in Maxwell's time that the sun itself is in motion. The best modern estimate is that our solar system is moving with a speed of about 3.7 × 10⁵ meters per second with respect to the cosmic microwave background radiation (i.e., the frame in which the radiation is roughly isotropic). If we assume a pre-relativistic model in which light propagates at a fixed speed with respect to the background radiation, and in which frames are related by Galilean transformations, we could in principle determine the "absolute speed" of the solar system. The magnitude of the effect is given by computing how much difference would be expected in the time for light to traverse one orbital diameter of the Earth at an effective speed of c+V and c−V, where V is the presumed absolute speed of the Earth. This gives a maximum difference of about 2.45 seconds between two measurements taken six years apart. (These two measurements each occur over a 6 month time span as explained above.) In practice it would be necessary to account for many other uncontrolled variables, such as the variations in the orbits of the Earth and Jupiter over the six year interval. These would need to be known to much better than 1 part in 400 to give adequate resolution. To the best of my knowledge, this experiment has never been performed, because by the time sufficiently accurate clocks were available the issue of light's invariance with respect to inertial coordinate systems had already been established by more accurate terrestrial measurements, together with an improved understanding of the meaning of inertial coordinates.
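For reference, here is a sketch of the arithmetic behind the 2.45-second figure, under the Galilean model just described (all values as quoted in the text):

    c = 2.998e8        # speed of light, m/s
    D = 2.98e11        # diameter of Earth's orbit, m
    V = 3.7e5          # presumed absolute speed of the solar system, m/s

    # Light crossing the orbital diameter at effective speeds c-V and c+V:
    difference = D / (c - V) - D / (c + V)
    print(difference)              # ~2.45 seconds
    print(2 * D * V / c**2)        # same result to first order in V/c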
Today we are more likely to establish a system of coordinates optically, and
then test to verify the isotropy of mechanical inertia with respect to those coordinates.

3.4 Stationary Paths

Then with no throbbing fiery pain,
No cold gradations of decay,
Death broke at once the vital chain,
And free’d his soul the nearest way.

Samuel Johnson, 1783

The apparent bending of visual images of objects partially submerged in water was noted in antiquity, but it wasn't until Kepler's Dioptrice, published in 1611, that anyone attempted to actually quantify the effect. Kepler discovered that, at least for rays nearly perpendicular to the surface, the ratio of the angles of incidence and refraction is (nearly) proportional to the ratio of what we now call the indices of refraction of the media. (Originally these indices were just empirically determined constants for each substance, but Newton later showed that for most transparent media the refractive index could be taken as unity plus a term proportional to the medium's density.) Incidentally, Kepler also noticed that with suitable materials and angles of incidence, the refracted angle can be made to exceed 90 degrees, resulting in total internal reflection, which is the basic principle of modern fiber optics. In 1621, Willebrord Snell performed a series of careful measurements and found that when a ray of light passes through a surface at which the index of refraction changes abruptly, the angles made by the incident and transmitted rays with the respective outward normals to the surface are related according to the simple formula (now called Snell's Law)
n₁ sin(θ₁) = n₂ sin(θ₂)
where n₁ and n₂ are the indices of refraction (still regarded simply as empirical constants for any given medium) on the incident and transmitted sides of the boundary, and θ₁ and θ₂ are the angles that the incident ray and the transmitted ray make with the normal to the boundary as shown below.
[Figure: incident and refracted rays at a boundary between two media, with angles θ₁ and θ₂ measured from the normal]
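As a quick illustration of the law, a few lines of Python (with typical indices for air and water, chosen here purely for the example) compute the transmitted angle, or detect total internal reflection:

    import math

    def refract(theta1_deg, n1, n2):
        """Refracted angle from Snell's law, or None for total internal reflection."""
        s = n1 * math.sin(math.radians(theta1_deg)) / n2
        if abs(s) > 1.0:
            return None            # no transmitted ray
        return math.degrees(math.asin(s))

    print(refract(30.0, 1.000, 1.333))   # air into water: ~22.0 degrees
    print(refract(60.0, 1.333, 1.000))   # water into air: None (internal reflection)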
Soon thereafter, Descartes published his La Dioptrique (1637), in which he presented a rationalization of Snell's law based on the idea that light is a kind of pressure transmitted
instantaneously (or nearly so) through an elastic medium. Descartes' theory led to a fascinating scientific dispute over the correct interpretation of light. According to Descartes' mechanistic description, a dense medium must transmit light more effectively, i.e., with more "force", than a less dense medium. (He sometimes described light rays in terms of a velocity vector rather than a force vector, but in either case he reasoned that the magnitude of the vector, which he called the light's determination, increased in proportion to the density of the medium.) Also, Descartes argued that the tangential component of the ray vector remains constant as the ray passes through a boundary. On the basis of these two (erroneous) premises, the parallelogram of forces for a ray of light passing from a less dense to a more dense medium is as shown below.
[Figure: Descartes' parallelogram of forces for a ray passing from a less dense to a more dense medium]
The magnitude of the incident force is f, and the magnitude of the refracted force is F, each of which is decomposed into components normal and tangential to the surface. Since Descartes assumes fₜ = Fₜ, it follows immediately that f sin(θ₁) = F sin(θ₂). If, as Descartes often did, we regard the force (determination) of the light as analogous to the speed of light, then this corresponds to the relation v₁ sin(θ₁) = v₂ sin(θ₂) where v₁ and v₂ are the speeds of light in the two media.

Fermat criticized Descartes' derivation, partly on mathematical grounds, but also because he disagreed with the basic physical assumptions. In particular, Fermat believed that light must not only travel at a finite speed, it must travel slower (not faster) in a denser medium. Thus he argued that the derivation of Snell's law presented by Descartes was invalid, and he suspected the law itself might even be wrong. In his attempts to derive the "true" law of refraction, Fermat recalled the derivation of the law of reflection given by Hero of Alexandria in ancient times. (Actually, Fermat got this idea by way of his friend Marin Cureau de la Chambre, who had repeated Hero's derivation in a treatise on optics in 1657.) Hero asserted that light moves in a straight line in empty space, and reflects at equal angles when striking a mirror, for the simple reason that light prefers always to move along the shortest possible path. As Archimedes had pointed out, the shortest path between two given points in space is a straight line, and this (according to Hero) explains why light rays are straight. More impressively, Hero showed that when travelling from some point A to the surface of a plane mirror and then back out to some point B, the shortest path is the one for which the angles of incidence and reflection are equal. These are ingenious observations, but unfortunately the same approach doesn't explain refraction, because in that case the shortest path between a point above water and a point below water (for example) would always be simply a straight line, and there would be no
refraction at all. At this point Fermat's intuition that light propagates with a characteristic finite speed, and that it moves slower in denser media, came to his aid, and he saw that both the laws of reflection and refraction (as well as rectilinear motion in free space) could be derived from the same principle if, instead of light traveling along a path that minimizes the spatial distance, we suppose it travels along the path that minimizes the temporal distance, i.e., light follows the path to its destination that will take the least possible time. This conceptual step is fascinating for several reasons. For one thing, we don't know on what basis Fermat "intuited" that the speed of light is not only finite (which had never yet been demonstrated), but that it possesses a fixed characteristic speed (which it must if a law of “least time” is to have any meaning), and that the speed is lower in more dense media (precisely opposite the view of Descartes and subsequently Newton and Maupertuis). Furthermore, applying the principle of least time rather than least distance to the law of propagation of light clearly casts the propagation of light into the arena of four-dimensional spacetime, and it essentially amounts to an assertion that the paths of motion should be geodesics with respect to a suitable spacetime metric. Thus, Fermat's optical principle can be seen as a remarkable premonition of important elements of both special and general relativity. To derive the law of refraction for a ray of light traveling through the boundary between two homogeneous media, Fermat argued that a ray traveling from point 1 to point 2 in the figure below would follow the path that minimized the total time of the journey.
[Figure: a ray from point 1, at height a above the boundary, to point 2, at depth b below it, crossing the boundary at horizontal distance x from point 1; w is the horizontal separation of the two points, the straight segments have lengths d1 and d2, and they make angles θ1 and θ2 with the normal]
Letting v1 denote the speed of light in medium 1, and v2 denote the speed of light in medium 2, the total time of the journey is d1/v1 + d2/v2, which can be written in terms of the unknown x as
    d1/v1 + d2/v2 = √(a² + x²)/v1 + √(b² + (w − x)²)/v2
Differentiating with respect to x gives
    x/(v1 √(a² + x²)) − (w − x)/(v2 √(b² + (w − x)²))
Setting this to zero gives the relation
    sin(θ1)/v1 = sin(θ2)/v2
which is equivalent to Snell's law n1 sin(θ1) = n2 sin(θ2) provided we assume the refractive index of a medium is proportional to the inverse of the velocity of light in that medium. Of course, since calculus hadn't been invented yet, Fermat's solution of the problem involved considerably more labor (and ingenuity) than shown above, but eventually he arrived at this result, which surprisingly was experimentally indistinguishable from the formula arising from Descartes' derivation, despite the fact that it was based on an opposite set of assumptions, namely, that the velocity (or the "force") of light in a given medium is directly proportional to the refractive index of that medium! It may seem strange that two opposite hypotheses as to the speed of light should lead to the same empirical result, but in fact without the ability to directly measure the speed of light in various media we cannot tell from the refractivities of materials whether the index is proportional to velocity or to the reciprocal of velocity. Even though both assumptions lead to the same law of refraction, the dispute over the correct derivation of this law continued unabated, because each side regarded the other side's interpretation as a travesty of science. Among those who believed light travels faster in denser media were Hooke and Newton, whereas Huygens derived the law of refraction based on his wave theory of light (see Section 8.9) and concluded that Fermat's hypothesis was correct, i.e., the speed of light was less in denser media. More than a century later (around 1747) Maupertuis applied his "principle of least action" to give an elegant (albeit spurious) derivation of Snell's law from the hypothesis that light travels faster in denser media. Maupertuis believed that the wisdom and economy of God is manifest in all the operations of nature, which necessarily proceed from start to finish in just such a way as to minimize the "quantity of action". In a sense, this is closely akin to Fermat's principle of least time, since they are both primitive examples of what we would now call the calculus of variations. However, Maupertuis developed an all-encompassing view of his "least action" principle, with mystical and religious implications, and he argued that it was the universal governing principle in all areas of physics, including mechanics, optics, thermodynamics, and all other natural processes. Of course, the notion that the phenomena of nature must follow the "best possible" course was not new. Plato's Phaedo quotes Socrates as saying

If then one wished to know the cause of each thing, why it comes to be or perishes or exists, one had to find what was the best way for it to be, or to be acted upon, or to act. On these premises then, it befitted a man to investigate only, about this and other things, what is best... he would tell me, first, whether the earth is flat or round, and then explain why it is so of necessity, saying which is better, and that it
was better to be so... I was ready to find out in the same way about the sun and the moon and the other heavenly bodies, about their relative speed, their turnings, and whatever else happened to them, how it is best that each should act or be acted upon...

The innovation of Maupertuis was to suggest a quantitative measure for the vague notion of "what is best" for all physical processes, and to demonstrate that this kind of reasoning can produce valid quantitative results in a wide range of applications. His proposal was to minimize the product of mass, velocity, and displacement. (Subsequently Lagrange clarified this by defining the action of a system as the spatial path integral of the product of mass and velocity.) For a system whose mass does not change, Maupertuis regarded the action as simply proportional to the product of velocity and distance traveled. To derive the law of refraction for a ray of light traveling through the boundary between two homogeneous media, Maupertuis argued that a ray traveling from point 1 to point 2 in the figure above would follow the path that minimized the total "action" v1d1 + v2d2 of the journey. This is identical to the quantity that Fermat minimized, except that the speeds appear in place of their reciprocals. Since v1 and v2 are constants, the differentiation proceeds as before, except for the inverted speed constants, and we arrive at the relation
    v1 sin(θ1) = v2 sin(θ2)
which is equivalent to Snell's law n1 sin(θ1) = n2 sin(θ2) provided we assume the refractive index of a medium is proportional to the velocity of light in that medium, more or less consistent with the views of Descartes, Hooke, and Newton. Since the deviation of the refractive index from unity is known empirically to be roughly proportional to the density of the medium, this would imply that light travels faster in denser media, which Newton and the others found quite plausible. No amount of experimenting with the relative refractions of various media would suffice to distinguish between these two possibilities (the refractive index being proportional to the velocity or the reciprocal velocity). Only a direct measurement of the speed of light in two media with different refractive indices could accomplish this. Such a measurement was not achieved until 1850, when Foucault passed rays of light through a tube, and by using a rapidly rotating mirror was able to show conclusively that light takes longer to traverse the tube when it is filled with water than when filled with air. So, after 200 years of theorizing and speculation, the question was finally settled in favor of Fermat and Huygens, i.e., the index of refraction is inversely proportional to the speed of light in the medium. It's worth noting that although Fermat was closer to the truth, his principle of "least time" is not strictly correct, because the modern formulation of "Fermat's Principle" states that light travels along a path for which the time is stationary (i.e., such that slight transverse changes in the path don't affect its length), not necessarily minimal. In fact, it may even be maximal, as can be verified by looking at yourself in the concave surface of a shiny spoon. The "reason" that light prefers stationary paths can be found in the theory of quantum electrodynamics and Feynman's "sum over all paths" interpretation, which
shows that if neighboring paths take different amounts of time, the neighboring rays arrive at the destination out of phase, and cancel each other out, whereas they reinforce each other if the neighboring paths take the same amount of time, or differ by some whole number of wave periods. A stark demonstration of this is given by diffraction gratings, in which the canceling regions of a mirror are scraped away, resulting in reflective properties that violate Hero's law of equal angles. The modified version of Fermat’s Principle (requiring stationary rather than minimal paths) has proven to be a remarkably useful approach to the formulation of all kinds of physical problems involving motion and change. Also, subsequent optical experiments confirmed Fermat’s intuition that the index of refraction for a given medium was inversely proportional to the (phase) velocity v of light in the medium. The modern definition of the refractive index is n = c/v, where the constant of proportionality c is the speed of light in a vacuum. (The fact that Fermat and Descartes could reach identical conclusions, even though one assumed the index of refraction was proportional to v while the other assumed it was proportional to 1/v, is less surprising when we recall that this is precisely the crucial symmetry for relativistic velocity-composition, as described in Section 1.8.) In any case, it's clear that Fermat's model of optics based on his principle of least time, when interpreted as a metrical theory, entails or suggests many of the important elements of the modern theory of relativity, including the fundamental assumption of a characteristic speed of light for each medium, the concept of a unified space-time as the effective arena of motion, and the assumption that natural motions follow geodesic paths.
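To make the contrast between the two variational recipes concrete, here is a minimal numerical sketch (assuming Python with the numpy and scipy libraries; the geometry a = b = 1, w = 2 and the speeds v1 = 1, v2 = 0.75 are illustrative values chosen for this example, not taken from the text). It minimizes Fermat's travel time d1/v1 + d2/v2 and Maupertuis' "action" v1 d1 + v2 d2 over the crossing point x, and reports the resulting ratio of sines:

    import numpy as np
    from scipy.optimize import minimize_scalar

    a, b, w = 1.0, 1.0, 2.0   # heights of points 1 and 2 from the boundary, and their horizontal separation
    v1, v2 = 1.0, 0.75        # speeds of light in the two media (illustrative values)

    def d1(x): return np.hypot(a, x)        # path length in medium 1
    def d2(x): return np.hypot(b, w - x)    # path length in medium 2

    # Fermat: least time.  Maupertuis: least "action" (speed times distance).
    xf = minimize_scalar(lambda x: d1(x)/v1 + d2(x)/v2, bounds=(0, w), method='bounded').x
    xm = minimize_scalar(lambda x: v1*d1(x) + v2*d2(x), bounds=(0, w), method='bounded').x

    for label, x in (("Fermat", xf), ("Maupertuis", xm)):
        ratio = (x/d1(x)) / ((w - x)/d2(x))   # sin(theta1)/sin(theta2)
        print(label, round(ratio, 4))
    # prints: Fermat 1.3333 (= v1/v2) and Maupertuis 0.75 (= v2/v1)

The two stationary points yield exactly the inverted relations derived above, which is why refraction experiments alone could never decide between them.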

3.5 A Quintessence of So Subtle A Nature

For his art did express
A quintessence even from nothingness,
From dull privations and lean emptiness;
He ruined me, and I am re-begot
Of absence, darkness, death; things which are not.
                                   John Donne, 1633

Descartes (like Aristotle before him) believed that nature abhors a vacuum, and insisted that the entire universe, even regions that we commonly call "empty space", must be filled with (or, more precisely, must consist of) some kind of substance. He believed this partly for philosophical reasons, which might be crudely summarized as "empty space is nothingness, and nothingness doesn't exist". He held that matter and space are identical and co-extant (ironically similar to Einstein's later notion that the gravitational field is identical with space). In particular, Descartes believed an all-pervasive substance was necessary to account for the propagation of light from the Sun to the Earth (for example), because he rejected any kind of "action at a distance", and he regarded direct mechanical contact (taken as a primitive operation) as the only intelligible means by which two
objects can interact. He conceived of light as a kind of pressure, transmitted instantaneously from the source to the eye through an incompressible intervening medium. Others (notably Fermat) thought it more plausible that light propagated with a finite velocity, which was corroborated by Roemer's 1675 observations of the moons of Jupiter. The discovery of light's finite speed was a major event in the history of science, because it removed any operational means of establishing absolute simultaneity. The full significance of this took over two hundred years to be fully appreciated. More immediately, it was clear that the conception of light as a simple pressure was inadequate to account for the different kinds of light, i.e., the phenomenon of color. To remedy this, Robert Hooke suggested that the (longitudinal) pressures transmitted by the ether may be oscillatory, with a frequency corresponding to the color. This conflicted with the views of Newton, who tended to regard light as a stream of particles in an empty void. Huygens advanced a fairly well-developed wave theory, but could never satisfactorily answer Newton's objections about the polarization of light through certain crystals ("Iceland spar"). This difficulty, combined with Newton's prestige, made the particle theory dominant during the 1700's, although many people, notably Jean Bernoulli and Euler, held to the wave theory. In 1800 Thomas Young reconciled polarization with the wave theory by postulating that light actually consists of transverse rather than longitudinal waves, and on this basis - along with Fresnel's explanation of diffraction in terms of waves - the wave theory gained wide acceptance. However, Young's solution of the polarization problem immediately raised a new one, namely, how a system of transverse waves could exist in the ether, which had usually been assumed to be akin to a tenuous gas or fluid. This prompted generations of physicists, including Navier, Stokes, Kelvin, Malus, Arago, and Maxwell, to become actively engaged in attempts to explain optical phenomena in terms of a material medium; in fact, this motivated much of their work in developing the equations of state for elastic media, which have proven to be so useful for the macroscopic treatment of fluids. However, despite the fruitfulness of this effort for the development of fluid dynamics, no one was ever able to accurately account for all optical and electromagnetic phenomena in terms of the behavior of an ordinary fluid medium, with or without viscosity and/or compressibility. There were a number of reasons for this failure. First, an ordinary fluid (even a viscous fluid) can't sustain shear stresses at rest, so it can propagate only longitudinal waves, as opposed to the transverse wave structure of light implied by the phenomenon of polarization. This implies either that the luminiferous ether must be a solid, or else we must postulate some kind of persistent dynamics (such as vortices) in the fluid so that it can sustain shear stresses. Unfortunately, both of these alternatives lead to difficulties. The assumption of a solid ether is difficult to reconcile with the fact that the equations of state for ordinary elastic solids always yield longitudinal waves accompanying any transverse waves - typically with different velocities. Such longitudinal disturbances are never observed with respect to optical phenomena.
On the other hand, the assumption of a fluid ether with persistent flow patterns to sustain the required shear stresses entails a highly coordinated and organized system of flow cells that could persist only with the
active participation of countless tiny “Maxwell demons” working furiously at each point to sustain it. Lacking this, the vortices are inherently unstable (even in an ideal perfect inviscid fluid, in which vorticity is strictly conserved), so these flow cells could not exert the effects on ordinary matter that they must if they are to serve as the mechanism of electromagnetic forces. Even the latter-day concept of an ether consisting of a superfluid (i.e., the viscosity-free quantum hydrodynamical state achieved by some substances such as helium when cooled to near absolute zero) faces the same problem of sustaining its specialized state while simultaneously interacting with ordinary matter in the required ways. As Maxwell acknowledged

No theory of the constitution of the ether has yet been invented which will account for such a system of molecular vortices being maintained for an indefinite time without their energy being gradually dissipated into that irregular agitation of the medium which, in ordinary media, is called heat.

Thus, ironically, the concept of transverse waves - proposed by Young and Fresnel as a means of accounting for polarization of light in terms of a mechanical wave propagating in some kind of material ether - immediately led to considerations that ultimately undermined confidence in the physicality and meaningfulness of that ether. Even aside from the difficulty of accounting for exclusively transverse waves in a material medium, the idea of a substantial ether filling all of space had always faced numerous difficulties. For example, Newton had shown (in his demolition of Descartes' vortex theory) that the evidently drag-free motion of the planets and comets was flatly inconsistent with the presence of any significant density of interstitial fluid. This problem is especially acute when we remember that, in order to account for the high speed of light, the density and rigidity of the putative ether must be far greater than that of steel. Serious estimates of the density of the ether varied widely, but ran as high as 1000 tons per cubic millimeter. It is then necessary to explain the interaction between this putative material ether and all other known substances. Since the speed of light changes in different material media, there is clearly a significant interaction, and yet apparently this interaction does not involve any appreciable transfer of ordinary momentum (since otherwise the unhindered motions of the planets are inexplicable). One interesting suggestion was that it might be possible to account for the absence of longitudinal waves by hypothesizing a fluid that possesses vanishingly little resistance to compression, but extremely high rigidity with respect to transverse stresses. In other words, the shear stresses are very large, while the normal stresses vanish. The opposite limit is easy to model with the Navier-Stokes equation by setting the viscosity to zero, which gives an ideal non-viscous fluid with no shear stresses and with the normal stresses equal to the pressure. However, we can't use the ordinary Navier-Stokes equations to represent a substance of high viscosity and zero pressure, because this would imply zero density, and even if we postulate some extremely small (but non-zero) pressure, the normal stresses in the Navier-Stokes equations have components that are proportional to the viscosity, so we still wouldn't be rid of them.
We'd have to postulate some kind of adaptively non-isotropic viscosity, and then we wouldn't be dealing with anything that
could reasonably be called an ordinary material substance. As noted above, the intense efforts to understand the dynamics of a hypothetical luminiferous ether fluid led directly to the modern understanding of fluid dynamics, as modeled by the Navier-Stokes equation for fluids of arbitrary viscosity and compressibility. This equation can be written in vector form as
    ∂V/∂t + (V·∇)V = F − (∇p)/ρ + ν ∇²V + (ν/3) ∇(∇·V)
where p is the pressure, ρ is the density, F the external force vector (per unit mass), ν is the kinematic viscosity, and V is the fluid velocity vector. If the fluid is incompressible then the divergence of the velocity is zero, so the last term vanishes. It’s interesting to consider whether anything can be inferred about the vacuum from this equation. By definition, a vacuum has vanishing density, pressure, and viscosity - at least in the ordinary senses of those terms. Setting these quantities to zero, and in the absence of any external force F, the above equation reduces to dV/dt = −(∇p)/ρ. Since both p and ρ are equal to zero, this equation can only be evaluated on the basis of some functional relationship between those two variables. For example, we may assume the ideal gas law, p = ρRT where R is the gas constant and T is temperature. In that case we can evaluate the limit of (∇p)/ρ as p and ρ approach zero to give
    dV/dt = −R ∇T
This rather ghostly proposition apparently describes the disembodied velocity and temperature of a medium possessing neither density nor heat capacity. In a sense it is a medium of pure form and no substance. Of course, this is physically meaningless unless we can establish a correspondence between the terms and some physically observable effects. It was hoped by Stokes, Maxwell, and others that some such identification of terms might enable a limiting case of the Navier-Stokes equation to represent electromagnetic phenomena, but the full delineation of Maxwell's equations for electromagnetism makes it clear that they do not describe the movement of any ordinary material substance, which of course was the basis for the Navier-Stokes equation. Another interesting suggestion was that the luminiferous ether might consist of a substance whose constituent parts, instead of resisting changes in their relative distances (translation), resist changes in orientation. A theory along these lines was proposed by MacCullagh in 1839, and actually led to some of the same formulas as Maxwell's electromagnetic theory. This is an intriguing fact, but it doesn't represent an application (or even an adaptation) of the equations of motion for an ordinary elastic substance, whether gas, fluid, or solid. It's more properly regarded as an abstract mathematical model with only a superficial resemblance to descriptions of the behavior of material substances. Some of the simplest material ether theories were ruled out simply on the basis of first-
order optical phenomena, especially stellar aberration. For example, Stokes' theory of complete convection could correctly model aberration (to first order) only with a set of special hypotheses as to the propagation of light, hypotheses that Lorentz later showed to be internally inconsistent. (Stokes erroneously assumed the velocity of a potential flow stream around a sphere is zero at the sphere’s surface.) Fresnel's theory of partial convection was (more or less) adequate, up until it became possible to measure second-order effects, at which point it too was invalidated. But regardless of their empirical failures, none of these theories really adhered to the laws of ordinary fluid mechanics. William Thomson (Lord Kelvin), who was perhaps the most persistent of all in the attempt to represent electromagnetic phenomena in terms of the mechanics of ordinary macroscopic substances, aptly summarized the previous half-century of progress in this line of research at a jubilee in his honor in 1896:

One word characterizes the most strenuous efforts for the advancement of science that I have made perseveringly during fifty-five years; that word is failure. I know no more of electric and magnetic force, or of the relation between ether, electricity, and ponderable matter… than I knew… fifty years ago.

We might think this assessment was too harsh, especially considering that virtually the entire science of classical electromagnetism - based on Maxwell’s equations - was developed during the period in question. However, in the course of this development Maxwell and his followers had abandoned the effort to find mechanical analogies, and Kelvin equated progress with finding a mechanical analogy. The failure to find any satisfactory mechanical model for electromagnetism led to the abandonment of the principle of qualitative similarity, which is to say, it led to the recognition that the ether must be qualitatively different from ordinary substances. This belief was firmly established once Maxwell showed that longitudinal waves cannot propagate through transparent substances or free space. In so doing, he was finally able to show that all electromagnetic and optical phenomena can be explained by a single system of "stresses in the ether", which, however, he acknowledged must obey quite different laws than do the elastic stresses in ordinary material substances. E. T. Whittaker’s book “Aether and Electricity”, reviewing the work of Kelvin and others to find a mechanical model of the ether, concludes that

Towards the close of the nineteenth century… it came to be generally recognized that the aether is an immaterial medium, sui generis, not composed of identifiable elements having definite locations in absolute space.

Thus by the time of Lorentz it had become clear that the "ether" was simply being arbitrarily assigned whatever formal (and often non-materialistic) properties it needed in order to make it compatible with the underlying electromagnetic laws, and therefore the "corporeal" ether concept was no longer exerting any positive heuristic benefit, but was simply an archaic appendage that was being formalistically superimposed on top of the real physics for no particular reason. Moreover, although the Navier-Stokes equation is as important today for fluid dynamics
as Maxwell's equations are for electrodynamics, we've also come to understand that real fluids and solids are not truly continuous media. They actually consist of large numbers of (more or less) discrete particles. As it became clear that the apparently continuous dynamics of fluids and solids were ultimately just approximations based on an aggregate of more primitive electromagnetic interactions, the motivation for trying to explain the latter as an instance of the former came to be seriously questioned. It is rather like saying gold consists of an aggregate of sub-atomic particles, and then going on to say that those sub-atomic particles are made of gold! The effort to explain electromagnetism in terms of a material fluid such as we observe on a macroscopic level, when in fact the electromagnetic interaction is a much more primitive phenomenon, appears today to have been fundamentally misguided, an attempt to model a low-level phenomenon as an instance of a higher level phenomenon. During the last years of the 19th century a careful and detailed examination of electrodynamic phenomena enabled Lorentz, Poincare, and others to develop a theory of the electromagnetic ether that accounted for all known observations, but only by concluding that "the ether is undoubtedly widely different from all ordinary matter". This is because, in order to simultaneously account for aberration, polarization and transverse waves, the complete absence of longitudinal waves, and the failure of the Michelson/Morley experiment to detect any significant ether drift, Lorentz was forced to regard the ether as strictly motionless, and yet subject to non-vanishing stresses, which is contradictory for ordinary matter. Even in Einstein's famous essay on "The Ether and Relativity" he points out that although "we may assume the existence of an ether, we must give up ascribing a definite state of motion to it, i.e. we must take from it the last mechanical characteristic...". He says this because, like Lorentz, he understood that electromagnetic phenomena simply do not conform to the behavior of disturbances in any ordinary material substance - solid, liquid, or gas. Obviously if we wish to postulate some new kind of “substance” whose properties are not constrained to be those of an ordinary substance, we can "back out" whatever properties are needed to match the equations of any field theory (which is essentially what Lorentz did), but this is just an exercise in re-stating the equations in ad hoc verbal terms. Such a program has no heuristic or explanatory content. The question of whether electromagnetic phenomena could be accurately modeled as disturbances in an ordinary material medium was quite meaningful and deserved to be explored, but the answer is unequivocally that the phenomena of electromagnetism do not conform to the principles governing the behavior of ordinary material substances. In fact, we now understand that the latter are governed by the former, i.e., elementary electromagnetic interactions underlie the macroscopic behavior of ordinary material substances. We shouldn't conclude this review of the ether without hearing Maxwell on the subject, since he devoted his entire treatise on electromagnetism to it. Here is what he says in the final article of that immense work:

The mathematical expressions for electrodynamic action led, in the mind of Gauss, to the conviction that a theory of the propagation of electric action [as a
function of] time would be found to be the very keystone of electrodynamics. Now, we are unable to conceive of propagation in time, except either as the flight of a material substance through space, or as the propagation of a condition of motion or stress in a medium already existing in space... If something is transmitted from one particle to another at a distance, what is its condition after it has left the one particle and before it has reached the other? ...whenever energy is transmitted from one body to another in time, there must be a medium or substance in which the energy exists after it leaves one body and before it reaches the other, for energy, as Torricelli remarked, 'is a quintessence of so subtle a nature that it cannot be contained in any vessel except the inmost substance of material things'. Hence all these theories lead to the conception of a medium in which the propagation takes place, and if we admit this medium as an hypothesis, I think it ought to occupy a prominent place in our investigations, and that we ought to endeavour to construct a mental representation of all the details of its action, and this has been my constant aim in this treatise.

Surely the intuitions of Gauss and Torricelli have been vindicated. Maxwell's dilemma about how the energy of light "exists" during the interval between its emission and absorption was resolved by the modern theory of relativity, according to which the absolute spacetime interval between the emission and absorption of a photon is identically zero, i.e., photons are transmitted along null intervals in spacetime. The quantum phase of events, which we identify as the proper time of those events, does not advance at all along null intervals, so, in a profound sense, the question of a photon's mode of existence "after it leaves one body and before it reaches the other" is moot (as discussed in Section 9). Of course, no one from Torricelli to Maxwell imagined that the propagation of light might depend fundamentally on the existence of null connections between distinct points in space and time. The Minkowskian structure of spacetime is indeed a quintessence of a most subtle nature.

3.6 The End of My Latin

Leaving the old, both worlds at once they view
That stand upon the threshold of the new.
                                   Edmund Waller, 1686

In his book "The Theory of Electrons" (1909) Hendrik Lorentz wrote

Einstein simply postulates what we have deduced, with some difficulty and not altogether satisfactorily, from the fundamental equations of the electromagnetic field.

This statement implies that Lorentz's approach was more fundamental, and therefore contained more meaningful physics, than the explicitly axiomatic approach of Einstein. However, a close examination of Lorentz's program reveals that he, no less than Einstein, simply postulated relativity. To understand what Lorentz actually did - and did not -
accomplish, it's useful to review the fundamental conceptual issues that he faced. Given any set of equations describing some class of physical phenomena with reference to a particular system of space and time coordinates, it may or may not be the case that the same equations apply equally well if the space and time coordinates of every event are transformed according to a certain rule. If such a transformation exists, then those equations (and the phenomena they describe) are said to be covariant with respect to that transformation. Furthermore, if those equations happen to be covariant with respect to a complete class of velocity transformations, then the phenomena are said to be relativistic with respect to those transformations. For example, Newton's laws of motion are relativistic, because they apply not only with respect to one particular system of coordinates x,t, but with respect to any system of coordinates x',t' related to the former system according to a complete set of velocity transformations of the form
    x' = x − vt        t' = t                    (1)
From the time of Newton until the beginning of the 19th century many scientists imagined that all of physics might be reducible to Newtonian mechanics, or at least to phenomena that are covariant with respect to the same coordinate transformations as are Newton's laws, and therefore the relativity of Newtonian physics was regarded as complete, in the sense that velocity had no absolute significance, and each one of an infinite set of relatively moving coordinate systems, related by (1), was equally suitable for the description of all physical phenomena. This is called the principle of relativity, and it's important to recognize that it is just a hypothesis, similar to the principle of energy conservation. It is the result of a necessarily incomplete induction from our observations of physical phenomena, and it serves as a tremendously useful organizing principle, but only as long as it remains empirically viable. Admittedly we could regard complete relativity as a direct consequence of the principle of sufficient cause - within a conceptual framework of distinct entities moving in an empty void - but this is still a hypothetical proposition. The key point to recognize is that although we can easily derive the relativity of Newton's laws under the transformations (1), we cannot derive the correctness of Newton's laws, nor can we derive the complete relativity of physics from the presumptive relativity of the dynamics of material bodies. By the end of the 19th century the phenomena of electromagnetism had become well enough developed so that the behavior of the electromagnetic field - at least on a macroscopic level - could be described by a set of succinct equations, analogous to Newton's laws of motion for material objects. According to the principle of relativity (in the context of entities in an empty void) it was natural to expect that these new laws would be covariant with the laws of mechanics. It therefore was somewhat surprising when it turned out that the equations which describe the electromagnetic field are not covariant under the transformations (1). Apparently the principle of complete relativity was violated. On the other hand, if mechanics and electromagnetism are really not co-relativistic, it ought to be possible to detect the effects of an absolute velocity, whereas all attempts to detect such a thing failed. In other words, the principle of complete relativity of velocity continued to survive all empirical tests involving comparisons of the effects of
velocity on electromagnetism and mechanics, despite the fact that the (supposed) equations governing these two classes of phenomena were not covariant with respect to the same set of velocity transformations. At about this time, Lorentz derived the fact that although Maxwell's equations of the electromagnetic field (taking the permittivity and permeability of the vacuum to be invariants) are not covariant with respect to (1), they are covariant with respect to a complete set of velocity transformations, namely, those of the form
    x' = γ(x − vt)        t' = γ(t − vx)                    (2)
for a suitable choice of space and time units, where γ = (1 − v²)^(−1/2). This was a very important realization, because if the equations of the electromagnetic field were not covariant with respect to any complete set of velocity transformations, then the principle of relativity could only have been salvaged by the existence of some underlying medium. The situation would have been analogous to finding a physical process in which energy is not conserved, leading us to seek for some previously undetected mode of energy. Of course, even recognizing the covariance of Maxwell's equations with respect to (2), the principle of relativity was still apparently violated because it still appeared that mechanics and electromagnetism were incompatible. Recall that Lorentz took Maxwell's equations to be "the fundamental equations of the electromagnetic field" with respect to the inertial rest frame of the luminiferous ether. Needless to say, these equations were not logically derived from more fundamental principles; they were developed by a rational-inductive method whereby observed phenomena were analyzed into a small set of simple patterns, which were then formalized into mathematical expressions. Even the introduction of the displacement current was just a rational hypothesis. Admittedly the historical development of Maxwell's equations was guided to some extent by mechanistic analogies, but the mechanical world-view is itself a high-level conceptual framework based on an extensive set of abstract assumptions regarding dimensionality, space, time, plurality, persistent identities, motion, inertia, and various conservation laws and symmetries. Thus even if a completely successful mechanical model for the electromagnetic field existed, it would still be highly hypothetical. Moreover, it was already clear by 1905 that Maxwell's equations are not fundamental, since the simple wave model of electromagnetic radiation leads to the ultra-violet catastrophe, and in general cannot account for the micro-structure of radiation, leading to such things as the photo-electric effect and other quantum phenomena. (Having just completed a paper on the photo-electric effect prior to starting his 1905 paper on special relativity, Einstein was very much aware that Maxwell's equations were not fundamental, and this influenced his choice of foundations on which to base his interpretation of electrodynamics.) It's worth noting that although Lorentz derived the transformations (2) from the full set of Maxwell's equations (with the permittivity and permeability interpreted as invariants), these transformations actually follow from just one aspect of Maxwell's equations, namely, the invariance of the speed of light. Thus from the
standpoint of logical economy, as well as to avoid any commitment to the fundamental correctness of Maxwell's equations, it is preferable to derive the Lorentz transformation from the minimum set of premises. Of course, having done this, it is still valuable to show that, as a matter of fact, Maxwell's equations are fully covariant with respect to these transformations. To summarize the progress up to this point, Lorentz derived the general transformations (2) relating two systems of space and time coordinates such that if an electromagnetic field satisfies Maxwell's equations with respect to one of the systems, it also satisfies Maxwell's equations with respect to the other. Now, this in itself certainly does not constitute a derivation of the principle of relativity. To the contrary, the fact that (2) is different from (1) leads us to expect that the principle of relativity is violated, and that it ought to be possible to detect effects of absolute velocity, or, alternatively, to detect some underlying medium that accounts for the difference between (2) and (1). Lorentz knew that all attempts to detect an absolute velocity (or underlying medium) had failed, implying that the principle of complete relativity was intact, so something was wrong with the formulations of the laws of electromagnetism and/or the laws of mechanics. Faced with this situation, Lorentz developed his "theorem of corresponding states", which asserts that all physical phenomena transform according to the transformation law for electrodynamics. This "theorem" is equivalent to the proposition that physics is, after all, completely relativistic. Since Lorentz presented this as a "theorem", it has sometimes misled people (including, to an extent, Lorentz himself) into thinking that he had actually derived relativity, and that, therefore, his approach was more fundamental or more constructive than Einstein's. However, an examination of Lorentz's "theorem" reveals that it was explicitly based on assumptions (in addition to the false assumption that Maxwell's equations are the fundamental equations of the electromagnetic field) which, taken together, are tantamount to the assumption of complete relativity. The key step occurs in §175 of The Theory of Electrons, in which Lorentz writes

We are now prepared for a theorem concerning corresponding states of electromagnetic vibration, similar to that of §162, but of a wider scope. To the assumptions already introduced, I shall add two new ones, namely (1) that the elastic forces which govern the vibratory motions of the electrons are subjected to the relation [300], and (2) that the longitudinal and transverse masses m' and m" of the electrons differ from the mass m0 which they have when at rest in the way indicated by [305].

Lorentz's equation [300] is simply the transformation law for electromagnetic forces, and his equations [305] give the relativistic expressions for the transverse and longitudinal masses of a particle. Lorentz had previously presented these expressions as

...the assumptions required for the establishment of the theorem, that the systems S and S0 can be the seat of molecular motions of such a kind that, in both, the effective coordinates of the molecules are the same function of the effective time.
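(For reference, the expressions Lorentz designates [305] correspond, in modern notation and in units where c = 1 - my rendering, not a quotation from Lorentz - to m' = γ³ m0 for the longitudinal mass and m" = γ m0 for the transverse mass, with γ = (1 − v²)^(−1/2) as before.)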

In other words, these are the assumptions required in order to make the theorem of corresponding states (i.e., the principle of relativity) true. Hence Lorentz simply postulates relativity, just as did Galileo and Einstein, and then backs out the conditions that must be satisfied by mechanical objects in order to make relativity true. Needless to say, if we assume these conditions, we can then easily prove the theorem, but this is tautological, because these conditions were simply defined as those necessary to make the theorem true. Not surprisingly, if someone just focuses on Lorentz's "proof", without paying attention to the assumptions on which it is based, he might be misled into thinking that Lorentz derived relativity from some more fundamental considerations. This arises from confusion over what Lorentz was actually doing. He was primarily deriving the velocity transformations with respect to which Maxwell's equations are covariant, after which he proceeded to determine how the equations of mechanics would need to be modified in order for them to be covariant with respect to these same transformations. He did not derive the necessity for mechanics to obey these revised laws, any more than Einstein or Newton did. He simply assumed it, and indeed he had no choice, because the laws of mechanics do not follow from the laws of electromagnetism. Why, then, does the myth persist (in some circles) that Lorentz somehow derived relativity? To answer this question, we need to examine Lorentz's derivation of the theorem of corresponding states in greater detail. First, Lorentz justified the contraction of material objects in the direction of motion (with respect to the ether frame) on the basis of his "molecular force hypothesis", which asserts that the forces responsible for maintaining stable configurations of matter transform according to the electromagnetic law. This can only be regarded as a pure assumption, rather than a conclusion from electromagnetism, for the simple reason that the molecular forces are necessarily not electromagnetic, at least not in the Maxwellian sense. Maxwell's equations are linear, and it is not possible to construct bound states from any superposition of linear solutions. Hence Lorentz's molecular force hypothesis cannot legitimately be inferred from electromagnetism. It is a sheer hypothesis, amounting to the simple assumption that all intrinsic mechanical aspects of material entities are covariant with electromagnetism. Second, and even more importantly, Lorentz justified the applicability of the "effective coordinates" for the laws of mechanics of material objects by assuming that the inertial masses (both transverse and longitudinal) of material objects transform in the same way as do the "electromagnetic masses" of a charged particle arising from self-reaction. Admittedly it was once hoped that all inertial mass could be attributed to electromagnetic self-reaction effects, which would have provided some constructive basis for Lorentz's assumption, but we now know that only a very small fraction of the effective mass of an electron is due to the electromagnetic field. Again, it is simply not possible to account for bound states of matter in terms of Maxwellian electromagnetism, so it does not logically follow that the mechanics of material objects are covariant with respect to (2) simply because the electromagnetic field is covariant with respect to (2).
Of course, we can hypothesize that this is the case, but this is simply the hypothesis of complete physical relativity. Thus Lorentz did not in any way derive the fact that the laws of mechanics are covariant
with respect to the same transformations as are the laws of electromagnetism. He simply observed that if we assume they are (and if we assume every other physical effect, even those presently unknown to us, is likewise covariant), then we get complete physical relativity - but this is tautological. If all the laws of physics are covariant with respect to a single set of velocity transformations (whether they are of the form (1) or (2) or any other), then by definition physics is completely relativistic. The doubts about relativity that arose in the 19th century were due to the apparent fact that the laws of mechanics and the laws of electromagnetism were not covariant with respect to the same set of velocity transformations. Obviously if we simply assume that they are covariant with respect to the same transformations, then the disparity is resolved, but it's important to recognize that this represents just the assumption - not a derivation - of the principle of relativity. An alternative approach to preserving the principle of relativity would be to assume that electromagnetism and mechanics are actually both covariant with respect to the velocity transformations (1). This would necessitate modifications of Maxwell's equations, and indeed this was the basis for Ritz's emission theory. However, the modifications that Ritz proposed eventually led to conflict with observation, because according to the relativity based on (1) speeds are strictly additive and there is no finite upper bound on the speed of energy propagation. The failure of emission theories illustrates the important fact that there are two verifiable aspects of relativistic physics. The first is the principle of relativity itself, but this principle does not fully determine the observable characteristics of phenomena, because there is more than one possible relativistic pattern, and these patterns are observationally distinguishable. This is why relativistic physics is founded on two distinct premises, one being the principle of relativity, and the other being some empirical proposition sufficient to identify the particular pattern of relativity (Euclidean, Galilean, Lorentzian) that applies. Lorentz’s theorem of corresponding states represents the second of these premises, whereas the first is simply assumed, consistent with the apparent relativity of all observable phenomena. Einstein’s achievement in special relativity was essentially to show that Lorentz’s results (and more) actually follow unavoidably from just a small subset of his assumptions, and that these can be consistently interpreted as primitive aspects of space and time. The first published reference to Einstein's special theory of relativity appeared in a short note by Walter Kaufmann reporting on his experimental results involving the deflection of electrons in an electromagnetic field. Kaufmann's work was intended as an experimentum crucis for distinguishing between the three leading theories of the electron, those of Abraham, Bucherer, and Lorentz. In his note of 30 November 1905, Kaufmann wrote

In addition there is to be mentioned a recent publication of Mr. A. Einstein on the theory of electrodynamics which leads to results which are formally identical with those of Lorentz's theory. I anticipate right away the general result of the measurements to be described in the following: the results are not compatible
with the Lorentz-Einstein fundamental assumptions.

Kaufmann's results were originally accepted by most physicists as favoring the Abraham theory, but gradually people began to have doubts. Although the results disagreed with the Lorentz-Einstein model, the agreement with Abraham's theory was not particularly good either. This troubled Planck, so he conducted a careful analysis of Kaufmann's experiment and his analysis of the two competing theories. It was an interesting example of scientific "detective work" by Planck. Kaufmann in 1905 had measured nine characteristic deflections d1, d2, ..., d9 for electrons passing through nine different field strengths. Then he had computed the nine values that would be predicted by Abraham's theory, and the nine values that would be predicted by Lorentz-Einstein. However, in order to derive the "predictions" from the theories for his particular experimental setup he needed to include an attenuation factor "k" on the electric field strength. This factor is actually quite a complicated function of the geometry of the plates and coils used to establish the electric field. Kaufmann selected a particular value of "k" that he thought would be reasonable. Now, both the Abraham and the Lorentz-Einstein theory predicted the electron's velocity could never exceed c, but Planck noticed that Kaufmann's choice of k implied a velocity greater than c for at least one of the data points, and therefore was actually inconsistent with both theories. This caused Planck to suspect that perhaps Kaufmann's assumed value of k was wrong. Unfortunately the complexity of the experimental setup made it impossible to give a firm determination of the attenuation factor from first principles, but Planck was nevertheless able to extract some useful information from Kaufmann's data. Planck took the nine data points and "backed out" the values of k that would be necessary to make them agree with Abraham's theory. Then he did the same for the Lorentz-Einstein theory. All these values of k were well within the range of plausibility (given the uncertainty in the experimental setup), so nothing definite could be concluded, but Planck noted that the nine k-values necessary to match the Lorentz-Einstein theory to the measurements were all nearly equal, whereas the nine k-values necessary to match Abraham showed more variation. From this, one might actually infer a slight tilt in favor of the Lorentz-Einstein theory, simply by virtue of the greater consistency of k values. Naturally this inconclusive state of affairs led people to try to think of an experiment that would be more definitive. In 1908 Bucherer performed a variation of Kaufmann's experiment, but with an experimental setup taking Planck's analysis into account, so that uncertainty in the value of k basically "cancels out". Bucherer's results showed clear agreement with the Lorentz-Einstein theory and disagreed with the Abraham theory. Additional and more refined experiments were subsequently performed, and by 1916 it was clear that the experimental evidence did in fact support what Kaufmann had called "the Lorentz-Einstein fundamental assumptions". Incidentally, it's fascinating to compare the reactions of Lorentz, Poincare, and Einstein to Kaufmann's results. Lorentz was ready to abandon his entire model (and life's work)
since it evidently conflicted with this one experiment. As he wrote to Poincare in 1906, the length contraction hypothesis was crucial for the coherence of his entire theoretical framework, and yet

Unfortunately my hypothesis of the flattening of electrons is in contradiction with Kaufmann's results, and I must abandon it. I am, therefore, at the end of my Latin.

Poincare agreed that, in view of Kaufmann's results "the entire theory may well be threatened". It wasn't until the announcement of Bucherer's results that Lorentz regained confidence in his own theoretical model. Interestingly, he later cited those results as one of the main reasons for his eventual acquiescence with the relativity principle, noting that if Lorentz-covariance is actually as comprehensive as these experimental results show it to be, then the ether concept is entirely devoid of heuristic content. (On the other hand, he did continue to maintain that there were some benefits in viewing things from the standpoint of absolute space and time, even if we are not at present able to discern such things.) Einstein's reaction to Kaufmann's apparently devastating results was quite different. In a review article on relativity theory in 1907, Einstein acknowledged that his theory was in conflict with Kaufmann's experimental results, and he could find nothing wrong with either Kaufmann's experiment or his analysis, which seemed to indicate in favor of Abraham's theory over relativity. Nevertheless, the young patent examiner continued

It will be possible to decide whether the foundations of the relativity theory correspond with the facts only if a great variety of observations is at hand... In my opinion, both [the alternative theories of Abraham and Bucherer] have rather slight probability, because their fundamental assumptions concerning the mass of moving electrons are not explainable in terms of theoretical systems which embrace a greater complex of phenomena. A theory is the more impressive the greater the simplicity of its premises, the more different kinds of things it relates, and the more extended is its area of applicability.

This is a remarkable defense of a scientific theory against apparent experimental falsification. While not directly challenging the conflict between experiment and theory, Einstein nevertheless maintained that we should regard relativity as most likely correct, essentially on the basis of its scope and conceptual simplicity. Oddly enough, when later confronted with similar attempts to justify other people's theories, Einstein was fond of saying that "a theory should be as simple as the facts allow - but no simpler". Yet here we find him serenely confident that the "facts" rather than his theory will ultimately be overturned, which turned out to be the case. This sublime confidence in the correctness of certain fundamental ideas was a characteristic of Einstein throughout his career. When asked what he would have done if the eclipse observations had disagreed with the prediction of general relativity for the bending of light, Einstein replied "Then I would have felt sorry for the dear lord, because the theory is correct."
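As a concrete coda to the covariance claims discussed in this section, the following minimal symbolic check (assuming Python with the sympy library) verifies, in units where c = 1, that the transformation (2) preserves the quantity t² − x² - and hence the speed of light - while the Galilean transformation (1) does not:

    import sympy as sp

    x, t, v = sp.symbols('x t v', real=True)
    g = 1/sp.sqrt(1 - v**2)                  # gamma = (1 - v^2)^(-1/2)

    # Lorentz transformation (2): the difference simplifies to zero
    xL, tL = g*(x - v*t), g*(t - v*x)
    print(sp.simplify((tL**2 - xL**2) - (t**2 - x**2)))   # prints 0

    # Galilean transformation (1): a nonzero residue remains
    xG, tG = x - v*t, t
    print(sp.expand((tG**2 - xG**2) - (t**2 - x**2)))     # prints 2*t*v*x - t**2*v**2

Since a light ray through the origin satisfies t² − x² = 0, the first result shows that the ray's speed is unity in both coordinate systems, which is just the one aspect of Maxwell's equations - the invariance of the speed of light - from which, as noted above, the transformations (2) already follow.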

3.7 Zeno and the Paradox of Motion

We may say a thing is at rest when it has not changed its position between now and then, but there is no ‘then’ in ‘now’, so there is no being at rest. Both motion and rest, then, must necessarily occupy time.
                                   Aristotle, 350 BC

The Eleatic school of philosophers was founded by the religious thinker and poet Xenophanes (born c. 570 BC), whose main teaching was that the universe is singular, eternal, and unchanging. "The all is one." According to this view, as developed by later members of the Eleatic school, the appearances of multiplicity, change, and motion are mere illusions. Interestingly, the colony of Elea was founded by a group of Ionian Greeks who, in 545 BC, had been besieged in their seaport city of Phocaea by an invading Persian army, and were ultimately forced to evacuate by sea. They sailed to the island of Corsica, and occupied it after a terrible sea battle with the navies of Carthage and the Etruscans. Just ten years later, in 535 BC, the Carthaginians and Etruscans regained the island, driving the Phocaean refugees once again into the sea. This time they landed on the southwestern coast of Italy and founded the colony of Elea, seizing the site from the native Oenotrians. All this happened within the lifetime of Xenophanes, himself a wandering exile from his native city of Colophon in Ionia, from which he too had been forced to flee in 545 BC. He lived in Sicily and then in Catana before finally joining the colony at Elea. It's tempting to speculate on how these events may have psychologically influenced the Eleatic school's belief in permanent unalterable oneness, denying the reality of change and plurality in the universe. The greatest of the Eleatic philosophers was Parmenides (born c. 539 BC). In addition to developing the theme of unchanging oneness, he is also credited with originating the use of logical argument in philosophy. His habit was to accompany each statement of belief with some kind of logical argument for why it must be so. It's possible that this was a conscious innovation, but it seems more likely that the habitual rationalization was simply a peculiar aspect of his intellect. In any case, on this basis he is regarded as the father of metaphysics, and, as such, a key contributor to the evolution of scientific thought. Parmenides's belief in the absolute unity and constancy of reality is quite radical and abstract, even by modern standards. He maintained that the universe is literally singular and unchangeable. However, his rationalism forced him to acknowledge that appearances are to the contrary, i.e., while he flatly denied the existence of plurality and change, he admitted the appearance of these things. Nevertheless, he insisted these were mere perceptions and opinions, not to be confused with "what is". Not surprisingly, Parmenides was ridiculed for his beliefs. One of Parmenides' students was Zeno, who is best remembered for a series of arguments in which he defends the intelligibility of the Eleatic philosophy by purporting to prove, by logical means, that change (motion) and plurality

are impossible. We can't be sure how the historical Zeno intended his arguments to be taken, since none of his writings have survived. We know his ideas only indirectly through the writings of Plato, Aristotle, Simplicius, and Proclus, none of whom was exactly sympathetic to Zeno's philosophical outlook. Furthermore, we're told that Zeno's arguments were a "youthful effort", and that they were made public without his prior knowledge or consent. Also, even if we accept that his purpose was to defend the Eleatic philosophy against charges of logical inconsistency, it doesn't follow that Zeno necessarily regarded his counter-charges as convincing. It's conceivable that he intended them as satires of (what he viewed as) the fallacious arguments that had been made against Parmenides' ideas. In any case, although we cannot know for sure how Zeno himself viewed his "paradoxes", we can nevertheless examine the arguments themselves, as they've come down to us, to see if they contain - or suggest - anything of interest. Of the 40 arguments attributed to Zeno by later writers, the four most famous are on the subject of motion:

The Dichotomy: There is no motion, because that which is moved must arrive at the middle before it arrives at the end, and so on ad infinitum.

The Achilles: The slower will never be overtaken by the quicker, for that which is pursuing must first reach the point from which that which is fleeing started, so that the slower must always be some distance ahead.

The Arrow: If everything is either at rest or moving when it occupies a space equal to itself, while the object moved is always in the instant, a moving arrow is unmoved.

The Stadium: Consider two rows of bodies, each composed of an equal number of bodies of equal size. They pass each other as they travel with equal velocity in opposite directions. Thus, half a time is equal to the whole time.

The first two arguments are usually interpreted as critiques of the idea of continuous motion in infinitely divisible space and time. They differ only in that the first is expressed in terms of absolute motion, whereas the second shows that the same argument applies to relative motion. Regarding these first two arguments, there's a tradition among some high school calculus teachers to present them as "Zeno's Paradox", and then "resolve the paradox" by pointing out that an infinite series can have a finite sum. This may be a useful pedagogical device for beginning calculus students, but it misses an interesting and important philosophical point implied by Zeno's arguments. To see this, we can reformulate the essence of these two arguments in more modern terms, and show that, far from being vitiated by the convergence of infinite series, they actually depend on the convergence of the geometric series. Consider a ray of light bouncing between an infinite sequence of mirrors as illustrated below

On the assumption that matter, space, and time are continuous and infinitely divisible (scale invariant), we can conceive of a point-like massless particle (say, a photon) traveling at constant speed through a sequence of mirrors whose sizes and separations decrease geometrically (e.g., by a factor of two) at each step. The envelope around these mirrors is clearly a wedge shape that converges to a point, and the total length of the zigzag path is obviously finite (because the geometric series 1 + 1/2 + 1/4 + ... converges, to the sum 2), so the particle must reach "the end" in finite time. The essence of Zeno's position against continuity and infinite divisibility is that there is no logical way for the photon to emerge from the sequence of mirrors. The direction in which the photon would be traveling when it emerged would depend on the last mirror it hit, but there is no "last" mirror. Similarly we could construct "Zeno's maze" by having a beam of light directed around a spiral as shown below:

Again the total path is finite, but has no end, i.e., no final direction, and a ray propagating along this path can neither continue nor escape. Of course, modern readers may feel entitled to disregard this line of reasoning, knowing that matter consists of atoms which are not infinitely divisible, so we could never construct an infinite sequence of geometrically decreasing mirrors. Also, every photon has some finite scattering wavelength and thus cannot be treated as a "point particle". Furthermore, even a massless particle such as a photon necessarily has momentum according to the quantum and relativistic relation p = h/λ, and the number of rebounds per unit time - and hence the outward pressure on the structure holding the mirrors in place - increases to infinity as the photon approaches the convergent point. However, these arguments merely confirm Zeno's position that the physical world is not scale-invariant or infinitely divisible (noting that Planck's constant h represents an absolute scale). Thus, we haven't debunked Zeno,

we've merely conceded his point. Of course, this point is not, in itself, paradoxical. It simply indicates that at some level the physical world must be regarded as consisting of finite indivisible entities. We arrive at Zeno's paradox only when these arguments against infinite divisibility are combined with the complementary set of arguments (The Arrow and The Stadium) which show that a world consisting of finite indivisible entities is also logically impossible, thereby presenting us with the conclusion that physical reality can be neither continuous nor discontinuous. The more famous of Zeno's two arguments against discontinuity is "The Arrow", which focuses on the instantaneous physical properties of a moving arrow. He notes that if physical objects exist discretely at a sequence of discrete instants of time, and if no motion occurs in an instant, then we must conclude that there is no motion in any given instant. (As Bertrand Russell commented, this is simply "a plain statement of an elementary fact".) But if there is literally no physical difference between a moving and a non-moving arrow in any given discrete instant, then how does the arrow know from one instant to the next if it is moving? In other words, how is causality transmitted forward in time through a sequence of instants, in each of which motion does not exist? It's been noted that Zeno's "Arrow" argument could also be made in the context of continuous motion, where in any single slice of time there is (presumed to be) no physical difference between a moving and a non-moving arrow. Thus, Zeno suggests that if all time is composed of instants (continuous or discrete), and motion cannot exist in any instant, then motion cannot exist at all. A naive response to this argument is to point out that although the value of a function f(t) is fixed for any given t, the derivative of f(t) may nevertheless be non-zero at that same t. But, again, this explanation doesn't really address the phenomenological issue raised by Zeno's argument. A continuous function (as emphasized by Weierstrass) is a static completed entity, so by invoking this model we are essentially agreeing with Parmenides that physical motion does not truly exist, and is just an illusion, i.e., "opinions", arising from our psychological experience of a static unchanging reality. Of course, to accomplish this we have expanded our concept of "the existing world" to include another dimension. If, instead, we insist on adhering to the view of the entire physical world as a purely spatial expanse, existing in and progressing through a sequence of instants, then we again run into the problem of how a quality that exists only over a range of instants can be causally conveyed through any given instant in which it has no form of existence. Before blithely dismissing this concern as nonsensical, it's worth noting that modern physics has concluded (along with Zeno) that the classical image of space and time was fundamentally wrong, and in fact motion would not be possible in a universe constructed according to the classical model. We now recognize that position and momentum are incompatible variables, in the sense that an exact determination of either one of them leaves the other completely undetermined. According to quantum mechanics, position and momentum are represented by non-commuting operators with no common eigenstates, so, just as Zeno's arguments suggest, it really is inconceivable for an object to have a definite position and momentum (motion) simultaneously.
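This incompatibility can be made quantitative: for any quantum state, the spreads in position and momentum satisfy Δx·Δp ≥ ħ/2, so a perfectly definite position entails a completely indefinite momentum. The following minimal numeric sketch (not part of the original text; it assumes natural units with ħ = 1 and a Gaussian wave packet, which happens to saturate the bound) checks this for a concrete state:

    import numpy as np

    hbar = 1.0                          # natural units, an assumption of this sketch
    N, L, sigma = 4096, 80.0, 1.0
    x = np.linspace(-L/2, L/2, N, endpoint=False)
    dx = x[1] - x[0]

    # A Gaussian wave packet, the minimum-uncertainty state
    psi = (2*np.pi*sigma**2)**(-0.25) * np.exp(-x**2/(4*sigma**2))

    # Spread in position
    px = np.abs(psi)**2 * dx            # discrete probability distribution in x
    x_rms = np.sqrt(np.sum(px*x**2) - np.sum(px*x)**2)

    # Spread in momentum, from the discrete Fourier transform of psi
    p = 2*np.pi*hbar*np.fft.fftfreq(N, d=dx)
    pp = np.abs(np.fft.fft(psi))**2
    pp /= pp.sum()                      # discrete probability distribution in p
    p_rms = np.sqrt(np.sum(pp*p**2) - np.sum(pp*p)**2)

    print(x_rms * p_rms / hbar)         # ~0.5, the Heisenberg lower bound

Shrinking sigma sharpens the position at the expense of the momentum spread, and vice versa; in this sense a definite "snapshot" cannot also encode a definite motion, which is the modern counterpart of Zeno's observation.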

The theory of special relativity answers Zeno's concern over the lack of an instantaneous difference between a moving and a non-moving arrow by positing a fundamental restructuring of the way in which space and time fit together, such that there really is an instantaneous difference between a moving and a non-moving object, insofar as it makes sense to speak of "an instant" of a physical system with mutually moving elements. Objects in relative motion have different planes of simultaneity, with all the familiar relativistic consequences, so not only does a moving object look different to the world, but the world looks different to a moving object. This resolution of the paradox of motion presumably never occurred to Zeno, but it's no exaggeration to say that special relativity vindicates Zeno's skepticism and physical intuition about the nature of motion. He was correct that instantaneous velocity in the context of absolute space and absolute time does not correspond to physical reality, and probably doesn't even make sense. From Zeno's point of view, the classical concept of absolute time was not logically sound, and special relativity (or something like it) is a logical necessity, not just an empirical fact. It's even been suggested that if people had taken Zeno's paradoxes more seriously they might have arrived at something like special relativity centuries ago, just on logical grounds. This suggestion goes back at least to Minkowski's famous lecture of "staircase wit" (see Section 1.7). Doubtless it's stretching the point to say that Zeno anticipated the theory of special relativity, but it's undeniably true that his misgivings about the logical consistency of motion in its classical form were substantially justified. The universe does not (and arguably, could not) work the way people thought it did. In all four of Zeno's arguments on motion, the implicit point is that if space and time are independent, then logical inconsistencies arise regardless of whether the physical world is continuous or discrete. All of those inconsistencies can be traced to the implication that, if any motion is possible, then the range of conceivable relative velocities must be unbounded, corresponding to Minkowski's "unintelligible" G. What is the alternative? Zeno considers the premise that the range of possible relative velocities is bounded, i.e., there is some maximum achievable (conceivable) relative velocity, and he associates this possibility with the idea that space and time are not infinitely divisible. (It presumably didn't occur to him that another way of achieving this is to assume space and time are not independent.) This brings us to the last of Zeno's four main arguments on motion, "The Stadium", which has always been the most controversial, partly because the literal translation of its statement is somewhat uncertain. In this argument Zeno appears to be attacking the only remaining alternative to the unintelligible G, namely, the possibility of a finite upper bound on conceivable velocity. It's fascinating that he argues in much the same way that modern students do when they're first introduced to the concept of an invariant speed in the theory of special relativity. He says, in effect, that if someone is running towards me from the west at the maximum possible speed, and someone else is approaching me from the east at the maximum possible speed, then they are approaching each other at twice the maximum possible speed... which is a contradiction.
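The modern resolution of this apparent contradiction is that speeds do not compose additively. As a minimal sketch (not from the original text; it assumes units in which the limiting speed c = 1), the relativistic composition law keeps every speed actually measured by any observer at or below the maximum:

    def compose(u, v, c=1.0):
        # Relativistic composition of two collinear speeds u and v
        return (u + v) / (1.0 + u*v/c**2)

    # Each runner approaches at 0.9c, from opposite directions; the speed
    # of one runner as measured by the other remains below c:
    print(compose(0.9, 0.9))   # 0.9945...
    # Composing the maximum speed with itself still yields the maximum:
    print(compose(1.0, 1.0))   # 1.0

The "twice the maximum speed" in Zeno's scenario is merely the rate of change of the distance between the two runners in the stadium's coordinates; it is not the velocity of any physical object relative to any observer, and no measured relative speed ever exceeds the bound.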

To illustrate the relevance of Zeno's arguments to a discussion of the consequences of special relativity, compare the discussion of time dilation in Section 2.13 of Rindler's "Essential Relativity" with Heath's review of Zeno's Stade paradox in Chapter VIII of "A History of Greek Mathematics". The resemblance is so striking that it's tempting to imagine that either Rindler consciously patterned his discussion on some recollection of Zeno's argument, or it's an example of Jung's collective unconscious. Here is a reproduction of Rindler's Figure 2.4, showing three "snapshots" of two sequences of clocks A, B, C,... and A', B', C', ... fixed at certain equal intervals along the x axes of two frames S and S':

These three snapshots are taken at equal intervals by an observer in a third frame S", relative to which S and S' have equal and opposite velocities. Rindler describes the values that must appear on each clock in order to explain the seemingly paradoxical result that each observer considers the clocks of the others to be running slow, in accord with Einsteinian relativity. Compare this with the figure on page 277 of Heath:

where again we have three snapshots of a sequence of clocks (i.e., observers/athletes), this time showing the reference frame S" as well as the two frames S and S' that are moving with equal and opposite velocities relative to S". As Aristotle commented, this scenario evidently led Zeno to the paradoxical conclusion that "half the time is equal to its double", precisely as the freshman physics student suspects when he first considers the implications of relativity. Surely we can forgive Zeno for not seeing that his arguments can only be satisfactorily answered - from the standpoint of physics - by assuming Lorentzian invariance and the relativity of space and time. According to this view, with its rejection of absolute simultaneity, we're inevitably led from a dynamical model in which a single slice of space progresses "evenly and equably" through time, to a purely static representation in which the entire history of each worldline already exists as a completed entity in the plenum of spacetime. This static representation, according to which our perceptions of change and motion are simply the product of our advancing awareness, is strikingly harmonious with the teachings of Parmenides, whose intelligibility Zeno's arguments were designed to defend. Have we now finally resolved Zeno's "youthful effort"? Given the history of "final resolutions", from Aristotle onwards, it's probably foolhardy to think we've reached the end. It may be that Zeno's arguments on motion, because of their simplicity and universality, will always serve as a kind of "Rorschach image" onto which people can project their most fundamental phenomenological concerns (if they have any).

3.8 A Very Beautiful Day

Such a solemn air of silence has descended between us that I almost feel as if I am committing a sacrilege when I break it now with some

inconsequential babble. But is this not always the fate of the exalted ones of this world?
Einstein to Habicht, 25 May 1905

In 1894 Einstein's parents and younger sister Maja moved to Italy, where his father hoped to start a new business. It was arranged for Albert, then 15, to remain in Munich to complete his studies at the Gymnasium (high school), but the young lad soon either dropped out or was invited to leave (recollections differ). He then crossed the Alps to reunite with his family in Italy. Lacking a high school diploma, his options for further education were limited, but his father still hoped for him to become an electrical engineer, which required a university degree. It so happens that the Zurich Polytechnic Institute had an unusual admissions policy which did not require a high school diploma, provided the applicant could pass the entrance examination, so after a year off in Italy, the 16-year-old Albert was dispatched to Zurich to take the exam. He failed, having made (as he later admitted) "no attempt whatsoever to prepare myself". In fairness, it should be noted that the usual age for taking the exam was 18, but it seems he wasn't particularly eager to (as his father advised) "forget his philosophical nonsense and apply himself to a sensible trade like electrical engineering". Fortunately, the principal of the Polytechnic noted the young applicant's unusual strength in mathematics, and helped make arrangements for Einstein to attend a cantonal school in the picturesque town of Aarau, twenty miles west of Zurich. The headmaster of the school was Professor Jost Winteler, an ornithologist. During his time in Aarau Einstein stayed with the Winteler family, and always had fond memories of the time he spent there, in contrast with what he regarded as the coercive atmosphere at the Munich Gymnasium. He became romantically involved with Marie Winteler (Jost's daughter), but seems to have been less serious about it than she was, and the relationship ended badly when Einstein took up with Mileva Maric. He also formed life-long relationships with two of the other Winteler children, Paul and Anna. Paul Winteler married Einstein's sister Maja, and Anna Winteler married Michelangelo Besso, one of Einstein's closest friends. Besso, six years older than Einstein, was a Swiss-Italian studying to be an electrical engineer. Like Einstein, he played the violin, and the two of them first met at a musical gathering in 1896. It was just a year earlier that the 16-year-old Einstein had first wondered how the world would appear to someone traveling at the speed of light. He realized that to such an observer a co-moving lightwave in a vacuum would appear as a spatially fluctuating standing wave, i.e., a stationary wave of light, but it doesn't take an expert in Maxwell's equations to be skeptical that any such configuration is possible. Indeed, Einstein later recalled that "from the beginning it appeared to me intuitively clear" that light must propagate in the same way with respect to any system of inertial coordinates. However, this invariance directly contradicts the Galilean addition rule for the composition of velocities. This problem stayed with Einstein for the next ten years, during which time he finally gained entrance to the Polytechnic, and, to the disappointment of his family, switched majors from electrical engineering to physics. His friend Besso continued with his studies and became

an electrical engineer in Milan. Already by this time Einstein had turned from engineering to pure physics, and seems to have decided (or foreseen) how he would spend his life, as he wrote in an apologetic letter to Marie's mother Pauline Winteler in the spring of 1897

Strenuous intellectual work and looking at God’s Nature are the reconciling, fortifying, yet relentlessly strict angels that shall lead me through all of life’s troubles… And yet what a peculiar way this is to weather the storms of life – in many a lucid moment I appear to myself as an ostrich who buries his head in the desert sand so as not to perceive the danger. One creates a small little world for oneself, and as lamentably insignificant as it may be in comparison with the perpetually changing size of real existence, one feels miraculously great and important…

Despite his love of physics, Einstein did not perform very impressively as an undergraduate in an academic setting, and this continued to be true in graduate school. Hermann Minkowski referred to his one-time pupil as a "lazy dog". As the biographer Clark wrote, "Einstein became, as far as the professorial staff of the ETH was concerned, one of the awkward scholars who might or might not graduate but who in either case was a great deal of trouble". Professor Pernet at one point suggested to Einstein that he switch to medicine or law rather than physics, saying "You can do what you like, I only wish to warn you in your own interest". Clearly Einstein "pushed along with his formal work just as much as he had to, and found his real education elsewhere". Often he didn't even attend the lectures, relying on Marcel Grossman's notes to cram for exams, making no secret of the fact that he wasn't interested in what men like Weber had to teach him. His main focus during his four years at the ETH was the independent study of the works of Kirchhoff, Helmholtz, Hertz, Maxwell, Poincare, etc., flagrantly outside the course of study prescribed by the ETH faculty. Some idea of where his studies were leading him can be gathered from a letter to his fellow student and future wife Mileva Maric written in August of 1899

I returned to the Helmholtz volume and am at present studying again in depth Hertz’s propagation of electric force. The reason for it was that I didn’t understand Helmholtz’s treatise on the principle of least action in electrodynamics. I am more and more convinced that the electrodynamics of moving bodies, as presented today, is not correct, and that it should be possible to present it in a simpler way. The introduction of the term “ether” into the theories of electricity led to the notion of a medium of whose motion one can speak without being able, I believe, to associate a physical meaning with this statement. I think that the electric forces can be directly defined only for empty space…

Einstein later recalled that after graduating in 1900 the "coercion" of being forced to take the final exams "had such a detrimental effect that... I found the consideration of any scientific problem distasteful to me for an entire year". He achieved an overall mark of 4.91 out of 6, which is rather marginal. Academic positions were found for all members of the graduating class in the physics department of the ETH with the exception of

Einstein, who seems to have been written off as virtually unemployable, "a pariah, discounted and little loved", as he later said. From Milan in late August of 1900 Einstein wrote to his girlfriend, Mileva, and mentioned that

I am spending many evenings here at Michele’s. I like him very much because of his sharp mind and his simplicity, and also Anna and, especially, the little brat. His house is simple and cozy, even though the details show some lack of taste…

In another letter to Mileva, in October, he commented that his friend had intuited the blossoming romance between Einstein and Mileva (the two had studied physics together at the Polytechnic)

Michele has already noticed that I like you, because, even though I didn’t tell him almost anything about you, he said, when I told him that I must now go to Zurich again: “He surely wants to go to his [woman] colleague, what else would draw him to Zurich?” I replied “But unfortunately she is not there yet”. I prodded him very much to become a professor, but I doubt very much that he’ll do it. He simply doesn’t want to let himself and his family be supported by his father. This is after all quite natural. What a waste of his truly outstanding intelligence.

The following April, in another “love letter” to Mileva, Einstein wrote about having just read Planck’s paper on radiation “with mixed feelings”, because “misgivings of a fundamental nature have arisen in my mind”. In the same letter he wrote

Michele arrived with wife and child from Trieste the day before yesterday. He is an awful weakling without a spark of healthy humaneness, who cannot rouse himself to any action in life or scientific creation, but an extraordinarily fine mind, whose working, though disorderly, I watch with great delight. Yesterday evening I talked shop with him with great interest for almost 4 hours. We talked about the fundamental separation of luminiferous ether and matter, the definition of absolute rest, molecular forces, surface phenomena, dissociation. He is very interested in our investigations, even though he often misses the overall picture because of petty considerations. This is inherent in the petty disposition of his being, which constantly torments him with all kinds of nervous notions.

Toward the end of 1901 Einstein had still found no permanent position. As he wrote to Grossman in December of that year, "I am sure I would have found a position [by now] were it not for Weber's intrigues against me". It was only because Grossman's father happened to be good friends with Haller, the chief of the Swiss Patent Office, that Einstein was finally given a job, despite the fact that Haller judged him to be "lacking in technical training". Einstein wrote gratefully to the Grossmans that he "was deeply moved by your devotion and compassion which do not let you forget an old, unlucky friend", and that he would spare no effort to live up to their recommendation. He had applied for Technical Expert 2nd class, but was given the rank of 3rd class (in June

1902). As soon as he'd been away from the coercive environment of academia long enough that he could stand once again to think about science, he resumed his self-directed studies, which he pursued during whatever free time a slightly lazy patent examiner can make for himself. His circumstances were fairly unusual for someone working on a doctorate, especially since he'd already been rejected for academic positions by both the ETH and the University of Zurich. He was undeniably regarded by the academic community (and others) as "an awkward, slightly lazy, and certainly intractable young man who thought he knew more than his elders and betters". In early 1905, while employed as a patent examiner in Bern, Einstein was striving to complete his doctoral thesis, and at the same time writing a paper on light-quanta and black-body radiation (later cited by the Nobel committee) and another on Brownian motion. Each of these was a tremendous contribution to 20th century physics, but one has the impression that Einstein was, in a sense, getting these duties out of the way, so that he could concentrate on the "philosophical nonsense" of the velocity addition problem, which he realized was "a puzzle not easy to solve at all". In other words, he realized that he couldn't count on being able to produce anything useful on this question, even though his attention was inexorably drawn to it. One imagines that he forced himself to complete the papers on statistical physics - in which he knew he had something to say - before allowing himself the luxury of focusing on the fascinating but possibly insoluble philosophical problem of motion. After completing the statistical papers on March 17, April 30, and May 10, 1905, he allowed himself to concentrate fully on the problem of motion, which apparently had never been far from his mind. As he later recalled, he "felt a great difficulty to resolve the question... I had wasted time almost a year in fruitless considerations..." Then came the great turning point, both for Einstein's own personal life and for modern physics: "Unexpectedly, a friend of mine in Bern then helped me." The friend was Michelangelo Besso, who had by then also taken a job at the Swiss patent office. In his Kyoto lecture of 1922 Einstein later remembered the circumstances of the unexpected help he received from Besso:

That was a very beautiful day when I visited him and began to talk with him as follows: "I have recently had a question which was difficult for me to understand. So I came here today to bring with me a battle on the question." Trying a lot of discussions with him, I could suddenly comprehend the matter. Next day I visited him again and said to him without greeting "Thank you. I've completely solved the problem."

It had suddenly become clear to Einstein during his discussion with Besso that the correlation of time at different spatial locations is not absolutely defined, since it depends fundamentally on some form of communication between those locations. Thus, the concept of simultaneity at separate locations is relative. A mere five weeks after this recognition, Einstein completed "On the Electrodynamics of Moving Bodies", in which

he presented the special theory of relativity. This monumental paper contains not a single reference to the literature, and only one acknowledgement:

In conclusion, I wish to say that in working at the problem here dealt with I have had the loyal assistance of my friend and colleague M. Besso, and that I am indebted to him for several valuable suggestions.

We don't know precisely what those suggestions were, but we have Einstein's later statement that he "could not have found a better sounding board for his ideas in all of Europe." It was also Besso who introduced Einstein to the writings of Ernst Mach, which were to have such a profound influence on the development of the general theory (although subsequently Einstein emphasized the influence of Hume over Mach). Besso self-deprecatingly described their intellectual relationship by saying "Einstein the eagle took Besso the sparrow under his wing, and the sparrow flew a little higher". The two men carried on a regular correspondence that lasted over half a century, through two world wars, and Einstein's incredible rise to world fame. It’s interesting that, despite how highly Einstein valued Besso’s intellect, the latter invariably took a self-denigrating tone in their correspondence (and presumably in their conversations), sometimes even seeming to be genuinely puzzled by the significance that Einstein attached to his “little” comments. In a letter of August 1918 Besso wrote

You had, by the way, overestimated the meaningfulness of my observations again: I was not aware that they had the meaning that an energy tensor for gravitation was dispensable. If I understand it correctly, my inadvertent statement now implies that planetary motion would satisfy conservation laws just by chance, as it were. What is certain is that I was not aware of this consequence of my comments and cannot grasp the argument even now.

The friendship with Besso may have been, in some ways, the most meaningful of Einstein's life. Michele and his wife sometimes took care of Einstein's children, tried to reconcile Einstein with Mileva when their marriage was foundering, and so on. Another of the few close personal ties that Einstein was able to maintain over the years was with Max von Laue, who Einstein believed was the only one of the Berlin physicists who behaved decently during the Nazi era. Following the war, a friend of Einstein's was preparing to visit Germany and asked if Einstein would like him to convey any messages to his old friends and colleagues. After a moment of thought, Einstein said "Greet Laue for me". The friend, trying to be helpful, then asked specifically about several other individuals among Einstein's former associates in his homeland. Einstein thought for another moment, and said "Greet Laue for me". The stubborn, aloof, and uncooperative aspect of Einstein's personality that he had shown as a student continued to some extent throughout his life. For example, in 1936 he collaborated with Nathan Rosen on a paper purporting to show, contrary to his own prediction of 1916, that gravitational waves cannot exist - at least not without unphysical singularities. He submitted this paper to Physical Review, and it was returned to him with a lengthy and somewhat critical referee report asking for clarifications. Apparently

Einstein was unfamiliar with the refereeing of papers, routinely practiced by American academic journals. He wrote back to the editor

Dear Sir,
We (Mr. Rosen and I) had sent you our manuscript for publication and had not authorized you to show it to specialists before it is printed. I see no reason to address the - in any case erroneous - comments of your anonymous expert. On the basis of this incident I prefer to publish the paper elsewhere.
respectfully,
P.S. Mr. Rosen, who has left for the Soviet Union, has authorized me to represent him in this matter.

Was the postscript about Mr. Rosen's departure to the Soviet Union (in the politically charged atmosphere of the late 1930's) an oblique jibe at American mores, or just a bland informational statement? In any case, Einstein submitted the paper, unaltered, to another journal (The Journal of the Franklin Institute). However, before it appeared he came to realize that its argument was faulty, causing him to re-write the paper and its conclusions. Interestingly, what Einstein had realized is precisely what the anonymous referee had pointed out, namely, that by a change of coordinates the construction given by Einstein and Rosen was simply a description of cylindrical waves, with a singularity only along the axis (thus considered to be an acceptable singularity). The referee report still exists among Einstein's private papers, although it isn't clear if the correction was prompted by the Physical Review's referee report. (The correction may also have been prompted by private comments from Howard Percy Robertson (via Infeld), who had just returned to Princeton from sabbatical. On the other hand, these two possibilities may amount to the same thing, since Kennefick speculates that Robertson was the anonymous referee!) Another aspect of Einstein's personality that seems incongruous with scholarly success was his remarkable willingness to make mistakes in public and change his mind about things, with seemingly no concern for the effect this might have on his academic credibility. Regarding the long succession of "unified field theories" that Einstein produced in the 1920's and 30's, Pauli commented wryly "It is psychologically interesting that for some time the current theory is usually considered by its author to be the 'definitive solution'". Eventually Einstein gave up on the particular approach to unification that he had been pursuing in those theories, and cheerfully wrote to Pauli "You were right after all, you rascal". Lest we think that this willingness to make and admit mistakes was a characteristic only of the aged Einstein, past his prime, recall Einstein's wry self-description in a letter to Ehrenfest in December 1915: "That fellow Einstein suits his convenience. Every year he retracts what he wrote the year before." In 1939 Einstein's sister Maja Winteler was forced by Mussolini's racial policies to leave Florence. She went to Princeton to join her brother, while Paul moved in with his sister Anna and Michele Besso's family in Geneva. Maja and Paul never saw each other again. In 1946, after the war, they began making plans to reunite in Geneva, but Maja suffered a stroke, and thereafter remained bedridden until her death in 1951. To Besso in 1954, nearly 50 years after their discussion in the patent office, Einstein wrote:

I consider it quite possible that physics cannot be based on the field principle, i.e., on continuous structures. In that case, nothing remains of my entire castle in the air, gravitation theory included...

In March of the following year, Michelangelo Besso died at his home in Geneva. Einstein wrote to the Besso family "Now he has gone a little ahead of me in departing from this curious world". Einstein died a month later, on April 18, 1955.

3.9 Constructing the Principles

In mechanics as reformed in accordance with the world-postulate, the disturbing lack of harmony between Newtonian mechanics and modern electrodynamics disappears of its own accord.
H. Minkowski, 1907

The general public took little notice of the special theory of relativity when it first appeared in 1905, but following the sensational reports of the eclipse observations of 1919 Einstein instantly became a world-wide celebrity, and there was suddenly intense public interest in everything having to do with “Einstein’s theory”. The London Times asked him to explain his mysterious theory to its readers. He accommodated this request with a short essay that is notable for its description of what he regarded as two fundamentally different kinds of physical theories. He wrote:

We can distinguish various kinds of theories in physics. Most of them are constructive. They attempt to build up a picture of the more complex phenomena out of the materials of a relatively simple formal scheme from which they start out. Thus the kinetic theory of gases seeks to reduce mechanical, thermal, and diffusional processes to movements of molecules -- i.e., to build them up out of the hypothesis of molecular motion. When we say that we have succeeded in understanding a group of natural processes, we invariably mean that a constructive theory has been found which covers the processes in question. Along with this most important class of theories there exists a second, which I will call "principle-theories." These employ the analytic, not the synthetic, method. The elements which form their basis and starting-point are not hypothetically constructed but empirically discovered ones, general characteristics of natural processes, principles that give rise to mathematically formulated criteria which the separate processes or the theoretical representations of them have to satisfy. Thus the science of thermodynamics seeks by analytical means to deduce necessary conditions, which separate events have to satisfy, from the universally experienced fact that perpetual motion is impossible. The advantages of the constructive theory are completeness, adaptability, and clearness, those of the principle theory are logical perfection and security of the foundations. The theory of relativity belongs to the latter class.

Einstein was not the first to discuss such a distinction between physical theories. In an essay on the history of physics included in the book “The Value of Science” published in 1904, Poincare had described how, following Newton’s success with celestial mechanics, the concept of central forces acting between material particles was used almost exclusively

as the basis for constructing physical theories (the exception being Fourier’s theory of heat). Poincare expressed an appreciation for this constructive approach to physics.

This conception was not without grandeur; it was seductive, and many among us have not finally renounced it; they know that one will attain the ultimate elements of things only by patiently disentangling the complicated skein that our senses give us; that it is necessary to advance step by step, neglecting no intermediary; that our fathers were wrong in wishing to skip stations; but they believe that when one shall have arrived at these ultimate elements, there again will be found the majestic simplicity of celestial mechanics.

Poincare then proceeded to a section called “The Physics of Principles”, where he wrote:

Nevertheless, a day arrived when the conception of central forces no longer appeared sufficient… What was done then? The attempt to penetrate into the detail of the structure of the universe, to isolate the pieces of this vast mechanism, to analyse one by one the forces which put them in motion, was abandoned, and we were content to take as guides certain general principles, the express object of which is to spare us this minute study… The principle of the conservation of energy… is certainly the most important, but it is not the only one; there are others from which we can derive the same advantage. These are:

Carnot's principle, or the principle of the degradation of energy.

Newton's principle, or the principle of the equality of action and reaction.

The principle of relativity, according to which the laws of physical phenomena must be the same for a stationary observer as for an observer carried along in a uniform motion of translation…

The principle of the conservation of mass…

The principle of least action.

The application of these five or six general principles to the different physical phenomena is sufficient for our learning of them all that we could reasonably hope to know of them… These principles are results of experiments boldly generalized; but they seem to derive from their very generality a high degree of certainty. In fact, the more general they are, the more frequent are the opportunities to check them, and the verifications multiplying, taking the most varied, the most unexpected forms, end by no longer leaving place for doubt… Thus they came to be regarded as experimental truths; the conception of central forces became then a useless support, or rather an embarrassment, since it made the principles partake of its hypothetical character.

Einstein is known to have been an avid reader of Poincare’s writings, so it seems likely that he adopted the theoretical classification scheme from this essay. Returning to the previous excerpt from Einstein’s article, notice that he actually mentions three sets of alternative characteristics, all treated as representing essentially the same dichotomy. We're told that constructive theories proceed synthetically on the basis of hypothetical premises, whereas principle theories proceed analytically on the basis of empirical premises. Einstein cites statistical thermodynamics as an example of a constructive theory, and classical thermodynamics as an example of a principle theory. His view of these two different approaches to thermodynamics was undoubtedly influenced by the debate concerning the reality of atoms, which Mach disdainfully called the "atomistic doctrine". The idea that matter is composed of finite irreducible entities was regarded as purely hypothetical, and the justification for this hypothesis was not entirely clear. In fact, Einstein himself spent a great deal of time and effort trying to establish the reality of atoms; indeed, this was his expressed motivation for his paper on Brownian motion. Within this context, it's not surprising that he classified the premises of statistical thermodynamics as purely hypothetical, and the development of the theory as synthetic.

However, in another sense, it could be argued that the idea of atoms actually arises empirically, and represents an extreme analytic approach to observed phenomena. Literally the analytic method is to "take apart" the subject into smaller and smaller subcomponents, until arriving at the elementary constituents. We regard macroscopic objects not as indivisible wholes, but as composed of sub-parts, each of which is composed of still smaller parts, and we continue this process of analysis at least until we can no longer directly resolve the sub-parts (empirically) into smaller entities. At this point we may resort to some indirect methods of inference in order to carry on the process of empirical analysis. Indeed, Einstein's work on Brownian motion did exactly this, in so far as he was attempting to analyze the smallest directly observable entities, and to infer, based on empirical observations, an even finer level of structure. It was apparently Einstein's view that, at this stage, a reversal of methodology is required, because direct observation no longer provides unique answers, and thus the inferences are necessarily indirect, i.e., they can only be based on a somewhat free hypothesis about the underlying structure, and then synthetically working out the observable implications of this hypothesis and comparing these with what we actually observe. So Einstein's conception of a constructive (hypothetically based, synthetic) physical theory was of a theory arrived at by hypothesizing or postulating some underlying structure (consistent with all observations, of course), and then working out the logical consequences of those postulates to see how well they account for the whole range of observable phenomena. At this point we might expect Einstein to classify special relativity as a constructive theory, because it's well known that the whole theory of special relativity - with all its observable consequences - can be constructed synthetically based on the exceedingly elementary hypothesis that the underlying structure of space and time is Minkowskian (see the sketch below). However, Einstein's whole point in drawing the distinction between constructive and principle theories was to argue that relativity is not a constructive theory, but is instead a theory of principle. It's clear that Einstein's original conception of special relativity was based on the model of classical thermodynamics, even to the extent that he proposed exactly two principles on which to base the theory, consciously imitating the first and second laws of thermodynamics. Some indication of the ambiguity in the classification scheme can be seen in the various terms that Einstein applied to these two propositions. He variously referred to them as postulates, principles, stipulations, assumptions, hypotheses, definitions, etc. Now, recalling that a "constructive theory" is based on hypotheses, whereas a "principle theory" is based on principles, we can see that the distinction between principles and postulates (hypotheses) is significant for correctly classifying a theory, and yet Einstein was not very careful (at least originally) to clarify the actual role of his two foundational propositions. Nevertheless, he consistently viewed special relativity as a theory of principle, with the invariance of light speed playing a role analogous to the conservation of energy in classical thermodynamics, both regarded as high-level empirical propositions rather than low-level elementary hypotheses.
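To make that "exceedingly elementary hypothesis" concrete, here is a minimal numeric sketch (not from the original text; it assumes units in which c = 1): the Minkowskian structure amounts to the statement that the quantity t² - x² assigned to an event is left unchanged by Lorentz transformations, and the familiar relativistic effects can all be developed from this single invariance.

    import numpy as np

    def boost(v):
        # Lorentz boost along x, in units with c = 1 (an assumption of this sketch)
        g = 1.0 / np.sqrt(1.0 - v*v)
        return np.array([[g, -g*v],
                         [-g*v, g]])

    def interval2(event):
        # Squared Minkowski interval t^2 - x^2 of an event (t, x)
        t, x = event
        return t*t - x*x

    event = np.array([3.0, 2.0])
    for v in (0.1, 0.5, 0.9):
        print(interval2(boost(v) @ event))   # 5.0 in every frame: the interval is invariant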
Indeed, it's possible to make this more than just an analogy, because in place of the invariance of light speed (with respect to all inertial

coordinate systems) we could just as well posit conservation of total mass-energy (with the conversion E = mc²), and use this conservation, together with the original principle of relativity (essentially carried over from Newtonian physics), as the basis for special relativity. As late as 1949 in his autobiographical notes (which he jokingly called his "obituary"), Einstein wrote

Gradually I despaired of the possibility of discovering the true laws by means of constructive efforts based on known facts. The longer and more desperately I tried, the more I came to the conviction that only the discovery of a universal formal principle could lead us to assured results. The example I saw before me was thermodynamics. The general principle was there given in the theorem: the laws of nature are such that it is impossible to construct a perpetuum mobile (of the first or second kind)... The universal principle of the special theory of relativity is contained in the postulate: The laws of physics are invariant with respect to Lorentz transformations (for the transition from one inertial system to any other arbitrarily chosen inertial system). This is a restricting principle for natural laws, comparable to the restricting principle of the nonexistence of the perpetuum mobile that underlies thermodynamics.

Here Einstein refers to "constructive theories based on known facts", whereas in the 1919 article he indicated that constructive theories are based on "a relatively simple formal scheme" such as the hypothesis of molecular motion (i.e., the atomistic doctrine that Mach (for one) rejected as unempirical), and principle theories are based on empirical facts. In other words, the distinguishing characteristics that Einstein attributed to the two kinds of theories have been reversed. This illustrates one of the problematic aspects of Einstein's classification scheme: every theory is ultimately based on some unprovable premises, and at the same time every (nominally viable) theory is based on what might be called known facts, i.e., it is connected to empirical results. Einstein was certainly well aware of this, as shown by the following comment (1949) in a defense of his methodological approach:

A basic conceptual distinction, which is a necessary prerequisite of scientific and pre-scientific thinking, is the distinction between "sense-impressions" (and the recollection of such) on the one hand and mere ideas on the other. There is no such thing as a conceptual definition of this distinction (aside from circular definitions, i.e., of such as make a hidden use of the object to be defined). Nor can it be maintained that at the base of this distinction there is a type of evidence, such as underlies, for example, the distinction between red and blue. Yet, one needs this distinction in order to be able to overcome solipsism.

In view of this, what ultimately is the distinction between what Einstein called constructive theories and principle theories? It seems that the distinction can only be based on the conceptual level of the hypotheses, so that constructive theories are based on "low level" hypotheses, and principle theories on "high level" hypotheses. In this respect the original examples (classical thermodynamics and statistical thermodynamics) cited by Einstein are probably the clearest, because they represent two distinct approaches to essentially the same subject matter. In a sense, they can be regarded as just two different interpretations of a single theory (much as special relativity and Lorentz's ether theory can be seen as two different interpretations of the same theory). Now, statistical thermodynamics was founded on hypotheses - such as the existence of atoms - that may be considered "low level", whereas the hypothesis of energy conservation in classical thermodynamics can plausibly be described as "high level". On the other hand,

the premises of statistical thermodynamics include the idea that the molecules obey certain postulated equations of motion (e.g., Newton's laws), which are essentially just expressions of conservation principles, so the "constructive" approach differs from the "theory of principle" only in so far as its principles are applied to very low-level entities. The conservation principles are explicitly assumed only for elementary molecules in statistical thermodynamics, and then they are inferred for high-level aggregates like a volume of gas. In contrast, the principle theory simply observes the conservation of energy at the level of gases, and adopts it as a postulate. In the case of special relativity, it's clear that Einstein originally developed the theory from a "high-level" standpoint, based on the observation that light propagates at the same speed with respect to every system of inertial coordinates. He himself felt that a constructive model or interpretation for this fact was lacking. In January of 1908 he wrote to Sommerfeld

A physical theory can be satisfactory only if its structures are composed of elementary foundations. The theory of relativity is ultimately just as unsatisfactory as, for example, classical thermodynamics was before Boltzmann interpreted entropy as probability.

However, just eight months later, Minkowski delivered his famous lecture at Cologne, in which he showed how the theory of special relativity follows naturally from just a simple fundamental hypothesis about the metric of space and time. There can hardly be a lower conceptual level than this, i.e., some assumption about the metric(s) of space and time is seemingly a pre-requisite for any description - scientific or otherwise - of the phenomena of our experience. Kant even went further, and suggested that one particular metrical structure (Euclidean) was a sine qua non of rational thought. We no longer subscribe to such a restrictive view, and it may even be possible to imagine physical ideas prior to any spatio-temporal conceptions, but nevertheless the fact remains that such conceptions are among the most primitive that we possess. For example, the posited structure of space and time is more primitive than the notion of atoms moving in a void, because we cannot even conceive of "moving in a void" without some idea of the structure of space and time. Hence, if a complete physical theory can be based entirely on nothing other than the hypothesis of one simple form for the metric of space and time, such a theory must surely qualify as "constructive". Minkowski’s spacetime interpretation does for special relativity what Boltzmann’s statistical interpretation did for thermodynamics, namely, it provides an elementary constructive foundation for the theory. Einstein's reaction to Minkowski's work was interesting. It's well known that Einstein was not immediately very appreciative of his former instructor's contribution, describing it as "superfluous learnedness", and joking that "since the mathematicians have attacked the relativity theory, I myself no longer understand it any more". He seems to have been at least partly serious when he later said "The people in Gottingen [where both Minkowski and Hilbert resided] sometimes strike me not as if they wanted to help one formulate something clearly, but as if they wanted only to show us physicists how much brighter they are than we". Of course, Einstein's appreciation subsequently increased when he found it necessary to use Minkowski's conceptual framework in order to develop general relativity. Still, even in his autobiographical notes, Einstein seemed to downplay

the profound transformation of special relativity that Minkowski's insight represents.

Minkowski's important contribution to the theory lies in the following: Before Minkowski's investigation it was necessary to carry out a Lorentz transformation on a law in order to test its invariance under Lorentz transformations; but he succeeded in introducing a formalism so that the mathematical form of the law itself guarantees its invariance under Lorentz transformations.

In other words, Minkowski's contribution was merely the introduction of a convenient mathematical formalism. Einstein then added, almost as an afterthought,

He [Minkowski] also showed that the Lorentz transformation (apart from a different algebraic sign due to the special character of time) is nothing but a rotation of the coordinate system in the four-dimensional space.

This is a rather slight comment when we consider that, from the standpoint of Einstein's own criteria, Minkowski's insight that Lorentz invariance is purely an expression of the (pseudo) metric of a combined four-dimensional space-time manifold at one stroke renders special relativity a constructive theory, the thing for which Einstein had sought so "desperately" for so long. As he wrote in the London Times article above, "when we say that we have succeeded in understanding a group of natural processes, we invariably mean that a constructive theory has been found which covers the processes in question", but he himself had given up on the search for such a theory in 1905, and had concluded that, for the time being, the only possibility of progress was by means of a theory of principle, analogous to classical thermodynamics. Actual understanding of the phenomena would have to wait for a constructive theory. As it happened, this constructive theory was provided just three years later by his former mathematics instructor in Gottingen. From this point of view, it seems fair to say that the modern theory of special relativity has had three distinct forms. First was Lorentz's (and Poincare's) ether theory (1892-1904) which, although conceived as a constructive theory, actually derived its essential content from a set of high-level principles and assumptions, as discussed in Section 3.6. Second was Einstein's explicit theory of principle (1905), in which he identified and isolated the crucial premises underlying Lorentz’s theory, and showed how they could be consistently interpreted as primitive aspects of space and time. Third was Minkowski's explicitly constructive spacetime theory (1908). Each stage represented a significant advance in clarity, with Einstein's intermediate theory of principle and its interpretation serving as the crucial bridge between the two very different constructive frameworks of Lorentz and Minkowski.

4.1 Immovable Spacetime

My argument for the notion of space being really independent of body is founded on the possibility of the material universe being finite and moveable. 'Tis not enough for this learned writer [Leibniz] to reply that he thinks it would not have been wise and reasonable for God to have made the material universe finite and moveable… Neither is it sufficient barely

to repeat his assertion that the motion of a finite material universe would be nothing, and (for want of other bodies to compare it with) would produce no discoverable change, unless he could disprove the instance which I gave of a very great change that would happen, viz., that the parts would be sensibly shocked by a sudden acceleration or stopping of the motion of the whole: to which instance, he has not attempted to give any answer.
Samuel Clarke, 1716

Although the words "relativity" and "relational" share a common root, their meanings are quite different. The principle of relativity asserts that for any material particle in any state of motion there exists a system of space and time coordinates in terms of which the particle is instantaneously at rest and inertia is homogeneous and isotropic. Thus the natural (inertial) decomposition of spacetime intervals into temporal and spatial components can be defined only relative to some particular frame of reference. Of course, the absolute spacetime intervals themselves are invariant, so the "relativity" refers only to the analytical decomposition of these intervals. (The physical significance of this particular decomposition is that the quantum phase of any object evolves in proportion to its "natural" temporal coordinate.) In contrast, the principle of relationism asserts that the absolute intervals between material objects fully characterize their extrinsic positional status, without reference to any underlying non-material system of reference which might be called "absolute space". The traditional debate between proponents of relational and absolute motion (such as Leibniz and Clarke, respectively) is of questionable relevance if continuous fields are accepted as extended physical entities, permeating all of space, because this implies there are no unoccupied locations. In this context every point in the entire spacetime manifold is a vertex of actual relations between physical entities, obscuring the distinction between absolute and relational premises. Moreover, in the context of the general theory of relativity, the metrical properties of spacetime itself constitute a field, i.e., an extended physical entity, which not only acts upon material objects but is also acted upon by them, so the absolute-relational distinction has no clear meaning. However, it remains possible to regard fields as only representations of effects, and to insist on materiality for ontological objects, in which case the absolute-relational question remains both relevant and unresolved. Physicists have always recognized the appeal of a purely relational theory of motion, but every such theory has foundered on the same problem, namely, the physicality of acceleration. For example, one of Newton’s greatest challenges was to account for the fact that the Moon is relationally stationary with respect to the Earth (i.e., the distance between Earth and Moon is roughly unchanging), whereas it ought to be accelerating toward the Earth due to the influence of gravity. What is holding the Moon up? Or, to put the question differently, why is the Moon not accelerating directly toward the Earth in accord with the gravitational force that is presumably being applied to it? Newton's brilliant answer was that the Moon is indeed accelerating directly toward the Earth, and

with precisely the magnitude of acceleration predicted by his gravity formula, but the Moon is also moving perpendicularly to the Earth-Moon axis, with a velocity v = ωR, where R is the Earth-Moon distance and ω is the Moon's angular velocity, i.e., roughly 2π radians/moonth. If it were not accelerating toward the Earth, the Moon would just wander off tangentially away from the Earth, but the force of gravity is modifying its velocity by adding GM/R² ft/sec toward the Earth each second, which causes the Moon to turn continually in a roughly circular orbit around the Earth. The centripetal acceleration of an object revolving in a circle is v²/R = ω²R, and so (Newton reasoned) this must equal the gravitational acceleration. Thus we have ω²R³ = GM, which of course is Kepler's third law. This explanation depends on a strictly non-relational concept of motion. In fact, it might be said that this was the crucial insight of Newtonian dynamics - and it applies no less in the special theory of relativity. For the purposes of dynamical analysis, motion must be referred to an absolute background class of rectilinear inertial coordinate systems, rather than simply to the relations between material bodies, or even classical fields. Thus we cannot infer everything important about an object's state of motion simply from its distances to other objects (at least not to nearby objects). In this sense, both Newtonian and relativistic physics find it necessary to invoke absolute space.

But this concept of absolute space presents us with an ontological puzzle, because we can empirically verify the physical equivalence of all uniform states of motion, which suggests that position and velocity have no absolute physical significance, and yet we can also verify that changes in velocity (i.e., accelerations) do have absolute significance, independent of the relations between material bodies (at least in a local sense). If the evident relativity of position and velocity leads us to discard the idea of absolute space, then how are we to understand the apparent absoluteness of acceleration? Some have argued that in order for the change in something to be ontologically real, it is necessary for the thing itself to be real, but of course that's not the case. It's perfectly possible for "the thing itself" to be an artificial conception, whereas the "change" is the ontological entity. For example, the Newtonian concept of the physical world is a set of particles, between which relations exist. The primary ontological entities are the particles, but it's equally possible to imagine that the separations are the "real" entities, and particles are merely abstract entities, i.e., a convenient bookkeeping device for organizing the facts of a set of separations.

This raises some interesting questions, such as whether an unordered multiset of n(n−1)/2 separations suffices to uniquely determine a configuration of n points in a space of fixed dimension. It isn't difficult to find examples of multisets of separations that allow for multiple distinct spatial arrangements. For example, given the multiset of ten separations

we can construct either of the two five-point configurations shown below

For another example, the following three distinct configurations of eight co-planar points each have the same multiset of 28 point-to-point separations:

In fact, of the 12870 possible arrangements of eight points on a 4×4 grid, there are only 1120 distinct multisets of separations. Much of this reduction is due to rotations and reflections, but not all. Intrinsically distinct configurations of points with the same multiset of distances are not uncommon. They are sometimes called isospectral sets, referring to the spectrum of point-to-point distances. (These counts are easily checked by brute-force enumeration; see the sketch following the quotation below.) Examples such as these may suggest that unordered separations cannot be the basis of our experience, although we can't rule out, a priori, the possibility that our interpretation of experience is non-unique, and that different states of consciousness might perceive a given physical configuration differently. Even if we reject the possibility of non-unique mapping to our conventional domain of objects, we could still imagine a separation-based ontology by stipulating an ordering for those separations. (One hypothetical form which laws of separation might take is discussed in Section 4.2.) By recognizing the need to specify this ordering, our focus shifts back to a particle-based ontology.

As noted previously, according to both Galilean and Einsteinian (special) relativity, position and velocity are relative but acceleration is not. However, it can be argued that the absoluteness of acceleration is incongruous with Galilean spacetime, because if spacetime were Galilean there would be no reason for acceleration to be absolute. This was already alluded to in the discussion of Section 1.8, where the cyclic symmetry of the velocity relations between three Galilean reference systems was noted. In a sense, the relationist Leibniz was correct in asserting that absolute space and time are inconsistent with Galilean relativity, citing the "principle of sufficient reason" in support of this claim. If time and space are separate and distinct (which no one had ever disputed), then there would be no observable distinction between accelerated and un-accelerated systems of reference, as revealed by the fact that the concept of a moveable rigid body of arbitrary size is perfectly consistent with the kinematics of Galilean relativity. Samuel Clarke had argued that if all the material in some finite universe was accelerated in tandem, maintaining all the intrinsic relations between the particles, this acceleration would still be physically real, even though no one could observe the acceleration (for lack of anything to compare with it). Leibniz replied

Motion does not indeed depend upon being observed, but it does depend upon being possible to be observed. There is no motion when there is no change that can be observed. And when there is no change that can be observed, there is no change at all. The contrary opinion is grounded upon the supposition of a real absolute space, which I have demonstratively confuted by the principle of the want of a sufficient reason of things.
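The counts quoted above are easy to check by direct enumeration. Here is a minimal brute-force sketch in Python (assuming the sixteen grid points lie at integer coordinates 0 through 3, and comparing multisets of squared distances so the test is exact):

    from itertools import combinations

    # The 16 points of a 4x4 grid, at integer coordinates.
    points = [(x, y) for x in range(4) for y in range(4)]

    def spectrum(config):
        # Sorted multiset of the 28 squared point-to-point distances.
        return tuple(sorted((ax - bx)**2 + (ay - by)**2
                            for (ax, ay), (bx, by) in combinations(config, 2)))

    configs = list(combinations(points, 8))    # C(16,8) = 12870 arrangements
    distinct = {spectrum(c) for c in configs}
    print(len(configs), len(distinct))         # expect: 12870 1120

Working with squared distances avoids any floating-point ambiguity in deciding when two spectra coincide.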

It is quite right that, in the context of Galilean relativity, the acceleration of all the matter of the universe in tandem would be strictly unobservable, so Leibniz has a valid point. However, barring some Machian long-range influence which neither Clarke nor Leibniz seems to have imagined, the same argument implies that inertia should not exist at all. Thus Clarke was correct in pointing out that the very existence of inertia refutes Leibniz's position. There is indeed an observable distinction between uniform and accelerated motion, i.e., inertia does exist. In summary, Leibniz was correct in (effectively) claiming that the existence of inertia is logically incompatible with the Galilean concept of space and time, whereas Clarke was correct in pointing out that inertia does actually exist. The only way out of this impasse would have been to discard the one premise that neither of them ever questioned, namely, the Galilean concept of space and time. It was to be another 200 years before a viable alternative to Galilean spacetime was recognized.

As explained in Section 1, the spacetime structures of Galileo and Minkowski are formally identical if the characteristic constant c of the latter is infinite. In that case it follows that arbitrarily large rigid bodies are possible, so it is conceivable for all the material in an arbitrarily large region to accelerate in tandem, maintaining all the same intrinsic spatial relations. However, if c has some finite value, this is no longer the case. Section 2.9 described the kinematic limitation on the size of a spatial region in which objects can be accelerated in tandem. Hence the structure of Minkowski spacetime intrinsically distinguishes uniform motion as the only kind of motion that could be applied in tandem to all objects throughout space. In this context, Leibniz's principle of sufficient reason can be used to argue that different states of uniform motion should not be regarded as physically different, but it cannot be applied to accelerated motion, because the very kinematics of Minkowski spacetime do not permit the tandem acceleration of objects over arbitrarily large regions. It seems justifiable to say that the existence of inertia implies the Minkowski character of spacetime.

This goes some way towards resolving the epistemological problems that have often been raised against the principle of inertia. To the question "How are we to distinguish the inertial coordinate systems from all possible systems of reference?", we can answer that the inertial coordinate systems are precisely those in terms of which two objects separated by an arbitrary distance can be accelerated in tandem. This doesn't help to identify inertial coordinate systems in Galilean spacetime, but it fully identifies them in the context of Minkowski spacetime. So, it can be argued that (from an epistemological standpoint) Minkowski spacetime is the only satisfactory framework for the principle of inertia.

Still, there remain some legitimate open issues regarding any (so far) conceived relativistic spacetime. According to both classical and special relativity, the inertial coordinate systems are fully symmetrical, and each one is regarded as physically equivalent (in the absence of matter). In particular, we cannot single out one particular inertial system and claim that it is the "central" frame, because the equivalence class has no center, and all ontological qualities are uniformly distributed over the entire class.
Unfortunately, from a purely formal standpoint, a purported uniform distribution over inertial frames is somewhat problematic, because the inertial systems of reference along a

single line can only be linearly parameterized in terms of a variable that ranges from −∞ to +∞, such as q = log((1+v)/(1−v)), but if each value of q is to be regarded as equally probable, then we are required to imagine a perfectly uniform density distribution over the real numbers. Mathematically, no such distribution exists. To illustrate, imagine trying to select a number randomly from a uniform distribution over all the real numbers. This is the source of many well-known mathematical conundrums, such as the "High-Low Number" strategy game, whose answer depends on the fact that no perfectly uniform distribution exists over the real numbers (nor even over the integers). In trying to understand whether there was any arbitrary choice in the creation of the physical world, it's interesting to note that the selection of our particular rest frame cannot have been perfectly arbitrary from a set of pre-existing alternatives.

It might be argued that the impossibility of a choice between indistinguishable inertial reference frames implies that only an absolutist framework is intelligible. However, the identity of indiscernibles led Leibniz and Mach to argue just the opposite, i.e., that the only intelligible way to imagine the existence of objects, all in roughly the same frame of reference within a perfectly symmetrical class of possible reference systems, is to imagine that the objects themselves are in some way responsible for the class, which brings us back to pure relationism. Alas, as we've seen, pure relationism has its own problematic implications. For one, there has traditionally been a close association between relationism and the concept of absolute simultaneity. This is because the "relations" were regarded as purely spatial, and it was necessary to posit a unique instant of time in which to evaluate those spatial relations. To implement a spatial relationist theory in the framework of Minkowski spacetime would evidently require that whatever laws apply to the spatial relations for one particular decomposition of spacetime must also apply to all other decompositions. (A simple example of this is discussed in Section 4.2.) Alternatively, we might say that only invariant quantities should be subject to the relational laws, but this amounts to the same thing as requiring that the laws apply to all decompositions.

One common feature of all purely relational models based on Galilean space and time is their evident non-locality, because (as noted above) there is no way, if we limit ourselves to local observations, to identify the inertial motions of material objects purely from the kinematical relations between them. We're forced to attribute the distinction between inertial and non-inertial motion to some non-material (or non-local) interaction. This is nicely illustrated by Einstein's thought experiment (based on Newton's famous "spinning pail") involving two nominally identical fluid globes S1 and S2 floating in an empty region of space. One of these globes is set rotating (about their common axis) while the other remains stationary. The rotating globe assumes an oblate shape due to its rotation.

If the globes are mutually stationary and not rotating, they are both spherical and symmetrical, and we cannot distinguish between them, but if one of the globes is spinning about their common axis, the principle of inertia leads us to expect that the spinning globe will bulge at the "equator" and shrink along its axis of rotation due to the centripetal forces. The "paradox" (for the relationist) is that each globe is spinning with respect to the other, so they must still be regarded as perfectly symmetrical, and yet their shapes are no longer congruent. To what can we attribute the asymmetry? If we look further afield we may notice that the deformed globe is rotating relative to all the distant stars, whereas the spherical globe is not. A little experimentation shows that a globe's deformation is strictly a function of its speed of rotation relative to the distant stars, and presumably this is not a mere coincidence. Newton's explanation for this coincidence was to argue that the local globes and the distant stars all reside in the same absolute space, and it is this space that defines absolute (inertial) motion, and likewise the special relativistic theory invokes an absolutely preferred class of reference frames.

Moreover, in the general theory of relativity, when viewed from a specific cosmological perspective, there is always a preferred frame of reference, owing to the global boundary conditions that must be imposed in order to single out a solution. This came as a shock to Einstein himself at first, since he was originally thinking (hoping) that the field equations of general relativity represented true relationism, but his conversion began when he received Schwarzschild's exact solution for spherical symmetry, which of course exhibits a preferred coordinate system such that the metric coefficients are independent of time, i.e., the usual Schwarzschild coordinates, which are essentially unique for that particular solution. Likewise for any given solution there is some globally unique system of reference singled out by symmetry or boundary conditions (even for asymptotically flat universes, as Einstein himself showed). For example, in the Friedmann "big bang" cosmologies there is a preferred global system of coordinates corresponding to the worldlines with respect to which the cosmic background radiation is isotropic. Of course, this is not a fresh insight. The non-relational global aspects of general relativistic cosmologies have been extensively studied, beginning with Einstein's 1917 paper on the subject, and continuing with Gödel's rotating universes, and so on. Such examples make it clear that general relativity is not a relational theory of motion. In other words, general relativity does not correlate all physical effects with the relations between material bodies, but rather with the relations between objects (including fields) and the absolute background metric, which is affected by, but is not determined by, the distribution of objects (except arguably in closed cosmological models). Thus relativity, no less than Newtonian mechanics, relies on spacetime as an absolute entity in itself, exerting influence on fields and material bodies. The extra information contained in the metric of spacetime is typically introduced by means of boundary conditions or "initial values" on a spacelike foliation, sufficient to fix a solution of the field equations.
In this way relativity very quickly disappointed its early logical-positivist supporters when it became clear that it was not, and never had been, a relational theory of motion, in

the sense of Leibniz, Berkeley, or Mach. Initially even Einstein was disturbed by the Schwarzschild and de Sitter solutions (see Section 7.6), which represent complete metrical manifolds with only one material object or none at all (respectively). These examples showed that spacetime in the theory of relativity cannot simply be regarded as the totality of the extrinsic relations between material objects (and non-gravitational fields), but is a primary physical entity of the theory, with its own absolute properties, most notably the metric with its related invariants, at each point. Indeed this was Einstein's eventual answer to Mach's critique of pre-relativity physics. Mach had complained that it was unacceptable for our theories to contain elements (such as spacetime) that act on (i.e., have an effect on) other things, but that are not acted upon by other things. Mach, and the other relationalists before him, naturally expected this to be resolved by eliminating spacetime, i.e., by denying that an entity called "spacetime" acts in any physical way. To Mach's surprise (and unhappiness), the theory of relativity actually did just the opposite - it satisfied Mach's criticism by instead making spacetime a full-fledged element of the theory, acted upon by other objects. By so doing, Einstein believed he had responded to Mach's critique, but of course Mach hated it, and said so. Early in his career, Einstein was sympathetic to the idea of relationism, and entertained hopes of banishing absolute space from physics but, like Newton before him, he was forced to abandon this hope in order to produce a theory that satisfactorily represents our observations.

The absolute significance of spacetime in the theory of relativity was already obvious from trivial considerations of the special theory. The twins paradox is a good illustration of why relativity cannot be a relational theory, because the relation between the twins is perfectly symmetrical, i.e., the spatial distance between them starts at zero, increases to some maximum value, and then decreases back to zero. The distinction between the twins cannot be expressed in terms of their mutual relations to each other, but only in terms of how each of their individual worldlines is embedded in the absolute metrical manifold of spacetime. This becomes even more obvious in the context of general relativity, because we can then have multiple distinct geodesic paths between two given events, with different lapses of proper time, so we cannot even appeal to any difference in "felt" accelerations or local physics of any kind along the two world-paths to account for the asymmetry. Hopes of accounting for this asymmetry by reference to the distant stars, à la Mach, were certainly not fulfilled by general relativity, according to which the metric of spacetime is conditioned by the presence of matter, but only to a very slight degree in most circumstances. From an overall cosmological standpoint we are unable to attribute the basic inertial field to the configuration of mass and energy, and we have no choice but to simply assume a plausible absolute inertial background field, just as in Newtonian physics, in order to actually make predictions and solve problems. This is necessarily a separate and largely independent stipulation from our assumed distribution of matter and energy.
To understand why Galilean relativity is actually more relational than special relativity, note that the unified spacetime manifold with the lightcone structure of Minkowski spacetime is more rigid than a pure Cartesian product of a three-dimensional spatial manifold and an independent one-dimensional temporal manifold. In Galilean spacetime

at a spatial point P0 and time t0 there is no restriction at all on the set of spatial points at t0 + dt that may "spatially coincide with P0" with respect to some valid inertial frame of reference. In other words, an inertial worldline through P0 at time t0 can pass through any point in the entire universe at time t0 + dt for any positive dt. In contrast, the lightcone structure of Minkowski spacetime restricts the future of the point P0 to points inside the future null cone, i.e., to points whose spatial distance from P0 does not exceed c·dt, and as dt goes to zero, this range goes to zero, imposing a well-defined unique connection from each "infinitesimal" instant to the next, which of course is what the unification of space and time into a single continuum accomplishes.

We referred above to Newtonian spacetime without distinguishing it from what has come to be called Galilean spacetime. This is because Newton's laws are manifestly invariant under Galilean transformations, and in view of this it would seem that Newton should be counted as an advocate of relativistic spacetime. However, in several famous passages of the first Scholium of the Principia Newton seems to reject the very relativity on which his physics is founded, and to insist on distinctly metaphysical conceptions of absolute space and time. He wrote

I do not define the words time, space, place, and motion, since they are well known to all. However, I note that people commonly conceive of these quantities solely in terms of the relations between the objects of sense perception, and this is the source of certain preconceptions, for the dispelling of which it is useful to distinguish between absolute and relative, true and apparent, mathematical and common.

It isn't trivial to unpack the intended significance of these statements, especially because Newton has supplied three alternate names for each of the two types of quantities that he wishes us to distinguish. On one hand we have absolute, true, mathematical quantities, and on the other we have relative, apparent, common quantities. The latter are understood to be founded on our sense perceptions, so the former presumably are not, which seems to imply that they are metaphysical. However, Newton also says that this distinction is useful for dispelling certain prejudices, which suggests that his motives are utilitarian and/or pedagogical rather than to establish an ontology. He continues

Absolute, true, and mathematical time, in and of itself and of its own nature flows uniformly (equably), without reference to anything external. By another name it is called duration. Relative, apparent, and common time is any sensible external measure of duration by means of motion. Such measures (for example, an hour, a day, a month, a year) are commonly used instead of true time. Absolute space, in its own nature, without relation to anything external, remains always similar and immovable. Relative space is some movable measure of absolute space, which our senses determine by the positions of bodies... Absolute and relative space are of the same type (species) and magnitude, but are not always numerically the same... Place is a part of space which a body takes up, and is according to the space either

absolute or relative. Absolute motion is the translation of a body from one absolute place to another, and relative motion is the translation from one relative place to another.

Newton's insistence on the necessity of referring all true motions to "immovable space" has often puzzled historians of science, because his definitions of absolute space and time are plainly metaphysical, and it's easy to see that Newton's actual formulation of the laws of physics is invariant under Galilean transformations, in which the concept of absolute space plays no role. Indeed, each mention of a "state of rest" in the definitions and laws is accompanied by the phrase "or uniform motion in a right line", so the system built on these axioms explicitly does not distinguish between these two concepts. What, then, did Newton mean when he wrote that true motions must be referred to immovable space? The introductory Scholium ends with a promise to explain how the true motions of objects are to be determined, declaring that this was the purpose for which the Principia was composed, so it's all the more surprising when we find that the subject is never even mentioned in Books I or II. Only in the concluding Book III, "The System of the World", does Newton return to this subject, and we finally learn what he means by "immovable space". Although his motto was "I frame no hypotheses", we find, immediately following Proposition X in Book III (in the third edition), the singular hypothesis

HYPOTHESIS I: That the centre of the system of the world is immovable.

In support of this remarkable assertion, Newton simply says "This is acknowledged by all, although some contend that the earth, others that the sun, is fixed in that centre." In the subsequent Proposition XI we finally discover Newton's immovable space. He writes

PROPOSITION XI: That the common centre of gravity of the earth, the sun, and all the planets, is immovable. For that centre either is at rest or moves uniformly forwards in a right line; but if that centre moved, the center of the world would move also, against the Hypothesis.

This makes it clear that Newton's purpose all along has been not to deny Galilean relativity or the fundamental principle of inertia, but simply to show that a suitable system of reference for determining true inertial motions need not be centered on some material body. This was foreshadowed in the first Scholium when he wrote "it may be that there is no body really at rest, to which the places and motions of others may be referred". Furthermore, he notes that many people believed the immovable center of the world was at the center of the Earth, whereas others followed Copernicus in thinking the Sun was the immovable center. Newton evidently (and rightly) regarded it as one of the most significant conclusions of his deliberations that the true inertial center of the world was in neither of those objects, but is instead the center of gravity of the entire solar system. We recall that Galileo found himself in trouble for claiming that the Earth moves, whereas both he and Copernicus believed that the Sun was absolutely stationary. Newton showed that the Sun itself moves, as he continues

PROPOSITION XII: That the sun is agitated by a continual motion, but never recedes far from the common centre of gravity of all the planets. For since the quantity of matter in the sun is to the quantity of matter in Jupiter as 1067 to 1, and the distance of Jupiter from the sun is to the semidiameter of the sun in a slightly greater proportion, the common center of gravity of Jupiter and the sun will fall upon a point a little without the surface of the sun.

This was certainly a magnificent discovery, worthy of being called the purpose for which the Principia was composed, and it is clearly what Newton had in mind when he wrote the introductory Scholium promising to reveal how immovable space (i.e., the center of the world) is to be found. In this context we can see that Newton was not claiming the ability to determine absolute rest, but rather the ability to infer from phenomena a state of absolute inertial motion, which he identified with the center of gravity of the solar system. He very conspicuously labels as a Hypothesis (one of only three in the final edition of the Principia) the conventional statement, "acknowledged by all", that the center of the world is immovable. By these statements he was trying to justify calling the solar system's inertial center the center of the world, while specifically acknowledging that the immovability of this point is conventional, since it could just as well be regarded as moving "uniformly forwards in a right line". The modern confusion over Newton's first Scholium arises from trying to impose an ontological interpretation on a 17th century attempt to isolate the concept of pure inertia, and incidentally to locate the "center of the world". It was essential for Newton to make sure his readers understood that "uniform motion" and "right lines" cannot generally be judged with reference to neighboring bodies (such as the Earth's spinning surface), because those bodies themselves are typically in non-uniform motion. Hence he needed to convey the fact that the seat of inertia is not the Earth's center, or the Sun, or any other material body, but is instead absolute space and time - in precisely the same sense that spacetime is absolute in special relativity. This is distinct from asserting an absolute state of rest, which Newton explicitly recognized as a matter of convention. Indeed, we now know the solar system itself revolves around the center of the galaxy, which itself moves with respect to other galaxies, so under Hypothesis I we must conclude that Proposition XI is strictly false. Nevertheless, the deviations from true inertial motion represented by those stellar and galactic motions are so slight that Newton's "immovable center of the world" is still suitable as the basis of true inertial motion for nearly all purposes. In a more profound sense, the concept of "immoveable space" has been carried over into modern relativity because, as Einstein said, spacetime in general relativity is endowed with physical qualities that enable it to establish the local inertial frames, but "the idea of motion may not be applied to it".

4.2 Inertial and Gravitational Separations

And I am dumb to tell a weather's wind

How time has ticked a heaven round the stars.
                                                  Dylan Thomas, 1934

The special theory of relativity is formulated as a local theory, so its natural focus is on the worldlines of individual particles. In addition, special relativity presupposes a preferred class of worldlines, those representing inertial motion. The idea of a worldline is inherently "absolute" in the sense that it is nominally defined with reference only to a system of space and time coordinates, not to any other objects. This is in contrast to a truly relational theory, which would take the "dual" approach, and regard the separations between particles as the most natural objects of study. In fact, as mentioned in Section 4.1, we could go to the relationist extreme of regarding separations as the primary ontological entities, and considering particles to be merely abstract concepts that we use to psychologically organize and coordinate the separations. The relationist view arguably has the advantage of not presupposing a fixed background or even a definite dimensionality of space, since each "separation" could be considered to represent an independent degree of freedom. Of course, this freedom doesn't seem to exist in the real world, since we cannot arrange five particles all mutually equidistant from each other. Indeed it appears that the n(n−1)/2 separations between n particles can be fully encoded as just 3n real numbers, and moreover that those real numbers vary continuously as the individual particles "move". This is the justification for the idea of particles moving in a coherent three-dimensional space.

Nevertheless, it's interesting to examine the spatial separations that exist between material particles (as opposed to the space and time coordinates of individual particles), to see if their behavior can be characterized in a simple way. From this point of view, the idea of "motion" is secondary; we simply regard separations as abstract entities having certain properties that may vary with time. In this context, rather than discussing inertial motion of an individual particle, we consider the spatial separation (as a function of time) between two inertial particles. However, since we don't presuppose a background of absolute inertial motion, we will refer to the particles as being "co-inertial", meaning simply that the spatial separation between them behaves like the separation between two particles in absolute inertial motion, regardless of whether the two particles are actually in absolute inertial motion. Is it possible to characterize in a simple way the spatial separations that exist between co-inertial particles? Consider, for example, the spatial separation s(t) as a function of time between a stationary particle and a particle moving uniformly in a straight line through space, as depicted in the figure below for the condition when the direction of motion of the moving particle B is perpendicular to the displacement from the stationary particle A.

Obviously the separation between objects A and B in this configuration is stationary at

this instant, i.e., we have ds/dt = 0, and yet we know from experience that this physical situation is distinct from one in which the two objects are actually stationary with respect to each other's inertial rest frames. For example, the Moon and Earth are separated by roughly a constant distance, and yet we understand that the Moon is in constant motion perpendicular to its separation from the Earth. It is this transverse motion that counteracts the effect of gravity and keeps the Moon in its orbit. This is another reason that we ordinarily find it necessary to describe motion not in purely relational terms, but in terms of absolutely non-rotating systems of inertial coordinates. Of course, as Mach observed, the apparent existence of "absolute rotation" doesn't necessarily refute relationism as a viable basis for coordinating events. It could also mean that we must take more relations into account. (For example, the Moon's motion is always tangential to the Earth, but it is not always tangential to other bodies, so its orbital motion does show up in the totality of binary separations.) Whether or not a workable physics could be developed on a purely relational basis is unclear, but it's still interesting to examine the class of co-inertial separations as functions of time. It turns out that co-inertial separations are characterized by a condition that is nearly identical to the condition for linear gravitational free-fall, as well as for certain other natural kinds of motion. The three orthogonal components Δx, Δy, and Δz of the separation between two particles in unaccelerated motion relative to a common reference frame must be linear functions of time, i.e.,
Δx(t) = a₁t + b₁        Δy(t) = a₂t + b₂        Δz(t) = a₃t + b₃
where the coefficients aᵢ and bᵢ are constants. Therefore the magnitude of any "co-inertial separation" is of the form
s(t) = (at² + 2bt + c)^(1/2)
where
a = a₁² + a₂² + a₃²        b = a₁b₁ + a₂b₂ + a₃b₃        c = b₁² + b₂² + b₃²
Letting the subscript n denote nth derivative with respect to time, the first two derivatives of s(t) are
s₁ = (at + b)/s₀        s₂ = (ac − b²)/s₀³
The right hand equation shows that s₂s₀³ = k (a constant), and we can differentiate this again and divide the result by s₀² to show that the separation s(t) between any two particles in relatively unaccelerated (i.e., co-inertial) motion in Galilean spacetime must satisfy the equation
s₀s₃ + 3s₁s₂ = 0                (1)
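As a quick consistency check, a short sympy sketch (with arbitrary symbolic coefficients a, b, c) confirms that every separation of the form s(t) = (at² + 2bt + c)^(1/2) satisfies equation (1) identically:

    import sympy as sp

    t, a, b, c = sp.symbols('t a b c', positive=True)
    s = sp.sqrt(a*t**2 + 2*b*t + c)              # general co-inertial separation

    d = [sp.diff(s, t, n) for n in range(4)]     # s0, s1, s2, s3
    print(sp.simplify(d[0]*d[3] + 3*d[1]*d[2]))  # equation (1): should print 0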
Now we consider the separation that characterizes an isolated non-rotating two-body system in gravitational free-fall. Assume the two bodies are identical particles, each of mass m. According to Newtonian theory the inertial and gravitational constraints are coupled together, by means of the auxiliary quantity called "force", according to the following equations
F = Gm²/s₀²        F = −m s₂/2
where G is a universal constant. (Note that each particle's "absolute" acceleration is half of the second derivative of their mutual separation with respect to time.) Equating these two forces gives s₂s₀² = −2Gm. Differentiating this again and dividing through by s₀, we can characterize non-rotating gravitational free-fall by the purely kinematic equation
s₀s₃ + 2s₁s₂ = 0                (2)
The formal similarity between equations (1) and (2) is remarkable, considering that the former describes strictly inertial separations and the latter describes gravitational separations. We can show how the two are related by considering general free motion in a gravitational field. The Newtonian equations of motion are
r²ω = h (a constant)        d²r/dt² = rω² − m/r²
where r is the magnitude of the distance from the center of the field and ω is the angular velocity of the particle. If we solve the left hand equation for ω and differentiate to give dω/dt, we can substitute these expressions into the right hand equation and re-arrange the terms to give
r₀r₃ + 3r₁r₂ + (m/r₀²)r₁ = 0
which applies (in the Newtonian limit) to arbitrary free paths of test particles in a gravitational field. Obviously if m = 0 this reduces to equation (1), representing free inertial separations, whereas for purely radial motion we have d²r/dt² = −m/r², and so this reduces to equation (2), representing radial gravitational separation. Other classes of physical separations also satisfy a differential equation similar to (1) and (2). For example, consider a particle of mass m attached to a rod in such a way that it can slide freely along the rod. If we rotate the rod about some point P then the particle in general will tend to slide outward along the rod away from the center of rotation in accord with the basic equation of motion
d²s/dt² = ω²s
where s is the distance from the center of rotation to the sliding particle, and ω is the angular velocity of the rod. Differentiating and multiplying through by s₀ gives
s₀s₃ = ω²s₀s₁
Then since s₂ = ω²s₀, we see that s(t) satisfies the equation

s₀s₃ − s₁s₂ = 0                (3)

So, we have found that arbitrary co-inertial separations, non-rotating gravitational separations, and rotating radial separations are all characterized by a differential equation of the form

s₀s₃ + N s₁s₂ = 0                (4)

for some constant N (namely N = 3, 2, and −1 for the three cases respectively). (Among the other solutions of this equation with N = −1 are the elementary transcendental functions eᵗ, sin(t), and cos(t).) Solving for N, to isolate the arbitrary constant, we have
N = −s₀s₃/(s₁s₂)
Differentiating this gives the basic equation
s₀s₁s₂s₄ + s₁²s₂s₃ − s₀s₂²s₃ − s₀s₁s₃² = 0
If none of s₀, s₁, s₂, and s₃ is zero, we can divide each term by all of these to give the interesting form
s₁/s₀ + s₄/s₃ = s₂/s₁ + s₃/s₂
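A similar sympy sketch corroborates the stated values of N for the cases that admit elementary closed forms (the gravitational case N = 2 has no elementary s(t), but follows from s₂s₀² = −2Gm exactly as derived above):

    import sympy as sp

    t, a, b, c, w = sp.symbols('t a b c omega', positive=True)

    def N_of(s):
        # N = -s0*s3/(s1*s2), which equation (4) asserts is constant
        d = [sp.diff(s, t, n) for n in range(4)]
        return sp.simplify(-d[0]*d[3]/(d[1]*d[2]))

    print(N_of(sp.sqrt(a*t**2 + 2*b*t + c)))  # co-inertial: should print 3
    print(N_of(sp.cosh(w*t)))                 # solves s'' = w^2 s (rotating rod): -1
    print(N_of(sp.exp(t)), N_of(sp.sin(t)))   # the other N = -1 solutions: -1 -1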
This could be seen as an (admittedly very simplistic) "unification" of a variety of physically meaningful spatial separation functions under a single equation. The "symmetry breaking" that leads to the different behavior in different physical situations arises from the choice of N, which appears as a constant of integration. Incidentally, even though the above has been based on the Galilean spatial separations between objects as a function of Galilean time, the same conditions can be shown to apply to the absolute spacetime intervals between inertial particles as a function of their proper times. Relative to any point on the worldline of one particle, the four components Δt, Δx, Δy, and Δz of the absolute interval to any other inertially moving particle are all linear functions of the proper time τ along the latter particle's worldline. Therefore, the

components can be written in the form
Δt(τ) = a₀τ + b₀        Δx(τ) = a₁τ + b₁        Δy(τ) = a₂τ + b₂        Δz(τ) = a₃τ + b₃
where the coefficients aᵢ and bᵢ are constants. It follows that the absolute magnitude of any "co-inertial separation" is of the form
s(τ) = (aτ² + 2bτ + c)^(1/2)
where
a = a₀² − a₁² − a₂² − a₃²        b = a₀b₀ − a₁b₁ − a₂b₂ − a₃b₃        c = b₀² − b₁² − b₂² − b₃²
Thus we have the same formal dependence as before, except now the parameter s represents the absolute spacetime separation. This shows that the absolute separation between any fixed point on one inertial worldline and a point advancing along any other inertial worldline satisfies equation (1), where subscripts denote derivatives with respect to the proper time of the advancing point. Naturally the reciprocal relation also holds, as does the corresponding relation for the absolute separation between two points, each advancing along arbitrary inertial worldlines, correlated according to their respective proper times.

4.3 Free-Fall Equations

When, therefore, I observe a stone initially at rest falling from an elevated position and continually acquiring new increments of speed, why should I not believe that such increases take place in a manner which is exceedingly simple and rather obvious to everybody?
                                                  Galileo Galilei, 1638

The equation of two-body non-rotating radial free-fall in Newtonian theory is formally identical to the one-body radial free-fall solution in Einstein's theory (as is Kepler's third law), provided we identify Newton's radial distance with the Schwarzschild parameter r, and Newton's time with the proper time of the falling particle. Therefore, it's worthwhile to explicitly derive the cycloidal form of this solution. From the Newtonian point of view we can begin with the inverse-square law of gravitation for the radial separation s(t) between two identical non-rotating particles of mass m
s̈ = −2Gm/s²
where dots signify derivatives with respect to time. Integrating this over ds from an

arbitrary initial separation s(0) to the separation s(t) at some other time t gives
∫ s̈ ds = 2Gm [1/s(t) − 1/s(0)]
Notice that the left hand integral can be rewritten
∫ s̈ ds = ∫ (dṡ/dt) ds = ∫ ṡ dṡ = [ṡ(t)² − ṡ(0)²]/2
Therefore, the previous equation can easily be integrated to give
[ṡ(t)² − ṡ(0)²]/2 = 2Gm [1/s(t) − 1/s(0)]
which shows that the quantity
ṡ² − 4Gm/s
is invariant for all t. Solving the equation for ṡ(t), we have

ṡ(t) = ±[ ṡ(0)² + 4Gm(1/s(t) − 1/s(0)) ]^(1/2)
Rearranging, this gives
dt = ± ds / [ ṡ(0)² + 4Gm(1/s − 1/s(0)) ]^(1/2)
To simplify the expressions, we put s₀ = s(0), v₀ = ṡ(0), and r = s(t)/s₀. In these terms, the preceding expression can be written

t = [s₀³/(4Gm)]^(1/2) ∫ dr / (1/r − K)^(1/2)        where K = 1 − v₀²s₀/(4Gm)
There are two cases to consider. If K is positive, then the trajectory is bounded, and there is some point on the trajectory (the apogee) at which v = 0. Choosing this point as our time origin t = 0, we have K=1, and the standard integral gives
t = [s₀³/(4Gm)]^(1/2) [ (r(1−r))^(1/2) + invcos(√r) ]                (1)
This equation describes a (scaled) cycloidal relation between t and r, which can be expressed parametrically in terms of a fictitious angle θ as follows
t = [s₀³/(16Gm)]^(1/2) (θ + sin θ)        r = (1 + cos θ)/2                (2)
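As a numerical sanity check on these expressions, the following sketch (in units where Gm = 1 and s₀ = 1, so the particles are released from rest at the apogee and K = 1) integrates s̈ = −2Gm/s² directly and compares the elapsed time with equation (1):

    import math

    def t_exact(r):
        # Equation (1) with Gm = 1, s0 = 1
        return 0.5*(math.sqrt(r*(1 - r)) + math.acos(math.sqrt(r)))

    def t_numeric(r_target, dt=1e-5):
        s, v, t = 1.0, 0.0, 0.0           # released from rest at s = 1
        while s > r_target:
            v += -2.0/s**2 * dt/2         # leapfrog (kick-drift-kick) step
            s += v*dt
            v += -2.0/s**2 * dt/2
            t += dt
        return t

    print(t_exact(0.5), t_numeric(0.5))   # both approximately 0.6427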
To verify that these two equations are equivalent to the preceding equation, we can solve the second for θ and substitute into the first to give
t = [s₀³/(16Gm)]^(1/2) [ sin(invcos(2r−1)) + invcos(2r−1) ]
Using the trigonometric identity

sin(invcos(x)) = (1 − x²)^(1/2)

we see that the first term on the right side is

sin(invcos(2r−1)) = [1 − (2r−1)²]^(1/2) = 2[r(1−r)]^(1/2)
Also, letting ϕ = invcos(2r1), we can use the trigonometric identity

to show that this angle is

so the second term on the right side of (2) is

which completes the demonstration that the cycloid relation given by (2) is equivalent to the free-fall relation (1). The second case is when K is negative. For this case we can conveniently express the equations in terms of the positive parameter k = -K. The standard integral
∫ dr/(1/r + k)^(1/2) = [r(1+kr)]^(1/2)/k − (1/k^(3/2)) ln[ (kr)^(1/2) + (1+kr)^(1/2) ] + constant
tells us that, for any two points s₀ and s₁ on the trajectory, the time interval is related to the separations according to
t₁ − t₀ = [s₀³/(4Gm)]^(1/2) [ F(r₁) − F(r₀) ]
where
F(r) = [r(1+kr)]^(1/2)/k − (1/k^(3/2)) ln[ (kr)^(1/2) + (1+kr)^(1/2) ]        and        rᵢ = sᵢ/s₀
Notice that if we define S₀ = s₀/k and R = kr, then this becomes
t₁ − t₀ = [S₀³/(4Gm)]^(1/2) [ g(R₁) − g(R₀) ]        with        g(R) = [R(1+R)]^(1/2) − ln[ R^(1/2) + (1+R)^(1/2) ]
Thus, if we define the normalized time parameter
T = t / [S₀³/(4Gm)]^(1/2)
then the normalized equation of motion is
T = [R(1+R)]^(1/2) − ln[ R^(1/2) + (1+R)^(1/2) ]                (3)
This represents the shape of every non-rotating separation between identical particles of mass m for which k is positive, which means that the absolute value of v₀ exceeds 2(Gm/s₀)^(1/2). These are the unbound radial orbits for which R goes to infinity, as opposed to the case when the absolute value of v₀ is less than this threshold, which gives bound radial orbits in the shape of a cycloid in accord with equation (1). It's interesting to note the "removable singularity" of (3) at R = 0. Physically the parameter R is always non-negative by definition, so it abruptly reverses slope at the origin, even though the position may vary monotonically with respect to an external coordinate system.

4.4 Force, Curvature, and Uncertainty

The atoms, as their own weight bears them down plumb through the void, at scarce determined times, in scarce determined places, from their course

decline a little - call it, so to speak, mere changed trend. For were it not their wont thuswise to swerve, down would they fall, each one, like drops of rain, through the unbottomed void; and then collisions ne'er could be, nor blows among the primal elements; and thus Nature would never have created aught.
                                                  Lucretius, 50 BC

The trajectory of radial non-rotating gravitational free-fall can be expressed by the simple differential equation
s̈s² = k                (1)
where k is a constant and dots signify derivatives with respect to time. This equation is valid for both Newtonian gravity and general relativity, provided we identify Newton's time parameter with the free-falling particle's proper time, and Newton's radial distance with the radial Schwarzschild coordinate. Notice that no gravitational constant appears in this equation (k is just a constant of integration determined by the initial conditions), so equation (1) is a purely kinematic description of gravity. Why did Newton not adopt this simple kinematic view? Historically the reasons involved considerations of rotating systems, but the basic problem with the kinematic view is present even with simple non-rotating free-fall. The problem is that equation (1) has an unrealistic "static solution" at ṡ = s̈ = 0. This condition implies that k = 0, and the separation between the two objects has no proper "trajectory" (i.e., time drops out of the equation), so the equation cannot extrapolate the position forward or backward in time. Of course, this condition can never arise naturally from any non-static condition with k ≠ 0, but we can imagine that by the imposition of some external force we can arrange to have the two objects initially at rest and not accelerating relative to each other. Then when the objects are released from the "outside" force we expect them to immediately begin falling toward each other under the influence of their mutual gravitational attraction. This implies that k, and therefore s̈, must immediately assume some non-zero values, but equation (1) gives us no information about these values, because the entire equation identically vanishes at the static solution.

To escape from the static solution, Newtonian mechanics splits the kinematic equation of motion into two parts, coupled together by the dynamical concepts of force and mass. Two objects are said to exert (equal and opposite) force on each other proportional to the inverse of the square of the separation between them, and the second derivative of that separation is proportional (per mass) to this force. Thus, the relation between separation and time for two identical particles, each of mass m, is given not by a single kinematic equation but by two simultaneous equations
F = Gm²/s²        s̈ = −2F/m
If we combine these two equations by eliminating F, we have
s̈ = −2Gm/s²                (2)
which shows that when the two objects are released, the separation instantly acquires the second derivative −2Gm/s². Once this "initialization" has been accomplished, the subsequent free fall is entirely determined by equation (1), as can be seen by differentiating (2), in the form s̈s² = −2Gm, to give
s²(d³s/dt³) + 2sṡs̈ = 0                (3)
which, assuming the separation is not zero, can be divided by s to give s(d³s/dt³) + 2ṡs̈ = 0, the derivative of (1). This shows that, for non-rotating radial free-fall, the coupling parameters F and m are entirely superfluous, except that they serve to establish the proper initial condition when the two objects are released from rest. Thus, Newton's dual concepts of force-at-a-distance and the proportionality of acceleration to force serve only (in this context) to enable us to solve for a non-vanishing s̈ as a function of s when ṡ = s̈ = 0, which equation (1) obviously cannot do. Furthermore, the constant G does not appear in (1) or (3), even though they give a complete description of gravitational free-fall except for the singularity at ṡ = s̈ = 0. Thus the gravitational constant is also needed only at this singular point, the "static solution" of equation (1), which is the only point at which the dynamical concepts of force and mass are used. Aside from this singular condition, non-rotating radial Newtonian gravity is a purely kinematical phenomenon.

There are several essentially equivalent formulations of the kinematic equation of non-rotating radial gravitational motion, but all lead to an indeterminate condition at the static solution. For example, if we set k = −2α in equation (1) and multiply through by 2ṡ/s² we have

2ṡs̈ = −4αṡ/s²

Integrating this over time gives

ṡ² = 4α/s + γ

where γ is a constant of integration. Rearranging, this gives

−4α/s + ṡ² = γ

which we recognize as expressing the classical conservation of energy, with the first term representing potential energy and the second term denoting kinetic energy. Taking the derivative of this gives
4αṡ/s² + 2ṡs̈ = 0                (4)
Notice that in each of the preceding equations the condition ṡ = s̈ = 0 still represents a solution for any s, even though it is unrealistic. At this point we may be tempted to solve our problem by dividing through equation (4) by 2ṡ to give
s̈ = −2α/s²                (5)
which is the Newtonian inverse-square "force" law of gravity. This does indeed determine the second derivative as a function of s, and thereby provides the information needed to depart from the externally imposed static initial condition. However, notice that the condition which concerns us is precisely when ṡ = 0, so when we divided equation (4) by ṡ we were essentially just eliminating the singular pole arbitrarily by dividing by zero. Thus we can't properly say that the "force-at-a-distance" law (5) follows from equation (1). The removal of the indeterminate singularity actually represents an independent assumption relative to the basic kinematic equation of motion. Of course, this assumption is perfectly compatible with the equation of motion, as can be seen by solving equation (5) for α/s and substituting into the energy equation to give

2ss̈ + ṡ² = γ

and thus, differentiating and integrating once more,

s̈s² = −2α
which is the same as equation (1), with k = −2α. This compatibility is a necessary consequence of the fact that the equation of motion is totally indeterminate when ṡ = s̈ = 0, which is the only condition at which the force law introduces new information not contained in the basic equation of motion. In view of the above relations, it is not surprising that in the general theory of relativity we find gravity expressed without the concept of force. Einstein avoided the problem of the static solution - without invoking an auxiliary concept such as force - simply by recasting the phenomena in four-dimensional space-time, within which no material object is ever static. Every object, even one "at rest" in space, necessarily has a proper trajectory through spacetime, because it's moving forward in time. Furthermore, if we allow the spacetime manifold to possess intrinsic curvature, it follows that a purely timelike trajectory can "veer off" and acquire space-like components. Of course, this tendency to "veer off" depends on the degree of curvature of the spacetime, which general relativity relates to the mass-energy in the region. One of Einstein's motivations for the general theory was the desire to eliminate arbitrary constants, particularly the gravitational constant G, from the expressions of physical laws, but in the general theory it is still necessary to determine the proportionality between mass and curvature empirically, so the arbitrary gravitational constant remains. In any case, we see that Newtonian mechanics and general relativity give formally identical relations between separation and time for non-rotating free-fall, and the conceptual differences between the

two theories can be expressed in terms of the ways in which they escape from or avoid the static condition.

It's interesting to note that the static solution of (1) is unstable in the direction of collapse. Given a positive separation s, the signs of ṡ and s̈ must be {+,−}, {+,+}, {−,+} or {−,−} in order to satisfy (1), but considering small perturbations of these derivatives from the state in which they are both zero, it's clear that {+,−} is unrealistic, because ṡ would not go positive from zero while s̈ was going negative from zero. For similar reasons, perturbations leading to {+,+} and {−,+} are also excluded. Only the case {−,−} represents a realistic outcome of a small perturbation from the static solution.

This instability in the direction of collapse suggests another approach to escaping from (or avoiding) the static solution. The exact velocity and position of the two objects cannot be known at the quantum level, so, in a sense, the closest that two bodies can come to a static condition must still allow the equivalent of one quantum of momentum in their relative velocities. It's tempting to imagine that there might be some way of deriving the gravitational constant based on the idea that the initial condition for (1) is determined by the characteristic quantum uncertainty for the separations between massive particles, since, as we've seen, this initial condition fully determines the trajectory of radial gravitational free-fall. Simplistically we could note that, for a particle of mass m, any finite limit L on allowable distances implies two irreducible quantities of energy per unit mass, one being (h/2L)²/(2m²) corresponding to the minimum "observable" momentum mv = h/2L (where h is Planck's constant) due to the uncertainty principle, and the other being the minimum gravitational potential energy Gm/L. Identifying these two energies with each other, and setting L equal to the event horizon radius c/H, where c is the velocity of light and H is Hubble's expansion constant, we have the relation
(h/2L)²/(2m²) = Gm/L        i.e.        m = [h²H/(8Gc)]^(1/3)
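This relation is easy to evaluate numerically; a minimal Python check (with the constants as quoted in the following sentence):

    # Constants as quoted in the text (SI units).
    h, G, c, H = 6.625e-34, 6.673e-11, 2.998e8, 2.3e-18

    m = (h**2 * H / (8*G*c)) ** (1/3)     # m = [h^2 H/(8Gc)]^(1/3)
    print(m)                              # about 1.8477e-28 kg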
Inserting the values h = 6.625×10⁻³⁴ J·sec, G = 6.673×10⁻¹¹ N·m²/kg², c = 2.998×10⁸ m/sec, and H = 2.3×10⁻¹⁸ sec⁻¹ gives a value of 1.8477×10⁻²⁸ kg for the characteristic mass m, which happens to be about one ninth the mass of a proton. Rough relationships of this kind between the fundamental physical constants have been discussed by Dirac and others, including Leopold Infeld, who wrote in 1949

Let us take as an example Maxwell's equations and try to find their solution on a cosmological background… In a closed universe the frequency of radiation has a lowest value [corresponding to the maximum possible wavelength]. The spectrum, on its red side, cannot reach frequency zero. We obtain characteristic values for frequencies… a similar situation prevails if we consider Dirac's equations upon a cosmological background. The solutions in a closed universe are different, not because of the metric, but because of the topology of our universe.

Such ideas are intriguing, but they have yet to be incorporated meaningfully into any

successful physical theory. The above represents a very simplistic sense in which the uncertainty of quantum mechanics and the spacetime curvature of general relativity can be regarded as two alternative conceptual strategies for establishing a consistent gravitational coupling. In a more sophisticated sense, we can find other interesting formal parallels between these two concepts, both of which fundamentally express non-commutativity. Given a system of orthogonal xyz coordinates, let A, B, C denote operations which, when applied to any unit vector emanating from the origin, rotate that vector in the positive sense about the x, y, or z axis respectively. Each of these operations can be represented by a rotation matrix, such that multiplying any vector by that matrix will effectively rotate the vector accordingly. As Hamilton realized in his efforts to find a three-dimensional analog of complex numbers (which represent rotation operators in two dimensions), the multiplication (i.e., composition) of two rotations in space is not commutative. This is easily seen in our example, because if we begin with a vector V emanating from the origin in the positive z direction, and we first apply rotation A and then rotation B, we arrive at a vector pointing in the positive y direction, whereas if we begin with V and apply the rotation B first and then A we arrive at a vector pointing in the negative x direction. Thus the effect of the combined operation AB is different from the effect of the combined operation BA, and so the matrix AB − BA does not vanish. This is in contrast with ordinary scalars and complex numbers, which always satisfy the commutativity relation ab − ba = 0 for every two numbers a, b.

This non-commutativity also appears when dealing with calculus on curved manifolds, which we will discuss in more detail in Section 5. Just to give a preliminary indication of how non-commutative relations arise in this context, suppose we have a vector field Tα defined over a given metrical manifold, and we let Tαµν denote covariant differentiation of Tα first with respect to the coordinate xµ and then with respect to the coordinate xν. In a flat manifold the covariant derivative is identical to the partial derivative, which is commutative. In other words, the result of differentiation with respect to two coordinates in succession is independent of the order in which we apply the differentiations. However, in a curved manifold this is not the case. We find that reversing the order of the differentiations yields different results, just as when applying two rotations in succession to a vector. Specifically, we will find that
Tαµν − Tανµ = Rσαµν Tσ
where Rσαµν is the Riemann curvature tensor, to be discussed in detail in Section 5.7. The vanishing of this tensor is the necessary and sufficient condition for the manifold to be metrically flat, i.e., free of intrinsic curvature, so this tensor can be regarded as a measure of the degree of non-commutativity of covariant derivative operators in the manifold. Non-commutativity also plays a central role in quantum mechanics, where observables such as position and momentum are represented by operators, much like the rotation operators in our previous example, and the possible observed states are eigenvalues of those operators. If we let X and P denote the position and momentum operators, the

Non-commutativity also plays a central role in quantum mechanics, where observables such as position and momentum are represented by operators, much like the rotation operators in our previous example, and the possible observed states are eigenvalues of those operators. If we let X and P denote the position and momentum operators, the application of one of these operators to the state vector of a given system results in a new state vector with specific probabilities. This represents a measurement of the respective observable. The effect of a position measurement followed by a momentum measurement can be represented by the combined operator XP, and likewise the effect of a momentum measurement followed by a position measurement can be represented by PX. Again we find that the commutative property does not generally hold. If two observables are compatible, such as the X position and the Y position of a particle, then the operators commute, which means we have XY − YX = 0. However, if two observables are not compatible, such as position and momentum, their operators do not commute. This leads to the important relation
    XP - PX = i\hbar
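This relation can be checked symbolically in the usual coordinate representation, where X acts as multiplication by x and P as −iħ d/dx (a minimal sketch using Python's sympy; the test function f is arbitrary):

    import sympy as sp

    x, hbar = sp.symbols('x hbar', real=True, positive=True)
    f = sp.Function('f')(x)                   # arbitrary test function

    Pf = -sp.I * hbar * sp.diff(f, x)         # momentum operator applied to f
    XPf = x * Pf                              # X applied after P
    PXf = -sp.I * hbar * sp.diff(x * f, x)    # P applied after X

    print(sp.simplify(XPf - PXf))             # I*hbar*f(x), i.e., (XP - PX)f = i*hbar*f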
This non-commutativity in the measurement of observables implies an inherent limit on the precision to which the values of the incompatible observables can be jointly measured. In general it can be shown that if A and B are the operators associated with the physical quantities a and b, and if Δa and Δb denote the expected root mean squares of the deviations of measured values of a and b from their respective expected values, then
    \Delta a \, \Delta b \;\geq\; \tfrac{1}{2} \left| \langle AB - BA \rangle \right|
This is Heisenberg's uncertainty relation. The commutator of two observable operators is invariably a multiple of Planck's constant, so if Planck's constant were zero, all observables would be compatible, i.e., their operators would commute, just as do all classical operators. We might say (with some poetic license) that Planck's constant is a measure of the "curvature" of the manifold of observation. This "curvature" applies only to incompatible observables, although the term "incompatible" is somewhat misleading, because it actually signifies that two observables A, B are conjugates, i.e., transformable into each other by the conjugacy relation A = UBU⁻¹ where U is a unitary operator (analogous to a simple rotation operator).
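A concrete numerical instance of the inequality can be exhibited with finite-dimensional operators, using the spin observables σx and σy (whose commutator is 2iσz) in place of position and momentum; a sketch in Python:

    import numpy as np

    # Pauli matrices as the observables A and B, evaluated in the
    # sigma_z "spin up" eigenstate.
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    psi = np.array([1, 0], dtype=complex)

    def expect(op):
        return (psi.conj() @ op @ psi).real

    da = np.sqrt(expect(sx @ sx) - expect(sx)**2)   # rms deviation of sigma_x
    db = np.sqrt(expect(sy @ sy) - expect(sy)**2)   # rms deviation of sigma_y
    comm = sx @ sy - sy @ sx                        # equals 2i sigma_z

    print(da * db)                                  # 1.0
    print(abs(psi.conj() @ comm @ psi) / 2)         # 1.0, the bound is met with equality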

4.5 Conventional Wisdom

This, however, is thought to be a mere strain upon the text, for the words are these: 'That all true believers break their eggs at the convenient end', and which end is the convenient end, seems, in my humble opinion, to be left to every man's conscience…
Jonathan Swift, 1726

It is a matter of empirical fact that the speed of light is invariant in terms of inertial coordinates, and yet the invariance of the speed of light is often said to be a matter of convention - as indeed it is. The empirical fact refers to the speed of light in terms of
inertial coordinates, but the decision to define speeds in terms of inertial coordinates is conventional. It's trivial to define systems of space and time coordinates in terms of which the speed of light is not invariant, but we ordinarily choose to describe events in terms of inertial coordinates, partly because of the invariance of light speed based on those coordinates. Of course, this invariance would be tautological if inertial coordinate systems were simply defined as the systems in terms of which the speed of light is invariant. However, as discussed in Section 1.3, the class of inertial coordinate systems is actually defined in purely mechanical terms, without reference to the propagation of light. They are the coordinate systems in terms of which mechanical inertia is homogeneous and isotropic (which are the necessary and sufficient conditions for Newton's three laws of motion to be valid, at least quasi-statically). The empirical invariance of light speed with respect to this class of coordinate systems is a non-trivial empirical fact, but nothing requires us to define "velocity" in terms of inertial coordinate systems. Such systems cannot claim to have any a priori status as the "true" class of coordinates. Despite the undeniable success of the principle of inertia as a basis for organizing our understanding of the processes of nature, it is nevertheless a convention. The conventionalist view can be traced back to Poincare, who wrote in "The Measure of Time" in 1898

... we have no direct intuition about the equality of two time intervals. The simultaneity of two events or the order of their succession, as well as the equality of two time intervals, must be defined in such a way that the statements of the natural laws be as simple as possible.

In the same paper, Poincare described the use of light rays, together with the convention that the speed of light is invariant and the same in all directions, to give an operational meaning to the concept of simultaneity. In his book "Science and Hypothesis" (1902) he summarized his view of time by saying

There is no absolute time. When we say that two periods are equal, the statement has no meaning, and can only acquire a meaning by a convention.

Poincare's views had a strong influence on the young Einstein, who avidly read "Science and Hypothesis" with his friends in the self-styled "Olympia Academy". Solovine remembered that this book "profoundly impressed us, and left us breathless for weeks on end". Indeed we find in Einstein's 1905 paper on special relativity the statement

We have not defined a common time for A and B, for the latter cannot be defined at all unless we establish by definition that the time required by light to travel from A to B equals the time it requires to travel from B to A.

In a later popular exposition, Einstein tried to make the meaning of this definition more clear by saying

That light requires the same time to traverse the path A to M (the midpoint of AB)
as for the path B to M is in reality neither a supposition nor a hypothesis about the physical nature of light, but a stipulation which I can make of my own free will in order to arrive at a definition of simultaneity.

Of course, this concept of simultaneity is also embodied in Einstein's second "principle", which asserts the invariance of light speed. Throughout the writings of Poincare, Einstein, and others, we see the invariance of the speed of light referred to as a convention, a definition, a stipulation, a free choice, an assumption, a postulate, and a principle... as well as an empirical fact. There is no conflict between these characterizations, because the convention (definition, stipulation, free choice, principle) that Poincare and Einstein were referring to is nothing other than the decision to use inertial coordinate systems, and once this decision has been made, the invariance of light speed is an empirical fact. As Poincare said in 1898, we naturally choose our coordinate systems "in such a way that the statements of the natural laws are as simple as possible", and this almost invariably means inertial coordinates.

It was the great achievement of Galileo, Descartes, Huygens, and Newton to identify the principle of inertia as the basis for resolving and coordinating physical phenomena. Unfortunately this insight is often disguised by the manner in which it is traditionally presented. The beginning physics student is typically expected to accept uncritically an intuitive notion of "uniformly moving" time and space coordinate systems, and is then told that Newton's laws of motion happen to be true with respect to those "inertial" systems. It is more meaningful to say that we define inertial coordinate systems as those systems in terms of which Newton's laws of motion are valid. We naturally coordinate events and organize our perceptions in such a way as to maximize symmetry, and for the motion of material objects the most important symmetries are the isotropy of inertia, the conservation of momentum, the law of equal action and re-action, and so on. Newtonian physics is organized entirely upon the principle of inertia, and the basic underlying hypothesis is that for any object in any state of motion there exists a system of coordinates in terms of which the object is instantaneously at rest and inertia is homogeneous and isotropic (implying that Newton's laws of motion are at least quasi-statically valid). The empirical validity of this remarkable hypothesis accounts for all the tremendous success of Newtonian physics. As discussed in Section 1.3, the specification of a particular state of motion, combined with the requirement for inertia to be homogeneous and isotropic, completely determines a system of coordinates (up to insignificant scale factors, rotations, etc.), and such a system is called an inertial system of coordinates. Such coordinate systems can be established unambiguously by purely mechanical means (neglecting the equivalence principle and associated complications in the presence of gravity). The assumption of inertial isotropy with respect to a given state of motion suffices to establish the loci of inertial simultaneity for that state of motion. Poincare and Einstein rightly noted the conventionality of this simultaneity definition because they were not pre-supposing the choice of inertial simultaneity. In other words, we are not required to use inertial coordinates.
We simply choose, of our own free will, to use inertial coordinates - with the corresponding inertial definition of simultaneity - because this renders the statement of physical laws and the descriptions of physical phenomena as simple and perspicuous as possible, by taking advantage of the maximum possible
symmetry. In this regard it's important to remember that inertial coordinates are not entirely characterized by the quality of being unaccelerated, i.e., by the requirement that isolated objects move uniformly in a straight line. It's also necessary to require the unique simultaneity convention that renders mechanical inertia isotropic (the same in all spatial directions), which amounts to the stipulation of equal one-way speeds for the propagation of physically identical actions. These comments are fully applicable to the Newtonian concept of space, time, and inertial reference frames.

Given two objects in relative motion we can define two systems of inertial coordinates in which the respective objects are at rest, and we can orient these coordinates so the relative motion is purely in the x direction. Let t,x and T,X denote these two systems of inertial coordinates. That such coordinates exist is the main physical hypothesis underlying Galilean physics. An auxiliary hypothesis – one that was not always clearly recognized – concerns the relationship between two such systems of inertial coordinates, given that they exist. Galileo assumed that if the coordinates x,t of an event are known, and if the two inertial coordinate systems are the rest frames of objects moving with a relative speed of v, then the coordinates of that event in terms of the other system (with suitable choice of origins) are T = t, X = x − vt. Viewed in the abstract, this is a rather peculiar and asymmetrical assumption, although it is admittedly borne out by experience - at least to the precision of measurement available to Galileo. However, we now know, empirically, that the relation between relatively moving systems of inertial coordinates has the symmetrical form T = (t − vx)/γ and X = (x − vt)/γ, where γ = (1 − v²)^(1/2), when the time and space variables are expressed in the same units such that the constant 3×10^8 meters/second equals unity. It follows that the one-way (not just the two-way) speed of light is invariant and isotropic with respect to any and every system of inertial coordinates. The empirical content of this statement is simply that the propagation of light is isotropic with respect to the same class of coordinate systems in terms of which mechanical inertia is isotropic. This is consistent with the fact that light itself is an inertial phenomenon, e.g., it conveys momentum.

In fact, the inertia of light can be seen as a common thread running through three of the famous papers published by Einstein in 1905. In the paper entitled "On a Heuristic Point of View Concerning the Production and Transformation of Light" Einstein advocated a conception of light as tiny quanta of energy and momentum, somewhat reminiscent of Newton's inertial corpuscles of light. It's clear that Einstein already understood that the conception of light as a classical wave is incomplete. In the paper entitled "Does the Inertia of a Body Depend on its Energy Content?" he explicitly advanced the idea of light as an inertial phenomenon, and of course this was suggested by the fundamental ideas of the special theory of relativity presented in the paper "On the Electrodynamics of Moving Bodies".

The Galilean conception of inertial frames assumed that all such frames share a unique foliation of spacetime into "instants". Thus the relation "in the present of" constituted an equivalence relation across all frames of reference. If A is in the present of B, and B is in the present of C, then A is in the present of C.
However, special relativity makes it clear that there are infinitely many distinct loci of inertial simultaneity through any given
event, because inertial simultaneity depends on the velocity of the worldline through the event. The inertial coordinate systems do induce a temporal ordering on events, but only a partial one. (See the discussion of total and partial orderings in Section 1.2.) With respect to any given event we can still partition all the other events of spacetime into distinct causal regions, including "past", "present" and "future", but in addition we have the categories "future null" and "past null", and none of these constitute equivalence classes. For example, it is possible for A to be in the present of B, and B to be in the present of C, and yet for A not to be in the present of C. Being "in the present of" is not a transitive relation.

It could be argued that a total unique temporal ordering of events is a more useful organizing principle than the isotropy of inertia, and so we should adopt a class of coordinate systems that provides a total ordering. We can certainly do this, as Einstein himself described in his 1905 paper

To be sure, we could content ourselves with evaluating the time of events by stationing an observer with a clock at the origin of the coordinates who assigns to an event to be evaluated the corresponding position of the hands of the clock when a light signal from that event reaches him through empty space. However, we know from experience that such a coordination has the drawback of not being independent of the position of the observer with the clock.

The point of this "drawback" is that there is no physically distinguished "origin" on which to base the time coordination of all systems of reference, so from the standpoint of assessing possible causal relations we must still consider the full range of possible "absolute" temporal orderings. This yields the same partial ordering of events as does the set of inertial coordinates, so the "total ordering" that we can achieve by imposing a single temporal foliation on all frames of reference is only formal, and not physically meaningful. Nevertheless, we could make this choice, especially if we regard the total temporal ordering of events as a requirement of intelligibility. This seems to have been the view of Lorentz, who wrote in 1913 about the comparative merits of the traditional Galilean and the new Einsteinian conceptions of time

It depends to a large extent on the way one is accustomed to think whether one is attracted to one or another interpretation. As far as this lecturer is concerned, he finds a certain satisfaction in the older interpretations, according to which... space and time can be sharply separated, and simultaneity without further specification can be spoken of... one may perhaps appeal to our ability of imagining arbitrarily large velocities. In that way one comes very close to the concept of absolute simultaneity.

Of course, the idea of "arbitrarily large velocities" already pre-supposes a concept of absolute simultaneity, so Lorentz's rationale is not especially persuasive, but it expresses the point of view of someone who places great importance on a total temporal ordering, even at the expense of inertial isotropy. Indeed one of Poincare's criticisms of Lorentz's early theory was that it sacrificed Newton's third law of equal action and re-action. (This
can be formally salvaged by assigning the unbalanced forces and momentum to an undetectable ether, but the physical significance of a conservation law that references undetectable elements is questionable.) Oddly enough, even Poincare sometimes expressed the opinion that a total temporal ordering would always be useful enough to outweigh other considerations, and that it would always remain a safe convention.

The approach taken by Lorentz and most others may be summarized by saying that they sacrificed the physical principles of inertial relativity, isotropy, and homogeneity in order to maintain the assumed Galilean composition law. This approach, although technically serviceable, suffers from a certain inherent lack of conviction, because while asserting the ontological reality of anisotropy in all but one (unknown) frame of reference, it unavoidably requires us to disregard that assertion and arbitrarily assume one particular frame as being "the" rest frame. Poincare and Einstein recognized that in our descriptions of events in spacetime in terms of separate space and time coordinates we're free to select our "basis" of decomposition. This is precisely what one does when converting the description of events from one frame to another using Galilean relativity, but, as noted above, the Galilean composition law yields anisotropic results when applied to actual observations. So it appeared (to most people) that we could no longer maintain isotropy and homogeneity in all inertial frames together with the ability to transform descriptions from one frame to another by simply applying the appropriate basis transformation. But Einstein realized this was too pessimistic, and that the new observations were fully consistent with both isotropy in all inertial frames and with simple basis transformations between frames, provided we adjust our assumption about the effective metrical structure of spacetime. In other words, he brilliantly discerned that Lorentz's anisotropic results totally vanish in the context of a different metrical structure.

Even a metrical structure is conventional in a sense, because it relies on our ontological premises. For example, the magnitude of the interval between two events may seem to be one thing but actually be another, due (perhaps) to variations in our means of observation and measurement. However, once we have agreed on the physical significance of inertial coordinate systems, the invariance of the quantity (dt)² − (dx)² − (dy)² − (dz)² also becomes physically significant. This shows the crucial importance of the very first sentence in Section 1 of Einstein's 1905 paper:

Let us take a system of co-ordinates in which the equations of Newtonian mechanics hold good.

Suitably qualified (as noted in Section 1.3), this immediately establishes not only the convention of simultaneity, but also the means of operationally establishing it, and its physical significance. Any observer in any state of inertial motion can throw two identical particles in opposite directions with equal force (i.e., so there is no net disturbance of the observer's state of motion), and the convention that those two particles have the same speed suffices to fully specify an entire system of space and time coordinates, which we call inertial coordinates. It is then an empirical fact - not a definition, convention, assumption, stipulation, or postulate - that the speed of light is
isotropic in terms of inertial coordinates. This obviously doesn't imply that inertial coordinates are "true" in any absolute sense, but the principle of inertia has proven to be immensely powerful for organizing our knowledge of physical events, and for discerning and expressing the apparent chains of causation. If a flash of light emanates from the geometrical midpoint between two spatially separate particles at rest in an inertial frame, the arrival times of the light wave at those two particles are simultaneous in terms of that rest frame's inertial coordinates. Furthermore, we find empirically that all other physical processes are isotropic with respect to those inertial coordinates, e.g., if a sound wave emanates from the midpoint of a uniform steel beam at rest in an inertial frame, the sound reaches the two ends simultaneously in accord with this definition. If we adopt any other convention we introduce anisotropies in our descriptions of physical processes, such as sound in a uniform stationary steel beam propagating more rapidly in one direction than in the other. The isotropy of physical phenomena - including the propagation of light - is strictly a convention, but it was not introduced by special relativity; it is one of the fundamental principles which we use to organize our knowledge, and it leads us to choose inertial coordinates for the description of events. On the other hand, the isotropy of multiple distinct physical phenomena in terms of inertial coordinates is not purely conventional, because those coordinates can be defined in terms of just one of those phenomena. The value of this definition is due to the fact that a wide variety of phenomena are (empirically) isotropic with respect to the same class of coordinate systems. Of course, it could be argued that all these phenomena are, in some sense, "the same". For example, the energy conveyed by electromagnetic waves has momentum, so it is an inertial phenomenon, and therefore it is not surprising that the propagation of such energy is isotropic in terms of inertial coordinates. From this point of view, the value of the definition of inertial coordinates is that it reveals the underlying unity of superficially dissimilar phenomena, e.g., the inertia of energy. This illustrates that our conventions and definitions are not empty, because they represent ways of organizing our knowledge, and the efficiency and clarity of this organization depends on choosing conventions that reflect the unity and symmetries of the phenomena. We could, if we wish, organize our knowledge based on the assumption of a total temporal ordering of events, but then it would be necessary to introduce a whole array of unobservable anisotropic "corrections" to the descriptions of physical phenomena.

As we've seen, the principle of relativity constrains, but does not uniquely determine, the form of the mapping from one system of inertial coordinates to another. In order to fix the observable elements of a spacetime theory with respect to every member of the equivalence class of inertial frames we require one further postulate, such as the invariance of light speed (or the inversion symmetry discussed in Section 1.8). However, we should distinguish between the strong and weak forms of the light-speed invariance postulate.
The strong form asserts that the one-way speed of light is invariant with respect to the natural space-time basis associated with any inertial state of motion, whereas the weak form asserts only that the round-trip speed of light is invariant. To illustrate the different implications of these two different assumptions, consider an experiment of the
type conducted by Michelson and Morley in their efforts to detect a directional variation in the speed of light, due to the motion of the Earth through the aether, to which the absolute speed of light was presumed to be referred. To measure the speed of light along a particular axis they effectively measured the elapsed time at the point of origin for a beam of light to complete a round trip out to a mirror and back. At first we might think that it would be just as easy to measure the one-way speed of light, by simply comparing the time of transmission of a pulse of light from one location to the time of reception at another location, but of course this requires us to have clocks synchronized at two spatially separate locations, whereas it is precisely this synchronization that is at issue. Depending on how we choose to synchronize our separate clocks we can measure a wide range of light speeds. To avoid this ambiguity, we must evaluate the time interval for a transit of light at a single spatial location (in the coordinate system of interest), which requires us to measure a round trip, just as Michelson and Morley did.

Incidentally, it might seem that Roemer's method of estimating the speed of light from the variations in the period between eclipses of Jupiter's moons (see Section 3.3) constituted a one-way measurement. Similarly people sometimes imagine that the one-way speed of light could be discerned by (for example) observing, from the center of a circle, pulses of light emitted uniformly by a light source moving at constant speed around the perimeter of the circle. Such methods are indeed capable of detecting certain kinds of anisotropy, but they cannot detect the anisotropy entailed by Lorentz's ether theory, nor any of the other theories that are observationally indistinguishable from Lorentz's theory (which itself is indistinguishable from special relativity). In any theory of this class, there is an ambiguity in the definition of a "circle" in motion, because circles contract to ellipses in the direction of motion. Likewise there is ambiguity in the definition of "uniformly timed" pulses from a light source moving around the perimeter of a moving circle (ellipse). The combined effect of length contraction and time dilation in a Lorentzian theory is to render the anisotropies unobservable. The empirical indistinguishability between the theories in this class implies that there is no unambiguous definition of "the one-way speed of light". We can measure without ambiguity only the lapses of time for closed-loop paths, and such measurements cannot establish the "open-loop" speed. The ambiguity in the one-way speed remains, because over any closed loop, by definition, the net change in each and every direction is zero. Hence it is possible to consistently interpret all observations based on the assumption of non-isotropic light speed. Admittedly the resulting laws take on a somewhat convoluted appearance, and contain unobservable parameters, but they can't be ruled out empirically.

To illustrate, consider a measurement of the round-trip speed of light, assuming light travels at a constant speed c relative to some absolute medium with respect to which our laboratory is moving with a speed v. Under these assumptions, we would expect a pulse of light to travel with a speed c + v (relative to the lab) in one direction, and c − v in the opposite direction.
So, if we send a beam of light over a distance L out to a mirror in the "c + v" direction, and it bounces back over the same distance in the "c − v" direction, the total elapsed time to complete the round trip of length 2L is
    \Delta t = \frac{L}{c+v} + \frac{L}{c-v} = \frac{2L}{c} \cdot \frac{1}{1 - (v/c)^2}
Therefore, the average round-trip speed relative to the laboratory would be
    \frac{2L}{\Delta t} = c \left( 1 - (v/c)^2 \right)
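For a sense of scale, here is a quick numerical sketch (in Python; the values of v and L are illustrative assumptions, v being roughly the Earth's orbital speed):

    c = 2.998e8      # speed of light, m/sec
    v = 3.0e4        # lab speed through the presumed medium, m/sec
    L = 11.0         # one-way path length, meters (illustrative)

    dt = L / (c + v) + L / (c - v)   # out-and-back travel time
    avg = 2 * L / dt                 # average round-trip speed

    print(avg / c)                   # 0.99999999, a deficit of about (v/c)^2 = 1e-8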
This shows why a round-trip measurement of the speed of light would not be expected to reveal any dependency on the velocity of the laboratory unless the measurement was precise enough to resolve second-order effects in v/c. The ability to detect such small effects was first achieved in the late 19th century with the development of precision interferometry (exploiting the wave-like properties of light). The experiments of Michelson and Morley showed that, despite the movement of the Earth in its orbit around the Sun (to say nothing of the movement of the solar system, and even of the galaxy), there was no (v/c)² term in the round-trip speed of light. In other words, they found that 2L/Δt is always equal to c, at least to the accuracy they could measure, which was more than adequate to rule out a second-order deviation. Thus we have a firm empirical basis for asserting that the round-trip speed of light is independent of the motion of the source. This is the weak form of the invariant light speed postulate, but in his 1905 paper Einstein asserted something stronger, namely, that we should adopt the convention of regarding the one-way speed of light as invariant. This stronger postulate doesn't follow from the results of Michelson and Morley, nor from any other conceivable experiment or observation - but there is also no conceivable observation that could conflict with it. The invariant round-trip speed of light fixes the observable elements of the theory, but it does not uniquely determine the presumed ontological structure, because multiple different interpretations can be made to fit the same set of appearances. The one-way speed of light is necessarily an interpretative element of our experience.

To illustrate the ambiguity, notice that we can ensure a null result for the Michelson and Morley experiment while maintaining non-constant light speed, merely by requiring that the speeds of light v1 and v2 in the two opposite directions of travel (out and back) satisfy the relation
    \frac{L}{v_1} + \frac{L}{v_2} = \frac{2L}{c} \qquad\Longrightarrow\qquad \frac{2}{1/v_1 + 1/v_2} = c
In other words, a linear round-trip measurement of light speed will yield the constant c in every direction provided only that the harmonic mean of the one-way speeds in opposite directions always equals c. This is easily accomplished by defining the one-way velocity v1 as a function of direction arbitrarily for all directions in one hemisphere, and then setting the velocities v2 in the opposite directions as v2 = cv1/(2v1 − c). However, we also wish to cover more complicated round-trips, rather than just back and forth on a single line. To ensure that a circuit of light around an
equilateral triangle with edges of length L yields a round-trip speed of c, the speeds v1, v2, v3 in the three equally spaced directions must satisfy
    \frac{L}{v_1} + \frac{L}{v_2} + \frac{L}{v_3} = \frac{3L}{c} \qquad\Longrightarrow\qquad \frac{3}{1/v_1 + 1/v_2 + 1/v_3} = c
so again we see that the light speeds must have a harmonic mean of c. In general, to ensure that every closed loop of light, regardless of the path, yields the average speed c, it's necessary (and also sufficient) to have light speed v = C(θ) as a function of angle θ in a principal plane such that, for any integer n ≥ 2,
    \sum_{j=0}^{n-1} \frac{1}{C(\theta + 2\pi j/n)} = \frac{n}{c}
In units with c = 1, we need the n terms on the left side to sum to n, so the velocity function must be such that 1/C(θ) = 1 + f(θ) where the function f(θ) satisfies
    \sum_{j=0}^{n-1} f(\theta + 2\pi j/n) = 0
for all θ. The canonical example of such a function is simply f(θ) = k cos(θ) for any constant k. Thus if we postulate that the speed of light varies as a function of the angle of travel θ relative to some primary axis according to the equation
    C(\theta) = \frac{c}{1 + k\cos(\theta)} \qquad\qquad (1)
then we are assured that all closed-loop measurements of the speed of light will yield the constant c, despite the fact that the one-way speed of light is distinctly non-isotropic (for non-zero k). This equation describes an ellipse, and no measurement can disprove the hypothesis that the one-way speed of light actually is (or is not) given by (1). It is, strictly speaking, a matter of convention. If we choose to believe that light has the same speed in all directions, then we assume k = 0, and in order to send a synchronizing signal to two points we would locate ourselves midway between them (i.e., at the location where round trips between ourselves and those two points take the same amount of time). On the other hand, if we choose to believe light travels twice as fast in one direction as in the other, then we would assume k = 1/3, and we would locate ourselves 2/3 of the way between them (i.e., twice as far from one as the other, so round trip times are two to one). The latter case is illustrated in the figure below.
[Figure: synchronizing two clocks under the assumption k = 1/3, with the signal source located two-thirds of the way between them]
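A short numerical sketch (in Python, using an arbitrary illustrative closed polygon) confirms this property of equation (1): the total path length divided by the total travel time comes out to exactly c, even though the one-way speeds vary with direction:

    import numpy as np

    c, k = 1.0, 0.5                         # units with c = 1; any |k| < 1
    pts = np.array([[0.0, 0.0], [3.0, 1.0], [2.0, 4.0], [-1.0, 3.0], [0.0, 0.0]])

    total_len = total_time = 0.0
    for p, q in zip(pts[:-1], pts[1:]):
        d = q - p
        L = np.hypot(d[0], d[1])            # segment length
        theta = np.arctan2(d[1], d[0])      # direction of travel
        speed = c / (1 + k * np.cos(theta)) # one-way speed from equation (1)
        total_len += L
        total_time += L / speed

    print(total_len / total_time)           # 1.0 (i.e., c), up to rounding

The reason this works is that the k·cos(θ) contribution to the travel time of each segment is proportional to the segment's x-displacement, and the net x-displacement around any closed loop vanishes.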
Regardless of what value we assume for k (in the range from −1 to +1), we can synchronize all clocks according to our belief, and everything will be perfectly consistent and coherent. Of course, in any case it's necessary to account consistently for the lapse of time for information to get from one clock to another, but the lapse of time between any two clocks separated by a distance L can be anything we choose in the range from virtually 0 to 2L/c. The only real constraint is that the speed be an elliptical function of the direction angle. The velocity profile given by (1) is simply the polar equation of an ellipse (or ellipsoid, if revolved about the major axis), with the pole at one focus, the semi-latus rectum equal to c, and eccentricity equal to k. This is just the projection of the ellipse given by cutting the light cone with an oblique plane. Interestingly, there are really two light cones that intersect on this plane, and they are the light cones of the two events whose projections are the two foci of the ellipse - for timelike separated events. Recall that all rays emanating from one focus of an ordinary ellipse and reflecting off the ellipse will re-converge on the other focus, and that this kind of ray optics is time-symmetrical. In this context our projective ellipse is the intersection of two null-cones, i.e., it is the locus of all points in spacetime that are null-separated from both of the "foci events". This was to be expected in view of the time-symmetry of Maxwell's equations (not to mention the relativistic Schrodinger equation), as discussed in Section 9.

Our main reason for assuming k = 0 is our preference for symmetry, simplicity, and consistency with inertial isotropy. Within our empirical constraints, k can be interpreted as having any value between −1 and +1, but the principle of sufficient reason suggests that it should not be assigned a non-zero value in the absence of any rational justification. Nevertheless, it remains a convention (albeit a compelling one), but we should be clear about what precisely is – and what is not – conventional. The invariance of lightspeed is a convention, but the invariance of lightspeed in terms of inertial coordinates is an empirical fact, and this empirical fact is not a formal tautology, because inertial coordinates are determined by the mechanical inertia of material objects, independent of
the propagation of light. Recall that Einstein’s 1905 paper states that if a pulse of light is emitted from an unaccelerated clock at time t1, and is reflected off some distant object at time t2, and is received back at the original clock at time t3, then the inertial coordinate synchronization is given by stipulating that
    t_2 = \frac{t_1 + t_3}{2}
Reichenbach noted that the formally viable simultaneity conventions correspond to the assumption
    t_2 = t_1 + \varepsilon \, (t_3 - t_1)
where ε is any constant in the range from 0 to 1. This describes the same class of "elliptical speed" conventions as discussed above, with ε = (k+1)/2 where k ranges from −1 to +1. The corresponding coordinate transformation is a simple time skew, i.e., x′ = x, y′ = y, z′ = z, t′ = t + kx/c.

This describes the essence of the Lorentzian "absolutist" interpretation of special relativity. Beginning with the putative absolute rest frame inertial coordinates x,t, Lorentz associates with each state of motion v a system of coordinates x′,t′ related to x,t by a Galilean transformation with parameter v. In other words, x′ = x − vt and t′ = t. He then re-scales the x′,t′ coordinates to account for what he regards as the physical contraction of the lengths of stable objects and the slowing of the durations of stable physical processes, to arrive at the coordinates x″ = x′/γ and t″ = t′γ, where γ = (1 − v²/c²)^(1/2). These he regards as the proper rest frame coordinates for objects moving with speed v in terms of the absolute frame. There is nothing logically unacceptable about these coordinate systems, but we must realize that they do not constitute inertial coordinate systems in the full sense. Mechanical inertia and the speed of light are not isotropic in terms of such coordinates, precisely because the time foliation (i.e., the simultaneity convention) is skewed relative to the ε = 1/2 convention. If we begin with the inertial rest frame coordinates for the state of motion v (which Lorentz and Einstein agree are related to the putative absolute rest frame coordinates by a Lorentz transformation), and then apply the time skew transformation with parameter k = −v/c, we arrive at these Lorentzian rest frame coordinates. Needless to say, our choice of coordinate systems does not affect the outcome of any physical measurement, except that the outcome will be expressed in different terms. For example, by the Einsteinian convention the speed of light is isotropic in terms of the rest frame coordinates of any material object, whereas by the Lorentzian convention it is not. This difference is simply due to different definitions of "rest frame coordinates". If we specify inertial coordinate systems (i.e., coordinates in terms of which inertia is isotropic and Newton's laws are quasi-statically valid) then there is no ambiguity, and both Lorentz and Einstein agree that the speed of light is isotropic in terms of all inertial coordinate systems.
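The effect of the time skew on the one-way speed of light is easy to exhibit numerically (a sketch in units with c = 1): a light path x = t has coordinate speed 1/(1 + k) in terms of the skewed time t′ = t + kx, the opposite direction has speed 1/(1 − k), and the round-trip (harmonic mean) speed remains exactly 1, in agreement with equation (1) evaluated at θ = 0 and θ = π.

    k = -0.6                     # skew parameter; any value with |k| < 1
    v_plus = 1.0 / (1.0 + k)     # one-way coordinate speed of light, +x direction
    v_minus = 1.0 / (1.0 - k)    # one-way coordinate speed, -x direction

    print(v_plus, v_minus)                        # 2.5 and 0.625: anisotropic one-way speeds
    print(2.0 / (1.0 / v_plus + 1.0 / v_minus))   # harmonic mean is exactly 1.0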

In subsequent sections we'll see that the standard formalism of general relativity provides a convenient means of expressing the relations between spacetime events with respect to a larger class of coordinate systems, so it may appear that inertial references are less significant in the general theory. In fact, Einstein once hoped that the general theory would not rely on the principle of inertia as a primitive element. However, this hope was not fulfilled, and the underlying physical basis of the spacetime manifold in general relativity remains the set of primitive inertial paths (geodesics) through spacetime. Not only do these inertial paths determine the equivalence class of allowable coordinate systems (up to diffeomorphism), it even remains true that at each event we can construct a (local) system of inertial coordinates with respect to which the speed of light is c in all directions. Thus the empirical fact of lightspeed invariance and isotropy with respect to inertial coordinates remains as a primitive component of the theory. The difference is that in the general theory the convention of using inertial coordinates is less prevalent, because in general there is no single global inertial coordinate system, and non-inertial coordinate systems are often more convenient on a curved manifold.

4.6 The Field of All Fields

Classes and concepts may be conceived as real objects, existing independently of our definitions and constructions. It seems to me that the assumption of such objects is quite as legitimate as the assumption of physical bodies, and there is quite as much reason to believe in their existence.
Kurt Gödel, 1944

Where is the boundary between the special and general theories of relativity? It is sometimes said that any invocation of "general covariance" implies general relativity, but just about any theory can be expressed in a generally covariant form, so this doesn't even distinguish between general relativity and Newtonian physics, let alone special relativity. For example, it's perfectly possible to simply transform the special relativistic solution of a rotating platform into some arbitrary accelerated coordinate system, and although the result is ugly, it is no less (or more) valid than when it was expressed in terms of non-accelerating coordinates, because the transformation from one stipulated set of coordinates to another has no physical content. The key word there is "stipulated", because the real difference between the special and general theories is in what they take for granted.

In a sense, special relativity is analogous to "naive set theory" in mathematics. By this I mean that special relativity is based on certain plausible-sounding premises which actually are quite serviceable for treating a wide class of problems, but which on close examination are susceptible to self-referential antinomies. This is most evident with regard to the assumption of the identifiability of inertial frames. As Einstein remarked, "in the special theory of relativity there is an inherent epistemological defect", namely, that the preferred class of reference frames on which the theory relies is circularly defined. Special relativity asserts that the lapse of proper time between two (timelike-
separated) events is greatest along the inertial worldline connecting those two events - a seemingly interesting and useful assertion - but if we ask which of the infinitely many paths connecting those two events is the "inertial" one, we can only answer that it is the one with the greatest lapse of proper time. If we simply accept this uncritically, and are willing to naively rely on the testimony of accelerometers as unambiguous indicators of "inertia", we have a fairly solid basis on which to do physics, and we can certainly work out correct answers to many questions. However, the epistemological defect was worrisome to Einstein, and caused him (in a remarkably short time) to abandon special relativity and global Lorentz invariance as a suitable conceptual framework for the formulation of physics. The naive reliance on accelerometers as unambiguous indicators of global inertia in the context of special relativity is immediately undermined by the equivalence principle, because we're then required to predicate any application of special relativity on the absence (or at least the negligibility) of irreducible gravitational fields, and this condition is simply not verifiable within special relativity itself, because of the circularity in the principle of inertia. This circularity genuinely troubled Einstein, and was one of the major motivations (along with the problem of reconciling mass-energy equivalence with the equivalence principle) that led him to abandon special relativity.

Given the recognized limitations of special relativity, and considering how successfully it was generalized and extended in 1916, we may wonder why it's even necessary to continue carrying along the special theory as a conceptually distinct entity. Will this duality persist indefinitely, or will we eventually just say there is a single theory of relativity (the theory traditionally called general relativity), which subsumes and extends the earlier theory called special relativity? The reluctance to discard the special theory as a separate theory may be due largely to the fact that it represents a simple and widely applicable special case of the general theory, and it's convenient to have a name for this limiting case. (There are, however, many cases in which the holistic approach of the general theory is actually much simpler than the traditional special-theory-plus-general-corrections approach.) Another reason that's sometimes mentioned is the (remote) possibility that Einstein's general relativity is not the "right" generalization/extension of the special theory. For example, if observation were ever to conclusively rule out the existence of gravitational waves (which is admittedly hard to imagine in view of the available binary star data), it might be necessary to seek another framework within which to place the special theory. In this sense, we might regard special relativity as roughly analogous to set theory without the axiom of choice, i.e., a restricted and less ambitious theory that avoids making use of potentially suspect concepts or premises. However, it's hard to say exactly which of the fundamental principles of general relativity is considered to be suspect. We've seen that "general covariance" is a property of almost any theory, so that can't be a problem. We might doubt the equivalence principle in one or more of its various flavors, but it happens to be one of the most thoroughly tested principles in physics.
It seems most likely that if general relativity fails, it would be because one or more of its "simplicities" is inappropriate. For example, the restriction to 2nd order, or the assumption of Riemannian metrics rather than, say, Finsler metrics, or
the naive assumption of R⁴ topology, or maybe even the basic assumption of a continuum. Still, each of these would also have conceptual implications for the special theory, so these aren't valid reasons for continuing to regard special relativity as a separate theory.

Suppose we naively superimpose special relativity on Newtonian physics, and adopt a naive definition of "inertial worldline", such as a worldline with no locally sensible acceleration. On that basis we find that there can be multiple distinct "inertial" worldlines connecting two given events (e.g., intersecting elliptical orbits of different eccentricities), which conflicts with the special relativistic principle of a unique inertial interval between any pair of timelike separated events. To press the antinomy analogy further, we could arrange to have special relativity conclude that each of these worldlines has a lesser lapse of proper time than each of the others. (If the barber shaves everyone who doesn't shave himself, who shaves the barber?) Of course, with special relativity (as with set theory) we can easily block such specific conundrums - once they are pointed out - by imposing one or more restrictions on the definition of "inertial" (or the definition of a "set"), and in so doing we make the theory somewhat less naive, but the experience raises legitimate questions about whether we can be sure we have blocked all possible escapes.

We shouldn't push the analogy too far, since there are obvious differences between a purely mathematical theory and a physical theory, the latter being exposed to potential conflict with a much wider class of "external" constraints (such as the requirement to possess a consistent mapping to a representation of experience). However, when considering naive set theory's assumption of the existence of sets, and its assertions about how to manipulate and reason with sets, all in the absence of a comprehensive criterion for identifying what can legitimately be called a set, there is an interesting parallel with special relativity's assumption of the existence of inertial frames and how to reason with them and in them, all in the absence of a comprehensive framework for deciding what does and what does not constitute an inertial frame. It might be argued that relativity is a purely formalistic theory, which simply assumes an inertial frame is specified, without telling how to identify one. Certainly we can completely insulate special relativity from any and all conflict by simply adopting this strategy, i.e., asserting that special relativity avers no mapping at all between its elements and the objects of our experience. However, although this strategy effectively blocks conflict, it also renders the theory quite unfalsifiable and phenomenologically otiose. Even recognizing the distinction between logical inconsistency and empirical falsification, we must also remember that the rules of logic and reason are ultimately grounded in "observations", albeit of a very abstract nature, and mathematical theories no less than physical theories are attempts to formalize "observations". As such, they are comparably subject to upset when they're found to conflict with other observations (e.g., barbers, gravity, etc.).

It might be argued that we cannot really attribute any antinomies to special relativity, because the cases noted above (multiply intersecting elliptical orbits, etc.) arise only from attempting to apply special relativistic reasoning to a class of entities for which it is not
suited. However, the same is true of naive set theory, i.e., it works perfectly well when applied to a wide class of sets, but leads to logically impossible conclusions if we attempt to apply it to a class of sets that "act on themselves"... just as gravity is found to act on itself in the general theory. In a real sense, gravity in general relativity is a self-referential phenomenon, as revealed by the non-linearity of the field equations. Notice that our antinomies in the special theory arise only when trying to reason with "self-referential inertial frames", i.e., in the presence of irreducible gravitational fields. The basic point is that although special relativity serves as the local limiting case of the general theory, it is not able to stand alone, because it cannot identify the applicability of its premises, which renders it incapable of yielding definite macroscopic conclusions about the physical world. By placing all the necessary indefinite qualifiers on the scope of applicability, we effectively remove special relativity from the set of physical theories. This just re-affirms the point that any application of special relativity is, strictly speaking, legitimized only within the context of the general theory, which provides the framework for assessing the validity of the application. One can, of course, still practice the special theory from a naive standpoint, and be quite successful at it, just as one can practice naive set theory without running into trouble very often. Naturally none of this implies that special relativity, by itself, is unfalsifiable. Indeed it is falsifiable, but only when superimposed on some other framework (such as Newtonian physics) and combined with some auxiliary assumptions about how to identify inertial frames. In fact, the special theory of relativity is not only falsifiable, it is falsified, and was superseded in 1916 by a superior and more comprehensive theory. Nevertheless, strict epistemological scruples don't have a great deal of relevance to the actual day-to-day practice of science.

From a more formal standpoint, it's interesting to consider the correspondence between the foundations of set theory and the theories of relativity. The archetypal example of a problematic concept in naive set theory was the notion of the "set of all sets". It soon became apparent to Cantor, Russell, and other mathematicians that this plausible-sounding notion could not consistently be treated as a set in the usual sense. The problem was recognized to be the self-referential nature of the concept. We can compare this to the general theory of relativity, which is compelled by the equivalence principle to represent the metric of spacetime as (so to speak) "the field of all fields". To make this more precise, recall that Newtonian gravity can be represented by a scalar field φ defined over a pre-existing metrical space, whose metric we may denote as g. The vacuum field equation is Lg(φ) = 0, where Lg signifies the Laplacian operator over the space with the fixed metric g. In general relativity the Laplacian is replaced by a more complicated operator Rg which, like the Laplacian, is effectively a differential operator whose components are evaluated on the spacetime with the metric g. However, in general relativity the field on which Rg operates is nothing but the spacetime metric g itself. In other words, the vacuum field equations are Rg(g) = 0. The entity Rg(g) is called the Ricci tensor in differential geometry, usually denoted in covariant form as Rµν.
This highlights the essentially self-referential nature of the Einstein field equations, as opposed to the Newtonian field equations where the operator and the field being operated on are completely independent entities. It's interesting to compare this situation to
schematic representations of Gödel's formalization of arithmetic, leading to his proof of the Incompleteness Theorem. Given a well-defined mapping between single-variable propositional statements and the natural numbers (which Gödel showed is possible, though far from trivial), let Pn(w) denote the nth statement applied to the variable w. Since every possible proposition maps to some natural number, there is a natural number k such that Pk(w) represents the proposition that Pw(w) has no proof. But then what happens if we set the variable w equal to k? We see that Pk(k) represents the proposition that there is no proof of Pk(k), from which it follows that if there is no proof of Pk(k) then Pk(k) is true, whereas if there is a proof of Pk(k) then Pk(k) is false. Hence, assuming our system of arithmetic is self-consistent, so that it doesn't contain proofs of false propositions, we must conclude that Pk(k) is true but unprovable. Obviously the negation of Pk(k) must also be unprovable, assuming our arithmetic is consistent, so the proposition is strictly undecidable within the formal system encoded by our numbering scheme. The analogy between Gödel propositions Pk(k) and the field equations of general relativity Rg(g) = 0 should not be pressed too far, but it does hint at the real and profound subtleties that can arise when we allow self-referential statements.

It's interesting that Einstein seems to have been mindful very early of the eventual necessity of such statements, although he deferred it for quite some time. Prior to 1905 many physicists were attempting to construct a purely electromagnetic theory of matter based on Maxwell's equations, according to which "the particle would be merely a domain containing an especially high density of field energy". However, in presenting the special theory of relativity Einstein carefully avoided proposing any particular theory as to the ultimate structure of matter, and showed that a purely kinematical interpretation could account for the relation between energy and inertia. He took this approach not because he was uninterested in the nature of matter, but because he recognized immediately that Maxwell's equations did not permit the derivation of the equilibrium of the electricity that constitutes a particle. Only different, nonlinear field equations could possibly accomplish such a thing. But no method existed for discovering such field equations without deteriorating into adventurous arbitrariness. So in 1905 Einstein took the more conservative route and merely(!) redefined the traditional concepts of time and space. A few years later he himself embarked on an adventure leading ultimately in 1915 to the non-linear field equations of general relativity, but even in this he managed to make important progress by sidestepping again the question of the ultimate constituency of matter and light. As he recalled in his Autobiographical Notes

It seemed hopeless to me at that time to venture the attempt of representing the total field [as opposed to the pure gravitational field] and to ascertain field laws for it. I preferred, therefore, to set up a preliminary formal frame for the representation of the entire physical reality; this was necessary in order to be able to investigate, at least preliminarily, the effectiveness of the basic idea of general relativity.

In his later years it seems Einstein had decided he had made all the progress that could be made on this preliminary basis, and set about the attempt to represent the total field. He wrote the above comments in 1949, after a quarter-century of fruitless efforts to discover the non-linear equations for the "total field", including electromagnetism and matter, so he knew only too well the risks of deteriorating into adventurous arbitrariness.

4.7 The Inertia of Twins

We have no direct intuition of simultaneity, nor of the equality of two durations. People who believe they possess this intuition are dupes of an illusion... The simultaneity of two events, the order of their succession, and the equality of two durations, are to be so defined that the enunciation of the natural laws may be as simple as possible.
Poincare, The Value of Science, 1905

The most commonly discussed "paradox" associated with the theory of relativity concerns the differing lapses of proper time along two different paths between two fixed events. This is often expressed in terms of a pair of twins, one moving inertially from event A to event B, and the other moving inertially from event A to an intermediate event M, where he changes his state of motion, and then moves inertially from M to B, where it is found that the total elapsed time of the first twin exceeds that of the second. Much of the popular confusion over this sequence of events is simply due to specious reasoning. For example, if x,t and x′,t′ denote inertial rest frame coordinates respectively of the first and second twin (on either the outbound or inbound leg of his journey), some people are confused by the elementary fact that if those two coordinate systems are related according to the Lorentz transformation, then the partial derivatives (∂t′/∂t)x and (∂t/∂t′)x′ (where the subscript denotes the variable held constant) both have the same value. (For example, the unfortunate Herbert Dingle spent his retirement years on a pitiful crusade to convince the scientific community that those two partial derivatives must be the reciprocals of each other, and that therefore special relativity is logically inconsistent.) Other people struggle with the equally elementary algebraic fact that the proper time along any given path between two events is invariant under arbitrary Lorentz transformations. The inability to grasp this has actually led some eccentrics to waste years in a futile effort to prove special relativity inconsistent by finding a Lorentz transformation that does not leave the proper time along some path invariant.

Despite the obvious fallacies underlying these popular confusions, and despite the manifest logical consistency of special relativity, it is nevertheless true that the so-called twins paradox, interpreted in a more profound sense, does highlight a fundamental epistemological shortcoming of the principle of inertia, on which both Newtonian mechanics and special relativity are based.
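The elementary arithmetic of the asymmetry itself is worth fixing in mind before turning to the deeper issue. A minimal sketch (in Python, in units with c = 1; the event coordinates are illustrative): the first twin goes directly from A = (t,x) = (0,0) to B = (10,0), while the second goes out to M = (5,3) and back, i.e., at speed 0.6 on each leg.

    import numpy as np

    def proper_time(events):
        """Sum the timelike intervals sqrt(dt^2 - dx^2) over successive legs."""
        tau = 0.0
        for (t1, x1), (t2, x2) in zip(events[:-1], events[1:]):
            tau += np.sqrt((t2 - t1)**2 - (x2 - x1)**2)
        return tau

    print(proper_time([(0, 0), (10, 0)]))           # 10.0 for the first twin
    print(proper_time([(0, 0), (5, 3), (10, 0)]))   # 8.0 for the traveler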
very question that motivates the paradox (in its more profound form), namely, how are inertial worldlines distinguished from the set of all possible worldlines? In a sense, the only answer special relativity can give is that the inertial worldline between two events is the one with the greatest lapse of proper time, which is clearly of no help in resolving which of the twins' worldlines is "inertial", because we don't know a priori which twin has the greater lapse of proper time - that's what we're trying to determine! This circularity in the definition of inertia and the inability to justify the privileged position held by inertial worldlines in special relativity were among the problems that led Einstein in the years following 1905 to seek a broader and more coherent context for the laws of physics.

The same kind of circular reasoning arises whenever we critically examine the concept of inertia. For example, when trying to decide if our region of spacetime is really flat, so that "straight lines" exist, we face the same difficulty. As Einstein said:

The weakness of the principle of inertia lies in this, that it involves an argument in a circle: a mass moves without acceleration if it is sufficiently far from other bodies; we know that it is sufficiently far from other bodies only by the fact that it moves without acceleration.

We could equally well substitute [has the greatest lapse of proper time] for [is sufficiently far from other bodies]. In either case the point is the same: special relativity postulates the existence of inertial frames and assigns to them a preferred role, but it gives no a priori way of establishing the correct mapping between this concept and anything in reality. This is what Einstein was referring to when he said "In classical mechanics, and no less in the special theory of relativity, there is an inherent epistemological defect...". He illustrates this with a famous thought experiment involving two relatively spinning globes, discussed in Section 4.1. (The term "thought experiment" might be regarded as an oxymoron, since the epistemological significance of an experiment is its empirical quality, which a thought experiment obviously doesn't possess. Nevertheless, it's undeniable that scientists have made good use of this technique - along with occasionally making bad use of it.)

The puzzling asymmetry of the spinning globes is essentially just another form of the twins paradox, where the twins separate and re-converge (one accelerates away and back while the other remains stationary), and they end up with asymmetric lapses of proper time. How can the asymmetry be explained? In 1916 Einstein thought that:

The only satisfactory answer must be that the physical system consisting of S1 and S2 reveals within itself no imaginable cause to which the differing behavior of S1 and S2 can be referred. The cause must therefore lie outside the system. We have to take it that the general laws of motion...must be such that the mechanical behavior of S1 and S2 is partly conditioned, in quite essential respects, by distant masses which we have not included in the system under consideration.

It should be noted that the strongly Machian attitude conveyed by this passage was subsequently tempered for Einstein when he realized that in the general theory of
relativity it may be necessary to attribute the "essential conditioning" to boundary conditions rather than distant masses. Nevertheless, this quotation serves to demonstrate how seriously Einstein took the question, which, of course, is as applicable to the twins paradox as it is to the two-globe paradox.

The above “weighty argument from the theory of knowledge” was the first reason cited by Einstein (in 1916) for the need to go beyond special relativity in order to arrive at a suitable conceptual framework. The second reason was the apparent impossibility of doing justice, within the context of special relativity, to the equivalence principle relating gravitation and acceleration. The first of these reasons bears most directly on the twins paradox, although the problem of reconciling acceleration with gravity inevitably enters the picture as well, since we can't avoid the issue of gravitation as soon as we contemplate acceleration, assuming we accept the equivalence principle. From these considerations it’s clear that special relativity could never have been more than a transitional theory, since it was not comprehensive enough to justify its own conclusions.

The question of whether general relativity is required to resolve the twins paradox has long been a subject of spirited debate. On one hand, Einstein wrote a paper in 1918 to explain how the general theory accounts for the asymmetric aging of the twins by means of the “gravitational fields” that appear with respect to accelerated coordinates attached to the traveling twin, and Max Born recounted this analysis in a popular book, concluding that "the clock paradox is due to a false application of the special theory of relativity, namely, to a case in which the methods of the general theory should be applied". On the other hand, many people object vigorously to any suggestion that special relativity is inadequate to satisfactorily resolve the twins paradox. Ultimately the answer depends on what sort of satisfaction is being sought, viz., on whether the paradox is being presented as a challenge to the consistency of special relativity (as is Dingle's fallacy) or to the completeness of special relativity. If we're willing to accept uncritically the existence and identifiability of inertial frames, and their preferred status, and if we are willing to exclude any consideration of gravity or the equivalence principle, then we can reduce the twins paradox to a trivial exercise in special relativity. However, if it is the completeness (rather than the consistency) of special relativity that is at issue, then the naive acceptance of inertial frames is precisely what is being challenged. In this context, we can hardly justify the exclusion of gravitation, considering that the very same metrical field which determines the inertial worldlines also represents the gravitational field.

Notice that the typical statement of the twins paradox does not stipulate how the galaxies in the universe along with the cosmological boundary conditions that determine the metrical field are dynamically configured relative to the twins. If every galaxy in the universe were “moving” in tandem with the "traveling twin", which (if either) of the twins' reference frames would be considered inertial? Obviously special relativity is silent on this point, and even general relativity does not give an unequivocal answer.
Weinberg asserts that "inertial frames are determined by the mean cosmic gravitational field, which is in turn determined by the mean mass density of the stars", but the second clause is not necessarily true, because the field equations generally require some additional information (such as boundary conditions) in order to yield definite results.

The existence of cosmological models in which the average matter of the universe rotates (a fact proven by Kurt Gödel) shows that even general relativity is incomplete, in the sense that it is subject to global conditions with considerable freedom. General relativity may not even give a unique field for a given (non-spherically symmetric) set of boundary conditions and mass distribution, which is not surprising in view of the possibility of gravitational waves. Thus even if we sharpen the statement of the twins paradox to specify how the twins are moving relative to the rest of the matter in the universe, the theory of relativity still doesn't enable us to say for sure which twin is inertial.

Furthermore, once we recognize that the inertial and gravitational field are one and the same, the twins paradox becomes even more acute, because we must then acknowledge that within the theory of relativity it's possible to contrive a situation in which two identical clocks in identical local circumstances (i.e., without comparing their positions to any external reference) can nevertheless exhibit different lapses in proper time between two given events. The simplest example is to place the twins in intersecting orbits, one circular and the other highly elliptical. Each twin is in freefall continuously between their periodic meetings, and yet they experience different lapses of proper time. Thus the difference between the twins is not a consequence of local effects; it is a global effect. At any point along those two geodesic paths the local physics is identical, but the paths are embedded differently within the global manifold, and it is the different embedding within the manifold that accounts for the difference in proper length. (The same point can be made by referring to a flat cylindrical spacetime.)

This more general form of the twins paradox compels us to abandon the view that physical phenomena are governed solely by locally sensible influences. (Notice, however, that we are forced to this conclusion not by logical contradiction, but only by our philosophical devotion to the principle of sufficient cause, which requires us to assign like physical causes to like physical effects.) Likewise the identification of gravity with local spacetime curvature is untenable, as shown by the fact that a suitable arrangement of gravitating masses can produce an extended region of flat spacetime in which the metrical field is nevertheless accelerating in the global sense, and we surely would not regard such a region as free of gravitation.

It is fundamentally misguided to exercise such epistemological concerns within the framework of special relativity, because special relativity was always a provisional theory with recognized epistemological shortcomings. As mentioned above, one of Einstein's two main reasons for abandoning special relativity as a suitable framework for physics was the fact that, no less than Newtonian mechanics, special relativity is based on the unjustified and epistemologically problematical assumption of a preferred class of reference frames, precisely the issue raised by the twins paradox. Today the "special theory" exists only (aside from its historical importance) as a convenient set of widely applicable formulas for important limiting cases of the general theory, but the phenomenological justification for those formulas can only be found in the general theory.
This is true even if we posit the absence of gravitational effects, because the question at issue is essentially the origin of inertia, i.e., why one worldline is inertial while another is not, and the answer unavoidably involves the origin and significance of the background
metric, even in the absence of curvature. The special theory never claimed, and was never intended, to address such questions. The general theory attempts to provide a coherent framework within which to answer such questions, but it's not clear whether the attempt is successful. The only context in which general relativity can give (at least arguably) a complete explanation of inertia is a closed, finite, unbounded cosmology, but the observational evidence doesn't (at present) clearly support this hypothesis, and any alternative cosmology requires some principle(s) outside of general relativity to determine the metrical configuration of the universe.

Thus the twins paradox is ultimately about the origin and significance of inertia, and the existence of a definite metrical structure with a preferred class of worldlines (geodesics). In the general theory of relativity, spacetime is not simply the totality of all the relations between material objects. The spacetime metric field is endowed with its own ontological existence, as is clear from the fact that gravity itself is a source of gravity. In a sense, the non-linearity of general relativity is an expression of the ontological existence of spacetime itself. In this context it's not possible to draw the classical distinction between relational and absolute entities, because spatio-temporal relations themselves are active elements of the theory.

We should also mention another common objection to the relativistic treatment of the twins, based not on any empirical disagreement, but on linguistic and metaphysical preferences. It is pointed out that we can, without logical contradiction, posit the existence of a unique, absolute, and true metaphysical time at every location, and we can account for the differences between the elapsed times on clocks that have followed different paths simply by stipulating that the rate of a clock depends on its absolute state of motion (defined relative to, for instance, the local frame in which the presumably global cosmic background radiation is maximally isotropic). Indeed this was essentially the view advocated by Lorentz. However, as discussed at the end of Section 1.5, postulating a metaphysical “truth” along with whatever physical laws are necessary to account for why the observed facts differ from the postulated “truth” is not generally useful, except as a way of artificially reconciling our experience with any particular metaphysical truth that we might select. The relativistic point of view is based on purely local concepts, such as that of an “ideal clock” corrected for all locally sensible conditions, recommended to us by the empirical fact that all observable aspects of local physical phenomena – including the rates of temporal progression – exhibit the same dependence on their state of inertial motion (which is not a locally sensible condition). This is the physical symmetry presented to us, and we are certainly justified in exploiting this symmetry to simplify and clarify the enunciation of physical laws.

4.8 The Breakdown of Simultaneity

I have yielded: Instruct my daughter how she shall persever, that time and place with this deceit so lawful may prove coherent.
William Shakespeare, 1603

We've seen how the operational time convention enables us to define surfaces of simultaneity with respect to any given inertial frame. However, if we try to apply this procedure to a set of accelerating bodies the concept breaks down. The problem is illustrated in the spacetime diagram shown below.

This drawing shows a family of worldlines, each having the identical history of velocity as a function of time relative to the inertial coordinates. By sending light beams back and forth to its neighboring worldlines, an observer following path B can determine that he is equidistant from A and C. Likewise an observer on C is equidistant between B and D, and an observer on D is equidistant from C and E. However, due to the change in velocity of these worldlines, an observer on C cannot conclude that he is equidistant from A and E. This breakdown of the well-defined locus of simultaneity is unavoidable in accelerating systems, because the operational procedure defining simultaneity involves a non-zero lapse of time for spatially separate objects, so the simultaneity relations change during the performance of the procedure. Of course, the greater the distance between objects, the greater the change in velocity (and simultaneity relations) during the performance of a synchronization procedure.

Another illustration of this problem is shown below, where the instantaneous loci of simultaneity of an abruptly accelerated worldline are seen to intersect each other (on the left), so that a given distant event is assigned multiple times of occurrence. Furthermore, events in the region "R" on the right do not properly correspond to any time according to the accelerating worldline's instantaneous inertial time, because at the instant of acceleration his locus of simultaneity jumps abruptly.

Obviously any amount of relative "skew" between the planes of simultaneity for a given worldline will result in interference at some distance, producing non-unique time coordinates. However, if the velocity of our worldline varies continuously (instead of abruptly), then for some limited region the planes of simultaneity will be advancing forward in time faster than they are "tilting" backwards, so over this limited region we can, if we choose, make use of these planes of simultaneity for the time labels of events. This situation is illustrated below.

We can easily determine the approximate limit for unique time labels with this kind of coordinate system by noting that if the velocity changes by amount dv/c during a time interval dt, then the relative slope of the new plane of simultaneity is c/dv, so it intersects with the original plane of simultaneity at a distance dx = (c dt)(c/dv) = c²/(dv/dt). Since a = dv/dt is the acceleration, we can estimate that this accelerating system of coordinates is coherent out to distances on the order of c²/a.
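To get a feel for this scale, here is a rough numeric sketch (the values are mine, chosen only for illustration, not from the text): for an acceleration of one g, the coherence distance c²/a comes out to roughly one light-year.

```python
# Rough numeric illustration of the coherence limit c^2/a for an
# accelerating system of coordinates, evaluated at a = 1 g.
c = 2.998e8            # speed of light, m/s
a = 9.81               # acceleration, m/s^2
d = c**2 / a           # coherence distance, in meters
print(d)               # about 9.2e15 m
print(d / 9.46e15)     # about 0.97, i.e., roughly one light-year
```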

As an example of the use of accelerating coordinate systems and the breakdown of inertial simultaneity, consider a circular Sagnac device as described in Section 2.7. As we've seen, each point on the rim of the rotating disk can be associated with an instantaneously co-moving inertial coordinate system, each with its own surfaces of simultaneity. However, since each point of the disk is accelerating with respect to each other point, there is no coherent simultaneity (in the inertial sense) shared by any two points. If we analytically continue the local simultaneity from one point to the next around the perimeter, the result is an open helical surface as indicated below:

The worldline of a particular point on the rim is shown by the helical curve AB, and the shallower helix represents the analytically continued surface of inertial simultaneity. (It's interesting to compare this construction with Riemann surfaces in complex function analysis.) Of course, we can dispense with the use of local inertial simultaneity to define our constant-t coordinate surfaces, and simply define an arbitrary system of space and time coordinates in terms of which a rotating disk is stationary (for example), but we then must be careful to correctly account for non-inertial aspects of these accelerating coordinates, particularly with regard to the meanings of spatial lengths. The usual intuitive definition of the spatial length of an object (such as the perimeter of the rim) is the absolute length of a locus of inertially simultaneous points of that object, so it depends on the establishment of a slice of "inertial simultaneity" over the entire rim. If we use inertial coordinates this is easy, but if we use non-inertial coordinates (such as those in which the rotating disk is stationary), then no surface of inertial simultaneity coincides with our surfaces of constant time parameter. In fact, this is essentially the definition of non-inertial coordinates. So, we will obviously be unable to define a coherent locus of inertial simultaneity over the whole disk as a surface of constant time parameter when working with non-inertial coordinates. One consequence of this is the fact that the spatial length of a path becomes dependent on the speed of the path. We are accustomed to this for temporal lengths, i.e., the length of time around the rim might be 30 seconds or 2 hours or 1 nanosecond, etc., depending on how fast we are going relative to the disk, how fast the rim is spinning, in which direction it is spinning, and so on. Likewise the spatial length of a path around the rim (in terms of some particular coordinates) depends on the speed of the path. This shouldn't be surprising, because the decomposition of spacetime into separate spatial and temporal
components is not unique, i.e., there are multiple equally self-consistent decompositions. Since this is often a source of confusion, it's worthwhile to describe how this works in detail. Let's first establish inertial cylindrical coordinates in 2+1 spacetime, using polar coordinates (r,θ) for the space (where θ is the angular coordinate), and t for time. The metric in terms of these inertial coordinates is
$$ (d\tau)^2 = (dt)^2 - (dr)^2 - r^2 (d\theta)^2 $$
and for any fixed time t the purely spatial metric is
$$ (ds)^2 = (dr)^2 + r^2 (d\theta)^2 $$
So, to find the "length" of any spacelike curve, such as the perimeter of a spinning disk of radius rd centered at the origin, we simply integrate ds over this curve at the fixed value of t. For a circular disk, r = rd is constant, so dr = 0, and the spatial metric is simply ds = rd dθ, which we integrate from θ = 0 to 2π to give the length 2π rd. Now let's look at this situation in terms of a system of coordinates in which the spinning disk is stationary, i.e., such that a fixed point anywhere on the disk maintains constant spatial coordinates for all values of the temporal coordinate. Taking the most naive and simplistic approach, let's define the new coordinates T,R,α by the relations
$$ T = t, \qquad R = r, \qquad \alpha = \theta - \omega t $$
where ω is a constant, denoting the angular speed of these coordinates with respect to the inertial t,r,θ coordinates. We also have the differentials
$$ dT = dt, \qquad dR = dr, \qquad d\alpha = d\theta - \omega\, dt $$
Substituting these expressions into the metric equation gives
$$ (d\tau)^2 = (1 - \omega^2 R^2)(dT)^2 - 2\omega R^2 (d\alpha)(dT) - (dR)^2 - R^2 (d\alpha)^2 $$
According to these coordinates, a spatial length S must be given by integrating the absolute spacelike differential using the metric along some constant-T surface, i.e., with dT = 0, where the metric is
$$ (dS)^2 = (dR)^2 + R^2 (d\alpha)^2 $$
Again for the perimeter of the disk we get 2π Rd = 2π rd. Notice that our constant-T surfaces are also constant-t surfaces, so this perimeter length agrees with our previous result, and of course it doesn't matter which direction we integrate around the perimeter.

Incidentally, letting v = Rd ω denote the velocity of the rim with respect to the original inertial coordinates, the full spacetime metric for the rim (R = Rd) in terms of the rotating coordinates is
$$ (d\tau)^2 = (1 - v^2)(dT)^2 - 2 v R_d\, (d\alpha)(dT) - R_d^2 (d\alpha)^2 $$
For a point fixed on the rim we have dα = 0, and so
$$ (d\tau)^2 = (1 - v^2)(dT)^2 $$
which confirms that the lapse of proper time for a point fixed on the rim of the rotating disk is
$$ \sqrt{1 - v^2} $$
times the lapse of T (and therefore of t).
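The substitution above is mechanical enough to check by machine. The following is a small sympy sketch (the symbol names are mine, not the text's) that re-derives the rotating-coordinate metric and the rim time dilation:

```python
import sympy as sp

T, R, omega, v = sp.symbols('T R omega v', positive=True)
dT, dR, dalpha = sp.symbols('dT dR dalpha')

# Inertial differentials expressed in terms of the rotating coordinates:
dt, dr, dtheta, r = dT, dR, dalpha + omega*dT, R

# Substitute into the inertial metric dtau^2 = dt^2 - dr^2 - r^2 dtheta^2:
dtau2 = sp.expand(dt**2 - dr**2 - r**2*dtheta**2)
print(dtau2)
# grouped, this is (1 - omega^2 R^2) dT^2 - 2 omega R^2 dalpha dT
#                  - dR^2 - R^2 dalpha^2, as in the metric above

# A point fixed on the rim (dR = 0, dalpha = 0, with omega = v/R):
rim = dtau2.subs({dR: 0, dalpha: 0, omega: v/R})
print(sp.simplify(rim))   # dT**2*(1 - v**2), i.e. dtau = sqrt(1-v^2) dT
```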

Now let's send light beams around the perimeter in opposite directions. For lightlike paths we have dτ = 0, so the path of light must satisfy
$$ (1 - v^2)(dT)^2 - 2 v R_d\, (d\alpha)(dT) - R_d^2 (d\alpha)^2 = 0 $$
The purely spatial component is dS = Rd dα, so we can make this substitution and divide both sides by (dT)² to give
$$ \left(\frac{dS}{dT}\right)^2 + 2v\left(\frac{dS}{dT}\right) - (1 - v^2) = 0 $$
The quantity dS/dT is the "speed of light" in terms of these rotating non-inertial coordinates. Also, from the definitions we have
$$ \frac{dS}{dT} = R_d \frac{d\alpha}{dT} = R_d\left(\frac{d\theta}{dt} - \omega\right) = R_d \frac{d\theta}{dt} - v $$
where dθ/dt is the angular velocity of the light at radius Rd with respect to the inertial coordinates, so it equals ±1/Rd (noting that c = 1 in our units), with the sign depending on whether the light is clockwise or counter-clockwise. Substituting into the previous expression gives
$$ \frac{dS}{dT} = \pm 1 - v $$
Letting C = dS/dT denote the speed of light with respect to these rotating non-inertial coordinates, we therefore have C = ±1 − v, where again the sign depends on the direction of the light relative to the direction of rotation of the disk.

Does this analysis lead to some kind of paradox? It indicates that the non-inertial "speed of light" with respect to these rotating coordinates is not equal to 1, and in fact the ratio of the speeds in the two directions is (1+v)/(1−v), but of course this doesn't conflict with special relativity, because these are not inertial coordinates (due to their rotation). However, suppose we increase Rd and decrease ω in proportion so that the rim speed v remains constant. The above formulas still apply for arbitrarily large Rd and small angular speed ω, and yet the speed ratio remains the same, (1+v)/(1−v). Does this conflict with special relativity in the limit as the radius goes to infinity and the angular speed of the rim goes to zero? Clearly not, since we saw in Section 2.7 that if t1 and t2 denote the travel times for light pulses circling the disk in opposite directions, as measured by a clock at a fixed point on the rim, so that t2/t1 = (1+v)/(1−v), then we have t2/t1 − 1 = ϕ/π, where ϕ is the angular travel of the disk during the transit of light. In other words, the observed ratio of travel times around the rim always differs from 1 by an amount proportional to the angular travel of the disk during the transit of light. Thus the net acceleration (change of velocity) of the rim observer during the measurement remains in constant proportion to the measured anisotropy of the transit times.

However, even without waiting for the light rays to circle the disk and report their anisotropy, don't the above formulas imply that the speeds of light in the two directions are in the ratio of (1+v)/(1−v) instantaneously with respect to our rotating coordinates, and don't the rotating coordinates approach being inertial coordinates as Rd increases while holding v constant? Yes and no. Both sets of coordinates use the same time t = T, but they use different space coordinates, s and S. For the perimeter of the disk we have
$$ dS = R_d\, d\alpha = R_d\,(d\theta - \omega\, dt) = ds\left(1 - \frac{\omega}{W}\right) $$
where W = dθ/dt. Thus the ratio dS/ds of spatial distances along a given "path" depends on the angular speed W of the path. Recall that for a signal travelling at c = 1 (with respect to the inertial coordinates) around the perimeter we have W = ±1/rd, and so
$$ \frac{dS}{ds} = 1 \mp v $$
This is consistent with the velocity ratio
$$ \frac{C}{c} = \frac{dS/dT}{ds/dt} = 1 \mp v $$
This shows that the "spatial distances" around the perimeter are different in the two directions. But we saw earlier that "the spatial distance" was independent of the direction
in which we integrated around the perimeter, even in the rotating coordinate system, so does this indicate an inconsistency? No, because, as noted above, the ratio dS/ds along a given path depends on the speed of the path. We have dS/ds = 1 − ω/W, and for the perimeter of the disk with rim speed v and for a path with speed V, this gives
$$ \frac{dS}{ds} = 1 - \frac{v}{V} $$
If the path is lightlike, we have V = ±1 and so dS/ds = 1 ∓ v, whereas when we considered the purely spatial distance around the perimeter we took the "instantaneous" distance, i.e., we took a spacelike path with V = ∞, in which case dS/ds = 1 in both directions. This explains quantitatively what we mean when we say that we are measuring different things, depending on what spacetime path is having its "spatial length" evaluated. Just as the temporal length of a path around the rim depends on the speed of the path, so too does the spatial length.

By the way, notice that if we integrate the spatial component of a path whose velocity V (relative to the original inertial coordinates) is the same as the rim speed itself, so that V = v, then obviously we will never move with respect to the disk in one direction, so dS = 0 and therefore dS/ds = 0, whereas in the other direction (V = −v) we have dS/ds = 2. Similarly if V = 0 we will never move relative to the original coordinates, i.e., ds = 0 and therefore dS/ds is infinite along such a path.

5.1 Vis Inertiae

It is indeed a matter of great difficulty to discover, and effectively to distinguish, the true motions of particular bodies from the apparent, because the parts of that immovable space in which those motions are performed do by no means come under the observation of our senses. Yet the thing is not altogether desperate…
Isaac Newton, 1687

According to Newtonian mechanics a particle moves without acceleration unless acted upon by a force, in which case the particle undergoes acceleration proportional to the applied force. The acceleration is defined as a vector whose components are the second derivatives of the particle’s space coordinates with respect to the time coordinate, which would seem to imply that the acceleration of a particle – and hence the force to which the particle is subjected – depends on our choice of coordinate systems. Of course, Newton’s law is understood to be applicable only with respect to a special class of coordinate systems, called the inertial coordinate systems, which all give the same acceleration, and hence the same applied force, for any given particle. Thus the restriction to inertial coordinate systems enables us to regard accelerations and the corresponding forces as absolute.

However, even in the context of Newtonian mechanics it is sometimes convenient to set aside the restriction to inertial coordinate systems, and as a result the distinction between physical forces and coordinate-based accelerations becomes ambiguous. For example, consider a particle whose position in space is expressed by the vector
$$ \mathbf{r} = x(t)\,\mathbf{i} + y(t)\,\mathbf{j} + z(t)\,\mathbf{k} $$
where i, j, k are orthogonal unit vectors for a coordinate system with fixed origin, and x(t), y(t), z(t) are scalar functions of the time coordinate t. Obviously if these basis vectors are unchanging, the derivatives of r are simply given by
$$ \frac{d\mathbf{r}}{dt} = \frac{dx}{dt}\mathbf{i} + \frac{dy}{dt}\mathbf{j} + \frac{dz}{dt}\mathbf{k}, \qquad \frac{d^2\mathbf{r}}{dt^2} = \frac{d^2x}{dt^2}\mathbf{i} + \frac{d^2y}{dt^2}\mathbf{j} + \frac{d^2z}{dt^2}\mathbf{k} $$
but if the basis vectors may be changing with time (due to rotation of the coordinate axes) the first derivative of r by the chain rule is
$$ \frac{d\mathbf{r}}{dt} = \left(\frac{dx}{dt}\mathbf{i} + \frac{dy}{dt}\mathbf{j} + \frac{dz}{dt}\mathbf{k}\right) + \left(x\frac{d\mathbf{i}}{dt} + y\frac{d\mathbf{j}}{dt} + z\frac{d\mathbf{k}}{dt}\right) $$
The quantity in the first parentheses is the partial derivative of r with respect to t at constant basis vectors i, j, k, so we denote it as ∂r/∂t. The quantity in the second parentheses is the partial derivative of r with respect to t at constant x, y, z, which means it represents the differential change in r due just to the rotation of the axes. This change is perpendicular to both r and the angular velocity vector ω, and its magnitude is ωr times the sine of the angle between ω and r, as indicated in the figure below.

Therefore, the total derivative of r with respect to t can be written as
$$ \frac{d\mathbf{r}}{dt} = \frac{\partial \mathbf{r}}{\partial t} + \boldsymbol{\omega} \times \mathbf{r} $$
Notice that this applies to any vector (compare with equation 4b in Appendix 4, noting that the angular velocity serves here as the “Christoffel symbol”), so we can immediately differentiate again with respect to t, giving the total acceleration
$$ \frac{d^2\mathbf{r}}{dt^2} = \frac{\partial}{\partial t}\left(\frac{\partial \mathbf{r}}{\partial t} + \boldsymbol{\omega}\times\mathbf{r}\right) + \boldsymbol{\omega}\times\left(\frac{\partial \mathbf{r}}{\partial t} + \boldsymbol{\omega}\times\mathbf{r}\right) $$
Noting that the cross product is distributive, and that the chain rule applies to derivatives of cross products, this can be written as
$$ \frac{d^2\mathbf{r}}{dt^2} = \frac{\partial^2 \mathbf{r}}{\partial t^2} + \frac{d\boldsymbol{\omega}}{dt}\times\mathbf{r} + 2\,\boldsymbol{\omega}\times\frac{\partial \mathbf{r}}{\partial t} + \boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r}) $$
This was based on the premise that the origin of the x,y,z coordinates was stationary, but if we stipulate that the origin is at position R(t) with respect to some fully inertial coordinate system, then the particle’s position in terms of these inertial coordinates is R+r and the total acceleration of the particle includes the second derivative of R. Thus Newton’s second law, which equates the net applied force F to the mass times the acceleration (defined in terms of an inertial coordinate system), is
$$ \mathbf{F} = m\left[\frac{d^2\mathbf{R}}{dt^2} + \frac{\partial^2 \mathbf{r}}{\partial t^2} + \frac{d\boldsymbol{\omega}}{dt}\times\mathbf{r} + 2\,\boldsymbol{\omega}\times\frac{\partial \mathbf{r}}{\partial t} + \boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r})\right] $$
If our original xyz coordinate system was inertial, then all the terms on the right hand side except for the second would vanish, and we would have the more familiar-looking expression
$$ \mathbf{F} = m\,\frac{\partial^2 \mathbf{r}}{\partial t^2} $$
Now, if we are determined to organize our experience based on this simple formulation, for any arbitrary choice of coordinate systems, we can do so, but only by introducing new “forces”. We need only bring the other four terms from the right hand side of the previous equation over to the left side, and call them “forces”. Thus we define the net force on the particle to be
$$ \mathbf{F}_{\text{net}} = \mathbf{F} - m\frac{d^2\mathbf{R}}{dt^2} - m\frac{d\boldsymbol{\omega}}{dt}\times\mathbf{r} - 2m\,\boldsymbol{\omega}\times\frac{\partial \mathbf{r}}{\partial t} - m\,\boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r}) $$
The first term on the right side is the net of the “physical forces”, whereas the remaining terms are what we might call “inertial forces”. They are also often called “fictitious forces”. The second term is the linear acceleration force, such as we may imagine is pulling us downward when standing in an elevator that is accelerating upward. The fourth term is called the Coriolis force, and the fifth term is sometimes called the centrifugal
force. (The third term apparently doesn’t have a common name, perhaps because the angular velocity in many practical circumstances is constant.) On this basis the Newtonian equation of motion in terms of an arbitrary Cartesian coordinate system has the simple form
$$ \mathbf{F}_{\text{net}} = m\,\frac{\partial^2 \mathbf{r}}{\partial t^2} $$
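As a concrete illustration, here is a minimal numpy sketch of the "inertial force" terms for a particle described in a rotating frame; the numerical values and variable names are hypothetical, chosen only for illustration:

```python
import numpy as np

m      = 1.0
omega  = np.array([0.0, 0.0, 2.0])   # angular velocity of the frame (rad/s)
domega = np.array([0.0, 0.0, 0.1])   # rate of change of the angular velocity
A0     = np.array([0.0, 0.0, 0.0])   # acceleration of the frame's origin, d2R/dt2
r      = np.array([1.0, 0.0, 0.0])   # particle position in the rotating frame
v_rel  = np.array([0.0, 1.0, 0.0])   # particle velocity in the rotating frame

linear      = -m * A0                               # second term above
third       = -m * np.cross(domega, r)              # the unnamed third term
coriolis    = -2 * m * np.cross(omega, v_rel)       # fourth term (Coriolis)
centrifugal = -m * np.cross(omega, np.cross(omega, r))  # fifth term

print(coriolis)     # [4. 0. 0.] -- depends on the sign of v_rel
print(centrifugal)  # [4. 0. 0.] -- always directed away from the rotation axis
```

Note how the Coriolis term reverses sign when v_rel reverses, which is the fact exploited in the Sagnac discussion below.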
It’s interesting to consider why we usually don’t adopt this point of view. It certainly gives a simpler general equation of motion, but at the expense of introducing several new “forces”, beyond whatever physical forces we had already identified in F. Our preference for the usual (more complicated or more restrictive) formulation of Newton’s law is due to our desire to associate “physical forces” with some proximate substantial entity. For example, the force of gravity is attributed to the pull of some massive body. The force of the wind is attributed to the impulse of air molecules. And so on. The “inertial forces” can’t be so easily attributed to any proximate entity, so unless we want to pursue the Machian idea of associating them with the changing relations to distant objects in the universe, we are left with a “force” that has no causative substance, so we tend to regard such forces as fictitious. Nevertheless, it’s worth remembering that the distinction between “physical” and “fictitious” forces is to some extent a matter of choice, as is our preference for inertial coordinate systems to measure time and space.

To illustrate some of the consequences of these ideas, recall that the Sagnac effect was described in Section 2.7 from the standpoint of various systems of inertial coordinates, and in Section 4.8 in terms of certain non-inertial coordinate systems, but in all these cases the analyses were based on the premise that the “true” measures of time and space were based on inertial coordinate systems. We can now examine some aspects of a Sagnac device from a more general standpoint of arbitrary curvilinear coordinates, leading to the idea that the “physical” effects of acceleration can be absorbed into the metrical structure of spacetime itself.

In a square or triangular Sagnac device the light ray going from one mirror to the next in the direction of rotation passes through the interior of the polygon when viewed from a non-rotating frame of reference. This implies that the light ray, traveling in a straight line, diverges from the rim of the Sagnac device and then converges back to the next vertex. On the other hand, if we consider the same passage of light from the standpoint of an observer riding along on the rotating device, the beam of light goes from one end of the straight edge to the other, but since the light beam diverges from the edge and passes through the interior of the polygon, it follows that from the standpoint of the rotating observer the ray of light is emitted from one vertex and curves in toward the center of rotation and then curves back to reach the next mirror. Likewise, the counter-rotating ray travels outside the polygon, so when viewed from the rotating frame it appears to curve outward (away from the center) and then back. So, on a typical segment between two mirrors M1 and M2, when viewed from the rotating frame of reference, the two light rays follow curved paths as shown in the drawing below:

The amount of this "warping" of the light rays depends mainly on the shape of the path and the speed of the rim, so if we have significant warping of light rays with small r, the warping won't be reduced by increasing the radius while holding the mirror speed constant. Any bending of light rays would reveal to an observer that the segment M1 to M2 is not inertial, so if we want to construct a scenario in which an observer sitting on a mirror is "inertial for all practical purposes", we need to make each segment subtend a very small arc of the disk and/or limit the rim speed, as well as restricting our attention to a short enough span of time so that the rotating observer doesn't rotate through an appreciable angle.

One thing that sometimes misleads people when assessing how things look from the perspective of a rim observer is that they believe it is only necessary to consider the centripetal acceleration, v²/R, of each point on the rim, but clearly if our objective is to assess the speed of light with respect to a coordinate system in which an observer at a particular point on the rim is stationary, we must determine the full accelerations of the points on the rim relative to that system of coordinates. This includes the full five-term expression for the acceleration of a moving point relative to an arbitrarily moving coordinate system. On that basis we find that the light rays are subjected to an "acceleration" field whose dominant term has a magnitude in the direction of travel of
$$ \left(\frac{vc}{R}\right)\sin(\theta) $$
where θ is the angular distance from the observer. (Note that this acceleration is defined on the basis of "slow-measuring-rod-transport" lengths around the loop, combined with time intervals corresponding to the rim observer's co-moving inertial frame. Also, note that "vc/R" is characteristic of the Coriolis term, as opposed to v²/R for the centripetal term.) Integrating these accelerations in both directions gives the pseudo-speeds (i.e., the speeds relative to the accelerating coordinates) of the two light beams as a function of position in the acceleration field
$$ V(\theta) = c \mp v\left(1 - \cos(\theta)\right) $$
The average pseudo-speeds of the co- and counter-rotating beams around the loop are therefore c − v and c + v respectively, which gives a constant "anisotropic ratio". However, these speeds differ from c at any particular point only in proportion to the pseudogravitational potential relative to the observer's location. The amplitude of the acceleration field averaging cv/R does indeed go to zero as the radius R increases while holding the rim speed v constant, but the integral of (cv/R)sin(θ) over the entire loop still always gives the speed distribution around the rim noted above, with the maximum anisotropy occurring at the opposite point on the circumference (where the pseudogravitational potential difference relative to the observer is greatest), and this gives the constant "anisotropic ratio". All of this is in perfect accord with the principles of relativity.

Of course, if the problem is treated in terms of inertial coordinates, then acceleration isn't an issue, and the solution is purely kinematical. However, our purpose here is to examine the consequences of re-casting the Sagnac effect into a system of non-inertial coordinates in which an observer sitting on the rim is stationary, which means we need to absorb into the coordinates not only his circular translatory motion but also his rotation. This introduces fictitious forces and acceleration/gravitational fields which must be taken into account. Needless to say, there's no need to go to all this trouble, since the treatment in an inertial frame is completely satisfactory. The only reason for re-casting this in noninertial coordinates is to illustrate how the general relativistic theory accommodates the use of arbitrary coordinates.

Now, it's certainly true that there is no single coherent set of coordinates with respect to which all the points on the disk are fully stationary, where the term "coherent" signifies a single unique locus of inertial simultaneity. We can, however, construct a coherent set of coordinates with respect to which one particular point on the rim is fully stationary, and then use slow-transport methods for assigning spatial distances between any two mirrors, and combine this with the observer's proper time as the basis for defining velocities, accelerations, etc., with respect to the rim observer's accelerating coordinates.

To understand the nature of the pseudo-gravitational fields that exist with respect to these accelerating coordinates, carry out the transformation to the observer's system in two steps. First, construct a non-rotating system of coordinates in which the observer is constantly at the origin. Thus we have absorbed his circular motion but not his rotation into these coordinates. The result is illustrated below, where the disk is regarded as rotating about the "stationary" observer riding on the rim, and the circles represent the disk position at different "times" (relative to these coordinates).

So, at this stage, each point on the disk is twirling around the observer at an angular speed of ω (the same as the speed of the disk in the hub-centered coordinates). If we draw the spirograph traced out by a point moving around the circle at speed c while the circle rotates slightly about the observer with angular speed ω = v/R, we see that the co- and counter-rotating directions have different path lengths, precisely accounting for the difference in travel times. Thus, even with respect to these accelerating coordinates (in which the observer has a fixed position), the Sagnac effect is still due strictly to the difference in path length, which demonstrates that the Sagnac effect is due not just to acceleration in general but specifically to rotation.

Next, we absorb the rotation of the disk into our coordinates, so the disk is no longer twirling around the observer. However, by absorbing the twirl of the disk into the coordinates, we introduce an anisotropic pseudo-gravitational field (relative to the "stationary" observer), for particles or light rays moving around the loop. The fact that the "speed of light" in these coordinates can differ from c is exactly analogous to how the distant stars have enormous speeds with respect to the Earth's rotating coordinates, and that speed is attributed to the enormous pseudo-gravitational potential which exists at those distances with respect to the Earth's coordinates. Similarly, relative to our rim observer, the maximum gravitational potential difference is at the furthest point on the circle, i.e., the point diametrically opposite on the disk, which is also where the greatest anisotropy in the "speed of light" (with respect to these particular non-inertial coordinates) occurs. Thus, to first order with relatively small mirror speeds, the light rays are subjected to an "acceleration" field whose magnitude in the directions of travel is (vc/R)sin(θ) where θ is the angular distance from the observer.

Now, it might seem that we are unable to account for the anisotropic effect of acceleration, on the assumption that all the points on the rim are subject to the same acceleration, so there can be no differential effect for light rays moving in opposite directions around the loop. However, that's not the case, for two reasons. First, the acceleration (with respect to these accelerating coordinates) is not constant, and second it is the Coriolis (not the centripetal) acceleration that produces the dominant effect. The Coriolis acceleration is the cross product of the rotation (pseudo) vector ω with the velocity vector of the object in question, and this has an opposite sense depending on whether the object (or light ray) is moving in the co-rotating or counter-rotating direction. Of course, both directions eventually encounter the same amount of positive and negative
acceleration, but in the opposite order. Thus, they both start out at c, and one experiences an increase in velocity of +v followed by a decrease of -v, whereas the other drops down by -v first and then increases by +v. Thus their accelerations and velocities as functions of angular position are as shown below:

The average speeds of the co- and counter-rotating beams around the loop are therefore c − v and c + v respectively, which gives the constant "anisotropic ratio". Notice that the speeds differ from c only where there is significant pseudo-gravitational potential relative to the observer's location (just as with the distant stars, and of course the relation is reciprocal). The intensity of the acceleration field is on the order of cv/R, which does indeed go to zero as the radius R increases while holding the rim speed v constant, but the integral of (cv/R)sin(θ) over the entire loop still always gives the speed distribution around the rim noted above, with the maximum anisotropy occurring at the opposite point on the circumference (where the pseudo-gravitational potential difference relative to the observer is greatest), and this gives the constant "anisotropic ratio". It's also worth noting that the anisotropic ratio of speeds given by this pseudo-gravitational potential corresponds precisely to the anisotropic distances when the Sagnac device is analyzed with respect to the instantaneously co-moving inertial frame of the rim observer.

5.2 Tensors, Contravariant and Covariant

Ten masts at each make not the altitude which thou hast perpendicularly fell. Thy life’s a miracle. Speak yet again.

Shakespeare

One of the most important relations involving continuous functions of multiple continuous variables (such as coordinates) is the formula for the total differential. In general if we are given a smooth continuous function y = f(x1,x2,...,xn) of n variables, the incremental change dy in the variable y resulting from incremental changes dx1, dx2, ..., dxn in the variables x1, x2, ..., xn is given by
$$ dy = \frac{\partial y}{\partial x^1} dx^1 + \frac{\partial y}{\partial x^2} dx^2 + \cdots + \frac{\partial y}{\partial x^n} dx^n \qquad (1) $$
where y/xi is the partial derivative of y with respect to xi. (The superscripts on x are just indices, not exponents.) The scalar quantity dy is called the total differential of y. This formula just expresses the fact that the total incremental change in y equals the sum of the "sensitivities" of y to the independent variables multiplied by the respective incremental changes in those variables. (See the Appendix for a slightly more rigorous definition.) If we define the vectors
$$ g = \left[\frac{\partial y}{\partial x^1},\ \frac{\partial y}{\partial x^2},\ \ldots,\ \frac{\partial y}{\partial x^n}\right], \qquad d = \left[dx^1,\ dx^2,\ \ldots,\ dx^n\right] $$
then dy equals the scalar (dot) product of these two vectors, i.e., we have dy = g ⋅ d. Regarding the variables x1, x2,..., xn as coordinates on a manifold, the function y = f(x1,x2,...,xn) defines a scalar field on that manifold, g is the gradient of y (often denoted as ∇y), and d is the differential position of x (often denoted as dx), all evaluated about some nominal point [x1,x2,...,xn] on the manifold. The gradient g = ∇y is an example of a covariant tensor, and the differential position d = dx is an example of a contravariant tensor. The difference between these two kinds of tensors is how they transform under a continuous change of coordinates. Suppose we have another system of smooth continuous coordinates X1, X2, ..., Xn defined on the same manifold. Each of these new coordinates can be expressed (in the region around any particular point) as a function of the original coordinates, Xi = Fi(x1, x2, ..., xn), so the total differentials of the new coordinates can be written as
$$ dX^i = \frac{\partial X^i}{\partial x^1} dx^1 + \frac{\partial X^i}{\partial x^2} dx^2 + \cdots + \frac{\partial X^i}{\partial x^n} dx^n $$
Thus, letting D denote the vector [dX1,dX2,...,dXn] we see that the components of D are related to the components of d by the equation
$$ D^i = \sum_{j=1}^{n} \frac{\partial X^i}{\partial x^j}\, d^{\,j} \qquad (2) $$
This is the prototypical transformation rule for a contravariant tensor of the first order. On the other hand, the gradient vector g = y is a covariant tensor, so it doesn't transform in accord with this rule. To find the correct transformation rule for the gradient (and for covariant tensors in general), note that if the system of functions Fi is invertible (which it is if and only if the determinant of the Jacobian is non-zero), then the original coordinates can be expressed as some functions of these new coordinates, xi = fi(X1, X2, ..., Xn) for i = 1, 2, .., n. This enables us to write the total differentials of the original coordinates as
$$ dx^i = \sum_{j=1}^{n} \frac{\partial x^i}{\partial X^j}\, dX^j $$
If we now substitute these expressions for the total coordinate differentials into equation (1) and collect by differentials of the new coordinates, we get
$$ dy = \sum_{j=1}^{n} \left(\sum_{i=1}^{n} \frac{\partial y}{\partial x^i} \frac{\partial x^i}{\partial X^j}\right) dX^j $$
Thus, the components of the gradient g of y with respect to the Xi coordinates are given by the quantities in parentheses. If we let G denote the gradient of y with respect to these new coordinates, we have
$$ G_j = \sum_{i=1}^{n} \frac{\partial x^i}{\partial X^j}\, g_i \qquad (3) $$
This is the prototypical transformation rule for covariant tensors of the first order. Comparing this with the contravariant rule given by (2), we see that they both define the transformed components as linear combinations of the original components, but in the contravariant case the coefficients are the partials of the new coordinates with respect to the old, whereas in the covariant case the coefficients are the partials of the old coordinates with respect to the new.

The key attribute of a tensor is that its representations in different coordinate systems depend only on the relative orientations and scales of the coordinate axes at that point, not on the absolute values of the coordinates. This is why the absolute position vector pointing from the origin to a particular object in space is not a tensor, because the components of its representation depend on the absolute values of the coordinates. In contrast, the coordinate differentials transform based solely on local information.

So far we have discussed only first-order tensors, but we can define tensors of any order. One of the most important examples of a second-order tensor is the metric tensor. Recall that the generalized Pythagorean theorem enables us to express the squared differential distance ds along a path on the spacetime manifold in terms of the corresponding differential components dt, dx, dy, dz as a general quadratic function of those differentials, as follows:
$$
\begin{aligned}
(ds)^2 ={} & g_{00}\,(dt)^2 + g_{01}\,(dt)(dx) + g_{02}\,(dt)(dy) + g_{03}\,(dt)(dz) \\
{}+{} & g_{10}\,(dx)(dt) + g_{11}\,(dx)^2 + g_{12}\,(dx)(dy) + g_{13}\,(dx)(dz) \\
{}+{} & g_{20}\,(dy)(dt) + g_{21}\,(dy)(dx) + g_{22}\,(dy)^2 + g_{23}\,(dy)(dz) \\
{}+{} & g_{30}\,(dz)(dt) + g_{31}\,(dz)(dx) + g_{32}\,(dz)(dy) + g_{33}\,(dz)^2
\end{aligned}
\qquad (4)
$$
Naturally if we set g00 = 1 and g11 = g22 = g33 = −1, and all the other gij coefficients to zero, this reduces to the Minkowski metric. However, a different choice of coordinate systems (or a different intrinsic geometry, which will be discussed in subsequent sections) requires the use of the full formula. To simplify the notation, it's customary to use the indexed variables x0, x1, x2, x3 in place of t, x, y, z respectively. This allows us to express the above metrical relation in abbreviated form as
$$ (ds)^2 = \sum_{\mu=0}^{3} \sum_{\nu=0}^{3} g_{\mu\nu}\, dx^\mu dx^\nu $$
To abbreviate the notation even more, we adopt Einstein's convention of omitting the summation symbols altogether, and simply stipulating that summation from 0 to 3 is implied over any index that appears more than once in a given product. With this convention the above expression is written as
$$ (ds)^2 = g_{\mu\nu}\, dx^\mu dx^\nu \qquad (5) $$
Notice that this formula expresses something about the intrinsic metrical relations of the space, but it does so in terms of a specific coordinate system. If we considered the metrical relations at the same point in terms of a different system of coordinates (such as changing from Cartesian to polar coordinates), the coefficients gµν would be different. Fortunately there is a simple way of converting the gµν from one system of coordinates to another, based on the fact that they describe a purely localistic relation among differential quantities. Suppose we are given the metrical coefficients gµν for the coordinates xα, and we are also given another system of coordinates yα that are defined in terms of the xα by some arbitrary continuous functions
$$ y^\alpha = F^\alpha\!\left(x^0, x^1, x^2, x^3\right), \qquad \alpha = 0, 1, 2, 3 $$
Assuming the Jacobian of this transformation isn't zero, we know that it's invertible, and so we can just as well express the original coordinates as continuous functions (at this
point) of the new coordinates
$$ x^\alpha = f^\alpha\!\left(y^0, y^1, y^2, y^3\right), \qquad \alpha = 0, 1, 2, 3 $$
Now we can evaluate the total derivatives of the original coordinates in terms of the new coordinates. For example, dx0 can be written as
$$ dx^0 = \frac{\partial x^0}{\partial y^0} dy^0 + \frac{\partial x^0}{\partial y^1} dy^1 + \frac{\partial x^0}{\partial y^2} dy^2 + \frac{\partial x^0}{\partial y^3} dy^3 $$
and similarly for the dx1, dx2, and dx3. The product of any two of these differentials, dxµ and dxν, is of the form
$$ dx^\mu dx^\nu = \left(\frac{\partial x^\mu}{\partial y^\alpha}\, dy^\alpha\right)\left(\frac{\partial x^\nu}{\partial y^\beta}\, dy^\beta\right) = \frac{\partial x^\mu}{\partial y^\alpha} \frac{\partial x^\nu}{\partial y^\beta}\, dy^\alpha dy^\beta $$
(remembering the summation convention). Substituting these expressions for the products of x differentials in the metric formula (5) gives
$$ (ds)^2 = g_{\mu\nu}\, \frac{\partial x^\mu}{\partial y^\alpha} \frac{\partial x^\nu}{\partial y^\beta}\, dy^\alpha dy^\beta $$
The first three factors on the right hand side obviously represent the coefficient of dyαdyβ in the metric formula with respect to the y coordinates, so we've shown that the array of metric coefficients transforms from the x to the y coordinate system according to the equation
$$ g'_{\alpha\beta} = \frac{\partial x^\mu}{\partial y^\alpha} \frac{\partial x^\nu}{\partial y^\beta}\, g_{\mu\nu} $$
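As a sanity check of this rule, here is a short sympy sketch (an example of my own choosing, not from the text): starting from the Euclidean plane metric in Cartesian coordinates and transforming to polar coordinates recovers the familiar metric diag(1, r²).

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)

# Old (Cartesian) coordinates expressed as functions of the new (polar) ones:
x = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])
J = x.jacobian(sp.Matrix([r, th]))   # J[m, a] = dx^m / dy^a
g = sp.eye(2)                        # Euclidean metric in Cartesian coordinates

g_new = sp.simplify(J.T * g * J)     # g'_ab = (dx^m/dy^a)(dx^n/dy^b) g_mn
print(g_new)                         # Matrix([[1, 0], [0, r**2]])
```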
Notice that each component of the new metric array is a linear combination of the old metric components, and the coefficients are the partials of the old coordinates with respect to the new. Arrays that transform in this way are called covariant tensors. On the other hand, if we define an array Aµν with the components (dxµ/ds)(dxν/ds) where s denotes a path parameter along some particular curve in space, then equation (2) tells us that this array transforms according to the rule
$$ A'^{\alpha\beta} = \frac{\partial y^\alpha}{\partial x^\mu} \frac{\partial y^\beta}{\partial x^\nu}\, A^{\mu\nu} $$
This is very similar to the previous formula, except that the partial derivatives are of the new coordinates with respect to the old. Arrays whose components transform according to this rule are called contravariant tensors.

When we speak of an array being transformed from one system of coordinates to another, it's clear that the array must have a definite meaning independent of the system of coordinates. We could, for example, have an array of scalar quantities, whose values are the same at a given point, regardless of the coordinate system. However, the components of the array might still be required to change for different systems. For example, suppose the temperature at the point (x,y,z) in a rectangular tank of water is given by the scalar field T(x,y,z), where x,y,z are Cartesian coordinates with origin at the geometric center of the tank. If we change our system of coordinates by moving the origin, say, to one of the corners of the tank, the function T(x,y,z) must change to T(x−x0, y−y0, z−z0). But at a given physical point the value of T is unchanged.

Notice that g20 is the coefficient of (dy)(dt), and g02 is the coefficient of (dt)(dy), so without loss of generality we could combine them into the single term (g20 + g02)(dt)(dy). Thus the individual values of g20 and g02 are arbitrary for a given metrical equation, since all that matters is the sum (g20 + g02). For this reason we're free to specify each of those coefficients as half the sum, which results in g20 = g02. The same obviously applies to all the other diagonally symmetric pairs, so for the sake of definiteness and simplicity we can set gab = gba. It's important to note, however, that this symmetry property doesn't apply to all tensors. In general we have no a priori knowledge of the symmetries (if any) of an arbitrary tensor.

Incidentally, when we refer to a vector (or, more generally, a tensor) as being either contravariant or covariant we're abusing the language slightly, because those terms really just signify two different conventions for interpreting the components of the object with respect to a given coordinate system, whereas the essential attributes of a vector or tensor are independent of the particular coordinate system in which we choose to express it. In general, any given vector or tensor can be expressed in both contravariant and covariant form with respect to any given coordinate system. For example, consider the vector P shown below.

We should note that when dealing with a vector (or tensor) field on a manifold each element of the field exists entirely at a single point of the manifold, with a direction and a magnitude, rather than imagining each vector to actually extend from one point in the manifold to another. (For example, we might have a vector field describing the direction and speed of the wind at each point in a given volume of air.) However, for the purpose of illustrating the relation between contravariant and covariant components, we are focusing on simple displacement vectors in a flat metrical space, which can be considered to extend from one point to another.

Figure 1 shows an arbitrary coordinate system with the axes X1 and X2, and the contravariant and covariant components of the position vector P with respect to these coordinates. As can be seen, the jth contravariant component consists of the projection of P onto the jth axis parallel to the other axis, whereas the jth covariant component consists of the projection of P onto the jth axis perpendicular to that axis. This is the essential distinction (up to scale factors) between the contravariant and covariant ways of expressing a vector or, more generally, a tensor. (It may seem that the naming convention is backwards, because the "contra" components go with the axes, whereas the "co" components go against the axes, but historically these names were given on the basis of the transformation laws that apply to these two different interpretations.)

If the coordinate system is "orthogonal" (meaning that the coordinate axes are mutually perpendicular) then the contravariant and covariant interpretations are identical (up to scale factors). This can be seen by imagining that we make the coordinate axes in Figure 1 perpendicular to each other. Thus when we use orthogonal coordinates we are essentially using both contravariant and covariant coordinates, because in such a context the only difference between them (at any given point) is scale factors. It’s worth noting that "orthogonal" doesn't necessarily imply "rectilinear". For example, polar coordinates are not rectilinear, i.e., the axes are not straight lines, but they are orthogonal, because as we vary the angle we are always moving perpendicular to the local radial axis. Thus the metric of a polar coordinate system is diagonal, just as is the metric of a Cartesian
coordinate system, and so the contravariant and covariant forms at any given point differ only by scale factors (although these scale factors may vary as a function of position). Only when we consider systems of coordinates that are not mutually perpendicular do the contravariant and covariant forms differ (at a given point) by more than just scale factors. To understand in detail how the representations of vectors in different coordinate systems are related to each other, consider the displacement vector P in a flat 2-dimensional space shown below.

In terms of the X coordinate system the contravariant components of P are (x^1, x^2) and the covariant components are (x_1, x_2). We’ve also shown another set of coordinate axes, denoted by Ξ, defined such that Ξ1 is perpendicular to X2, and Ξ2 is perpendicular to X1. In terms of these alternate coordinates the contravariant components of P are (ξ^1, ξ^2) and the covariant components are (ξ_1, ξ_2). The symbol ω signifies the angle between the two positive axes X1, X2, and the symbol ω’ denotes the angle between the axes Ξ1 and Ξ2. These angles satisfy the relations ω + ω’ = π and θ = (ω’ − ω)/2. We also have
   x₁ = cos(θ) ξ¹        x₂ = cos(θ) ξ²        ξ₁ = cos(θ) x¹        ξ₂ = cos(θ) x²
which shows that the covariant components with respect to the X coordinates are the same, up to a scale factor of cos(θ), as the contravariant components with respect to the Ξ coordinates, and vice versa. For this reason the two coordinate systems are called "duals" of each other. Making use of the additional relations

the squared length of P can be expressed in terms of any of these sets of components as follows:
   s² = x₁x¹ + x₂x² = ξ₁ξ¹ + ξ₂ξ² = (x¹)² + 2cos(ω) x¹x² + (x²)² = (ξ¹)² + 2cos(ω′) ξ¹ξ² + (ξ²)²
In general the squared length of an arbitrary vector on a (flat) 2-dimensional surface can be given in terms of the contravariant components by an expression of the form
   s² = g₁₁(x¹)² + g₁₂ x¹x² + g₂₁ x²x¹ + g₂₂(x²)²
where the coefficients g_uv are the components of the covariant metric tensor. This tensor is always symmetrical, meaning that g_uv = g_vu, so there are really only three independent elements for a two-dimensional manifold. With Einstein's summation convention we can express the preceding equation more succinctly as
   s² = g_uv x^u x^v
From the preceding formulas we can see that the covariant metric tensor for the X coordinate system in Figure 2 is
   g_uv =  | 1        cos(ω) |
           | cos(ω)   1      |
whereas for the dual coordinate system Ξ the covariant metric tensor is
   g_uv =  | 1         cos(ω′) |   =   | 1          −cos(ω) |
           | cos(ω′)   1       |       | −cos(ω)    1       |
noting that cos(ω′) = −cos(ω). The determinant g of each of these matrices is sin²(ω), so we can express the relationship between the dual systems of coordinates as
   x_u = √g ξ^u        ξ_u = √g x^u
We will find that the inverse of the metric tensor is also very useful, so let's use the superscripted symbol g^uv to denote the inverse of a given g_uv. The inverse metric tensors for the X and Ξ coordinate systems are
   (for X)  g^uv = (1/sin²ω) |  1         −cos(ω) |        (for Ξ)  g^uv = (1/sin²ω) | 1         cos(ω) |
                             | −cos(ω)     1      |                                  | cos(ω)    1      |
Comparing the left-hand matrix with the previous expression for s2 in terms of the covariant components, we see that
   s² = g^uv x_u x_v
so the inverse of the covariant metric tensor is indeed the contravariant metric tensor. Now let's consider a vector x whose contravariant components relative to the X axes of Figure 2 are x¹, x², and let's multiply this by the covariant metric tensor as follows:
   g_uv x^u = x_v
Remember that summation is implied over the repeated index u, whereas the index v appears only once (in any given product) so this expression applies for any value of v. Thus the expression represents the two equations
   g₁₁ x¹ + g₂₁ x² = x₁        g₁₂ x¹ + g₂₂ x² = x₂
If we carry out this multiplication we find
   x₁ = x¹ + cos(ω) x²        x₂ = cos(ω) x¹ + x²
which agrees with the previously stated relations between the covariant and contravariant components, noting that sin(θ) = cos(ω). If we perform the inverse operation, multiplying these covariant components by the contravariant metric tensor, we recover the original contravariant components, i.e., we have
   g^uv x_v = x^u
Hence we can convert from the contravariant to the covariant versions of a given vector simply by multiplying by the covariant metric tensor, and we can convert back simply by multiplying by the inverse of the metric tensor. These operations are called "raising and lowering of indices", because they convert x from a superscripted to a subscripted variable, or vice versa. In this way we can also create mixed tensors, i.e., tensors that are contravariant in some of their indices and covariant in others. It's worth noting that, since x_v = g_uv x^u, we have
   s² = g_uv x^u x^v = x_v x^v
Many other useful relations can be expressed in this way. For example, the angle θ between two vectors a and b is given by
   cos(θ) = a_u b^u / √( (a_v a^v)(b_w b^w) )
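These operations are easy to check numerically. The following minimal Python sketch (the axis angle and component values are illustrative assumptions, not taken from the text) builds the metric for a pair of oblique unit axes and verifies the index raising and lowering and the angle formula above:

    import numpy as np

    # Oblique plane coordinates whose unit axes meet at an angle w (an
    # illustrative value), so g_uv = [[1, cos w], [cos w, 1]] as derived above.
    w = np.radians(60)
    g = np.array([[1.0, np.cos(w)],
                  [np.cos(w), 1.0]])    # covariant metric tensor g_uv
    g_inv = np.linalg.inv(g)            # contravariant metric tensor g^uv

    x_up = np.array([2.0, 1.0])         # contravariant components x^u (assumed)
    x_dn = g @ x_up                     # lowering the index: x_v = g_uv x^u

    # The squared length computed three equivalent ways:
    print(x_up @ g @ x_up)              # g_uv x^u x^v
    print(x_dn @ g_inv @ x_dn)          # g^uv x_u x_v
    print(x_up @ x_dn)                  # x^u x_u  -- all three agree

    # Angle between two vectors a and b from the formula above:
    a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    cos_t = (a @ g @ b) / np.sqrt((a @ g @ a) * (b @ g @ b))
    print(np.degrees(np.arccos(cos_t))) # recovers the 60 degree axis angle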
These techniques immediately generalize to any number of dimensions, and to tensors with any number of indices, including "mixed tensors" that have some contravariant and some covariant indices. In addition, we need not restrict ourselves to flat spaces or coordinate systems whose metrics are constant (as in the above examples). Of course, if the metric is variable then we can no longer express finite interval lengths in terms of finite component differences. However, the above distance formulas still apply, provided we express them in differential form, i.e., the incremental distance ds along a path is related to the incremental components dx^j according to
   (ds)² = g_uv dx^u dx^v
so we need to integrate this over a given path to determine the length of the path. These are exactly the formulas used in 4-dimensional spacetime to determine the spatial and temporal "distances" between events in general relativity. For any given index we could generalize the idea of contravariance and covariance to include mixtures of these two qualities in a single index. This is not ordinarily done, but it is possible. Recall that the contravariant components are measured parallel to the coordinate axes, and the covariant components are measured normal to all the other axes. These are the two extreme cases, but we could define components with respect to directions that make a fixed angle relative to the coordinate axes and normals. The transformation rule for such representations is more complicated than either (6) or (8), but each component can be resolved into sub-components that are either purely contravariant or purely covariant, so these two extreme cases suffice to express all transformation characteristics of tensors.

5.3 Curvature, Intrinsic and Extrinsic

Thus we are led to a remarkable theorem (Theorema Egregium): If a curved surface is developed upon any other surface whatever, the measure of curvature in each point remains unchanged.
                                                C. F. Gauss, 1827

The extrinsic curvature κ of a plane curve at a given point on the curve is defined as the derivative of the curve's tangent angle with respect to position on the curve at that point. In other words, if θ(s) denotes the angle which the curve makes with some fixed reference axis as a function of the path length s along the curve, then κ = dθ/ds. In terms of orthogonal and naturally scaled coordinates X,Y we have tan(θ) = dY/dX. If the X axis is tangent to the curve at the point in question, then tan(θ) approaches θ and dX approaches ds, so in terms of such tangent normal coordinates the curvature can

equivalently be defined as simply the second derivative, κ = d²Y/dX². One way of specifying a plane curve is by giving a function Y = f(X) where X and Y are naturally scaled orthogonal coordinates. Natural scaling means (ds)² = (dX)² + (dY)², so we have ds/dX = [1 + (dY/dX)²]^(1/2). The curvature can easily be determined by directly evaluating the derivative dθ/ds as follows
   κ = dθ/ds = (d²Y/dX²) / [1 + (dY/dX)²]^(3/2)
Likewise if the curve is specified parametrically by the functions X(t) and Y(t) for some arbitrary path parameter t, we have ds/dt = (X_t² + Y_t²)^(1/2) where subscripts denote derivatives, and the curvature is
   κ = (X_t Y_tt − Y_t X_tt) / (X_t² + Y_t²)^(3/2)
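As a quick numerical check of this formula, the following sketch applies it to an ellipse (an assumed example, not from the text):

    import numpy as np

    # kappa = (X_t Y_tt - Y_t X_tt) / (X_t^2 + Y_t^2)^(3/2), applied to the
    # ellipse X = A cos(t), Y = B sin(t).
    def curvature(Xt, Yt, Xtt, Ytt):
        return (Xt * Ytt - Yt * Xtt) / (Xt**2 + Yt**2) ** 1.5

    A, B = 3.0, 2.0
    t = np.linspace(0.0, 2 * np.pi, 5)
    Xt,  Yt  = -A * np.sin(t),  B * np.cos(t)    # first derivatives
    Xtt, Ytt = -A * np.cos(t), -B * np.sin(t)    # second derivatives
    print(curvature(Xt, Yt, Xtt, Ytt))
    # At t = 0 this gives A/B^2 = 0.75, and setting A = B = R makes every
    # value equal to 1/R, as expected for a circle.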
Although these derivations are quite simple and satisfactory for the case of plane curves, it's worthwhile to examine both of them more closely to clarify the application to higher dimensional cases, where it is more convenient to use the definition of curvature based on the second derivative with respect to tangent coordinates. First, let's return to the case where the plane curve was specified by an explicit function Y = f(X) for naturally scaled orthogonal coordinates X,Y. Expanding this function into a power series (up to second order) about the point of interest, we have constants A,B,C such that Y = AX² + BX + C. The constant C is just a simple displacement, so it's irrelevant to the shape of the curve. Thus we need only consider the curve Y = AX² + BX. If B is non-zero this curve is not tangent to the X axis at the origin. To remedy this we can consider the curve with respect to a rotated system of coordinates x,y, related to the original coordinates by the transformation equations
   X = x cos(α) + y sin(α)        Y = −x sin(α) + y cos(α)
Substituting these expressions for X and Y into the equation Y = AX2 + BX and rearranging terms gives
   y = [ A(x cos(α) + y sin(α))² + (B cos(α) + sin(α)) x ] / ( cos(α) − B sin(α) )
If we select an angle α such that the coefficient of the linear term in the numerator vanishes, i.e., if we set Bcos(α) + sin(α) = 0 by putting α = invtan(-B), then the numerator is purely second order. If we then expand the denominator into a power series in x and y, the product of this series with the numerator yields just a constant times the numerator plus terms of third and higher order in x and y. Hence the non-constant terms in the denominator are insignificant up to second order, so the denominator is effectively just equal to the constant term. Inserting the value of α into the above equation gives
   y = A (x − By)² / (1 + B²)^(3/2)
The curvature κ at the origin is just the second derivative, so we have
   κ = d²y/dx² = 2A / (1 + B²)^(3/2) = f_XX / (1 + f_X²)^(3/2)
where subscripts denote derivatives, and we have used the facts that, for the original function f(X) at the origin, we have f_X = B and f_XX = 2A. This shows how we can arrive (somewhat laboriously) at our previous result by using the "second derivative" definition of curvature and an explicitly defined curve Y = f(X). A plane curve can be expressed parametrically as a function of the path length s by the functions x(s), y(s). Since (ds)² = (dx)² + (dy)², it follows that x_s² + y_s² = 1 (where again subscripts denote derivatives). The vector (x_s, y_s) is tangent to the curve, so the perpendicular vector (−y_s, x_s) is normal to the curve. The vector (x_ss, y_ss) represents the rate of change of the tangent direction of the curve with respect to s. Recall that the curvature κ of a line in the plane is defined as the rate of change of the angle of the curve as a function of distance along the curve, but since tan(θ) approaches θ to the second order as θ goes to zero, we can just as well define curvature as the rate of change of the tangent. Noting that y_s = (1 − x_s²)^(1/2) we have y_ss = −x_s x_ss / (1 − x_s²)^(1/2) and hence y_s y_ss = −x_s x_ss. Thus we have y_ss/x_ss = −x_s/y_s, which means the vector (x_ss, y_ss) is perpendicular to the curve. The magnitude of this vector is |κ| = (x_ss² + y_ss²)^(1/2), and we can define the signed magnitude as the dot product of (x_ss, y_ss) with the vector (−y_s, x_s), whose length is (x_s² + y_s²)^(1/2) = 1. This gives the signed curvature
   κ = x_s y_ss − y_s x_ss
The center of curvature of the curve at the point (x,y) is at the point (x − y_s/|κ|, y + x_s/|κ|). To illustrate, a circle of radius R centered at the origin can be expressed by the parametric equations x(s) = Rcos(s/R) and y(s) = Rsin(s/R), and the first derivatives are x_s = −sin(s/R) and y_s = cos(s/R). The second derivatives are x_ss = −(1/R)cos(s/R) and y_ss = −(1/R)sin(s/R). From this we have the magnitude of the curvature |κ| = 1/R and the signed curvature +1/R. The sign is based on the path direction being positive in the counterclockwise direction. The center of curvature for every point on this curve is the origin

(0,0). The preceding parametric derivation was based on the path length parameter s, but we can also define a curve in terms of an arbitrary parameter t, not necessarily the path length. In this case we have the functions x(t), y(t), and s(t). Again we have (ds)² = (dx)² + (dy)², so the derivatives of these three functions are related by x_t² + y_t² = s_t². We also have x_s = x_t/s_t and y_s = y_t/s_t, and the second derivatives are
   x_ss = (x_tt s_t − x_t s_tt) / s_t³
and the similar expression for y_ss. Substituting into the previous formula for the signed curvature we get
   κ = (x_t y_tt − y_t x_tt) / (x_t² + y_t²)^(3/2)
The techniques described above for determining the curvature of plane curves can be used to determine the sectional curvatures of a two-dimensional surface embedded in three-dimensional space. Notice that the general power series expansion of a curve defined by the function f(x) is f(x) = c₀ + c₁x + c₂x² + c₃x³ + ..., but by choosing coordinates so that the curve passes through the origin tangent to the x axis at the point in question we can arrange to make c₀ = c₁ = 0, so the expansion of the curve about this point can be put into the form f(x) = c₂x² + c₃x³ + ... Also, since the 2nd derivative is f''(x) = 2c₂ + 6c₃x + ..., evaluating this at x = 0 gives simply f''(0) = 2c₂, so it's clear that only the 2nd-order term is significant in determining the curvature with respect to tangent normal coordinates, i.e., it is sufficient to represent the curve as f(x) = ax². Similarly if we consider the extrinsic curvature of a cross-section of a two-dimensional surface in space, we see that at any given point on the surface we can construct an orthogonal "xyz" coordinate system such that the xy plane is tangent to the surface and the z axis is perpendicular to the surface at that point. In general the equation of our surface can be expanded about this point into a polynomial giving the "height" z as a function of x and y. As in the one-dimensional case, the constant and 1st-order terms of this polynomial will be zero (because we defined our coordinates tangent to the surface with the origin at the point in question), and the 3rd and higher order terms don't affect the second derivative at the origin, so we can represent our surface by just the 2nd-order terms of the expansion, i.e.,
   z = ax² + bxy + cy²
The second (partial) derivatives of this function with respect to x and y are 2a and 2c respectively, so these numbers give us some information about the curvature of the surface. However, we'd really like to know the curvature of the surface evaluated in any direction, not just in the x and y directions. (Note that the tangency condition uniquely

determines the direction of the z axis, but the x and y axes can be rotated anywhere in the tangent plane.) In general we can evaluate the curvature of the surface in the direction of the line y = qx for any constant q. The equation of the surface in this direction is simply
   z = ax² + bqx² + cq²x² = (a + bq + cq²) x²
but of course we want to evaluate the derivatives with respect to changes along this direction, rather than changes in the pure x direction. Parametrically the distance along the tangent plane in the y = qx direction is s² = x² + y² = (1 + q²)x², so we can substitute for x² in the preceding equation to give the value of f as a function of the distance s
   f(s) = [ (a + bq + cq²) / (1 + q²) ] s²
The second derivative of this function gives the extrinsic curvature κ(q) of the surface in the "q" direction:
   κ(q) = 2 (a + bq + cq²) / (1 + q²)
Now we might ask what directions give the extreme (min and max) curvatures. Setting the derivative of κ(q) to zero gives the result
   q² − 2mq − 1 = 0
where m = (c − a)/b. Since the constant term of this quadratic is −1 it follows that the product of the two roots of this equation is also −1, which means that each of them is the negative reciprocal of the other, so the lines of min and max curvature are of the form y = qx and y = −(1/q)x, which shows that the two directions are perpendicular. Substituting the two "extreme" values of q into the equation for κ(q) gives (see the Appendix for details) the two "principal curvatures" of the surface
   κ₁, κ₂ = (a + c) ± √( (a − c)² + b² )
The product of these two is called the "Gaussian curvature" of the surface at that point, and is given by
   K = κ₁ κ₂ = 4ac − b²
which of course is just the (negative) discriminant of the quadratic form ax² + bxy + cy².

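These results can be spot-checked numerically. In the sketch below (the coefficients a, b, c are illustrative assumptions), the extremes of κ(q) are compared against the eigenvalues of the Hessian matrix [[2a, b], [b, 2c]], which packages the same information about the principal curvatures:

    import numpy as np

    # For z = a x^2 + b xy + c y^2 the extremes of kappa(q) should match the
    # eigenvalues of the Hessian [[2a, b], [b, 2c]].
    a, b, c = 1.0, 0.5, -0.3

    q = np.linspace(-50, 50, 200001)             # sweep the directions y = qx
    kappa = 2 * (a + b*q + c*q**2) / (1 + q**2)
    print(kappa.min(), kappa.max())              # numerical principal curvatures

    k1, k2 = np.linalg.eigvalsh(np.array([[2*a, b], [b, 2*c]]))
    print(k1, k2)                                # (a+c) -/+ sqrt((a-c)^2 + b^2)
    print(k1 * k2, 4*a*c - b**2)                 # Gaussian curvature, both ways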
For the surface of a sphere of radius R this quantity equals 1/R² (as derived in the Appendix). Another measure of the curvature of a surface is called the "mean curvature", which, as the name suggests, is the mean value of the curvature over all possible directions. Since we want to give all the directions equal weight, we insert tan(θ) for q in the equation for κ(q) and then integrate over θ, giving the mean curvature
   κ_mean = (1/π) ∫₀^π κ(tan θ) dθ = a + c
(Of course, we could also infer this mean value directly as the average of κ₁ and κ₂ since κ is symmetrically distributed.) Notice that the mean curvature occurs along two perpendicular directions, and these are oriented at 45 degrees relative to the "principal" directions. This can be verified by setting the derivative of the product κ(q) κ(−1/q) to zero and noting that the resulting quartic in q factors into two quadratics, one giving the two principal directions, and the other giving the directions of the mean curvature. (The product κ(q) κ(−1/q) is obviously a maximum when both terms have the mean value, and a minimum when the terms have their extreme values.) Examples of surfaces with constant Gaussian curvature are the sphere, the plane, and the pseudosphere, which have positive, zero, and negative curvature respectively. (Negative Gaussian curvature signifies that the two principal curvatures have opposite signs, meaning the surface has a "saddle" shape.) Surfaces with vanishing mean curvature are called "minimal surfaces", and represent the kinds of surfaces that are formed by soap films. For many years the only complete and non-self-intersecting minimal surfaces known were the plane, the catenoid, and the helicoid, but recently an infinite family of such minimal surfaces was discovered. The above discussion was based on extrinsic properties of surfaces, i.e., measuring the rate of deviation between one surface and another. However, we can also look at curvature from an intrinsic standpoint, in terms of the relations between points within the surface itself. For example, if we were confined to the surface of a sphere of radius R, we would find that the ratio Q of the circumference to the diameter of a circle as measured on the surface of the sphere would not be constant but would depend on the circle's radius r according to the relation Q = π (R/r) sin(r/R). Evaluating the second derivative of Q with respect to r in the limit as r goes to zero we have
   lim(r→0) d²Q/dr² = −π/(3R²)        and hence        K = 1/R² = −(3/π) lim(r→0) d²Q/dr²
Thus we can infer the radius of our sphere entirely from local measurements over a small region of the surface. The results of such local measurements of intrinsic distances on a surface can be encapsulated in the form of a "metric tensor" relative to any chosen system of coordinates on the surface.

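This intrinsic measurement procedure can be simulated directly. The sketch below (the radius and step sizes are illustrative assumptions) recovers the curvature of a sphere purely from the circumference-to-diameter ratio Q(r), using the limiting relation just stated:

    import numpy as np

    # On a sphere of radius R, Q(r) = pi (R/r) sin(r/R). Per the limit above,
    # K = 1/R^2 = -(3/pi) * d2Q/dr2 as r -> 0.
    R = 2.0
    Q = lambda r: np.pi * (R / r) * np.sin(r / R)

    r0, h = 1e-2, 1e-3                  # evaluate near r = 0 (assumed steps)
    Q2 = (Q(r0 + h) - 2*Q(r0) + Q(r0 - h)) / h**2   # central 2nd difference
    print(-(3 / np.pi) * Q2, 1 / R**2)  # both approximately 0.25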
In general, any two-dimensional surface embedded in three-dimensional space can be represented over a sufficiently small region by an expression of the form Z = f(X,Y) where X,Y,Z are orthogonal coordinates. The expansion of this function up to second order is
   Z = AX² + BXY + CY² + DX + EY                 (1)
where A, B, ..., E are constants. If the coefficients D and E are zero, the surface is tangent to the XY plane, and we can immediately compute the Gaussian curvature and the metric tensor as discussed previously. However, if D and E are not zero, we need to rotate our coordinates so that the XY plane is tangent to the surface. To accomplish this we can apply the usual Euler rotation matrix for a rotation through an angle α about the z axis followed by a rotation through an angle β about the (new) X axis. Thus we have a new system of orthogonal coordinates x,y,z related to the original coordinates by
   X = x cos(α) + (y cos(β) + z sin(β)) sin(α)
   Y = −x sin(α) + (y cos(β) + z sin(β)) cos(α)
   Z = −y sin(β) + z cos(β)
Making these substitutions for X, Y, and Z in (1) gives the equation of the surface in terms of the rotated coordinates. The coefficients of the linear terms in x and y in this transformed equation are

   D cos(α) − E sin(α)        and        D sin(α) cos(β) + E cos(α) cos(β) + sin(β)

respectively. To make these coefficients vanish we must set
   tan(α) = D/E        tan(β) = −( D sin(α) + E cos(α) ) = −√(D² + E²)
Substituting these angles into the full expression gives

The cross-product terms involving xz, yz, and z2 have been omitted, because if we bring these over to the left side and factor out z, we can then divide both sides by the factor (k1 + k2x + k3y + k4z), and the power series expansion of this, multiplied by the second-order terms in x and y, gives just a constant times those terms, plus terms of third and higher order in x,y, and z, which do not affect the curvature at the origin. Therefore, the second-

order terms involving z drop out, and we're left with the above quadratic for z. This describes a surface tangent to the xy plane at the origin, i.e., a surface of the form z = ax² + bxy + cy², and the curvature of such a surface equals 4ac − b² at the origin, so the curvature of the above surface at the origin is
   K = (4AC − B²) / (1 + D² + E²)²
Remember that we began with a surface defined by the function Z = f(X,Y), and from equation (1) we see that the partial derivatives of the function f with respect to X and Y at the origin are
   f_X = D        f_Y = E        f_XX = 2A        f_XY = B        f_YY = 2C
Consequently, the equation for the curvature of the surface can be written as
   K = (f_XX f_YY − f_XY²) / (1 + f_X² + f_Y²)²
In addition, if we take the differentials of both sides of (1) we have
   dZ = (2AX + BY + D) dX + (BX + 2CY + E) dY,   which at the origin reduces to   dZ = D dX + E dY
Inserting this for dZ into the metrical expression (ds)2 = (dX)2 + (dY)2 + (dZ)2 gives the metric at the origin on the surface with respect to the XY coordinates projected onto the surface:
   (ds)² = g_XX (dX)² + 2 g_XY dX dY + g_YY (dY)²
where
   g_XX = 1 + D²        g_XY = g_YX = DE        g_YY = 1 + E²
Thus the curvature of the surface can also be written in the form
   K = (f_XX f_YY − f_XY²) / g²
where g = g_XX g_YY − g_XY². The quantities in the numerator of the right hand expression are the coefficients of the "second groundform" of the surface, and the metric line element is called the first groundform. Hence the curvature is simply the ratio of the determinants of the two groundforms.

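To illustrate the formula on a familiar case, the following sympy sketch (the hemisphere is an assumed example, not from the text) evaluates K = (f_XX f_YY − f_XY²)/(1 + f_X² + f_Y²)² for the surface Z = √(R² − X² − Y²) and confirms the constant curvature 1/R²:

    import sympy as sp

    # Curvature of an explicitly defined surface Z = f(X,Y): here a sphere.
    X, Y, R = sp.symbols('X Y R', positive=True)
    f = sp.sqrt(R**2 - X**2 - Y**2)

    fX, fY = sp.diff(f, X), sp.diff(f, Y)
    fXX, fYY, fXY = sp.diff(f, X, 2), sp.diff(f, Y, 2), sp.diff(f, X, Y)

    K = (fXX*fYY - fXY**2) / (1 + fX**2 + fY**2)**2
    print(sp.simplify(K))   # R**(-2): curvature 1/R^2 at every point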
The preceding was based on treating the 2D surface embedded in 3D space defined by giving Z explicitly as a function of X and Y. This is analogous to our treatment of curves in the plane based on giving Y as an explicit function of X. However, we found that a more general and symmetrical expression for the curvature of a plane curve was obtained by considering the curve defined parametrically, i.e., giving x(u) and y(u) as functions of an arbitrary path parameter u. Similarly we can define a 2D surface in 3D space by giving x(u,v), y(u,v) and z(u,v) as functions of two arbitrary coordinates on the surface. From the Euclidean metric of the embedding 3D space we have

where subscripts denote partial derivatives. We also have the total differentials
   dx = x_u du + x_v dv        dy = y_u du + y_v dv        dz = z_u du + z_v dv
which can be substituted into the basic 3D Euclidean metric (ds)2 = (dx)2 + (dy)2 + (dz)2 to give the 2D metric of the surface with respect to the arbitrary surface coordinates u,v
   (ds)² = g_uu (du)² + 2 g_uv du dv + g_vv (dv)²
where
   g_uu = x_u² + y_u² + z_u²        g_uv = x_u x_v + y_u y_v + z_u z_v        g_vv = x_v² + y_v² + z_v²
The space-vectors [xu,yu,zu] and [xv,yv,zv] are tangent to the surface and point along the u and v directions, respectively, so the cross-product of these two vectors is a vector normal to the surface
   N = [ y_u z_v − z_u y_v,   z_u x_v − x_u z_v,   x_u y_v − y_u x_v ]
whose magnitude is
   |N| = √( g_uu g_vv − g_uv² ) = √g
The space-vectors [x_uu, y_uu, z_uu] and [x_vv, y_vv, z_vv] represent the rates of change of the tangent vectors to the surface along the u and v directions, and the vector [x_uv, y_uv, z_uv] represents the rate of change of the u tangent with respect to v, and vice versa. Thus if we take the dot products of each of these vectors with the unit vector normal to the surface, we will get the signed coefficients of an expression for the surface of the pure quadratic form h(u,v) = au² + buv + cv², where "h" can be regarded as the height above the tangent plane at the origin, and the three scaled triple products correspond to h_uu = 2a, h_uv = b, and h_vv = 2c. If u and v were projections of orthogonal coordinates (as were x and y in our prior discussion), the determinant of the surface metric at the origin would be 1, and the

curvature would simply be 4ac − b². However, in general we allow u and v to be any surface coordinates, not necessarily orthogonal, and not necessarily scaled to equal the path length along constant coordinate lines. Given orthogonal metrically scaled tangent coordinates X,Y, there exist coefficients A,B,C such that the height h above the tangent plane is h(X,Y) = AX² + BXY + CY², and the curvature K at the origin is simply 4AC − B². Also, for points sufficiently near the origin we have
   X = X_u u + X_v v        Y = Y_u u + Y_v v
Substituting these expressions into h(X,Y) gives h(u,v) = au² + buv + cv² where
   a = A X_u² + B X_u Y_u + C Y_u²
   b = 2A X_u X_v + B (X_u Y_v + X_v Y_u) + 2C Y_u Y_v
   c = A X_v² + B X_v Y_v + C Y_v²
With these coefficients we find
   4ac − b² = (4AC − B²)(X_u Y_v − X_v Y_u)²
In addition, we know that the surface is asymptotic to the tangent plane at the origin, so the metric in terms of X,Y is simply (ds)² = (dX)² + (dY)². Substituting the expressions for dX and dY in terms of du and dv, the metric at the origin in terms of the u,v coordinates is
   g_uu = X_u² + Y_u²        g_uv = X_u X_v + Y_u Y_v        g_vv = X_v² + Y_v²
From this we have the determinant of the metric
   g = g_uu g_vv − g_uv² = (X_u Y_v − X_v Y_u)²
This shows that the intrinsic curvature K is related to the quantity 4ac − b² by the equation
   K = 4AC − B² = (4ac − b²) / g
We saw previously that the coefficients 2a, b, 2c are given by triple vector products divided by the normalizing factor √g. Writing out the triple products in determinant form, we have

   2a = (1/√g) det | x_uu  y_uu  z_uu |        b = (1/√g) det | x_uv  y_uv  z_uv |        2c = (1/√g) det | x_vv  y_vv  z_vv |
                   | x_u   y_u   z_u  |                       | x_u   y_u   z_u  |                        | x_u   y_u   z_u  |
                   | x_v   y_v   z_v  |                       | x_v   y_v   z_v  |                        | x_v   y_v   z_v  |
Therefore the Gaussian curvature is given by
   K = (4ac − b²)/g = (1/g²) [ D_uu D_vv − D_uv² ]

where D_uu, D_uv, and D_vv denote the three determinants just written.
Recalling that the determinant of the transpose of a matrix is the same as of the matrix itself, we can transpose the second factor in each determinant product to give the equivalent expression

The determinant of a product of matrices is the same as the product of the determinants of those matrices, so we can carry out the matrix multiplications inside the determinant symbols. The first product of determinants can therefore be written as the single determinant

Notice that several of the entries in this matrix can be expressed purely in terms of the components guu, guv, and gvv of the metric tensor and their partial derivatives, so we can write this determinant as

In a similar manner we can expand the second product of determinants into a single determinant and express most of the resulting components in terms of the metric to give

The curvature is just 1/g² times the difference between these two determinants. In both cases we have been able to express all the matrix components in terms of the metric, with the exception of the upper-left entries. However, notice that the cofactors of these two entries in their respective matrices are identical (namely g), so when we take the difference of these determinants the upper-left entries both appear simply as multiples of g. Thus we need only consider the difference of these two entries, which can indeed be written purely in terms of the metric coefficients and their derivatives as follows
   (x_uu x_vv + y_uu y_vv + z_uu z_vv) − (x_uv² + y_uv² + z_uv²) = ∂²g_uv/∂u∂v − (1/2) ∂²g_uu/∂v² − (1/2) ∂²g_vv/∂u²
Consequently, we can express the Gaussian curvature K entirely in terms of the intrinsic metric with respect to arbitrary two-dimensional coordinates on the surface, as follows
              | ∂²g_uv/∂u∂v − (1/2)∂²g_uu/∂v² − (1/2)∂²g_vv/∂u²   (1/2)∂g_uu/∂u    ∂g_uv/∂u − (1/2)∂g_uu/∂v |
   K = (1/g²) det | ∂g_uv/∂v − (1/2)∂g_vv/∂u                          g_uu             g_uv                     |
              | (1/2)∂g_vv/∂v                                      g_uv             g_vv                     |

                  | 0                (1/2)∂g_uu/∂v    (1/2)∂g_vv/∂u |
     − (1/g²) det | (1/2)∂g_uu/∂v    g_uu              g_uv          |
                  | (1/2)∂g_vv/∂u    g_uv              g_vv          |
This formula was first presented by Gauss in his famous paper "Disquisitiones generales circa superficies curvas" (General Investigations of Curved Surfaces), published in 1827. Gauss regarded this result as quite remarkable (egregium in Latin), so it is commonly known as the Theorema Egregium. The reason for Gauss' enthusiasm is that this formula proves the Gaussian curvature of a surface is indeed intrinsic, i.e., it is not dependent on the embedding of the surface in higher dimensional space. Operating entirely within the surface we can lay out arbitrary curvilinear coordinates u,v, and then determine the metric coefficients (and their derivatives) with respect to those coordinates, and from this information alone we can compute the intrinsic curvature of the surface. The Gaussian curvature K is defined as the product of the two principal extrinsic sectional curvatures κ₁ and κ₂, neither of which is an intrinsic metrical property of the surface, but the product of these two numbers is an intrinsic metrical property. In Section 5.7 the full Riemann curvature tensor R_abcd for manifolds of any number of dimensions is defined, and we show that Gauss' surface curvature K is equal to R_uvuv/g,

which completely characterizes the curvature of a two-dimensional surface. To highlight the correspondence between Gauss' formula and the full curvature tensor, we can re-write the above formula as

where we have used the facts that g_uu/g = g^vv, g_vv/g = g^uu, and g_uv/g = −g^uv. Notice that if we define the symbol

for any three indices a,b,c, then Gauss' formula for the curvature of a surface can be written more succinctly as

No summations are implied here, but to abbreviate the notation even further, we could designate the symbols α and β as "wild card" indices, with implied summation of every term in which they appear over all possible indices (i.e., over u and v). On this basis the formula is

As discussed in Section 5.7, this is precisely the formula for the component Ruvuv of the full Riemann curvature tensor in n dimensions, which makes it clear how directly Gauss' result for two-dimensional surfaces generalizes to n dimensions. Naturally this formula

for K reduces to κ₁κ₂ = 4ac − b², where κ₁ and κ₂ are the two principal extrinsic curvatures relative to a flat plane tangent to the surface at the point of interest. The reason this formula is so complicated is that it applies to any system of coordinates (rather than just projected tangent normal coordinates), and is based entirely on the intrinsic properties of the surface. To illustrate this approach, consider the two-dimensional surface defined as the locus of points at a height h above the xy plane, where h is given by the equation
   h(x,y) = ax² + bxy + cy²
with arbitrary constants a, b, and c. For example, with a=c=0 and b=1 this gives the simple surface h = xy shown below:

For other values of a,b,c this surface can have various shapes, such as paraboloids. The function h(x,y) is single-valued over the entire xy plane, so it's convenient to simply project the xy grid onto the surface and use this as our coordinates on the surface. (Any other system of curvilinear coordinates would serve just as well.) Over a sufficiently small interval on this surface the distance ds along a path is related to the incremental changes dx, dy, and dz according to the usual Pythagorean relation
   (ds)² = (dx)² + (dy)² + (dz)²
Also the equation of the surface allows us to express the increment dz in terms of dx and dy as follows
   dz = (2ax + by) dx + (bx + 2cy) dy
Therefore we have
   (dz)² = (2ax + by)² (dx)² + 2 (2ax + by)(bx + 2cy) dx dy + (bx + 2cy)² (dy)²
Substituting this into the equation for the line element (ds)2 gives the basic metrical equation of the surface
   (ds)² = g_xx (dx)² + 2 g_xy dx dy + g_yy (dy)²
where the components of the "metric tensor" are
   g_xx = 1 + (2ax + by)²        g_xy = g_yx = (2ax + by)(bx + 2cy)        g_yy = 1 + (bx + 2cy)²
We can, in principle, directly measure the incremental distance ds for any given increments dx and dy without ever leaving the surface, so the metric components are purely intrinsic properties of the surface. In general the metric tensor is a symmetric covariant tensor of second order, and is usually written in the form of a matrix. Thus, for our simple example we can write the metric as
   g_uv = | 1 + (2ax + by)²            (2ax + by)(bx + 2cy) |
          | (2ax + by)(bx + 2cy)       1 + (bx + 2cy)²      |
The determinant of this matrix at the point (x,y) is
   g = 1 + (2ax + by)² + (bx + 2cy)²
The inverse of the metric tensor is denoted by guv , where the superscripts are still indices, not exponents. In our example the inverse metric tensor is
   g^uv = (1/g) |  1 + (bx + 2cy)²            −(2ax + by)(bx + 2cy) |
                | −(2ax + by)(bx + 2cy)        1 + (2ax + by)²      |
Substituting these metric components into the general formula for the Gaussian curvature K gives
   K = (4ac − b²) / [ 1 + (2ax + by)² + (bx + 2cy)² ]² = (4ac − b²) / g²
in agreement with our earlier result for surfaces specified explicitly in the form z = f(x,y). At the origin, where x = y = 0, this gives K = 4ac − b², i.e., the product of the two principal extrinsic curvatures. In addition, the formula gives the Gaussian curvature for any point on the surface, so we don't have to go to the trouble of laboriously constructing

a tangent plane at each point and finding the quadratic expansion of the surface about that point. We can see from this formula that the curvature at every point of this simple two-dimensional surface always has the same sign as the discriminant 4ac − b². Also, the shape of the constant-curvature lines on this surface can be determined by re-arranging the terms of the above equation, from which we find that the curvature equals K on the locus of points satisfying the equation
   (2ax + by)² + (bx + 2cy)² = √( (4ac − b²)/K ) − 1
This is the equation of a conic with discriminant −4(4ac − b²)². The case of zero curvature occurs only when the discriminant vanishes, which implies that b = ±2√(ac), and so the equation of the surface factors as

   h = a [ x ± √(c/a) y ]²
The quantity inside the parentheses is a planar function, so the surface is a parabolic "valley", which has no intrinsic curvature (like the walls of a cylinder). It follows from the preceding conic equation that the lines of constant curvature (if there is any curvature) must be ellipses centered on the origin. However, this is not the most general form of curvature possible on a two-dimensional surface, it's just the most general form for a surface embedded in three-dimensional Euclidean space. Suppose we embed our two-dimensional surface in four dimensional Euclidean space. We can still, at any given point on the surface, construct a two-dimensional tangent plane with orthogonal xy coordinates, and expand the equation of the surface up to second degree about that point, but now instead of just a single perpendicular height h(x,y) we allow two mutually perpendicular heights, which we may call h1(x,y) and h2(x,y). Our surface can now be defined (in the neighborhood of the origin at the point of tangency) by the equations
   h₁(x,y) = a₁x² + b₁xy + c₁y²        h₂(x,y) = a₂x² + b₂xy + c₂y²
Following the same procedure as before, determining the components of the metric tensor for this surface and plugging them into Gauss's formula, we find that the intrinsic curvature of this surface is

where

The lines of constant curvature on this surface can be much more diverse than for a surface embedded in just three dimensional space. As an example, if we define the surface with the equations

then the lines of constant curvature are as indicated in the figure below.

We have focused on two-dimensional surfaces in this section, but the basic idea of intrinsic curvature remains essentially the same in any number of dimensions. We'll see in subsequent sections that Riemann generalized Gauss's notion of intrinsic curvature by noting that any two (distinct) directional rays emanating from a given point P, if continued geodesically and with parallel transport (both of which we will discuss in detail), single out a two-dimensional surface within the manifold, and we can determine the "sectional" curvature of that surface in the same way as described in this section. Of course, in a manifold of three or more dimensions there are infinitely many two-dimensional surfaces passing through any given point, but Riemann showed how to encode enough information about the manifold at each point so that we can compute the sectional curvature on any surface. For spaces of n > 2 dimensions, we can proceed in essentially the same way, by imagining a flat n-dimensional Euclidean space tangent to the space at the point of interest, with a Cartesian coordinate system, and then evaluating how the curved space deviates from the flat space into another set of n(n−1)/2 orthogonal dimensions, one for each pair of dimensions in the flat tangent space. This is obviously just a generalization of our approach for n = 2 dimensions, when we considered a flat 2D space with Cartesian

coordinates x,y tangent to the surface, and described the curved surface in the region around the tangent point in terms of the "height" h(x,y) perpendicular to the surface. Since we have chosen a flat baseline space tangent to the curved surface, it follows that the constant and first-order terms of h(x,y) are zero. Also, since we are not interested in any derivatives higher than the second, we can neglect all terms of h(x,y) above second order. Consequently we can express h(x,y) as a homogeneous second-order expression, i.e.,
   h(x,y) = ax² + bxy + cy²
We saw that embedding a curved 2D surface in four dimensions allows even more freedom for the shape of the surface, but in the limit as the region becomes smaller and smaller, the surface approaches a single height. Similarly for a space of three dimensions we can imagine a flat three-dimensional space with x,y,z Cartesian coordinates tangent to the curved surface, and consider three perpendicular "heights" h1(x,y), h2(x,z), and h3(y,z). There are obvious similarities between intrinsic curvature and ordinary spatial rotations, neither of which are possible in a space of just one dimension, and both of which are - in a sense - inherently two-dimensional phenomena, even when they exist in a space of more than two dimensions. Another similarity is the non-commutativity exhibited by rotations as well as by translations on a curved surface. In fact, we could define curvature as the degree to which translations along two given directions do not commute. The reason for this behavior is closely connected to the fact that rotations in space are non-commutative, as can be seen most clearly by imagining a curved surface embedded in a higher dimensional space, and noting that the translations on the surface actually involve rotations, i.e., angular displacements in the embedding space. Hence it's inevitable that such displacements don't commute.
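The Theorema Egregium can be made concrete with a small symbolic computation. The sketch below (an assumed illustration using sympy, with one common sign convention for the Riemann tensor) starts from nothing but the metric components of the surface h = ax² + bxy + cy² and recovers the same curvature that the extrinsic method gives:

    import sympy as sp

    # Intrinsic computation of K for h = a x^2 + b x y + c y^2, using only
    # the metric components derived in this section.
    x, y, a, b, c = sp.symbols('x y a b c', real=True)
    hx, hy = 2*a*x + b*y, b*x + 2*c*y
    g = sp.Matrix([[1 + hx**2, hx*hy],
                   [hx*hy, 1 + hy**2]])          # metric tensor g_uv
    ginv = g.inv()
    X = [x, y]

    def Gamma(s, i, j):
        # Christoffel symbols of the second kind, built from the metric alone
        return sum(ginv[s, k]*(sp.diff(g[k, i], X[j]) + sp.diff(g[k, j], X[i])
                   - sp.diff(g[i, j], X[k])) for k in range(2)) / 2

    def Riem(r, s, m, n):
        # R^r_smn = d_m Gamma^r_ns - d_n Gamma^r_ms + Gamma*Gamma terms
        return (sp.diff(Gamma(r, n, s), X[m]) - sp.diff(Gamma(r, m, s), X[n])
                + sum(Gamma(r, m, l)*Gamma(l, n, s)
                      - Gamma(r, n, l)*Gamma(l, m, s) for l in range(2)))

    R_0101 = sum(g[0, r]*Riem(r, 1, 0, 1) for r in range(2))   # R_uvuv
    print(sp.simplify(R_0101 / g.det()))
    # -> (4*a*c - b**2)/g**2, matching the extrinsic result above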

5.4 Relatively Straight

There's some end at last for the man who follows a path; mere rambling is interminable.
                                                Seneca, 60 AD

The principle of relativity, as expressed in Newton's first law of motion (and carried over essentially unchanged into Einstein's special theory of relativity) is based on the idea of uniform motion in a straight line. However, the terms "uniform motion" and "straight line" are not as easy to define as one might think. Historically, it was usually just assumed that such things exist, and that we know them when we see them. Admittedly there were attempts to describe these concepts, but mainly in somewhat vague and often circular ways. For example, Euclid tells us that "a line is breadthless length", and "a straight line is a line which lies evenly with the points on itself". The precise literal interpretation of these statements can be debated, but they seem to have been modeled on an earlier definition given by Plato, who said a straight line is "that of which the middle covers the

ends". This in turn may have been based on Parmenides' saying that "straight is whatever has its middle directly between the ends". Each of these definitions relies on some pre-existing of idea straightness to give meaning to such terms as "lying evenly" or "directly between", so they are immediately selfreferential. Other early attempts to define straightness invoked visual alignment, on the presumption that light travels in a straight line. Of course, we could simply define straightness to be congruence with a path of light, but such an empirical definition would obviously preclude asking whether, in fact, light necessarily travels in straight lines as defined in some more abstract sense. Not surprisingly, thinkers like Plato and Euclid, who wished to keep geometry and mechanics strictly separate, preferred a purely abstract a priori definition of straightness, without appealing (explicitly) to any physical phenomena. Unfortunately, their attempts to provide meaningful conceptual definition were not particularly successful. Aristotle noted that among all possible lines connecting two given points, the straight line is the one with the shortest length, and Archimedes suggested that this property could be taken as the definition of a straight line. This at least has the merit of relating two potentially distinct concepts, straightness and length, and even gives us a way of quantifying which of two lines (i.e., curves) connecting two points is "straighter", simply by comparing their lengths, without explicitly invoking the straightness of anything else. Furthermore, this definition can be applied in a more general context, such as on the surface of the Earth, where the straightest (shortest) path between two points is an arc of a great circle, which is typically not congruent to a visual line of sight. We saw in Chapter 3.5 that Hero based his explanation of optical reflection on the hypothesis that light travels along the shortest possible path. This is a nice example of how an a priori conceptual definition of straightness led to a non-trivial physical theory about the behavior of light, which obviously would have been precluded if there had been no conception of straightness other than that it corresponds to the paths of light. We've also seen how Fermat refined this principle of straightness to involve the variable of time, related to spatial distances by what he intuited was an invariant characteristic speed of light. Similarly the principle of least action, popularized by Maupertius and Euler, represented the application of stationary paths in various phase spaces (i.e., the abstract space whose coordinates are the free variables describing the state of a system), but for actual geometrical space (and time) the old Euclidean concept of extrinsic straightness continued to predominate, both in mathematics and in physics. Even in the special theory of relativity Einstein relied on the intuitive Euclidean concept of straightness, although he was dissatisfied with this approach, and believed that the true principle of relativity should be based on the more profound Archimedian concept of straight lines as paths with extremal lengths. In a sense, this could be regarded as relativizing the concept of straightness, i.e., rather than seeking absolute extrinsic straightness, we focus instead on relative straightness of neighboring paths, and declare the extremum of the available paths to be "straight", or rather "as straight as possible". 
In addition, Einstein was motivated by the classical idea of Copernicus that we should not

regard our own particular frame of reference (or any other frame of reference) as special or preferred for the laws of physics. It ought to be possible to express the laws of physics in such a way that they apply to any system of coordinates, regardless of their state of motion. The special theory succeeds in this for all uniformly moving systems of coordinates (although with the epistemological shortcoming noted above), but Einstein sought a more general theory of relativity encompassing coordinate systems in any state of motion and avoiding the circular definition of straightness. We've noted that Archimedes suggested defining a straight line as the shortest path between two points, but how can we determine which of the infinitely many paths from any given point to another is the shortest? Let us imagine any arbitrary path through three-dimensional space from the point P1 at (x1,y1,z1) to the point P2 at (x2,y2,z2). We can completely describe this path by assigning a smooth monotonic parameter λ to the points of the path, such that λ=0 at P1 and λ=1 at P2, and then specifying the values of x(λ), y(λ), and z(λ) as functions of λ. The total length S of the path can be found from the functions x(λ), y(λ), and z(λ) by integrating the differential distances all along the path as follows
   S = ∫₀¹ √( (dx/dλ)² + (dy/dλ)² + (dz/dλ)² ) dλ
Now suppose we let δx(λ), δy(λ), and δz(λ) denote three arbitrary functions of λ, representing some deviation from the nominal path, and consider the resulting "disturbed path" described by the functions

   X(λ,µ) = x(λ) + µ δx(λ)
   Y(λ,µ) = y(λ) + µ δy(λ)
   Z(λ,µ) = z(λ) + µ δz(λ)

where µ is a parameter that we can vary to apply different fractions of the disturbance. For any fixed value of the parameter µ the distance along the path from P1 to P2 is given by
   S(µ) = ∫₀¹ √( (∂X/∂λ)² + (∂Y/∂λ)² + (∂Z/∂λ)² ) dλ
Our objective is to find functions x(λ), y(λ), z(λ) such that for any arbitrary disturbance vector δ, the value of S(µ) is minimized at µ = 0. Those functions will then describe the “straightest” path from P1 to P2. To find the minimal value of S(µ) we differentiate with respect to µ. It's legitimate to perform this differentiation inside the integral, so (omitting the indications of functional dependencies) we can write

We can evaluate the derivatives with respect to λ based on the definitions of X,Y,Z as follows
   ∂X/∂λ = dx/dλ + µ d(δx)/dλ        ∂Y/∂λ = dy/dλ + µ d(δy)/dλ        ∂Z/∂λ = dz/dλ + µ d(δz)/dλ
Therefore, the derivatives of these with respect to µ are simply
   ∂²X/∂µ∂λ = d(δx)/dλ        ∂²Y/∂µ∂λ = d(δy)/dλ        ∂²Z/∂µ∂λ = d(δz)/dλ
Substituting these expressions into the previous equation gives

We want this quantity to equal zero when µ equals 0. Of course, in that case we have X=x, Y=y, and Z=z, so we make these substitutions and then require that the above integral vanish. Thus, letting dots denote differentiation with respect to λ, we have
   ∫₀¹ ( ẋ δẋ + ẏ δẏ + ż δż ) / √( ẋ² + ẏ² + ż² ) dλ = 0
Using "integration by parts" we can evaluate this integral, term by term. For example, considering just the x component in the numerator, we can use the "parts" variables
   u = ẋ / √( ẋ² + ẏ² + ż² )        dv = δẋ dλ
and then the usual formula for integration by parts gives
   ∫₀¹ u dv = [ ẋ δx / √( ẋ² + ẏ² + ż² ) ]₀¹ − ∫₀¹ δx (d/dλ)[ ẋ / √( ẋ² + ẏ² + ż² ) ] dλ
The first term on the right-hand side automatically vanishes, because by definition the disturbance components δx,δy,δz are all zero at the end-points of the path. Applying the same technique to the other components, we arrive at the following expression for the overall integral which we wish to set to zero
   ∫₀¹ { δx (d/dλ)[ ẋ/√(ẋ²+ẏ²+ż²) ] + δy (d/dλ)[ ẏ/√(ẋ²+ẏ²+ż²) ] + δz (d/dλ)[ ż/√(ẋ²+ẏ²+ż²) ] } dλ = 0
The coefficients of the three terms in the integrand are the disturbance functions δx, δy, δz, which are allowed to take on any arbitrary values in between λ = 0 and λ = 1. Regardless of the values of these three disturbance components, we require the integral to vanish. This is a very strong requirement, and can only be met by setting each of the three derivatives in parentheses to zero, i.e., it requires
   (d/dλ)[ ẋ/√(ẋ²+ẏ²+ż²) ] = 0        (d/dλ)[ ẏ/√(ẋ²+ẏ²+ż²) ] = 0        (d/dλ)[ ż/√(ẋ²+ẏ²+ż²) ] = 0
This implies that the arguments of these three derivatives do not change as a function of the path parameter, so they have constant values all along the path. Thus we have
   ẋ/√(ẋ²+ẏ²+ż²) = C_x        ẏ/√(ẋ²+ẏ²+ż²) = C_y        ż/√(ẋ²+ẏ²+ż²) = C_z
The numerators of these expressions can be regarded as the x, y, and z components, respectively, of the "rate" of motion (per λ) along the path, whereas the denominators represent the total magnitude of the motion. Thus, these conditions tell us that the components of motion along the path are in a constant ratio to each other, which means that the direction of motion is constant, i.e., a straight line. So, to reach from P1 to P2, the constants must be given by C_x = (x2 − x1)/D, C_y = (y2 − y1)/D, and C_z = (z2 − z1)/D, where D is the total distance given by D² = (x2 − x1)² + (y2 − y1)² + (z2 − z1)². Given an initial trajectory, the entire path is determined by the assumption that it proceeds from point to point always by the shortest possible route. So far we have focused on finding the geodesic paths in ordinary Euclidean three-dimensional space, and found that they correspond to our usual notion of straight lines. However, in a space with a different metric, the shapes of geodesic paths can be more complicated. To determine the general equations for geodesic paths, let us first formalize

the preceding "variational" technique. In general, suppose we wish to determine a function x(λ) from λ1 to λ2 such that the integral of some function F(λ,x, ) along that path is stationary. (As before, dots signify derivatives with respect to λ.) We again define an arbitrary disturbance δx(x) and the disturbed function X(λ,µ) = x(λ) + µδx(λ), where µ is a parameter that determined how much of the disturbance is to be applied. We wish to make stationary the integral
   S(µ) = ∫ F(λ, X, Ẋ) dλ        (integrated from λ1 to λ2)
This is done by differentiating S with respect to the parameter µ as follows
   dS/dµ = ∫ [ (∂F/∂X)(dX/dµ) + (∂F/∂Ẋ)(dẊ/dµ) ] dλ
Substituting for dX/dµ = δx and dẊ/dµ = δẋ gives
   dS/dµ = ∫ [ (∂F/∂X) δx + (∂F/∂Ẋ) δẋ ] dλ
We want to set this quantity to zero when µ = 0, which implies X = x, so we require
   ∫ [ (∂F/∂x) δx + (∂F/∂ẋ) δẋ ] dλ = 0
The integral of the second term in parentheses can be evaluated (using integration by parts) as
   ∫ (∂F/∂ẋ) δẋ dλ = [ (∂F/∂ẋ) δx ] evaluated from λ1 to λ2 − ∫ δx (d/dλ)(∂F/∂ẋ) dλ
The first term on the right-hand side is identically zero (since the disturbance is defined to be zero at the end points), so we can substitute the second term back into the preceding equation and factor out the disturbance δx(λ) to give
   ∫ [ ∂F/∂x − (d/dλ)(∂F/∂ẋ) ] δx dλ = 0
Again, since this equation must be satisfied for every possible (smooth) disturbance function δx(λ), it requires that the quantity in parentheses vanish identically, so we arrive at the Euler equation
   ∂F/∂x − (d/dλ)(∂F/∂ẋ) = 0
which is the basis for solving a wide variety of problems in the calculus of variations. The application of Euler's equation that most interests us is in finding the general equation of the straightest possible path in an arbitrary smooth manifold with a defined metric. In this case the function whose integral we wish to make stationary is the absolute spacetime interval, defined by the metric equation
   (ds)² = g_µν dx^µ dx^ν
where, as usual, summation is implied over repeated indices. Multiplying the right side by (dλ/dλ)2 and taking the square root of both sides gives the differential "distance" ds along a path parameterized by λ. Integrating along the path from λ1 and λ2 gives the distance to be made stationary
   s = ∫ √( g_µν (dx^µ/dλ)(dx^ν/dλ) ) dλ        (integrated from λ1 to λ2)
For each individual coordinate xσ this can be treated as a variational problem with the function
   F = √( g_µν ẋ^µ ẋ^ν )
where again dots signify differentiation with respect to λ. (Incidentally, the metric need not be positive-definite, since we can always choose our sign convention so that the squared intervals in question are positive, provided we never integrate along a path for which the squared interval changes sign, which would represent changing from timelike to spacelike, or vice versa, in relativity.) Therefore, we can apply Euler's equation to immediately give the equations of geodesic paths on the surface with the specified metric
   ∂F/∂x^σ − (d/dλ)(∂F/∂ẋ^σ) = 0
For an n-dimensional space this represents n equations, one for each of the coordinates x¹, x², ..., x^n. Letting w = (ds/dλ)² = F² =
   g_αβ ẋ^α ẋ^β
this can be written as
   (d/dλ)(∂w/∂ẋ^σ) − ∂w/∂x^σ = (1/(2w)) (dw/dλ) (∂w/∂ẋ^σ)
To simplify these equations, let us put the parameter λ equal to the integrated path length s, so that we have w = 1 and dw/dλ = 0. The right-most term drops out, and we're left with
   (d/ds)(∂w/∂ẋ^σ) − ∂w/∂x^σ = 0
Notice that even though w equals a constant 1 in these circumstances and the total derivative vanishes, the partial derivatives do not necessarily vanish. Indeed, if we substitute w = g_αβ (dx^α/ds)(dx^β/ds) into this equation we get
   (d/ds)( 2 g_σβ dx^β/ds ) − (∂g_αβ/∂x^σ)(dx^α/ds)(dx^β/ds) = 0
Evaluating the derivative in the left-hand term and dividing through by 2, this gives
   g_σβ d²x^β/ds² + (∂g_σβ/∂x^α)(dx^α/ds)(dx^β/ds) − (1/2)(∂g_αβ/∂x^σ)(dx^α/ds)(dx^β/ds) = 0
At this point it's conventional to make use of the identity
   (∂g_σβ/∂x^α)(dx^α/ds)(dx^β/ds) = (∂g_σα/∂x^β)(dx^α/ds)(dx^β/ds)
(where we have simply swapped the α and β indices) to represent the middle term of the preceding equation as half the sum of these two expressions. This enables us to write the geodesic equations in the form
   g_σν d²x^ν/ds² + [αβσ] (dx^α/ds)(dx^β/ds) = 0
where the symbol [αβσ] is defined as
   [αβσ] = (1/2) ( ∂g_ασ/∂x^β + ∂g_βσ/∂x^α − ∂g_αβ/∂x^σ )
These are called connection coefficients, also known as Christoffel symbols of the first kind. Finally, if we multiply through by the contravariant metric gσν, we have
   d²x^ν/ds² + Γ^ν_αβ (dx^α/ds)(dx^β/ds) = 0
where
   Γ^ν_αβ = g^σν [αβσ]
are known as Christoffel symbols of the second kind. As an example, consider the simple two-dimensional surface h = ax2 + bxy + cy2 discussed in Chapter 5.3. Using the metric tensor, its inverse, and partial derivatives we can now directly compute the Christoffel symbols, from which we can give explicit parametric equations for the geodesic paths on our surface:
   d²x/ds² + (2ax + by) [ 2a(dx/ds)² + 2b(dx/ds)(dy/ds) + 2c(dy/ds)² ] / g = 0
   d²y/ds² + (bx + 2cy) [ 2a(dx/ds)² + 2b(dx/ds)(dy/ds) + 2c(dy/ds)² ] / g = 0

where g = 1 + (2ax + by)² + (bx + 2cy)².
If we scale and rotate the coordinates so that the surface height has the form h = xy/R, the geodesic equations reduce to
   d²x/ds² + 2y (dx/ds)(dy/ds) / (R² + x² + y²) = 0
   d²y/ds² + 2x (dx/ds)(dy/ds) / (R² + x² + y²) = 0
These equations show that if either dx/ds or dy/ds equals zero, the second derivatives of x and y with respect to s must be zero, so lines of constant x and lines of constant y are geodesics (as expected, since these are straight lines in space). Of course, given an initial trajectory that is not parallel to either the x or y axis the resulting geodesic path on this surface will be curved, and can be explicitly computed from the above formulas.
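Since these paths generally have no convenient closed form, numerical integration is the natural way to trace one. The sketch below (a standard fourth-order Runge-Kutta scheme; the initial trajectory and step size are illustrative assumptions) integrates the geodesic equations just quoted for the surface h = xy/R:

    import numpy as np

    R = 1.0   # scale of the surface h = xy/R (assumed value)

    def deriv(state):
        # state = [x, y, dx/ds, dy/ds]; returns its derivative with respect
        # to s per the geodesic equations above
        x, y, xd, yd = state
        f = 2 * xd * yd / (R**2 + x**2 + y**2)
        return np.array([xd, yd, -y * f, -x * f])

    def rk4_step(state, ds):
        k1 = deriv(state)
        k2 = deriv(state + 0.5 * ds * k1)
        k3 = deriv(state + 0.5 * ds * k2)
        k4 = deriv(state + ds * k3)
        return state + (ds / 6.0) * (k1 + 2*k2 + 2*k3 + k4)

    # Launch from the origin along a direction not parallel to either axis:
    state = np.array([0.0, 0.0, np.cos(0.7), np.sin(0.7)])
    for _ in range(1000):
        state = rk4_step(state, 0.01)
    print(state)   # the direction of motion has visibly turned: the geodesic curves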

5.8 The Field Equations

You told us how an almost churchlike atmosphere is pervading your desolate house now. And justifiably so, for unusual divine powers are at work in there. Besso to Einstein, 30 Oct 1915

The basis of Einstein's general theory of relativity is the audacious idea that not only do the metrical relations of spacetime deviate from perfect Euclidean flatness, but that the metric itself is a dynamical object. In every other field theory the equations describe the behavior of a physical field, such as the electric or magnetic field, within a constant and immutable arena of space and time, but the field equations of general relativity describe the behavior of space and time themselves. The spacetime metric is the field. This fact is so familiar that we may be inclined to simply accept it without reflecting on how ambitious it is, and how miraculous it is that such a theory is even possible, not to mention (somewhat) comprehensible. Spacetime plays a dual role in this theory, because it constitutes both the dynamical object and the context within which the dynamics are defined. This self-referential aspect gives general relativity certain characteristics different from any other field theory. For example, in other theories we formulate a Cauchy initial value problem by specifying the condition of the field everywhere at a given instant, and then use the field equations to determine the future evolution of the field. In contrast, because of the inherent self-referential quality of the metrical field, we are not free to specify arbitrary initial conditions, but only conditions that already satisfy certain self-consistency requirements (a system of differential relations called the Bianchi identities) imposed by the field equations themselves. The self-referential quality of the metric field equations also manifests itself in their nonlinearity. Under the laws of general relativity, every form of stress-energy gravitates, including gravitation itself. This is really unavoidable for a theory in which the metrical relations between entities determine the "positions" of those entities, and those positions in turn influence the metric. This non-linearity raises both practical and theoretical issues. From a practical standpoint, it ensures that exact analytical solutions will be very difficult to determine. More importantly, from a conceptual standpoint, non-linearity ensures that the field cannot in general be uniquely defined by the distribution of material objects, because variations in the field itself can serve as "objects". Furthermore, after eschewing the comfortable but naive principle of inertia as a suitable foundation for physics, Einstein concluded that "in the general theory of relativity, space and time cannot be defined in such a way that differences of the spatial coordinates can be directly measured by the unit measuring rod, or differences in the time coordinate by a standard clock...this requirement ... takes away from space and time the last remnant of physical objectivity". It seems that we're completely at sea, unable to even begin to formulate a definite solution, and lacking any definite system of reference for defining even the most rudimentary quantities. It's not obvious how a viable physical theory could emerge from such an austere level of abstraction. These difficulties no doubt explain why Einstein's route to the field equations in the years 1907 to 1915 was so convoluted, with so much confusion and backtracking. One of the principles that heuristically guided his search was what he called the principle of general

covariance. This was understood to mean that the laws of physics ought to be expressible in the form of tensor equations, because such equations automatically hold with respect to any system of curvilinear coordinates (within a given diffeomorphism class, as discussed in Section 9.2). He abandoned this principle at one stage, believing that he and Grossmann had proven it could not be made consistent with the Poisson equation of Newtonian gravitation, but subsequently realized the invalidity of their arguments, and re-embraced general covariance as a fundamental principle. It strikes many people as ironic that Einstein found the principle of general covariance to be so compelling, because, strictly speaking, it's possible to express almost any physical law, including Newton's laws, in generally covariant form (i.e., as tensor equations). This was not clear when Einstein first developed general relativity, but it was pointed out in one of the very first published critiques of Einstein's 1916 paper, and immediately acknowledged by Einstein. It's worth remembering that the generally covariant formalism had been developed only in 1901 by Ricci and Levi-Civita, and the first real use of it in physics was Einstein's formulation of general relativity. This historical accident made it natural for people (including Einstein, at first) to imagine that general relativity is distinguished from other theories by its general covariance, whereas in fact general covariance was only a new mathematical formalism, and does not connote a distinguishing physical attribute. For this reason, some people have been tempted to conclude that the requirement of general covariance is actually vacuous. However, in reply to this criticism, Einstein clarified the real meaning (for him) of this principle, pointing out that its heuristic value arises when combined with the idea that the laws of physics should not only be expressible as tensor equations, but should be expressible as simple tensor equations. In 1918 he wrote "Of two theoretical systems which agree with experience, that one is to be preferred which from the point of view of the absolute differential calculus is the simplest and most transparent". This is still a bit vague, but it seems that the quality which Einstein had in mind was closely related to the Machian idea that the expression of the dynamical laws of a theory should be symmetrical up to arbitrary continuous transformations of the spacetime coordinates. Of course, the presence of any particle of matter with a definite state of motion automatically breaks the symmetry, but a particle of matter is a dynamical object of the theory. The general principle that Einstein had in mind was that only dynamical objects could be allowed to introduce asymmetries. This leads naturally to the conclusion that the coefficients of the spacetime metric itself must be dynamical elements of the theory, i.e., must be acted upon. With this Einstein believed he had addressed what he regarded as the strongest of Mach's criticisms of Newtonian spacetime, namely, the fact that Newton's space acted on objects but was never acted upon by objects. Let's follow Einstein's original presentation in his famous paper "The Foundation of the General Theory of Relativity", which was published early in 1916. He notes that for empty space, far from any gravitating object, we expect to have flat (i.e., Minkowskian) spacetime, which amounts to requiring that Riemann's curvature tensor Rabcd vanishes. 
However, in regions of space near gravitating matter we must clearly have non-zero intrinsic curvature, because the gravitational field of an object cannot simply be "transformed away" (to the second order) by a change of coordinates. Thus there is no

system of coordinates with respect to which the manifold is flat to the second order, which is precisely the condition indicated by a non-vanishing Riemann curvature tensor. Nevertheless, even at points where the full curvature tensor R_abcd is non-zero, the contracted tensor of the second rank, R_bc = g^ad R_abcd = R^d_bcd, may vanish. Of course, a tensor of rank four can be contracted in six different ways (the number of ways of choosing two of the four indices), and in general this gives six distinct tensors of rank two. We are able to single out a more or less unique contraction of the curvature tensor only because of that tensor's symmetries (described in Section 5.7), which imply that of the six contractions of R_abcd, two are zero and the other four are identical up to sign change. Specifically we have
$$g^{ab}R_{abcd} = g^{cd}R_{abcd} = 0, \qquad g^{ad}R_{abcd} = R_{bc}, \quad g^{bc}R_{abcd} = R_{ad}, \quad g^{ac}R_{abcd} = -R_{bd}, \quad g^{bd}R_{abcd} = -R_{ac}$$
By convention we define the Ricci tensor R_bc as the contraction g^ad R_abcd. In seeking suitable conditions for the metric field in empty space, Einstein observes that …there is only a minimum arbitrariness in the choice... for besides Rµν there is no tensor of the second rank which is formed from the gµν and its derivatives, contains no derivative higher than the second, and is linear in these derivatives… This prompts us to require for the matter-free gravitational field that the symmetrical tensor Rµν ... shall vanish.

Thus, guided by the belief that the laws of physics should be the simplest possible tensor equations (to ensure general covariance), he proposes that the field equations for the gravitational field in empty space should be
$$R_{\mu\nu} = 0 \qquad (1)$$
Noting that Rµν takes on a particularly simple form on the condition that we choose coordinates such that √(−g) = 1, Einstein originally expressed this in terms of the Christoffel symbols as

$$\frac{\partial \Gamma^{a}_{\mu\nu}}{\partial x^{a}} - \Gamma^{a}_{\mu\beta}\,\Gamma^{\beta}_{\nu a} = 0 \qquad (1')$$
(except that in his 1916 paper Einstein had a different sign because he defined the symbol Γabc as the negative of the Christoffel symbol of the second kind.) He then concludes the section with words that obviously gave him great satisfaction, since he repeated essentially the same comments at the conclusion of the paper: These equations, which proceed, by the method of pure mathematics, from the requirement of the general theory of relativity, give us, in combination with the [geodesic] equations of motion, to a first approximation Newton's law of attraction, and to a second approximation the explanation of the motion of the perihelion of the planet Mercury discovered by Leverrier. These facts must, in my opinion, be taken as a convincing proof of the correctness of the theory.

To his friend Paul Ehrenfest in January 1916 he wrote that "for a few days I was beside myself with joyous excitement", and to Fokker he said that seeing the anomaly in Mercury's orbit emerge naturally from his purely geometrical field equations "had given him palpitations of the heart". (These recollections are remarkably similar to the presumably apocryphal story of Newton's trembling hand when he learned, in 1675, of Picard's revised estimates of the Earth's size, and was thereby able to reconcile his previous calculations of the Moon's orbit based on the assumption of an inverse-square law of gravitation.) The expression Rµν = 0 represents ten distinct equations in the ten unknown metric components gµν at each point in empty spacetime (where the term "empty" signifies the absence of matter or electromagnetic energy, but obviously not the absence of the metric/gravitational field.) Since these equations are generally covariant, it follows that given any single solution we can construct infinitely many others simply by applying arbitrary (continuous) coordinate transformations. Thus, each individual physical solution has four full degrees of freedom which allow it to be expressed in different ways. In order to uniquely determine a particular solution we must impose four coordinate conditions on the gµν, but this gives us a total of fourteen equations in just ten unknowns, which could not be expected to possess any non-trivial solutions at all if the fourteen equations were fully independent and arbitrary. Our only hope is if the ten formal conditions represented by our basic field equations automatically satisfy four identities for any values of the metric components, so that they really only impose six independent conditions, which then would uniquely determine a solution when augmented by a set of four arbitrary coordinate conditions. It isn't hard to guess that the four "automatic" conditions to be satisfied by our field equations must be the vanishing of the covariant derivatives, since this will guarantee local conservation of any energy-momentum source term that we may place on the right side of the equation, analogous to the mass density on the right side of Poisson's equation
$$\nabla^{2}\phi = 4\pi G\,\rho$$
In tensor calculus the divergence generalizes to the covariant derivative, so we expect that the covariant derivatives of the metrical field equations must identically vanish. The Ricci tensor Rµν itself does not satisfy this requirement, but we can create a tensor that does satisfy the requirement with just a slight modification of the Ricci tensor, and without disturbing the relation Rµν = 0 for empty space. Subtracting half the metric tensor times the invariant R = gµνRµν gives what is now called the Einstein Tensor
$$G_{\mu\nu} = R_{\mu\nu} - \tfrac{1}{2}\,g_{\mu\nu}R$$
Obviously the condition Rµν = 0 implies Gµν = 0. Conversely, if Gµν = 0 we can see from the mixed form
$$G^{\mu}{}_{\nu} = R^{\mu}{}_{\nu} - \tfrac{1}{2}\,\delta^{\mu}{}_{\nu}\,R$$
that R must be zero, because otherwise R^µ_ν would have to equal δ^µ_ν R/2, whose trace is 2R rather than R (except in two dimensions). Consequently, the condition Gµν = 0 is equivalent to Rµν = 0 for empty space, but for coupling with a non-zero source term we must use Gµν to represent the metrical field. To represent the "source term" we will use the covariant energy-momentum tensor Tµν, and regard it as the "cause" of the metric curvature (although one might also conceive of the metric curvature as, in some temporally symmetrical sense, "causing" the energy-momentum). Einstein acknowledged that the introduction of this tensor is not justified by the relativity principle alone, but it has the virtues of being closely analogous to the Poisson equation of Newton's theory, of giving local conservation of energy and momentum, and of implying that gravitational energy gravitates just as does every other form of energy. On this basis we surmise that the field equations coupled to the source term can be written in the form Gµν = kTµν, where k is a constant which must equal 8πG (where G is Newton's gravitational constant) in order for the field equations to reduce to Newton's law in the weak field limit. Thus we have the complete expression of Einstein's metrical law of general relativity
$$R_{\mu\nu} - \tfrac{1}{2}\,g_{\mu\nu}R = 8\pi G\,T_{\mu\nu} \qquad (2)$$
It's worth noting that although the left side of the field equations is quite pure and almost uniquely determined by mathematical requirements, the right side is a hodge-podge of miscellaneous "stuff". As Einstein wrote, The energy tensor can be regarded only as a provisional means of representing matter. In reality, matter consists of electrically charged particles... It is only the circumstance that we have no sufficient knowledge of the electromagnetic field of concentrated charges that compels us, provisionally, to leave undetermined in presenting the theory, the true form of this tensor... The right hand side [of (2)] is a formal condensation of all things whose comprehension in the sense of a field theory is still problematic. Not for a moment... did I doubt that this formulation was merely a makeshift in order to give the general principle of relativity a preliminary closed-form expression. For it was essentially no more than a theory of the gravitational field, which was isolated somewhat artificially from a total field of as yet unknown structure.

Alas, neither Einstein nor anyone since has been able to make further progress in determining the true form of the right hand side of (2), although it is at the heart of current efforts to reconcile quantum mechanics with general relativity. At present we must be content to let Tµν represent, in a vague sort of way, the energy density of the electromagnetic field and matter. A different (but equivalent) form of the field equations can be found by contracting (2) with g^µν to give R − 2R = −R = 8πGT, and then substituting for R in (2) to give
$$R_{\mu\nu} = 8\pi G\left(T_{\mu\nu} - \tfrac{1}{2}\,g_{\mu\nu}T\right) \qquad (3)$$
which again makes clear that the field equations for empty space are simply Rµν = 0. Incidentally, the tensor Gµν was named for Einstein because of his inspired use of it, not because he discovered it. Indeed the vanishing of the covariant derivative of this tensor had been discovered by Aurel Voss in 1880, by Ricci in 1889, and again by Luigi Bianchi in 1902, all apparently independently. Bianchi had once been a student of Felix Klein, so it's not surprising that Klein was able in 1918 to point out regarding the conservation laws in Einstein's theory of gravitation that we need only "make use of the most elementary formulae in the calculus of variations". Recall from Section 5.7 that the Riemann curvature tensor in terms of arbitrary coordinates is
$$R_{abcd} = \tfrac{1}{2}\left(g_{ad,bc} + g_{bc,ad} - g_{ac,bd} - g_{bd,ac}\right) + g_{ef}\left[\Gamma^{e}_{bc}\Gamma^{f}_{ad} - \Gamma^{e}_{bd}\Gamma^{f}_{ac}\right]$$
At the origin of Riemann normal coordinates this reduces to g_ad,cb − g_ac,bd, because in such coordinates the Christoffel symbols are all zero and we have the special symmetry g_ab,cd = g_cd,ab. Now, if we consider partial derivatives (which in these special coordinates are the same as covariant derivatives) of this tensor, we see that the derivative of the quantity in square brackets still vanishes, because the product rule implies that each term is a Christoffel symbol (which vanishes at the origin) times the derivative of a Christoffel symbol. We might also be tempted to take advantage of the special symmetry g_ab,cd = g_cd,ab, but this is not permissible because although the two quantities are equal (at the origin of Riemann normal coordinates), their derivatives are not generally equal. Hence when evaluating the derivatives of the Riemann tensor, even at the origin of Riemann normal coordinates, we must consider all four of the metric tensor derivatives in the above expression. Denoting covariant differentiation with respect to a coordinate x^m by the subscript ;m, we have
$$R_{abcd;m} = \tfrac{1}{2}\left(g_{ad,bcm} + g_{bc,adm} - g_{ac,bdm} - g_{bd,acm}\right)$$
$$R_{abdm;c} = \tfrac{1}{2}\left(g_{am,bdc} + g_{bd,amc} - g_{ad,bmc} - g_{bm,adc}\right)$$
$$R_{abmc;d} = \tfrac{1}{2}\left(g_{ac,bmd} + g_{bm,acd} - g_{am,bcd} - g_{bc,amd}\right)$$
Noting that partial differentiation is commutative, and the metric tensor is symmetrical, we see that the sum of these three tensors vanishes at the origin of Riemann normal coordinates, and therefore with respect to all coordinates. Thus we have the Bianchi identities
$$R_{abcd;m} + R_{abdm;c} + R_{abmc;d} = 0$$
Multiplying through by g^ad g^bc, making use of the symmetries of the Riemann tensor, and the fact that the covariant derivative of the metric tensor vanishes identically, we have
$$g^{ad}g^{bc}\left(R_{abcd;m} + R_{abdm;c} + R_{abmc;d}\right) = R_{;m} - R^{c}{}_{m;c} - R^{d}{}_{m;d} = 0$$
which reduces to
$$R^{c}{}_{m;c} = \tfrac{1}{2}\,R_{;m}$$
Thus we have
$$\left(R^{c}{}_{m} - \tfrac{1}{2}\,\delta^{c}{}_{m}\,R\right)_{;c} = 0$$
showing that the "divergence" of the tensor inside the parentheses (the Einstein tensor) vanishes identically. As an example of how the theory of relativity has influenced mathematics (in appropriate reaction to the obvious influence of mathematics on relativity), in the same year that Einstein, Hilbert, Klein, and others were struggling to understand the conservation laws of the relativistic field equations, Emmy Noether published her famous work on the relation between symmetries and conservation laws, and Klein didn't miss the opportunity to show how Einstein's theory embodied aspects of his Erlangen program. A slight (but significant) extension of the field equations was proposed by Einstein in 1917 based on cosmological considerations, as a means of ensuring stability of a static closed universe. To accomplish this, he introduced a linear term with the cosmological constant λ as follows
$$R_{\mu\nu} - \tfrac{1}{2}\,g_{\mu\nu}R + \lambda\,g_{\mu\nu} = 8\pi G\,T_{\mu\nu}$$
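Incidentally, since the covariant derivative of gµν vanishes identically, the added λgµν term does not disturb the divergence-free property derived above. For readers who want to see that property verified by brute force, the following sketch (an illustrative aside, assuming the Python library sympy, a +--- signature, and a simple test metric with an arbitrary scale factor a(t); none of these choices come from the text) computes the mixed Einstein tensor and confirms that its covariant divergence vanishes identically, as the Bianchi identities require.

    import sympy as sp

    t, x, y, z = sp.symbols('t x y z')
    a = sp.Function('a')(t)                  # arbitrary scale factor (illustrative)
    X = [t, x, y, z]
    g = sp.diag(1, -a**2, -a**2, -a**2)
    ginv = g.inv()
    n = 4

    # Christoffel symbols of the second kind, Gam[i][j][k] = Gamma^i_{jk}
    Gam = [[[sum(ginv[i, m]*(sp.diff(g[m, j], X[k]) + sp.diff(g[m, k], X[j])
            - sp.diff(g[j, k], X[m])) for m in range(n))/2
            for k in range(n)] for j in range(n)] for i in range(n)]

    # Riemann tensor R^i_{jkl}, then Ricci tensor and curvature scalar
    def Rup(i, j, k, l):
        return (sp.diff(Gam[i][l][j], X[k]) - sp.diff(Gam[i][k][j], X[l])
                + sum(Gam[i][k][m]*Gam[m][l][j] - Gam[i][l][m]*Gam[m][k][j]
                      for m in range(n)))

    Ric = sp.Matrix(n, n, lambda j, l: sp.simplify(sum(Rup(i, j, i, l) for i in range(n))))
    Rs = sp.simplify(sum(ginv[i, j]*Ric[i, j] for i in range(n) for j in range(n)))

    # mixed Einstein tensor G^i_j = R^i_j - (1/2) delta^i_j R
    G = sp.Matrix(n, n, lambda i, j: sp.simplify(
        sum(ginv[i, k]*Ric[k, j] for k in range(n))
        - sp.Rational(1, 2)*(1 if i == j else 0)*Rs))

    # covariant divergence of a (1,1) tensor:
    # div_j = d_i G^i_j + Gamma^i_{im} G^m_j - Gamma^m_{ij} G^i_m
    for j in range(n):
        div = (sum(sp.diff(G[i, j], X[i]) for i in range(n))
               + sum(Gam[i][i][m]*G[m, j] for i in range(n) for m in range(n))
               - sum(Gam[m][i][j]*G[i, m] for i in range(n) for m in range(n)))
        print(j, sp.simplify(div))           # prints 0 for every j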
When Hubble and other astronomers began to find evidence that in fact the large-scale universe is expanding, and Einstein realized his ingenious introduction of the cosmological constant had led him away from making such a fantastic prediction, he called it "the biggest blunder of my life". It's worth noting that Einsteinian gravity is possible only in four dimensions, because in any fewer dimensions the vanishing of the Ricci tensor Rµν implies the vanishing of the full Riemann tensor, which means no curvature and therefore no gravity in empty space. Of course, the actual field equations for the vacuum assert that the Einstein tensor (not the Ricci tensor) vanishes, so we should consider the possibility of G being zero while R is non-zero. We saw above that G = 0 implies R = 0, but that was based on the assumption of a four-dimensional manifold. In general for an n-dimensional manifold we have R − (n/2)R = G, so if n is not equal to 2, and if Guv vanishes, we have G = 0 and it follows that R = 0, and therefore Ruv must vanish. However, if n = 2 it is possible for Guv to equal zero even though R is non-zero. Thus, in two dimensions, the vanishing of Guv
does not imply the vanishing of Ruv. In this case we have
$$R_{\mu\nu} = \lambda\,g_{\mu\nu}$$
where λ can be any constant. Multiplying through by guv gives
$$R = 2\lambda$$
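Both of these two-dimensional facts are easy to check for a concrete surface. The sketch below (an illustrative aside assuming sympy; the sphere of radius a is just a convenient test case, not anything from the text) confirms that for a 2-sphere Ruv = (1/a²)guv, so that λ = 1/a², that Guv = Ruv − (1/2)guv R vanishes identically, and that R = 2λ.

    import sympy as sp

    th, ph, a = sp.symbols('theta phi a', positive=True)
    X = [th, ph]
    g = sp.diag(a**2, a**2*sp.sin(th)**2)    # metric of a sphere of radius a
    ginv = g.inv()
    n = 2

    Gam = [[[sum(ginv[i, m]*(sp.diff(g[m, j], X[k]) + sp.diff(g[m, k], X[j])
            - sp.diff(g[j, k], X[m])) for m in range(n))/2
            for k in range(n)] for j in range(n)] for i in range(n)]

    def Rup(i, j, k, l):                     # Riemann tensor R^i_{jkl}
        return (sp.diff(Gam[i][l][j], X[k]) - sp.diff(Gam[i][k][j], X[l])
                + sum(Gam[i][k][m]*Gam[m][l][j] - Gam[i][l][m]*Gam[m][k][j]
                      for m in range(n)))

    Ric = sp.Matrix(n, n, lambda j, l: sp.simplify(sum(Rup(i, j, i, l) for i in range(n))))
    Rs = sp.simplify(sum(ginv[i, j]*Ric[i, j] for i in range(n) for j in range(n)))

    print(sp.simplify(Ric - g/a**2))         # zero matrix: R_uv = (1/a^2) g_uv
    print(sp.simplify(Ric - Rs*g/2))         # zero matrix: G_uv = 0 identically
    print(Rs)                                # 2/a**2, i.e. R = 2*lambda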
This is the vacuum solution of Einstein's field equations in two dimensions. Oddly enough, this is also the vacuum solution for the field equations in four dimensions if λ is identified as the non-zero cosmological constant. Any space of constant curvature is of this form, although a space of this form need not be of constant curvature. Once the field equations have been solved and the metric coefficients have been determined, we then compute the paths of objects by means of the equations of motion. It was originally taken as an axiom that the equations of motion are the geodesic equations of the manifold, but in a series of papers from 1927 to 1949 Einstein and others showed that if particles are treated as singularities in the field, then they must propagate along geodesic paths. Therefore, it is not necessary to make an independent assumption about the equations of motion. This is one of the most remarkable features of Einstein's field equations, and is only possible because of the non-linear nature of the equations. Of course, the hypothesis that particles can be treated as field singularities may seem no more intuitively obvious than the geodesic hypothesis itself. Indeed Einstein himself was usually very opposed to admitting any singularities, so it is somewhat ironic that he took this approach to deriving the equations of motion. On the other hand, in 1939 Fock showed that the field equations imply geodesic paths for any sufficiently small bodies with negligible self-gravity, not treating them as singularities in the field. This approach also suggests that more massive bodies would deviate from geodesics, and it relies on representing matter by the stress-energy tensor, which Einstein always viewed with suspicion. To appreciate the physical significance of the Ricci tensor it's important to be aware of a relation between the contracted Christoffel symbol and the scale factor of the fundamental volume element of the manifold. This relation is based on the fact that if the square matrix A is the inverse of the square matrix B, then the components of A can be expressed in terms of the components of B by the equation A_ij = (∂|B|/∂B_ij)/|B|, where |B| is the determinant of B. Accordingly, since the covariant metric tensor g_µν and the contravariant metric tensor g^µν are matrix inverses of each other, we have
$$g^{\mu\nu} = \frac{1}{g}\,\frac{\partial g}{\partial g_{\mu\nu}}$$
If we multiply both sides by the partial of gµν with respect to the coordinate xα we have
$$g^{\mu\nu}\,\frac{\partial g_{\mu\nu}}{\partial x^{\alpha}} = \frac{1}{g}\,\frac{\partial g}{\partial x^{\alpha}} \qquad (4)$$
Notice that the left hand side looks like part of a Christoffel symbol. Recall the general form of these symbols
$$\Gamma^{c}_{ab} = \tfrac{1}{2}\,g^{c\sigma}\left(\frac{\partial g_{\sigma b}}{\partial x^{a}} + \frac{\partial g_{\sigma a}}{\partial x^{b}} - \frac{\partial g_{ab}}{\partial x^{\sigma}}\right)$$
If we set the upper index of the Christoffel symbol equal to one of the lower indices, say c = a (with summation over a implied), then we have the contracted symbol
$$\Gamma^{a}_{ab} = \tfrac{1}{2}\,g^{a\sigma}\left(\frac{\partial g_{\sigma b}}{\partial x^{a}} + \frac{\partial g_{\sigma a}}{\partial x^{b}} - \frac{\partial g_{ab}}{\partial x^{\sigma}}\right)$$
Since the indices a and σ are both dummies (meaning they each take on all possible values in the implied summation), and since gaσ = gσa, we can swap a and σ in any of the terms without affecting the result. Swapping a and σ in the last term inside the parentheses we see it cancels with the first term, and we're left with
$$\Gamma^{a}_{ab} = \tfrac{1}{2}\,g^{a\sigma}\,\frac{\partial g_{\sigma a}}{\partial x^{b}}$$
Comparing this with our previous result (4), we find that the contracted Christoffel symbol can be written in the form
$$\Gamma^{a}_{ab} = \frac{1}{2g}\,\frac{\partial g}{\partial x^{b}}$$
Furthermore, recalling the elementary fact that the derivative of ln(y) equals 1/y times the derivative of y, and the fact that k ln(y) = ln(y^k), this result can also be written in the form
$$\Gamma^{a}_{ab} = \frac{\partial \ln\sqrt{|g|}}{\partial x^{b}}$$
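Both the matrix-determinant fact used above and this resulting identity can be verified directly. The following sketch (an illustrative aside assuming sympy) first checks that the entries of a matrix inverse are given by (∂|B|/∂B_ij)/|B| for a matrix with independent entries, and then checks Γ^a_ab = ∂ln√|g|/∂x^b for the ordinary spherical spatial metric diag(1, r², r² sin²θ), whose determinant happens to be positive.

    import sympy as sp

    # 1. matrix fact: for a 3x3 matrix with independent entries,
    #    (d det/d B_ij)/det gives the (j,i) entry of the inverse
    #    (the transpose is immaterial for a symmetric metric tensor)
    B = sp.Matrix(3, 3, lambda i, j: sp.Symbol('b%d%d' % (i, j)))
    diff_form = sp.Matrix(3, 3, lambda i, j: B.det().diff(B[i, j]))/B.det()
    print(sp.simplify(B.inv().T - diff_form))      # zero matrix

    # 2. contracted Christoffel symbol for diag(1, r^2, r^2 sin^2 theta);
    #    here det g > 0, so |g| = g
    r, th, ph = sp.symbols('r theta phi', positive=True)
    X = [r, th, ph]
    g = sp.diag(1, r**2, r**2*sp.sin(th)**2)
    ginv = g.inv()
    n = 3
    for b in range(n):
        contracted = sum(sum(ginv[a, m]*(sp.diff(g[m, a], X[b])
                     + sp.diff(g[m, b], X[a]) - sp.diff(g[a, b], X[m]))
                     for m in range(n))/2 for a in range(n))
        identity = sp.diff(sp.log(sp.sqrt(g.det())), X[b])
        print(sp.simplify(contracted - identity))  # 0 for each coordinate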
Since our metrics all have negative determinants, we can replace |g| with -g in these expressions. We're now in a position to evaluate the geometrical and physical significance of the Ricci tensor, the vanishing of which constitutes Einstein's vacuum field equations. The general form of the Ricci tensor is
$$R_{bc} = \frac{\partial \Gamma^{a}_{bc}}{\partial x^{a}} - \frac{\partial \Gamma^{a}_{ab}}{\partial x^{c}} + \Gamma^{a}_{ad}\,\Gamma^{d}_{bc} - \Gamma^{a}_{dc}\,\Gamma^{d}_{ab}$$
which of course is a contraction of the full Riemann curvature tensor. Making use of the preceding identity, this can be written as
$$R_{bc} = -\frac{\partial^{2}\ln\sqrt{-g}}{\partial x^{b}\,\partial x^{c}} + \frac{\partial \ln\sqrt{-g}}{\partial x^{d}}\,\Gamma^{d}_{bc} + \frac{\partial \Gamma^{a}_{bc}}{\partial x^{a}} - \Gamma^{a}_{dc}\,\Gamma^{d}_{ab} \qquad (5)$$
In his original 1916 paper on the general theory Einstein initially selected coordinates such that the metric determinant g was a constant −1, in which case the partial derivatives of ln√(−g) all vanish and the Ricci tensor is simply
$$R_{bc} = \frac{\partial \Gamma^{a}_{bc}}{\partial x^{a}} - \Gamma^{a}_{dc}\,\Gamma^{d}_{ab}$$
The vanishing of this tensor constitutes Einstein's vacuum field equations (1'), provided the coordinates are such that g is constant. Even if g is not constant in terms of the natural coordinates, it is often possible to transform the coordinates so as to make g constant. For example, Schwarzschild replaced the usual r and θ coordinates with x = r³/3 and y = cos(θ), together with the assumption that gtt = 1/grr, and thereby expressed the spherically symmetrical line element in a form with g = −1. It is especially natural to impose the condition of constant g in static systems of coordinates and spatially uniform fields. Indeed, since we spend most of our time suspended quasi-statically in a nearly uniform gravitational field, we are most intuitively familiar with gravity in this form. From this point of view we identify the effects of gravity with the geodesic accelerations relative to our static coordinates, as represented by the Christoffel symbols. Indeed Einstein admitted that he conceptually identified the gravitational field with the Christoffel symbols, despite the fact that it's possible to have non-vanishing Christoffel symbols in flat spacetime, as discussed in Section 5.6. However, we can also take the opposite view. Rather than focusing on "static" coordinate systems with constant metric determinants which make the first two terms of (5) vanish, we can focus on "free-falling" inertial coordinates (also known as Riemann normal coordinates) in terms of which the Christoffel symbols, and therefore the second and fourth terms of (5), vanish at the origin. In other words, we "abstract away" the original sense of gravity as the extrinsic acceleration relative to some physically distinguished system of static coordinates (such as the Schwarzschild coordinates), and focus instead on the intrinsic tidal accelerations (i.e., local geodesic deviations) that correspond to the intrinsic curvature of the manifold. At the origin of Riemann normal coordinates the Ricci tensor
$$R_{bc} = -\frac{\partial^{2}\ln\sqrt{-g}}{\partial x^{b}\,\partial x^{c}} + \frac{\partial \ln\sqrt{-g}}{\partial x^{d}}\,\Gamma^{d}_{bc} + \frac{\partial \Gamma^{a}_{bc}}{\partial x^{a}} - \Gamma^{a}_{dc}\,\Gamma^{d}_{ab}$$
reduces to
$$R_{bc} = -\left(\ln\sqrt{-g}\right)_{,bc} + \Gamma^{a}_{bc,a}$$
where subscripts following commas signify partial derivatives with respect to the designated coordinate. Making use of the skew symmetry on the lower three indices of the Christoffel symbol partial derivatives in these coordinates (as described in Section 5.7), the second term on the right hand side can be replaced with the negative of its two complementary terms given by rotating the lower indices, so we have
$$R_{bc} = -\left(\ln\sqrt{-g}\right)_{,bc} - \Gamma^{a}_{ca,b} - \Gamma^{a}_{ab,c}$$
Noting that each of the three terms on the right side is now a partial derivative of a contracted Christoffel symbol, we have
$$R_{bc} = -3\,\frac{\partial^{2}\ln\sqrt{-g}}{\partial x^{b}\,\partial x^{c}}$$
At the origin of Riemann normal coordinates the first partial derivatives of g, and therefore of √(−g), all vanish, so the chain rule allows us to bring those factors outside the differentiations, and noting the commutativity of partial differentiation we arrive at the expression for the components of the Ricci tensor at the origin of Riemann normal coordinates
$$R_{bc} = -\frac{3}{\sqrt{-g}}\,\frac{\partial^{2}\sqrt{-g}}{\partial x^{b}\,\partial x^{c}}$$
Thus the vacuum field equations Rab = 0 reduce to
$$\frac{\partial^{2}\sqrt{-g}}{\partial x^{b}\,\partial x^{c}} = 0$$
The quantity √(−g) is essentially a scale factor for the incremental volume element V. In fact, for any scalar field Φ we have
$$\int \Phi\,dV = \int \Phi\,\sqrt{-g}\;dx^{1}\,dx^{2}\,dx^{3}\,dx^{4}$$
and taking Φ=1 gives the simple volume. Therefore, at the origin of Riemann normal (free-falling inertial) coordinates we find that the components of the Ricci tensor Rab are
$$R_{ab} = -\frac{3}{V}\,\frac{\partial^{2}V}{\partial x^{a}\,\partial x^{b}}$$
simply the second derivatives of the proper volume of an incremental volume element, divided by that volume itself. Hence the vacuum field equations Rab = 0 simply express the vanishing of these second derivatives with respect to any two coordinates (not necessarily distinct). Likewise the "complete" field equations in the form of (3) signify that three times the second derivatives of the volume, divided by the volume, equal the corresponding components of the "divergence-free" energy-momentum tensor expressed by the right hand side of (3). In physical terms this implies that a small cloud of free-falling dust particles initially at rest with respect to each other does not change its volume during an incremental advance of proper time. Of course, this doesn't give a complete description of the effects of gravity in a typical gravitational field, because although the volume of the cloud isn't changing at this instant, its shape may be changing due to tidal acceleration. In a spherically symmetrical field the cloud will become lengthened in the radial direction and shortened in the normal directions. This variation in the shape is characterized by the Weyl tensor, which in general may be non-zero even when the Ricci tensor vanishes. It may seem that conceiving of gravity purely as tidal effect ignores what is usually the most physically obvious manifestation of gravity, namely, the tendency of objects to "fall down", i.e., the acceleration of the geodesics relative to our usual static coordinates near a gravitating body. However, in most cases this too can be viewed as tidal accelerations, provided we take a wider view of events. For example, the fall of a single apple to the ground at one location on Earth can be transformed away (locally) by a suitable system of accelerating coordinates, but the fall of apples all over the Earth cannot. In effect these apples can be seen as a spherical cloud of dust particles, each following a geodesic path, and those paths are converging and the cloud's volume is shrinking at an accelerating rate as the shell collapses toward the Earth. The rate of acceleration (i.e., the second derivative with respect to time) is proportional to the mass of the Earth, in accord with the field equations.

5.5 The Schwarzschild Metric From Kepler's 3rd Law

In that same year [1665] I began to think of gravity extending to the orb of the Moon & from Kepler’s rule of the periodical times of the Planets being in sesquialterate proportion of their distances from the centers of their Orbs, I deduced that the forces which keep the Planets in their Orbs must be reciprocally as the squares of their distances from the centers about which they revolve: and thereby compared the force requisite to keep the Moon in her Orb with the force of gravity at the surface of the earth, and found them answer pretty nearly.

Isaac Newton

The first and still the most important rigorous solution of the Einstein field equations was found by Schwarzschild in 1916. Although it's quite difficult to find exact analytical solutions of the complete field equations for general situations, the task is immensely simplified if we restrict our attention to highly symmetrical physical configurations. For example, it's obvious that the flat Minkowski metric trivially satisfies the field equations. The simplest non-trivial configuration in which gravity plays a role is a static mass point,
for which we can assume the metric has perfect spherical symmetry and is independent of time. Let r denote the radial spatial coordinate, so that every point on a surface of constant r has the same intrinsic geometry and the same relation to the mass point, which we fix at r = 0. Also, let t denote our temporal coordinate. Any surface of constant r and t must possess the two-dimensional intrinsic geometry of a 2-sphere, and we can scale the radial parameter r such that the area of this surface is 4π r2. (Notice that since the space may not be Euclidean, we don't claim that r is "the radial distance" from the mass point. Rather, at this stage r is simply an arbitrary radial coordinate scaled to give the familiar Euclidean surface area.) With this scaling, we can parameterize the two-dimensional surface at any given r (and t) by means of the ordinary "longitude and latitude" spherical metric
$$(dS)^{2} = r^{2}\left[(d\theta)^{2} + \sin^{2}(\theta)\,(d\phi)^{2}\right]$$
where dS is the incremental distance on the surface of an ordinary sphere of radius r corresponding to the incremental coordinate displacements dθ and dϕ. The coordinate θ represents "latitude", with θ = 0 at the north pole and θ = π/2 at the equator. The coordinate ϕ represents the longitude relative to some arbitrary meridian. On this basis, we can say that the complete spacetime metric near a spherically symmetrical mass m must be of the form
$$(d\tau)^{2} = g_{tt}\,(dt)^{2} - g_{rr}\,(dr)^{2} - g_{\theta\theta}\,(d\theta)^{2} - g_{\phi\phi}\,(d\phi)^{2}$$
where gθθ = r2, gϕϕ = r2 sin(θ)2, and gtt and grr are (as yet) unknown functions of r and the central mass m. Of course, if we set m = 0 the functions gtt and grr must both equal 1 in order to give the flat Minkowski metric (in polar form), and we also expect that as r increases to infinity these functions both approach 1, regardless of m, since we expect the metric to approach flatness sufficiently far from the gravitating mass. This metric is diagonal, so the non-zero components of the contravariant metric tensor are gαα = 1/gαα. In addition, the diagonality of the metric allows us to simplify the definition of the Christoffel symbols to
$$\Gamma^{a}_{bc} = \frac{1}{2\,g_{aa}}\left(\frac{\partial g_{ab}}{\partial x^{c}} + \frac{\partial g_{ac}}{\partial x^{b}} - \frac{\partial g_{bc}}{\partial x^{a}}\right) \qquad \text{(no summation over } a\text{)}$$
Now, the only non-zero partial derivatives of the metric coefficients are
$$\frac{\partial g_{\theta\theta}}{\partial r} = 2r, \qquad \frac{\partial g_{\phi\phi}}{\partial r} = 2r\,\sin^{2}(\theta), \qquad \frac{\partial g_{\phi\phi}}{\partial \theta} = 2r^{2}\sin(\theta)\cos(\theta)$$
along with gtt/dr and grr/dr, which are yet to be determined. Inserting these values into

the preceding equation, we find that the only non-zero Christoffel symbols are
$$\Gamma^{t}_{tr} = \frac{1}{2g_{tt}}\frac{\partial g_{tt}}{\partial r}, \qquad \Gamma^{r}_{tt} = \frac{1}{2g_{rr}}\frac{\partial g_{tt}}{\partial r}, \qquad \Gamma^{r}_{rr} = \frac{1}{2g_{rr}}\frac{\partial g_{rr}}{\partial r}, \qquad \Gamma^{r}_{\theta\theta} = -\frac{r}{g_{rr}}$$
$$\Gamma^{r}_{\phi\phi} = -\frac{r\,\sin^{2}(\theta)}{g_{rr}}, \qquad \Gamma^{\theta}_{r\theta} = \Gamma^{\phi}_{r\phi} = \frac{1}{r}, \qquad \Gamma^{\theta}_{\phi\phi} = -\sin(\theta)\cos(\theta), \qquad \Gamma^{\phi}_{\theta\phi} = \cot(\theta)$$
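These symbols are tedious to compute by hand, so here is a small sketch that recomputes them directly from the assumed metric (an illustrative aside assuming sympy; the names g_tt and g_rr are placeholders for the still-undetermined coefficients). Its output should reproduce the list above.

    import sympy as sp

    t, r, th, ph = sp.symbols('t r theta phi')
    gtt = sp.Function('g_tt')(r)          # unknown metric coefficients
    grr = sp.Function('g_rr')(r)
    X = [t, r, th, ph]
    g = sp.diag(gtt, -grr, -r**2, -r**2*sp.sin(th)**2)
    ginv = g.inv()
    n = 4
    names = ['t', 'r', 'th', 'ph']

    for i in range(n):
        for j in range(n):
            for k in range(j, n):         # lower indices are symmetric
                G = sp.simplify(sum(ginv[i, m]*(sp.diff(g[m, j], X[k])
                    + sp.diff(g[m, k], X[j]) - sp.diff(g[j, k], X[m]))
                    for m in range(n))/2)
                if G != 0:
                    print('Gamma^%s_%s%s =' % (names[i], names[j], names[k]), G)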
These are the coefficients of the four geodesic equations near a spherically symmetrical mass. We assume that, in the absence of non-gravitational forces, all natural motions (including light rays and massive particles) follow geodesic paths, so these equations provide a complete description of inertial/gravitational motions of test particles in a spherically symmetrical field. All that remains is to determine the metric coefficients gtt and grr. We expect that one possible solution should be circular Keplerian orbits, i.e., if we regard r as corresponding (at least approximately) to the Newtonian radial distance from the center of the mass, then there should be a circular geodesic path at constant r that revolves around the central mass m with an angular velocity of ω, and these quantities must be related (at least approximately) in accord with Kepler's third law
$$\omega^{2} = \frac{m}{r^{3}}$$
(The original deductions of an inverse-square law of gravitation by Hooke, Wren, Newton, and others were all based on this same empirical law. See Section 8.1 for a discussion of the origin of Kepler's law.) If we consider purely circular motion on the equatorial plane (θ = π/2) at constant r, the metric reduces to
$$(d\tau)^{2} = g_{tt}\,(dt)^{2} - r^{2}\,(d\phi)^{2}$$
and since dr/dτ = 0 the geodesic equations are simply
$$\frac{d^{2}t}{d\tau^{2}} = 0, \qquad \frac{1}{2}\frac{\partial g_{tt}}{\partial r}\left(\frac{dt}{d\tau}\right)^{2} - r\left(\frac{d\phi}{d\tau}\right)^{2} = 0$$
Multiplying through by (dτ/dt)2 and identifying the angular speed ω with the derivative of ϕ with respect to the coordinate time t, the right hand equation becomes
$$\frac{1}{2}\,\frac{\partial g_{tt}}{\partial r} = r\,\omega^{2}$$
For consistency with Kepler's Third Law we must have ω2 equal (or very nearly equal) to m/r3, so we make this substitution to give
$$\frac{\partial g_{tt}}{\partial r} = \frac{2m}{r^{2}}$$
Integrating this equation, we find that the metric coefficient gtt must be of the form k − (2m/r) where k is a constant of integration. Since gtt must equal 1 when m = 0 and/or as r approaches infinity, it's clear that k = 1, so we have
$$g_{tt} = 1 - \frac{2m}{r}$$
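The integration step is elementary, but for completeness here is a sketch (assuming sympy) that solves ∂gtt/∂r = 2m/r² and applies the boundary condition gtt → 1 as r → ∞:

    import sympy as sp

    r, m = sp.symbols('r m', positive=True)
    gtt = sp.Function('g_tt')
    sol = sp.dsolve(sp.Eq(gtt(r).diff(r), 2*m/r**2), gtt(r))
    # sol is g_tt(r) = C1 - 2*m/r; the condition g_tt -> 1 as r -> oo fixes C1
    C1 = sp.Symbol('C1')
    k = sp.solve(sp.Eq(sp.limit(sol.rhs, r, sp.oo), 1), C1)[0]
    print(sol.rhs.subs(C1, k))            # 1 - 2*m/r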
Also, for a photon moving away from the gravitating mass in the purely radial direction we have dτ = 0, and so our basic metric for a purely radial ray of light gives
$$\left(\frac{dr}{dt}\right)^{2} = \frac{g_{tt}}{g_{rr}}$$
Invoking the symmetry v ↔ 1/v, we select the factorization gtt = dr/dt and grr = dt/dr, which implies grr = 1/gtt. This gives the complete Schwarzschild metric
$$(d\tau)^{2} = \left(1 - \frac{2m}{r}\right)(dt)^{2} - \frac{(dr)^{2}}{1 - 2m/r} - r^{2}\,(d\theta)^{2} - r^{2}\sin^{2}(\theta)\,(d\phi)^{2}$$
from which nearly all of the experimentally accessible consequences of general relativity follow. In matrix form the Schwarzschild metric is written as
$$g_{\mu\nu} = \begin{pmatrix} 1 - \dfrac{2m}{r} & 0 & 0 & 0 \\ 0 & -\dfrac{1}{1 - 2m/r} & 0 & 0 \\ 0 & 0 & -r^{2} & 0 \\ 0 & 0 & 0 & -r^{2}\sin^{2}(\theta) \end{pmatrix}$$
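As a consistency check, the sketch below (an illustrative aside assuming sympy) computes the full Ricci tensor of this metric by brute force and confirms that every component vanishes, i.e., that the Schwarzschild metric does satisfy the vacuum field equations Rµν = 0.

    import sympy as sp

    t, r, th, ph, m = sp.symbols('t r theta phi m', positive=True)
    X = [t, r, th, ph]
    f = 1 - 2*m/r
    g = sp.diag(f, -1/f, -r**2, -r**2*sp.sin(th)**2)
    ginv = g.inv()
    n = 4

    Gam = [[[sp.simplify(sum(ginv[i, a]*(sp.diff(g[a, j], X[k])
            + sp.diff(g[a, k], X[j]) - sp.diff(g[j, k], X[a]))
            for a in range(n))/2)
            for k in range(n)] for j in range(n)] for i in range(n)]

    def Rup(i, j, k, l):                  # Riemann tensor R^i_{jkl}
        return (sp.diff(Gam[i][l][j], X[k]) - sp.diff(Gam[i][k][j], X[l])
                + sum(Gam[i][k][a]*Gam[a][l][j] - Gam[i][l][a]*Gam[a][k][j]
                      for a in range(n)))

    Ric = sp.Matrix(n, n, lambda j, l: sp.simplify(sum(Rup(i, j, i, l) for i in range(n))))
    print(Ric)                            # the zero matrix: R_uv = 0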
Now that we've determined gtt and grr, we have the partials
$$\frac{\partial g_{tt}}{\partial r} = \frac{2m}{r^{2}}, \qquad \frac{\partial g_{rr}}{\partial r} = -\frac{2m}{r^{2}\left(1 - 2m/r\right)^{2}}$$
so the Christoffel symbols that we previously left undetermined are
$$\Gamma^{t}_{tr} = \frac{m}{r^{2}\,(1 - 2m/r)}, \qquad \Gamma^{r}_{tt} = \frac{m}{r^{2}}\left(1 - \frac{2m}{r}\right), \qquad \Gamma^{r}_{rr} = -\frac{m}{r^{2}\,(1 - 2m/r)}$$
Therefore, the complete set of geodesic equations for the Schwarzschild metric are
$$\frac{d^{2}t}{d\lambda^{2}} + \frac{2m/r^{2}}{1 - 2m/r}\,\frac{dt}{d\lambda}\,\frac{dr}{d\lambda} = 0$$

$$\frac{d^{2}r}{d\lambda^{2}} + \frac{m}{r^{2}}\left(1 - \frac{2m}{r}\right)\left(\frac{dt}{d\lambda}\right)^{2} - \frac{m/r^{2}}{1 - 2m/r}\left(\frac{dr}{d\lambda}\right)^{2} - r\left(1 - \frac{2m}{r}\right)\left[\left(\frac{d\theta}{d\lambda}\right)^{2} + \sin^{2}(\theta)\left(\frac{d\phi}{d\lambda}\right)^{2}\right] = 0$$

$$\frac{d^{2}\theta}{d\lambda^{2}} + \frac{2}{r}\,\frac{d\theta}{d\lambda}\,\frac{dr}{d\lambda} - \sin(\theta)\cos(\theta)\left(\frac{d\phi}{d\lambda}\right)^{2} = 0$$

$$\frac{d^{2}\phi}{d\lambda^{2}} + \frac{2}{r}\,\frac{d\phi}{d\lambda}\,\frac{dr}{d\lambda} + 2\cot(\theta)\,\frac{d\theta}{d\lambda}\,\frac{d\phi}{d\lambda} = 0$$
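As a check on the circular-orbit reasoning used earlier, the following sketch (an illustrative aside assuming sympy) substitutes the path t = λ, r = R, θ = π/2, ϕ = ωλ into all four geodesic equations. The only non-trivial residual is the radial one, and it vanishes exactly when ω² = m/R³, confirming that Kepler's third law holds exactly in terms of Schwarzschild coordinate time.

    import sympy as sp

    lam, m, R, w = sp.symbols('lambda m R omega', positive=True)
    t, r, th, ph = sp.symbols('t r theta phi')
    X = [t, r, th, ph]
    f = 1 - 2*m/r
    g = sp.diag(f, -1/f, -r**2, -r**2*sp.sin(th)**2)
    ginv = g.inv()
    n = 4

    Gam = [[[sum(ginv[i, a]*(sp.diff(g[a, j], X[k]) + sp.diff(g[a, k], X[j])
            - sp.diff(g[j, k], X[a])) for a in range(n))/2
            for k in range(n)] for j in range(n)] for i in range(n)]

    path = [lam, R, sp.pi/2, w*lam]           # t, r, theta, phi along the orbit
    vel = [sp.diff(p, lam) for p in path]     # [1, 0, 0, omega]

    for i in range(n):
        res = sp.diff(vel[i], lam) + sum(
            Gam[i][j][k].subs(dict(zip(X, path)))*vel[j]*vel[k]
            for j in range(n) for k in range(n))
        print(i, sp.factor(sp.simplify(res)))
    # the radial residual is (1 - 2m/R)*(m/R**2 - R*omega**2), which
    # vanishes precisely when omega**2 = m/R**3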
These four geodesic equations are all parametric, with λ denoting a parameter that monotonically varies along the path. When dealing with massive particles, which travel at sub-light speeds, we must choose λ proportional to τ, the integrated lapse of proper time along the path. On the other hand, the lapse of proper time along the path of a massless particle (such as a photon) is zero by definition, so this raises an interesting question: How is it possible to extremize the “length” of a path whose length is identically zero? Even though the path of a photon has singular proper time, the path is not singular in all respects, so we can still parameterize the path by simply assigning monotonic values of λ to the points on the path. (Notice that, since geodesics are directionally symmetrical, it doesn’t matter whether λ is increasing or decreasing in the direction of travel.) An alternative approach to solving for light-like geodesics, based on Fermat’s principle of least time, will be discussed in Section 8.4. We applied Kepler's Third Law as a heuristic guide to these equations of motion, but there is a certain ambiguity in the derivation, due to the distinction between coordinate time t and the orbiting object's proper time τ. Recall that we defined the angular speed ω of the orbit as dϕ/dt rather than dϕ/dτ. This illustrates the unavoidable ambiguity in

carrying over Newtonian laws of mechanics to the relativistic framework. Newtonian physics didn't distinguish between the proper time along a particular path and coordinate time - not surprisingly - since the two are practically indistinguishable for objects moving at much less than the speed of light. Nevertheless, the slight deviation between these two time parameters has observable consequences, and provides important tests for distinguishing between the spacetime geodesic approach and the Newtonian force-at-a-distance approach to gravitation. We've assumed that Kepler's Third law is exactly satisfied with respect to coordinate time t, but only approximately with respect to the orbiting object's proper time τ. It's interesting that the Newtonian free-fall formulas for purely radial paths are also applicable exactly in relativity, but only if time is interpreted as the proper time of the falling particle. Thus we can claim an exact correspondence between Newtonian and relativistic laws in each of these two fundamental cases by a suitable correspondence of the time coordinates, but no single correspondence works for both of them. To show that the equations of motion derived above (taking τ as the parameter λ) are fully equivalent to those of Newtonian gravity in the weak-field, slow-motion limit, we need only note that the scale factor between r and t is so great that we can neglect any terms that have a factor of dr/dt unless that term is also divided by r, in which case the scale factor cancels out. Also we can assume that dt/dτ is essentially equal to 1, and it's easy to see that if the motion of a test particle is initially in the plane θ = π/2 then it remains always in that plane, and by spherical symmetry this applies to all planes. So we can assume θ = π/2 and with the stated approximations the equations of motion reduce to the familiar Newtonian equations
$$\frac{d^{2}r}{dt^{2}} = r\,\omega^{2} - \frac{m}{r^{2}}, \qquad \frac{d}{dt}\left(r^{2}\omega\right) = 0$$
where ω is the angular velocity.

5.6 The Equivalence Principle

The important thing is this: to be able at any moment to sacrifice what we are for what we could become.

Charles du Bois

At the end of a review article on special relativity in 1907, in which he surveyed the stunning range and power of his unique relativistic interpretation, Einstein included a section discussing the possibility of extending the idea still further. So far we have applied the principle of relativity, i.e., the assumption that physical laws are independent of the state of motion of the reference system, only to unaccelerated reference systems. Is it conceivable that the principle of relativity also applies to systems that are accelerated relative to each other?

This might have been regarded as merely a kinematic question, with no new physical
content, since we can obviously re-formulate physical laws to make them applicable in terms of alternative systems of coordinates. However, as Einstein later recalled, the thought occurred to him while writing this paper that a person in gravitational free-fall doesn’t feel their own weight. It’s as if the gravitational field does not exist. This is remarkably similar to Galileo’s realization (three centuries earlier) that, for a person in uniform motion, it is as if the motion does not exist. Interestingly, Galileo is also closely associated with the fact that a (homogeneous) gravitational field can be “transformed away” by a state of motion, because he was among the first to explicitly recognize the equality of inertial and gravitational mass. As a consequence of this equality, the free-fall path of a small test particle in a gravitational field is independent of the particle's composition. If we consider two coordinate systems S1 and S2, the first accelerating (in empty space) at a rate γ in the x direction, and the second at rest in a homogeneous gravitational field that imparts to all objects an acceleration of –γ in the x direction, then Einstein observed that …as far as we know, the physical laws with respect to the S1 system do not differ from those with respect to the S2 system… we shall therefore assume the complete physical equivalence of a gravitational field and a corresponding acceleration of the reference system.

This was the beginning of Einstein’s search for an extension of the principle of relativity to arbitrary coordinate systems, and for a satisfactory relativistic theory of gravity, a search which ultimately led him to reject special relativity as a suitable framework in which to formulate the most fundamental physical laws. Despite the importance that Einstein attached to the equivalence principle (even stating that the general theory of relativity “rests exclusively on this principle”), many subsequent authors have challenged its significance, and even its validity. For example, Ohanian and Ruffini (1994) emphatically assert that “gravitational effects are not equivalent to the effects arising from an observer's acceleration...", even limited to sufficiently small regions. In support of this assertion they describe how accelerometers “of arbitrarily small size” can detect tidal variations in a non-homogeneous gravitational field based on “local” measurements. Unfortunately they overlook the significance of their own comment regarding gradiometers that “the sensitivity attained depends on the integration time… with a typical integration time of 10 seconds the sensitivity demonstrated in a recent test was about the same as that of the Eötvös balance…”. Needless to say, the “locality” restriction refers to sufficiently small regions of spacetime, not just to small regions of space. The gradiometer may be only a fraction of a meter in spatial extent, but 10 seconds of temporal extent corresponds to three billion meters, which somewhat undermines the claim that the detection can be performed with such accuracy in an arbitrarily small region of spacetime. The same kind of conceptual error appears in every example that purports to show the invalidity of the equivalence principle. For example, one well-known modern author points out that an arbitrarily small droplet of liquid falling freely in the gravitational field of a spherical body (neglecting surface tension and wind resistance, etc) will not be perfectly spherical, but will be slightly ellipsoidal, due to the tidal effects of the inhomogeneous field… and the shape does not approach sphericity as the radius of the
droplet approaches zero. Furthermore, this applies to an arbitrarily brief “snapshot” of the falling droplet. He takes this to be proof of the falsity of the equivalence principle, whereas in fact it is just the opposite. If we began with a perfectly spherical droplet, it would take a significant amount of time traversing an inhomogeneous field for the shape to acquire its final ellipsoidal form, and as the length of time goes to zero, the deviation from sphericity also goes to zero. Likewise, once the droplet has acquired its ellipsoidal shape, that becomes its initial configuration upon entering any brief “snapshot”, and of course it departs from that snapshot with the same shape, in perfect agreement with the equivalence principle, which tells us to expect all the parts of the droplet to maintain their initial mutual relations when in free fall. Other authors have challenged the validity of the equivalence principle by considering the effects of rotation. Of course, a "sufficiently small" region of spacetime for transforming away the translatory motion of an object to some degree of approximation may not be sufficiently small for transforming away the rotational motion to the same degree of accuracy, but this does not conflict with the equivalence principle; it simply means that for an infinitesimal particle in a rotating body the "sufficiently small" region of spacetime is generally much smaller than for a particle in a non-rotating body, because it must be limited to a small arc of angular travel. In general, all such arguments against the validity of the (local) equivalence principle are misguided, based on a failure to correctly limit the extent of the subject region of space and time. Others have argued that, although the equivalence principle is valid for infinitesimal regions of spacetime, this limitation renders it more or less meaningless. But this was answered by Einstein himself several times. For example, when the validity of the equivalence principle was challenged on the grounds that an arbitrary (inhomogeneous) gravitational field over some finite region cannot be “transformed away” by any particular state of motion, Einstein replied To achieve the essential equivalence of inertia and gravitation it is not necessary that the mechanical behavior of two or more masses must be explainable by the mere effect of inertia by the same choice of coordinates. After all, nobody denies, for example, that the theory of special relativity does justice to the nature of uniform motion, even though it cannot transform all acceleration-free bodies together to a state of rest by one and the same choice of coordinates.

This observation should have settled the matter, but unfortunately the same specious objection to the equivalence principle has been raised by successive generations of critics. This is ironic, considering a purely geometrical interpretation of gravity would clearly be impossible if gravitational and inertial acceleration were not intrinsically identical. The meaning of the equivalence principle (which Einstein called “the happiest thought of my life”) is that gravitation is not something that exists within spacetime, but is rather an attribute of spacetime. Inertial motion is just a special case of free-fall in a gravitational field. There is no additional entity or coupling present to produce the effects of gravity on a test body. Gravity is geometry. This may be expressed somewhat informally by saying that if we take sufficiently small pieces of curved and flat spacetime we can't tell one from the other, because they are the same stuff. The perfect equivalence between gravitational and inertial mass noted by Galileo implies that kinematic
acceleration and the acceleration of gravity are intrinsically identical, and this makes possible a purely geometrical interpretation of gravity. At the beginning of his 1916 paper on the foundations of the general theory of relativity, Einstein discussed “the need for an extension of the postulate of relativity”, and by considering the description of a physical object in terms of a rotating system of coordinates he explained why Euclidean geometry does not apply. This is the most common way of justifying the abandonment of Euclidean geometry, but in a paper written in 1914 Einstein gave a more elementary and (arguably) more profound reason for turning from Euclidean to Riemannian geometry. He pointed out that, prior to Faraday and Maxwell, the fundamental laws of physics contained finite distances, such as the distance r in Coulomb’s inverse-square law for the electric force F = q₁q₂/r². Euclidean geometry is the appropriate framework in which to represent such laws, because it is an axiomatic structure based on finite distances, as can be seen from propositions such as the Pythagorean theorem r₁² = r₂² + r₃², where r₁, r₂, r₃ are the finite lengths of the edges of a right triangle. However, Einstein wrote Since Maxwell, and by his work, physics has undergone a fundamental revision insofar as the demand gradually prevailed that distances of points at a finite range should not occur in the elementary laws, i.e., theories of “action at a distance” are now replaced by theories of “local action”. One forgot in this process that the Euclidean geometry too – as it is used in physics – consists of physical theorems that, from a physical aspect, are on an equal footing with the integral laws of Newtonian mechanics of points. In my opinion this is an inconsistent attitude of which we should free ourselves.

In other words, when “action at a distance” theories were replaced by “local action” theories, such as Maxwell’s differential equations for the electromagnetic field, in which only differentials of distance and time appear, we should have, for consistency, replaced the finite distances of Euclidean geometry with the differentials of Riemannian geometry. Thus the only valid form of the Pythagorean theorem is the differential form ds² = dx² + dy². Einstein then commented that it is rather unnatural, having taken this step, to insist that the coefficients of the squared differentials must be constant, i.e., that the Riemann-Christoffel curvature tensor must vanish. Hence we should regard Riemannian geometry rather than Euclidean geometry as the natural framework in which to formulate the elementary laws of physics. From these considerations it follows rather directly that the influence of both inertia and gravitation on a particle should be expressed by the geodesic equations of motion
$$\frac{d^{2}x^{\mu}}{ds^{2}} + \Gamma^{\mu}_{\alpha\beta}\,\frac{dx^{\alpha}}{ds}\,\frac{dx^{\beta}}{ds} = 0 \qquad (1)$$
Einstein often spoke of the first term as representing the inertial part, and the second term, with the Christoffel symbols Γµαβ, as representing the gravitational field, and he was criticized for this, because the Christoffel symbols are not tensors, and they can be nonzero in perfectly flat spacetime simply by virtue of curvilinear coordinates. To illustrate, consider a flat plane with either Cartesian coordinates x,y or polar coordinates r,θ as
shown below.

[Figure: a flat plane overlaid with a Cartesian x,y grid and a polar r,θ grid]
With respect to the Cartesian coordinates we have the familiar Pythagorean line element (ds)² = (dx)² + (dy)². Also, we know the polar coordinates are related to the Cartesian coordinates by the equations x = r cos(θ) and y = r sin(θ), so we can evaluate the differentials
$$dx = \cos(\theta)\,dr - r\,\sin(\theta)\,d\theta, \qquad dy = \sin(\theta)\,dr + r\,\cos(\theta)\,d\theta$$
which of course are the transformation equations for the covariant metric tensor. Substituting these differentials into the Pythagorean metric equation, we have the metric for polar coordinates (ds)² = (dr)² + r²(dθ)². Therefore, the covariant and contravariant metric tensors for these polar coordinates are
$$g_{\mu\nu} = \begin{pmatrix} 1 & 0 \\ 0 & r^{2} \end{pmatrix}, \qquad g^{\mu\nu} = \begin{pmatrix} 1 & 0 \\ 0 & 1/r^{2} \end{pmatrix}$$
and we have the determinant g = r². The only non-zero partial derivative of the covariant metric components is ∂gθθ/∂r = 2r, so the only non-zero Christoffel symbols are Γ^r_θθ = −r and Γ^θ_θr = Γ^θ_rθ = 1/r. Inserting these values into (1) gives the geodesic equations for this surface
$$\frac{d^{2}r}{ds^{2}} - r\left(\frac{d\theta}{ds}\right)^{2} = 0, \qquad \frac{d^{2}\theta}{ds^{2}} + \frac{2}{r}\,\frac{dr}{ds}\,\frac{d\theta}{ds} = 0$$
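These equations can also be integrated numerically to make the point vivid. The sketch below (an illustrative aside assuming numpy and scipy) starts at r = 1 heading "straight in the θ direction" and integrates the geodesic equations; converting the resulting r(s) and θ(s) back to Cartesian coordinates shows that the path is the straight line x = 1, even though neither polar coordinate varies linearly along it.

    import numpy as np
    from scipy.integrate import solve_ivp

    def geodesic(s, y):
        r, th, rp, thp = y
        return [rp, thp, r*thp**2, -2*rp*thp/r]

    # unit-speed start at r = 1, theta = 0, with velocity purely tangential
    sol = solve_ivp(geodesic, [0, 2], [1.0, 0.0, 0.0, 1.0],
                    dense_output=True, rtol=1e-10, atol=1e-12)
    s = np.linspace(0, 2, 9)
    r, th = sol.sol(s)[0], sol.sol(s)[1]
    print(np.round(r*np.cos(th), 6))   # all 1.0: the geodesic is the line x = 1
    print(np.round(r*np.sin(th), 6))   # equals s: traversed at unit speed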
Since we know this surface is a flat plane, the geodesic curves must be simply straight lines, and indeed it's clear from these equations that any purely radial path (for which dθ /ds = 0) is a geodesic. However, paths going "straight" in the θ direction (at constant r) are not geodesics, and these equations describe how the coordinates must vary along any given trajectory in order to maintain a geodesic path on the plane. Of course, if we insert
these polar metric components into Gauss's curvature formula we get K = 0, consistent with the fact that the surface is flat. The reason the geodesics on this surface are not simple linear functions of the coordinates is not because the geodesics are curved, but because the coordinates are curved. Hence it cannot be strictly correct to identify the second term (or the Christoffel symbols) as the components of a gravitational field. As early as 1916 Einstein was criticized for referring to the Christoffel symbols as the components of the gravitational field. In response to a paper by Friedrich Kottler, Einstein wrote Kottler censures that I interpret the second term in the equations of motion as an expression of the influence of the gravitational field upon the mass point, and the first term more or less as the expression of the Galilean inertia. Allegedly this would introduce real forces of the gravitational field and this would not comply with the spirit of the equivalence principle. My answer to this is that this equation as a whole is generally covariant, and therefore is quite in compliance with the hypothesis of covariance. The naming of the parts, which I have introduced, is in principle meaningless and only meant to appeal to our physical habit of thinking… that is why I introduced these quantities even though they do not have tensorial character. The principle of equivalence, however, is always satisfied when equations are covariant.

To some extent, Einstein side-stepped the criticism, because he actually did regard the Christoffel symbols as, in some sense, representing “true” gravity, even in flat spacetime. The "correct" classroom view today is that gravity is present only when intrinsic curvature is present, but it is actually not so easy to characterize the presence or absence of “gravity” in general relativity, especially because the flat metric of spacetime can be regarded as a special case of a gravitational field, rather than the absence of a gravitational field. This is the point of view that Einstein maintained throughout his life, to the consternation of some school teachers. Consider again the flat two-dimensional space discussed above, and imagine some creatures living on a small region of this plane, and suppose they are under the impression that the constant-r and constant-θ loci are “straight”. They would have to conclude that the geodesic paths were curved, and that objects which naturally follow those paths are being influenced by some "force field". This is exactly analogous to someone in an upwardly accelerating elevator in empty space (i.e., far from any gravitating body). In terms of a coordinate system co-moving with the elevator, the natural paths of things are different than they would normally be, as if those objects were being influenced by an additional force field. This is exactly analogous to the perceptions of the creatures on our flat plane, except that it is their θ axis which is non-linear, whereas our elevator's t axis is non-linear. Inside the accelerating elevator the additional tendency for geodesic paths to "veer off" is not really due to any extra non-linearity of the geodesics, it's due to the non-linearity of the elevator's coordinate system. Hence most people today would say that non-zero Christoffel symbols, by themselves, should not be regarded as indicative of the presence of "true" gravity. If the intrinsic curvature is zero, then non-vanishing Christoffel symbols simply represent the necessary compensation for non-linear coordinates, so, at most (the argument goes) they represent "pseudo-gravity" rather than “true gravity” in such circumstances.

But the distinction between “pseudo-gravity” and “true gravity” is precisely what Einstein denied. The equivalence principle asserts that these are intrinsically identical. Einstein’s point hasn't been fully appreciated by some subsequent writers of relativity text books. In a letter to his friend Max von Laue in 1950 he tried to explain: ...what characterizes the existence of a gravitational field from the empirical standpoint is the nonvanishing of the Γlik, not the non-vanishing of the [curvature]. If one does not think intuitively in such a way, one cannot grasp why something like a curvature should have anything at all to do with gravitation. In any case, no reasonable person would have hit upon such a thing. The key for the understanding of the equality of inertial and gravitational mass is missing.

The point of the equivalence principle is that curving coordinates are gravitation, and there is no intrinsic ontological difference between “true gravity” and “pseudo-gravity”. On a purely local (infinitesimal) basis, the phenomena of gravity and acceleration were, in Einstein's view, quite analogous to the electric and magnetic fields in the context of special relativity, i.e., they are two ways of looking at (or interpreting) the same thing, in terms of different coordinate systems. Now, it can be argued that there are clear physical differences between electricity and magnetism (e.g., no magnetic monopoles) and how they are "produced" by elementary particle "sources", but one of the keys to the success of special relativity was that it unified the electric and magnetic fields in free space without getting bogged down (as Lorentz did) in trying to fathom the ultimate constituency of elementary charged particles, etc. Likewise, general relativity unifies gravity and non-linear coordinates - including acceleration and polar coordinates - in free space, without getting bogged down in the "source" side of the equation, i.e., the fundamental nature of how gravity is ultimately "produced", why the elementary massive particles have the masses they have, and so on. What Einstein was describing to von Laue was the conceptual necessity of identifying the purely geometrical effects of non-inertial coordinates with the physical phenomenon of gravitation. In contrast, the importance and conceptual significance of the curvature (as opposed to the connection) is mainly due to the fact that it defines the mode of coupling of the coordinates with the "source" side of the equation. Of course, since the effects of gravitation are reciprocal, all test particles are also sources of gravitation, and it can be argued that the equivalence principle is incomplete because it considers only the “passive” response of inertial mass points to a gravitational field, whereas a complete account must include the active participation of each mass point in the mutual production of the field. In view of this, it might seem to be a daunting task to attempt to found a viable theory of gravitation on the equivalence principle – just as it had seemed impossible to most 19th-century physicists that classical electrodynamics could proceed without determining the structure and self-action of the electron. But in both cases, almost miraculously, it turned out to be possible. On the other hand, as Einstein himself pointed out, the resulting theories were necessarily incomplete, precisely because they side-stepped the “source” aspect of the interactions. Maxwell's theory of the electric field remained a torso, because it was unable to set up laws for the behaviour of electric density, without which there can, of course, be no such thing as an electromagnetic field. Analogously the general theory of relativity furnished a field theory of gravitation, but no theory of the field-creating masses.

5.7 Riemannian Geometry Investigations like the one just made, which begin from general concepts, can serve only to ensure that this work is not hindered by too restricted concepts, and that progress in comprehending the connection of things is not obstructed by traditional prejudices. Riemann, 1854

An N-dimensional Riemannian manifold is characterized by a second-order metric tensor gµν(x) which defines the differential metrical distance along any smooth curve in terms of the differential coordinate components according to
$$(ds)^{2} = g_{\mu\nu}(x)\,dx^{\mu}\,dx^{\nu}$$
where, as usual, summation is implied over repeated indices in any product. We've written the metric components as gµν(x) to emphasize that they are not constant, but are allowed to be continuous differentiable functions of position. The fact that the metric components are defined as continuous implies that over a sufficiently small region around any point they may be regarded as constant to the first order. Given any such region in which the metric components are constant we can apply a linear transformation to the coordinates so as to diagonalize the metric, and rescale the coordinates so that the diagonal elements of the metric are all 1 (or -1 in the case of a pseudo-Riemannian metric). Therefore, the metrical relations on the manifold over any sufficiently small region approach arbitrarily close to flatness to the first order in the coordinate differentials. In general, however, the metric components need not be constant to the second order of changes in position. If there exists a coordinate system at a point on the manifold such that the metric components are constant in the first and second order, then the manifold is said to be totally flat at that point (not just asymptotically flat). Since the metric components are continuous and differentiable, we can expand each component into a Taylor series about any given point p as follows
$$g_{\mu\nu}(x) = g_{\mu\nu} + g_{\mu\nu,\alpha}\,x^{\alpha} + \tfrac{1}{2}\,g_{\mu\nu,\alpha\beta}\,x^{\alpha}x^{\beta} + \tfrac{1}{6}\,g_{\mu\nu,\alpha\beta\gamma}\,x^{\alpha}x^{\beta}x^{\gamma} + \cdots$$
where gµν is evaluated at the point p, and in general the symbol gµν,αβγ... denotes the partial derivatives of gµν with respect to xα, xβ, xγ,... at the point p. Thus we have
$$g_{\mu\nu,\alpha} = \frac{\partial g_{\mu\nu}}{\partial x^{\alpha}}, \qquad g_{\mu\nu,\alpha\beta} = \frac{\partial^{2} g_{\mu\nu}}{\partial x^{\alpha}\,\partial x^{\beta}}$$
and so on. These matrices (which are not necessarily tensors) are obviously symmetric under transpositions of µ and ν, as well as under any permutations of α,β,γ,... (because
partial differentiation is commutative). In terms of these symbols we can write the basic line element near the point p as
$$(ds)^{2} = \left[\,g_{\mu\nu} + g_{\mu\nu,\alpha}\,x^{\alpha} + \tfrac{1}{2}\,g_{\mu\nu,\alpha\beta}\,x^{\alpha}x^{\beta} + \cdots\right] dx^{\mu}\,dx^{\nu}$$
where the matrices gµν, gµν,α, gµν,αβ, etc., are constants. For incremental paths sufficiently close to the origin, all the terms involving xα become vanishingly small, and we're left with the familiar formula for the differential line element (ds)2 = gµν dxµ dxν. If all the components of gµν,α and gµν,αβ are zero at the point p, then the manifold is totally flat at that point (by definition). However, the converse doesn't follow, because it's possible to define a coordinate system on a flat manifold such that the derivatives of the metric are non-zero at points where the manifold is totally flat. (For example, polar coordinates on a flat plane have this characteristic.) We seek a criterion for determining whether a given metric at a point p can be transformed into one for which the first and second order coefficients gµν,α and gµν,αβ all vanish at that point. By the definition of a Riemannian manifold there exists a coordinate system with respect to which the first partial derivatives of the metric components vanish (local flatness). This can be visualized by imagining an N-dimensional Euclidean space with a Cartesian coordinate system tangent to the manifold at the given point, and projecting the coordinate system (with the origin at the point of tangency) from this Euclidean space onto the manifold in the region near the origin O. With respect to such coordinates the first-order metric components gµν,α vanish, so the lowest-order nonconstant terms of the metric are of the second order, and the line element is given by
$$(ds)^{2} = \left[\,g_{\mu\nu} + \tfrac{1}{2}\,g_{\mu\nu,\alpha\beta}\,x^{\alpha}x^{\beta}\,\right] dx^{\mu}\,dx^{\nu}$$
In terms of such coordinates the matrix gµν,αβ contains all the information about the intrinsic curvature (if any) of the manifold at the origin of these coordinates. Naturally the gµν,αβ coefficients are symmetric in the first two indices because of the symmetry of the metric, and they are also symmetric in the last two indices because partial differentiation is commutative. Furthermore, we can always transform and rescale the coordinates in such a way that the ratios of the coordinates of any given point P are equal to the ratios of the differential components of the geodesic OP at the origin, and the sum of the squares of the coordinates equals the square of the geodesic distance from the origin. These are called Riemann normal coordinates, since they were introduced by Riemann in his 1854 lecture. (Note that these coordinates are well-defined only out to some finite distance from the origin, beyond which it's possible for geodesics emanating from the origin to intersect with each other, resulting in non-unique coordinates, closely analogous to the accelerating coordinate systems discussed in Section 4.5.) The advantage of these coordinates is that, in addition to ensuring all gµν,α = 0, they impose two more symmetries
on the gab,cd, namely, symmetry between the two pairs of indices, and cyclic skew symmetry on the last three indices. In other words, at the origin of Riemann normal coordinates we have
$$g_{ab,cd} = g_{cd,ab}, \qquad g_{ab,cd} + g_{ac,db} + g_{ad,bc} = 0 \qquad (3)$$
To understand why these symmetries occur, first consider the simple two-dimensional case with x,y coordinates on the surface, and recall that Riemann normal coordinates are defined such that the squared geodesic distance to any point x,y near the origin is given by s² = x² + y². It follows that if we move from the point x,y to the point x+dx, y+dy, and if the increments dx,dy are in the same proportion to each other as x is to y, then the new position is along the same geodesic, and so the squared incremental distance (ds)² equals the sum (dx)² + (dy)². Now, if the surface is flat, this simple expression for (ds)² will hold regardless of the ratio of dx/dy, but for a curved surface it will hold when and only when dx/dy = x/y. In other words, the line element at a point near the origin of Riemann normal coordinates on a curved surface reduces to the Pythagorean line element if and only if the quantity x dy − y dx equals zero. Furthermore, we know that the first-order terms of the metric vanish in Riemann coordinates, so even when x dy − y dx is non-zero, the line element differs from the Pythagorean form only by second-order (and higher) terms in the metric. Therefore, the deviation of the line element from the simple Pythagorean sum of squares must consist of terms of the form xαxβdxµdxν, and it must identically vanish if and only if x dy − y dx equals zero. The only expression satisfying these requirements is k(x dy − y dx)² for some constant k, so the line element on a two-dimensional surface with Riemann normal coordinates is of the form
$$(ds)^2 = (dx)^2 + (dy)^2 + k\left(x\,dy - y\,dx\right)^2 \qquad (4)$$
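Incidentally, the curvature implied by this line element can be checked mechanically. The following is a minimal sketch (assuming the sympy library is available) that computes R1212 from the tangent-coordinate formula R_abcd = (1/2)(g_ad,bc + g_bc,ad − g_ac,bd − g_bd,ac), valid where the first derivatives of the metric vanish, as they do here at the origin. The result agrees with the Gaussian curvature value K = −3k quoted later in this section.

    # Gaussian curvature at the origin of (ds)^2 = dx^2 + dy^2 + k(x dy - y dx)^2
    # -- a sketch assuming sympy.
    import sympy as sp

    x, y, k = sp.symbols('x y k')
    g = sp.Matrix([[1 + k*y**2, -k*x*y],
                   [-k*x*y,     1 + k*x**2]])   # metric of the line element above

    def d2(i, j, a, b):                          # second derivative g_ij,ab
        return sp.diff(g[i, j], [x, y][a], [x, y][b])

    R1212 = sp.Rational(1, 2)*(d2(0,1,1,0) + d2(1,0,0,1) - d2(0,0,1,1) - d2(1,1,0,0))
    print((R1212 / g.det()).subs({x: 0, y: 0}))  # -3*k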
The same reasoning can be applied in N dimensions. If we are given a point (x¹,x²,...,x^N) in an N-dimensional manifold near the origin of Riemann coordinates, then the distance (ds)² from that point to the point (x¹+dx¹, x²+dx², ..., x^N+dx^N) is given by the sum of squares of the components if the differentials are in the same proportions to each other as the xα coordinates, which implies that every expression of the form (xα dxβ − xβ dxα) vanishes. If one or more of these N(N−1)/2 expressions does not vanish, then the line element of a curved manifold will contain metric terms of the second order. The most general combination of second-order terms that vanishes if all the differentials are in proportion to the coordinates is a linear combination of the products of two of those terms. In other words, the general line element (up to second order) near the origin of Riemann normal coordinates on a curved surface must be of the form
$$(ds)^2 = \sum_{\mu} \left(dx^\mu\right)^2 + \sum_{\mu\nu\alpha\beta} K_{\mu\nu\alpha\beta} \left(x^\mu dx^\nu - x^\nu dx^\mu\right)\left(x^\alpha dx^\beta - x^\beta dx^\alpha\right) \qquad (5)$$
where the Kµναβ are constants at the given point of the manifold. These coefficients represent the deviation from flatness of the manifold, and they vanish if and only if the curvature is zero (i.e., the manifold is flat). Notice that if all but two of the x and dx are
zero, this reduces to the preceding two-dimensional formula involving just the square of (x¹dx² − x²dx¹) and a single curvature coefficient. Also note that in a flat manifold, the quantity xρ dxσ − xσ dxρ is equal to twice the area of the incremental triangle formed by the origin and the nearby points (xρ, xσ) and (xρ+dxρ, xσ+dxσ) on the subsurface containing those three points, so it is invariant under coordinate transformations that do not change the scale. Each individual term in the expansions of the right-hand product in (5) involves four indices (not necessarily distinct). We can expand each product as shown below
$$\left(x^\alpha dx^\beta - x^\beta dx^\alpha\right)\left(x^\mu dx^\nu - x^\nu dx^\mu\right) = x^\alpha x^\mu\, dx^\beta dx^\nu - x^\alpha x^\nu\, dx^\beta dx^\mu - x^\beta x^\mu\, dx^\alpha dx^\nu + x^\beta x^\nu\, dx^\alpha dx^\mu$$
Obviously we have the symmetries and anti-symmetries
$$K_{\mu\nu\alpha\beta} = -K_{\nu\mu\alpha\beta} = -K_{\mu\nu\beta\alpha} = K_{\alpha\beta\mu\nu}$$
Furthermore, we see that the value of K for each of the 24 permutations of indices contributes to four of the coefficients in the expanded sum of products, so each of those coefficients is a sum (with appropriate signs) of four K values. Thus the coefficient of xα xβ dxµ dxν is
$$K_{\alpha\mu\beta\nu} - K_{\alpha\mu\nu\beta} - K_{\mu\alpha\beta\nu} + K_{\mu\alpha\nu\beta} \qquad (6)$$
Both of the identities (3) immediately follow, making use of the symmetries of the K array. It’s also useful to notice that each of the K index permutations is a simple transposition of the indices of the metric coefficient in this expression, so the relationship is invertible up to a constant factor. Using equation (6) we can sum four derivatives of g (with appropriate signs) to give
$$g_{\mu\alpha,\nu\beta} + g_{\nu\beta,\mu\alpha} - g_{\mu\beta,\nu\alpha} - g_{\nu\alpha,\mu\beta} = 24\, K_{\mu\nu\alpha\beta}$$
provided we impose the same skew symmetry on the K values as applies to the g derivatives, i.e.,
$$K_{\mu\nu\alpha\beta} + K_{\mu\alpha\beta\nu} + K_{\mu\beta\nu\alpha} = 0$$
Hence at any point in a differentiable manifold we can define a system of Riemann normal coordinates and in terms of those coordinates the curvature of the manifold is completely characterized by an array Rµναβ = −12Kµναβ. (The factor of −12 is conventional.) We can verify that this is a covariant tensor of rank 4. It is called the
Riemann-Christoffel curvature tensor. At the origin of coordinates such that the first derivatives of the metric coefficients vanish, the components of the Riemann tensor are
$$R_{abcd} = \tfrac{1}{2}\left( g_{ad,bc} + g_{bc,ad} - g_{ac,bd} - g_{bd,ac} \right) \qquad (8)$$
If we further specialize to a point at the origin of Riemann normal coordinates, we can take advantage of the special symmetry gab,cd = gcd,ab , allowing us to express the curvature tensor in the very simple form
$$R_{abcd} = g_{ad,bc} - g_{ac,bd}$$
Since the gµναβ are symmetrical under transpositions of [µν] and of [αβ], it's apparent from (8) that if we transpose the first two indices of Rµναβ we simply reverse the sign of the quantity, and likewise for the last two indices. Also, if we swap the first and last pairs of indices we leave the quantity unchanged. Of course, we also have the same skew symmetry on three indices as we have with the K array, i.e., if we hold one index fixed and cyclically permute the other three, the sum of those three quantities vanishes. Symbolically these algebraic symmetries can be summarized as
$$R_{abcd} = -R_{bacd} = -R_{abdc} = R_{cdab} \qquad\qquad R_{abcd} + R_{acdb} + R_{adbc} = 0$$
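As an aside, the count of independent components permitted by these symmetries can be confirmed by brute force. The sketch below (assuming the numpy library) imposes each symmetry as a homogeneous linear constraint on the 4⁴ = 256 components and reports the dimension of the solution space, in agreement with the count stated in the next paragraph.

    # Counting the algebraically independent components of R_abcd in 4 dimensions.
    import numpy as np
    from itertools import product

    n = 4
    def idx(a, b, c, d):
        return ((a*n + b)*n + c)*n + d

    rows = []
    for a, b, c, d in product(range(n), repeat=4):
        for terms in ([(idx(a,b,c,d), 1), (idx(b,a,c,d), 1)],   # antisymmetry, first pair
                      [(idx(a,b,c,d), 1), (idx(a,b,d,c), 1)],   # antisymmetry, last pair
                      [(idx(a,b,c,d), 1), (idx(c,d,a,b), -1)],  # pair-exchange symmetry
                      [(idx(a,b,c,d), 1), (idx(a,c,d,b), 1),
                       (idx(a,d,b,c), 1)]):                     # cyclic skew symmetry
            row = np.zeros(n**4)
            for j, v in terms:
                row[j] += v
            rows.append(row)

    print(n**4 - np.linalg.matrix_rank(np.array(rows)))         # 20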
These symmetries imply that there are only 20 algebraically independent components of the curvature tensor in four dimensions. (See Part 7 of the Appendix for a proof.) It should be emphasized that (8) gives the components of the covariant curvature tensor only at the origin of a tangent coordinate system (in which the first derivatives of the metric are zero). The unique fully-covariant tensor that reduces to (8) when transformed to tangent coordinates is
$$R_{abcd} = \tfrac{1}{2}\left( g_{ad,bc} + g_{bc,ad} - g_{ac,bd} - g_{bd,ac} \right) + g^{\mu\nu}\left[ \Gamma_{ad\mu}\,\Gamma_{bc\nu} - \Gamma_{ac\mu}\,\Gamma_{bd\nu} \right]$$
where gµν is the matrix inverse of the zeroth-order metric array gµν, and Γabc is the Christoffel symbol (of the first kind) [ab,c] as defined in Chapter 5.4. By inspection of the quantity in brackets we verify that all the symmetry properties of Rabcd continue to apply in this general form, applicable to any curvilinear coordinates. We can illustrate Riemann's approach to curvature with some simple examples in two-dimensional manifolds. First, it's clear that if the geodesic lines emanating from a point on a flat plane are drawn out, and symmetrical x,y coordinates are assigned to every point in accord with the prescription for Riemannian coordinates, we will find that all the components of Rabcd equal zero, and the line element is simply (ds)² = (dx)² + (dy)². Now consider a two-dimensional surface whose height above the xy plane is h = bxy for some constant b. This is a special case of the family of two-dimensional surfaces discussed in Section 5.3. The line element in terms of projected x and y coordinates is
$$(ds)^2 = \left(1 + b^2 y^2\right)(dx)^2 + 2\, b^2 x y\; dx\, dy + \left(1 + b^2 x^2\right)(dy)^2$$
Using the equations of the geodesic paths on this surface given at the end of Section 5.4, we can plot the geodesic paths emanating from the origin, and superimpose the Riemann normal coordinate (X,Y) grid, as shown below.
[Figure: geodesic paths emanating from the origin of the surface h = bxy, with the loci of constant X and constant Y of the Riemann normal coordinate grid superimposed]
From the shape of the loci of constant X and constant Y, we infer that the transformation between the original (x,y) coordinates and the Riemann normal coordinates (X,Y) is approximately of the form

Substituting these expressions into the line element and discarding all terms higher than second order (because we are interested only in the region arbitrarily close to the origin) we get

In order for X,Y to be Riemann normal coordinates we must have

and so we must set µ = b²/3. This allows us to write the line element in the form
$$(ds)^2 = (dX)^2 + (dY)^2 + \frac{b^2}{3}\left(X\,dY - Y\,dX\right)^2$$
The last term formally represents four components of the curvature, but the symmetries make them all equal up to sign, i.e., we have
$$K_{1212} = K_{2121} = -K_{1221} = -K_{2112} = \frac{b^2}{12}$$
Therefore, we have −b² = −12K1212 = R1212, which implies that the curvature of this surface at the origin is R1212 = −b², in agreement with what we found in Section 5.3. In general, the Gaussian curvature K, i.e., the product of the two principal curvatures, on a two-dimensional surface, is related to the Riemann tensor by K = R1212 / g where g is the determinant of the metric tensor, which is unity at the origin of Riemann normal coordinates. We also have K = −3k for a surface with the line element (4). For another example, consider a two-dimensional surface whose height above the tangent plane at the origin is h = Ax² + Bxy + Cy². We can rotate the coordinates to bring the height into diagonal form, so we need only consider the form h = Mx² + Ny² for constants M,N, and by re-scaling x and y if necessary we can set N equal to M, so we have a symmetrical paraboloid with height h = M(x² + y²). For x and y coordinates projected onto this surface the metric is
$$(ds)^2 = (dx)^2 + (dy)^2 + (dh)^2$$
and we have dh = 2M(xdx + ydy). Making this substitution, we find the metric tensor is
$$g = \begin{bmatrix} 1 + 4M^2 x^2 & 4M^2 x y \\ 4M^2 x y & 1 + 4M^2 y^2 \end{bmatrix}$$
At the origin, the first derivatives of the metric all vanish and g = 1, consistent with the fact that x,y is a tangent coordinate system. Also we have the symmetry gab,cd = gcd,ab. Therefore, since gxy,xy = 4M² and gxx,yy = 0, we can compute all the components of the Riemann tensor at the origin, such as
$$R_{xyxy} = g_{xy,xy} - g_{xx,yy} = 4M^2$$
which equals the curvature at that point. However, as an alternative, we could make use of the Fibonacci identity
$$\left(x\,dx + y\,dy\right)^2 + \left(x\,dy - y\,dx\right)^2 = \left(x^2 + y^2\right)\left((dx)^2 + (dy)^2\right)$$
to substitute for (dh)2 into the expression for the squared line element. This gives
$$(ds)^2 = (dx)^2 + (dy)^2 + 4M^2\left(x^2 + y^2\right)\left((dx)^2 + (dy)^2\right) - 4M^2\left(x\,dy - y\,dx\right)^2$$
Rearranging terms, this can be written in the form
$$(ds)^2 = \left(1 + 4M^2\left(x^2 + y^2\right)\right)\left((dx)^2 + (dy)^2\right) - 4M^2\left(x\,dy - y\,dx\right)^2$$
This is not in the form of (4), because the Euclidean part of the metric has a variable coefficient. However, it’s interesting to observe that the ratio of the coefficients of the Riemannian part to the square of the coefficient of the Euclidean part is precisely the Gaussian curvature on the surface
$$K = \frac{4M^2}{\left(1 + 4M^2\left(x^2 + y^2\right)\right)^2} = \frac{h_{xx}\, h_{yy} - h_{xy}^2}{\left(1 + h_x^2 + h_y^2\right)^2}$$
where subscripts on h denote partial derivatives. The numerator and denominator are both determinants of 2x2 matrices, representing different "ground forms" of the surface. This shows that the curvature of a two-dimensional space (or sub-space) at the origin of tangent coordinates at a point is proportional to the coefficient of (x dy − y dx)² in the line element of the surface at that point when decomposed according to the Fibonacci identity. Returning to general N-dimensional manifolds, for any point p of the manifold we can express the partial derivatives of the metric to first order in terms of these quantities as
$$g_{\mu\nu,\alpha}(x) = g_{\mu\nu,\alpha} + g_{\mu\nu,\alpha\beta}\, x^\beta + \ldots$$
The “connection” of this manifold is customarily expressed in the form of Christoffel symbols. To the first order near the origin of our coordinate system the Christoffel symbols of the first kind are
$$\Gamma_{\mu\nu\sigma} = \tfrac{1}{2}\left( g_{\mu\sigma,\nu} + g_{\nu\sigma,\mu} - g_{\mu\nu,\sigma} \right) + \tfrac{1}{2}\left( g_{\mu\sigma,\nu\beta} + g_{\nu\sigma,\mu\beta} - g_{\mu\nu,\sigma\beta} \right) x^\beta$$
Obviously the Christoffel symbols vanish at the origin of Riemann coordinates, where the first derivatives of the metric coefficients vanish (by definition). We often make use of the first partial derivatives of these symbols with respect to the position coordinates. These can be expressed to the lowest order as
$$\Gamma_{\mu\nu\sigma,\beta} = \tfrac{1}{2}\left( g_{\mu\sigma,\nu\beta} + g_{\nu\sigma,\mu\beta} - g_{\mu\nu,\sigma\beta} \right)$$
It follows from the symmetries of the partial derivatives of the metric at the origin of Riemann normal coordinates that the first partials of the Christoffel symbols possess the same cyclic skew symmetry, i.e.,
$$\Gamma_{\mu\nu\sigma,\beta} + \Gamma_{\nu\beta\sigma,\mu} + \Gamma_{\beta\mu\sigma,\nu} = 0$$
Consequently we have the useful relation (at the origin of Riemann normal coordinates)

Other useful formulas can be derived based on the fact that we frequently need to deal with expressions involving the components of the inverse (i.e., contravariant) metric tensor, gµν(x), which tend to be extremely elaborate expressions except in the case of diagonal matrices. For this reason it's often very advantageous to work with diagonal metrics, noting that every static spacetime metric can be diagonalized. Given a diagonal metric, all the components of the curvature tensor can be inferred from the expressions

by applying the symmetries of the Riemann tensor. If we further specialize to Riemann coordinates, in terms of which all the first derivatives of the metric vanish, the components of the Riemann curvature tensor for a diagonal metric are summarized by

It is easily verified that this is consistent with the expression for the curvature tensor in Riemann coordinates given in equation (8), together with the symmetries of this tensor, if we set all the non-diagonal metric components to zero. To find the equations for geodesic paths on a Riemannian manifold, we can take a slightly different approach than we took in Section 5.4. For clarity, we will describe this in terms of a two-dimensional manifold, but it immediately generalizes to any number of dimensions. Since by definition a Riemannian manifold is essentially flat on a sufficiently small scale (a fact which corresponds to the equivalence principle for the spacetime manifold), there necessarily exist coordinates x,y at any given point such that the geodesic paths through that point are simply straight lines. Thus if we let functions x(s) and y(s) denote the parametric equations of the path, where s is the path length, these functions satisfy the differential equation
$$\frac{d^2 x}{ds^2} = 0 \qquad\qquad \frac{d^2 y}{ds^2} = 0$$
Any other (possibly curvilinear) system of coordinates X,Y will be related to the x,y coordinates by a transformation of the form
$$dx = \frac{\partial x}{\partial X}\, dX + \frac{\partial x}{\partial Y}\, dY \qquad\qquad dy = \frac{\partial y}{\partial X}\, dX + \frac{\partial y}{\partial Y}\, dY$$
Focusing on just the x expression, we can divide through by ds to give
$$\frac{dx}{ds} = \frac{\partial x}{\partial X}\, \frac{dX}{ds} + \frac{\partial x}{\partial Y}\, \frac{dY}{ds}$$
Substituting this into the equation of motion for the x coordinate gives
$$\frac{d}{ds}\left( \frac{\partial x}{\partial X}\, \frac{dX}{ds} + \frac{\partial x}{\partial Y}\, \frac{dY}{ds} \right) = 0$$
Expanding the differentiation, we have
$$\frac{\partial x}{\partial X}\, \frac{d^2 X}{ds^2} + \frac{\partial x}{\partial Y}\, \frac{d^2 Y}{ds^2} + \frac{d}{ds}\!\left(\frac{\partial x}{\partial X}\right) \frac{dX}{ds} + \frac{d}{ds}\!\left(\frac{\partial x}{\partial Y}\right) \frac{dY}{ds} = 0$$
Noting the differential identities
$$d\!\left(\frac{\partial x}{\partial X}\right) = \frac{\partial^2 x}{\partial X^2}\, dX + \frac{\partial^2 x}{\partial X\, \partial Y}\, dY \qquad\qquad d\!\left(\frac{\partial x}{\partial Y}\right) = \frac{\partial^2 x}{\partial X\, \partial Y}\, dX + \frac{\partial^2 x}{\partial Y^2}\, dY$$
we can divide through by ds and then substitute into the preceding equation to give
$$\frac{\partial x}{\partial X}\, \frac{d^2 X}{ds^2} + \frac{\partial x}{\partial Y}\, \frac{d^2 Y}{ds^2} + \frac{\partial^2 x}{\partial X^2} \left(\frac{dX}{ds}\right)^2 + 2\, \frac{\partial^2 x}{\partial X\, \partial Y}\, \frac{dX}{ds}\, \frac{dY}{ds} + \frac{\partial^2 x}{\partial Y^2} \left(\frac{dY}{ds}\right)^2 = 0$$
A similar equation results from the original geodesic equation for y. To abbreviate these expressions we can use superscripts to denote different coordinates, i.e., let X¹ = X, X² = Y, x¹ = x, x² = y.

Then with the usual summation convention we can express both the above equation and the corresponding equation for y in the form
$$\frac{\partial x^\beta}{\partial X^\alpha}\, \frac{d^2 X^\alpha}{ds^2} + \frac{\partial^2 x^\beta}{\partial X^\alpha\, \partial X^\gamma}\, \frac{dX^\alpha}{ds}\, \frac{dX^\gamma}{ds} = 0 \qquad (10)$$
In order to isolate the second derivatives of the new coordinates Xα with respect to s, we can multiply through these equations by ∂Xµ/∂xβ, summing over β, to give

$$\frac{\partial X^\mu}{\partial x^\beta}\, \frac{\partial x^\beta}{\partial X^\alpha}\, \frac{d^2 X^\alpha}{ds^2} \;+\; \frac{\partial X^\mu}{\partial x^\beta}\, \frac{\partial^2 x^\beta}{\partial X^\alpha\, \partial X^\gamma}\, \frac{dX^\alpha}{ds}\, \frac{dX^\gamma}{ds} \;=\; 0$$

The partial derivatives represented by ∂xβ/∂Xα are just the components of the transformation from x to X coordinates, whereas the partials represented by ∂Xµ/∂xβ are the components of the inverse transformation from X to x. Therefore the product of these two is simply the identity transformation, i.e.,

$$\frac{\partial X^\mu}{\partial x^\beta}\, \frac{\partial x^\beta}{\partial X^\alpha} \;=\; \delta^{\mu}{}_{\alpha}$$

where δµα signifies the Kronecker delta, defined as 1 if α = µ and 0 otherwise. Hence the first term of (10), multiplied through in this way, reduces to d²Xµ/ds², and so equation (10) can be re-written as

$$\frac{d^2 X^\mu}{ds^2} \;+\; \left( \frac{\partial X^\mu}{\partial x^\beta}\, \frac{\partial^2 x^\beta}{\partial X^\alpha\, \partial X^\gamma} \right) \frac{dX^\alpha}{ds}\, \frac{dX^\gamma}{ds} \;=\; 0$$
This is the equation for a geodesic with respect to the arbitrary system of curvilinear coordinates Xα. The expression inside the parentheses is the Christoffel symbol Γµαγ, which makes it clear that this symbol describes the relationship between the arbitrary coordinates Xα and the special coordinates xα with respect to which the geodesics of the surface are unaccelerated. We saw in Section 5.4 how this can be expressed purely in terms of the metric coefficients and their first derivatives with respect to any given set of coordinates. That's obviously a more useful way of expressing them, because we seldom are given special "geodesically aligned" coordinates. In fact, the geodesic paths are essentially what we are trying to determine, given only an arbitrary system of coordinates and the metric coefficients with respect to those coordinates. The formula in Section 5.4 enables us to do this, but it's conceptually useful to understand that
$$\Gamma^{\mu}{}_{\alpha\gamma} = \frac{\partial X^\mu}{\partial x^\beta}\, \frac{\partial^2 x^\beta}{\partial X^\alpha\, \partial X^\gamma}$$
where x essentially represents Cartesian coordinates tangent to the manifold, with respect to which geodesics of the surface (or space) are simple straight lines, and X represents the arbitrary coordinates in terms of which we are trying to express the conditions for geodesic paths. In a sense we can say that the Christoffel symbols describe how our chosen coordinates are curved relative to the geodesic paths at a point. This is why it's possible for the Christoffel symbols to be non-zero even on a flat surface, if we are using curved coordinates (such as polar coordinates) as discussed in Section 5.6.

5.8 The Field Equations

You told us how an almost churchlike atmosphere is pervading your desolate house now. And justifiably so, for unusual divine powers are at work in there.
                                                Besso to Einstein, 30 Oct 1915

The basis of Einstein's general theory of relativity is the audacious idea that not only do the metrical relations of spacetime deviate from perfect Euclidean flatness, but that the metric itself is a dynamical object. In every other field theory the equations describe the behavior of a physical field, such as the electric or magnetic field, within a constant and immutable arena of space and time, but the field equations of general relativity describe the behavior of space and time themselves. The spacetime metric is the field. This fact is so familiar that we may be inclined to simply accept it without reflecting on how ambitious it is, and how miraculous it is that such a theory is even possible, not to mention (somewhat) comprehensible. Spacetime plays a dual role in this theory, because
it constitutes both the dynamical object and the context within which the dynamics are defined. This self-referential aspect gives general relativity certain characteristics different from any other field theory. For example, in other theories we formulate a Cauchy initial value problem by specifying the condition of the field everywhere at a given instant, and then use the field equations to determine the future evolution of the field. In contrast, because of the inherent self-referential quality of the metrical field, we are not free to specify arbitrary initial conditions, but only conditions that already satisfy certain self-consistency requirements (a system of differential relations called the Bianchi identities) imposed by the field equations themselves. The self-referential quality of the metric field equations also manifests itself in their nonlinearity. Under the laws of general relativity, every form of stress-energy gravitates, including gravitation itself. This is really unavoidable for a theory in which the metrical relations between entities determine the "positions" of those entities, and those positions in turn influence the metric. This non-linearity raises both practical and theoretical issues. From a practical standpoint, it ensures that exact analytical solutions will be very difficult to determine. More importantly, from a conceptual standpoint, non-linearity ensures that the field cannot in general be uniquely defined by the distribution of material objects, because variations in the field itself can serve as "objects". Furthermore, after eschewing the comfortable but naive principle of inertia as a suitable foundation for physics, Einstein concluded that "in the general theory of relativity, space and time cannot be defined in such a way that differences of the spatial coordinates can be directly measured by the unit measuring rod, or differences in the time coordinate by a standard clock...this requirement ... takes away from space and time the last remnant of physical objectivity". It seems that we're completely at sea, unable to even begin to formulate a definite solution, and lacking any definite system of reference for defining even the most rudimentary quantities. It's not obvious how a viable physical theory could emerge from such an austere level of abstraction. These difficulties no doubt explain why Einstein's route to the field equations in the years 1907 to 1915 was so convoluted, with so much confusion and backtracking. One of the principles that heuristically guided his search was what he called the principle of general covariance. This was understood to mean that the laws of physics ought to be expressible in the form of tensor equations, because such equations automatically hold with respect to any system of curvilinear coordinates (within a given diffeomorphism class, as discussed in Section 9.2). He abandoned this principle at one stage, believing that he and Grossmann had proven it could not be made consistent with the Poisson equation of Newtonian gravitation, but subsequently realized the invalidity of their arguments, and re-embraced general covariance as a fundamental principle. It strikes many people as ironic that Einstein found the principle of general covariance to be so compelling, because, strictly speaking, it's possible to express almost any physical law, including Newton's laws, in generally covariant form (i.e., as tensor equations). 
This was not clear when Einstein first developed general relativity, but it was pointed out in one of the very first published critiques of Einstein's 1916 paper, and immediately
acknowledged by Einstein. It's worth remembering that the generally covariant formalism had been developed only in 1901 by Ricci and Levi-Civita, and the first real use of it in physics was Einstein's formulation of general relativity. This historical accident made it natural for people (including Einstein, at first) to imagine that general relativity is distinguished from other theories by its general covariance, whereas in fact general covariance was only a new mathematical formalism, and does not connote a distinguishing physical attribute. For this reason, some people have been tempted to conclude that the requirement of general covariance is actually vacuous. However, in reply to this criticism, Einstein clarified the real meaning (for him) of this principle, pointing out that its heuristic value arises when combined with the idea that the laws of physics should not only be expressible as tensor equations, but should be expressible as simple tensor equations. In 1918 he wrote "Of two theoretical systems which agree with experience, that one is to be preferred which from the point of view of the absolute differential calculus is the simplest and most transparent". This is still a bit vague, but it seems that the quality which Einstein had in mind was closely related to the Machian idea that the expression of the dynamical laws of a theory should be symmetrical up to arbitrary continuous transformations of the spacetime coordinates. Of course, the presence of any particle of matter with a definite state of motion automatically breaks the symmetry, but a particle of matter is a dynamical object of the theory. The general principle that Einstein had in mind was that only dynamical objects could be allowed to introduce asymmetries. This leads naturally to the conclusion that the coefficients of the spacetime metric itself must be dynamical elements of the theory, i.e., must be acted upon. With this Einstein believed he had addressed what he regarded as the strongest of Mach's criticisms of Newtonian spacetime, namely, the fact that Newton's space acted on objects but was never acted upon by objects. Let's follow Einstein's original presentation in his famous paper "The Foundation of the General Theory of Relativity", which was published early in 1916. He notes that for empty space, far from any gravitating object, we expect to have flat (i.e., Minkowskian) spacetime, which amounts to requiring that Riemann's curvature tensor Rabcd vanishes. However, in regions of space near gravitating matter we must clearly have non-zero intrinsic curvature, because the gravitational field of an object cannot simply be "transformed away" (to the second order) by a change of coordinates. Thus there is no system of coordinates with respect to which the manifold is flat to the second order, which is precisely the condition indicated by a non-vanishing Riemann curvature tensor. Nevertheless, even at points where the full curvature tensor Rabcd is non-zero, the contracted tensor of the second rank, Rbc = gadRabcd = Rdbcd, may vanish. Of course, a tensor of rank four can be contracted in six different ways (the number of ways of choosing two of the four indices), and in general this gives six distinct tensors of rank two. We are able to single out a more or less unique contraction of the curvature tensor only because of that tensor’s symmetries (described in Section 5.7), which imply that of the six contractions of Rabcd, two are zero and the other four are identical up to sign change. Specifically we have
$$g^{ab} R_{abcd} = g^{cd} R_{abcd} = 0$$
$$g^{ad} R_{abcd} = R_{bc} \qquad g^{bc} R_{abcd} = R_{ad} \qquad g^{ac} R_{abcd} = -R_{bd} \qquad g^{bd} R_{abcd} = -R_{ac}$$
By convention we define the Ricci tensor Rbc as the contraction gadRabcd. In seeking suitable conditions for the metric field in empty space, Einstein observes that …there is only a minimum arbitrariness in the choice... for besides Rµν there is no tensor of the second rank which is formed from the gµν and its derivatives, contains no derivative higher than the second, and is linear in these derivatives… This prompts us to require for the matter-free gravitational field that the symmetrical tensor Rµν ... shall vanish. Thus, guided by the belief that the laws of physics should be the simplest possible tensor equations (to ensure general covariance), he proposes that the field equations for the gravitational field in empty space should be
$$R_{\mu\nu} = 0 \qquad (1)$$
Noting that Rµν takes on a particularly simple form on the condition that we choose coordinates such that √−g = 1, Einstein originally expressed this in terms of the Christoffel symbols as

$$\frac{\partial \Gamma^{\alpha}{}_{\mu\nu}}{\partial x^\alpha} \;+\; \Gamma^{\alpha}{}_{\mu\beta}\, \Gamma^{\beta}{}_{\nu\alpha} \;=\; 0 \qquad (1')$$
(except that in his 1916 paper Einstein had a different sign because he defined the symbol Γabc as the negative of the Christoffel symbol of the second kind.) He then concludes the section with words that obviously gave him great satisfaction, since he repeated essentially the same comments at the conclusion of the paper: These equations, which proceed, by the method of pure mathematics, from the requirement of the general theory of relativity, give us, in combination with the [geodesic] equations of motion, to a first approximation Newton's law of attraction, and to a second approximation the explanation of the motion of the perihelion of the planet Mercury discovered by Leverrier. These facts must, in my opinion, be taken as a convincing proof of the correctness of the theory. To his friend Paul Ehrenfest in January 1916 he wrote that "for a few days I was beside myself with joyous excitement", and to Fokker he said that seeing the anomaly in Mercury's orbit emerge naturally from his purely geometrical field equations "had given him palpitations of the heart". (These recollections are remarkably similar to the presumably apocryphal story of Newton's trembling hand when he learned, in 1675, of Picard's revised estimates of the Earth's size, and was thereby able to reconcile his previous calculations of the Moon's orbit based on the assumption of an inverse-square law of gravitation.) The expression Rµν = 0 represents ten distinct equations in the ten unknown metric
components gµν at each point in empty spacetime (where the term "empty" signifies the absence of matter or electromagnetic energy, but obviously not the absence of the metric/gravitational field.) Since these equations are generally covariant, it follows that given any single solution we can construct infinitely many others simply by applying arbitrary (continuous) coordinate transformations. Thus, each individual physical solution has four full degrees of freedom which allow it to be expressed in different ways. In order to uniquely determine a particular solution we must impose four coordinate conditions on the gµν, but this gives us a total of fourteen equations in just ten unknowns, which could not be expected to possess any non-trivial solutions at all if the fourteen equations were fully independent and arbitrary. Our only hope is if the ten formal conditions represented by our basic field equations automatically satisfy four identities for any values of the metric components, so that they really only impose six independent conditions, which then would uniquely determine a solution when augmented by a set of four arbitrary coordinate conditions. It isn't hard to guess that the four "automatic" conditions to be satisfied by our field equations must be the vanishing of the covariant derivatives, since this will guarantee local conservation of any energy-momentum source term that we may place on the right side of the equation, analogous to the mass density on the right side of Poisson's equation
$$\nabla^2 \varphi = 4\pi G \rho$$
In tensor calculus the divergence generalizes to the covariant derivative, so we expect that the covariant derivatives of the metrical field equations must identically vanish. The Ricci tensor Rµν itself does not satisfy this requirement, but we can create a tensor that does satisfy the requirement with just a slight modification of the Ricci tensor, and without disturbing the relation Rµν = 0 for empty space. Subtracting half the metric tensor times the invariant R = gµνRµν gives what is now called the Einstein Tensor
$$G_{\mu\nu} = R_{\mu\nu} - \tfrac{1}{2}\, g_{\mu\nu} R$$
Obviously the condition Rµν = 0 implies Gµν = 0. Conversely, if Gµν = 0 we can see from the mixed form
$$G^{\mu}{}_{\nu} = R^{\mu}{}_{\nu} - \tfrac{1}{2}\, \delta^{\mu}{}_{\nu}\, R$$
that R must be zero, because otherwise Rµν would need to be diagonal, with the components R/2, which doesn't contract to the scalar R (except in two dimensions). Consequently, the condition Gµν = 0 is equivalent to Rµν = 0 for empty space, but for coupling with a non-zero source term we must use Gµν to represent the metrical field. To represent the "source term" we will use the covariant energy-momentum tensor Tµν,
and regard it as the "cause" of the metric curvature (although one might also conceive of the metric curvature as, in some temporally symmetrical sense, "causing" the energymomentum). Einstein acknowledged that the introduction of this tensor is not justified by the relativity principle alone, but it has the virtues of being closely related by analogy with the Poisson equation from Newton's theory, it gives local conservation of energy and momentum, and finally that it implies gravitational energy gravitates just as does every other form of energy. On this basis we surmise that the field equations coupled to the source term can be written in the form Gµν = kTµν where k is a constant which must equal 8πG (where G is Newton's gravitational constant) in order for the field equations to reduce to Newton's law in the weak field limit. Thus we have the complete expression of Einstein's metrical law of general relativity
$$R_{\mu\nu} - \tfrac{1}{2}\, g_{\mu\nu} R = 8\pi G\, T_{\mu\nu} \qquad (2)$$
It's worth noting that although the left side of the field equations is quite pure and almost uniquely determined by mathematical requirements, the right side is a hodge-podge of miscellaneous "stuff". As Einstein wrote, The energy tensor can be regarded only as a provisional means of representing matter. In reality, matter consists of electrically charged particles... It is only the circumstance that we have no sufficient knowledge of the electromagnetic field of concentrated charges that compels us, provisionally, to leave undetermined in presenting the theory, the true form of this tensor... The right hand side [of (2)] is a formal condensation of all things whose comprehension in the sense of a field theory is still problematic. Not for a moment... did I doubt that this formulation was merely a makeshift in order to give the general principle of relativity a preliminary closed-form expression. For it was essentially no more than a theory of the gravitational field, which was isolated somewhat artificially from a total field of as yet unknown structure. Alas, neither Einstein nor anyone since has been able to make further progress in determining the true form of the right hand side of (2), although it is at the heart of current efforts to reconcile quantum mechanics with general relativity. At present we must be content to let Tµν represent, in a vague sort of way, the energy density of the electromagnetic field and matter. A different (but equivalent) form of the field equations can be found by contracting (2) with gµν to give R − 2R = −R = 8πGT, and then substituting for R in (2) to give
$$R_{\mu\nu} = 8\pi G \left( T_{\mu\nu} - \tfrac{1}{2}\, g_{\mu\nu} T \right) \qquad (3)$$
which again makes clear that the field equations for empty space are simply Rµν = 0. Incidentally, the tensor Gµν was named for Einstein because of his inspired use of it, not
because he discovered it. Indeed the vanishing of the covariant derivative of this tensor had been discovered by Aurel Voss in 1880, by Ricci in 1889, and again by Luigi Bianchi in 1902, all apparently independently. Bianchi had once been a student of Felix Klein, so it's not surprising that Klein was able in 1918 to point out regarding the conservation laws in Einstein's theory of gravitation that we need only "make use of the most elementary formulae in the calculus of variations". Recall from Section 5.7 that the Riemann curvature tensor in terms of arbitrary coordinates is
$$R_{abcd} = \tfrac{1}{2}\left( g_{ad,bc} + g_{bc,ad} - g_{ac,bd} - g_{bd,ac} \right) + g^{\mu\nu}\left[ \Gamma_{ad\mu}\,\Gamma_{bc\nu} - \Gamma_{ac\mu}\,\Gamma_{bd\nu} \right]$$
At the origin of Riemann normal coordinates this reduces to gad,cb − gac,bd, because in such coordinates the Christoffel symbols are all zero and we have the special symmetry gab,cd = gcd,ab. Now, if we consider partial derivatives (which in these special coordinates are the same as covariant derivatives) of this tensor, we see that the derivative of the quantity in square brackets still vanishes, because the product rule implies that each term is a Christoffel symbol times the derivative of a Christoffel symbol. We might also be tempted to take advantage of the special symmetry gab,cd = gcd,ab , but this is not permissible because although the two quantities are equal (at the origin of Riemann normal coordinates), their derivatives are not generally equal. Hence when evaluating the derivatives of the Riemann tensor, even at the origin of Riemann normal coordinates, we must consider all four of the metric tensor derivatives in the above expression. Denoting covariant differentiation with respect to a coordinate xm by the subscript ;m, we have
$$R_{abcd;m} = \tfrac{1}{2}\left( g_{ad,bcm} + g_{bc,adm} - g_{ac,bdm} - g_{bd,acm} \right)$$
$$R_{abmc;d} = \tfrac{1}{2}\left( g_{ac,bmd} + g_{bm,acd} - g_{am,bcd} - g_{bc,amd} \right)$$
$$R_{abdm;c} = \tfrac{1}{2}\left( g_{am,bdc} + g_{bd,amc} - g_{ad,bmc} - g_{bm,adc} \right)$$
Noting that partial differentiation is commutative, and the metric tensor is symmetrical, we see that the sum of these three tensors vanishes at the origin of Riemann normal coordinates, and therefore with respect to all coordinates. Thus we have the Bianchi identities
$$R_{abcd;m} + R_{abmc;d} + R_{abdm;c} = 0$$
Multiplying through by gadgbc , making use of the symmetries of the Riemann tensor, and the fact that the covariant derivative of the metric tensor vanishes identically, we have
$$R_{;m} - R^{a}{}_{m;a} - R^{c}{}_{m;c} = 0$$
which reduces to
$$R_{;m} - 2\, R^{a}{}_{m;a} = 0$$
Thus we have
$$\left( R^{a}{}_{m} - \tfrac{1}{2}\, \delta^{a}{}_{m} R \right)_{;a} = 0$$
showing that the "divergence" of the tensor inside the parentheses (the Einstein tensor) vanishes identically. As an example of how the theory of relativity has influenced mathematics (in appropriate reaction to the obvious influence of mathematics on relativity), in the same year that Einstein, Hilbert, Klein, and others were struggling to understand the conservation laws of the relativistic field equations, Emmy Noether published her famous work on the relation between symmetries and conservation laws, and Klein didn't miss the opportunity to show how Einstein's theory embodied aspects of his Erlangen program. A slight (but significant) extension of the field equations was proposed by Einstein in 1917 based on cosmological considerations, as a means of ensuring stability of a static closed universe. To accomplish this, he introduced a linear term with the cosmological constant λ as follows
$$R_{\mu\nu} - \tfrac{1}{2}\, g_{\mu\nu} R + \lambda\, g_{\mu\nu} = 8\pi G\, T_{\mu\nu}$$
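Incidentally, the degenerate two-dimensional case discussed later in this section can also be checked symbolically: for an arbitrary two-dimensional (here diagonal) metric, the Einstein tensor Rµν − ½gµνR vanishes identically. The following is a minimal sketch assuming the sympy library; the function names A and B are merely illustrative.

    # In two dimensions the Einstein tensor vanishes identically -- a sketch
    # assuming sympy, for a generic diagonal metric g = diag(A(x,y), B(x,y)).
    import sympy as sp

    x, y = sp.symbols('x y')
    A, B = sp.Function('A')(x, y), sp.Function('B')(x, y)
    X = [x, y]
    g = sp.diag(A, B)
    ginv = g.inv()
    n = 2

    # Christoffel symbols of the second kind from the metric
    Gam = [[[sum(ginv[a, d]*(sp.diff(g[d, b], X[c]) + sp.diff(g[d, c], X[b])
                 - sp.diff(g[b, c], X[d])) for d in range(n))/2
             for c in range(n)] for b in range(n)] for a in range(n)]

    def ricci(b, c):
        return sum(sp.diff(Gam[a][b][c], X[a]) - sp.diff(Gam[a][b][a], X[c])
                   + sum(Gam[a][a][d]*Gam[d][b][c] - Gam[a][c][d]*Gam[d][b][a]
                         for d in range(n)) for a in range(n))

    R = sum(ginv[i, j]*ricci(i, j) for i in range(n) for j in range(n))
    G = sp.Matrix(n, n, lambda i, j: ricci(i, j) - g[i, j]*R/2)
    print(sp.simplify(G))    # the zero matrix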
When Hubble and other astronomers began to find evidence that in fact the large-scale universe is expanding, and Einstein realized his ingenious introduction of the cosmological constant had led him away from making such a fantastic prediction, he called it "the biggest blunder of my life”. It's worth noting that Einsteinian gravity is possible only in four dimensions, because in any fewer dimensions the vanishing of the Ricci tensor Rµν implies the vanishing of the full Riemann tensor, which means no curvature and therefore no gravity in empty space. Of course, the actual field equations for the vacuum assert that the Einstein tensor (not the Ricci tensor) vanishes, so we should consider the possibility of G being zero while R is non-zero. We saw above that G = 0 implies R = 0, but that was based on the assumption of a four-dimensional manifold. In general for an n-dimensional manifold we have R − (n/2)R = G, so if n is not equal to 2, and if Guv vanishes, we have G = 0 and it follows that R = 0, and therefore Ruv must vanish. However, if n = 2 it is possible for G to equal zero even though R is non-zero. Thus, in two dimensions, the vanishing of Guv does not imply the vanishing of Ruv. In this case we have
$$R_{\mu\nu} = \lambda\, g_{\mu\nu}$$
where λ can be any constant. Multiplying through by guv gives
$$R = 2\lambda$$
This is the vacuum solution of Einstein's field equations in two dimensions. Oddly enough, this is also the vacuum solution for the field equations in four dimensions if λ is identified as the non-zero cosmological constant. Any space of constant curvature is of this form, although a space of this form need not be of constant curvature. Once the field equations have been solved and the metric coefficients have been determined, we then compute the paths of objects by means of the equations of motion. It was originally taken as an axiom that the equations of motion are the geodesic equations of the manifold, but in a series of papers from 1927 to 1949 Einstein and others showed that if particles are treated as singularities in the field, then they must propagate along geodesic paths. Therefore, it is not necessary to make an independent assumption about the equations of motion. This is one of the most remarkable features of Einstein's field equations, and is only possible because of the non-linear nature of the equations. Of course, the hypothesis that particles can be treated as field singularities may seem no more intuitively obvious than the geodesic hypothesis itself. Indeed Einstein himself was usually very opposed to admitting any singularities, so it is somewhat ironic that he took this approach to deriving the equations of motion. On the other hand, in 1939 Fock showed that the field equations imply geodesic paths for any sufficiently small bodies with negligible self-gravity, not treating them as singularities in the field. This approach also suggests that more massive bodies would deviate from geodesics, and it relies on representing matter by the stress-energy tensor, which Einstein always viewed with suspicion. To appreciate the physical significance of the Ricci tensor it's important to be aware of a relation between the contracted Christoffel symbol and the scale factor of the fundamental volume element of the manifold. This relation is based on the fact that if the square matrix A is the inverse of the square matrix B, then the components of A can be expressed in terms of the components of B by the equation Aij = (∂B/∂Bij)/B where B is the determinant of B. Accordingly, since the covariant metric tensor gµν and the contravariant metric tensor gµν are matrix inverses of each other, we have
$$g^{\mu\nu} = \frac{1}{g}\, \frac{\partial g}{\partial g_{\mu\nu}}$$
If we multiply both sides by the partial of gµν with respect to the coordinate xα we have
$$g^{\mu\nu}\, \frac{\partial g_{\mu\nu}}{\partial x^\alpha} = \frac{1}{g}\, \frac{\partial g}{\partial x^\alpha} \qquad (4)$$
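This identity is easy to confirm symbolically. Here is a minimal sketch (assuming the sympy library) for a generic symmetric 2x2 metric whose components are arbitrary functions of a coordinate x; the function names g11, g12, g22 are merely illustrative.

    # Checking the determinant identity (4): tr(g^-1 dg/dx) = (dg/dx)/g.
    import sympy as sp

    x = sp.symbols('x')
    g11, g12, g22 = (sp.Function(name)(x) for name in ('g11', 'g12', 'g22'))
    g = sp.Matrix([[g11, g12], [g12, g22]])
    ginv = g.inv()

    lhs = sum(ginv[i, j]*sp.diff(g[i, j], x) for i in range(2) for j in range(2))
    rhs = sp.diff(g.det(), x)/g.det()
    print(sp.simplify(lhs - rhs))   # 0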
Notice that the left hand side looks like part of a Christoffel symbol. Recall the general form of these symbols
$$\Gamma^{a}{}_{bc} = \tfrac{1}{2}\, g^{a\sigma} \left( g_{b\sigma,c} + g_{c\sigma,b} - g_{bc,\sigma} \right)$$
If we set one of the lower indices of the Christoffel symbol, say c, equal to a, then we have the contracted symbol
$$\Gamma^{a}{}_{ba} = \tfrac{1}{2}\, g^{a\sigma} \left( g_{b\sigma,a} + g_{a\sigma,b} - g_{ba,\sigma} \right)$$
Since the indices a and σ are both dummies (meaning they each take on all possible values in the implied summation), and since gaσ = gσa, we can swap a and σ in any of the terms without affecting the result. Swapping a and σ in the last term inside the parentheses we see it cancels with the first term, and we're left with
$$\Gamma^{a}{}_{ba} = \tfrac{1}{2}\, g^{a\sigma}\, g_{a\sigma,b}$$
Comparing this with our previous result (4), we find that the contracted Christoffel symbol can be written in the form
$$\Gamma^{a}{}_{ba} = \frac{1}{2g}\, \frac{\partial g}{\partial x^b}$$
Furthermore, recalling the elementary fact that the derivative of ln(y) equals 1/y times the derivative of y, and the fact that k ln(y) = ln(yk), this result can also be written in the form
$$\Gamma^{a}{}_{ba} = \frac{\partial}{\partial x^b} \ln\!\sqrt{|g|}$$
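As a quick sanity check of this last relation (a sketch assuming the sympy library), consider polar coordinates on the flat plane, where g = diag(1, r²), |g| = r², and the contracted symbol should equal d(ln r)/dr = 1/r.

    # Contracted Christoffel symbol for polar coordinates: Gamma^a_{ra} = 1/r.
    import sympy as sp

    r = sp.symbols('r', positive=True)
    g = sp.diag(1, r**2)
    # for a diagonal metric the contraction reduces to (1/2) g^{aa} g_{aa,r}
    contracted = sum(sp.Rational(1, 2)*g.inv()[a, a]*sp.diff(g[a, a], r)
                     for a in range(2))
    print(sp.simplify(contracted - sp.diff(sp.log(sp.sqrt(g.det())), r)))  # 0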
Since our metrics all have negative determinants, we can replace |g| with -g in these expressions. We're now in a position to evaluate the geometrical and physical significance of the Ricci tensor, the vanishing of which constitutes Einstein's vacuum field equations. The general form of the Ricci tensor is
$$R_{\mu\nu} = \frac{\partial \Gamma^{\alpha}{}_{\mu\alpha}}{\partial x^\nu} - \frac{\partial \Gamma^{\alpha}{}_{\mu\nu}}{\partial x^\alpha} + \Gamma^{\alpha}{}_{\nu\beta}\, \Gamma^{\beta}{}_{\mu\alpha} - \Gamma^{\alpha}{}_{\beta\alpha}\, \Gamma^{\beta}{}_{\mu\nu}$$
which of course is a contraction of the full Riemann curvature tensor. Making use of the preceding identity, this can be written as
$$R_{\mu\nu} = \frac{\partial^2 \ln\!\sqrt{-g}}{\partial x^\mu\, \partial x^\nu} - \Gamma^{\alpha}{}_{\mu\nu}\, \frac{\partial \ln\!\sqrt{-g}}{\partial x^\alpha} - \frac{\partial \Gamma^{\alpha}{}_{\mu\nu}}{\partial x^\alpha} + \Gamma^{\alpha}{}_{\mu\beta}\, \Gamma^{\beta}{}_{\nu\alpha} \qquad (5)$$
In his original 1916 paper on the general theory Einstein initially selected coordinates such that the metric determinant g was a constant −1, in which case the partial derivatives of ln√−g all vanish and the Ricci tensor is simply
$$R_{\mu\nu} = -\frac{\partial \Gamma^{\alpha}{}_{\mu\nu}}{\partial x^\alpha} + \Gamma^{\alpha}{}_{\mu\beta}\, \Gamma^{\beta}{}_{\nu\alpha}$$
The vanishing of this tensor constitutes Einstein's vacuum field equations (1'), provided the coordinates are such that g is constant. Even if g is not constant in terms of the natural coordinates, it is often possible to transform the coordinates so as to make g constant. For example, Schwarzschild replaced the usual r and θ coordinates with x = r³/3 and y = cos(θ), together with the assumption that gtt = 1/grr, and thereby expressed the spherically symmetrical line element in a form with g = -1. It is especially natural to impose the condition of constant g in static systems of coordinates and spatially uniform fields. Indeed, since we spend most of our time suspended quasi-statically in a nearly uniform gravitational field, we are most intuitively familiar with gravity in this form. From this point of view we identify the effects of gravity with the geodesic accelerations relative to our static coordinates, as represented by the Christoffel symbols. Indeed Einstein admitted that he conceptually identified the gravitational field with the Christoffel symbols, despite the fact that it's possible to have non-vanishing Christoffel symbols in flat spacetime, as discussed in Section 5.6. However, we can also take the opposite view. Rather than focusing on "static" coordinate systems with constant metric determinants which make the first two terms of (5) vanish, we can focus on "free-falling" inertial coordinates (also known as Riemann normal coordinates) in terms of which the Christoffel symbols, and therefore the second and fourth terms of (5), vanish at the origin. In other words, we "abstract away" the original sense of gravity as the extrinsic acceleration relative to some physically distinguished system of static coordinates (such as the Schwarzschild coordinates), and focus instead on the intrinsic tidal accelerations (i.e., local geodesic deviations) that correspond to the intrinsic curvature of the manifold. At the origin of Riemann normal coordinates the Ricci tensor
reduces to
$$R_{\mu\nu} = \frac{\partial^2 \ln\!\sqrt{-g}}{\partial x^\mu\, \partial x^\nu} - \Gamma^{\alpha}{}_{\mu\nu,\alpha}$$
where subscripts following commas signify partial derivatives with respect to the designated coordinate. Making use of the skew symmetry on the lower three indices of the Christoffel symbol partial derivatives in these coordinates (as described in Section 5.7), the second term on the right hand side can be replaced with the negative of its two complementary terms given by rotating the lower indices, so we have
$$R_{\mu\nu} = \frac{\partial^2 \ln\!\sqrt{-g}}{\partial x^\mu\, \partial x^\nu} + \Gamma^{\alpha}{}_{\nu\alpha,\mu} + \Gamma^{\alpha}{}_{\alpha\mu,\nu}$$
Noting that each of the three terms on the right side is now a partial derivative of a contracted Christoffel symbol, we have
$$R_{\mu\nu} = \frac{\partial^2 \ln\!\sqrt{-g}}{\partial x^\mu\, \partial x^\nu} + \frac{\partial}{\partial x^\mu}\!\left( \frac{\partial \ln\!\sqrt{-g}}{\partial x^\nu} \right) + \frac{\partial}{\partial x^\nu}\!\left( \frac{\partial \ln\!\sqrt{-g}}{\partial x^\mu} \right) = 3\, \frac{\partial^2 \ln\!\sqrt{-g}}{\partial x^\mu\, \partial x^\nu}$$
At the origin of Riemann normal coordinates the first partial derivatives of g, and therefore of √−g, all vanish, so the chain rule allows us to bring those factors outside the differentiations, and noting the commutativity of partial differentiation we arrive at the expression for the components of the Ricci tensor at the origin of Riemann normal coordinates
$$R_{\mu\nu} = \frac{3}{\sqrt{-g}}\; \frac{\partial^2 \sqrt{-g}}{\partial x^\mu\, \partial x^\nu}$$
Thus the vacuum field equations Rab = 0 reduce to
$$\frac{\partial^2 \sqrt{-g}}{\partial x^a\, \partial x^b} = 0$$
The quantity √−g is essentially a scale factor for the incremental volume element V. In fact, for any scalar field Φ we have
$$\int \Phi\, \sqrt{-g}\;\, dx^1\, dx^2\, dx^3\, dx^4$$
and taking Φ=1 gives the simple volume. Therefore, at the origin of Riemann normal (free-falling inertial) coordinates we find that the components of the Ricci tensor Rab are simply proportional to the second derivatives of the proper volume of an incremental volume element, divided by that volume itself. Hence the vacuum field equations Rab = 0 simply express the vanishing of these second derivatives with respect to any two coordinates (not
necessarily distinct). Likewise the "complete" field equations in the form of (3) signify that three times the second derivatives of the volume, divided by the volume, equal the corresponding components of the "divergence-free" energy-momentum tensor expressed by the right hand side of (3). In physical terms this implies that a small cloud of free-falling dust particles initially at rest with respect to each other does not change its volume during an incremental advance of proper time. Of course, this doesn't give a complete description of the effects of gravity in a typical gravitational field, because although the volume of the cloud isn't changing at this instant, its shape may be changing due to tidal acceleration. In a spherically symmetrical field the cloud will become lengthened in the radial direction and shortened in the normal directions. This variation in the shape is characterized by the Weyl tensor, which in general may be non-zero even when the Ricci tensor vanishes. It may seem that conceiving of gravity purely as a tidal effect ignores what is usually the most physically obvious manifestation of gravity, namely, the tendency of objects to "fall down", i.e., the acceleration of the geodesics relative to our usual static coordinates near a gravitating body. However, in most cases this too can be viewed as tidal accelerations, provided we take a wider view of events. For example, the fall of a single apple to the ground at one location on Earth can be transformed away (locally) by a suitable system of accelerating coordinates, but the fall of apples all over the Earth cannot. In effect these apples can be seen as a spherical cloud of dust particles, each following a geodesic path, and those paths are converging and the cloud's volume is shrinking at an accelerating rate as the shell collapses toward the Earth. The rate of acceleration (i.e., the second derivative with respect to time) is proportional to the mass of the Earth, in accord with the field equations.

6.1 An Exact Solution

Einstein had been so preoccupied with other studies that he had not realized such confirmation of his early theories had become an everyday affair in the physical laboratory. He grinned like a small boy, and kept saying over and over “Ist das wirklich so?”
                                                E. U. Condon

The special theory of relativity assumes the existence of a unique class of global coordinate systems - called inertial coordinates - with respect to which the speed of light in vacuum is everywhere equal to the constant c. It was natural, then, to express physical laws in terms of this preferred class of coordinate systems, characterized by the global invariance of the speed of light. In addition, the special theory also strongly implied the fundamental equivalence of mass and energy, according to which light (and every other form of energy) must be regarded as possessing inertia. However, it soon became clear that the global invariance of light speed together with the idea that energy has inertia (as expressed in the famous relation E² = m² + |p|²) were incompatible with one of the most
firmly established empirical results of physics, namely, the exact proportionality of inertial and gravitational mass, which Einstein elevated to the status of a Principle. This incompatibility led Einstein, as early as 1907, to the belief that the global invariance of light speed, in the sense of the special theory, could not be maintained. Indeed, he concluded that we cannot assume, as do both Newtonian theory and special relativity, the existence of any global inertial systems of coordinates (although we can carry over the existence of a local system of inertial coordinates in a vanishingly small region of spacetime around any event). Since no preferred class of global coordinate systems is assumed, the general theory essentially places all (smoothly related) systems of coordinates on an equal footing, and expresses physical laws in a way that is applicable to any of these systems. As a result, the laws of physics will hold good even with respect to coordinate systems in which the speed of light takes on values other than c. For example, the laws of general relativity are applicable to a system of coordinates that is fixed rigidly to the rotating Earth. According to these coordinates the distant galaxies are "circumnavigating" nearly the entire universe in just 24 hours, so their speed is obviously far greater than the constant c. The huge implied velocities of the celestial spheres were always problematical for the ancient conception of an immovable Earth, but it is beautifully accommodated within general relativity by the effect which the implied centrifugal acceleration field - whose strength increases in direct proportion to the distance from the Earth - has on the values of the metric components guv for this rotating system of coordinates at those locations. It's true that, when expressed in this rotating system of coordinates, those stars are moving with dx/dt values that far exceed the usual numerical value of c, but they are not moving faster than light, because the speed of light at those locations, expressed in terms of those coordinates, is correspondingly greater. In general, the velocity of light can always be inferred from the components of the metric tensor, and typically looks something like ±√(−gtt/gxx). To understand why this is so, recall that in special relativity we have

$$(d\tau)^2 \;=\; (dt)^2 - (dx)^2 - (dy)^2 - (dz)^2 \qquad (1)$$
and the trajectory of a light ray follows a null path, i.e., a path with dτ = 0. Thus, dividing by (dt)2, we see that the path of light through spacetime satisfies the equation
$$\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2 = 1$$
and so the velocity of light is unambiguous in the context of special relativity, which is restricted to inertial coordinate systems with respect to which equation (1) is invariant. However, in the general theory we are no longer guaranteed the existence of a global coordinate system of the simple form (1). It is true that over a sufficiently small spatial and temporal region surrounding any given point in spacetime there exists a coordinate
system of that simple Minkowskian form, but in the presence of a non-vanishing gravitational field ("curvature") equation (1) applies only with respect to "free-falling" reference frames, which are necessarily transient and don't extend globally. So, for example, instead of writing the metric in the xt plane as (dτ)² = (dt)² − (dx)², we must consider the more general form
$$(d\tau)^2 = g_{tt}\,(dt)^2 + 2\, g_{xt}\, dt\, dx + g_{xx}\,(dx)^2$$
As always, the path of a light ray is null, so we have dτ = 0, and the differentials dx and dt must satisfy the equation
$$g_{xx} \left(\frac{dx}{dt}\right)^2 + 2\, g_{xt} \left(\frac{dx}{dt}\right) + g_{tt} = 0$$
Solving this gives
$$\frac{dx}{dt} = \frac{-g_{xt} \pm \sqrt{g_{xt}^2 - g_{xx}\, g_{tt}}}{g_{xx}}$$
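This is just the quadratic formula applied to the null condition; as a quick check (a sketch assuming the sympy library, with v standing for dx/dt):

    # Solving the null condition g_xx v^2 + 2 g_xt v + g_tt = 0 for v = dx/dt.
    import sympy as sp

    gxx, gxt, gtt, v = sp.symbols('g_xx g_xt g_tt v')
    print(sp.solve(gxx*v**2 + 2*gxt*v + gtt, v))
    # [(-g_xt - sqrt(g_xt**2 - g_tt*g_xx))/g_xx,
    #  (-g_xt + sqrt(g_xt**2 - g_tt*g_xx))/g_xx]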
If we diagonalize our metric we get gxt = 0, in which case the "velocity" of a null path in the xt plane with respect to this coordinate system is simply dx/dt = ±√(−gtt/gxx). This quantity can (and does) take on any value, depending on our choice of coordinate systems. Around 1911 Einstein proposed to incorporate gravitation into a modified version of special relativity by allowing the speed of light to vary as a scalar from place to place as a function of the gravitational potential. This "scalar c field" is remarkably similar to a simple refractive medium, in which the speed of light varies as a function of the density. Fermat's principle of least time can then be applied to define the paths of light rays as geodesics in the spacetime manifold (as discussed in Section 8.4). Specifically, Einstein wrote in 1911 that the speed of light at a place with the gravitational potential φ would be c₀(1 + φ/c₀²), where c₀ is the nominal speed of light in the absence of gravity. In geometrical units we define c₀ = 1, so Einstein's 1911 formula can be written simply as c = 1 + φ. However, this formula for the speed of light (not to mention this whole approach to gravity) turned out to be incorrect, as Einstein realized during the years leading up to 1915 and the completion of the general theory. In fact, the general theory of relativity doesn't give any equation for the speed of light at a particular location, because the effect of gravity cannot be represented by a simple scalar field of c values. Instead, the "speed of light" at each point depends on the direction of the light ray through that point, as well as on the choice of coordinate systems, so we can't generally talk about the value of c at a given point in a non-vanishing gravitational field. However, if we consider just radial light rays near a spherically symmetrical (and non-rotating) mass, and if we agree to use a specific set of coordinates, namely those in which the metric coefficients are independent of t, then we can read a formula analogous to
Einstein's 1911 formula directly from the Schwarzschild metric. But how does the Schwarzschild metric follow from the field equations of general relativity? To deduce the implications of the field equations for observable phenomena Einstein originally made use of approximate methods, since no exact solutions were known. These approximate methods were adequate to demonstrate that the field equations lead in the first approximation to Newton's laws, and in the second approximation to a natural explanation for the anomalous precession of Mercury (see Section 6.2). However, these results can now be directly computed from the exact solution for a spherically symmetric field, found by Karl Schwarzschild in 1916. As Schwarzschild wrote, it's always pleasant to find exact solutions, and the simple spherically symmetrical line element "lets Mr. Einstein's result shine with increased clarity". To this day, most of the empirically observable predictions of general relativity are consequences of this simple solution. We will discuss Schwarzschild's original derivation in Section 8.7, but for our present purposes we will take a slightly different approach. Recall from Section 5.5 that the most general form of the metrical spacetime line element for a spherically symmetrical static field (although it is not strictly necessary to assume the field is static) can be written in polar coordinates as
$$(d\tau)^2 = g_{tt}\,(dt)^2 + g_{rr}\,(dr)^2 + g_{\theta\theta}\,(d\theta)^2 + g_{\phi\phi}\,(d\phi)^2 \qquad (2)$$
where gθθ = r2, gϕϕ = r2 sin(θ)2, and gtt and grr are functions of r and the gravitating mass m. We expect that if m = 0, and/or as r increases to infinity, we will have gtt = 1 and grr = 1 in order to give the flat Minkowski metric in the absence of gravity. We've seen that in this highly symmetrical context there is a very natural way to derive the metric coefficients gtt and grr simply from the requirement to satisfy Kepler's third law and the principle of symmetry between space and time. However, we now wish to know what values for these metric coefficients are implied by Einstein's field equations. In any region that is free of (non-gravitational) mass-energy the vacuum field equations must apply, which means the Ricci tensor
must vanish, i.e., all the components are zero. Since our metric is in diagonal form, it's easy to see that the Christoffel symbols for any three distinct indices a,b,c reduce to
$$\Gamma^{a}{}_{bc} = 0 \qquad \Gamma^{a}{}_{bb} = -\frac{g_{bb,a}}{2\, g_{aa}} \qquad \Gamma^{a}{}_{ab} = \Gamma^{a}{}_{ba} = \frac{g_{aa,b}}{2\, g_{aa}} \qquad \Gamma^{a}{}_{aa} = \frac{g_{aa,a}}{2\, g_{aa}}$$
with no summations implied. In two of the non-vanishing cases the Christoffel symbols are of the form qa/(2q), where q is a particular metric component and subscripts denote partial differentiation with respect to xa. By an elementary identity these can also be written as Qa, where Q = ln(q)/2, i.e., q = e^{2Q}. Accordingly if we define the variables (functions of r)
$$u = \tfrac{1}{2} \ln\left(g_{tt}\right) \qquad\qquad v = \tfrac{1}{2} \ln\left(g_{rr}\right)$$
then we have
$$g_{tt} = e^{2u} \qquad g_{rr} = e^{2v} \qquad u_r = \frac{(g_{tt})_r}{2\, g_{tt}} \qquad v_r = \frac{(g_{rr})_r}{2\, g_{rr}}$$
and the non-vanishing Christoffel symbols (as given in Section 5.5) can be written as

We can now write down the components of the Ricci tensor, each of which must vanish in order for the field equations to be satisfied. Writing them out explicitly and expanding all the implied summations for our line element, we find that all the non-diagonal components are identically zero (which we might have expected from symmetry arguments), so the only components of interest in our case are the diagonal elements

Inserting the expressions for the Christoffel symbols gives the equations for the four diagonal components of the Ricci tensor as functions of u and v:

The necessary and sufficient condition for the field equations to be satisfied by a line element of the form (2) is that these four quantities each vanish. Combining the expressions for Rtt and Rrr we immediately have ur = -vr , which implies u = -v + k for some arbitrary constant k. Making these substitutions into the equation for Rθθ and setting the constant of integration to k = πi/2 gives the condition
$$e^{2u}\left( 2\, r\, u_r + 1 \right) = 1$$
Remembering that e2u = gtt, and that the derivative of e2u is 2ur e2u, this condition expresses the requirement
$$\frac{\partial}{\partial r}\left( r\, g_{tt} \right) = g_{tt} + r\, \frac{\partial g_{tt}}{\partial r} = 1$$
The left side is just the chain rule for the derivative of the product r gtt, and since this derivative equals 1 we immediately have rgtt = r + α for some constant α. Also, since grr = e^{2v} where v = πi/2 − u, it follows that grr = −1/gtt, and so we have the results
$$g_{tt} = 1 + \frac{\alpha}{r} \qquad\qquad g_{rr} = \frac{-1}{1 + \alpha/r}$$
To match the Newtonian limit we set α = −2m where m is classically identified with the mass of the gravitating body. These metric coefficients were derived by combining the expressions for Rtt and Rrr, but it's easy to verify that they also satisfy each of those equations separately, so this is indeed the unique spherically symmetrical static solution of Einstein's field equations.
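The claim that these coefficients satisfy all the vacuum equations can also be confirmed mechanically. The following is a minimal sketch (assuming the sympy library) that builds the Christoffel symbols and Ricci tensor directly from their definitions for the metric just derived, and verifies that every component vanishes:

    # Verifying that the Schwarzschild metric solves R_uv = 0 -- a sketch
    # assuming sympy.
    import sympy as sp

    t, r, th, ph, m = sp.symbols('t r theta phi m', positive=True)
    x = [t, r, th, ph]
    g = sp.diag(1 - 2*m/r, -1/(1 - 2*m/r), -r**2, -(r*sp.sin(th))**2)
    ginv = g.inv()
    n = 4

    # Christoffel symbols: Gamma^a_bc = (1/2) g^{ad}(g_db,c + g_dc,b - g_bc,d)
    Gam = [[[sum(ginv[a, d]*(sp.diff(g[d, b], x[c]) + sp.diff(g[d, c], x[b])
                 - sp.diff(g[b, c], x[d])) for d in range(n))/2
             for c in range(n)] for b in range(n)] for a in range(n)]

    # Ricci tensor as the contraction of the Riemann tensor
    def ricci(b, c):
        return sp.simplify(sum(
            sp.diff(Gam[a][b][c], x[a]) - sp.diff(Gam[a][b][a], x[c])
            + sum(Gam[a][a][d]*Gam[d][b][c] - Gam[a][c][d]*Gam[d][b][a]
                  for d in range(n))
            for a in range(n)))

    print(all(ricci(b, c) == 0 for b in range(n) for c in range(n)))  # True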
Now that we have derived the Schwarzschild metric, we can easily correct the "speed of light" formula that Einstein gave in 1911. A ray of light always travels along a null trajectory, i.e., with dτ = 0, and for a radial ray we have dθ and dϕ both equal to zero, so the equation for the light ray trajectory through spacetime, in Schwarzschild coordinates (which are the only spherically symmetrical ones in which the metric is independent of t) is simply

$$\left(1 - \frac{2m}{r}\right)(dt)^2 - \frac{(dr)^2}{1 - 2m/r} = 0$$
from which we get
$$\frac{dr}{dt} = \pm\left(1 - \frac{2m}{r}\right)$$
where the ± sign just indicates that the light can be going radially inward or outward. (Note that we're using geometric units, so c = 1.) In the Newtonian limit the classical gravitational potential at a distance r from mass m is φ = −m/r, so if we let cr = dr/dt denote the radial speed of light in Schwarzschild coordinates, we have cr = 1 + 2φ which corresponds to Einstein's 1911 equation, except that we have a factor of 2 instead of 1 on the potential term. Thus, as φ becomes increasingly negative (i.e., as the magnitude of the potential increases), the radial "speed of light" cr defined in terms of the Schwarzschild parameters t and r is reduced to less than the nominal value of c. On the other hand, if we define the tangential speed of light at a distance r from a gravitating mass center in the equatorial plane (θ = π/2) in terms of the Schwarzschild coordinates as ct = r(dϕ/dt), then the metric divided by (dt)² immediately gives
$$c_t = r\, \frac{d\phi}{dt} = \sqrt{1 - \frac{2m}{r}}$$
Thus, we again find that the "velocity of light" is reduced in a region with a strong gravitational field, but this tangential speed is the square root of the radial speed at the same point, and to the first order in m/r this is the same as Einstein's 1911 formula, although it is understood now to signify just the tangential speed. This illustrates the fact that the general theory doesn't lead to a simple scalar field of c values. The effects of gravitation can only be accurately represented by a tensor field. One of the observable implications of general relativity (as well as any other theory that respects the equivalence principle) is that the rate of proper time at a fixed radial position in a gravitational field relative to the coordinate time (which corresponds to proper time sufficiently far from the gravitating mass) is given by
dτ/dt = √(gtt) = √(1 − 2m/r)
It follows that the characteristic frequency ν1 of light emitted by some known physical process at a radial location r1 will represent a different frequency ν2 with respect to the proper time at some other radial location r2 according to the formula
ν2/ν1 = √( gtt(r1) / gtt(r2) )
From the Schwarzschild metric we have gtt(rj) = 1 + 2φj where φj = −m/rj is the gravitational potential at rj, so
ν2/ν1 = √( (1 + 2φ1)/(1 + 2φ2) ) ≈ 1 + φ1 − φ2 + ...
Neglecting the higher-order terms and rearranging, this can also be written as
Δν/ν = φ1 − φ2 = [ −m/(r1 r2) ] (r2 − r1)
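For nearby heights this reduces to Δν/ν = aΔh, as noted in the next paragraph. As a rough numerical illustration of the size of the effect for a tower experiment of the kind described below, here is a short Python sketch (the physical constants are standard values, and the 22.5 meter height is that of the Harvard tower):

    # Magnitude of the fractional frequency shift between two nearby heights,
    # |dnu/nu| = g*h/c^2 (the weak-field, small-separation limit of the formula above).
    G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
    c = 2.998e8          # speed of light, m/s
    M = 5.972e24         # mass of the Earth, kg
    r = 6.371e6          # radius of the Earth, m

    g  = G*M/r**2        # surface acceleration, ~9.8 m/s^2
    dh = 22.5            # height difference (the tower), m
    print(g*dh/c**2)     # ~2.5e-15, the size of shift detected in such experiments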
Observations of the light emitted from the surface of the Sun, and from other stars, are consistent with this predicted amount of gravitational redshift (up to first order), although measurements of this slight effect are difficult. A terrestrial experiment performed by Pound and Rebka in 1960 exploited the Mössbauer effect to precisely determine the redshift between the top and bottom of a tower. The results were in good agreement with the above formula, and subsequent experiments of the same kind have improved the accuracy to within about 1 percent. (Note that if r1 and r2 are nearly equal, as, for example, at two heights near the Earth's surface, then the leading factor of the right-most expression is essentially just the acceleration of gravity a = −m/r², and the factor in parentheses is the difference in heights Δh, so we have Δν/ν = a Δh.) However, it's worth noting that this amount of gravitational redshift is a feature of just about any viable theory of gravity that includes the equivalence principle, so these experimental results, although useful for validating that principle, are not very robust for distinguishing between competing theories of gravity. For this we need to consider other observations, such as the paths of light near a gravitating body, and the precise orbits of planets. These phenomena are discussed in the subsequent sections.

6.2 Anomalous Precessions

In these last months I had great success in my work. Generally covariant gravitation equations. Perihelion motions explained quantitatively… you will be astonished.
Einstein to Besso, 17 Nov 1915

The Earth's equatorial plane maintains a nearly constant absolute orientation in space throughout the year due to the gyroscopic effect of spinning about its axis. Similarly the plane of the Earth's orbit around the Sun remains essentially constant. These two planes are tilted by 23.5 degrees with respect to each other, so they intersect along a single line

whose direction remains constant, assuming the planes themselves maintain fixed attitudes. At the Spring and Autumn equinoxes the Sun is located precisely on this fixed line in opposite directions from the Earth. Since this line is a highly stable directional reference, it has been used by astronomers since ancient times to specify the locations of celestial objects. (Of course, when we refer to "the location of the Sun" we are speaking somewhat loosely. With the increased precision of observations made possible by the invention of the telescope, it is strictly necessary to account for the Sun's motion about the center of mass of the solar system. It is this center of mass of the Sun and planets, rather than just of the Sun, that is taken as the central inertial reference point for the most precise astronomical measurements and calculations.) By convention, the longitude of celestial objects is referenced from the direction of this line pointing to the Spring equinox, and this is called the "right ascension" of the object. In addition, the "declination" specifies the latitude, i.e., the angular position North or South of the Earth's equatorial plane. This system of specifying positions is quite stable, but not perfect.

Around 150 BC the Greek astronomer Hipparchus carefully compared his own observations of certain stars with observations of the same stars recorded by Timocharis 169 years earlier (and with some even earlier measurements from the Babylonians), and noted a slight but systematic difference in the longitudes. Of course, these were all referenced to the supposedly fixed direction of the line of intersection between the Earth's rotational and orbital planes, but Hipparchus was led to the conclusion that this direction is not perfectly stationary, i.e., that the direction of the Sun at the equinoxes is not constant with respect to the fixed stars, but precesses by about 0.0127 degrees each year. This is a remarkably good estimate, considering the limited quality of the observations that were available to Hipparchus. The accepted modern value for the precession of the equinoxes is 0.01396 degrees per year, which implies that the line of the equinoxes actually rotates through a full 360 degrees over a period of about 26,000 years.

Interpreting this as a gradual change in the orientation of the Earth's axis of rotation, the precession of the equinoxes is the third of what Copernicus called the "threefold movement of the Earth", the first two being a rotation about its axis once per day, and a revolution about the Sun once per year. Awareness of this third motion is arguably a distinguishing feature of human culture, since it can only be discerned on the basis of information spanning multiple generations.

The reason for mentioning this, aside from expressing admiration for human ingenuity, is that when we observe the axis of the elliptical orbit of a planet such as Mercury (for example) over a long period of time, referenced to our equinox line, we must expect to find an apparent precession of about 0.01396 degrees per year, which equals 5025 arc seconds per century, assuming Mercury's orbital axis is actually stationary. However, astronomers have actually observed a precession rate of 5600 arc seconds per century for the axis of Mercury's orbit, so evidently the axis is not truly stationary. This might seem like a problem for Newtonian gravity, until we remember that Newton predicted stable elliptical orbits only for the idealized two-body case.
When analyzing the actual orbit of Mercury we must also take into account the gravitational pull of the other planets, especially Venus and Earth (because of their proximity) and Jupiter (because of its size). It isn't simple to work out these effects, and unfortunately there is no simple analytical solution to the n-body problem in Newtonian mechanics, but using the calculational techniques developed by Lagrange, Laplace, and others, it is possible to determine that the effects of all the other planets should contribute an additional 532 arc seconds per century to the precession of Mercury's orbit. Combined with the precession of our equinox reference line, this accounts for 5557 arc seconds per century, which is close to the observed value of 5600, but still short by 43 arc seconds per century. The astronomers assure us that their observations can't be off by more than a fraction of an arc second, so there seems to be a definite problem here.

A similar problem had appeared in the 1840's when the newly discovered planet Uranus began to deviate noticeably from the precise course that Newtonian theory prescribed. On that occasion, the astronomer Le Verrier and the mathematician Adams had (independently) inferred the existence of a previously unknown planet beyond the orbit of Uranus, and even gave instructions where it could be found. Sure enough, when that indicated region of the sky was searched by Johann Galle at the Berlin Observatory, the planet that came to be called Neptune was discovered in 1846, astonishingly close to the predicted location. This was a tremendous triumph for Le Verrier, and surely gave him confidence that all apparent anomalies in the planetary orbits could be explained on the basis of Newtonian theory, and could be used as an aid to the discovery of new celestial objects.

He soon turned his attention to the anomalous precession of Mercury's orbit (which he estimated at 38 arc seconds per century, somewhat less than the modern value), and suggested that it must be due to some previously unknown mass near the Sun, possibly a large number of small objects, or perhaps even another planet, inside the orbit of Mercury. At one point there were reports that a small planet orbiting very near the Sun had actually been sighted, and it was named Vulcan, after the Roman God of fire. Le Verrier became convinced that the new planet existed, but subsequent attempts to observe the hypothetical planet failed to find any sign of it. Even the original sightings were cast into doubt, since they had been made by an amateur, and other astronomers reported that they had been observing the Sun at the very same time and had seen nothing.

Another popular theory, championed by the astronomer Simon Newcomb, was that the small particles of matter that cause the "zodiacal light" might account for Mercury's anomalous precession, but Newcomb soon realized that if there were enough matter to affect Mercury's perihelion so significantly there would also be enough to cause other effects on the orbits of the inner planets - effects which are not observed. Similar inconsistencies undermined the "Vulcan" hypothesis. As a result of the failures to arrive at a realistic Newtonian explanation for the anomalous precession, some researchers, notably Asaph Hall and Newcomb, began to think that perhaps Newtonian theory was at fault, and that perhaps gravity isn't exactly an inverse square law. Hall noted that he could account for Mercury's precession if the law of gravity, instead of falling off as 1/r², actually falls off as 1/r^n where the exponent n is 2.00000016.
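The size of Hall's exponent is easy to check, recalling the classical result that for a central force varying as 1/r^n the apsidal angle of a nearly circular orbit is π/√(3 − n), so the orbit precesses by twice the excess of this angle over π on each revolution. A short Python sketch (using a round figure of 414 revolutions of Mercury per century):

    import math

    n = 2.00000016                     # Hall's proposed exponent
    revs_per_century = 414             # approximate revolutions of Mercury per century

    apsidal = math.pi / math.sqrt(3.0 - n)        # classical apsidal angle, radians
    prec = 2.0*(apsidal - math.pi)                # precession per revolution, radians
    print(prec * revs_per_century * (180/math.pi) * 3600)   # ~43 arc seconds per century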
However, most people didn't (and still don't) find that idea to be very appealing, since it conflicts with basic conservation laws, e.g., Gauss's Law, unless we also postulate a correspondingly modified metric for space (ironically enough).

More recently, efforts have been made to explain some or all of Mercury's precession by oblateness in the shape of the sun. In 1966 Dicke and Goldenberg reported that the sun's polar axis is shorter than its equatorial axes by about 50 parts per million. If true that would account for 3.4" per century, so the unexplained part would be only 39.6", significantly different from GR's prediction of 43". The Brans-Dicke theory of gravity can account for 39.6" precisely by adjusting a free parameter of the theory. However, Dicke's and Goldenberg's solar oblateness data was contradicted by a number of other heliometric measurements, all of which showed that the solar axes differ by no more than about 4 parts per million. In addition, the sun doesn't appear to rotate nearly fast enough to be as oblate as Dicke and Goldenberg thought, so their results could only be right if the interior of the sun is spinning about 25 times faster than the visible exterior, which is highly implausible. The current consensus is that the Sun is not nearly oblate enough to upset the agreement between Mercury's observed precession and the predictions of GR. This is all the more impressive considering that, in contrast to the Brans-Dicke and other alternative theories, GR has almost no "freedom" to adjust its predictions. It is highly constrained by its own logic, so it's remarkable that it continues to survive experimental challenges. It should be noted that Mercury isn't the only object in the solar system that exhibits anomalous precession. The effect is most noticeable for objects near the Sun with highly elliptical orbits, but it can be seen even in the nearly circular orbits of Venus and Earth, although the discrepancy isn't nearly so large as for Mercury. In addition, the asteroid Icarus is ideal for studying this effect, because it has an extremely elliptical orbit and periodically passes very close to the Sun. Here's a table showing the anomalous precession of four objects in the inner solar system, based on direct observations:

The large tolerances for Venus and Earth are mainly due to the fact that their orbits are so nearly circular, making it difficult to precisely determine the axes of their elliptical orbits. Incidentally, Icarus periodically crosses the Earth's path, and has actually passed within a million kilometers of us - less than 3 times the distance to the Moon. It's about 1 mile in diameter, and may eventually collide with the Earth - reason enough to keep an eye on its precession.

One hope that Einstein had throughout the time he was working on the general theory was that it would explain the anomalous precession of Mercury. Of course, as we've seen, "explanations" of this phenomenon were never in short supply, but none of them were very compelling, all seeming to be ad hoc. In contrast, Einstein found that the extra precession arises unavoidably from the fundamental principles of general relativity.

To determine the relativistic prediction for the advance of an elliptical orbit, let's work in the single plane θ = π/2, so of course dθ/dt and all higher derivatives also vanish, and we have sin(θ) = 1. Thus the term involving θ in the Schwarzschild metric drops out, leaving just
dτ² = (1 − 2m/r) dt² − dr²/(1 − 2m/r) − r² dϕ²        (1)
The Christoffel symbols and the equations of geodesic motion for this metric were already given in Section 5.5. Taking the parameter λ equal to the proper time τ, those equations are
d²t/dτ² + [2m/(r(r − 2m))] (dt/dτ)(dr/dτ) = 0        (2)

d²r/dτ² + (m/r²)(1 − 2m/r)(dt/dτ)² − [m/(r(r − 2m))](dr/dτ)² − (r − 2m)(dϕ/dτ)² = 0        (3)

d²ϕ/dτ² + (2/r)(dr/dτ)(dϕ/dτ) = 0        (4)
We can immediately integrate equations (2) and (4) to give
(1 − 2m/r) dt/dτ = k        r² dϕ/dτ = h
where k and h are constants of integration, determined by the initial conditions of the orbit. We can now substitute for these derivatives into the basic Schwarzschild metric divided by (dτ)² to give
1 = k²/(1 − 2m/r) − (dr/dτ)²/(1 − 2m/r) − h²/r²
Solving for (dr/dτ)², we have
(dr/dτ)² = k² − (1 − 2m/r)(1 + h²/r²)        (5)
Differentiating this with respect to τ and dividing by 2(dr/dτ) gives
d²r/dτ² = −m/r² + h²/r³ − 3mh²/r⁴
(We arrive at this same equation if we insert the squared derivatives of the coordinates into equation (3), because one of the geodesic equations is always redundant to the line element.) Letting ω = dϕ/dτ denote the proper angular speed, we have h = ωr², and the above equation can be written as
d²r/dτ² = −m/r² + ω²(r − 3m)
Obviously if ω = 0 this gives the "proper" analog of Newton's inverse-square law for radial gravitational acceleration. With non-zero ω the term ω²r corresponds to the Newtonian centripetal acceleration which, if we defined the tangential velocity v = ωr, would equal the classical v²/r. This term serves to offset the inward pull of gravity, but in the relativistic version we find not ω²r but ω²(r − 3m). (To avoid confusion, it's worth noting that the quantity ω²(1 − 3m/r) would be simply ω² if ω were defined as the derivative of ϕ with respect to the Schwarzschild coordinate time t instead of the proper time τ. Hence, as we saw in Section 5.5, the relativistic version of Kepler's third law for circular orbits is formally identical to the Newtonian version – but only if we identify the Newtonian coordinates with the Schwarzschild coordinates.) For values of r much greater than 3m this difference can be neglected, but clearly if r approaches 3m we can expect to see non-classical effects, and of course if r ever becomes less than 3m we would expect completely un-classical behavior. In fact, this corresponds to the cases when an orbiting particle spirals into the center, which never happens in classical theory (see below).

Since the above equations involve powers of (1/r) it's convenient to work with the parameter u = 1/r. Differentiating u with respect to ϕ gives du/dϕ = −(1/r²) dr/dϕ. Also, since r² = h/(dϕ/dτ), we have dr/dτ = −h (du/dϕ). Substituting for dr/dτ and 1/r into equation (5) gives the following differential equation relating u to ϕ
h²(du/dϕ)² = k² − (1 − 2mu)(1 + h²u²)
Differentiating again with respect to ϕ and dividing by 2h²(du/dϕ), we arrive at
u″ = m/h² − u + 3mu²        (6)
where
u″
denotes d²u/dϕ². Solving this quadratic for u gives
u = [ 1 − √( 1 − 12m(m/h² − u″) ) ] / (6m)
The quantity in the parentheses under the square root is typically quite small compared with 1, so we can approximate the square root by the first few terms of its expansion
√(1 − x) ≈ 1 − x/2 − x²/8 − ...        with x = 12m(m/h² − u″)
Expanding the right hand side and re-arranging terms gives

The value of

in typical astronomical problems is numerically quite small (many orders of magnitude less than 1), so the quantity on the right hand side will be negligible for planetary motions. Therefore, we're left with a simple harmonic oscillator of the form u″ = F − Mu, where M and F are constants. For some choice of initial ϕ the general solution of this equation can be expressed as u = (F/M)[1 + k cos(√M ϕ)], where k is a constant of integration. Therefore, reverting back to the parameter r = 1/u, the relation between r and ϕ is
r = (M/F) / [ 1 + k cos(Ωϕ) ]
where Ω = √M. If the "frequency" Ω were equal to unity, this would be the polar equation of an ellipse with the pole at one focus, and the constant k would signify the eccentricity. Also, the leading factor would be the radial distance from the focus to the ellipse at an angle of π/2 from the major axis, i.e., it would represent the semilatus rectum. However, the value of Ω is actually slightly less than 1, which implies that ϕ must go slightly beyond 2π in order to complete one cycle of the radial distance. Consequently, for small values of m/h the path is approximately a Keplerian ellipse, but the axis of the ellipse precesses slightly, as illustrated below.

This illustration depicts a much more severe case than could exist for any planet in our solar system, because the perihelion of the orbit is only 200m where m is the gravitational radius (in geometrical units) of the central object, which means it is only 100 times the corresponding "black hole radius". Our Sun's mass is not nearly concentrated enough to permit this kind of orbit, since the Sun's gravitational radius is only m = 1.475 kilometers, whereas its matter fills a sphere of radius 696,000 kilometers. To determine the relativistic prediction for the orbital precession of the planetary orbits, we can expand the expression for Ω as follows
Ω = √M ≈ 1 − 3(m/h)² − ...
Since m/h is so small, we can take just the first-order term, and noting that one cycle of the radial function will be completed when Ωϕ = 2π, we see that ϕ must increase by 2π/ Ω for each radial cycle, so the precession per revolution is
Δϕ = 2π/Ω − 2π ≈ 6π (m/h)²
We saw above that the semilatus rectum L is approximately h2/m, so the amount of precession per revolution (for slow moving objects in weak gravitational fields, such as the planets in our solar system) can be written as simply 6πm/L, where m is the gravitational radius of the central body. As noted above, the gravitational radius of our Sun is 1.475 kilometers, so based on the elements of the planetary orbits we can construct the following table of relativistic precession.

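For instance, the entry for Mercury can be reproduced with a few lines of Python (a sketch; the orbital elements are standard published values, and the figure of roughly 414 revolutions per century is the one used later in this section):

    import math

    m = 1475.0                  # gravitational radius of the Sun, meters
    a = 5.791e10                # semi-major axis of Mercury's orbit, meters
    e = 0.2056                  # eccentricity of Mercury's orbit

    L = a*(1.0 - e**2)                        # semilatus rectum
    prec = 6.0*math.pi*m/L                    # precession per revolution, radians
    print(prec * 414 * (180/math.pi) * 3600)  # ~43 arc seconds per century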
The observed precession of 43.1 ± 0.5 arc seconds per century for the planet Mercury is in close agreement with the theory. We noted in section 5.8 how Einstein proudly concluded his presentation of the vacuum field equations in his 1916 paper on general relativity by pointing out that they explained the anomalous precession of Mercury. He returned to this subject at the end of the paper, giving the precession formula and closing his masterpiece with the words

Calculation gives for the planet Mercury a rotation of the orbit of 43" per century, corresponding exactly to the astronomical observation (Leverrier); for the astronomers have discovered in the motion of the perihelion of this planet, after allowing for disturbances by the other planets, an inexplicable remainder of this magnitude.

We mentioned previously that the small eccentricities of Venus and Earth make it difficult to determine their lines of apsides with precision, but modern measurement techniques (including the use of interplanetary space probes and radar ranging) and computerized analysis of the data have enabled the fitting of the entire solar system to a parameterized post-Newtonian (PPN) model that encompasses a fairly wide range of theories (including general relativity). Once the parameters of this model have been fit to all the available data for the Sun and planets, the model can then be used to compute the "best observational fit" for the precessions of the individual planets based on the PPN formalism. This gives precessions (in excess of the Newtonian predictions) of 43.1, 8.65, 3.85, and 1.36 arcseconds per century for the four inner planets respectively, in remarkable agreement with the predictions of general relativity. If we imagine an extremely dense central object, whose mass is concentrated inside its gravitational radius, we can achieve much greater deviations from conventional Newtonian orbits. For example, if the precession rate is roughly equal to the orbital rate, we have an orbit as shown below:

For an orbit with slightly less energy the path looks like this:

where the dotted circle signifies the "light orbit" radius r = 3m. With sufficient angular momentum it's possible to arrange for persistent timelike orbits periodically descending down to any radius greater than 3m, which is the smallest possible radius of a circular orbit (but note that a circular orbit with radius less than 6m is unstable). If a timelike geodesic ever passes inside that radius it must then spiral in to the central mass, as illustrated below.

Here the outer dotted circle is at 3m, and the inner circle is at the event horizon, 2m. Once a worldline has fallen within 2m, whether geodesic or not, its radial coordinate must (according to the Schwarzschild solution) thereafter decrease monotonically to zero.

Regarding these spiral solutions there is an ironic historical precedent. A few years before writing the Principia, Newton once described in a letter to Robert Hooke the descent of an object along a spiral path to the center of a gravitating body. Several years later, after the Principia had established Newton's reputation, the two men became engaged in a bitter priority dispute over the discovery of universal gravitation, and Hooke used this letter as evidence that Newton hadn't understood gravity at that time, because the classical inverse-square law of gravity permits no such spiral solutions. Newton replied that it had simply been a "negligent stroke with his pen". Interestingly, although people sometimes credit Newton with originating the idea of photons based on his erroneous corpuscular theory of light, it's never been suggested that his "negligent spiral" was a premonition of the Schwarzschild solution of Einstein's field equations.

Incidentally, the relativistic contribution to a planet's orbital precession rate is often derived as a "resonance" effect. Recall that the general solution of an ordinary linear differential equation contains a term proportional to e^(λx) for each root λ of the characteristic polynomial, and a resonance occurs when the characteristic polynomial has a repeated root, in which case the solution has a term proportional to x e^(λx). If there is another repetition of the root it is represented by a term proportional to x² e^(λx), and so on. As a means of approximating the solution of the non-linear equation (6), many authors introduce a trial solution of the form c0 + c1 cos(ϕ) + c2 ϕ sin(ϕ), suggesting that the last term is to be regarded as a resonance, whose effect grows cumulatively over time because the factor ϕ is not periodic, and therefore eventually has observable effects over a large number of orbits, such as the 414 revolutions of Mercury per century. Now, provided (c2/c1)ϕ is many orders of magnitude smaller than 1, we can use the small-angle approximations sin(x) ~ x and cos(x) ~ 1 to write the solution as

where we’ve used the trigonometric identity cos(x)cos(y) + sin(x)sin(y) = cos(x − y). This yields the correct result, but the interpretation of it as a resonance effect is misleading, because the predominant cumulative effect of a resonant term proportional to ϕsin(ϕ) in the long run is not a precession of the ellipse, but rather an increase in the magnitude of the radial excursions of a component that is at an angle of π/2 relative to the original major axis. It just so happens that on the initial cycles this effect causes the overall perihelion to precess slightly, simply because the phase of the sine component is beginning to assert itself over the phase of the cosine component. In other words, the apparent "precession" resulting from the ϕsin(ϕ) term on the initial cycles is really just a one-time phase shift corresponding to a secular increase in the radial amplitude, and does not actually represent a change in the frequency of the solution. It can be shown that a term involving ϕsin(ϕ) appears in the second-order power series expansion of the solution to equation (6), which explains why it is a useful curve-fitting function for small ϕ, but it does not represent a true resonance effect, as shown by the fact that the ultimate cumulative effect of this term is discarded when we apply the small-angle approximation to estimate the frequency shift.

6.3 Bending Light

When Lil’s husband got demobbed, I said –
I didn’t mince my words, I said to her myself,
HURRY UP PLEASE IT’S TIME
T. S. Eliot, 1922

At the conclusion of his treatise on Opticks in 1704, the 62 year old Newton lamented that he could "not now think of taking these things into farther consideration", and contented himself with proposing a number of queries "in order to a farther search to be made by others". The very first of these was

Do not Bodies act upon Light at a distance, and by their action bend its Rays, and is not this action strongest at the least distance?

Superficially this may not seem like a very radical suggestion, because on the basis of the corpuscular theory of light, and Newton's laws of mechanics and gravitation, it's easy to conjecture that a beam of light might be deflected slightly as it passes near a large massive body, assuming particles of light respond to gravitational acceleration similarly to particles of matter. For any conical orbit of a small test particle in a Newtonian gravitational field around a central mass m, the eccentricity is given by
ε = √( 1 + 2Eh²/m² )
where E = v²/2 − m/r is the total energy (kinetic plus potential), h = r vt is the angular momentum, v is the total speed, vt is the tangential component of the speed, and r is the radial distance from the center of the mass. Since a beam of light travels at such a high speed, it will be in a shallow hyperbolic orbit around an ordinary massive object like the Sun. Letting r0 denote the closest approach (the perihelion) of the beam to the gravitating body, at which v = vt, we have
ε = (r0 v²)/m − 1
Now we set v = 1 (the speed of light in geometric units) at the perihelion, and from the geometry of the hyperbola we know that the asymptotes make an angle of α with the axis of symmetry, where cos(α) = 1/ε.

With a hyperbola as shown in the figure above, this implies that the total angular deflection of the beam of light is δ = 2(α − π/2), which for small deflections and for m (in geometric units) much less than r0 is given in Newtonian mechanics by
δ ≈ 2/ε ≈ 2m/r0
The best natural opportunity to observe this deflection would be to look at the stars near the perimeter of the Sun during a solar eclipse. The mass of the Sun in gravitational units is about m = 1475 meters, and a beam of light just skimming past the Sun would have a closest distance equal to the Sun's radius, r = (6.95)×10^8 meters. Therefore, the Newtonian prediction would be 0.000004245 radians, which equals 0.875 seconds of arc. (There are 2π radians per 360 degrees, each degree representing 60 minutes of arc, and each minute represents 60 seconds of arc.)

However, there is a problematical aspect to this "Newtonian" prediction, because it's based on the assumption that particles of light can be accelerated and decelerated just like ordinary matter, and yet if this were the case, it would be difficult to explain why (in nonrelativistic absolute space and time) all the light that we observe is traveling at a single characteristic speed. Admittedly if we posit that the rest mass of a particle of light is extremely small, it might be impossible to interact with such a particle without imparting to it a very high velocity, but this doesn't explain why all light seems to have precisely the same velocity, as if this particular speed is somehow a characteristic property of light. As a result of these concerns, especially as the wave conception of light began to supersede the corpuscular theory, the idea that gravity might bend light rays was largely discounted in Newtonian physics. (The same fate befell the idea of black holes, originally proposed by Michell based on the Newtonian escape velocity for light. Laplace also mentioned the idea in his Celestial Mechanics, but deleted it in the third edition, possibly because of the conceptual difficulties discussed here.)

The idea of bending light was revived in Einstein's 1911 paper "On the Influence of Gravitation on the Propagation of Light". Oddly enough, the quantitative prediction given in this paper for the amount of deflection of light passing near a large mass was identical to the old Newtonian prediction, δ = 2m/r0. There were several attempts to measure the deflection of starlight passing close by the Sun during solar eclipses to test Einstein's prediction in the years between 1911 and 1915, but all these attempts were thwarted by cloudy skies, logistical problems, the First World War, etc. Einstein became very exasperated over the repeated failures of the experimentalists to gather any useful data, because he was eager to see his prediction corroborated, which he was certain it would be. Ironically, if any of those early experimental efforts had succeeded in collecting useful data, they would have proven Einstein wrong! It wasn't until late in 1915, as he completed the general theory, that Einstein realized his earlier prediction was incorrect, and the angular deflection should actually be twice the size he predicted in 1911. Had the World War not intervened, it's likely that Einstein would never have been able to claim the bending of light (at twice the Newtonian value) as a prediction of general relativity. At best he would have been forced to explain, after the fact, why the observed deflection was actually consistent with the completed general theory. (This would have made it somewhat similar to the cosmological expansion, which would have been one of the most magnificent theoretical predictions in the history of science, but the experimentalist Hubble got there first.) Luckily for Einstein, he corrected the light-bending prediction before any expeditions succeeded in making useful observations.

In 1919, after the war had ended, scientific expeditions were sent to Sobral in South America and Principe in West Africa to make observations of the solar eclipse. The reported results were angular deflections of 1.98 ± 0.16 and 1.61 ± 0.40 seconds of arc, respectively, which was taken as clear confirmation of general relativity's prediction of 1.75 seconds of arc. This success, combined with the esoteric appeal of bending light, and the romantic adventure of the eclipse expeditions themselves, contributed enormously to making Einstein a world celebrity. One other intriguing aspect of the story, in retrospect, is the fact that there is serious doubt about whether the measurement techniques used by the 1919 expeditions were robust enough to have legitimately detected the deflections which were reported. Experimentalists must always be wary of the "Ouija board" effect, with their hands on the instruments, knowing what results they want or expect. This makes it especially interesting to speculate on what values would have been recorded if they had managed to take readings in 1914, when the expected deflection was still just 0.875 seconds of arc. (It should be mentioned that many subsequent observations, summarized below, have independently confirmed the angular deflection predicted by general relativity, i.e., twice the "Newtonian" value.)

To determine the relativistic prediction for the bending of light past the Sun, the conventional approach is to simply evaluate the solution of the four geodesic equations presented in Chapter 5.2, but this involves a three-dimensional manifold, with a large number of Christoffel symbols, etc. It's possible to treat the problem more efficiently by considering it from a two-dimensional standpoint. Recall the Schwarzschild metric in the usual polar coordinates
dτ² = (1 − 2m/r) dt² − dr²/(1 − 2m/r) − r² dθ² − r² sin²(θ) dϕ²
We'll restrict our attention to a single plane through the center of mass by setting ϕ = 0, and since light travels along null paths, we set dτ = 0, which allows us to write the remaining terms in the form
dt² = dr²/(1 − 2m/r)² + [ r²/(1 − 2m/r) ] dθ²        (1)
This can be regarded as the (positive-definite) line element of a two-dimensional surface (r, θ), with the parameter t serving as the metrical distance. The null paths satisfying the complete spacetime metric with dτ = 0 are stationary if and only if they are stationary with respect to (1). This implies Fermat’s Principle of “least time”, i.e., light follows paths that minimize the integrated time of flight, or, more generally, paths for which the elapsed Schwarzschild coordinate time is stationary, as discussed in Chapter 3.5. (Equivalently, we have an angular analog of Fermat’s Principle, i.e., light follows paths that make the angular displacement dθ stationary, because the coefficients of (1) are independent of both t and θ.) Therefore, we need only determine the geodesic paths on this surface. The covariant and contravariant metric tensors are simply
g_rr = 1/(1 − 2m/r)²        g_θθ = r²/(1 − 2m/r)
g^rr = (1 − 2m/r)²          g^θθ = (1 − 2m/r)/r²
and the only non-zero partial derivatives of the components of gµν are
∂g_rr/∂r = −(4m/r²)/(1 − 2m/r)³        ∂g_θθ/∂r = (2r − 6m)/(1 − 2m/r)²
so the non-zero Christoffel symbols are
Γ^r_rr = −2m/(r(r − 2m))        Γ^r_θθ = −(r − 3m)        Γ^θ_rθ = Γ^θ_θr = (r − 3m)/(r(r − 2m))
Taking the coordinate time t as the path parameter (since it plays the role of the metrical distance in this geometry), the two equations for geodesic paths on the (r, θ) surface are
d²r/dt² − [2m/(r(r − 2m))](dr/dt)² − (r − 3m)(dθ/dt)² = 0        (2)

d²θ/dt² + [2(r − 3m)/(r(r − 2m))](dr/dt)(dθ/dt) = 0
These equations of motion describe the paths of light rays in a spherically symmetrical gravitational field. The figure below shows the paths of a set of parallel incoming rays.

The dotted circles indicate radii of m, 2m, ..., 6m from the mass center. Needless to say, a typical star's physical radius is much greater than its gravitational radius m, so we will not find such severe deflection of light rays, even for rays grazing the surface of the star. However, for a "black hole" we can theoretically have rays of light passing at values of r on the same order of magnitude as m, resulting in the paths shown in this figure. Interestingly, a significant fraction of the oblique incoming rays are "scattered" back out, with a loop at r = 3m, which is the "light radius". As a consequence, if we shine a broad light on a black hole, we would expect to see a "halo" of back-scattered light outlining a circle with a radius of 3m.

To quantitatively assess the angular deflection of a ray of light passing near a large gravitating body, note that in terms of the variable u = dθ/dt the second geodesic equation (2) has the form (1/u)du = −[(2/r)(r − 3m)/(r − 2m)]dr, which can be integrated immediately to give ln(u) = ln(r − 2m) − 3ln(r) + C, so we have
u = dθ/dt = K (r − 2m)/r³
To determine the value of K, we divide the metric equation (1) by (dt)² and evaluate it at the perihelion r = r0, where dr/dt = 0. This gives
1 = [ r0²/(1 − 2m/r0) ] (dθ/dt)²        i.e.        (dθ/dt)² = (r0 − 2m)/r0³
Substituting into the previous equation we find K² = r0³/(r0 − 2m), so we have
dθ/dt = √( r0³/(r0 − 2m) ) (r − 2m)/r³
Now we can substitute this into the metric equation divided by (dt)² and solve for dr/dt to give
dr/dt = ±(1 − 2m/r) √( 1 − [ r0³/(r0 − 2m) ] (r − 2m)/r³ )
Dividing dθ/dt by dr/dt then gives
dθ/dr = √( r0³/(r0 − 2m) ) / [ r² √( 1 − [ r0³/(r0 − 2m) ] (r − 2m)/r³ ) ]
Integrating this from r = r0 to ∞ gives the mass-centered angle swept out by a photon as it moves from the perihelion out to an infinite distance. If we define ρ = r0/r the above equation can be written in the form
θ = ∫₀¹ dρ / √[ (1 − 2m/r0) − ρ²(1 − 2mρ/r0) ]
The magnitude of the second term in the right-hand square root is always less than 1 provided r0 is greater than 3m (which is the radius of light-like circular orbits, as discussed further in Section 6.5), so we can expand the square root into a power series in that quantity. The result is
θ = ∫₀¹ [ 1/√(1 − ρ²) ] [ 1 + (m/r0)(1 − ρ³)/(1 − ρ²) + ... ] dρ
This can be analytically integrated term by term. The integral of the first term is just π/2, as we would expect, since with a mass of m = 0 the photon would travel in a straight line, sweeping out a right angle as it moves from the perihelion to infinity. The remaining terms supply the “excess angle”, which represents the angular deflection of the light ray. If m/r0 is small, only the first-order term is significant. Of course, the path of light is symmetrical about the perihelion, so the total angular deflection between the asymptotes of the incoming and outgoing rays is twice the excess of the above integral beyond π/2. Focusing on just the first order term, the deflection is therefore
δ = 2 (m/r0) ∫₀¹ (1 − ρ³)/(1 − ρ²)^(3/2) dρ
Evaluating the integral
∫ (1 − ρ³)/(1 − ρ²)^(3/2) dρ
from ρ = 0 to 1 gives the constant factor 2, so the first-order deflection is δ = 4m/r0. This gives the relativistic value of 1.75 seconds of arc, which is twice the Newtonian value. To higher orders in m/r0 we have

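These results are easy to confirm numerically. The following Python sketch evaluates the exact bending integral derived above for a ray grazing the Sun (the substitution ρ = sin x, which keeps the integrand finite at the perihelion, and the midpoint rule are conveniences of this sketch):

    import math

    m, r0 = 1475.0, 6.95e8      # Sun's gravitational radius and physical radius, meters

    def integrand(x):
        # with rho = sin(x) the radicand becomes cos(x)^2 - (2m/r0)(1 - sin(x)^3)
        s, c = math.sin(x), math.cos(x)
        return c / math.sqrt(c*c - (2.0*m/r0)*(1.0 - s**3))

    N = 100000                  # midpoint rule; the integrand is smooth on [0, pi/2]
    h = (math.pi/2.0)/N
    theta = math.fsum(integrand((i + 0.5)*h) for i in range(N)) * h

    delta = 2.0*theta - math.pi                  # total deflection, radians
    print(delta * (180/math.pi) * 3600)          # ~1.75 arcsec, twice the Newtonian 0.875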
The difficulty of performing precise measurement of optical starlight deflection during an eclipse can be gathered from the following list of results:

Fortunately, much more accurate measurements can now be made in the radio wavelengths, especially of quasars, since such measurements can be made from observatories with the best equipment and careful preparation (rather than hurriedly in a remote location during a total eclipse). In particular, the use of Very Long Baseline Interferometry (VLBI), combining signals from widely separated observatories, gives a tremendous improvement in resolving power. With these techniques it’s now possible to precisely measure the deflection (due to the Sun’s gravitational field) of electromagnetic waves from stars at great angular distances from the Sun. According to Will, an analysis in 2004 of over 2 million VLBI observations has shown that the ratio of the actual observed deflections to the deflections predicted by general relativity is 0.99992 ± 0.00023. Thus the dramatic announcement of 1919 has been retroactively justified.

The first news of the results of Eddington’s expedition reached Einstein by way of Lorentz, who on September 22 sent the telegram quoted at the beginning of this chapter. On the 7th of October Lorentz followed with a letter, providing details of Eddington’s presentation to the “British Association at Bournemouth”. Oddly enough, at this meeting Eddington reported that “one can say with certainty that the effect (at the solar limb) lies between 0.87” and 1.74”, although he qualified this by saying the plates had been measured only preliminarily, and the final value was still to be determined. In any case, Lorentz’s letter also included a rough analysis of the amount of deflection that would be expected due to ordinary refraction in the gas surrounding the Sun. His calculations indicated that a suitably chosen gas density at the Sun’s surface could indeed produce a deflection on the order of 1”, but for any realistic density profile the effect would drop off very rapidly for rays passing just slightly further from the Sun. Thus the effect of refraction, if there was any, would be easily distinguishable from the relativistic effect. He concluded

We may surely believe (in view of the magnitude of the detected deflection) that, in reality, refraction is not involved at all, and your effect alone has been observed. This is certainly one of the finest results that science has ever accomplished, and we may be very pleased about it.

6.4 Radial Paths in a Spherically Symmetrical Field

It is no longer clear which way is up even if one wants to rise.
David Riesman, 1950

In this section we consider the simple spacetime trajectory of an object moving radially with respect to a spherical mass. As we’ve seen, according to general relativity the metric of spacetime in the region surrounding an isolated spherical mass m is given by
dτ² = (1 − 2m/r) dt² − dr²/(1 − 2m/r) − r² dθ² − r² sin²(θ) dϕ²
where t is the time coordinate, r is the radial coordinate, and the angles θ and ϕ are the usual angles for polar coordinates. Since we're interested in purely radial motions the differentials of the angles dθ and dϕ are zero, and we're left with a 2-dimensional surface with the coordinates t and r, with the metric
dτ² = (1 − 2m/r) dt² − dr²/(1 − 2m/r)        (1)
This formula tells us how to compute the absolute lapse of proper time dτ along a given path corresponding to the coordinate increments dt and dr. The metric tensor on this 2-dimensional space is given by the diagonal matrix
guv = diag( 1 − 2m/r ,  −1/(1 − 2m/r) )
which has determinant g = −1. The inverse of the covariant tensor guv is the contravariant tensor
g^uv = diag( 1/(1 − 2m/r) ,  −(1 − 2m/r) )
In order to make use of index notation, we define x1 = t and x2 = r. Then the equations for the geodesic paths on any surface can be expressed as
d²x^i/dλ² + Γ^i_jk (dx^j/dλ)(dx^k/dλ) = 0        (2)
where summation is implied over any indices that are repeated in a given product, and Γijk denotes the Christoffel symbols. Note that the index i can be either 1 or 2, so the above expression actually represents two differential equations involving the 1st and 2nd derivatives of our coordinates x1 and x2 (which, remember, are just t and r) with respect to the "affine parameter" λ. This parameter just represents the normalized "distance" along the path, so it's proportional to the proper time τ for timelike paths. The Christoffel symbol is defined in terms of the partial derivatives of the components of the metric tensor as follows
Γ^i_jk = (1/2) g^il ( ∂g_lj/∂x^k + ∂g_lk/∂x^j − ∂g_jk/∂x^l )
Taking the partials of the components of our guv with respect to t and r we find that they are all zero, with the exception of
∂g11/∂r = 2m/r²        ∂g22/∂r = (2m/r²)/(1 − 2m/r)²
Combining this with the fact that the only non-zero components of the inverse metric tensor guv are g11 and g22, we find that the only non-zero Christoffel symbols are
Γ¹12 = Γ¹21 = (m/r²)/(1 − 2m/r)        Γ²11 = (m/r²)(1 − 2m/r)        Γ²22 = −(m/r²)/(1 − 2m/r)
So, substituting these expressions into the geodesic formula (2), and reverting back to the symbols t and r for our coordinates, we have the two ordinary differential equations for the geodesic paths on the surface
d²t/dλ² + [ 2m/(r²(1 − 2m/r)) ] (dt/dλ)(dr/dλ) = 0        (3)

d²r/dλ² + (m/r²)(1 − 2m/r)(dt/dλ)² − [ (m/r²)/(1 − 2m/r) ] (dr/dλ)² = 0
These equations can be integrated in closed form, although the result is somewhat messy.

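As the next paragraph notes, they are also very easy to integrate numerically. Here is a minimal Python sketch of that approach (assuming geometric units with m = 1, a release from rest at r = 10m, and a simple fourth-order Runge-Kutta step; the function names are ours):

    def rhs(state, m=1.0):
        # state = (t, r, dt/dlam, dr/dlam); right-hand sides of equations (3) above
        t, r, tp, rp = state
        f = 1.0 - 2.0*m/r
        return (tp, rp,
                -(2.0*m/(r*r*f))*tp*rp,
                -(m/(r*r))*f*tp*tp + (m/(r*r*f))*rp*rp)

    def rk4(state, h):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5*h*k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5*h*k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + h*k for s, k in zip(state, k3)))
        return tuple(s + h*(a + 2*b + 2*c + d)/6.0
                     for s, a, b, c, d in zip(state, k1, k2, k3, k4))

    R = 10.0                                        # apogee radius (in units of m)
    state = (0.0, R, 1.0/(1.0 - 2.0/R)**0.5, 0.0)   # at rest: dr/dlam = 0
    while state[1] > 2.05:                          # t grows without bound near r = 2m
        state = rk4(state, 0.005)
    print(state)                                    # note how large t has become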
They can also be directly integrated numerically using small incremental steps of "dλ" for any initial position and trajectory. This allows us to easily generate geodesic paths in terms of r as a function of t. If we do this, we will notice that the paths invariably go to infinite t as r approaches 2m. Is our 2-dimensional surface actually singular at r = 2m, or are the coordinates simply ill-behaved (like longitude at the North pole)?

As we saw above, the surface has an invariant Gaussian curvature at each point. Let's determine the curvature to see if anything strange occurs at r = 2m. The curvature can be computed in terms of the components of the metric tensor and their first and second partial derivatives. The non-zero first derivatives for our surface (and the determinant g = −1) were noted above. The only non-zero second derivatives are

So we can compute the intrinsic curvature of our surface using Gauss's formula for the curvature invariant K of a two-dimensional surface given in the section on Curvature. Inserting the metric components and derivatives for our surface into that equation gives the intrinsic curvature
K = −2m/r³
Therefore, at r = 2m the curvature of this surface is −1/(4m²), which is certainly finite (and in fact can be made arbitrarily small in magnitude for sufficiently large m). The only singularity in the intrinsic curvature of the surface occurs at r = 0. In order to plot r as a function of the proper time τ we would like to eliminate t from the two equations. To do this, notice that if we define T = dt/dλ the first equation can be written in the form
dT/dλ + [ 2m/(r²(1 − 2m/r)) ] (dr/dλ) T = 0        (4)
which is just an ordinary first-order differential equation in T with variable coefficients. Recall that the solution of any equation of the form
dT/dλ + f(λ) T = 0
is given by
T = k e^(−w)
where k is a constant of integration and w = ∫ f(λ) dλ. Thus the solution of (4) is
T = k exp( − ∫ 2m/(r(r − 2m)) dr )
The integral in the exponential is just ln(r) − ln(r − 2m) so the result is
T = dt/dλ = k r/(r − 2m)
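This is easily verified; for instance, a quick symbolic check in Python (a sketch using the sympy library; note that the common factor dr/dλ in equation (4) cancels, leaving dT/dr + [2m/(r²(1 − 2m/r))]T = 0):

    import sympy as sp

    r, m, k = sp.symbols('r m k', positive=True)
    T = k*r/(r - 2*m)                 # the proposed solution for dt/dlambda

    # residual of the ODE after writing d/dlambda as (dr/dlambda) d/dr
    residual = sp.diff(T, r) + (2*m/(r**2*(1 - 2*m/r)))*T
    print(sp.simplify(residual))      # prints 0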
Let's suppose our test particle is initially stationary at r = R and then allowed to fall freely. Thus the point r = R is the "apogee" of the radial orbit. Our affine parameter λ is proportional to the proper time τ along a path, and the value we assign to "k" determines the scale factor between λ and τ. From the original metric equation (1) we know that at the apogee (where dr/dτ = 0) we have
dt/dτ = 1/√(1 − 2m/R)
Multiplying this with the previous derivative at r = R gives
dτ/dλ = √(1 − 2m/R) · kR/(R − 2m) = k/√(1 − 2m/R)
Thus in order to scale our affine parameter to the proper time τ for this radial orbit we need to set k = √(1 − 2m/R), and so
dt/dτ = √(1 − 2m/R) · r/(r − 2m)
(Notice that this implies the initial value of dt/dλ at the apogee is 1/√(1 − 2m/R), and of course dr/dλ at that point is 0.) Substituting this into the 2nd geodesic equation (3) gives a single equation relating the radial parameter r and the affine parameter λ, which we have made equivalent to the proper time τ, so we have
d²r/dτ² = −[ m/(r(r − 2m)) ] [ (1 − 2m/R) − (dr/dτ)² ]        (5)
At the apogee r = R where dr/dt = 0 this reduces to
d²r/dτ² = −m/r²
This is a measure of the acceleration of a static test particle at the radial parameter r. More generally, we can use equation (5) to numerically integrate the geodesic path from any given initial trajectory, and it confirms that the radial coordinate passes smoothly through r = 2m as a function of the proper time τ. This may seem surprising at first, because the denominator of the leading factor contains (r − 2m), so it might appear that the second derivative of r with respect to proper time τ "blows up" at r = 2m. However, remarkably, the square of dr/dτ is invariably forced to 1 − 2m/R precisely at r = 2m, so the quantity in the square brackets goes to zero, canceling the zero in the denominator. Interestingly, equation (5) has the same closed-form solution as does radial free-fall in Newtonian mechanics (if τ is identified with Newton's absolute time). The solution can be expressed in terms of the parameter α by the "cycloid relations"
r = (R/2)(1 + cos α)        τ = (R/2) √(R/(2m)) (α + sin α)
The coordinate time t can also be given explicitly in terms of α by the formula
t = 2m ln[ (Q + tan(α/2)) / (Q − tan(α/2)) ] + 2mQ [ α + (R/(4m))(α + sin α) ]
where Q = √(R/(2m) − 1).
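A brief Python sketch tabulating a fall using these parametric relations (assuming the formulas just quoted, in geometric units with m = 1 and apogee R = 10m; note that t diverges as α approaches the value ~2.214 at which r reaches 2m):

    import math

    m, R = 1.0, 10.0
    Q = math.sqrt(R/(2.0*m) - 1.0)

    def fall(alpha):
        r   = (R/2.0)*(1.0 + math.cos(alpha))
        tau = (R/2.0)*math.sqrt(R/(2.0*m))*(alpha + math.sin(alpha))
        t   = 2.0*m*math.log((Q + math.tan(alpha/2.0))/(Q - math.tan(alpha/2.0))) \
              + 2.0*m*Q*(alpha + (R/(4.0*m))*(alpha + math.sin(alpha)))
        return r, tau, t

    for alpha in (0.5, 1.0, 1.5, 2.0, 2.1):
        r, tau, t = fall(alpha)
        print(f"r = {r:6.3f}   tau = {tau:7.3f}   t = {t:8.3f}")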
. A typical timelike radial orbit is illustrated below.

6.5 Intersecting Orbits

Time is the longest distance between two places.
Tennessee Williams, 1945

The lapse of proper time for moving clocks in a gravitational field is often computed by splitting the problem into separate components, one to account for the velocity effect in accord with special relativity, and another to account for the gravitational effect in accord with general relativity. However, the general theory subsumes the special theory, and it's often easier to treat such problems holistically from a purely general relativistic standpoint. (The persistent tendency to artificially bifurcate problems into "special" and "general" components is partly due to the historical accident that Einstein arrived at the final theory in two stages.) In the vicinity of an isolated non-rotating spherical body whose Schwarzschild radius is 2m the metric has the form
(ds)² = (1 − 2m/r)(dt)² − (dr)²/(1 − 2m/r) − r²(dθ)² − r² sin²(θ)(dϕ)²
where ϕ = longitude and θ = latitude (e.g., θ = 0 at the North Pole and θ = π/2 at the equator). Let's say our radial position r and our latitude θ are constant for each path in question (treating r as the "radius" in the weak field approximation). Then the coefficients of (dt)2 and (dϕ)2 are both constants, and the metric reduces to
(ds)² = (1 − 2m/r)(dt)² − r² sin²(θ)(dϕ)²        (1)
If we're sitting on the Earth's surface at the North Pole, we have sin(θ) = 0, so it follows that ds = √(1 − 2m/r) dt, where r is the radius of the Earth.

On the other hand, in an equatorial orbit with radius r = R we have θ = π/2, sin²(θ) = 1, and so the coefficient of (dϕ)² is simply R². Now, recall Kepler's law ω²R³ = m, which also happens to hold exactly in GR (provided that R is interpreted as the radial Schwarzschild coordinate and ω is defined with respect to Schwarzschild coordinate time). Since ω = dϕ/dt we have R² = m/(ω²R) = (dt/dϕ)²(m/R). Thus the path of the orbiting particle satisfies
(ds)² = (1 − 2m/R)(dt)² − (m/R)(dt)² = (1 − 3m/R)(dt)²        (2)
Now for each test particle, one sitting at the North Pole and one in a circular orbit of radius R, the path parameter s is the local proper time, so the ratio of the orbital proper time to the North Pole's proper time is
ds(orbit)/ds(pole) = √(1 − 3m/R) / √(1 − 2m/r)
To isolate the difference in the two proper times, we can expand the above function into a power series in m/r to give
ds(orbit)/ds(pole) ≈ 1 + m/r − (3/2)(m/R) + ...
The mass of the earth, represented in geometrical units by half the Schwarzschild radius, is about 0.00443 meters, and the radius of the earth is about 6.38×10^6 meters, so this gives
ds(orbit)/ds(pole) ≈ 1 + (6.94×10^−10) [ 1 − (3/2)(r/R) ]
which shows that the discrepancy in the orbit's lapse of proper time during a given lapse ΔT of proper time measured on Earth is
Δτ ≈ (6.94×10^−10) [ 1 − (3/2)(r/R) ] ΔT
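A small Python sketch of this formula reproduces the figures quoted in the next paragraph (the values of m and r are the ones given above, and the radius ratios r/R are those used below):

    m, r = 0.00443, 6.38e6     # Earth's mass (geometric units) and radius, meters
    day = 86400.0

    def drift_us_per_day(r_over_R):
        # (orbital proper time - polar proper time) per day, in microseconds
        return (m/r)*(1.0 - 1.5*r_over_R)*day*1e6

    print(drift_us_per_day(2.0/3.0))   #  0.0   the null-result altitude R = 3r/2
    print(drift_us_per_day(0.917))     # ~-22.5 for a 360-mile orbit
    print(drift_us_per_day(0.18))      # ~+43.7 for a 22,000-mile orbit
    print(drift_us_per_day(0.0))       # ~+60   limit of a distant stationary clock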
Consequently, for an orbit at the radius R = 3r/2 (about 2000 miles up) there is no difference in the lapses of proper time. Thus, if someone wants to get a null result, that would be their best choice. For orbits lower than 3r/2 the satellite will show slightly less lapse of proper time (i.e., the above discrepancy will be negative), whereas for higher orbits it will show slightly more elapsed time than the corresponding interval at the North Pole. For example, in a low Earth orbit of, say, 360 miles, we have r/R = 0.917, so the proper time runs about 22.5 microseconds per day slower than a clock at the North Pole. On the other hand, for a 22,000 mile orbit we have r/R = 0.18, and so the orbit's lapse of proper time actually exceeds the corresponding lapse of proper time at the North Pole by about 43.7 microseconds per day. Of course, as R continues to increase the orbital velocity drops to zero and we are left with just coordinate time for the orbit, relative to which the North Pole on Earth is "running slow" by about 60 microseconds per day, due entirely to the gravitational potential of the earth. (This means that during a typical human life span the Earth's gravity stretches out our lives to cover an extra 1.57 seconds of coordinate time.)

Incidentally, equation (2) goes to zero when the orbit radius R equals 3m, consistent with the fact that 3m is the radius of the orbit of light. This suggests that even if something prevented a massive object from collapsing within its Schwarzschild radius 2m, it would still be a very remarkable object if it was just within 3m, because then it could (theoretically) support circular light orbits, although I don't believe such orbits would be stable (even neglecting interference from infalling matter). If neutrinos are massless there could also be neutrinos in 3m (unstable) orbits near such an object.

The results of this and the previous section can be used to clarify the so-called twins paradox. In some treatments of special relativity the difference between the elapsed proper times along different paths between two fixed events is attributed to a difference in the locally "felt" accelerations along those paths. In other words, the asymmetry in the proper times is "explained" by the asymmetry in local accelerations. However, this explanation fails in the context of general relativity and gravity, because there are generally multiple free-fall (i.e., locally unaccelerated) paths of different proper lengths connecting two fixed events. This occurs, for example, with any two intersecting orbits with different eccentricities, provided they are arranged so that the clocks coincide at two intersections. To illustrate, consider the intersections between a circular and a purely radial “orbit” in the gravitational field of a spherically symmetrical mass m. One clock follows a perfectly circular orbit of radius r, while the other follows a purely radial (up and down) trajectory, beginning at a height r, climbing to R, and falling back to r, as shown below.

We can arrange for the two clocks to initially coincide, and for the first clock to complete n circular orbits in the same (coordinate) time it takes for the second clock to rise and fall. Thus the objects coincide at two fixed events, and they are each in free-fall continuously in between those two events. Nevertheless, we will see that the elapsed proper times for these two objects are not the same. Throughout this example, we will use dimensionless times and distances by dividing each quantity by the mass m in geometric units. For a circular orbit of radius r in Schwarzschild spacetime, Kepler's third law gives the proper time to complete n revolutions as
Δτ(circ) = 2πn r^(3/2) √(1 − 3/r)
Applying the constant ratio of proper time to coordinate time for a circular orbit, we also have the coordinate time to complete n revolutions
Δt(circ) = 2πn r^(3/2)
For the radially moving object, the usual parametric cycloid relation gives the total proper time for the rise and fall
Δτ(radial) = 2 q^(3/2) (α + sin α)
where the parameter α satisfies the relation
r = q (1 + cos α)        (q = R/2)
The total elapsed coordinate time for the radial object is
Δt(radial) = 4 ln[ (Q + tan(α/2)) / (Q − tan(α/2)) ] + 4Q [ α + (q/2)(α + sin α) ]
where
Q = √(q − 1)
In order for the objects to coincide at the two events, the coordinate times must be equal, i.e., we must have Δtcirc = Δtradial. Therefore, replacing r with q(1+cos(α)) in the expression for the coordinate time in circular orbits, we find that for any given n and q (= R/2) the parameter α must satisfy
2πn [ q(1 + cos α) ]^(3/2) = 4 ln[ (Q + tan(α/2)) / (Q − tan(α/2)) ] + 4Q [ α + (q/2)(α + sin α) ]
Once we’ve determined the value of α for a given q and n, we can then determine the ratio of the elapsed proper times for the two paths from the relation
Δτ(radial) / Δτ(circ) = [ 2 q^(3/2) (α + sin α) ] / [ 2πn r^(3/2) √(1 − 3/r) ]        (3)
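Rather than carrying out the series analysis by hand, the matching condition can also be solved numerically. The following Python sketch (built on the expressions quoted above, in dimensionless r/m units; the bisection bracket is chosen by inspection) finds α for q = 1000 with n = 1 and evaluates the ratio of proper times:

    import math

    def coord_circ(r, n):       # coordinate time for n circular orbits
        return 2.0*math.pi*n*r**1.5

    def coord_radial(q, a):     # coordinate time for the radial rise and fall
        Q = math.sqrt(q - 1.0)
        return (4.0*math.log((Q + math.tan(a/2.0))/(Q - math.tan(a/2.0)))
                + 4.0*Q*(a + (q/2.0)*(a + math.sin(a))))

    def ratio(q, n=1):
        f = lambda a: coord_radial(q, a) - coord_circ(q*(1.0 + math.cos(a)), n)
        lo, hi = 1e-6, 3.0                     # f changes sign on this bracket
        for _ in range(100):
            mid = 0.5*(lo + hi)
            lo, hi = (mid, hi) if f(mid) < 0.0 else (lo, mid)
        a = 0.5*(lo + hi)
        r = q*(1.0 + math.cos(a))
        tau_radial = 2.0*q**1.5*(a + math.sin(a))
        tau_circ = 2.0*math.pi*n*r**1.5*math.sqrt(1.0 - 3.0/r)
        return r, tau_radial/tau_circ

    print(ratio(1000.0))   # orbit radius ~0.9q, and a ratio slightly greater than 1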
With n = 1 and fairly small value of r the ratio of proper times behaves as shown below.

Not surprisingly, the ratio goes to infinity as r drops to 3, because the proper time for a circular orbit of radius 3m is zero. (Recall that the "r" in our equations signifies r/m in normal geometrical units.) The α parameters and proper time ratios for some larger values of r with n = 1 are tabulated below.

To determine the asymptotic behavior we can substitute 1/u for the variable q in the equation expressing the relation between q and α, and then expand into a series in u to give

Now for any given n let αn be defined such that
αn + sin(αn) = nπ [ 1 + cos(αn) ]^(3/2)        (5)
For large values of r the values of α will be quite close to αn because the ratio of proper times for the two free-falling clocks is close to 1. Thus we can put α = αn + dα in equation (3) and expand into a series in dα to give

To determine the asymptotic dα as a function of R and n we can put α = αn + dα in equation (4) and expand into a series in dα to give

where

For sufficiently large R the value of Bn is negligible, so we have

Inserting this into (6) and recalling that 2/R is essentially equal to [1+cos(αn)]/r since α is nearly equal to αn, we arrive at the result

where

So, for any given n, we can solve (5) for αn and substitute into the above equation to give kn, and then the ratio of proper times for two free-falling clocks, one moving radially from r to R and back to r while the other completes n circular orbits at radius r, is given (for any value of r much greater than the mass m of the gravitating body) by equation (7). The values of αn, kn, and R/r for several values of n are listed below.

As an example, consider a clock in a circular orbit at 360 miles above the Earth's surface. In this case the radius of the orbit is about (6.957)×10^6 meters. Since the mass of the Earth in geometrical units is 0.00443 meters, we have the normalized radius r = (1.57053)×10^9, and the total time of one orbit is approximately 5775 seconds (i.e., about 1.604 hours). In order for a radial trajectory to begin and end at this altitude and have the same elapsed coordinate time as one circular orbit at this altitude, the radial trajectory must extend up to R = (1.55)×10^7 meters, which is about 5698 miles above the Earth's surface. Taking the value of k1 from the table, we have

and so the difference in elapsed proper times is given by

This is the amount by which the elapsed time on the radial (up-down) path would exceed the elapsed time on the circular path.

6.6 Ideal Clocks in Arbitrary Motion

What is a clock? By a clock we understand any thing characterized by a phenomenon passing periodically through identical phases so that we must assume, by the principle of sufficient reason, that all that happens in a given period is identical with all that happens in any arbitrary period.
Albert Einstein, 1910

In his 1905 paper on the electrodynamics of moving bodies, Einstein noted that the Lorentz transformation has a “peculiar consequence”, namely, the elapsed time on an ideal clock as it proceeds from one given event to another depends on the path followed by that clock between those two events. The maximum elapsed time between two given events (in flat spacetime) applies to a clock that proceeds inertially between those events, whereas clocks that have followed any other path will undergo a lesser elapsed time. He expressed this as follows

If at the points A and B there are stationary clocks which, viewed in the resting system, are synchronous; and if the clock at A is moved with the velocity v along the line AB to B, then on its arrival at B the two clocks no longer synchronize, but the clock moved from A to B lags behind the other which has remained at B… It is at once apparent that this result still holds good if the clock moves from A to B in any polygonal line, and also when the points A and B coincide. If we assume that the result proved for a polygonal line is also valid for a continuously curved line, we obtain the theorem: If one of two synchronous clocks at A is moved in a closed curve with constant velocity until it returns to A… then the clock that moved runs slower than the one that remained at rest. Thus we conclude that a balance-clock at the equator must go more slowly… than a precisely similar clock situated at one of the poles under otherwise identical conditions.

The qualifying words “under otherwise identical conditions”, as well as the context, make it clear that the clocks are to be situated at the same gravitational potential – which of course will not be the case if they are both located at sea level (because the Earth’s rotation causes it to bulge at the equator by just the amount necessary to cause clocks at sea level to run at the same rate). This complication has sometimes caused people to claim that Einstein’s assertion about polar and equatorial clocks was in error, but at worst it just unnecessarily introduced an extraneous factor.

A more serious point of criticism of the above passage was partially addressed by a footnote added by Sommerfeld to the 1913 re-printing of Einstein’s paper. This pertains to the term “balance clock”, about which Sommerfeld said “Not a pendulum clock, which is physically a system to which the earth belongs. This case had to be excluded.” This reinforces the point that we are to exclude any differential effects of the earth’s gravitation, but it leaves unanswered the deeper question of what precisely constitutes a suitable “clock” for purposes of quantifying the elapsed proper time along any path. Some critics have claimed that Einstein’s assertion about time dilation involves circular reasoning, arguing that if any particular clock (or physical process) should fail to conform to the assertion, it would simply be deemed an unsuitable clock. Of course, ultimately all physical assertions involve this kind of circularity of definition, but the value of an assertion and definition depends not on its truth but on its applicability. If no physical phenomena were found to conform to the definition of proper time, then the assertion would indeed be worthless, but experience shows that the advance of the quantum wave function of any physical system moving from the event with coordinates x,y,z,t (in terms of an inertial coordinate system) to the event x+dx, y+dy, z+dz, t+dt is invariably in proportion to dτ where
(dτ)² = (dt)² − (dx)² − (dy)² − (dz)²          (1)
Nevertheless, it can be argued that Einstein was not in a position to know this in 1905, because observations of the decay rates of sub-atomic particles (for example) under conditions of extreme acceleration had not yet been made. Miller has commented that Einstein’s “extension to the case where [the clock’s] trajectory was a continuous curve was unwarranted in 1905, but perhaps he considered that this case could always be treated as the limiting case of a many-sided polygon”. It should be noted, though, that Einstein carefully prefaced this “extension” with the words “if we assume”, so he can hardly be accused of smuggling. Also, as many others have pointed out, this “assumption” (the so-called “clock hypothesis”) can simply be taken as the definition of an ideal clock, and we are quite justified in expecting any real system with a periodic process to conform to this definition provided the restoring forces involved in the process are much greater than the inertial forces due to the acceleration of the overall system. Whether this kind of mechanistic assessment can be applied to the decay rates of subatomic particles is less clear. If for some extreme acceleration the decay rates of subatomic particles were found to differ from dτ given by (1), would we conclude that the Minkowski structure of spacetime was falsified, or that we had reached a level of acceleration that affects the decay process? Presumably if (1) broke down at the same point for a wide variety of processes, we would interpret this as the failure of Lorentz covariance, but if various processes begin to violate (1) at different levels of acceleration, we would be more likely to interpret those violations as being characteristics of the respective processes.

From the rationalist point of view, proper time can be conceived as independent of acceleration precisely because we can sense acceleration and correct for its effect, just as we can sense and correct for temperature, pressure, humidity, and so on. In contrast, we cannot sense velocity in any intrinsic way, so a purely local intrinsic clock cannot be corrected for velocity. Our notion of true time seems to be based on the idea of a characteristic periodic process under standard reference conditions, and then any intrinsically sensible changes in conditions are abstracted away. But even this notion involves idealizations, because (for example) there do not appear to be any perfectly periodic isolated processes. An ordinary clock is not in exactly the same state after each cycle of the escapement mechanism, because the driving spring has slightly relaxed. We regard the clock as essentially periodic because of the demonstrated insensitivity of the periodic components to the secular changes in the non-periodic components. It's possible to conceive of paradoxical "clocks", such as a container of cooled gas, whose gradual increase in temperature (up to the ambient temperature) is used to indicate the passage of time. If we have two such containers, initially cooled to the same temperature, and then send one on a high speed journey in a spaceship with the same ambient temperature, we expect to find that the traveling container will be cooler than the stationary container when they are re-united. Furthermore, if the gas consisted of radioactive particles, we expect less decay in the gas in the traveling container. However, this applies only because we accelerated the gas molecules coherently. Another way of increasing the velocities of the molecules in a container is by applying a separate heat source. Obviously this has the effect of "speeding up" the time as indicated by the temperature rise, but it slows down the radioactive decay of those molecules. This is just a simple illustration of how the rate of progression of a macroscopic system toward thermodynamic equilibrium may be affected in the opposite sense from the rate of quantum decay of the elementary particles comprising that system. The key to maintaining a consistent proper time for macroscopic as well as microscopic processes seems to be coherent acceleration (work) as opposed to incoherent acceleration (heat).

In the preceding sections we've looked at circular and radial free-falling paths in a spherically symmetrical gravitational field, but many circumstances involve more complicated paths, including acceleration. For example, suppose we place highly accurate cesium clocks in an airplane and fly it around in a circle with a 100 mile radius above an airport on the equator. Assume that for the duration of the experiment the Earth has uniform translational velocity and rotates once per 24 hours. In terms of an inertial coordinate system whose origin is at the center of the Earth, the coordinates of the plane are
x = R cos(Wt) − r cos(wt) sin(Wt)
y = R sin(Wt) + r cos(wt) cos(Wt)
z = r sin(wt)
where R is the radius of the Earth plus the height of the airplane above the Earth's surface, W is the Earth's rotational speed, r is the radius of the circular flight path, and w is the airplane's angular speed around that circle. Differentiating these inertial coordinates with respect to the coordinate time t gives expressions for dx/dt, dy/dt, and dz/dt. Now, the proper time of the clock is given by the integral of dτ over its worldline. Neglecting (for the moment) the effect of the Earth's gravitational field, we have
(dτ)² = (dt)² − (dx)² − (dy)² − (dz)²
so we can divide through by (dt)² and take the square root to give
dτ/dt = √(1 − (dx/dt)² − (dy/dt)² − (dz/dt)²)
Therefore, if we let V and v denote the speeds RW and rw respectively, the elapsed proper time for the clock corresponding to T of inertial coordinate time is given exactly by the integral
Δτ = ∫₀ᵀ √(1 − V² − v² − (rW)² cos²(wt) + 2Vv sin(wt)) dt
Since all the dimensionless parameters V, v, and rW are extremely small compared to 1, we can approximate the square root very closely using the easily integrable expression √(1 − u) ≈ 1 − u/2, which gives the result

Δτ ≈ T − (V² + v²)T/2 − (rW)²T/4 − (rW)² sin(2wT)/(8w) + (Vv/w)(1 − cos(wT))
Subtracting the result from T gives the amount of dilation for the path in question. The result is
T − Δτ ≈ [(V² + v²)/2 + (rW)²/4] T + (rW)² sin(2wT)/(8w) − (Vv/w)(1 − cos(wT))
Only the first term on the right is proportional to T, so it represents the secular contribution to the time dilation, i.e., the part that grows in proportion to the total elapsed time, whereas the two remaining terms are cyclical and don't accumulate as T increases. Not surprisingly, if we set v = r = 0 the amount of dilation is simply V²T/2, which is the dilation for the fixed point at the airplane's height above the equator, due entirely to the Earth's rotation. On the other hand, if we take the following values
V = (1.527)10⁻⁶,  v = (0.896)10⁻⁶ (i.e., 600 mph),  r = 100 miles,  W = 2π radians per 24 hours,  T = 86400 seconds
we find that the clock fixed at a point on the equator runs slow by 100.69 nsec per 24 hours relative to our Earth-centered inertial coordinate system, whereas a clock going around in a circle of radius 100 miles at 600 mph would lose 134.99 nsec per 24 hours (neglecting the cyclical components). Another experiment that could be performed is to fly clocks completely around the Earth's equator in opposite directions, so the eastbound clock's flight speed (relative to the ground) would be added to the circumferential speed of the Earth's surface due to the Earth's rotation, whereas the westbound clock's flight speed would be subtracted. In this situation the spatial coordinates of the clocks in the equatorial plane would be given by
x = R cos((W ± w)t),    y = R sin((W ± w)t),    z = 0,    with w = v/R
where we take the + sign for the eastbound plane and the − sign for the westbound plane. This gives the derivatives
dx/dt = −(V ± v) sin((W ± w)t),    dy/dt = (V ± v) cos((W ± w)t)
Substituting into equation (1) and simplifying gives
dτ/dt = √(1 − (V ± v)²)
Multiplying through by dt and integrating from t = 0 to some arbitrary coordinate time Δt, we find that the corresponding lapse of proper time for the plane is
Δτ = √(1 − (V ± v)²) Δt ≈ [1 − (V ± v)²/2] Δt          (2)
It follows that the lapse of time on the westbound clock by any coordinate time Δt will exceed the lapse of time on the eastbound clock by 2(Δt)Vv. To this point we have neglected the gravitational field of the Earth by assuming that the

metric of spacetime was the flat Minkowski metric. To account for the effects of gravity we should really use the Schwarzschild metric (assuming a spherical Earth). We saw in Section 6.4 that the metric in the equatorial plane of a spherical gravitating body of mass m at a constant Schwarzschild radial parameter r from the center of that body is
(dτ)² = (1 − 2m/r)(dt)² − r²(dϕ)²
where τ is the proper time along the path, t is the coordinate time, and ϕ is the longitude. Dividing through by (dt)2 and taking the square root of both sides gives
dτ/dt = √(1 − 2m/r − r²(dϕ/dt)²)
Let R denote the "radius" of the Earth, and let r = R + h denote the radius of the airplane's flight path at the constant altitude h. If we again let V denote the tangential speed of the Earth's rotation at the airplane's radial position at the equator, and let v denote the tangential speed of the airplane (either eastward or westward) relative to the ground, we have dϕ/dt = (V ± v)/r, so the above equation leads to the following integral for the elapsed proper time along a path in the equatorial plane at radial parameter r = R + h from the Earth's center
Δτ = ∫₀^Δt √(1 − 2m/r − (V ± v)²) dt
Again making use of the approximation √(1 − u) ≈ 1 − u/2 for small u, we can integrate this over some interval Δt of coordinate time to give the corresponding lapse Δτ of proper time along the path
Δτ ≈ [1 − m/r − (V ± v)²/2] Δt
Naturally this is the same as equation (2) except for the extra term −m/r, which represents the effect of the gravitational field. The mass of the Earth in gravitational units is about m = 0.0044 meters = (1.4766)10⁻¹¹ sec, and if the airplanes are flying at an altitude of h = 6 miles above the Earth's surface we have r = 3986 miles = 0.021031 sec. Also, assume the speed of the airplanes (relative to the ground) is v = 500 mph, which is v = (0.747)10⁻⁶ in dimensionless units, compared with the tangential speed of the Earth's surface at the equator V = (1.527)10⁻⁶. Under these conditions the above formula gives the relation between coordinate time and elapsed proper time for a clock sitting stationary at the equator on the Earth's surface as

whereas for clocks flying at an altitude of 6 miles and 500 mph eastward and westward the relations are

This shows that the difference in radial location between the clock on the Earth's surface and the clocks up at flight altitude results in a slowing of the Earthbound clock's proper time relative to the airplane clocks of about (1.06)10⁻¹² seconds per second of coordinate time. On the other hand, the eastbound clock has a relative slowing (compared to the Earthbound clock) in the amount of (1.419)10⁻¹² seconds per second due to its greater speed, so the net effect is that the eastbound clock's proper time falls behind that of the Earthbound clock by about (0.36)10⁻¹² seconds per second of coordinate time. In contrast, the westbound clock is actually moving slower than the Earthbound clock (because its flight speed counteracts the rotation of the Earth), so it gains an additional (0.862)10⁻¹² seconds per second. The net effect is that the westbound clock's proper time runs ahead of the Earthbound clock by a total of about (1.92)10⁻¹² seconds per second of coordinate time. These effects are extremely small, but if an experiment is performed for an extended period of time the differences in elapsed time on highly accurate cesium clocks are large enough to be detectable. Since there are 86400 seconds in a day, we would expect the eastbound flying clock to fall behind the Earthbound clock by about 31 nanoseconds per day, and the westbound flying clock to run ahead of it by about 166 nanoseconds per day. Experiments of this type have actually been performed, and the results have agreed with the predictions of relativity. Notice that the westbound "moving" clock actually shows a greater lapse of proper time than the "stationary" clock, seeming to contradict special relativity, but the explanation (as we've seen) is that the gravitational effect of altitude, together with the fact that the westbound plane actually has a lower speed than the ground in terms of the inertial coordinates, overrides the naive velocity effect in these particular circumstances. Suppose we return to our original problem, which involved airplanes flying in a small circle around a fixed point on the Earth's equator, but now we want to include the effects of the Earth's gravity. The principles are the same as in the circumnavigating case, i.e., we need only integrate the proper time along the path, making use of the Schwarzschild metric to give the correct line element. However, the path of the airplane in this case is not so easy to express in terms of the usual Schwarzschild polar coordinates. One way of approaching a problem such as this is to work with the Schwarzschild metric expressed in terms of "orthogonal" quasi-Minkowskian coordinates. If we split up the coefficient of (dr)² into the form 1 + 2m/(r − 2m), then the usual Schwarzschild metric can be written as
(dτ)² = (1 − 2m/r)(dt)² − [2m/(r − 2m)](dr)² − (dr)² − r²(dθ)² − r² sin²θ (dϕ)²
Now if we define the quasi-Euclidean parameters
x = r sinθ cosϕ,    y = r sinθ sinϕ,    z = r cosθ
we recognize the last three terms of the preceding equation as just the expression of (dx)² + (dy)² + (dz)² in polar coordinates. Also, since r = √(x² + y² + z²) we have dr = (x dx + y dy + z dz)/r, so the Schwarzschild metric can be written in the quasi-Minkowskian form
(dτ)² = (1 − 2m/r)(dt)² − (dx)² − (dy)² − (dz)² − [2m/(r²(r − 2m))](x dx + y dy + z dz)²
This form is similar to Riemann normal coordinates if we expand this metric about any radius r. Also, for sufficiently large r the quantity 2m in the denominator of the final term becomes negligible, and the coefficient approaches 2m/r³, so it isn't surprising that this is one of the characteristic magnitudes of the sectional curvature of Schwarzschild spacetime at radius r. Expanding the above expression, we find that the Schwarzschild metric can be expressed as a sum of the Minkowski metric plus some small quantities as shown below
(dτ)² = [(dt)² − (dx)² − (dy)² − (dz)²] − (2m/r)(dt)² − [2m/(r²(r − 2m))](x dx + y dy + z dz)²
Thus in matrix notation the Schwarzschild metric tensor for these coordinates is
[ 1 − 2m/r        0                 0                 0          ]
[    0      −1 − (2mκ/r)x²     −(2mκ/r)xy       −(2mκ/r)xz      ]
[    0       −(2mκ/r)xy      −1 − (2mκ/r)y²     −(2mκ/r)yz      ]
[    0       −(2mκ/r)xz       −(2mκ/r)yz      −1 − (2mκ/r)z²    ]
where κ = 1/[r²(1 − 2m/r)]. The determinant of this metric is −1. Dividing the preceding expression by (dt)² and taking the square root of both sides, we arrive at a relation between dτ and dt into which we can substitute the expressions for x, y, z, r, dx/dt, dy/dt, and dz/dt, and then integrate to give the proper time Δτ along the path as a function of coordinate time Δt. Hence if we know x, y, and z as explicit functions of t along a particular path, we can immediately write down the explicit integral for the lapse of

proper time along that path.

6.7 Gravitational Acceleration in Schwarzschild Coordinates

If bodies, moved in any manner among themselves, are urged in the direction of parallel lines by equal accelerative forces, they will all continue to move among themselves, after the same manner as if they had not been urged by those forces.
                                                Isaac Newton, 1687

According to Newton's theory the acceleration of gravity of a test particle at a given radial distance from a large mass is independent of the particle's state of motion. Consequently it would be impossible to tell, from the relative motions of a group of free-falling test particles in a small region of space, that those particles were subject to any force. Maxwell emphasized the same point when he wrote (in the posthumously published "Matter and Motion") that acceleration is relative, because only the differences between the accelerations of bodies can be detected:

Our whole progress up to this point may be described as a gradual development of the doctrine of relativity of all physical phenomena... There are no landmarks in space; one portion of space is exactly like every other portion, so that we cannot tell where we are. We are, as it were, on an unruffled sea, without stars, compass, soundings, wind, or tide, and we cannot tell in what direction we are going. We have no log which we can cast out to take a dead reckoning by; we may compute our rate of motion with respect to the neighbouring bodies, but we do not know how these bodies may be moving in space. We cannot even tell what force may be acting on us; we can only tell the difference between the force acting on one thing and that acting on another.

Of course, he was here referring to forces (such as gravity) that are proportional to inertial mass, so that they impart equal accelerations to every body. As an example of a localized set of bodies subjected to equal acceleration, he considered ordinary objects on the earth's surface, all of which are subjected (along with the earth itself) to the sun's gravitational force and the corresponding acceleration. He noted that if this were not the case, i.e., if the sun's gravity attracted only the earth but not ordinary small objects on the earth's surface, this would be easily detectable by (for instance) changes in the position of a plumb line between sunrise and sunset. Naturally these facts are closely related to the equivalence principle, but there are some subtle differences when we consider the accelerations of bodies due to gravity in the context of general relativity. We saw in Section 6.4 that the second derivative of r with respect to the proper time τ of the radially moving particle in general relativity is simply
d²r/dτ² = −m/r²
and thus independent of the particle's state of motion, just as with Newtonian gravity. However, the proper times of two (momentarily) coincident particles may differ depending on their states of motion, so when we consider the motions of such particles in terms of a common system of coordinates the result will not be so simple. The second derivative of the radial coordinate r with respect to the time coordinate t in terms of the usual Schwarzschild coordinates depends not only on the spacetime location of the particle (i.e., r and t) but also on the trajectory of the particle through that point. This is true even for particles with purely radial motion. To derive d²r/dt² for purely radial motion, we can divide through equation (1) of Section 6.4 by (dt)² to give
(dτ/dt)² = (1 − 2m/r) − (dr/dt)²/(1 − 2m/r)          (1)
Solving for dr/dt gives
dr/dt = ±(1 − 2m/r) √(1 − (dτ/dt)²/(1 − 2m/r))          (2)
where τ is the proper time of the radially moving particle. We also have from Section 6.4 the relation
(1 − 2m/r)(dt/dλ) = 1/κ
where κ is a constant parameter of the given trajectory, and λ is the path length parameter of the geodesic equations. We identify λ with the proper time τ by setting dτ/dλ = 1, so we can write
dτ/dt = κ(1 − 2m/r)          (3)
Substituting into (2), we have
dr/dt = ±(1 − 2m/r) √(1 − κ²(1 − 2m/r))
and therefore the second derivative of r with respect to t is
d²r/dt² = (2m/r²)(1 − 2m/r)[1 − (3/2)κ²(1 − 2m/r)]          (4)
In order to relate the parameter κ to a particular trajectory, we can substitute (3) into equation (1), giving
κ² = 1/(1 − 2m/r) − (dr/dt)²/(1 − 2m/r)³          (5)
There are two cases to consider. First, if there is a radius r = R at which the test particle is stationary, meaning dr/dt = 0, then
κ² = 1/(1 − 2m/R)
In this case the magnitude of κ is always greater than 1. Inserting this into (4) gives
d²r/dt² = (2m/r²)(1 − 2m/r)[1 − (3/2)(1 − 2m/r)/(1 − 2m/R)]
At the apogee of the trajectory, when r = R, this reduces to
d²r/dt² = −(m/R²)(1 − 2m/R)
as expected. If R is infinite, the coordinate acceleration reduces to
d²r/dt² = −(m/r²)(1 − 2m/r)(1 − 6m/r)
A plot of d2r/dt2 divided by m/r2 for various values of R is shown below.

Notice that the value of (d²r/dt²)/(−m/r²) is negative in the range from r = 2m to r = 6m/(1 + 4m/R), which is where d²r/dt² changes sign. This signifies that the acceleration (in terms of the r and t coordinates) is actually outward in this range. In the second case there is no radius at which the trajectory is stationary, so the trajectory escapes to infinity, and the speed dr/dt asymptotically approaches a fixed value V in the limit as r goes to infinity. In this case equation (5) gives
κ² = 1 − V²
so the magnitude of κ is less than 1. Inserting this into equation (4) gives
d²r/dt² = (2m/r²)(1 − 2m/r)[1 − (3/2)(1 − V²)(1 − 2m/r)]
The case V = 0 corresponds to the case of R approaching infinity for the bound trajectories, and indeed we see that inserting V = 0 into this expression gives the same result as with R going to infinity in the acceleration equation for bound trajectories. At the other extreme, with V = 1, this equation reduces to
d²r/dt² = (2m/r²)(1 − 2m/r)
which is consistent with what we get for null (light-like) paths by setting dτ = 0 in the radial metric and then solving for dr/dt, which gives dr/dt = ±(1 − 2m/r). A normalized plot of this acceleration for various values of V is shown below.

This shows that the acceleration d²r/dt² in terms of the Schwarzschild coordinates r and t for a particle moving radially with ultimate speed V (either toward or away from the gravitating mass) is outward at all radii greater than 2m for all ultimate speeds greater than 1/√3 ≈ 0.577 times the speed of light. For light-like paths (V = 1), the magnitude of the acceleration approaches twice the magnitude of the Newtonian acceleration – and is outward instead of inward. The reason for this outward acceleration with respect to Schwarzschild coordinates is that the speed of light (in terms of these coordinates) is greater at greater radial distances from the mass. Notice that the two expressions for d²r/dt² derived above, applicable to the cases when the kinetic energy of the test particle is or is not sufficient to escape to infinity, are the same if we stipulate that R and V are related according to
V² = 2m/(2m − R),    or equivalently    (1 − V²)(1 − 2m/R) = 1
If R is greater than 2m, then V² is negative so V is imaginary. Hence in this case we find it most convenient to use R. On the other hand, if R is negative, from 0 to negative infinity, the value of V² is real in the range from 0 to 1, so in this case it is convenient to work with V. The remaining possibility (which has no counterpart in Newtonian gravity) is if R is between 0 and 2m, in which case V² is not only positive, it is greater than 1. Thus the impossibility of having a speed greater than 1 corresponds to the impossibility of being motionless at a radius less than 2m.

Incidentally, for a bound particle we can give an alternative derivation of the r,t acceleration from the well-known cycloidal parametric relations between r and τ:
r = (R/2)(1 + cosθ),    τ = (R/2) √(R/(2m)) (θ + sinθ)
where R is the "top" of the orbit and θ is an angular parameter that ranges from 0 at the top of the orbit (r = R) to π at the bottom (r = 0). A plot of r versus τ can be drawn by tracing the motion of a point on the rim of a wheel as it rolls along a flat surface. (This same relation applies in Newtonian gravity if we replace τ with t.) Now, differentiating these parametric equations with respect to θ gives
dr/dθ = −(R/2) sinθ,    dτ/dθ = (R/2) √(R/(2m)) (1 + cosθ)
Therefore we have
dr/dτ = −√(2m/R) sinθ/(1 + cosθ) = −√(2m/R) tan(θ/2)
From the parametric equation for r we have
cosθ = 2r/R − 1
Denoting this quantity by "u", this implies that
u = cosθ = [1 − tan²(θ/2)] / [1 + tan²(θ/2)]
Solving this for tan(θ /2) gives
tan(θ/2) = ±√((1 − u)/(1 + u))
We want θ = 0 at r = R so we choose the first root and substitute into the preceding equation for dr/dτ to give
dr/dτ = −√(2m/R) √((1 − u)/(1 + u)) = −√(2m/r − 2m/R)
In addition, we have the derivative of coordinate time with respect to proper time of the particle
dt/dτ = √(1 − 2m/R) / (1 − 2m/r)
(See Section 6.4 for a derivation of this relation from the basic geodesic equations.) Dividing dr/dτ by dt/dτ gives
dr/dt = −(1 − 2m/r) √(2m/r − 2m/R) / √(1 − 2m/R)
Just as we did previously, we can now compute d²r/dt² = [d(dr/dt)/dr][dr/dt], and we arrive at the same result as before.

6.8 Sources in Motion

This means that the velocity of propagation [of gravity] is equal to that of light. It seems at first that this hypothesis ought to be rejected outright. Laplace showed in effect that the propagation is either instantaneous or much faster than that of light. However, Laplace examined the hypothesis of finite propagation velocity ceteris non mutatis; here, on the contrary, this hypothesis is conjoined with many others, and it may be that between them a more or less perfect compensation takes place. The application of the Lorentz transformation has already provided us with numerous examples of this.
                                                Poincaré, 1905

The preceding sections focused on the spherically symmetrical solution of Einstein's field equations represented by the Schwarzschild solution, combined with the geodesic hypothesis. Most of the directly observable effects of general relativity can be modeled and evaluated on this basis, i.e., in terms of the solution of the "one-body problem", a single gravitating body that can be regarded as stationary. Having solved the field equations for this single body, we then determine the paths of test particles in its vicinity, based on the assumption that those particles do not significantly affect the field, and that they follow geodesics in the field of the gravitating body. This is obviously a very simplified and idealized case, but it happens to be fairly representative of a small planet (e.g., Mercury) orbiting the Sun, or a light pulse grazing the Sun. From one point of view, the geodesic assumption seems quite natural and unobjectionable. After all, it merely asserts Newton's first law of motion in each small region of spacetime. Any sufficiently small region is essentially flat, and if we assume that free objects move at constant speed in straight lines in flat spacetime, then overall they follow geodesics.

However, there are two reasons for possibly being dissatisfied with the geodesic assumption. First, just as with Newton’s law of inertia, the geodesic assumption can be regarded as giving a special privileged status to certain paths without a clear justification. Of course, in practice the principle of inertia has proven itself to be extremely robust, but in theory there has always been some epistemological uneasiness about the circularity in the definition of inertial paths. As Einstein commented, we say an object moves inertially if it is free of outside influences, but we infer that it is free of outside influences only by observing that it moves inertially. This concern can be answered, at least in part, by noting that inertia serves as an organizing principle, and its significance resides in the large number of disparate entities that can be coordinated simultaneously on the basis of this principle. The concept of (local) inertial coordinates would indeed be purely circular if it successfully reduced the motions of only a single body to a simple set of patterns (e.g., Newton’s laws), but when the same system of coordinates is found to reduce the motions of multiple (and seemingly independent) objects, we are justified in claiming that it has non-trivial physical significance. Nevertheless, one of Einstein’s objectives in developing the general theory was to eliminate the reliance on the principle of inertia, which is the principle of geodesic motion in curved spacetime. The second reason for dissatisfaction with the geodesic assumption is that all entities whose motions are of interest are not just passive inhabitants of the spacetime manifold, they are sources of gravitation in their own right (since all forms of mass and energy gravitate). This immediately raises the problem – also encountered in electrodynamics – of how to deal with the field produced by the moving entity itself. Moreover, unlike Maxwell’s equations of the electrodynamic field, the field equations of general relativity are non-linear, so we are not even justified in “subtracting out” the self-field of the moving object, because the result will not generally be a solution of the field equations. One possible way of addressing this problem would be to treat the moving objects as contributors to the stress-energy tensor Tµν in the field equations, in which case the vanishing of the covariant derivative (imposed by the field equations) implies that the objects follow geodesics. However, it isn’t clear, a priori, that this is a legitimate representation of matter. Einstein, for one, rejected this approach, saying that Tµν is merely “a formal condensation of all things whose comprehension in the sense of a field theory is still problematic”. Another approach is to treat particles of matter as isolated point-like pole singularities in the field – indeed this was the basis for a paper written by Einstein, Infeld, and Hoffman (EIH) in 1938, in which they argued that (at least when the field equations are integrated to some finite order of approximation, and assuming a weak field and low accelerations) such singularities can exist only if they propagate along geodesics in spacetime. At first sight this is a somewhat puzzling proposition, because geodesics are defined only on smooth manifolds, so it isn’t obvious how a singularity of a manifold can be said to propagate along a geodesic of that manifold. 
However, against the background of nearly Minkowskian spacetime, it’s possible to define a workable notion of the “position” of an isolated singularity (though not without some ambiguity). Even if we accept all these caveats, it’s odd that Einstein would pursue this approach, considering that he is usually identified with a disdain for singularities, declaring that they render a field theory invalid

– much like an inconsistency in a formal system. In fact, one of his favorite ideas was that we might achieve a complete physically viable field theory precisely by requiring the absence of singularities. Indeed the EIH paper shows that geodesic motion is an example of a physical effect that can be deduced on this basis. Einstein, et al, discovered that when the field equations are integrated in the presence of two specified point-like singularities in the field, a one-dimensional locus of singularity extending from one of the original points to the other ordinarily appears in the solution. There is, however, a special set of conditions on the motions of the two original point-like singularities such that no intervening singular locus appears, and these are precisely the conditions of geodesic motion. Thus EIH concluded that the field equations of general relativity, by themselves, without any separate "geodesic assumption", actually do require mass point singularities to follow geodesic paths. (Just as remarkably, it turns out that even the classical equations of motion are due entirely to the non-linearity of the field equations.) So, this is actually an example of how meaningful physics can come out of Einstein's principle of "no-singularities". Of course, the solution retains the two point-like singularities, so one might question whether Einstein was being hypocritical in banning singularities in the rest of the manifold. In reply he wrote:

This objection would be justified if the equations of gravitation were to be considered as equations of the total field. But since this is not the case, one will have to say that the field of a material particle will differ the more from a pure gravitational field the closer one comes to the location of the particle. If one had the field equations of the total field, one would be compelled to demand that the particles themselves could be represented as solutions of the complete field equations that are free of irregularities everywhere. Only then would the general theory of relativity be a complete theory.

This is clearly related to Einstein's dissatisfaction with the dualistic nature of physics, being partly described by partial differential equations of the field, and partly by total differential equations of particles. His hope was that particle-like solutions would emerge from some suitable field theory, and one of the conditions he felt must be satisfied by any such complete field theory was the complete absence of singularities. It's easy to understand why Einstein felt the need for a "unified field theory" to encompass both gravity and electromagnetism, because in their present separate forms they are extremely incongruous. In the case of electrodynamics, the field equations are linear, and possess only a single gauge freedom, so the equations of motion must be introduced as an independent assumption. In contrast, general relativity suggests that the equations of motion of a field theory ought to be implied by the field equations themselves, which must therefore be non-linear. One of the limitations of Einstein's work on the equations of motion was that it neglected the effect of radiation. This is usually considered to be legitimate provided the accelerations involved are not too great. Still, strictly speaking, accelerating masses ought to produce radiation. Indeed, this is necessary, even for slowly accelerated motions, in order to maintain strict momentum conservation along with the nearly complete absence

of aberration in the apparent direction of the "force" of gravity in the two-body problem (as noted by Laplace). But radiation reaction also causes acceleration, so it can be argued that any meaningful treatment of the problem of motion cannot neglect the effects of gravitational waves. Of course, the full field equations of general relativity possess solutions in which metrical disturbances propagate as waves, but such waves have not yet been directly observed. Hence they don't, at present, constitute part of the experimentally validated body of general relativity, but there is indirect empirical confirmation of gravitational waves in the apparent energy loss of certain binary star systems, most notably the Hulse-Taylor system, which consists of a neutron star and a pulsar orbiting each other every 8 hours. Careful observations indicate that the two stars are spiraling toward each other at a rate of 2.7 parts per billion each year, precisely consistent with the prediction of general relativity for the rate at which the system should be radiating energy in the form of gravitational waves. The agreement is very impressive, and subsequent observations of other binary star systems have provided similar indirect support for the existence of gravitational waves, although in some cases it is necessary to postulate other (unseen) bodies in the system in order to yield results consistent with general relativity. The experimental picture may change as a result of the LIGO project, which is an attempt to use extremely sensitive interferometry techniques to directly detect gravitational waves. Two separate facilities are being prepared in the states of Louisiana and Washington, and their readings will be combined to achieve a very large baseline. The facility in Washington state is over a mile long. If this effort is successful in detecting gravitational waves, it will be a stupendous event, possibly opening up a new "channel" for observing the universe. Of course, it's also possible that efforts to detect gravitational waves may yield inconclusive results, i.e., no waves may be definitely detected, but it may be unclear whether the test has been adequate to detect them even if they were present. If, on the other hand, the experimental efforts were to surprise us with an unambiguously null result (like the Michelson-Morley experiments), ruling out the presence of gravitational waves in a range where theory says they ought to be detectable, it could have serious implications for the field equations and/or the quadrupole solution. Oddly enough, Einstein became convinced for a short time in 1937 that gravity waves were impossible, but soon changed his mind again. As recently as 1980 there were disputes in scholarly publications as to the validity of the quadrupole solution. Part of the reason that people such as Einstein have occasionally doubted the reality of the wave solutions is that all gravitational waves imply a singularity (as does the Schwarzschild solution), albeit "merely" a coordinate singularity. Also, the phenomenon of gravitational waves must be inherently non-linear, because it consists of gravity "acting on itself", and we know that gravity itself doesn't show up in the source terms of the field equations, but only in the non-linearity of the left-hand side of the field equations.
The inherent non-linearity of gravitational waves makes them difficult to treat mathematically, because the classical wave solutions are based on linearized models, so it isn't easy to be sure the resulting "solutions" actually represent realistic solutions of the full non-linear field equations. Furthermore, there are no known physical situations that would produce any of the simple linearized plane wave situations that are usually discussed. For example, it is known that

there are no plane wave solutions to the non-linear field equations. There are cylindrical solutions, but unfortunately no plausible sources for infinite cylindrical solutions are known, so the physical significance of these solutions is unclear. It might seem as though there ought to be spherically symmetrical "pulsating" solutions that radiate gravitational waves, but this is not the case, as is clear from Birkhoff's proof that the Schwarzschild solution is the unique (up to transformation of coordinates) spherically symmetrical solution of the field equations, even without the "static" assumption. This is because, unlike the case of electromagnetism, the gravitational field is also the metric by which the field is measured, so coordinate transformations inherently represent more degrees of freedom than in Maxwell's equations, which have just a single "gauge". As a result, there is no physically meaningful "dipole" source for gravitational waves in general relativity. The lowest-order solutions are necessarily given by quadrupole configurations. Needless to say, another major complication in the consideration of gravitational waves is the idea of "gravitons" arising from attempts to quantize the gravitational field by analogy with the quantization of the electromagnetic field. This moves us into a realm where the classical notions of a continuous spacetime manifold may not be sustainable. A great deal of effort has been put into understanding how the relativistic theory of gravity can be reconciled with quantum theory, but no satisfactory synthesis has emerged. Regardless of future developments, it seems safe to say that the results associated with the large-scale Schwarzschild metric and geodesic hypothesis would not be threatened by quantization of the field equations. Nevertheless, this shows how important the subject of gravitational waves is for any attempt to integrate the results of general relativity into quantum mechanics (or vice versa, as Einstein might have hoped). This is one reason the experimental results are awaited with such interest. Closely related to the subject of gravitational waves is the question of how rapidly the "ordinary" effects of gravity "propagate". It's not too surprising that early investigations of the gravitational field led to the notion of instantaneous action at a distance, because it is an empirical fact that the gravitational acceleration of a small body orbiting at a distance r from a gravitating source points, at each instant, very precisely toward the position of the source at that instant, not (as we might naively expect) toward the location of the source at a time r/c earlier. (When we refer to "instants" in this section, we mean with respect to the inertial rest coordinates of the center of mass of the orbital system.) To gain a clear understanding of the reason for the absence of gravitational "aberration" in these circumstances, it's useful to recall some fundamentals of the phase relations between dynamically coupled variables. One of the simplest representations of dynamic coupling between two variables x and y is the "lead-lag" transfer function, which is based on the ordinary first-order differential equation
a₁ (dy/dt) + a₀ y = b₁ (dx/dt) + b₀ x
where a0, a1, b0, and b1 are constants. This coupling is symmetrical, so there is no implicit

directionality, i.e., we aren't required to regard either x or y as the independent variable and the other as the dependent variable. However, in most applications we are given one of these variables as a function of time, and we use the relation to infer the response of the other variable. To assess the "frequency response" of this transfer function we suppose that the x variable is given by a pure sinusoidal function x(t) = A sin(ωt) for some constants A and ω. Eventually the y variable will fall into an oscillating response, which we presume is also sinusoidal of the same frequency, although the amplitude and phase may be different. Thus we seek a solution of the form y(t) = B sin(ωt − θ) for some constants B and θ. If we define the "time lag" tL of the transfer function as the phase lag θ divided by the angular frequency ω, it follows that the time lag is given by
tL = θ/ω = [invtan(ω a₁/a₀) − invtan(ω b₁/b₀)] / ω
For sufficiently small angular frequencies the input function and the output response both approach simple linear "ramps", and since invtan(z) goes to z as z approaches zero, we see that the time lag goes to
tL → a₁/a₀ − b₁/b₀
The ratios a1/a0 and b1/b0 are often called, respectively, the lag and lead time constants of the transfer function, so the "time lag" of the response to a steady ramp input equals the lag time constant minus the lead time constant. Notice that it is perfectly possible for the lead time constant to be greater than the lag time constant, in which case the "time lag" of the transfer function is negative. In general, for any frequency input (not just linear ramps), the phase lag is negative if b1/b0 exceeds a1/a0. Despite the appearance, this does not imply that the transfer function somehow reads the future, nor that the input signal is traveling backwards in time (or is instantaneous in the case of a symmetrical coupling). The reason the output appears to anticipate the input is simply that the forcing function (the right hand side of the original transfer function) contains not only the input signal x(t) but also its derivative dx/dt (assuming b1 is non-zero), whose phase is π/2 ahead. (Recall that the derivative of the sine is the cosine.) Hence a linear combination of x and its derivative yields a net forcing function with an advanced phase. Thus the effective forcing function at any given instant does not reflect the future of x, it represents the current x and the current dx/dt. It just so happens that if the sinusoidal wave pattern continues unchanged, the value of x will subsequently progress through the phase that was "predicted" by the combination of the previous x and dx/dt signals, making it appear as though the output predicted the input. However, if the x signal abruptly changes the pattern at some instant, the change will not be foreseen by the output. Any such change will only reach the output after it has appeared at the input and

worked its way through the transfer function. One way of thinking about this is to remember that the basic transfer function is directionally symmetrical, and the "output signal" y(t) could just as well be regarded as the input signal, driving the "response" of x(t) and its derivative. We sometimes refer to "numerator dynamics" as the cause of negative time lags, because the b1 coefficient appears in the numerator of the basic dynamic relationship when represented as a transfer function with x(t) as an independent "input" signal. The ability of symmetrical dynamic relations to extrapolate periodic input oscillations so that the output has the same phase as (or may even lead) the input accounts for many interesting effects in physics. For example, in electrodynamics the electrostatic force exerted on a uniformly moving test particle by a "stationary" charge always points directly toward the source, because the field is spherically symmetrical about the source. However, since the test particle is moving uniformly we can also regard it as "stationary", in which case the source charge is moving uniformly. Nevertheless, the force exerted on the test particle always points directly toward the source at the present instant. This may seem surprising at first, because we know changes in the field propagate at the speed of light, rather than instantaneously. How does the test particle "know" where the source is at the present instant, if it can only be influenced by the source at some finite time in the past, allowing for the finite speed of propagation of the field? The answer, again, is numerator dynamics. The electromagnetic force function depends not only on the source's relative position, but also on the derivative of the position (i.e., the velocity). The net effect is to cancel out any phase shift, but of course this applies only as long as the source and the test particle continue to move uniformly. If either of them is accelerated, the "knowledge" of this propagates from one to the other at the speed of light. An even more impressive example of the phase-lag cancellation effects of numerator dynamics involves the "force of gravity" on a massive test particle orbiting a much more massive source of gravity, such as the Earth orbiting the Sun. In the case of Einstein's gravitational field equations the "numerator dynamics" cancel out not only the first-order phase effects (like the uniform velocity effect in electromagnetism) but also the second-order phase effects, so that the "force of gravity" on an orbiting body points directly at the gravitating source at the present instant, even though the source (e.g., the Sun) is actually undergoing non-uniform motion. In the two-body problem, both objects actually orbit around the common center of mass, so the Sun (for example) actually proceeds in a circle, but the "force of gravity" exerted on the Earth effectively anticipates this motion. The reason the phase cancellation extends one order higher for gravity than for electromagnetism is the same reason that Maxwell's equations predict dipole waves, whereas Einstein's equations only support quadrupole (or higher) waves. Waves will necessarily appear in the same order at which phase cancellation no longer applies. For electrically charged particles we can generate waves by any kind of acceleration, but this is because electromagnetism exists within the spacetime metric provided by the field equations.
In contrast, we can't produce gravitational waves by the simplest kind of "acceleration" of a mass, because there is no background reference to unambiguously define dipole acceleration. The Einstein field equations have an extra degree of freedom

(so to speak) that prevents simple dipole acceleration from having any "traction". It is necessary to apply quadrupole acceleration, so that the two dipoles can act on each other to yield a propagating effect. In view of this, we expect that a two-body system such as the Sun and the Earth, which essentially produces no gravitational radiation (according to general relativity), should have numerator dynamic effects in the gravitational field that give nearly perfect phase-lag cancellation, and therefore the Earth's gravitational acceleration should always point directly toward the Sun's position at the present instant, rather than (say) the Sun's position eight minutes ago. Of course, if something outside this two-body system (such as a passing star) were to upset the Sun's pattern of motion, the effect of such a disturbance would propagate at the speed of light. The important point to realize is that the fact that the Earth's gravitational acceleration always points directly at the Sun's present position does not imply that the "force of gravity" is transmitted instantaneously. It merely implies that there are velocity and acceleration terms in the transfer function (i.e., numerator dynamics) that effectively cancel out the phase lag in a simple periodic pattern of motion.

7.1 Is the Universe Closed?

The unboundedness of space has a greater empirical certainty than any experience of the external world, but its infinitude does not in any way follow from this; quite the contrary. Space would necessarily be finite if one assumed independence of bodies from position, and thus ascribed to it a constant curvature, as long as this curvature had ever so small a positive value.
                                                B. Riemann, 1854

Very soon after arriving at the final form of the field equations, Einstein began to consider their implications with regard to the overall structure of the universe. His 1917 paper presented a simple model of a closed spherical universe which "from the standpoint of the general theory of relativity lies nearest at hand". In order to arrive at a quasi-static distribution of matter he found it necessary to introduce the "cosmological term" to the field equations (as discussed in Section 5.8), so he based his analysis on the equations
Rμν − (1/2) gμν R + λ gμν = 8πG Tμν          (1)
where λ is the cosmological constant. Before invoking the field equations we can consider the general form of a metric that is suitable for representing the large-scale structure of the universe. First, we ordinarily assume that the universe would appear to be more or less the same when viewed from the rest frame of any galaxy, anywhere in the universe (at the present epoch). This is sometimes called the Cosmological Principle. Then, since the universe on a large scale appears (to us) highly homogeneous and

isotropic, we infer that these symmetries apply to every region of space. This greatly restricts the class of possible metrics. In addition, we can choose, for each region of space, to make the time coordinate coincide with the proper time of the typical galaxy in that region. Also, according to the Cosmological Principle, the coefficients of the spatial terms of the (diagonalized) metric should be independent of location, and any dependence on the time coordinate must apply symmetrically to all the space coordinates. From this we can infer a metric of the form
(dτ)² = (dt)² − S(t)² (dσ)²          (2)
where S(t) is some (still to be determined) function with units of distance, and dσ is the total space differential. Recall that for a perfectly flat Euclidean space the differential line element is
(dσ)² = (dx)² + (dy)² + (dz)² = (dr)² + r²(dθ)² + r² sin²θ (dϕ)²
where r² = x² + y² + z². If we want to allow our space (at a given coordinate time t) to have curvature, the Cosmological Principle suggests that the (large scale) curvature should be the same everywhere and in every direction. In other words, the Gaussian curvature of every two-dimensional tangent subspace has the same value at every point. Now suppose we embed a Euclidean three-dimensional space (x,y,z) in a four-dimensional space (w,x,y,z) whose metric is
(dσ)² = k(dw)² + (dx)² + (dy)² + (dz)²
where k is a fixed constant equal to either +1 or -1. If k = +1 the four-dimensional space is Euclidean, whereas if k = -1 it is pseudo-Euclidean (like the Minkowski metric). In either case the four-dimensional space is "flat", i.e., has zero Riemannian curvature. Now suppose we consider a three-dimensional subspace comprising a sphere (or pseudosphere), i.e., the locus of points satisfying the condition
w² + k(x² + y² + z²) = 1
From this we have w² = 1 − kr², and therefore
dw = −k(x dx + y dy + z dz)/w,    so    (dw)² = (x dx + y dy + z dz)²/(1 − kr²)
Substituting this into the four-dimensional line element above gives the metric for the three-dimensional sphere (or pseudo-sphere)
(dσ)² = (dx)² + (dy)² + (dz)² + k(x dx + y dy + z dz)²/(1 − kr²) = (dr)²/(1 − kr²) + r²(dθ)² + r² sin²θ (dϕ)²
Taking this as the spatial part of our overall spacetime metric (2) that satisfies the Cosmological Principle, we arrive at
(dτ)² = (dt)² − R(t)² [ (dr)²/(1 − kr²) + r²(dθ)² + r² sin²θ (dϕ)² ]          (3)
This metric, with k = +1 and R(t) = constant, was the basis of Einstein's 1917 paper, and it was subsequently studied by Alexander Friedmann in 1922 with both possible signs of k and with variable R(t). The general form was re-discovered by Robertson and Walker (independently) in 1935, so it is now often referred to as the Robertson-Walker metric. Notice that with k = +1 this metric essentially corresponds to polar coordinates on the "surface" of a sphere projected onto the "equatorial plane", so each value of r corresponds to two points, one in the Northern and one in the Southern hemisphere. We could remedy this by making the change of variable r → r/(1 + kr²/4), which (in the case k = +1) amounts to stereographic projection from the North pole to a tangent plane at the South pole. In terms of this transformed radial variable the Robertson-Walker metric has the form
(dτ)² = (dt)² − R(t)² [ (dr)² + r²(dθ)² + r² sin²θ (dϕ)² ] / (1 + kr²/4)²
As noted above, Einstein originally assumed R(t) = constant, i.e., he envisioned a static un-changing universe. He also assumed the matter in the universe was roughly "stationary" at each point with respect to these cosmological coordinates, so the only nonzero component of the stress-energy tensor in these coordinates is Ttt = ρ where ρ is the density of matter (assumed to be uniform, in accord with the Cosmological Principle). On this basis, the field equations imply
4πGρ = λ = 1/R²
Here the symbol R denotes the assumed constant value of R(t) (not to be confused with the Ricci curvature scalar). This explains why Einstein was originally led to introduce a non-zero cosmological constant λ, because if we assume a static universe and the Cosmological Principle, the field equations of general relativity can only be satisfied if the density ρ is proportional to the cosmological constant. However, it was soon pointed out that this static model is unstable, so it is a priori unlikely to correspond to the physical universe. Moreover, astronomical observations subsequently indicated that the universe (on the largest observable scale) is actually expanding, so we shouldn't restrict ourselves

to models with R(t) = constant. If we allow R(t) to be variable, then the original field equations, without the cosmological term (i.e., with λ = 0), do have solutions. In view of this, Einstein decided the cosmological term was unnecessary and should be excluded. Interestingly, George Gamow was working with Friedmann in Russia in the early 1920's, and he later recalled that "Friedmann noticed that Einstein had made a mistake in his alleged proof that the universe must necessarily be stable". Specifically, Einstein had divided through an equation by a certain quantity, even though that quantity was zero under a certain set of conditions. As Gamow notes, "it is well known to students of high school algebra" that division by zero is not valid. Friedmann realized that this error invalidated Einstein's argument against the possibility of a dynamic universe, and indeed under the condition that the quantity in question vanishes, it is possible to satisfy the field equations with a dynamic model, i.e., with a model of the form given by the Robertson-Walker metric with R(t) variable. It's worth noting that Einstein's 1917 paper did not actually contain any alleged proof that the universe must be static, but it did suggest that a non-zero cosmological constant required a non-zero density of matter. Shortly after Einstein's paper appeared, de Sitter gave a counter-example (see Section 7.6), i.e., he described a model universe that had a non-zero λ but zero matter density. However, unlike Einstein's model, it was not static. Einstein objected strenuously to de Sitter's model, because it showed that the field equations allowed inertia to exist in an empty universe, which Einstein viewed as "inertia relative to space", and he still harbored hopes that general relativity would fulfill Mach's idea that inertia should only be possible in relation to other masses. It was during the course of this debate that (presumably) Einstein advanced his "alleged proof" of the impossibility of dynamic models (with the errant division by zero?). However, before long Einstein withdrew his objection, realizing that his argument was flawed. Years later he recalled the sequence of events in a discussion with Gamow, and made the famous remark that it had been the biggest blunder of his life. This is usually interpreted to mean that he regretted ever considering a cosmological term (which seems to have been the case), but it could also be referring to his erroneous argument against de Sitter's idea of a dynamic universe, and his unfortunate "division by zero". In any case, the Friedmann universes (with and without cosmological constant) became the "standard model" for cosmologies. If k = +1 the manifold represented by the Robertson-Walker metric is a finite spherical space, so it is called "closed". If k = 0 or −1 the metric is typically interpreted as representing an infinite space, so it is called "open". However, it's worth noting that this need not be the case, because the metric gives only local attributes of the manifold; it does not tell us the overall global topology. For example, we discuss in Section 7.4 a manifold that is everywhere locally flat, but that is closed cylindrically. This shows that when we identify "open" (infinite) and "closed" (finite) universes with the cases k = -1 and k = +1 respectively, we are actually assuming the "maximal topology" for the given metric in each case.
Based on the Robertson-Walker metric (3), we can compute the components of the Ricci tensor and scalar, and substitute these along with the simple uniform stress-energy tensor into the field equations (1) to give the conditions on the scale function R = R(t):

$$\frac{\dot{R}^2 + k}{R^2} - \frac{\lambda}{3} = \frac{8\pi}{3} G\rho \qquad\qquad 2R\ddot{R} + \dot{R}^2 + k - \lambda R^2 = 0$$
where dots signify derivatives with respect to t. As expected, if R(t) is constant, these equations reduce to the ones that appeared in Einstein's original 1917 paper, whereas with variable R(t) we have a much wider range of possible solutions. It may not be obvious that these two equations have a simultaneous solution, but notice that if we multiply the first condition through by R(t)^3 and differentiate with respect to t, we get

$$\dot{R}\left(2R\ddot{R} + \dot{R}^2 + k - \lambda R^2\right) = \frac{d}{dt}\left[\frac{8\pi}{3} G\rho R^3\right]$$
The left-hand side is equal to $\dot{R}(t)$ times the left-hand side of the second condition, which equals zero, so the right hand side must also vanish, i.e., the derivative of (8π/3)GρR(t)^3 must equal zero. This implies that there is a constant C such that

$$\frac{8\pi}{3} G\rho R(t)^3 = C$$
With this stipulation, the two conditions are redundant, i.e., a solution of one is guaranteed to be a solution of the other. Substituting C/R(t)^3 for (8π/3)Gρ in the first condition and multiplying through by R(t)^2, we arrive at the basic differential equation for the scale parameter of a Friedmann universe

$$\dot{R}^2 + k - \frac{\lambda}{3}R^2 - \frac{C}{R} = 0 \qquad\qquad (4)$$
Incidentally, if we multiply through by R(t), differentiate with respect to t, divide through by $\dot{R}(t)$, and differentiate again, the constants k and C drop out, and we arrive at

$$R\dddot{R} + 2\dot{R}\ddot{R} - \lambda R\dot{R} = 0$$
With λ = 0 this is identical to the gravitational separation equation (2) in Section 4.2, showing that the cosmological scale parameter R(t) is yet another example of a naturally occurring spatial separation that satisfies this differential equation. It follows that the admissible functions R(t) (with λ = 0) are formally identical to the gravitational free-fall solutions described in Section 4.3. Solving equation (4) (with λ = 0) for $\dot{R}$, and switching to normalized coordinates T = t/C and X = R/C, we get

$$\frac{dX}{dT} = \sqrt{\frac{1 - kX}{X}}$$
Accordingly as k equals -1, 0, or +1, integration of this equation gives

$$\begin{aligned} k = -1:&\quad X = \tfrac{1}{2}\left(\cosh\theta - 1\right), \quad T = \tfrac{1}{2}\left(\sinh\theta - \theta\right)\\ k = 0:&\quad X = \left(\tfrac{3}{2}T\right)^{2/3}\\ k = +1:&\quad X = \tfrac{1}{2}\left(1 - \cos\theta\right), \quad T = \tfrac{1}{2}\left(\theta - \sin\theta\right) \end{aligned}$$
A plot of these three solutions is shown below.

[Figure: the normalized scale parameter X versus T for k = -1, 0, and +1]
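For readers who want to check these closed forms, the following short Python sketch (my own illustration, not part of the derivation) confirms that each of the three solutions satisfies the normalized equation (dX/dT)^2 = 1/X - k:

```python
import math

# Sketch: verify the three quoted solutions of (dX/dT)^2 = 1/X - k.
# For the parametric cases, dX/dT = (dX/dtheta)/(dT/dtheta).

def residual(k, X, dX_dtheta, dT_dtheta):
    # difference between dX/dT and sqrt(1/X - k); should be ~0
    return dX_dtheta / dT_dtheta - math.sqrt(1.0 / X - k)

for theta in (0.5, 1.0, 2.0):
    # k = +1 (cycloid): X = (1 - cos th)/2, T = (th - sin th)/2
    r_closed = residual(+1, (1 - math.cos(theta)) / 2,
                        math.sin(theta) / 2, (1 - math.cos(theta)) / 2)
    # k = -1: X = (cosh th - 1)/2, T = (sinh th - th)/2
    r_open = residual(-1, (math.cosh(theta) - 1) / 2,
                      math.sinh(theta) / 2, (math.cosh(theta) - 1) / 2)
    print(f"theta = {theta}: k=+1 residual {r_closed:.1e}, k=-1 residual {r_open:.1e}")

for T in (0.5, 1.0, 2.0):
    # k = 0: X = (3T/2)^(2/3), so dX/dT = (3T/2)^(-1/3)
    X = (1.5 * T) ** (2.0 / 3.0)
    print(f"T = {T}: k=0 residual {(1.5 * T) ** (-1.0 / 3.0) - math.sqrt(1 / X):.1e}")
```

All residuals come out at the level of floating-point round-off, as expected.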
In all three cases with λ = 0, the expansion of the universe is slowing down, albeit only slightly for the case k = -1. However, if we allow a non-zero cosmological constant λ, there is a much greater variety of possible solutions to Friedmann's equation (4), including solutions in which the expansion of the universe is actually accelerating exponentially. Based on the cosmic scale parameter R and its derivatives, the three observable parameters traditionally used to characterize a particular solution are

$$H = \frac{\dot{R}}{R} \qquad\qquad q = -\frac{R\ddot{R}}{\dot{R}^2} \qquad\qquad \sigma = \frac{4\pi G\rho}{3H^2}$$

(the Hubble parameter, the deceleration parameter, and the density parameter, respectively).
In terms of these parameters, the constants appearing in the Friedmann equation (4) can be expressed as

$$k = \left(3\sigma - q - 1\right)H^2 R^2 \qquad\qquad C = 2\sigma H^2 R^3 \qquad\qquad \lambda = 3\left(\sigma - q\right)H^2$$
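To make the logic concrete, here is a minimal sketch of how these relations classify a universe from measured values of H, q, and σ; the sample numbers are hypothetical, chosen only to exercise the formulas (units are geometric):

```python
def classify_universe(H, q, sigma):
    # The sign of k follows from k = (3*sigma - q - 1) * (H*R)^2, so only
    # the factor (3*sigma - q - 1) matters; lambda = 3*(sigma - q)*H^2.
    curv = 3 * sigma - q - 1
    lam = 3 * (sigma - q) * H ** 2
    k = (curv > 0) - (curv < 0)          # +1, 0, or -1
    return k, lam

# hypothetical sample values, purely illustrative
for H, q, sigma in [(1.0, 0.5, 0.5),    # k = 0, lambda = 0 (Einstein-de Sitter)
                    (1.0, 0.8, 0.7),    # k = +1, lambda < 0
                    (1.0, -0.5, 0.1)]:  # k = -1, lambda > 0 (accelerating)
    print(H, q, sigma, classify_universe(H, q, sigma))
```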
In principle, if astronomers could determine the values of H, q, and σ with enough precision, we could decide on empirical grounds the sign of k, and whether or not λ is zero. Thus, assuming the maximal topologies (and the large-scale validity of general relativity), we could determine whether the universe is open or closed, and whether it will expand forever or eventually re-contract. Unfortunately, none of the parameters is known with enough precision to distinguish between these possibilities. One source of uncertainty is in our estimates of the mass density ρ of the universe. Given the best current models of star masses, the best optical counts of stars in galaxies, and the apparent density of galaxies, we estimate an overall mass density that is only a small fraction of what would be required to make k = 0. However, there are reasons to believe that much (perhaps most) of the matter in the universe is not luminous. (For example, the observed rotation of individual galaxies indicates that they ought to fly apart unless there is substantially more mass in them than is visible to us.) This has led physicists and astronomers to search for the "missing mass" in various forms. Another source of uncertainty is in the values of R and its derivatives. For example, in its relatively brief history, Hubble's constant has undergone revisions of an order of magnitude, both upwards and downwards.

In recent years the Hubble Space Telescope and several modern observatories on Earth seem to have found strong evidence that the expansion of the universe is actually accelerating. If so, then it could be accounted for in the context of general relativity only by a non-zero cosmological constant λ (on a related question, see Section 7.6), with the implication that the universe is infinite and will expand forever (at an accelerating rate). Nevertheless, the idea of a closed finite universe is still of interest, partly because of the historical role it played in Einstein's thought, but also because it remains (arguably) the model most compatible with the spirit of general relativity. In an address to the Berlin Academy of Sciences in 1921, Einstein said

I must not fail to mention that a theoretical argument can be adduced in favor of the hypothesis of a finite universe. The general theory of relativity teaches that the inertia of a given body is greater as there are more ponderable masses in proximity to it; thus it seems very natural to reduce the total effect of inertia of a body to action and reaction between it and the other bodies in the universe... From the general theory of relativity it can be deduced that this total reduction of inertia to reciprocal action between masses - as required by E. Mach, for example - is possible only if the universe is spatially finite. On many physicists and astronomers this argument makes no impression...

This is consistent with the approach taken in Einstein's 1917 paper. Shortly thereafter he presented (in "The Meaning of Relativity", 1922) the following three arguments against the conception of infinite space, and for the conception of a bounded, or closed, universe:

(1) From the standpoint of the theory of relativity, to postulate a closed universe is very much simpler than to postulate the corresponding boundary condition at infinity of the quasi-Euclidean structure of the universe.

(2) The idea that Mach expressed, that inertia depends on the mutual attraction of bodies, is contained, to a first approximation, in the equations of the theory of relativity; it follows from these equations that inertia depends, at least in part, upon mutual actions between masses. Thereby Mach's idea gains in probability, as it is an unsatisfactory assumption to make that inertia depends in part upon mutual actions, and in part upon an independent property of space. But this idea of Mach's corresponds only to a finite universe, bounded in space, and not to a quasi-Euclidean, infinite universe. From the standpoint of epistemology it is more satisfying to have the mechanical properties of space completely determined by matter, and this is the case only in a closed universe.

(3) An infinite universe is possible only if the mean density of matter in the universe vanishes. Although such an assumption is logically possible, it is less probable than the assumption of a finite mean density of matter in the universe.

Along these same lines, Misner, Thorne, and Wheeler ("Gravitation") comment that general relativity "demands closure of the geometry in space as a boundary condition on the initial-value equations if they are to yield a well-determined and unique 4-geometry." Interestingly, when they quote Einstein's reasons in favor of a closed universe they omit the third without comment, although it reappears (with a caveat) in the subsequent "Gravitation and Inertia" of Ciufolini and Wheeler. As we've seen, Einstein was initially under the mistaken impression that the only cosmological solutions of the field equations are those with

$$R = \sqrt{\frac{2}{\kappa\rho}} \qquad\qquad (5)$$
where R is the radius of the universe, ρ is the mean density of matter, and κ is the gravitational constant. This much is consistent with modern treatments, which agree that at any given epoch in a Friedmann universe with constant non-negative curvature the radius is inversely proportional to the square root of the mean density. On the basis of (5) Einstein continued

If the universe is quasi-Euclidean, and its radius of curvature therefore infinite, then ρ would vanish. But it is improbable that the mean density of matter in the universe is actually zero; this is our third argument against the assumption that the universe is quasi-Euclidean.

However, in the 2nd edition of "The Meaning of Relativity" (1945), he added an appendix, "essentially nothing but an exposition of Friedmann's idea", i.e., the idea that "one can reconcile an everywhere finite density of matter with the original form of the equations of gravity [without the cosmological term] if one admits the time variability of the metric distances...". In this appendix he acknowledged that in a dynamic model, as described above, it is perfectly possible to have an infinite universe with positive density of matter, provided that k = -1.

It's clear that Einstein originally had not seriously considered the possibility of a universe with positive mass density but overall negative curvature. In the first edition, whenever he mentioned the possibility of an infinite universe he referred to the space as "quasi-Euclidean", which I take to mean "essentially flat". He regarded this open infinite space as just a limiting case of a closed spherical universe with infinite radius. He simply did not entertain the possibility of a hyperbolic (k = -1) universe. (It's interesting that Riemann, too, excluded spaces of negative curvature from his 1854 lecture, without justification.) His basic objection was evidently that a spacetime with negative curvature possesses an inherent structure independent of the matter it contains, and he was unable to conceive of any physical source of negative curvature. Moreover, the specification of "ad hoc" boundary conditions at infinity is precisely what's required in an open universe, and this Einstein regarded as contrary to the spirit of relativity.

At the end of the appendix in the 2nd edition, Einstein conceded that it comes down to an empirical question. If (8π/3)Gρ is greater than H², then the universe is closed and spherical; otherwise it is open and flat or pseudospherical (hyperbolic). He also makes the interesting remark that although we might possibly prove the universe is spherical, "it is hardly imaginable that one could prove it to be pseudospherical". His reasoning is that in order to prove the universe is spherical, we need only identify enough matter so that (8π/3)Gρ exceeds H², whereas if our current estimate of ρ is less than this threshold, it will always be possible that there is still more "missing matter" that we have not yet identified. Of course, at this stage Einstein was assuming a zero cosmological constant, so it may not have occurred to him that it might someday be possible to determine empirically that the expansion of the universe is accelerating, thereby automatically proving that the universe is open.

Ultimately, was there any merit in Einstein's skepticism toward the idea of an "open" universe? Even setting aside his third argument, the first two still carry some weight with some people, especially those who are sympathetic to Mach's ideas regarding the relational origin of inertia. In an open universe we must accept the fact that there are multiple, physically distinct, solutions compatible with a given distribution of matter and energy. In such a universe the "background" inertial field can in no way be associated with the matter and energy content of the universe. From this standpoint, general relativity can never give an unambiguous answer to the twins paradox (for example), because the proper time integral over a given path from A to B depends on the inertial field, and in an open universe this field cannot be inferred from the distribution of mass-energy. It is determined primarily by whatever absolute boundary conditions we choose to impose, independent of the distribution of mass-energy. Einstein believed that such boundary conditions were inherently non-relativistic, because they require us to single out a specific frame of reference - essentially Newton's absolute space. (In later years a great deal of work has been done in attempting to develop boundary conditions "at infinity" that do not single out a particular frame. This is discussed further in Section 7.7.) The only alternative (in an open universe) that Einstein could see in 1917 was for the metric to degenerate far from matter in such a way that inertia vanishes, i.e., we would require that the metric at infinity go to something like

$$g_{\mu\nu} \rightarrow \begin{pmatrix} 0 & 0 & 0 & \infty \\ 0 & 0 & 0 & \infty \\ 0 & 0 & 0 & \infty \\ \infty & \infty & \infty & \infty^2 \end{pmatrix}$$
Such a boundary condition would be the same with respect to any frame of reference, so it wouldn't single out any specific frame as the absolute inertial frame of the universe. Einstein pursued this approach for a long time, but finally abandoned it because it evidently implies that the outermost shell of stars must exist in a metric very different from ours, and as a consequence we should observe their spectral signatures to be significantly shifted. (At the time there was no evidence of any "cosmological shift" in the spectra of the most distant stars. We can only speculate how Einstein would have reacted to the discovery of quasars, the most distant objects known, which are in fact characterized by extreme redshifts and apparently extraordinary energies.) The remaining option that Einstein considered for an open asymptotically flat universe is to require that, for a suitable choice of the system of reference, the metric must go to

$$g_{\mu\nu} \rightarrow \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
at infinity. However, this explicitly singles out one particular frame of reference as the absolute inertial frame of the universe, which, as Einstein said, "is contrary to the spirit of the relativity principle". This was the basis of his early view that general relativity is most compatible with a closed unbounded universe. The recent astronomical findings that seem to indicate an accelerating expansion have caused most scientists to abandon closed models, but there seems to be some lack of appreciation for the damage an open universe does to the epistemological strength of general relativity. As Einstein wrote in 1945, "the introduction of [the cosmological constant] constitutes a complication of the theory, which seriously reduces its logical simplicity".

Of course, in both an open and a closed universe there must be boundary and/or initial conditions, but the question is whether the distribution of mass-energy by itself is adequate to define the field, or whether independent boundary conditions are necessary to pin down the field. In a closed universe the "boundary conditions" can be more directly identified with the distribution of mass-energy, whereas in an open universe they are necessarily quite independent. Thus a closed universe can claim to satisfy Mach's principle at least to some degree, whereas an open universe definitely can't. The seriousness of this depends on how seriously we take Mach's principle. Since we can just as well regard a field as a palpable constituent of the universe, and since the metric of spacetime itself is a field in general relativity, it can be argued that Mach's dualistic view is no longer relevant.

However, the second issue is whether even the specification of the distribution of mass-energy plus boundary conditions at infinity yields a unique solution. For Maxwell's equations (which are linear) it does, but for Einstein's equations (which are non-linear) it doesn't. This is perhaps what Misner, et al, are referring to when they comment that "Einstein's theory...demands closure of the geometry in space ... as a boundary condition on the initial value equations if they are to yield a well-determined (and, we now know, a unique) 4-geometry". In view of this, we might propose the somewhat outlandish argument that the (apparent) uniqueness of the metrical field supports the idea of a closed universe - at least within the context of general relativity. To put it more explicitly, if we believe the structure of the universe is governed by general relativity, and that the structure is determinate, then the universe must be closed. If the universe is not closed, then general relativity must be incomplete, in the sense that there must be something other than general relativity determining which of the possible structures actually exists.

Admittedly, completeness in this sense is a very ambitious goal for any theory, but it's interesting to recall the famous "EPR" paper in which Einstein criticized quantum mechanics on the grounds that it could not be a complete description of nature. He may well have had this on his mind when he pointed out how seriously the introduction of a cosmological constant undermines the logical simplicity of general relativity, which was always his criterion for evaluating the merit of any scientific theory. We can see him wrestling with this issue, even in his 1917 paper, where he notes that some people (such as de Sitter) have argued that we have no need to consider boundary conditions at infinity, because we can simply specify the metric at the spatial limit of the domain under consideration, just as we arbitrarily (or empirically) specify the inertial frames when working in Newtonian mechanics.
But this clearly reduces general relativity to a rather weak theory that must be augmented by other principles and/or considerable amounts of arbitrary information in order to yield determinate results. Not surprisingly, Einstein was unenthusiastic about this alternative. As he said, "such a complete resignation in this fundamental question is for me a difficult thing. I should not make up my mind to it until every effort to make headway toward a satisfactory view had proved to be in vain".

7.2 The Formation and Growth of Black Holes

It is a light thing for the shadow to go down ten degrees: nay, but let the shadow return backward ten degrees.
                                                            2 Kings 20

One of the most common questions about black holes is how they can even exist if it takes infinitely long (from the perspective of an outside observer) for anything to reach the event horizon. The usual response to this question is to explain that although the Schwarzschild coordinates are ill-behaved at the event horizon, the intrinsic structure of spacetime itself is well-behaved in that region, and an infalling object passes through the event horizon in finite proper time. This is certainly an accurate description of the Schwarzschild structure, but it doesn't fully address the question, which can be summarized in terms of the following two seemingly contradictory facts:

(1) An event horizon can grow in finite coordinate time only if the mass contained inside the horizon increases in finite coordinate time.

(2) According to the Schwarzschild metric, nothing crosses the event horizon in finite coordinate time.

Item (1) is a consequence of the fact that, as in Newtonian gravity, the field contributed by a (static) spherical shell on its interior is zero, so an event horizon can't be expanded by accumulating mass on its exterior. Nevertheless, if mass accumulates near the exterior of a black hole's event horizon the gravitational radius of the combined system must eventually (in finite coordinate time) increase far enough to encompass the accumulated mass, leading unavoidably to the conclusion that matter from the outside must reach the interior in finite coordinate time, which seems to directly conflict with Item (2) (and certainly seems inconsistent with the "frozen star" interpretation). To resolve this apparent paradox requires a careful examination of the definition of a black hole, and this leads directly to several interesting results, such as the fact that if two black holes merge, then their event horizons are contiguous, and have been so since they were formed. The matter content of a black hole is increased when it combines with another black hole, but in such a case we obviously aren't dealing with a simple "one-body problem", so the spherically symmetrical Schwarzschild solution is not applicable. Lacking an exact solution of the field equations for the two-body problem, we can at least get a qualitative idea of the process by examining the "trousers" topology shown below:

[Figure: the "trousers" topology of merging event horizons descending from future null infinity]
As we progress through the sequence of external time slices the first event horizon appears at A, then another appears at B, then at C, and then A and B merge together. The "surfaces" of the trousers represent future null infinity (I+) of the external region, consistent with the definition of black holes as regions of spacetime that are not in the causal past of future null infinity. (If the universe is closed, the "ceiling" from which these "stalactites" descend is at some finite height, and our future boundary is really just a single surface. In such a universe these protrusions of future infinity are not true "event horizons", making it difficult to give a precise definition of a black hole. In this discussion we assume an infinite open universe.) The "interior" regions enclosed by these surfaces are, in a sense, beyond the infinite future of our region of spacetime.

If we regard a small test object as a point particle with zero radius then it's actually a black hole too, and the process of "falling in" to a "macro" black hole would simply be the trousers operation of merging the two I+ surfaces together, just like the merging of two macro black holes. On this basis the same interpretation would apply to the original formation of a macro black hole, by the coalescing of the I+ surfaces represented by the individual particles of the original collapsing star. Thus, we can completely avoid the "paradox" of black hole formation by considering all particles of matter to already be black holes. According to this view, it makes no sense to talk about the "interior" of a black hole, any more than it makes sense to talk about what's "outside" the universe, because the surface of a black hole is a boundary (future null infinity) of the universe. Unfortunately, it isn't at all clear that small particles of matter can be regarded as black holes surrounded by their own microscopic event horizons, so the "trousers" approach may not be directly applicable to the accumulation of small particles of "naked matter" (i.e., matter not surrounded by an event horizon). We'd like an explanation for the absorption of matter into a black hole that doesn't rely on this somewhat peculiar model of matter.

To reconcile the Schwarzschild solution with the apparent paradox presented by items (1) and (2) above, it's worthwhile to recall from Section 6.4 what a radial free-fall path really looks like in simple Schwarzschild geometry.

[Figure: radial free-fall from r = 10m, showing r versus proper time and versus Schwarzschild coordinate time]

The test particle starts at radius r = 10m and t = 0. The purple curve represents the radius vs. the particle's proper time, showing a simple well-behaved cycloidal path right down to r = 0, whereas the green curve represents the particle's radius vs. Schwarzschild coordinate time. The latter shows that the infalling object traverses through infinite coordinate time in order to reach the event horizon, and then traverses back through coordinate time until reaching r = 0 (in the interior) in a net coordinate time that is not too different from the elapsed proper time. In other words, the object goes infinitely far into the "future" (of coordinate time), and then infinitely far back to the "present" (also in coordinate time), and since these two segments must always occur together, we can "re-normalize" the round trip and just deal with the net change in coordinate time (for any radius other than precisely r = 2m). It shouldn't be surprising that the infalling object is in two places (both inside and outside the event horizon) at the same coordinate time, because worldlines need not be single-valued in terms of arbitrary curvilinear coordinates.

Still, it might seem that this "dual presence" opens the door to time-travel paradoxes. For example, we can observe the increase in the gravitational radius at some finite coordinate time, when the particle that caused the increase has still not yet crossed the event horizon (using the terms "when" and "not yet" in the sense of coordinate time), so it might seem that we have the opportunity to retrieve the particle before it crosses the horizon, thus preventing the increase that triggered our retrieval! However, if we carefully examine the path of the particle, both outside and inside the event horizon, we find that by the time it has gotten "back" close to our present coordinate time on the interior branch, the exterior branch is past the point of last communication. Even a photon could not catch up with it prior to crossing the horizon. The "backward" portion of the particle's trajectory through coordinate time inside the horizon ends just short of enabling any causality paradoxes. (It's apparent from these considerations that classical relativity must be a strictly deterministic theory - in which each worldline can be treated as already existing in its entirety - because we could construct genuine paradoxes in a non-deterministic theory.)

At this point it's worth noticing that our two strategies for explaining the formation and growth of black holes are essentially the same! In both cases the event horizon "reaches back" to us all the way from future null infinity. In a sense, that's why the infalling geodesics in Schwarzschild space go to infinity at the event horizon. To show the correspondence more clearly, we can turn the figure in Section 6.4 on end (so the coordinate time axis is vertical) and then redraw the constant-t lines as curves so as to accurately represent the absolute spacetime intervals. The result is shown below for a small infalling test particle:

[Figure: infalling worldline with curved lines of constant Schwarzschild time t]
Notice that the infalling worldline passes through all the Schwarzschild time slices t as it crosses the event horizon. Now suppose we take a longer view of this, beginning all the way back at the point of formation of the black hole, and suppose the infalling mass is significant relative to the original mass m. The result looks like this:

[Figure: formation and growth of the event horizon, with lines of constant t wrapping around it down to the point of formation]
This shows how the stalactite reaches down from future infinity, and how the infalling mass passes through this infinity - but in finite proper time - to enter the interior of the black hole, and the event horizon expands accordingly. This figure is based on the actual spacetime intervals, and shows how the lines of constant Schwarzschild time t wrap around the exterior of the event horizon down to the point of formation, where they enter the interior of the black hole and "expand" back close to the region where they originated on the outside.

One thing that sometimes concerns people when they look at a radial free-fall plot in Schwarzschild coordinates is related to the left hand side of the ballistic trajectory. Does the symmetry of the figure imply that we could launch a particle from r = 0, have it climb up to 5m, and then drop back down? No, because the light cones have tipped over at 2m, so the timelike and spacelike axes are reversed. Inside the event horizon the effective time axis points parallel to "r". As a result, although the left hand trajectory in the region above 2m is possible, the portion for r less than 2m is not; it's really just the time-reversed version of the right hand side. (We could also imagine a topology in which all inward and outward trajectories are realized (Kruskal space), but there is no known mechanism that would generate such a structure.) Still, it's valid to ask "how did we decide which way was forward in time inside the event horizon?" The only formal requirement seems to be that our choice be consistent for any given event horizon, always increasing r or always decreasing r. If we make one choice of sign convention we have a "white hole" spewing objects outward into our universe, whereas if we make the opposite choice we have a black hole, drawing things inward.

The question of whether we should expect to find as many white holes as black holes in the universe is still a subject of lively debate.

In the foregoing, reference was made to mass accumulating "near" the horizon, but we need to be careful about the concept of nearness. The intended meaning in the above context was that the mass is (1) exterior to the event horizon, and (2) within a small increment Δr of the horizon, where r is the radial Schwarzschild coordinate. I've also assumed spherical symmetry so that the Schwarzschild solution and Birkhoff's uniqueness proof apply (meaning that the spacetime in the interior of an empty spherically symmetrical shell is necessarily flat). Of course, in terms of the spacelike surfaces of simultaneity of an external particle, the event horizon is always infinitely far away, or, more accurately, the horizon doesn't intersect with any external spacelike surface, with the exception of the single degenerate time-and-space-like surface precisely at 2m, where the external time and space surfaces close on each other like scissors (and then swap roles in the interior). So in terms of these coordinates the particle is infinitely far from the horizon right up to the instant it crosses the horizon! And this is the same "instant" that every other infalling object crosses the horizon, although separated by great "distances". (This isn't really so strange. Midnight tonight is infinitely far from us in this same sense, because it is no finite spatial distance away, and it will remain so until the instant we reach it. Likewise the event horizon is ahead of us in time, not in space.)

Incidentally, I should probably qualify my dismissal of the "frozen star" interpretation, because there's a sense in which it's valid, or at least defensible. Remember that historically the two most common conceptual models for general relativity have been the "geometric interpretation" (as exemplified by Misner/Thorne/Wheeler's "Gravitation") and the "field interpretation" (as in Weinberg's "Gravitation and Cosmology"). These two views are operationally equivalent outside event horizons, but they tend to lead to different conceptions of the limit of gravitational collapse. According to the field interpretation, a clock runs increasingly slowly as it approaches the event horizon (due to the strength of the field), and the natural "limit" of this process is that the clock just asymptotically approaches "full stop" (i.e., running at a rate of zero) as it approaches the horizon. It continues to exist for the rest of time, but it's "frozen" due to the strength of the gravitational field. Within this conceptual framework there's nothing more to be said about the clock's existence. This leads to the "frozen star" conception of gravitational collapse.

In contrast, according to the geometric interpretation, all clocks run at the same rate, measuring out real distances along worldlines in spacetime. This leads us to think that, rather than slowing down as it approaches the event horizon, the clock is following a shorter and shorter path to the future. In fact, the path gets shorter at such a rate that it actually reaches (our) future infinity in finite proper time. Now what? If we believe the clock is still running just like every other clock (and there's no local pathology of the spacetime) then it seems natural to extrapolate the clock's existence right past our future infinity and into another region of spacetime. Obviously this implies that the universe has a "transfinite topology", which some people find troubling, but there's nothing logically contradictory about it (assuming the notion of an infinite continuous universe is not itself logically contradictory).

In both of these interpretations we find that an object goes to future infinity (of coordinate time) as it approaches an event horizon, and its rate of proper time as a function of coordinate time goes to zero. The difference is that the field interpretation is content to truncate its description at the event horizon, while the geometric interpretation carries on with its description right through the event horizon and down to r = 0 (where it too finally gives up). What, if anything, is gained by extrapolating the worldlines of infalling objects through the event horizon? One obvious gain is that it offers a prediction of what would be experienced by an infalling observer. Since this represents a worldline that we could, in principle, follow, and since the formulas of relativity continue to make coherent predictions along those worldlines, there doesn't seem to be any compelling reason to truncate our considerations at the horizon. After all, if we limit our view of the universe to just the worldlines we have followed, or that we intend to follow, we end up with a very oddly shaped universe. On the other hand, the "frozen star" interpretation does have the advantage of simplifying the topology, i.e., it allows us to exclude event horizons separating transfinite regions of spacetime. More importantly, by declining to consider the fate of infalling worldlines through the event horizon, we avoid dealing with the rather awkward issue of a genuine spacetime singularity at r = 0. Therefore, if the "frozen star" interpretation gave equivalent predictions for all externally observable phenomena, and was logically consistent, it would probably be the preferred view. The question is, does the concept of a "frozen star" satisfy those two conditions?

We saw above that the idea of a frozen star as an empty region around which matter "bunches up" outside an event horizon isn't viable, because if nothing ever passes from the exterior to the interior of an event horizon (in finite coordinate time) we cannot accommodate infalling matter. Either the event horizon expands or it doesn't, and in either case we arrive at a contradiction unless the value of m inside the horizon increases, and does so in finite coordinate time. The "trousers topology" described previously is, in some ways, the best of both worlds, but it relies on a somewhat dubious model of material particles as micro singularities in spacetime. We've also seen how the analytical continuation of the external free-fall geodesics into the interior leads to an apparently self-consistent picture of black hole growth in finite coordinate time, and this picture turns out to be fairly isomorphic to the trousers model. (Whether it's isomorphic to the truth is another question.)

It may be worthwhile to explicitly describe the situation. Consider a black hole of mass m. The event horizon has radius r = 2m in Schwarzschild coordinates. Now suppose a large concentric spherical dust cloud of total mass m surrounding the black hole is slowly pulled to within a shell of radius, say, 2.1m. The mass of the combined system is 2m, giving it a gravitational radius of r = 4m, and all the matter is now within r = 4m, so there must be, according to the unique spherically symmetrical solution of the field equations, an event horizon at r = 4m. Evidently the dust has somehow gotten inside the event horizon.
We might think that although the event horizon has expanded to 4m, maybe the dust is being held "frozen" just outside the horizon at, say, 4.1m. But that can't be true, because then there would be only the original mass m inside the 4m radius, and the horizon would collapse.

Also, this would imply that any dust originally inside 4m must have been pushed outward, and there is no known mechanism for that to happen. One possible way around this would be for the density of matter to be limited (by some mechanism we don't understand) to just sub-critical. In other words, each spherical region of radius r would be limited to just less than r/2 mass. It might be interesting to figure out the mass density profile necessary to be just shy of having an event horizon at every radius r (possibly inverse square?), but the problem with this idea is that there just isn't any known force that would hold the matter in this configuration. By all the laws we know it would immediately collapse. Of course, it's easy to posit some kind of Pauli-like gravitational "exclusion principle" which would simply prohibit two particles of matter from occupying the same "gravitational state". After all, it's the electron and nucleon exclusion principles that yield the white dwarf and neutron star configurations, respectively. The only reason we end up with black holes is because the universe seems to be one exclusion principle short. Thus, barring any "new physics", there is nothing to prevent an event horizon from forming and expanding, and this implies that the value of m inside the horizon increases in finite coordinate time, which conflicts with the "frozen star" interpretation.

The preceding discussion makes clear the fact that general relativity is not a relational theory. Schwarzschild spacetime represents a cosmology with a definite preferred frame of reference, the one associated with the time-independent metric components. (Einstein at first was quite disappointed when he learned that the field equations have such an explicitly non-Machian solution, i.e., a single mass in an otherwise empty infinite universe.) Of course, we introduced the preferred frame ourselves by imposing spherical symmetry in the first place, but it's always necessary to impose some boundary or initial value conditions, and these conditions (in an open infinite universe) unavoidably single out a particular frame of reference (as discussed further in Section 7.7). That troubled Einstein greatly, and was his main reason for arguing that the universe must be closed, because only in that context can we claim that the entire metric is in some sense fully determined by the distribution of mass-energy. However, there is no precise definition of a black hole in a closed universe, so for the purposes of this discussion we're committed to a cosmology with an arbitrarily preferred frame. To visualize how this preferred frame effectively governs the physics in Schwarzschild space, consider the following schematic of a black hole:

[Figure: schematic of a black hole, showing the collapse point "a", the event horizon, the "nearby star", and the external timeslice t = now]
The star collapsed at point "a", and formed an event horizon of radius 2m in Schwarzschild coordinates. How far is the observer at "O" from the event horizon? If we trace along the spacelike surface "t = now" we find that the black hole doesn't exist at time t = now, which is to say, it is nowhere on the t = now timeslice. The event horizon is in the future of every external timeslice, all the way to future infinity. In fact, the event horizon is part of future null infinity. Nevertheless, the black hole clearly affects the physics on the timeslice t = now. For example, if the "observer" at O looks toward the "nearby star", his view will be obstructed, i.e., the star will be eclipsed, because the observer is effectively in the shadow of the infinite future. The size of this shadow will increase as the size of the event horizon increases. Thus we can derive knowledge of a black hole from the shadow it casts (like an eclipse), noting that the outline of a shadow isn't subject to speed-of-light restrictions, so there's nothing contradictory about being able to detect the presence and growth of a black hole region in finite coordinate time. Moreover, if the observer is allowed to fall freely, he will go mostly leftward (and slightly up) toward r = 0, quickly carrying him through all future timeslices (which are infinitely compressed around the event horizon) and into the interior. In doing so, he causes the event horizon to expand slightly.

7.3 Falling Into and Hovering Near A Black Hole

Unless the giddy heaven fall,
And earth some new convulsion tear,
And, us to join, the world should all
Be cramped into a planisphere.
As lines so loves oblique may well
Themselves in every angle greet;
But ours, so truly parallel,
Though infinite, can never meet.
Therefore the love which us doth bind,
But Fate so enviously debars,
Is the conjunction of the mind,
And opposition of the stars.
                                                            Andrew Marvell (1621-1678)

The empirical evidence for the existence of black holes - or at least something very much like them - has become impressive, although it is arguably still largely circumstantial. Indeed, most relativity experts, while expressing high confidence (bordering on certainty) in the existence of black holes, nevertheless concede that since any electromagnetic signal reaching us must necessarily have originated outside any putative black holes, it may always be possible to imagine that they were produced by some mechanism just short of a black hole. Hence we may never acquire, by electromagnetic signals, definitive proof of the existence of black holes - other than by falling into one. (It's conceivable that gravitational waves might provide some conclusive external evidence, but no such waves have yet been detected.) Of course, there are undoubtedly bodies in the universe whose densities and gravitational intensities are extremely great, but it isn't self-evident that general relativity remains valid in these extreme conditions.

Ironically, considering that black holes have become one of the signature predictions of general relativity, the theory's creator published arguments purporting to show that gravitational collapse of an object to within its Schwarzschild radius could not occur in nature. In a paper published in 1939, Einstein argued that if we consider progressively smaller and smaller stationary systems of particles revolving around each other under their mutual gravitational attraction, the particles would need to be moving at the speed of light before reaching the critical density. Similarly Karl Schwarzschild had computed the behavior of a hypothetical stationary star of uniform density, and found that the pressure must go to infinity as the star shrank toward the critical radius. In both cases the obvious conclusion is that there cannot be any stationary configurations of matter above the critical density. Some scholars have misinterpreted Einstein's point, claiming that he was arguing against the existence of black holes within the context of general relativity. These scholars underestimate both Einstein's intelligence and his radicalism. He could not have failed to understand that sub-light particles (or finite pressure in Schwarzschild's star) meant unstable collapse to a singular point of infinite density - at least if general relativity holds good. Indeed this was his point: general relativity must fail. Thus we are not surprised to find him writing in "The Meaning of Relativity"

For large densities of field and matter, the field equations and even the field variables which enter into them have no real significance. One may not therefore assume the validity of the equations for very high density of field and matter... The present relativistic theory of gravitation is based on a separation of the concepts of "gravitational field" and of "matter". It may be plausible that the theory is for this reason inadequate for very high density of matter...

These reservations were not considered to be warranted by other scientists at the time, and even less so today, but perhaps they can serve to remind us not to be too dogmatic about the validity of our theories of physics, especially when extrapolated to very extreme conditions that have never been (and may never be) closely examined.

Furthermore, we should acknowledge that, even within the context of general relativity, the formal definition of a black hole may be impossible to satisfy. This is because, as discussed previously, a black hole is strictly defined as a region of spacetime that is not in the causal past of any point in the infinite future. Notice that this refers to the infinite future, because anything short of that could theoretically be circumvented by regions that are clearly not black holes. However, in some fairly plausible cosmological models the universe has no infinite future, because it re-collapses to a singularity in finite coordinate time. In such a universe (which, for all we know, could be our own), the boundary of any gravitationally collapsed region of spacetime would be contiguous with the boundary of the ultimate collapse, so it wouldn't really be a separate black hole in the strict sense. As Wald says, "there appears to be no natural notion of a black hole in a closed Robertson-Walker universe which re-collapses to a final singularity", and further, "there seems to be no way to define a black hole in a closed universe, because it requires going to infinity, but there is no infinity in a closed universe." It's interesting that this is essentially the same objection that is often raised by people when they first hear about black holes, i.e., they reason that if it takes infinite coordinate time for any object to cross an event horizon, and if the universe is going to collapse in a finite coordinate time, then it's clear that nothing can possess the properties of a true black hole in such a universe. Thus, in some fairly plausible cosmological models it's not strictly possible for a true black hole to exist. On the other hand, it is possible to have an approximate notion of a black hole in some isolated region of a closed universe, but of course many of the interesting transfinite issues raised by true (perhaps a better name would be "ideal") black holes are not strictly applicable to an "approximate" black hole.

Having said this, there is nothing to prevent us from considering an infinite open universe containing full-fledged black holes in all their transfinite glory. I use the word "transfinite" because ideal black holes involve singular boundaries at which the usual Schwarzschild coordinates for the external field of a gravitating body go to infinity - and back - as discussed in the previous section. There are actually two distinct kinds of "spacetime singularities" involved in an ideal black hole, one of which occurs at the center, r = 0, where the spacetime manifold actually does become unequivocally singular and the field equations are simply inapplicable (as if trying to divide a number by 0). It's unclear (to say the least) what this singularity actually means from a physical standpoint, but oddly enough the "other" kind of singularity involved in a black hole seems to shield us from having to face the breakdown of the field equations.
This is because it seems (although it has not been proved) to be a characteristic of all realistic spacetime singularities in general relativity that they are invariably enclosed within an event horizon, which is a peculiar kind of singularity that constitutes a one-way boundary between the interior and exterior of a black hole. This is certainly the case with the standard black hole geometries based on the Schwarzschild and Kerr solutions. The proposition that it is true for all singularities is sometimes called the Cosmic Censorship Conjecture. Whether or not this conjecture is true, it's a remarkable fact that at least some (if not all) of the singular solutions of Einstein's field equations automatically enclose the singularity inside an event horizon, an amazing natural contrivance that effectively shields the universe from direct two-way exposure to any regions in which the metric of spacetime breaks down.

Perhaps because we don't really know what to make of the true singularity at r = 0, we tend to focus our attention on the behavior of physics near the event horizon, which, for a non-rotating black hole, resides at the radial location r = 2m, where the Schwarzschild coordinates become singular. Of course, a singularity in a coordinate system doesn't necessarily represent a pathology of the manifold. (Consider traveling due East at the North Pole.) Nevertheless, the fact that no true black hole can exist in a finite universe shows that the coordinate singularity at r = 2m is not entirely inconsequential, because it does (or at least can) represent a unique boundary between fundamentally separate regions of spacetime, depending on the cosmology.

To understand the nature of this boundary, it's useful to consider hovering near the event horizon of a black hole. The components of the curvature tensor at r = 2m are on the order of 1/m², so the spacetime can theoretically be made arbitrarily "flat" (Lorentzian) at that radius by making m large enough. Thus, for an observer "hovering" at a value of r that exceeds 2m by some small fixed amount Δr, the downward acceleration required to resist the inward pull can be arbitrarily small for sufficiently large m. However, in order for the observer to be hovering close to 2m his frame must be tremendously "boosted" in the radial direction relative to an in-falling particle. This is best seen in terms of a spacetime diagram such as the one below, which shows the future light cones of two events located on either side of a black hole's event horizon.

[Figure: future light cones of two events on either side of the event horizon, plotted in (r, t') coordinates]
In this drawing r is the radial Schwarzschild coordinate and t' is an Eddington-Finkelstein mapping of the Schwarzschild time coordinate, i.e.,

$$t' = t + 2m \, \ln\left|\frac{r}{2m} - 1\right|$$
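Differentiating this mapping along the Schwarzschild radial null directions dt/dr = ±1/(1 − 2m/r) shows that in the (r, t') plane the ingoing edge of every light cone has slope dt'/dr = −1, while the outgoing edge has slope (r + 2m)/(r − 2m), which is vertical precisely at r = 2m. A small Python sketch (my own, with m set to 1 purely for illustration) tabulates the tilt:

```python
# Slopes of the radial null rays in (r, t') coordinates, obtained by adding
# d(t' - t)/dr = 2m/(r - 2m) to the Schwarzschild null slopes
# dt/dr = +/- 1/(1 - 2m/r). Ingoing rays come out at exactly -1 everywhere;
# outgoing rays have slope (r + 2m)/(r - 2m): vertical at the horizon,
# negative (tilted left of vertical) just inside it.

def outgoing_slope(r, m=1.0):
    return (r + 2 * m) / (r - 2 * m)

for r in (1.90, 1.99, 2.01, 2.10, 3.00, 10.00):   # horizon at r = 2 when m = 1
    print(f"r = {r:5.2f}: ingoing dt'/dr = -1, outgoing dt'/dr = {outgoing_slope(r):8.1f}")
```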
The right-hand ray of the cone for the event located just inside the event horizon is tilted just slightly to the left of vertical, whereas the cone for the event just outside 2m is tilted just slightly to the right of vertical. The rate at which this "tilt" changes with r is what determines the curvature and acceleration, and for a sufficiently large black hole this rate can be made negligibly small. However, by making this rate small, we also make the outward ray more nearly "vertical" at a given Δr above 2m, which implies that the hovering observer's frame needs to be even more "boosted" relative to the local frame of an observer falling freely from infinity. The gravitational potential, which need not be changing very steeply at r = 2m, has nevertheless changed by a huge amount relative to infinity. We must be very deep in a potential hole in order for the light cones to be tilted that far, even though the rate at which the tilt has been increasing can be arbitrarily slow. This just means that for a super-massive black hole they started tilting at a great distance.

As can be seen in the diagram, relative to the frame of a particle falling in from infinity, a hovering observer must be moving outward at near light velocity. Consequently his axial distances are tremendously contracted, to the extent that, if the value of Δr is normalized to his frame of reference, he is actually a great distance (perhaps even light-years) from the r = 2m boundary, even though he is just 1 inch above r = 2m in terms of the Schwarzschild coordinate r. Also, the closer he tries to hover, the more radial boost he needs to hold that value of r, and the more contracted his radial distances become. Thus he is living in a thinner and thinner shell of Δr, but from his own perspective there's a world of room. Assuming he brought enough rocket fuel to accelerate himself up to this "hovering frame" at that radius 2m + Δr (or actually to slow himself down to a hovering frame), he would thereafter just need to resist the local acceleration of gravity to maintain that frame of reference.

Quantitatively, for an observer hovering at a small Schwarzschild distance Δr above the horizon of a black hole, the radial distance Δr' to the event horizon with respect to the observer's local coordinates would be

$$\Delta r' = \sqrt{\Delta r \left(\Delta r + 2m\right)} \; + \; 2m \, \ln\!\left(\sqrt{\frac{\Delta r}{2m}} + \sqrt{\frac{\Delta r}{2m} + 1}\right)$$
which approaches 2√(2mΔr) as Δr goes to zero. This shows that as the observer hovers closer to the horizon in terms of Schwarzschild coordinates, his "proper distance" remains relatively large until he is nearly at the horizon. Also, the derivative of Δr' with respect to Δr in this range is √((Δr + 2m)/Δr), which goes to infinity as Δr goes to zero. (These relations pertain to a truly static observer, so they don't apply when the observer is moving from one radial position to another, unless he moves sufficiently slowly.)

Incidentally, it's amusing to note that if a hovering observer's radial distance contraction factor at r was 1 − 2m/r instead of the square root of that quantity, his scaled distance to the event horizon at a Schwarzschild distance of Δr would be Δr' = 2m + Δr. Thus when he is precisely at the event horizon his scaled distance from it would be 2m, and he wouldn't achieve zero scaled distance from the event horizon until arriving at the origin r = 0 of the Schwarzschild coordinates. This may seem rather silly, but it's actually quite similar to one of Einstein's proposals for avoiding what he regarded as the unpleasant features of the Schwarzschild solution at r = 2m. He suggested replacing the radial coordinate r with ρ = √(r − 2m), and noted that the Schwarzschild solution expressed in terms of this coordinate behaves regularly for all values of ρ. Whether or not there is any merit in this approach, it clearly shows how easily we can "eliminate" poles and singularities simply by applying coordinates that have canceling zeros (much as one does in the design of control systems) or otherwise restricting the domain of the variables. However, we shouldn't assume that every arbitrary system of coordinates has physical significance.

What "acceleration of gravity" would a hovering observer feel locally near the event horizon of a black hole? In terms of the Schwarzschild coordinate r and the proper time τ of the particle, the path of a radially free-falling particle can be expressed parametrically in terms of the parameter θ by the equations

$$r = \frac{R}{2}\left(1 + \cos\theta\right) \qquad\qquad \tau = \frac{R}{2}\sqrt{\frac{R}{2m}}\left(\theta + \sin\theta\right)$$
where R is the apogee of the path (i.e., the highest point, where the outward radial velocity is zero). These equations describe a cycloid, with θ = 0 at the top, and they are valid for any radius r down to 0. We can evaluate the second derivative of r with respect to τ as follows:

$$\frac{dr}{d\tau} = \frac{dr/d\theta}{d\tau/d\theta} = -\sqrt{\frac{2m}{R}} \, \frac{\sin\theta}{1 + \cos\theta} \qquad\qquad \frac{d^2 r}{d\tau^2} = \frac{d\left(dr/d\tau\right)/d\theta}{d\tau/d\theta} = -\frac{4m/R^2}{\left(1 + \cos\theta\right)^2}$$
At θ = 0 the path is tangent to the hovering worldline at radius R, and so the local gravitational acceleration in the neighborhood of a stationary observer at that radius equals m/R², which implies that if R is approximately 2m the acceleration of gravity is about 1/(4m). Thus the acceleration of gravity in terms of the coordinates r and τ is finite at the event horizon, and can be made arbitrarily small by increasing m. However, this acceleration is expressed in terms of the Schwarzschild radial parameter r, whereas the hovering observer's radial distance r' must be scaled by the "gravitational boost" factor, i.e., we have dr' = dr/(1 − 2m/r)^(1/2). Substituting this expression for dr into the above formula gives the proper local acceleration of a stationary observer:

$$a = -\frac{m}{r^2 \sqrt{1 - 2m/r}}$$
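As a numerical cross-check of these two results, the following sketch (in geometric units G = c = 1, with illustrative values m = 1 and R = 10 of my own choosing) differences the cycloid directly:

```python
import math

m, R = 1.0, 10.0   # illustrative values, geometric units

def r_of(theta):
    return (R / 2) * (1 + math.cos(theta))

def tau_of(theta):
    return (R / 2) * math.sqrt(R / (2 * m)) * (theta + math.sin(theta))

# second difference of r with respect to tau at the apogee (theta = 0),
# where the free-fall path is momentarily tangent to a hovering worldline
h = 1e-3
d2r_dtau2 = (r_of(h) - 2 * r_of(0) + r_of(-h)) / (tau_of(h) - tau_of(0)) ** 2

print(d2r_dtau2, -m / R ** 2)                   # both ~ -0.0100
print(m / (R ** 2 * math.sqrt(1 - 2 * m / R)))  # magnitude of proper (hovering) acceleration
```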
This value of acceleration corresponds to the amount of rocket thrust an observer would need in order to hold position, and we see that it goes to infinity as r goes to 2m. Nevertheless, for any ratio r/(2m) greater than 1 we can still make this acceleration arbitrarily small by choosing a sufficiently large m. On the other hand, an enormous amount of effort would be required to accelerate the rocket into this hovering condition for values of r/(2m) very close to 1. This amount of "boost" effort cannot be made arbitrarily small, because it essentially amounts to accelerating the rocket (outwardly) to nearly the speed of light relative to the frame of a free-falling particle from infinity. Interestingly, as the preceding figure suggests, an outward going photon can hover precisely at the event horizon, since at that location the outward edge of the light cone is vertical. This may seem surprising at first, considering that the proper acceleration of gravity at that location is infinite. However, the proper acceleration of a photon is indeed infinite, since the edge of a light cone can be regarded as hyperbolic motion with acceleration "a" in the limit as "a" goes to infinity, as illustrated in the figure below.

[Figure: hyperbolic worldlines approaching a null asymptote as the acceleration "a" goes to infinity]
Also, it remains true that for any fixed Δr above the horizon we can make the proper acceleration arbitrarily small by increasing m. To see this, note that if r = 2m + Δr for a sufficiently small increment Δr we have m/r ≈ 1/2, and we can bring the other factor of r into the square root to give

$$a \approx \frac{1}{2\sqrt{r\left(r - 2m\right)}} = \frac{1}{2\sqrt{\left(2m + \Delta r\right)\Delta r}} \approx \frac{1}{2\sqrt{2m\,\Delta r}}$$
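A short sketch (again with arbitrary illustrative values) confirms this numerically, along with the corresponding proper-distance limit 2√(2mΔr) derived earlier: at a fixed Schwarzschild offset Δr, the required thrust falls off as m grows while the proper distance to the horizon grows.

```python
import math

def proper_accel(m, dr):
    # magnitude of m/(r^2 * sqrt(1 - 2m/r)) at r = 2m + dr
    r = 2 * m + dr
    return m / (r ** 2 * math.sqrt(1 - 2 * m / r))

def proper_distance(m, dr, n=200000):
    # midpoint-rule integration of dr'/dr = (1 - 2m/r)^(-1/2) from 2m to 2m + dr
    h = dr / n
    return sum(h / math.sqrt(1 - 2 * m / (2 * m + (i + 0.5) * h)) for i in range(n))

dr = 0.01
for m in (1.0, 10.0, 100.0):
    print(f"m = {m:6.1f}: accel = {proper_accel(m, dr):8.4f}"
          f" (~ {1 / (2 * math.sqrt(2 * m * dr)):8.4f}),"
          f" dist = {proper_distance(m, dr):8.4f}"
          f" (~ {2 * math.sqrt(2 * m * dr):8.4f})")
```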
Still, these formulas contain a slight "mixing of metaphors", because they refer to two different radial parameters (r' and r) with different scale factors. To remedy this, we can define the locally scaled radial increment Δr' = Δr/(1 − 2m/r)^(1/2) as the hovering observer's "proper" distance from the event horizon. Then, since Δr = r − 2m, we have Δr' = √(r(r − 2m)), and so r = m + √(m² + Δr'²). Substituting this into the formula for the proper local acceleration, we find that the proper acceleration of a stationary observer at a "proper distance" Δr' above the event horizon of a (non-rotating) object of mass m is given by

$$a = -\frac{m}{\Delta r' \left(m + \sqrt{m^2 + \Delta r'^2}\right)}$$
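Before examining the limits of this expression, here is a quick numerical sketch of the formula and of the "half light-year at 1g" figure mentioned below (the mass value and units are my own illustrative choices):

```python
import math

def proper_accel(m, dr_prime):
    # magnitude of m / (dr' * (m + sqrt(m^2 + dr'^2))); the sign in the
    # formula above just indicates the inward direction
    return m / (dr_prime * (m + math.sqrt(m ** 2 + dr_prime ** 2)))

# limits: for dr' << m this tends to 1/(2 dr'); for dr' >> m, to m/dr'^2
m = 1.0e20                        # an arbitrarily large mass, in meters
for drp in (1.0e10, 1.0e30):
    print(proper_accel(m, drp), 1 / (2 * drp), m / drp ** 2)

# "1g" in geometric units is g/c^2 per meter, so the closest hovering
# distance at 1g proper acceleration is 1/(2g) meters:
g = 9.81 / 2.998e8 ** 2
print(1 / (2 * g) / 9.461e15, "light-years")   # ~ 0.48
```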
Notice that as (Δr'/M) becomes small the acceleration approaches -1/(2Δr'), which is the asymptotic proper acceleration at a small "proper distance" Δr' from the event horizon of a large black hole. Thus, for a given proper distance Δr' the proper acceleration can't be made arbitrarily small by increasing m. Conversely, for a given proper acceleration g our hovering observer can't be closer than 1/(2g) of proper distance, even as m goes to infinity. For example, the closest an observer can get to the event horizon of a supermassive black hole while experiencing no more than 1g proper acceleration is about half a light-year of proper distance. At the other extreme, if (Δr'/m) is very large, as it is in normal circumstances between gravitating bodies, then this acceleration approaches m/(Δ r'2, which is just Newton's inverse-square law of gravity in geometrical units. We've seen that the amount of local acceleration that must be overcome to hover at a radial distance r increases to infinity at r = 2m, but this doesn't imply that the gravitational curvature of spacetime at that location becomes infinite. The components of the curvature tensor depend to some extent on the choice of coordinate systems, so we can't simply examine the components of Rαβγδ to ascertain whether the intrinsic curvature is actually singular at the event horizon. For example, with respect to the Schwarzschild coordinates the non-zero components of the covariant curvature tensor are
        Rtrtr = −2m/r³        Rtθtθ = m(r − 2m)/r²        Rtϕtϕ = m(r − 2m)sin²θ/r²
        Rrθrθ = −m/(r − 2m)        Rrϕrϕ = −m sin²θ/(r − 2m)        Rθϕθϕ = 2mr sin²θ
along with the components related to these by symmetry. The two components relating the radial coordinate to the spherical surface coordinates are singular at r = 2m, but this is again related to the fact that the Schwarzschild coordinates are not well-behaved on this manifold near the event horizon. A more suitable system of coordinates in this region (as noted by Misner, et al) is constructed from the basis vectors
        e_t = γ ∂/∂t        e_r = (1/γ) ∂/∂r        e_θ = (1/r) ∂/∂θ        e_ϕ = (1/(r sinθ)) ∂/∂ϕ
where γ = 1/√(1 − 2m/r). With respect to this "hovering" orthonormal system of coordinates the non-zero components of the curvature tensor (up to symmetry) are
        Rtrtr = −2m/r³        Rtθtθ = Rtϕtϕ = m/r³        Rrθrθ = Rrϕrϕ = −m/r³        Rθϕθϕ = 2m/r³
Interestingly, if we transform to the orthonormal coordinates of a free-falling particle, the curvature components remain unchanged. Plugging in r = 2m, we see that these components are all proportional to 1/m² at the event horizon, so the intrinsic spacetime curvature at r = 2m is finite. Indeed, for a sufficiently large mass m the curvature can be made arbitrarily mild at the event horizon. If we imagine the light cone at a radial coordinate r extremely close to the horizon (i.e., such that r/(2m) is just slightly greater than 1), with its outermost ray pointing just slightly in the positive r direction, we could theoretically boost ourselves at that point so as to maintain a constant radial distance r, and thereafter maintain that position with very little additional acceleration (for sufficiently large m). But, as noted above, the work that must be expended to achieve this hovering condition from infinity cannot be made arbitrarily small, since it requires us to accelerate to nearly the speed of light.

Having discussed the prospects for hovering near a black hole, let's review the process by which an object may actually fall through an event horizon. If we program a space probe to fall freely until reaching some randomly selected point outside the horizon and then accelerate back out along a symmetrical outward path, there is no finite limit on how far into the future the probe might return. This sometimes strikes people as paradoxical, because it implies that the in-falling probe must, in some sense, pass through all of external time before crossing the horizon, and in fact it does, if by "time" we mean the extrapolated surfaces of simultaneity for an external observer. However, those surfaces are not well-behaved in the vicinity of a black hole. It's helpful to look at a drawing like this:

This illustrates schematically how the analytically continued surfaces of simultaneity for external observers are arranged outside the event horizon of a black hole, and how the infalling object's worldline crosses (intersects with) every timeslice of the outside world prior to entering a region beyond the last outside timeslice. The dotted timeslices can be modeled crudely as simple "right" hyperbolic branches of the form t_j − t = 1/r. We just repeat this same −y = 1/x shape, shifted vertically, up to infinity. Notice that all of these infinitely many time slices curve down and approach the same asymptote on the left. To get to the "last timeslice" an object must go infinitely far in the vertical direction, but only finitely far in the horizontal (leftward) direction. The key point is that if an object goes to the left, it crosses every single one of the analytically continued timeslices of the outside observers, all the way to their future infinity. Hence those distant observers can always regard the object as not quite having reached the event horizon (the vertical boundary on the left side of this schematic). At any one of those slices the object could, in principle, reverse course and climb back out to the outside observers, which it would reach some time between now and future infinity. However, this doesn't mean that the object can never cross the event horizon (assuming it doesn't bail out). It simply means that its worldline is present in every one of the outside timeslices. In the direction it is traveling, those time slices are compressed infinitely close together, so the in-falling object can get through them all in finite proper time (i.e., its own local time along the worldline falling to the left in the above schematic).

Notice that the temporal interval between two definite events can range from zero to infinity, depending on whose time slices we are counting. One observer's time is another observer's space, and vice versa. It might seem as if this degenerates into chaos, with no absolute measure for things, but fortunately there is an absolute measure. It's the absolute invariant spacetime interval "ds" between any two neighboring events, and the absolute distance along any specified path in spacetime is just found by summing up all the "ds" increments along that path. For any given observer, a local absolute increment ds can be projected onto his proper time axis and local surface of simultaneity, and these projections can be called dt, dx, dy, and dz. For a sufficiently small region around the observer these components are related to the absolute increment ds by the Minkowski or some other flat metric, but in the presence of curvature we cannot unambiguously project the components of extended intervals. The only unambiguous way of characterizing extended intervals (paths) is by summing the incremental absolute intervals along a given path.

An observer obviously has a great deal of freedom in deciding how to classify the locations of putative events relative to himself. One way (the conventional way) is in terms of his own time-slices and spatial distances as measured on those time slices, which works fairly well in regions where spacetime is flat, although even in flat spacetime it's possible for two observers to disagree on the lengths of objects and the spatial and temporal distances between events, because their reference frames may be different. However, they will always agree on the ds between two events.
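To make the frame-dependence of the projections concrete, here is a minimal Python sketch (flat spacetime, units with c = 1, arbitrary event separation) showing that a standard Lorentz boost changes the dt and dx between two events while leaving ds² unchanged:

    import math

    def boost(dt, dx, v):
        # Lorentz boost along x with speed v (c = 1).
        g = 1.0 / math.sqrt(1.0 - v*v)
        return g*(dt - v*dx), g*(dx - v*dt)

    dt, dx = 3.0, 1.0                      # arbitrary event separation
    for v in (0.0, 0.5, 0.9, 0.99):
        bt, bx = boost(dt, dx, v)
        # The components vary from frame to frame, but dt^2 - dx^2 does not.
        print(f"v={v}: dt={bt:.4f}  dx={bx:.4f}  ds^2={bt*bt - bx*bx:.4f}")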
The same is true of the integrated absolute interval along any path in curved spacetime. The dt,dx,dy,dz
components can do all sorts of strange things, but observers will always agree on ds. This suggests that rather than trying to map the universe with a "grid" composed of time slices and spatial distances on those slices, an observer might be better off using a sort of "polar" coordinate system, with himself at the center, and with outgoing geodesic rays in all directions and at all speeds. Then for each of those rays he measures the total ds between himself and whatever is "out there". This way of "locating" things could be parameterized in terms of the coordinate system [θ, ϕ, β, s] where θ and ϕ are just ordinary latitude and longitude angles to determine a direction in space, β is the velocity of the outgoing ray (divided by c), and s is the integrated ds distance along that ray as it emanates out from the origin to the specified point along a geodesic path. (Incidentally, these are essentially the coordinates Riemann used in his 1854 thesis on differential geometry.) For any event in spacetime the observer can now assign it a location based on this system of coordinates. If the universe is open, he will find that there are things which are only a finite absolute distance from him, and yet are not on any of his analytically continued time slices! This is because there are regions of spacetime where his time slices never go, specifically, inside the event horizon of a black hole. This just illustrates that an external observer's time slices aren't a very suitable set of surfaces with which to map events near a black hole, let alone inside a black hole. For this reason it's best to measure things in terms of absolute invariant distances rather than time slices, because time slices can do all sorts of strange things and don't necessarily cover the entire universe, assuming an open universe. Why did I specify an open universe? The schematic above depicted an open universe, with infinitely many external time slices, but if the universe is closed and finite, there are only finitely many external time slices, and they eventually tip over and converge on a common singularity, as shown below

In this context the sequence of t_j slices eventually does include the vertical slices. Thus, in a closed universe an external observer's time slices do cover the entire universe, which is why there really is no true event horizon in a closed universe. An observer could use his analytically continued time slices to map all events if he wished, although they would
still make an extremely ill-conditioned system of coordinates near an approximate black hole.

One common question is whether a man falling (feet first) through the event horizon of a black hole would see his feet pass through the event horizon below him. As should be apparent from the schematics above, this kind of question is based on a misunderstanding. Everything that falls into a black hole falls in at the same local time, although spatially separated, just as everything in your city is going to enter tomorrow at the same time. We generally have no trouble seeing our feet as we pass through midnight tonight, although it is difficult, one minute before midnight, to look ahead and see our feet one minute after midnight. Of course, for a small black hole you will have to contend with tidal forces that may induce more spatial separation between your head and feet than you'd like, but for a sufficiently large black hole you should be able to maintain reasonable point-to-point co-moving distances between the various parts of your body as you cross the horizon.

On the other hand, we should be careful not to understate the physical significance of the event horizon, which some authors have a tendency to do, perhaps in reaction to earlier over-estimates of its significance. Section 6.4 includes a description of a sense in which spacetime actually is singular at r = 2m, even in terms of the proper time of an in-falling particle, but it turns out to be what mathematicians call a "removable singularity", much like the point x = 0 on the function sin(x)/x. Strictly speaking this "curve" is undefined at that point, but by analytic continuation we can "put the point back in", essentially by just defining sin(x)/x to be 1 at x = 0. Whether nature necessarily adheres to analytic continuation in such cases is an open question.

Finally, we might ask what an observer would find if he followed a path that leads across an event horizon and into a black hole. In truth, no one really knows how seriously to take the theoretical solutions of Einstein's field equations for the interior of a black hole, even assuming an open infinite universe. For example, the "complete" Schwarzschild solution actually consists of two separate universes joined together at the black hole, but it isn't clear that this topology would spontaneously arise from the collapse of a star, or from any other known process, so many people doubt that this complete solution is actually realized. It's just one of many strange topologies that the field equations of general relativity would allow, but we aren't required to believe something exists just because it's a solution of the field equations. On the other hand, from a purely logical point of view, we can't rule them out, because there aren't any outright logical contradictions, just some interesting transfinite topologies.
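Incidentally, the removable-singularity analogy mentioned above is easy to check symbolically; a short sketch using Python's sympy library (assuming it is available):

    from sympy import symbols, sin, limit, series

    x = symbols('x')
    # sin(x)/x is undefined at x = 0, but the limit exists and equals 1,
    # so the singularity is removable by defining the value there to be 1.
    print(limit(sin(x)/x, x, 0))       # 1
    print(series(sin(x)/x, x, 0, 6))   # 1 - x**2/6 + x**4/120 + O(x**6)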

7.4 Curled-Up Dimensions

I do not mind confessing that I personally have often found relief from the dreary infinities of homaloidal space in the consoling hope that, after all, this other may be the true state of things.

William Kingdon Clifford, 1873

The simplest cylindrical space can be represented by the perimeter of a circle. This one-dimensional space with the coordinate X has the natural embedding in two-dimensional space with orthogonal coordinates (x1, x2) given by the circle formulas
        x1 = −R cos(X/R)        x2 = R sin(X/R)
From the derivatives dx1/dX = sin(X/R) and dx2/dX = cos(X/R) we have the Pythagorean identity (dx1)2 + (dx2)2 = (dX)2. The length of this cylindrical space is 2πR. We can form the Cartesian product of n such cylindrical spaces, with radii R1, R2, ..,Rn respectively, to give an n-dimensional space that is cylindrical in all directions, with a total "volume" of
        V = (2π)^n R1 R2 ··· Rn
For example, a three-dimensional space that is everywhere locally Euclidean and yet cylindrical in all directions can be constructed by embedding the three spatial dimensions in a six-dimensional space according to the parameterization
        x1 = −R1 cos(X/R1)        x2 = R1 sin(X/R1)
        x3 = −R2 cos(Y/R2)        x4 = R2 sin(Y/R2)
        x5 = −R3 cos(Z/R3)        x6 = R3 sin(Z/R3)
so the spatial Euclidean line element is
        (ds)² = (dX)² + (dY)² + (dZ)²
giving a Euclidean spatial metric in a closed three-space with total volume (2π)³R1R2R3. Subtracting from this an ordinary temporal component gives an everywhere-locally-Lorentzian spacetime that is cylindrical in the three spatial directions, i.e.,
        (ds)² = (dX)² + (dY)² + (dZ)² − (dT)²                (1)
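As a numerical sanity check, the following Python sketch (with arbitrarily chosen radii) verifies that small coordinate displacements in the embedded three-space have the same length as in flat Euclidean space, confirming that the space is locally Euclidean despite being closed:

    import math

    def embed(X, Y, Z, R):
        # One (cos, sin) pair per cyclic coordinate, as in the embedding
        # above (the overall sign conventions don't affect distances).
        return [R[0]*math.cos(X/R[0]), R[0]*math.sin(X/R[0]),
                R[1]*math.cos(Y/R[1]), R[1]*math.sin(Y/R[1]),
                R[2]*math.cos(Z/R[2]), R[2]*math.sin(Z/R[2])]

    R = (1.0, 2.0, 3.0)                       # arbitrary radii
    h = 1.0e-6                                # small displacements
    p = embed(0.3, 1.7, -0.4, R)
    q = embed(0.3 + h, 1.7 + 2*h, -0.4 + 3*h, R)

    print(math.dist(p, q))                         # 6D Euclidean distance
    print(math.sqrt(h**2 + (2*h)**2 + (3*h)**2))   # flat sqrt(dX^2+dY^2+dZ^2)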
However, this last step seems half-hearted. We can imagine a universe cylindrical in all directions, temporal as well as spatial, by embedding the entire four-dimensional spacetime in a manifold of eight dimensions, two of which are purely imaginary, as follows:
        x1 = −R1 cos(X/R1)        x2 = R1 sin(X/R1)
        x3 = −R2 cos(Y/R2)        x4 = R2 sin(Y/R2)
        x5 = −R3 cos(Z/R3)        x6 = R3 sin(Z/R3)
        x7 = −iR4 cos(T/R4)       x8 = iR4 sin(T/R4)
This leads again to the locally Lorentzian four-dimensional metric (1), but now all four of the dimensions X,Y,Z,T are periodic. So here we have an everywhere-locally-Lorentzian manifold that is closed and unbounded in every spatial and temporal direction. Obviously this manifold contains closed time-like worldlines, although they circumnavigate the entire universe. Whether such a universe would appear (locally) to possess a directional causal structure is unclear.

We might imagine that a flat, closed, unbounded universe of this type would tend to collapse if it contained any matter, unless a non-zero cosmological constant is assumed. However, it's not clear what "collapse" would mean in this context. For example, it might mean that the Rn parameters would shrink, but they are not strictly dynamical parameters of the model. The four-dimensional field equations of general relativity operate only on X,Y,Z,T, so we have no context within which the Rn parameters could "evolve". Any "change" in Rn would imply some meta-time parameter τ, so that all the Rn coefficients in the embedding formulas would actually be functions Rn(τ). Interestingly, the local flatness of the cylindrical four-dimensional spacetime is independent of the values of the Rn(τ), so if our "internal" field equations are satisfied for one set of Rn values they would be satisfied for any other values. The meta-time τ and associated meta-dynamics would be independent of the internal time T for a given observer unless we imagine some "meta field equations" relating τ to the internal parameters X,Y,Z,T. We might even speculate that these meta-equations would allow (require?) the values of Rn to be "increasing" versus τ, and therefore indirectly versus our internal time T = f(τ), in order to ensure stability. (One interesting question raised by these considerations of locally flat n-dimensional spaces embedded in flat 2n-dimensional spaces is whether every orthogonal basis in the n-space maps to an orthogonal basis in the 2n-space according to a set of formulas formally the same as those shown above, and, if not, whether there is a more general mapping that applies to all bases.)

The above totally-cylindrical spacetime has a natural expression in terms of "octonion space", i.e., the Cayley algebra whose elements are two ordered quaternions

Thus each point (X,Y,Z,T) in four-dimensional spacetime represents two quaternions

To determine the absolute distances in this eight-dimensional manifold we again consider the eight coordinate differentials, exemplified by

(using the rule for total differentials) so the squared differentials are exemplified by

Adding up the eight squared differentials to give the square of the absolute differential interval leads again to the locally Lorentzian four-dimensional metric
        (ds)² = (dX)² + (dY)² + (dZ)² − (dT)²
Naturally it isn't necessary to imagine an embedding of our hypothesized closed dimensions in a higher-dimensional space, but it can be helpful for visualizing the structure.

One of the first suggestions for closed cylindrical dimensions was made by Theodor Kaluza in 1919, in a paper communicated to the Prussian Academy by Einstein in 1921. The idea proposed by Kaluza was to generalize relativity from four to five dimensions. The introduction of the fifth dimension increases the number of components of the Riemann metric tensor, and it was hoped that some of this additional structure would represent the electromagnetic field on an equal footing with the gravitational field on the "left side" of Einstein's field equations, instead of being lumped into the stress-energy tensor Tµν. Kaluza showed that, at least in the weak field limit for low velocities, we can arrange for a five-dimensional manifold with one cylindrical dimension such that geodesic paths correspond to the paths of charged particles under the combined influence of gravitational and electromagnetic fields. In 1926 Oskar Klein proved that the result was valid even without the restriction to weak fields and low velocities. The fifth dimension seems to have been mainly a mathematical device for Kaluza, with little physical significance, but subsequent researchers have sought to treat it as a real physical dimension, and more recent "grand unification theories" have postulated field theories in various numbers of dimensions greater than four (though none with fewer than four, so far as I know). In addition to increasing the amount of mathematical structure, which might enable the incorporation of the electromagnetic and other fields, many researchers (including Einstein and Bergmann in the 1930's) hoped the indeterminacy of quantum phenomena might be simply the result of describing a five-dimensional world in terms of four-dimensional laws. Perhaps by re-writing the laws in the full five dimensions quantum mechanics could, after all, be explained by a field theory. Alas, as Bergmann later noted, "it appears these high hopes were unjustified".

Nevertheless, theorists ever since have freely availed themselves of whatever number of dimensions seemed convenient in their efforts to devise a fundamental "theory of everything". In nearly all cases the extra dimensions are spatial and assumed to be closed with extremely small radii in terms of macroscopic scales, thus explaining why it appears that macroscopic objects exist in just three spatial dimensions. Oddly enough, it is seldom mentioned that we do, in fact, have six extrinsic relational degrees of freedom, consisting of the three open translational dimensions and the closed orientational dimensions, which can be parameterized (for example) by the Euler angles of a frame. Of course, these three dimensions are not individually cylindrical, nor do they commute, but at each point in three-dimensional space they constitute a closed three-dimensional manifold isomorphic to the group of rotations. It's also worth noting that while translational velocity in the open dimensions is purely relativistic, angular velocity in the closed dimensions is absolute, and there is no physical difficulty in discerning a state of absolute non-rotation. This is interesting because, even though a closed cylindrical space may be locally Lorentzian, it is globally absolute, in the sense that there is a globally distinguished state of motion with respect to which an inertial observer's natural surfaces of simultaneity are globally coherent. In any other state of motion the surfaces of simultaneity are helical in time, similar to the analytically continued systems of reference of observers at rest on the perimeter of a rotating disk. To illustrate, consider two possible worldlines of a single particle P in a one-dimensional cylindrical space as shown in the spacetime diagrams below.

The cylindrical topology of the space is represented by identifying the worldline AB with the worldline CD. Now, in the left-hand figure the particle P is stationary, and it emits pulses of light in both directions at event a. The rightward-going pulse passes through event c, which is the same as event b, and then it proceeds from b to d. Likewise the leftward-going pulse goes from a to b and then from c to d. Thus both pulses arrive back at the particle P simultaneously. However, if the particle P is in absolute motion as shown in the right-hand figure, the rightward light pulse goes from a to c and then from c’ to d2, whereas the leftward pulse goes from a to b and then from b’ to d1, so in this case the
pulses do not arrive back at particle P simultaneously. The absolutely stationary worldlines in this cylindrical space are those for which the diverging-converging light cones remain coherent. (In the one-dimensional case there are discrete absolute speeds greater than zero for which the leftward and rightward pulses periodically re-converge on the particle P.) Of course, for a different mapping between the events on the line AB and the events on the line CD we would get a different state of rest. The worldlines of identifiable inertial entities establish the correct mapping. If we relinquish the identifiability of persistent entities through time, and under completed loops around the cylindrical dimension, then the mapping becomes ambiguous. For example, we assume particle P associates the pulses absorbed at event d with the pulses emitted at event a, although this association is not logically necessary.

7.5 Packing Universes In Spacetime

All experience is an arch wherethrough
Gleams that untraveled world whose margin fades
Forever and forever when I move.

Tennyson, 1842

One of the interesting aspects of the Minkowski metric is that every lightcone (in principle) contains infinitely many nearly-complete lightcones. Consider just a single spatial dimension in which an infinite number of point particles are moving away from each other with mutual velocities as shown below:

Each particle finds itself mid-way between its two nearest neighbors, which are receding at nearly the speed of light, so that each particle can be regarded as the origin of a nearly-complete lightcone. On the other hand, all of these particles emanate from a single point, and the entire infinite set of points (and nearly-complete lightcones) resides within the future lightcone of that single point. More formally, a complete lightcone in a flat Lorentzian xt plane comprises the boundary of all points reachable from a given point P along world lines with speeds less than 1 relative to any and every inertial worldline through P. Also, relative to any specific inertial frame W we can define an "ε-complete lightcone" as the region reachable from P along world lines with speeds less than (1 − ε) relative to W, for some arbitrarily small ε > 0. A complete lightcone contains infinitely many ε-complete lightcones, as illustrated above by the infinite linear sequence of particles in space, each receding with a speed of (1 − ε) relative to its closest neighbors. Since we can never observe something infinitely red-shifted, it follows that our observable universe can fit inside an ε-complete lightcone just as well as in a truly complete lightcone. Thus a single lightcone in infinite
flat Lorentzian spacetime encompasses infinitely many mutually exclusive ε-universes. If we arbitrarily select one of the particles as the "rest" particle P0, and number the other particles sequentially, we can evaluate the velocities of the other particles with respect to the inertial coordinates of P0, whose velocity is v0 = 0. If each particle has a mutual velocity u relative to each of its nearest neighbors, then obviously P1 has a speed v1 = u. The speed of P2 is u relative to P1, and its speed relative to P0 is given by the relativistic speed composition formula v2 = (v1 + u)/(uv1 + 1). In general, the speed of Pk can be computed recursively based on the speed of Pk-1 using the formula
        vk = (vk-1 + u) / (1 + u vk-1)
This is just a linear fractional function, so we can use the method described in Section 2.6 to derive the explicit formula
        vk = ((1+u)^k − (1−u)^k) / ((1+u)^k + (1−u)^k) = tanh(k tanh⁻¹(u))
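The recursion and the closed form are easy to confirm numerically; in the following Python sketch (with an arbitrary mutual speed u) the equivalent expression tanh(k·tanh⁻¹(u)) makes it clear that the rapidities simply add while the speeds remain below 1:

    import math

    u = 0.6                 # arbitrary mutual speed between neighbors (c = 1)
    v = 0.0
    for k in range(1, 8):
        v = (v + u) / (1 + u*v)          # recursive composition
        closed = ((1+u)**k - (1-u)**k) / ((1+u)**k + (1-u)**k)
        print(k, v, closed, math.tanh(k * math.atanh(u)))   # all three agree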
Similarly, in full 3+1 dimensional spacetime we can consider packing ε-complete lightspheres inside a complete lightsphere. A flash of light at point P in flat Lorentzian spacetime emanates outward in a spherical shell as viewed from any inertial worldline through P. We arbitrarily select one such worldline W0 as our frame of reference, and let the slices of simultaneity relative to this frame define a time parameter t. The points of the worldline W0 can be regarded as the stationary center of a 3D expanding sphere at each instant t. On any given time-slice t we can set up orthogonal space coordinates x,y,z relative to W0 and normalize the units so that the radius of the expanding lightsphere at time t equals 1. In these terms the boundary of the lightsphere is just the sphere
        x² + y² + z² = 1
Now let W1 denote another inertial worldline through the point P with a velocity v = v1 relative to W0, and consider the region R1 surrounding W1 consisting of the points reachable from P with speeds not exceeding u = u1 relative to W1. The region R1 is spherical and centered on W1 relative to the frame of W1, but on any time-slice t (relative to W0) the region R1 has an ellipsoidal shape. If v is in the z direction then the cross-sectional boundary of R1 on the xz plane is given parametrically by
        x(θ) = u sin(θ) √(1 − v²) / (1 + uv cos(θ))        z(θ) = (v + u cos(θ)) / (1 + uv cos(θ))
as θ ranges from 0 to 2π. The entire boundary is just the surface of rotation of this ellipse
about the z axis. If v1 has a magnitude of (1 − ε) for some arbitrarily small ε > 0, and if we set u1 = |v1|, then as ε goes to zero the boundary of the region R1 approaches the limiting ellipsoid
        x² + y² + 2z(z − 1) = 0
Similarly if W2 is an inertial worldline with speed |v2| = |v1| in the negative z direction relative to W0, then the boundary of the region R2 consisting of the points reachable from P with speeds not exceeding u2 = |v2| approaches the limiting ellipsoid
        x² + y² + 2z(z + 1) = 0
The regions R1 and R2 are mutually exclusive, meeting only at the point of contact [0,0,0]. Each of these regions can be called an "ε-complete" lightsphere. Interestingly, beginning with R1 and R2 we can construct a perfect tetrahedral packing of eight ε-complete lightspheres by placing six more spheres in a hexagonal ring about the z axis with centers in the xy plane, such that each sphere just touches R1 and R2 and its two adjacent neighbors in the ring. Each of these six spheres represents a region reachable from P with speeds less than u1 relative to one of six worldlines whose speeds are (1 − 4ε) relative to W0. The normalized boundaries of these six ellipsoids on a timeslice t are given by
        x² + y² + z² + (x cos(kπ/3) + y sin(kπ/3))² − 2(x cos(kπ/3) + y sin(kπ/3)) = 0
for k = 0, 1, .., 5. In the limit as ε goes to zero the hexagonal cluster of ε-spheres touching any given ε-sphere becomes vanishingly small with respect to the given sphere's frame of reference, so we approach the condition that this hexagonal pattern tessellates the entire surface of each ε-sphere in a perfectly symmetrical tetrahedral packing of identical ε-complete lightspheres. A cross-sectional side-view and top-view of this configuration are shown below.

These considerations show that we can regard a single light cone as a cosmological model, taking advantage of the complete symmetry in Minkowski spacetime. Milne was the first to discuss this model in detail. He postulated a cloud of particles expanding in flat spacetime from a single event O, with a distribution of velocities such that the mutual velocities between neighboring particles were the same for every particle, just as in the one-dimensional case described at the beginning of this section. With respect to any particular system of inertial coordinates t,x,y,z whose origin is at the event O, the cloud of particles is spherically symmetrical with radially outward speed v = r/t. The density of the particles is also spherically symmetrical, but it is not isotropic. To determine the density with respect to the inertial coordinates t,x,y,z, we first consider the density in the radial direction at a point on the x axis at time t. If we let u denote the mutual speed between neighboring particles, then the speed vn of the nth particle away from the center is
        xn/t = vn = ((1+u)^n − (1−u)^n) / ((1+u)^n + (1−u)^n)
where xn is the radial distance of the nth particle along the x axis. Solving for n gives
        n = tanh⁻¹(xn/t) / tanh⁻¹(u)
Differentiating with respect to x gives the density of particles in the x direction
        dn/dx = t / (tanh⁻¹(u) (t² − x²))
This confirms that the one-dimensional density at the spatial origin drops in proportion to 1/t. Also, by symmetry, the densities in the transverse directions y and z at any point are given by this same expression as a function of the proper time τ = √(t² − x²) at that point, i.e.,

        dn/dy = dn/dz = 1 / (tanh⁻¹(u) √(t² − x²))

This shows that the densities in the transverse directions are less than in the radial direction by a factor of √(1 − (x/t)²). Neglecting the anisotropy, the number of particles in a volume element dxdydz at a radial distance r from the spatial origin at time t is proportional to
        t / (t² − r²)²
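Before proceeding, here is a minimal Python check of the one-dimensional radial density derived above, comparing nearest-neighbor spacings against the formula dn/dx = t/(tanh⁻¹(u)(t² − x²)) for arbitrary u and t:

    import math

    u, t = 0.1, 100.0                    # arbitrary mutual speed and time
    phi = math.atanh(u)                  # rapidity step between neighbors

    # Particle positions on the x axis: x_n = t tanh(n atanh(u))
    xs = [t * math.tanh(n*phi) for n in range(130)]

    for n in (1, 30, 60, 120):
        est = 1.0 / (xs[n+1] - xs[n])            # density ~ 1/spacing
        xm = 0.5 * (xs[n] + xs[n+1])
        pred = t / (phi * (t*t - xm*xm))         # formula derived above
        print(n, est, pred)                      # close agreement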
This distribution applies to every inertial system of coordinates with origin at O, so this cosmology looks the same, and is spherically symmetrical, with respect to the rest frame of each individual particle. The above analysis was based on a foliation of spacetime into slices of constant-t for some particular system of inertial coordinates, but this is not the only possible foliation, nor even the most natural. From a cosmological standpoint we might adopt as our time coordinate at each point the proper time of a uniform worldline extending from O to that point. This would give hyperboloid spacelike surfaces consisting of the locus of all the points with a fixed proper age from the origin event O. One of these spacelike slices is illustrated by the "τ = k" line in the figure below.

Rindler points out that if τ = k is the epoch at which the density of the expanding cloud drops low enough so that matter and thermal radiation decouple, we should expect at the present event "p" to be receiving an isotropic and highly red-shifted "background radiation" along the dotted lightlike line from that de-coupling surface as shown in the figure. As our present event p advances into the future we expect to see a progressively more red-shifted (i.e., lower temperature) background radiation. This simplistic model gives a surprisingly good representation of the 3K microwave radiation that is actually observed. It's also worth noting that if we adopt the hyperboloid foliation the universe of this expanding cloud is spatially infinite. We saw in Section 1.7 that the absolute radial distance along this surface from the spatial center to a point at r is
        s = τ sinh⁻¹(r/τ)
where r² = x² + y² + z² in terms of the inertial coordinates of the central spatial point. Furthermore, we can represent this hyperboloid spatial surface as existing over the flat Euclidean xy plane with the elevation h = i√(τ² + x² + y²). By making the elevation imaginary, we capture the indefinite character of the surface. In the limit near the origin we can expand h to give
        h ≈ iτ + (i/(2τ))(x² + y²)
So, according to the terminology of Section 5.3, we have a surface tangent to the xy plane at the origin (discarding the constant term iτ, which doesn't affect the curvature) with elevation given by h = ax² + bxy + cy² where a = c = i/(2τ) and b = 0. Consequently the Gaussian curvature of this spatial surface is K = 4ac − b² = −1/τ². By symmetry the same analysis is applicable at every point on the surface, so this surface has constant negative curvature. This applies to any two-dimensional spatial tangent plane in the three-dimensional space at each point for constant τ. We can also evaluate the metric on this two-dimensional spacelike slice, by writing the total differential of h
        dh = i (x dx + y dy) / √(τ² + x² + y²)
Squaring this and adding the result to (dx)² + (dy)² gives the line element for this surface in terms of the tangent xy plane coordinates projected onto the surface
        (ds)² = (dx)² + (dy)² − (x dx + y dy)² / (τ² + x² + y²)
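This projected line element can be double-checked symbolically; the following sketch (using Python's sympy library, assuming it is available) embeds the slice as t = √(τ² + x² + y²) in Minkowski space and computes dx² + dy² − dt², which is algebraically equivalent to the expression above:

    from sympy import symbols, sqrt, diff, simplify

    x, y, tau, dx, dy = symbols('x y tau dx dy', positive=True)

    t = sqrt(tau**2 + x**2 + y**2)       # the hyperboloid t^2 - r^2 = tau^2
    dt = diff(t, x)*dx + diff(t, y)*dy   # total differential of t
    ds2 = simplify(dx**2 + dy**2 - dt**2)
    print(ds2)   # equivalent to dx^2 + dy^2 - (x dx + y dy)^2/(tau^2 + x^2 + y^2)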
7.6 Cosmological Coherence

Our main "difference in creed" is that you have a specific belief and I am a skeptic.

Willem de Sitter, 1917

Almost immediately after Einstein arrived at the final field equations of general relativity, the very foundation of his belief in those equations was shaken, first by the appearance of Schwarzschild's exact solution of the one-body problem. This was disturbing to Einstein because at the time he held the Machian belief that inertia must be attributable to the effects of distant matter, so he thought the only rigorous global solutions of the field equations would require some suitable distribution of distant matter. Schwarzschild's solution represents a well-defined spacetime extending to infinity, with ordinary inertial behavior for infinitesimal test particles, even though the only significant matter in this universe is the single central gravitating body. That body influences the spacetime in its vicinity, but the metric throughout spacetime is primarily determined by the spherical symmetry, leading to asymptotically flat spacetime at great distances from the central body. This seems rather difficult to reconcile with "Mach's Principle", but there was worse to come, and it was Einstein himself who opened the door. In an effort to conceive of a static cosmology with uniformly distributed matter he found it necessary to introduce another term to the field equations, with a coefficient called the cosmological constant. (See Section 5.8.) Shortly thereafter, Einstein received a letter
from the astronomer Willem de Sitter, who pointed out a global solution of the modified field equations (i.e., with non-zero cosmological constant) that is entirely free of matter, and yet that possesses non-trivial metrical structure. This thoroughly un-Machian universe was a fore-runner of Gödel's subsequent cosmological models containing closed timelike curves. After a lively and interesting correspondence about the shape of the universe, carried on between a Dutch astronomer and a German physicist at the height of the first world war, de Sitter published a paper on his solution, and Einstein published a rebuttal, claiming (incorrectly) that "the De Sitter system does not look at all like a world free of matter, but rather like a world whose matter is concentrated entirely on the [boundary]". The discussion was joined by several other prominent scientists, including Weyl, Klein, and Eddington, who all tried to clarify the distinction between singularities of the coordinates and actual singularities of the manifold/field. Ultimately all agreed that de Sitter was right, and his solution does indeed represent a matter-free universe consistent with the modified field equations.

We've seen that the Schwarzschild metric represents the unique spherically symmetrical solution of the original field equations of general relativity - assuming the cosmological constant, denoted by λ in Section 5.8, is zero. If we allow a non-zero value of λ, the Schwarzschild solution generalizes to
        dτ² = (1 − 2m/r − λr²/3) dt² − dr²/(1 − 2m/r − λr²/3) − r²(dθ² + sin²θ dϕ²)
To avoid upsetting the empirical successes of general relativity, such as the agreement with Mercury's excess precession, the value of λ must be extremely small, certainly less than 10⁻⁴⁰ m⁻², but not necessarily zero. If λ is precisely zero, then the Schwarzschild metric goes over to the Minkowski metric when the gravitating mass m equals zero, but if λ is not precisely zero the Schwarzschild metric with zero mass is
        dτ² = (1 − r²/L²) dt² − dr²/(1 − r²/L²) − r²(dθ² + sin²θ dϕ²)                (1)
where L is a characteristic length related to the cosmological constant by L² = 3/λ. This is one way of writing the metric of de Sitter spacetime. Just as Minkowski spacetime is a solution of the original vacuum field equation Rµν = 0, so the de Sitter metric is a solution of the modified field equations Rµν = λgµν. Since there is no central mass in this case, it may seem un-relativistic to use polar coordinates centered on one particular point, but it can be shown that – just as with the Minkowski metric in polar coordinates – the metric takes the same form when centered on any point. The metric (1) can be written in a slightly different form in terms of the radial coordinate ρ defined by
        r = L sin(ρ/L)
Noting that dr = cos(ρ/L) dρ, the de Sitter metric is
        dτ² = cos²(ρ/L) dt² − dρ² − L² sin²(ρ/L)(dθ² + sin²θ dϕ²)                (2)
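A short symbolic check (Python with the sympy library, assuming it is available) confirms that the substitution r = L sin(ρ/L) carries the metric coefficients of (1) into those of (2):

    from sympy import symbols, sin, cos, trigsimp

    L, rho = symbols('L rho', positive=True)
    r = L * sin(rho/L)

    gtt = 1 - r**2/L**2                        # coefficient of dt^2 in (1)
    grr = cos(rho/L)**2 / (1 - r**2/L**2)      # dr^2 term, with dr = cos(rho/L) drho

    print(trigsimp(gtt))   # cos(rho/L)**2  -> coefficient of dt^2 in (2)
    print(trigsimp(grr))   # 1              -> coefficient of drho^2 in (2)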
Interestingly, with a suitable change of coordinates, this is actually the metric of the surface of a four-dimensional pseudo-sphere in five-dimensional Minkowski space. Returning to equation (1), let x,y,z denote the usual three orthogonal spatial coordinates such that x² + y² + z² = r², and suppose there is another orthogonal spatial coordinate W and a time coordinate T defined by
        W = √(L² − r²) cosh(t/L)        T = √(L² − r²) sinh(t/L)
For any values of x,y,z,t we have
        x² + y² + z² + W² − T² = L²
so this locus of events comprises the surface of a hyperboloid, i.e., a pseudo-sphere of "radius" L. In other words, the spatial universe for any given time T is the three-dimensional surface of the four-dimensional sphere of squared radius L² + T². Hence the space shrinks to a minimum radius L at time T = 0 and then expands again as T increases, as illustrated below (showing only two of the spatial dimensions).

Assuming the five-dimensional spacetime x,y,z,W,T has the Minkowski metric
        dτ² = dT² − dx² − dy² − dz² − dW²
we can determine the metric on the hyperboloid surface by substituting the squared differentials (dT)² and (dW)²
        dT² − dW² = (1 − r²/L²) dt² − (r²/(L² − r²)) dr²
into the five-dimensional metric, which gives equation (1). The accelerating expansion of the space for a positive cosmological constant can be regarded as a consequence of a universal repulsive force. The radius of the spatial sphere follows a hyperbolic trajectory similar to the worldlines of constant proper acceleration discussed in Section 2.9. To show that the expansion of the de Sitter spacetime can be seen as exponential, we can put the metric into the “Robertson-Walker form” (see Section 7.1) by defining a new system of coordinates

such that

where

It follows that

where

Substituting into the metric (1) gives the exponential form
        dτ² = dt′² − e^(2t′/L) (dx′² + dy′² + dz′²)                (3)
Thus the characteristic length R(t) for this metric is the simple exponential function e^(t/L). (This form of the metric covers only part of the manifold.) Equations (1), (2), and (3) are the most common ways of expressing de Sitter's metric, but in the first letter that de Sitter wrote to Einstein on this subject he didn't give the line element in any of these familiar forms. We can derive his original formulation beginning with (1) if we define new coordinates

related to the r,t coordinates of (1) by

Incidentally, the t coordinate is the “relativistic difference” between the advanced and retarded combinations of the barred coordinates, i.e.,

The differentials in (1) can be expressed in terms of the barred coordinates as

where the partials are

and

Making these substitutions and simplifying, we get the “Cartesian” form of the metric that de Sitter presented in his first letter to Einstein

where dΩ denotes the angular components, which are unchanged from (1). These
expressions have some purely mathematical features of interest. For example, the line element is formally similar to the expressions for curvature discussed in Section 5.3. Also, the denominators of the partials of t are, according to Heron's formula, equal to 16A² where A is the area of a triangle with edge lengths

.

If the cosmological constant were zero (meaning that L is infinite) all the dynamic solutions of the field equations with matter would predict a slowing rate of expansion, but in 1998 two independent groups of astronomers reported evidence that the expansion of the universe is actually accelerating. If these findings are correct, then some sort of repulsive force is needed in models based on general relativity. This has led to renewed interest in the cosmological constant and de Sitter spacetime, which is sometimes denoted as dS4. If the cosmological constant is negative the resulting spacetime manifold is called anti-de Sitter spacetime, denoted by AdS4. In the latter case, we still get a hyperboloid, but the time coordinate advances circumferentially around the surface. To avoid closed time-like curves, we can simply imagine "wrapping" sheets around the hyperboloid.

As discussed in Section 7.1, the characteristic length R(t) of a manifold (i.e., the time-dependent coefficient of the spatial part of the manifold) satisfying the modified Einstein field equations (with non-zero cosmological constant) varies as a function of time in accord with the Friedmann equation
        Ṙ² = C/R + λR²/3 − k
where dots signify derivatives with respect to a suitable time coordinate, C is a constant, and k is the curvature index, equal to either -1, 0, or +1. The terms on the right hand side are akin to potentials, and it’s interesting to note that the first two terms correspond to the two hypothetical forms of gravitation highlighted by Newton in the Principia. (See Section 8.2 for more on this.) As explained in Section 7.1, the Friedmann equation implies that R satisfies the equation
        R̈ = −C/(2R²) + λR/3
which shows that, if λ = 0, the characteristic cosmological length R is a solution of the “separation equation” for non-rotating gravitationally governed distances, as given by equation (2) of Section 4.2. Comparing the more general gravitational separation from Section 4.2 with the general cosmological separation, we have

which again highlights the inverse square and the direct proportionalities that caught Newton’s attention. It’s interesting that with m = 0 the left-hand expression reduces to the purely inertial separation equation, whereas with λ = 0 the right hand expression reduces to the (non-rotating) gravitational separation equation. We saw that the “homogeneous”
forms of these equations are just special cases of the more general relation

where subscripts denote derivatives with respect to a suitable time coordinate. Among the solutions of this equation, in addition to the general co-inertial separations, non-rotating gravitational separations, and rotating-sliding separations, are sinusoidal functions and exponential functions. Historically this led to the suspicion, long before the recent astronomical observations, that there might be a class of exponential cosmological distances in addition to the cycloidal and parabolic distances. In other words, there could be different classes of observable distances, some very small and oscillatory, some larger and slowing, and some – the largest of all – increasing at an accelerating rate. This is illustrated in the figure below.

Of course, according to all conventional metrical theories, including general relativity, the spatial relations between material objects (on any chosen temporal foliation) conform to a single three-dimensional manifold. Assuming homogeneity and isotropy, it follows that all the cosmological distances between objects are subject to the ordinary metrical relations such as the triangle inequality. This greatly restricts the observable distances. On the other hand, our assumption that the degrees of freedom are limited in this way is based on our experience with much smaller distances. We have no direct evidence that cosmological distances are subject to the same dependencies. As an example of how concepts based on limited experience can be misleading, recall how special relativity revealed that the metric of our local spacetime fails to satisfy the axioms of a metric, including the triangle inequality. The non-additivity of relative speeds was not anticipated based on human experience with low speeds. Likewise for three “co-linear” objects A,B,C, it’s conceivable that the distance AC is not the simple sum of the distances AB and BC. The feasibility of regarding separations (rather than particles) as the elementary objects of nature was discussed in Section 4.1.

One possible observational consequence of having distances of several different classes would be astronomical objects that are highly red-shifted and yet much closer to us than the standard Hubble model would imply based on their redshifts. (Of course, even if this view were correct, it might be the case that all the exponential separations have already passed out of view.) Another possible consequence would be that some observable distances would be increasing at an accelerating rate, whereas others of the same magnitude might be decelerating. The above discussion shows that the idea of at least some cosmological separations increasing at an accelerating rate can (and did) arise from completely a priori considerations. Of course, as long as a single coherent expansion model is adequate to explain our observations, the standard GR models of a smooth manifold will remain viable. Less conventional notions such as those discussed above would be called for only if we begin to see conflicting evidence, e.g., if some observations strongly indicate accelerating expansion while others strongly indicate decelerating expansion.

The cosmological constant is hardly ever discussed without mentioning that (according to Gamow) Einstein called it his "biggest blunder", but the reasons for regarding this constant as a "blunder" are seldom discussed. Some have suggested that Einstein was annoyed at having missed the opportunity to predict the Hubble expansion, but in his own writings Einstein argued that "the introduction of [the cosmological constant] constitutes a complication of the theory, which seriously reduces its logical simplicity". He also wrote "If there is no quasi-static world, then away with the cosmological term", adding that it is "theoretically unsatisfactory anyway". In modern usage the cosmological term is usually taken to characterize some feature of the vacuum state, and so it is a fore-runner of the extremely complicated vacua that are contemplated in the "string theory" research program. If Einstein considered the complication and loss of logical simplicity associated with a single constant to be theoretically unsatisfactory, he would presumably have been even more dissatisfied with the nearly infinite number of possible vacua contemplated in current string research. Oddly enough, the de Sitter and anti-de Sitter spacetimes play a prominent role in this research, especially in relation to the so-called AdS/CFT conjecture involving conformal field theory.

7.7 Boundaries and Symmetries

Whether Heaven move or Earth,
Imports not, if thou reckon right.

John Milton, 1667

Each point on the surface of an ordinary sphere is perfectly symmetrical with every other point, but there is no difficulty imagining the arbitrary (random) selection of a single point on the surface, because we can define a uniform probability density on this surface. However, if we begin with an infinite flat plane, where again each point is perfectly symmetrical with every other point, we face an inherent difficulty, because there does not
exist a perfectly uniform probability density distribution over an infinite surface. Hence, if we select one particular point on this infinite flat plane, we can't claim, even in principle, to have chosen from a perfectly uniform distribution. Therefore, the original empty infinite flat plane was not perfectly symmetrical after all, at least not with respect to our selection of individual points. This shows that the very idea of selecting a point from a pre-existing perfectly symmetrical infinite manifold is, in a sense, self-contradictory.

Similarly the symmetry of infinite Minkowski spacetime admits no distinguished position or frame of reference, but the introduction of an inertial particle not only destroys the symmetry, it also contradicts the premise that the points of the original manifold were perfectly symmetrical, because the non-existence of a uniform probability density distribution over the possible positions and velocities implies that the placement of the particle could not have been completely impartial. Even if we postulate a Milne cosmology (described in Section 7.5), with dust particles emanating from a single point at uniformly distributed velocities throughout the future null cone (note that this uniform distribution isn't normalized as a probability density, so it can't be used to make a selection), we still arrive at a distinguished velocity frame at each point. We could retain perfect Minkowskian symmetry in the presence of matter only by postulating a "super-Milne" cosmology in which every point on some past spacelike slice is an equivalent source of infinitesimal dust particles emanating at all velocities distributed uniformly throughout the respective future null cones of every point. In such a cosmology this same condition would apply on every time-slice, but the density would be infinite, because each point is on the surface of infinitely many null cones, and we would have an infinitely dense flow of particles in all directions at every point. Whether this could correspond to any intelligible arrangement of physical entities is unclear.

The asymmetry due to the presence of an infinitesimal inertial particle in flat Minkowski spacetime is purely circumstantial, because the spacetime is considered to be unaffected by the presence of this particle. However, according to general relativity, the presence of any inertial entity disturbs the symmetry of the manifold even more profoundly, because it implies an intrinsic curvature of the spacetime manifold, i.e., the manifold takes on an intrinsic shape that distinguishes the location and rest frame of the particle. For a single non-rotating uncharged particle the resulting shape is Schwarzschild spacetime, which obviously exhibits a distinguished center and rest frame (the frame of the central mass). Indeed, this spacetime exhibits a preferred system of coordinates, namely those for which the metric coefficients are independent of the time coordinate. Still, since the field variables of general relativity are the metric coefficients themselves, we are naturally encouraged to think that there is no a priori distinguished system of reference in the physical spacetime described by general relativity, and that it is only the contingent circumstance of a particular distribution of inertial entities that may distinguish any particular frame or state of motion.
In other words, it's tempting to think that the spacetime manifold is determined solely by its "contents", i.e., that the left side of Guv = 8πTuv is determined by the right side. However, this is not actually the case (as Einstein and others realized early on), and to understand why, it's useful to review what is involved in actually solving the field equations of general relativity as an initial-value
problem. The ten algebraically independent field equations represented by Guv = 8πTuv involve the values of the ten independent metric coefficients and their first and second derivatives with respect to four spacetime coordinates. If we're given the values of the metric coefficients throughout a 3D spacelike "slice" of spacetime at some particular value of the time coordinate, we can directly evaluate the first and second derivatives of these components with respect to the space coordinates in this "slice". This leaves only the first and second derivatives of the ten metric coefficients with respect to the time coordinate as unknown quantities in the ten field equations. It might seem that we could arbitrarily specify the first derivatives, and then solve the field equations for the second derivatives, enabling us to "integrate" forward in time to the next timeslice, and then repeat this process to predict the subsequent evolution of the metric field. However, the structure of the field equations does not permit this, because four of the ten field equations (namely, G0v = 8πT0v with v = 0,1,2,3) contain only the first derivatives with respect to the time coordinate x0, so we can't arbitrarily specify the guv and their first derivatives with respect to x0 on a surface of constant x0. These ten first derivatives, alone, must satisfy the four G0v conditions on any such surface, so before we can even pose the initial value problem, we must first solve this subset of the field equations for a viable set of initial values. Although these four conditions constrain the initial values, they obviously don't fully determine them, even for a given distribution of Tuv.

Once we've specified values of the guv and their first derivatives with respect to x0 on some surface of constant x0 in such a way that the four conditions for G0v are satisfied, the four contracted Bianchi identities ensure that these conditions remain satisfied outside the initial surface, provided only that the remaining six equations are satisfied everywhere. However, this leaves only six independent equations to govern the evolution of the ten field variables in the x0 direction. As a result, the second derivatives of the guv with respect to x0 appear to be underdetermined. In other words, given suitable initial conditions, we're left with a four-fold ambiguity. We must arbitrarily impose four more conditions on the system in order to uniquely determine a solution. This was to be expected, because the metric coefficients depend not only on the absolute shape of the manifold, but also on our choice of coordinate systems, which represents four degrees of freedom. Thus, the field equations actually determine an equivalence class of solutions, corresponding to all the ways in which a given absolute metrical manifold can be expressed in various coordinate systems.

In order to actually generate a solution of the initial value problem, we need to impose four "coordinate conditions" along with the six "dynamical" field equations. The conditions arise from any proposed system of coordinates by expressing the metric coefficients g0v in terms of these coordinates (which can always be done for any postulated system of coordinates), and then differentiating these four coefficients twice with respect to x0 to give four equations in the second derivatives of these coefficients.
Notwithstanding the four-fold ambiguity of the dynamical field equations, which is just a descriptive rather than a substantive ambiguity, it's clear that the manifold is a definite absolute entity, and its overall characteristics and evolution are determined not only by
the postulated Tuv and the field equations, but also by the conditions specified on the initial timeslice. As noted above, these conditions are constrained by the field equations, but are by no means fully determined. We are still required to impose largely arbitrary conditions in order to fix the absolute background spacetime. This state of affairs was disappointing to Einstein, because he recognized that the selection of a set of initial conditions is tantamount to stipulating a preferred class of reference systems, precisely as in Newtonian theory, which is "contrary to the spirit of the relativity principle" (referring presumably to the relational ideas of Mach). As an example, there are multiple distinct vacuum solutions of the field equations, some with gravitational waves and even geons (temporarily) zipping around, and some not. Even more ambiguity arises when we introduce mass, as Gödel showed with his cosmological solutions in which the average mass of the universe is rotating with respect to the spacetime background. These examples just highlight the fact that general relativity can no more dispense with the arbitrary stipulation of a preferred class of reference systems (the inertial systems) than could Newtonian mechanics or special relativity. This is clearly illustrated by Schwarzschild spacetime, which (according to Birkhoff's theorem) is the essentially unique spherically symmetrical solution of the field equations. Clearly this cosmological model, based on a single spherically symmetrical mass in an otherwise empty universe, is "contrary to the spirit of the relativity principle", because as noted earlier there is an essentially unique time coordinate for which the metric coefficients are independent of time. Translation along a vector that leaves the metric formally unchanged is called an isometry, and a complete vector field of isometries is called a Killing vector field. Thus the Schwarzschild time coordinate t constitutes a Killing vector field over the entire manifold, making it a highly distinguished time coordinate, no less than Newton's absolute time. In both special relativity and Newtonian physics there is an infinite class of operationally equivalent systems of reference at any point, but in Schwarzschild spacetime there is an essentially unique global coordinate system with respect to which the metric coefficients are independent of time, and this system is related in a definite way to the inertial class of reference systems at each point. Thus, in the context of this particular spacetime, we actually have a much stronger case for a meaningful notion of absolute rest than we do in Newtonian spacetime or special relativity, both of which rest naively on the principle of inertia, and neither of which acknowledges the possibility of variations in the properties of spacetime from place to place (let alone under velocity transformations). The unique physical significance of the Schwarzschild time coordinate is also shown by the fact that Fermat's principle of least time applies uniquely to this time coordinate. To see this, consider the path of a light pulse traveling through the solar system, regarded as a Schwarzschild geometry centered around the Sun. Naturally there are many different parameterizations and time coordinates that we could apply to this geometry, and in general a timelike geodesic extremizes dτ (not dt for whatever arbitrary time coordinate t we might be using), and of course a spacelike geodesic extremizes ds (again, not dt). 
However, for light-like paths we have dτ = ds = 0 by definition, so the path is confined to null surfaces, but this is not sufficient to pick out which null path will be followed. So, starting with a line element of the form
(dτ)^2 = (1 − 2m/r) (dt)^2 − (1 − 2m/r)^(−1) (dr)^2 − r^2 (dθ)^2 − r^2 sin^2(θ) (dϕ)^2
where θ and ϕ represent the usual Schwarzschild coordinates, we then set dτ = 0 for light-like paths, which reduces the equation to
(dt)^2 = (1 − 2m/r)^(−2) (dr)^2 + (1 − 2m/r)^(−1) r^2 [(dθ)^2 + sin^2(θ) (dϕ)^2]
This is a perfectly good metrical (not pseudo-metrical) space, with a line element given by dt, and in fact by extremizing (dt)^2 we get the paths of light. Note that this only works because gtt, grr, gθθ, gϕϕ all happen to be independent of this time coordinate, t, and also because gtr = gtθ = gtϕ = 0. If and only if all these conditions apply, we reduce to a simple line element of dt on the null surfaces, and Fermat's Principle applies to the parameter t. Thus, in a Schwarzschild universe, this works only when using the essentially unique Schwarzschild coordinates, in which the metric coefficients are independent of the time coordinate.

Admittedly the Schwarzschild geometry is a highly simplistic and symmetrical cosmology, but it illustrates how the notion of an absolute rest frame can be more physically meaningful in a relativistic spacetime than in Newtonian spacetime. The spatial configuration of Newton's absolute space is invariant and the Newtonian metric is independent of time, regardless of which member of the inertial class of reference systems we choose, whereas Schwarzschild spacetime is spherically symmetrical and its metric coefficients are independent of time only with respect to the essentially unique Schwarzschild system of coordinates. In other words, Newtonian spacetime is operationally symmetrical under translations and uniform velocities, whereas the spacetime of general relativity is not. The curves and dimples in relativistic spacetime automatically destroy symmetry under translation, let alone velocity. Even the spacetime of special relativity is (marginally) less relational (in the Machian sense) than Newtonian spacetime, because it combines space and time into a single manifold that is only partially ordered, whereas Newtonian spacetime is totally ordered into a continuous sequence of spatial instants. Noting that Newtonian spacetime is explicitly less relational than Galilean spacetime, it can be argued that the actual evolution of spacetime theories historically has been from the purely kinematically relational spacetime of Copernicus, through the inertial relativity of Galileo and special relativity, to the purely absolute spacetime of general relativity. At each stage the meaning of relativity has been refined and qualified.

We might suspect that the distinguished "Killing-time" coordinate in the Schwarzschild cosmology is exceptional - in the sense that the manifold was designed to satisfy a very restrictive symmetry condition - and that perhaps more general spacetime manifolds do not exhibit any preferred directions or time coordinates. However, for any specific manifold we must apply some symmetry or boundary conditions sufficient to fix the metrical relations of the manifold, which unavoidably distinguishes one particular system of reference at any given point. For example, in the standard Friedmann models of the
universe there is, at each point in the manifold, a frame of reference with respect to which the rest of the matter and energy in the universe has maximal spherical symmetry, which is certainly a distinguished system of reference. Still, we might imagine that these are just more exceptional cases, and that underneath all these specific examples of relativistic cosmologies that just happen to have strongly distinguished systems of reference there lies a purely relational theory. However, this is not the case. General relativity is not a relational theory of motion. The spacetime manifold in general relativity is an absolute entity, and it's clear that any solution of the field equations can only be based on the stipulation of sufficient constraints to uniquely determine the manifold, up to inertial equivalence, which is precisely the situation with regard to the Newtonian spacetime manifold.

But isn't it possible for us to invoke general relativity with very generic boundary conditions that do not commit us to any distinguished frame of reference? What if we simply stipulate asymptotic flatness at infinity? This is typically the approach taken when modeling the solar system or some other actual configuration, i.e., we require that, with a suitable choice of coordinates, the metric tensor approaches the Minkowski metric at spatial infinity. However, as Einstein put it, "these boundary conditions presuppose a definite choice of the system of reference". In other words, we must specify a suitable choice of coordinates in terms of which the metric tensor approaches the Minkowski metric, but this specification is tantamount to specifying an absolute spacetime (up to inertial equivalence, as always), just as in Newtonian physics. The well-known techniques for imposing asymptotic flatness at "conformal infinity", such as discussed by Wald, are not exceptions, because they place only very mild constraints on the field solution in the finite region of the manifold. Indeed, the explicit purpose of such constructions is to establish asymptotic flatness at infinity while otherwise constraining the solution as little as possible, to facilitate the study of gravitational waves and other phenomena in the finite region of the manifold. These phenomena must still be "driven" by the imposition of conditions that inevitably distinguish a particular frame of reference at one or more points. Furthermore, to the extent that flatness at conformal infinity succeeds in imposing an absolute reference for gravitational "potential" and the total energy of an isolated system, it still represents an absolute background that has been artificially imposed.

Since the condition of flatness at infinity is not sufficient to determine a solution, we must typically impose other conditions. Obviously there are many physically distinct ways in which the metric could approach flatness as a function of radial spatial distance from a given region of interest, and one of the most natural-seeming and common approaches, consistent with local observation, is to assume a spherically symmetrical approach to spatial infinity. This tends to seem like a suitably frame-independent assumption, since spatial spherical symmetry is frame-independent in Newtonian physics. The problem, of course, is that in relativity the concept of spherical symmetry automatically distinguishes a particular frame of reference - not just a class of frames, but one particular frame.
For example, if we choose a system of reference that is moving toward Sirius at 0.999999c, the entire distribution of stars and galaxies in the universe is
drastically shrunk (spatially) along that direction (by a Lorentz factor of roughly 707 at that speed), and if we define a spherically symmetrical asymptotic approach to flatness at spatial infinity in these coordinates we will get a physically different result (e.g., for solar system calculations) than if we define a spherically symmetrical asymptotic approach to flatness with respect to a system of coordinates in which the Sun is at rest. It's true that the choice of coordinate systems is arbitrary, but only until we impose physically meaningful conditions on the manifold in terms of those coordinates. Once we do that, our choice of coordinate systems acquires physical significance, because the physical meaning of the conditions we impose is determined largely by the coordinates in terms of which they are expressed, and these conditions physically influence the solution. Of course, we can in principle define any boundary conditions in conjunction with any set of coordinates, i.e., we could take the rest frame of a near-light-speed cosmic particle to work out the orbital mechanics of our solar system by (for example) specifying an asymptotic approach to flatness at spatial infinity in a highly oblate (ellipsoidal) pattern, but the fact remains that this approach gives a uniquely spherical pattern only with respect to the Sun's rest frame.

Whenever we pose a Cauchy initial-value problem, the very act of specifying timeslices (a spacelike foliation) and defining a set of physically recognizable conditions on one of these surfaces establishes a distinguished reference system at each point. These individual local frames need not be coherent, nor extendible, nor do we necessarily require them to possess specific isometries, but the fact remains that the general process of actually applying the field equations to an initial-value problem involves the stipulation of a preferred space-time decomposition at each point, since the tangent plane of the timeslice at each point singles out a local frame of reference, and we are assigning physically meaningful conditions to every point on this surface in terms that unavoidably distinguish this frame. More generally, whenever we apply the field equations in any particular situation, whether in the form of an initial-value problem or in some other form, we must always specify sufficient boundary conditions, initial conditions, and/or symmetries to uniquely determine the manifold, and in so doing we are positing an absolute spacetime just as surely (and just as arbitrarily) as Newton did.

It's true that the field equations themselves would be compatible with a wide range of different absolute spacetimes, but this ambiguity, from a predictive standpoint, is a weakness rather than a strength of the theory, since, after all, we live in one definite universe, not infinitely many arbitrary ones. Indeed, when taken as a meta-theory in this sense, general relativity does not even give unique predictions for things like the twins paradox, etc., unless the statement of the question includes the specification of the entire cosmological boundary conditions, in which case we're back to a specific absolute spacetime. It was this very realization that led Einstein at one point to the conviction that the universe must be regarded as spatially closed, to salvage at least a semblance of uniqueness for the cosmological solution as a function of the mass-energy distribution. (See Section 7.1.)
However, the closed Friedmann models are not currently in favor among astronomers, and in any case the relational uniqueness that can be recovered in such a universe is more semantic than substantial.

Moreover, the strategy of trying to obviate arbitrary boundary conditions by selecting a topology without boundaries generally results in a topologically distinguished system of reference at any point. For example, in a cylindrical spacetime (assuming the space is everywhere locally Lorentzian) there is only one frame in which the surfaces of simultaneity of an inertial observer are coherent. In all other frames, if we follow a surface of simultaneity all the way around the closed dimension we find that it doesn't meet up with itself. Instead, we get a helical pattern (if we picture just a single cylindrical spatial dimension versus time). It may seem that we can disregard peculiar boundary conditions involving waves and so on, but if we begin to rule out valid solutions of the field equations by fiat, then we're obviously not being guided by the theory, but by our prejudices and preferences. Similarly, in order to exclude "unrealistic" cosmological solutions of the field equations we must impose energy conditions, i.e., we find that it's necessary to restrict the class of allowable Tµν tensor fields, but this again is not justified by the field equations themselves, but merely by our wish to force them to give us "realistic" solutions. It would be an exaggeration to say that we get out of the field equations only what we put into them, but there's no denying that a considerable amount of "external" information must be imposed on them in order to give realistic solutions.

7.8 Global Interpretations of Local Experience

How are our customary ideas of space and time related to the character of our experiences? ... It seems to me that Poincare clearly recognized the truth in the account he gave in his book "La Science et l'Hypothese".
Albert Einstein, 1921

The standard interpretation of general relativity entails a conceptual framework consisting of primary entities - such as particles and non-gravitational fields - embedded in an extensive differentiable manifold of space and time. The theory is presented in the form of differential equations, interpreted as giving a description of the local metrical properties of the manifold around any specific point, but most of the observable predictions of the theory derive not from local results, per se, but from the inferred global structure generated by analytically continuing the solution over an extended region. From these extended solutions we infer configurations and motions of distant objects (fields and particles), from which we derive predictions about observable interactions. Does the totality of the observable interactions compel us to adopt this standard interpretation, or might the same pattern of experiences be explainable within some other, possibly quite different, conceptual framework?

In one sense the answer to this question is obvious. We can always accommodate any sequence of perceptions within an arbitrary ontology merely by positing a suitable theory of appearances separate from our presumed ontology. This approach goes back to ancient philosophers such as Parmenides, who taught that motion, change, and even plurality are merely appearances, while the reality is an unchanging unity. Although this strikes many people as outlandish, we're all familiar with the appearances of motion, change, and
plurality in our own personal dreams while we are "really" motionless and alone. We can even achieve a similar separation of perception and reality in computer-generated "virtual reality simulations", in which various sense impressions of sight and sound are generated to create an appearance that is starkly different from the underlying physical situation. Due to technical limitations, such simulations may not be very realistic (at the moment), but in principle they could be made arbitrarily realistic, and clearly there need be no direct correspondence between the topology of the virtual world of appearances and the actual world of external physical objects. When confronted with examples like this, people who believe there is only one true interpretation of the corporeal operations compatible with our experiences tend to be dismissive, as if such examples are frivolous and unworthy of consideration. It's true that a purely solipsistic approach to the interpretation of experiences is somewhat repugnant, and need not be taken too seriously, but it nevertheless serves to remind us (if we needed reminding) that the link between our sense perceptions and the underlying external structure is always ambiguous, and any claim that our experiences do (or can) uniquely single out an ontology is patently false. There is always a degree of freedom in the selection of our model of the presumed external objective reality.

In more serious models we usually assume that the processes of perception are "of the same kind" as the external processes that we perceive, but we still bifurcate our models into two parts, consisting of (1) an individual's sense impressions and interior experiences, such as thoughts and dreams, and (2) a class of objective exterior entities and events, of which only a small subset correspond to any individual's direct perceptions. Even within this limited class of models, the task of inferring (2) from (1) is not trivial, and there is certainly no a priori requirement that a given set of local experiences uniquely determines a particular global embedding. For the purposes of this discussion we will focus on the class of external models that are consistent with the predictions of general relativity, regarded as predictions about our actual sense impressions.

These considerations are complicated by the fact that the field equations of general relativity, by themselves, permit a very wide range of global solutions if no restrictions are placed on the type of boundary conditions, initial values, and energy conditions that are allowed, but most of these solutions are (presumably) unphysical. As Einstein said, "A field theory is not yet completely determined by the system of field equations". In order to extract realistic solutions (i.e., solutions consistent with our experiences) from the field equations we must impose some constraints on the boundary and energy conditions. In this sense the field equations do not represent a complete theory, because these restrictions cannot be inferred from the field equations, but are auxiliary assumptions that must simply be imposed on the basis of external considerations. This incompleteness is a characteristic of any physical law that is expressed as a set of differential equations, because such equations generally possess a vast range of possible formal solutions, and require one or more external principles or constraints to yield definite results.
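To make this point concrete, here is a minimal numerical sketch (in Python; the helper solve_laplace_1d and the boundary values are my own illustrative choices, not anything from the theory itself) of how a differential "law" by itself determines nothing until boundary data are supplied. The one-dimensional Laplace equation u'' = 0 stands in here for the field equations: the same equation, with two different sets of imposed boundary values, yields two entirely different "universes".

    import numpy as np

    def solve_laplace_1d(u_left, u_right, n=11):
        # Discretize u''(x) = 0 on [0,1] as u[i-1] - 2u[i] + u[i+1] = 0,
        # with the boundary values u(0) and u(1) imposed from outside.
        A = np.zeros((n, n))
        b = np.zeros(n)
        A[0, 0] = A[-1, -1] = 1.0
        b[0], b[-1] = u_left, u_right
        for i in range(1, n - 1):
            A[i, i - 1], A[i, i], A[i, i + 1] = 1.0, -2.0, 1.0
        return np.linalg.solve(A, b)

    # The same "field equation" with two different boundary stipulations:
    print(solve_laplace_1d(0.0, 1.0))   # one solution (a rising linear profile)
    print(solve_laplace_1d(5.0, -3.0))  # an entirely different solution

The equation constrains every solution to be linear, but which linear function actually obtains is fixed entirely by the imposed conditions - the analog of the arbitrary stipulations discussed above.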
The more formal flexibility that our theory possesses, the more inclined we are to ask whether the actual physical content of the theory is contained in the rational "laws" or
the circumstantial conditions that we impose. For example, consider a theory consisting of the assertion that certain aspects of our experience can be modeled by means of a suitable Turing machine with suitable initial data. This is a very flexible theoretical framework, since by definition anything that is computable can be computed from some initial data using a suitable Turing machine. Such a theory undeniably yields all applicable and computable results, but of course it also (without further specification) encompasses infinitely many inapplicable results. An ideal theoretical framework would be capable of representing all physical phenomena, but no unphysical phenomena. This is just an expression of the physicist's desire to remove all arbitrariness from the theory. However, as the general theory of relativity stands at present, it does not yield unique predictions about the overall global shape of the manifold. Instead, it simply imposes certain conditions on the allowable shapes. In this sense we can regard general relativity as a meta-theory, rather than a specific theory.

So, when considering the possibility of alternative interpretations (or representations) of general relativity, we need to decide whether we are trying to find a viable representation of all possible theories that reside within the meta-theory of general relativity, or whether we are trying to find a viable representation of just a single theory that satisfies the requirements of general relativity. The physicist might answer that we need only seek representations that conform with those aspects of general relativity that have been observationally verified, whereas a mathematician might be more interested in whether there are viable alternative representations of the entire meta-theory.

First we should ask whether there are any viable interpretations of general relativity as a meta-theory. This is a serious question, because the usual criterion for viability is that the candidate interpretation permits us to analytically continue all worldlines without leading to any singularities or physical infinities. In other words, an interpretation is considered to be not viable if the representation "breaks down" at some point due to an inability to diffeomorphically continue the solution within that representation. The difficulty here is that even the standard interpretation of general relativity in terms of curved spacetime leads, in some circumstances, to inextendible worldlines and singularities in the field. Thus if we take the position that such attributes are disqualifying, then it follows that even the standard interpretation of general relativity in terms of an extended spacetime manifold is not viable.

One approach to salvaging the geometrical interpretation is to adopt, as an additional feature of the theory, the principle that the manifold must be free of singularities and infinities. Indeed this was the approach that Einstein often suggested. He wrote

It is my opinion that singularities must be excluded. It does not seem reasonable to me to introduce into a continuum theory points (or lines, etc.) for which the field equations do not hold... Without such a postulate the theory is much too vague.

He even hoped that the exclusion of singularities might (somehow) lead to an understanding of atomistic and quantum phenomena within the context of a continuum theory, although he acknowledged that he couldn't say how this might come about. He believed that the difficulty of determining exact singularity-free global solutions of non-linear field equations prevents us from assessing the full content of a non-linear field theory such as general relativity. (He recognized that this was contrary to the prevailing view that a field theory can only be quantized by first being transformed into a statistical theory of field probabilities, but he regarded this as "only an attempt to describe relationships of an essentially nonlinear character by linear methods".)

Another approach, more in the mainstream of current thought, is to simply accept the existence of singularities, and decline to consider them as a disqualifying feature of an interpretation. According to theorems of Penrose, Hawking, and others, it is known that the existence of a trapped surface (such as those inside the event horizon of a black hole) implies the existence of inextendible worldlines, provided certain energy conditions are satisfied and we exclude closed timelike curves. Therefore, a great deal of classical general relativity and its treatment of black holes, etc., is based on the acceptance of singularities in the manifold, although this is often accompanied with a caveat to the effect that in the vicinity of a singularity the classical field equations may give way to quantum effects. In any case, since the field equations by themselves undeniably permit solutions containing singularities, we must either impose some external constraint on the class of realistic solutions to exclude those containing singularities, or else accept the existence of singularities. Each of these choices has implications for the potential viability of alternative interpretations. In the first case we are permitted to restrict the range of solutions to be represented, which means we really only need to seek representations of specific theories, rather than of the entire meta-theory represented by the bare field equations. In the second case we need not rule out interpretations based on the existence of singularities, inextendible worldlines, or other forms of "bad behavior".

To illustrate how these considerations affect the viability of alternative interpretations, suppose we attempt to interpret general relativity in terms of a flat spacetime combined with a universal force field that distorts rulers and clocks in just such a way as to match the metrical relations of a curved manifold in accord with the field equations. It might be argued that such a flat-spacetime formulation of general relativity must fail at some point(s) to diffeomorphically map to the corresponding curved manifold if the latter possesses a non-trivial global topology. For example, the complete surface of a sphere cannot be mapped diffeomorphically to the plane. By means of stereographic projection from the North Pole of a sphere to a plane tangent to the South Pole we can establish a diffeomorphic mapping to the plane of every point on the sphere except the North Pole itself, which maps to a "point at infinity". (Explicitly, for a unit sphere centered at the origin, the point (x, y, z) with z < 1 projects to the point (X, Y) = (2x/(1−z), 2y/(1−z)) on the tangent plane, and the projected image recedes to infinity as z approaches 1.) This illustrates the fact that when mapping between two topologically distinct manifolds such as the plane and the surface of a sphere, there must be at least one point where the mapping is not well-behaved. However, this kind of objection fails to rule out physically viable alternatives to the curved spacetime interpretation (assuming any viable interpretation exists), and for several reasons. First, we may question whether the mapping between the curved spacetime and the alternative manifold needs to be everywhere diffeomorphic.
Second, even if we accede to this requirement, it's important to remember that the global topology of a manifold is sensitive to pointwise excisions. For example, although it is not possible
to diffeomorphically map the complete sphere to the plane, it is possible to map the punctured sphere, i.e., the sphere minus one point (such as the North Pole in the sterographic projection scheme). We can analytically continue the mapping to include this point by simply adding a "point at infinity" to the plane - without giving the extended plane intrinsic curvature. Of course, this interpretation does entail a singularity at one point, where the universal field must be regarded as infinitely strong, but if we regard the potential for physical singularities as disqualifying, then as noted above we have no choice but to allow the imposition of some external principles to restrict the class of solutions to global manifolds that are everywhere "well-behaved". If we also disallow this, then as discussed above there does not exist any viable interpretation of general relativity. Once we have allowed this, we can obviously posit a principle to the effect that only global manifolds which can be diffeomorphically mapped to a flat spacetime are physically permissible. Such a principle is no more in conflict with the field equations than are any of the wellknown "energy conditions", the exclusion of closed timelike loops, and so on. Believers in one uniquely determined interpretation may also point to individual black holes, whose metrical structure of trapped surfaces cannot possibly be mapped to flat spacetime without introducing physical singularities. This is certainly true, but according to theorems of Penrose and Hawking it is precisely the circumstance of a trapped surface that commits the curved-spacetime formulation itself to a physical singularity. In view of this, we are hardly justified in disqualifying alternative formulations that entail physical singularities in exactly the same circumstances. Another common objection to flat interpretations is that even for a topologically flat manifold like the surface of a torus it is impossible to achieve the double periodicity of the closed torriodal surface, but this objection can also be countered, simply by positing a periodic flat universe. Admittedly this commits us to distant correlations, but such things cannot be ruled out a priori (and in fact distant correlations do seem to be a characteristic of the universe from the standpoint of quantum mechanics, as discussed in Section 9). More generally, as Poincare famously summarized it, we can never observe our geometry G in a theory-free sense. Every observation we make relies on some prior conception of physical laws P which specify how physical objects behave with respect to G. Thus the universe we observe is not G, but rather U = G + P, and for any given G we can vary P to give the observed U. Needless to say, this is just a simplified schematic of the full argument, but the basic idea is that it's simply not within the power of our observations to force one particular geometry upon us (nor even one particular topology), as the only possible way in which we could organize our thoughts and perceptions of the world. We recall Poincare's famous conventionalist dictum "No geometry is more correct than any other - only more convenient". Those who claim to "prove" that only one particular model can be used to represent our experience would do well to remember John Bell's famous remark that the only thing "proved" by such proofs is lack of imagination. The interpretation of general relativity as a field theory in a flat background spacetime
has a long history. This approach was explored by Feynman, Deser, Weinberg, and others at various times, partly to see if it would be possible to quantize the gravitational field in terms of a spin-2 particle, following the same general approach that was successful in quantizing other field theories. Indeed, Weinberg's excellent "Gravitation and Cosmology" (1972) contained a provocative paragraph entitled "The Geometric Analogy", in which he said

Riemann introduced the curvature tensor Rµναβ to generalize the [geometrical] concept of curvature to three or more dimensions. It is therefore not surprising that Einstein and his successors have regarded the effects of a gravitational field as producing a change in the geometry of space and time. At one time it was even hoped that the rest of physics could be brought into a geometric formulation, but this hope has met with disappointment, and the geometric interpretation of the theory of gravitation has dwindled to a mere analogy, which lingers in our language in terms like "metric", "affine connection", and "curvature", but is not otherwise very useful. The important thing is to be able to make predictions about the images on the astronomer's photographic plates, frequencies of spectral lines, and so on, and it simply doesn't matter whether we ascribe these predictions to the physical effect of a gravitational field on the motion of planets and photons or to a curvature of space and time.

The most questionable phrase here is the claim that, aside from providing some useful vocabulary, the geometric analogy "is not otherwise very useful". Most people who have studied general relativity have found the geometric analogy to be quite useful as an aid to understanding the theory, and Weinberg can hardly have failed to recognize this. I suspect that what he meant (in context) is that the geometric framework has not proven to be very useful in efforts to unify gravity with the rest of physics. The idea of "bringing the rest of physics into a geometric formulation" refers to attempts to account for the other forces of nature (electromagnetism, strong, and weak) in purely geometrical terms as attributes of the spacetime manifold, as Einstein did for gravity. In other words, eliminate the concept of "force" entirely, and show that all motion is geodesic in some suitably defined spacetime manifold. This is what is traditionally called a "unified field theory", and led to Weyl's efforts in the 20's, and the Kaluza-Klein theories, and Einstein's anti-symmetric theories, and so on. As Weinberg said, those hopes have (so far) met with disappointment.

Of course, in another sense, one could say that all of physics has been subsumed by the geometric point of view. We can obviously describe baseball, music, thermodynamics, etc., in geometrical terms, but that isn't the kind of geometrizing that is being discussed here. Weinberg was referring to attempts to make the space-time manifold itself account for all the "forces" of nature, as Einstein had made it account for gravity. Quantum field theory works on a background of space-time, but posits other ingredients on top of that to represent the fields. Obviously we're free to construct a geometrical picture in our minds of any gauge theory, just as we can form a geometrical picture in any arbitrary kind of "space", e.g., the phase space of a system, but this is nothing like what Einstein, Weyl, Kaluza, Weinberg, etc. were talking about. The original (and perhaps naive) hope was to eliminate all other fields besides the metric field of the spacetime manifold itself, to reduce physics to this one primitive entity (and its metric). It's clear that (1) physics has not been geometrized in the sense that Weinberg was talking about, viz., with the spacetime metric being the only ontological entity, and (2) in point of fact, some significant progress toward the unification of the other "forces" of nature has indeed been
made by people (such as Weinberg himself) who did so without invoking the geometric analogy.

Many scholars have expressed similar views to those of Poincare and Weinberg regarding the essential conventionality of geometry. In considering the question "Is Spacetime Curved?" Ian Roxburgh described the curved and flat interpretations of general relativity, and concluded that "the answer is yes or no depending on the whim of the answerer. It is therefore a question without empirical content, and has no place in physical inquiry." Thus he agreed with Poincare that our choice of geometry is ultimately a matter of convenience. Even if we believe that general relativity is perfectly valid in all regimes (which most people doubt), it's still possible to place a non-geometric interpretation on the "photographic plates and spectral lines" if we choose. The degree of "inconvenience" is not very great in the weak-field limit, but becomes more extreme if we're thinking of crossing event horizons or circumnavigating the universe. Still, we can always put a non-geometrical interpretation onto things if we're determined to do so. (Ironically, the most famous proponent of the belief that the geometrical view is absolutely essential, indeed a sine qua non of rational thought, was Kant, and yet the geometry he espoused so confidently was the non-curved geometry of Euclidean space.)

Even Kip Thorne, who along with Misner and Wheeler wrote the classic text Gravitation espousing the geometric viewpoint, admits that he was once guilty of curvature chauvinism. In his popular book "Black Holes and Time Warps" he writes

Is spacetime really curved? Isn't it conceivable that spacetime is actually flat, but the clocks and rulers with which we measure it... are actually rubbery? Wouldn't... distortions of our clocks and rulers make truly flat spacetime appear to be curved? Yes.

Thorne goes on to tell how, in the early 1970's, some people proposed a membrane paradigm for conceptualizing black holes. He says

When I, as an old hand at relativity theory, heard this story, I thought it ludicrous. General relativity insists that, if one falls into a black hole, one will encounter nothing at the horizon except spacetime curvature. One will see no membrane and no charged particles... the membrane theory can have no basis in reality. It is pure fiction. The cause of the field lines bending, I was sure, is spacetime curvature, and nothing else... I was wrong.

He goes on to say that the laws of black hole physics, written in accord with the membrane interpretation, are completely equivalent to the laws of the curved spacetime interpretation (provided we restrict ourselves to the exterior of black holes), but they are each heuristically useful in different circumstances. In fact, after he got past thinking it was ludicrous, Thorne spent much of the 1980's exploring the membrane paradigm. He does, however, maintain that the curvature view is better suited to deal with interior solutions of black holes, but it's not clear how strong a recommendation this really is, considering that we don't really know (and aren't likely to learn) whether those interior solutions actually correspond to facts.

Feynman's lectures on gravitation, written in the early 1960's, present a field-theoretic approach to gravity, while also recognizing the viability of Einstein's geometric
interpretation. Feynman described the thought process by which someone might arrive at a theory of gravity mediated by a spin-two particle in flat spacetime, analogous to the quantum field theories of the other forces of nature, and then noted that the resulting theory possesses a geometrical interpretation.

It is one of the peculiar aspects of the theory of gravitation that it has both a field interpretation and a geometrical interpretation… the fact is that a spin-two field has this geometrical representation; this is not something readily explainable – it is just marvelous. The geometric interpretation is not really necessary or essential to physics. It might be that the whole coincidence might be understood as representing some kind of gauge invariance. It might be that the relationships between these two points of view about gravity might be transparent after we discuss a third point of view, which has to do with the general properties of field theories under transformations…

He goes on to discuss the general notion of gauge invariance, and concludes that "gravity is that field which corresponds to a gauge invariance with respect to displacement transformations".

One potential source of confusion when discussing this issue is the fact that the local null structure of Minkowski spacetime makes it locally impossible to smoothly mimic the effects of curved spacetime by means of a universal force. The problem is that Minkowski spacetime is already committed to the geometrical interpretation, because it identifies the paths of light with null geodesics of the manifold. Putting this together with some form of the equivalence principle obviously tends to suggest the curvature interpretation. However, this does not rule out other interpretations, because there are other possible interpretations of special relativity - notably Lorentz's theory - that don't identify the paths of light with null geodesics. It's worth remembering that special relativity itself was originally regarded as simply an alternate interpretation of Lorentz's theory, which was based on a Galilean spacetime, with distortions in both rulers and clocks due to motion. These two theories are experimentally indistinguishable - at least up to the implied singularity of the null intervals. In the context of Galilean spacetime we could postulate gravitational fields affecting the paths of photons, the rates of physical clocks, and so on. Of course, in this way we arrive at a theory that looks exactly like curved spacetime, but we interpret the elements of our experience differently. Since (in this interpretation) we believe light rays don't follow null geodesic paths (and in fact we don't even recognize the existence of null geodesics) in the "true" manifold under the influence of gravity, we aren't committed to the idea that the paths of light delineate the structure of the manifold. Thus we'll agree with the conventional interpretation about the structure of light cones, but not about why light cones have that structure.

Of course, at some point any flat manifold interpretation will encounter difficulties in continuing its worldlines in the presence of certain postulated structures, such as black holes. However, as discussed above, the curvature interpretation is not free of difficulties in these circumstances either, because if there exists a trapped surface then there also exist non-extendable timelike or null geodesics for the curvature interpretation. So, the (arguably) problematical conditions for a "flat space" interpretation are identical to the problematical conditions for the curvature interpretation. In other words, if we posit the existence of trapped surfaces, then it's disingenuous for us to impugn the robustness of flat space interpretations in view of the fact that these same circumstances commit the
curvature interpretation to equally disquieting singularities. It may or may not be the case that the curvature interpretation has a longer reach, in the sense that it's formally extendable inside the Schwarzschild radius, but, as noted above, the physicality of those interior solutions is not (and probably never will be) subject to verification, and they are theoretically controversial even within the curvature tradition itself. Also, the simplistic arguments proposed in introductory texts are easily seen to be merely arguments for the viability of the curvature interpretation, even though they are often mis-labeled as arguments for the necessity of it.

There's no doubt that the evident universality of local Lorentz covariance, combined with the equivalence principle, makes the curvature interpretation eminently viable, and it's probably the "strongest" interpretation of general relativity in the sense of being exposed most widely to falsification in principle, just as special relativity is stronger than Lorentz's ether theory. The curvature interpretation has certainly been a tremendous heuristic aid (maybe even indispensable) to the development of the theory, but the fact remains that it isn't the only possible interpretation. In fact, many (perhaps most) theoretical physicists today consider it likely that general relativity is really just an approximate consequence of some underlying structure, similar to how continuum fluid mechanics emerges from the behavior of huge numbers of elementary particles. As noted earlier, much of the development of particle physics and more recently string theory has been carried out in the context of rather naive-looking flat backgrounds. Maybe Kant will be vindicated after all, and it will be shown that humans really aren't capable of conceiving of the fundamental world on anything other than a flat geometrical background. If so, it may tell us more about ourselves than about the world.

Another potential source of confusion is the tacit assumption on the part of some people that the topology of our experiences is unambiguous, and that this in turn imposes definite constraints on the geometry via the Gauss-Bonnet theorem. Recall that for any two-dimensional manifold M the Euler characteristic is a topological invariant defined as
χ(M) = V − E + F
where V, E, and F denote the number of vertices, edges, and faces respectively of any arbitrary triangulation of the entire surface. Extending the work that Gauss had done on the triangular excess of curved surfaces, Bonnet proved in 1858 the beautiful theorem that the integral of the Gaussian curvature K over the entire area of the manifold is proportional to the Euler characteristic, i.e.,
∫_M K dA = 2π χ(M)
More generally, for any manifold M of dimension n the invariant Euler characteristic is
χ(M) = ν0 − ν1 + ν2 − ... + (−1)^n νn
where νk is the number of k-simplexes of an arbitrary "triangulation" of the manifold. Also, we can let Kn denote the analog of the Gaussian curvature K for an n-dimensional manifold, noting that for hypersurfaces this is just the product of the n principal extrinsic curvatures, although like K it has a purely intrinsic significance for arbitrary embeddings. The generalized Gauss-Bonnet theorem (which applies to even-dimensional manifolds) is then
∫_M Kn dV = (1/2) V(S^n) χ(M)
where V(S^n) is the "volume" of a unit n-sphere. Thus if we can establish that the topology of the overall spacetime manifold has a non-zero Euler characteristic, it will follow that the manifold must have non-zero metrical curvature at some point. Of course, the converse is not true, i.e., the existence of non-zero metrical curvature at one or more points of the manifold does not imply non-zero Euler characteristic. The two-dimensional surface of a torus with the usual embedding in R^3 not only has intrinsic curvature but is topologically distinct from R^2, and yet (as discussed in Section 7.5) it can be mapped diffeomorphically and globally to an everywhere-flat manifold embedded in R^4. This illustrates the obvious fact that while topological invariants impose restrictions on the geometry, they don't uniquely determine the geometry. Nevertheless, if a non-zero Euler characteristic is stipulated, it is true that any manifold diffeomorphic to this one must have non-zero curvature at some point.

However, there are two problems with this argument. First, we need not be limited to diffeomorphic mappings from the curved spacetime model, especially since even the curvature interpretation contains singularities and physical infinities in some circumstances. Second, the topology is not stipulated. The topology of the universe is a global property which (like the geometry) can only be indirectly inferred from local experiences, and the inference is unavoidably ambiguous. Thus the topology itself is subject to re-interpretation, and this has always been recognized as part-and-parcel of any major shift in geometrical interpretation. The examples that Poincare and others talked about often involved radical re-interpretations of both the geometry and the topology, such as saying that instead of a cylindrical dimension we may imagine an unbounded but periodic dimension, i.e., identical copies placed side by side. Examples like this aren't intended to be realistic (necessarily), but to convey just how much of what we commonly regard as raw empirical fact is really interpretative. We can always save the appearances of any particular apparent topology with a completely different topology, depending on how we choose to identify or distinguish the points along various paths. The usual example of this is a cylindrical universe mapped to an infinite periodic universe. Therefore, we cannot use topological arguments to prove anything about the geometry. Indeed these considerations merely extend the degrees of freedom in Poincare's conventionalist formula, from U = G + P to U = (G + T) + P, where T represents topology. Obviously the metrical and topological models impose consistency conditions on each other, but the two of them combined do not constrain U any more than G alone, as long as the physical laws P remain free.
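As a concrete check on the two-dimensional formulas above, here is a small sketch (in Python; the combinatorial data for the regular solids are standard) that computes χ = V − E + F for several closed polyhedral surfaces and verifies the discrete counterpart of the Gauss-Bonnet integral, namely that the total angular deficit concentrated at the vertices equals 2πχ. (This same angle-deficit bookkeeping reappears in the dodecahedron example discussed below.)

    import math

    # (V, E, F, faces meeting at each vertex, interior angle of each face)
    solids = {
        "tetrahedron":  (4, 6, 4, 3, math.pi / 3),        # equilateral triangles
        "octahedron":   (6, 12, 8, 4, math.pi / 3),
        "icosahedron":  (12, 30, 20, 5, math.pi / 3),
        "dodecahedron": (20, 30, 12, 3, 3 * math.pi / 5), # pentagons, 108 degrees
    }

    for name, (V, E, F, m, angle) in solids.items():
        chi = V - E + F                             # Euler characteristic
        deficit = V * (2 * math.pi - m * angle)     # total angular deficit
        print(name, chi, round(deficit / (2 * math.pi), 6))

    # Every case gives chi = 2 and total deficit = 4*pi = 2*pi*chi, the
    # discrete analog of the Gauss-Bonnet integral for any surface that is
    # topologically a sphere.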

Of course, there may be valid reasons for preferring not to avail ourselves of any of the physical assumptions (such as a "universal force", let alone multiple copies of regions, etc.) that might be necessary to map general relativity to a flat manifold in various (extreme) circumstances, such as in the presence of trapped surfaces or other "pathological" topologies, but these are questions of convenience and utility, not of feasibility. Moreover, as noted previously, the curvature interpretation itself entails inextendible worldlines as soon as we posit a trapped surface, so topological anomalies hardly give an unambiguous recommendation to the curvature interpretation. The point is that we can always postulate a set of physical laws that will make our observations consistent with just about any geometry we choose (even a single monadal point!), because we never observe geometry directly. We only observe physical processes and interactions. Geometry is inherently an interpretative aspect of our understanding. It may be that one particular kind of geometrical structure is unambiguously the best (most economical, most heuristically robust, most intuitively appealing, etc.), and any alternative geometry may require very labored and seemingly ad hoc "laws of physics" to make it compatible with our observations, but this simply confirms Poincare's dictum that no geometry is more true than any other - only more convenient.

It may seem as if the conventionality of geometry is just an academic fact with no real applicability or significance, because all the examples of alternative interpretations that we've cited have been highly trivial. For a more interesting example, consider a mapping (by radial projection) from an ordinary 2-sphere to a circumscribed polyhedron, say a dodecahedron. With the exception of the 20 vertices, where all the "curvature" is discretely concentrated, the surface of the dodecahedron is perfectly flat, even along the edges, as shown by the fact that we can "flatten out" two adjacent pentagonal faces on a plane surface without twisting or stretching the surfaces at all. We can also flatten out a third pentagonal face that joins the other two at a given vertex, but of course (in the usual interpretation) we can't fit in a fourth pentagon at that vertex, nor do three quite "fill up" the angular range around a vertex in the plane (three pentagons contribute only 3 × 108 = 324 of the 360 degrees around a point, leaving an angular deficit of 36 degrees at each vertex). At this stage we would conventionally pull the edges of the three pentagons together so that the faces are no longer coplanar, but we could also go on adjoining pentagonal surfaces around this vertex, edge to edge, just like a multi-valued "Riemann surface" winding around a pole in the complex plane. As we march around the vertex, it's as if we are walking up a spiral staircase, except that all the surfaces are lying perfectly flat. This same "spiral staircase" is repeated at each vertex of the solid. Naturally we can replace the dodecahedron with a polyhedron having many more vertices, but still consisting of nothing but flat surfaces, with all the "curvature" distributed discretely at a huge number of vertices, each of which is a "pole" of an infinite spiral staircase of flat surfaces. This structure is somewhat analogous to a "no-collapse" interpretation of quantum mechanics, and might be called a "no-curvature" interpretation of general relativity. At each vertex (cf. measurement) we "branch" into on-going flatness across the edge, never actually "collapsing" the faces meeting at a vertex into a curved structure.
In essence the manifold has zero Euler characteristic, but it exhibits a nonvanishing Euler characteristic modulo the faces of the polyhedron. Interestingly, the term
"branch" is used in multi-valued Riemann surfaces just as it's used in some descriptions of the "no-collapse" interpretation of quantum mechanics. Also, notice that the non-linear aspects of both theories are (arguably) excised by this maneuver, leaving us "only" to explain how the non-linear appearances emerge from this aggregate, i.e., how the different moduli are inter-related. To keep track of a particle we would need its entire history of "winding numbers" for each vertex of the entire global manifold, in the order that it has encountered them (because it's not commutative), as well as it's nominal location modulo the faces of the polyhedron. In this model the full true topology of the universe is very different from the apparent topology modulo the polyhedral structure, and curvature is non-existent on the individual branches, because every time we circle a non-flat point we simply branch to another level (just as in some of the no-collapse interpretations of quantum mechanics the state sprouts a new branch, rather than collapsing, each time an observation is made). Each time a particle crosses an edge between two vertices it's set of winding numbers is updated, and we end up with a combinatorial approach, based on a finite number of discrete poles surrounded by infinitely proliferating (and everywhere-flat) surfaces. We can also arrange for the spiral staircases to close back on themselves after a suitable number of windings, while maintaining a vanishing Euler characteristic. For a less outlandish example of a non-trivial alternate interpretation of general relativity, consider the "null surface" interpretation. According to this approach we consider only the null surfaces of the traditional spacetime manifold. In other words, the only intervals under consideration are those such that gµν dxµ dxν = 0. Traditional timelike paths are represented in this interpretation by zigzag sequences of lightlike paths, which can be made to approach arbitrarily closely to the classical timelike paths. The null condition implies that there are really only three degrees of freedom for motion from any given point, because given any three of the increments dx0, dx1, dx2, and dx3, the corresponding increment of the fourth automatically follows (up to sign). The relation between this interpretation and the conventional one is quite similar to the relation between special relativity and Lorentz's ether theory. In both cases we can use essentially the same equations, but whereas the conventional interpretation attributes ontological status to the absolute intervals dt, the null interpretation asserts that those absolute intervals are ultimately superfluous conventionalizations (like Lorentz's ether), and encourages us to dispense with those elements and focus on the topology of the null surfaces themselves. 8.1 Kepler, Napier, and the Third Law There is special providence in the fall of a sparrow. Shakespeare By the year 1605 Johannes Kepler, working with the relativistic/inertial view of the solar system suggested by Copernicus, had already discerned two important mathematical regularities in the orbital motions of the planets:

I. Planets move in ellipses with the Sun at one focus.

II. The radius vector describes equal areas in equal times.

This shows the crucial role that interpretations and models sometimes play in the progress of science, because it's obvious that these profoundly important observations could never even have been formulated in terms of the Ptolemaic earth-centered model. Oddly enough, Kepler arrived at these conclusions in reverse order, i.e., he first determined that the radius vector of a planet's "oval shaped" path sweeps out equal areas in equal times, and only subsequently determined that the "ovals" were actually ellipses. It's often been remarked that Kepler's ability to identify this precise shape from its analytic properties was partly due to the careful study of conic sections by the ancient Greeks, particularly Apollonius of Perga, even though this study was conducted before there was even any concept of planetary orbits. Kepler's first law is often cited as an example of how purely mathematical ideas (e.g., the geometrical properties of conic sections) can sometimes find significant applications in the descriptions of physical phenomena.

After Kepler had painstakingly extracted the above two "laws" of planetary motion (first published in 1609) from the observational data of Tycho Brahe, there followed a period of more than twelve years during which he exercised his ample imagination searching for any further patterns or regularities in the data. He seems to have been motivated by the idea that the orbits of the planets must satisfy a common set of simple mathematical relations, analogous to the mathematical relations which the Pythagoreans had discovered between harmonious musical tones. However, despite all his ingenious efforts during these years, he was unable to discern any significant new pattern beyond the two empirical laws which he had found in 1605. Then, as Kepler later recalled, on the 8th of March in the year 1618, something marvelous "appeared in my head". He suddenly realized that

III. The proportion between the periodic times of any two planets is precisely one and a half times the proportion of the mean distances.

In the form of a diagram, his insight looks like this:
[Figure: log-log plot of the planets' orbital periods versus their mean distances, with the points falling on a straight line of slope 3/2.]
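The fit that Kepler's diagram depicts is easy to reproduce with modern orbital data. Here is a minimal sketch (in Python; the distances and periods are modern values for the six planets Kepler knew, in astronomical units and years) that estimates the slope of the log-log line by least squares:

    import math

    # Modern mean distances a (astronomical units) and periods T (years)
    # for the six planets known to Kepler.
    planets = {
        "Mercury": (0.387,  0.241),
        "Venus":   (0.723,  0.615),
        "Earth":   (1.000,  1.000),
        "Mars":    (1.524,  1.881),
        "Jupiter": (5.203, 11.862),
        "Saturn":  (9.537, 29.457),
    }

    xs = [math.log10(a) for a, T in planets.values()]
    ys = [math.log10(T) for a, T in planets.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    print(round(slope, 4))

The fitted slope comes out at almost exactly 1.5, Kepler's "one and a half times the proportion".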
At first it may seem surprising that it took a mathematically insightful man like Kepler over twelve years of intensive study to notice this simple linear relationship between the logarithms of the orbital periods and radii. In modern data analysis the log-log plot is a standard format for analyzing physical data. However, we should remember that logarithmic scales had not yet been invented in 1605. A more interesting question is why, after twelve years of struggle, this way of viewing the data suddenly "appeared in his head" early in 1618. (By the way, Kepler made some errors in the calculations in March, and decided the data didn't fit, but two months later, on May 15 the idea "came into his head" again, and this time he got the computations right.) Is it just coincidental that John Napier's "Mirifici Logarithmorum Canonis Descriptio" (published in 1614) was first seen by Kepler towards the end of the year 1616? We know that Kepler was immediately enthusiastic about logarithms, which is not surprising, considering the masses of computation involved in preparing the Rudolphine Tables. Indeed, he even wrote a book of his own on the subject in 1621. It's also interesting that Kepler initially described his "Third Law" in terms of a 1.5 ratio of proportions, exactly as it would appear in a log-log plot, rather than in the more familiar terms of squared periods and cubed distances. It seems as if a purely mathematical invention, namely logarithms, whose intent was simply to ease the burden of manual arithmetical computations, may have led directly to the discovery/formulation of an important physical law, i.e., Kepler's third law of planetary motion. (Ironically, Kepler's academic mentor, Michael Maestlin, chided him - perhaps in jest? - for even taking an interest in logarithms, remarking that "it is not seemly for a professor of mathematics to be childishly pleased about any shortening of the calculations".)

By the 18th of May, 1618, Kepler had fully grasped the logarithmic pattern in the planetary orbits:

Now, because 18 months ago the first dawn, three months ago the broad daylight, but a very few days ago the full Sun of a most highly remarkable spectacle has risen, nothing holds me back.

It's interesting to compare this with Einstein's famous comment about "...years of anxious searching in the dark, with their intense longing, the final emergence into the light--only those who have experienced it can understand it". Kepler announced his Third Law in Harmonices Mundi, published in 1619, and also included it in his "Ephemerides" of 1620. The latter was actually dedicated to Napier, who had died in 1617. The cover illustration showed one of Galileo's telescopes, the figure of an elliptical orbit, and an allegorical female (Nature?) crowned with a wreath consisting of the Naperian logarithm of half the radius of a circle. It has usually been supposed that this work was dedicated to Napier in gratitude for the "shortening of the calculations", but Kepler obviously recognized that it went deeper than this, i.e., that the Third Law is purely a logarithmic harmony. In a sense, logarithms played a role in Kepler's formulation of the Third Law analogous to the role of Apollonius' conics in his discovery of the First Law, and to the role that tensor analysis and Riemannian geometry played in Einstein's development of the field equations of general relativity. In each of these cases we could ask whether the mathematical structure provided the tool with which the scientist was able to describe some particular phenomenon, or whether the mathematical structure effectively selected an aspect of the phenomena for the scientist to discern.

Just as we can trace Kepler's Third Law of planetary motion back to Napier's invention of logarithms, we can also trace Napier's invention back to even earlier insights. It's no accident that logarithms have applications in the description of Nature. Indeed in his introduction to the tables, Napier wrote

A logarithmic table is a small table by the use of which we can obtain a knowledge of all geometrical dimensions and motions in space...

The reference to motions in space is very appropriate, because Napier originally conceived of his "artificial numbers" (later renamed logarithms, meaning number of the ratio) in purely kinematical terms. In fact, his idea can be expressed in a form that Zeno of Elea would have immediately recognized. Suppose two runners leave the starting gate, travelling at the same speed, and one of them maintains that speed, whereas the speed of the other drops in proportion to his distance from the finish line. The closer the second runner gets to the finish line, the slower he runs. Thus, although he is always moving forward, the second runner never reaches the finish line. As discussed in Section 3.7, this is exactly the kind of scenario that Zeno exploited to illustrate paradoxes of motion. Here, 2000 years later, we find Napier making very different use of it, creating a continuous mapping from the real numbers to his "artificial numbers". With an appropriate choice of units we can express the position x of the first runner as a function of time by x(t) = t, and the position X of the second runner is defined by the differential equation dX/dt = 1 − X, where the position "1" represents the finish line. The solution of this equation is X(t) = 1 − e^(−t), where e^x is the function that equals its own derivative. Then Napier defined x(t) as the "logarithm" of 1 − X(t), which is to say, he defined t as the "logarithm" of e^(−t). Of course, the definition of logarithm was subsequently revised so

that we now define t as the logarithm of e^t, the latter being the function that equals its own derivative. The logarithm was one of many examples throughout history of ideas that were "in the air" at a certain time. It had been known since antiquity that the exponents of numbers in a geometric sequence are additive when terms are multiplied together, i.e., we have a^n a^m = a^(n+m). In fact, there are ancient Babylonian tablets containing sequences of powers and problems involving the determination of the exponents of given numbers. In the 1540's Stifel's "Arithmetica integra" included tables of the successive powers of numbers, which was very suggestive for Napier and others searching for ways to reduce the labor involved in precise manual computations. In the 1580's Viete derived several trigonometric formulas such as
    cos(x) cos(y) = [ cos(x+y) + cos(x-y) ] / 2
If we have a table of cosine values this formula enables us to perform multiplication simply by means of addition. For example, to find the product of 0.7831 and 0.9348 we can set cos(x) = 0.7831 and cos(y) = 0.9348 and then look up the angles x,y with these cosines in the table. We find x = 0.67116 and y = 0.36310, from which we have the sum x+y = 1.03426 and the difference x-y = 0.30806. The cosines of the sum and difference can then be looked up in the table, giving cos(x+y) = 0.51116 and cos(x-y) = 0.95292. Half the sum of these two numbers equals the product 0.73204 of the original two numbers. This technique was called prosthaphaeresis (the Greek word for addition and subtraction), and was quickly adopted by scientists such as the Dane Tycho Brahe for performing astronomical calculations. (A short numerical sketch of this technique appears at the end of this section.) Of course, today we recognize that the above formula is just a disguised version of the simple exponent addition rule, noting that cos(x) = (e^(ix) + e^(-ix))/2. At about this same time (1594), John Napier was inventing his logarithms, whose purpose was also to reduce multiplication and division to simple addition and subtraction by means of a suitable transformation. However, Napier might never have set aside his anti-Catholic polemics to work on producing his table of logarithms had it not been for an off-hand comment made by Dr. John Craig, who was the physician to James VI of Scotland (later James I of England and Ireland). In 1590 Craig accompanied James and his entourage bound for Norway to meet his prospective bride Anne, who was supposed to have journeyed from Denmark to Scotland the previous year, but had been diverted by a terrible storm and ended up in Norway. (The storm was so severe that several supposed witches were held responsible and were burned.) James' party, too, encountered severe weather, but eventually he met Anne in Oslo and the two were married. On the journey home the royal party visited Tycho Brahe's observatory on the island of Hven, and were entertained by the famous astronomer, well known as the discoverer of the "new star" in the constellation Cassiopeia. During this stay at Brahe's lavish Uraniborg ("castle in the sky") Dr. Craig observed the technique of prosthaphaeresis that Brahe and his assistants used to ease the burden of calculation. When he returned to Scotland, Craig

mentioned this to his friend the Baron of Merchiston (aka John Napier), and this seems to have motivated Napier to devote himself to the development of his logarithms and the generation of his tables, on which he spent the remaining 25 years of his life. During this time Napier occasionally sent preliminary results to Brahe for comment. Several other people had similar ideas about exploiting the exponential mapping for purposes of computation. Indeed, Kepler's friend and assistant Jost Burgi evidently devised a set of "progress tables" (basically anti-logarithm tables) around 1600, based on the indices of geometric progressions, and made some use of these in his calculations. However, he didn't fully perceive the potential of this correspondence, and didn't develop it very far.

Incidentally, if the story of a group of storm-tossed nobles finding themselves on a mysterious island ruled over by a magician sounds familiar, it may be because of Shakespeare's "The Tempest", written in 1610. This was Shakespeare's last complete play and, along with Love's Labor's Lost, one of only two with an original plot, i.e., these are the only two of his plays whose plots are not known to have been based on pre-existing works. It is commonly believed that the plot of "The Tempest" was inspired by reports of a group of colonists bound for Virginia who were shipwrecked in Bermuda in 1609. However, it's also possible that Shakespeare had in mind the story of James VI (who by 1610 was James I, King of England) and his marriage expedition, arriving after a series of violent storms on the island of the Danish astronomer and astrologer Tycho Brahe and his castle in the sky (which, we may recall, included a menagerie of exotic animals). We know "The Tempest" was produced at the royal court in 1611 and again in 1612 as part of the festivities preceding the marriage of the King's daughter, and it certainly seems likely that James and Anne would associate any story involving a tempest with their memories of the great storms of 1589 and 1590 that delayed Anne's voyage to Scotland and prompted James' journey to meet her. The providential aspects of Shakespeare's "The Tempest" and its parallels with their own experiences could hardly have been lost on them. Shakespeare's choice of the peculiar names Rosencrantz and Guildenstern for two minor characters in "Hamlet, Prince of Denmark" gives further support to the idea that he was familiar with Tycho, since those were the names of two of Tycho's ancestors appearing on his coat of arms. There is also evidence that Shakespeare was personally close to the Digges family (e.g., Leonard Digges contributed a sonnet to the first Folio), and Thomas Digges was an English astronomer and mathematician who, along with John Dee, was well acquainted with Tycho. Digges was an early supporter and interpreter of Copernicus' relativistic ideas, and was apparently the first to suggest that our Sun was just an ordinary star in an infinite universe of stars. Considering all this, it is surely not too farfetched to suggest that Tycho may have been the model for Prospero, whose name, being composed of Providence and sparrow, is an example of Shakespeare's remarkable ability to weave a variety of ideas, influences, and connotations into the fabric of his plays, just as we can see in Kepler's three laws the synthesis of the heliocentric model of Copernicus, Apollonius' conics, and the logarithms of Napier.
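As promised above, here is a minimal Python sketch of the prosthaphaeresis technique; the exact acos and cos functions stand in for Brahe's printed tables, and the worked example from the text is reproduced.

    import math

    def prosthaphaeresis_product(a, b):
        """Multiply two numbers in (0, 1] using only cosine lookups and
        the identity cos(x)cos(y) = [cos(x+y) + cos(x-y)]/2."""
        x = math.acos(a)   # table lookup: the angle whose cosine is a
        y = math.acos(b)   # table lookup: the angle whose cosine is b
        return (math.cos(x + y) + math.cos(x - y)) / 2

    print(prosthaphaeresis_product(0.7831, 0.9348))   # ~0.73204
    print(0.7831 * 0.9348)                            # direct product, for comparison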

8.2 Newton's Cosmological Queries

    Isack received your letter and I perceived you letter from mee with your cloth but none to you your sisters present thai love to you with my motherly lov and prayers to god for you I your loving mother hanah wollstrup may the 6. 1665

Newton famously declared that it is not the business of science to make hypotheses. However, it's well to remember that this position was formulated in the midst of a bitter dispute with Robert Hooke, who had criticized Newton's writings on optics when they were first communicated to the Royal Society in the early 1670's. The essence of Newton's thesis was that white light is composed of a mixture of light of different elementary colors, ranging across the visible spectrum, which he had demonstrated by decomposing white light into its separate colors and then reassembling those components to produce white light again. However, in his description of the phenomena of color Newton originally included some remarks about his corpuscular conception of light (perhaps akin to the cogs and flywheels in terms of which James Maxwell was later to conceive of the phenomena of electromagnetism). Hooke interpreted the whole of Newton's optical work as an attempt to legitimize this corpuscular hypothesis, and countered with various objections. Newton quickly realized his mistake in attaching his theory of colors to any particular hypothesis on the fundamental nature of light, and immediately back-tracked, arguing that his intent had been only to describe the observable phenomena, without regard to any hypotheses as to the cause of the phenomena. Hooke (and others) continued to criticize Newton's theory of colors by arguing against the corpuscular hypothesis, causing Newton to respond more and more angrily that he was making no hypothesis, he was describing the way things are, and not claiming to explain why they are. This was a bitter lesson for Newton and, in addition to initiating a life-long feud with Hooke, went a long way toward shaping Newton's rhetoric about what science should be. I use the term "rhetoric" because it is to some extent a matter of semantics as to whether a descriptive theory entails a causative hypothesis. For example, when accused of invoking occult phenomena in gravity, Newton replied that the phenomena of gravity are not occult, although the causes may be. (See below.) Clearly the dispute with Hooke had caused Newton to paint himself into the "hypotheses non fingo" corner, and this somewhat accidentally became part of his legacy to science, which has ever after been much more descriptive and less explanatory than, say, Descartes would have wished. This

is particularly ironic in view of the fact that Newton personally entertained a great many bold hypotheses, including a number of semi-mystical hermetic explanations for all manner of things, not to mention his painstaking interpretations of biblical prophecies. Most of these he kept to himself, but when he finally got around to publishing his optical papers (after Hooke had died) he couldn't resist including a list of 31 "Queries" concerning the big cosmic issues that he had been too reticent to address publicly before. The true nature of these "queries" can immediately be gathered from the fact that every one of them is phrased in the form of a negative question, as in "Are not the Rays of Light very small bodies emitted from shining substances?" Each one is plainly a hypothesis phrased as a question. The first edition of The Opticks (1704) contained only 16 queries, but when the Latin edition was published in 1706 Newton was emboldened to add seven more, which ultimately became Queries 25 through 31 when, in the second English edition, he added Queries 17 through 24.

Of all these, one of the most intriguing is Query 28, which begins with the rhetorical question "Are not all Hypotheses erroneous in which Light is supposed to consist of Pression or Motion propagated through a fluid medium?" In this query Newton rejects the Cartesian idea of a material substance filling in and comprising the space between particles. Newton preferred an atomistic view, believing that all substances were comprised of hard impenetrable particles moving and interacting via innate forces in an empty space (as described further in Query 31). After listing several facts that make an aetheral medium inconsistent with observations, the discussion of Query 28 continues

    And for rejecting such a medium, we have the authority of those the oldest and most celebrated philosophers of ancient Greece and Phoenicia, who made a vacuum and atoms and the gravity of atoms the first principles of their philosophy, tacitly attributing gravity to some other cause than dense matter. Later philosophers banish the consideration of such a cause... feigning [instead] hypotheses for explaining all things mechanically [But] the main business of natural philosophy is to argue from phenomena without feigning hypotheses, and to deduce causes from effects, till we come to the very first cause, which certainly is not mechanical. And not only to unfold the mechanism of the world, but chiefly to resolve such questions as What is there in places empty of matter? and Whence is it that the sun and planets gravitate toward one another without dense matter between them? Whence is it that Nature doth nothing in vain? and Whence arises all that order and beauty which we see in the world? To what end are comets? and Whence is it that planets move all one and the same way in orbs concentrick, while comets move all manner of ways in orbs very excentrick? and What hinders the fixed stars from falling upon one another?

It's interesting to compare these comments of Newton with those of Socrates as recorded in Plato's Phaedo

    If then one wished to know the cause of each thing, why it comes to be or perishes or exists, one had to find what was the best way for it to be, or to be acted upon, or to act. I was ready to find out ... about the sun and the moon and the other heavenly bodies, about their relative speed, their turnings, and whatever else happened to them, how it is best that each should act or be acted upon. I never thought [we would need to] bring in any other cause for them than that it was best for them to be as they are. This wonderful hope was dashed as I went on reading, and saw that [men] mention as causes air and ether and water and many other strange things... It is what the majority appear to do, like people groping in the dark; they call it a cause, thus giving it a name which does not belong to it. That is why one man surrounds the earth with a vortex to make the heavens keep it in place, another makes the air support it like a wide lid. As for their capacity of being in the best place they could possibly be put, this they do not look for, nor do they believe it to have any divine force, but they believe that they will some time discover a stronger and more immortal Atlas to hold everything together...

Both men are suggesting that a hierarchy of mechanical causes cannot ultimately prove satisfactory, and that the first cause of things cannot be mechanistic in nature. Both suggest that the macroscopic mechanisms of the world are just manifestations of an underlying and irreducible principle of "order and beauty", indeed of a "divine force". But Newton wasn't content to leave it at this. After lengthy deliberations, and discussions with David Gregory, he decided to add the comment

    Is not Infinite Space the Sensorium of a Being incorporeal, living and intelligent, who sees the things themselves intimately, and thoroughly perceives them, and comprehends them wholly by their immediate presence to himself?

Samuel Johnson once recommended a proof-reading technique to a young writer, telling him that you should read over your work carefully, and whenever you come across a phrase or passage that seems particularly fine, strike it out. Newton's literal identification of Infinite Space with the Sensorium of God may have been a candidate for that treatment, but it went to press anyway. However, as soon as the edition was released, Newton suddenly got cold feet, and realized that he'd exposed himself to ridicule. He desperately tried to recall the book and, failing that, he personally rounded up all the copies he could find, cut out the offending passage with scissors, and pasted in a new version. Hence the official versions contain the gentler statement (reverting once again to the negative question!):

    And these things being rightly dispatch'd, does it not appear from phaenomena that there is a Being incorporeal, living, intelligent, omnipresent, who in infinite space, as it were in his Sensory, sees the things themselves intimately, and thoroughly perceives them, and comprehends them wholly by their immediate presence to himself: Of which things the images only carried through the organs of sense into our little sensoriums are there seen and beheld by that which in us

    perceives and thinks. And though every true step made in this philosophy brings us not immediately to the knowledge of the first cause, yet it brings us nearer to it...

Incidentally, despite Newton's efforts to prevent it, one of the un-repaired copies had already made its way out of the country, and was on its way to Leibniz, who predictably cited the original "Sensorium of God" comment as evidence that Newton "has little success with metaphysics". Newton's 29th Query (not a hypothesis, mind you) was: "Are not the rays of light very small bodies emitted from shining substances?" Considering that his mooting of this idea over thirty years earlier had precipitated a controversy that nearly led him to a nervous breakdown, one has to say that Newton was nothing if not tenacious. This query also demonstrates how little his basic ideas about the nature of light had changed over the course of his life. After listing numerous reasons for suspecting that the answer to this question was Yes, Newton proceeded in Query 30 to ask the pregnant question "Are not gross bodies and light convertible into one another?" Following Newton's rhetorical device, should not this be interpreted as a suggestion of equivalence between mass and energy? The final pages of The Opticks are devoted to Query 31, which begins

    Have not the small particles of bodies certain powers, virtues, or forces, by which they act at a distance, not only upon the rays of light for reflecting, refracting, and inflecting them, but also upon one another for producing a great part of the Phenomena of nature?

Newton goes on to speculate that the force of electricity operates on very small scales to hold the parts of chemicals together and govern their interactions, anticipating the modern theory of chemistry. Most of this Query is devoted to an extensive (20 pages!) enumeration of chemical phenomena that Newton wished to cite in support of this view. He then returns to the behavior of macroscopic objects, asserting that

    Nature will be very conformable to herself, and very simple, performing all the great motions of the heavenly bodies by the attraction of gravity which intercedes those bodies, and almost all the small ones of their particles by some other attractive and repelling powers which intercede the particles.

This is a very clear expression of Newton's belief that forces act between separate particles, i.e., at a distance. He continues

    The Vis inertiae is a passive Principle by which Bodies persist in their Motion or Rest, receive Motion in proportion to the Force impressing it, and resist as much as they are resisted. By this Principle alone there never could have been any Motion in the World. Some other Principle was necessary for putting Bodies into Motion; and now they are in Motion, some other Principle is necessary for

    conserving the motion.

In other words, Newton is arguing that the principle of inertia, by itself, cannot account for the motion we observe in the world, because inertia only tends to preserve existing states of motion, and only uniform motion in a straight line. Thus we must account for the initial states of motion (the initial conditions), the persistence of non-inertial motions, and for the on-going variations in the amount of motion that are observed. For this purpose Newton distinguishes between "passive" attributes of bodies, such as inertia, and "active" attributes of bodies, such as gravity, and he points out that, were it not for gravity, the planets would not remain in their orbits, etc, so it is necessary for bodies to possess active as well as passive attributes, because otherwise everything would soon be diffuse and cold. Thus he is not saying that the planets would simply come to a halt in the absence of active attributes, but rather that the constituents of any physical universe resembling ours (containing persistent non-inertial motion) must necessarily possess active as well as passive properties. Next, Newton argues that the "amount of motion" in the world is not constant, in two different respects. The first is rather interesting, because it makes very clear the fact that he regarded ontological motion as absolute. He considers two identical globes in empty space attached by a slender rod and revolving with angular speed ω about their combined center of mass, and he says the center of mass is moving with some velocity v (in the plane of revolution). If the radius from the center of mass to each globe is r, then the globes have a speed of ωr relative to the center. When the connecting rod is periodically oriented perpendicular to the velocity of the center, one of the globes has a speed equal to v + ωr and the other a speed equal to v - ωr, so the total "amount of motion" (i.e., the sum of the magnitudes of the momentums) is simply 2mv. However, when the rod is periodically aligned parallel to the velocity of the center, the globes each have a total speed of
    √(v² + (ωr)²)
, so the total "amount of motion" is
    2m √(v² + (ωr)²)
Thus, Newton argues, the total quantity of motion of the two globes fluctuates periodically between this value and 2mv. Obviously he is expressing the belief that the "amount of motion" has absolute significance. (He doesn't remark on the fact that the kinetic energy in this situation is conserved). The other way in which, Newton argues, the amount of motion is not conserved is in inelastic collisions, such as when two masses of clay collide and the bodies stick together. Of course, even in this case the momentum vector is conserved, but again the sum of the magnitudes of the individual momentums is reduced. Also, in this case, the kinetic energy is dissipated as heat. Interestingly, Newton observes that, aside from the periodic fluctuations such as with the revolving globes, the net secular change in total "amount of motion" is always negative.
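The fluctuation Newton describes is easy to exhibit numerically. Here is a minimal Python sketch (the values m = 1, v = 1, ω = 2, r = 1/4 are merely illustrative, chosen so that ωr < v) that computes the sum of the momentum magnitudes for several orientations of the connecting rod.

    import numpy as np

    # Two identical globes revolving about their moving center of mass.
    m, v, omega, r = 1.0, 1.0, 2.0, 0.25    # illustrative values, omega*r < v

    for phi in np.linspace(0.0, np.pi, 7):   # orientation angle of the rod
        # each globe's velocity = velocity of center + tangential velocity
        tangential = omega*r*np.array([-np.sin(phi), np.cos(phi)])
        v1 = np.array([v, 0.0]) + tangential
        v2 = np.array([v, 0.0]) - tangential
        amount = m*np.linalg.norm(v1) + m*np.linalg.norm(v2)
        print(f"phi = {phi:.3f}   amount of motion = {amount:.4f}")

    # The printed values fluctuate between 2*m*v (rod perpendicular to the
    # center's velocity) and 2*m*sqrt(v**2 + (omega*r)**2) (rod parallel).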

    By reason of the tenacity of fluids, the attrition of their parts... motion is much more apt to be lost than got, and is always upon the Decay.

This can easily be seen as an early statement of statistical thermodynamics and the law of entropy. In any case, from this tendency for motion to decay, Newton concludes that eventually the Universe must "run down", and "all things would grow cold and freeze, and become inactive masses". Newton also mentions one further sense in which (he believed) passive attributes alone were insufficient to account for the persistence of well-ordered motion that we observe.

    ...blind fate could never make all the planets move one and the same way in orbs concentrick, some inconsiderable irregularities excepted, which may have risen from the action of comets and planets upon one another, and which will be apt to increase, till this system wants a reformation.

In addition to whatever sense of design and/or purpose we may discern in the initial conditions of the solar system, Newton also seems to be hinting at the idea that, in the long run, any initial irregularities, however "inconsiderable" they may be, will increase until the system wants reformation. In recent years we've gained a better appreciation of the fact that Newton's laws, though strictly deterministic, are nevertheless potentially chaotic, so that the overall long-term course of events can quickly come to depend on arbitrarily slight variations in initial conditions, rendering the results unpredictable on the basis of any fixed level of precision. So, for all these reasons, Newton argues that passive principles such as inertia cannot suffice to account for what we observe. We also require active principles, among which he includes gravity, electricity, and magnetism. Beyond this, Newton suggests that the ultimate "active principle" underlying all the order and beauty we find in the world, is God, who not only set things in motion, but from time to time must actively intervene to restore their motion. This was an important point for Newton, because he was genuinely concerned about the moral implications of a scientific theory that explained everything as the inevitable consequence of mechanical principles. This is why he labored so hard to reconcile his clockwork universe with an on-going active role for God. He seems to have found this role in the task of resisting an inevitable inclination of our mechanisms to descend into dissipation and veer into chaos. In this final Query Newton also took the opportunity to explicitly defend his abstract principles such as inertia and gravity, which some critics charged were occult.

    These principles I consider not as occult qualities...but as general laws of nature, by which the things themselves are formed, their truth appearing to us by phenomena, though their causes be not yet discovered. For these are manifest qualities, and their causes only are occult. The Aristotelians gave the name of occult qualities not to manifest qualities, but to such qualities only as they supposed to lie hid in Bodies, and to be the unknown causes of manifest effects,

    such as would be the causes of gravity... if we should suppose that these forces or actions arose from qualities unknown to us, and uncapable of being made known and manifest. Such occult qualities put a stop to the improvement of natural philosophy, and therefore of late years have been rejected. To tell us that every species of things is endowed with an occult specific quality by which it acts and produces manifest effects is to tell us nothing...

The last set of Queries to be added, now numbered 17 through 24, appeared in the second English edition in 1717, when Newton was 75. These are remarkable in that they argue for an aether permeating all of space - despite the fact that Queries 25 through 31 argue at length against the necessity for an aether, and those were hardly altered at all when Newton added the new Queries which advocate an aether. (It may be worth noting, however, that the reference to "empty space" in the original version of Query 28 was changed at some point to "nearly empty space".) It seems to be the general opinion among Newtonian scholars that these "Aether Queries" inserted by Newton in his old age were simply attempts "to placate critics by seeming retreats to more conventional positions". The word "seeming" is well chosen, because we find in Query 21 the comments

    And so if any one should suppose that aether (like our air) may contain particles which endeavour to recede from one another (for I do not know what this aether is), and that its particles are exceedingly smaller than those of air, or even than those of light, the exceeding smallness of its particles may contribute to the greatness of the force by which those particles may recede from one another, and thereby make that medium exceedingly more rare and elastick than air, and by consequence exceedingly less able to resist the motions of projectiles, and exceedingly more able to press upon gross bodies, by endeavoring to expand itself.

Thus Newton not only continues to view light as consisting of particles, but imagines that the putative aether may also be composed of particles, between which primitive forces operate to govern their movements. It seems that the aether of these queries was a distinctly Newtonian one, and its purpose was as much to serve as a possible mechanism for gravity as for the refraction and reflection of light. It's disconcerting that Newton continued to be misled by his erroneous belief that refracted paths proceed from more dense to less dense regions, which required him to posit an aether surrounding the Sun with a density that increases with distance, so that the motion of the planets may be seen as a tendency to veer toward less dense parts of the aether. There's a striking parallel between this set of "pro-Aether Queries" of Newton and the famous essay "Ether and the Theory of Relativity", in which Einstein tried to reconcile his view of physics with something that could be termed an ether. Of course, it turned out to be a distinctly Einsteinian ether, immaterial, and incapable of being assigned any place or state of motion. Since I've credited Newton with suggesting the second law of thermodynamics and mass-

energy equivalence, I may as well mention that he could also be regarded as the originator of the notorious "cosmological constant", which has had such a checkered history in the theory of relativity. Recall that the weak/slow limit of Einstein's field equations without the cosmological term corresponds to a gravitational relation of the familiar form
    F = k/r²                                            (1)
but if a non-zero cosmological constant is assumed the weak/slow limit is
    F = k/r² + λr                                       (2)
As it happens, Newton explored the consequences of a wide range of central force laws in the Principia, and determined that the only two forms for which spherically symmetrical masses can be treated as if all the mass was located at the central point are F = k/r² and F = λr. (See Propositions LXXVII and LXXVIII in Book I). In addition to this distinctive spherical symmetry property (analogous to Birkhoff's theorem for general relativity), these are also the only two central force laws for which the shapes of orbits in a two-body system are perfect conic sections (see Proposition X), although in the case of a force directly proportional to the distance the center of force is at the center of the conic, rather than at a focus. In the Scholium following the discussion of spherically symmetrical bodies Newton wrote

    I have now explained the two principal cases of attractions; to wit, when the centripetal forces decrease as the square of the ratio of the distances, or increase in a simple ratio of the distances, causing the bodies in both cases to revolve in conic sections, and composing spherical bodies whose centripetal forces observe the same law of increase or decrease in the recess from the center as the forces from the particles themselves do; which is very remarkable.

Considering that Newton referred to these two special cases as the two principal cases of "attraction", it's not too much of a stretch to say that the full general law of attraction (or gravitation) developed in the Principia was actually (2) rather than (1), and it was only in Book III (The System of the World), in which the laws are fit to actual observed phenomena, that he concludes there is no (discernable) evidence for the direct term. The situation is essentially the same today, i.e., on a purely formal mathematical basis the cosmological term seems to "fit", at least up to a point, but the empirical justification for it remains unclear. If λ is non-zero, it must be quite small, at least in the current epoch. So I think it can be said with some justification that Newton actually originated the cosmological term in theoretical investigations of gravity. As an example of how seriously Newton took these "non-physical" possibilities, he noted that with an inverse-square law the introduction of a third body generally destroys perfect ellipticity of the orbits, causing the ellipses to precess, whereas in Proposition LXIV he shows that with a pure direct force law F = λr this is not the case. In other words, the

orbits remain perfectly elliptical even with three or more gravitating bodies, although the presence of more bodies increases the velocities and decreases the periods of the orbits. (A numerical sketch of this fact appears below.) These serious considerations show that Newton wasn't simply trying to fit data to a model. He was interested in the same aspect of science that Einstein said interested him the most, namely, "whether God had any choice in how he created the world". This may be a somewhat melodramatic way of expressing it, but the basic idea is clear. It isn't enough to discern that objects appear to obey an inverse square law of attraction; Newton wanted to understand what was special about the inverse square, and why nature chose that form rather than some other. Socrates alluded to this same wish in Phaedo:

    If then one wished to know the cause of each thing, why it comes to be or perishes or exists, one had to find out what was the best way for it to be, or to be acted upon, or to act.

Although this attitude may strike us as silly, it seems undeniable that it's been an animating factor in the minds of some of the greatest scientists – the urge to comprehend not just what is, but why it must be so.
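To see Proposition LXIV in action, here is a minimal Python sketch (the masses, positions, and velocities are arbitrary illustrative values, with the net momentum chosen to be zero so the center of mass stays fixed). With pairwise attraction F_ij = λ m_i m_j (r_j - r_i), each body obeys r_i'' = λM(r_cm - r_i), i.e., simple harmonic motion about the center of mass, so every orbit is a closed ellipse and all bodies share the common period 2π/√(λM), which shortens as more mass is added.

    import numpy as np

    lam = 1.0
    m = np.array([1.0, 2.0, 3.0])
    M = m.sum()
    pos = np.array([[1.0, 0.0], [-0.5, 0.8], [0.2, -0.6]])
    vel = np.array([[0.0, 0.5], [0.3, 0.0], [-0.2, -0.5/3]])  # zero net momentum

    dt = 1.0e-4
    period = 2*np.pi/np.sqrt(lam*M)    # common period of all three orbits
    start = pos.copy()
    for _ in range(int(round(period/dt))):
        r_cm = m @ pos / M
        vel += lam*M*(r_cm - pos)*dt   # acceleration lam*M*(r_cm - r_i)
        pos += vel*dt                  # symplectic Euler step
    print(np.abs(pos - start).max())   # nearly zero: orbits close after one period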

8.3 The Helen of Geometers

    I first have to learn to watch very respectfully as the masters of creativity perform their intellectual climbing feats, while I stay bowleggedly below in the valley mist. I already have a premonition that up there the sun is always shining!
                                        Hedwig Born to Einstein, 1919

The curve traced out by a point on the rim of a rolling circle is called a cycloid, and we've seen that this curve describes gravitational free-fall, both in Newtonian mechanics and in general relativity (in terms of the free-falling proper time). Remarkably, this curve has been a significant object of study for almost every major scientist mentioned in this book, and has been called "the Helen of geometers" because of all the disputes it has provoked between mathematicians. It was first discussed by Charles Bouvelles in 1501 as a mechanical means of squaring the circle. Subsequently Galileo and his student Viviani studied the curve, finding a method of constructing tangents, and Galileo suggested that it might be a suitable shape for an arch bridge. Mersenne publicized the cycloid among his group of correspondents, including the young Roberval who, by the 1630's, had determined many of the major properties of the cycloid, such as the interesting fact that the area under a complete cycloidal arch is exactly three times the area of the rolling circle. Roberval used his problem-solving techniques in 1634 to win the Mathematics chair at the College Royal, which was determined every three years by an open competition. Unfortunately, the contest did not require full disclosure of

the solution methods, so the incumbent (who selected the contest problems) had a strong incentive to keep his best methods a secret, lest they be used to unseat him at the next contest. In retrospect, this was not a very wise arrangement for a teaching position. Roberval held the chair for 40 years, but by keeping his solution methods secret he lost priority for several important discoveries, and became involved in numerous quarrels. One of the men accused by Roberval of plagiarism was Torricelli, who in 1644 was the first to publish an explanation of the area and the tangents of the cycloid. It's now believed that Torricelli arrived at his results independently. (Torricelli served as Galileo's assistant for a brief time, and probably learned of the cycloid from him.)

In 1658, four years after renouncing mathematics as a vainglorious pursuit, Pascal found himself one day suffering from a painful toothache, and in desperation began to think about the cycloid to take his mind off the pain. Quickly the pain abated, and Pascal interpreted this as a sign from the Almighty that he should proceed to study the cycloid, which he did intensively for the next eight days. During this period he rediscovered most of what had already been learned about the cycloid, and several results that were new. Pascal decided to propose a set of challenge problems, with the promise of a first and second prize to be awarded for the best solutions. Roberval was named as one of the judges. Only two sets of solutions were received, from Antoine de Lalouvere and John Wallis, but Pascal and Roberval decided that neither of the entries merited a prize, so no prizes were awarded. Instead, Pascal published his own solutions, along with an essay on the "History of the Cycloid", in which he essentially took Roberval's side in the priority dispute with Torricelli.

The conduct of Pascal's cycloid contest displeased many people, but it had at least one useful side effect. In 1658 Christiaan Huygens was thinking about how to improve the design of clocks, and of course he realized that the period of oscillation of a simple pendulum (i.e., a massive object constrained to move along a circular arc under the vertical force of gravity) is not perfectly independent of the amplitude. Prompted by Pascal's contest, Huygens decided to consider how an object would oscillate if constrained to follow an upside-down cycloidal path, and found to his delight that the frequency of such a system actually is perfectly independent of the amplitude. Thus he had discovered that the cycloid is the tautochrone, i.e., the curve for which the time taken by a particle sliding from any point on the curve to the lowest point on the curve is the same, independent of the starting point. He presented this result in his great treatise "Horologium Oscillatorium" (not published until 1673), in which he clearly described the modern principle of inertia (the foundation of relativity), the law of centripetal force, the conservation of kinetic energy, and many other important concepts of dynamics - fourteen years before Newton's "Principia".

The cycloid went on attracting the attention of the world's best mathematicians, and revealing new and remarkable properties. For example, in June of 1696, John Bernoulli issued the following challenge to the other mathematicians of Europe:

    If two points A and B are given in a vertical plane, to assign to a mobile particle M the path AMB along which, descending under its own weight, it passes from

    the point A to the point B in the briefest time.

Pictorially the problem is as shown below:

In accord with its defining property, the requested curve is called the brachistochrone. The solution was first found by Jean and/or Jacques Bernoulli, depending on whom you believe. (Each of the brothers worked on the problem, and they later accused each other of plagiarism.) Jean, who was never accused of understating the significance of his discoveries, revealed his solution in January of 1697 by first reminding his readers of Huygens' tautochrone, and then saying "you will be petrified with astonishment when I say that precisely this same cycloid... is our required brachistochrone". Incidentally, the Bernoullis were partisans on the side of Leibniz in the famous priority dispute between Leibniz and Newton over the invention of calculus. Before revealing his solution to the brachistochrone challenge problem, Jean Bernoulli along with Leibniz sent a copy of the challenge directly to Newton in England, and included in the public announcement of the challenge the words

    ...there are fewer who are likely to solve our excellent problems, aye, fewer even among the very mathematicians who boast that [they]... have wonderfully extended its bounds by means of the golden theorems which (they thought) were known to no one, but which in fact had long previously been published by others.

It seems clear the intent was to humiliate the aging Newton (who by then had left Cambridge and was Warden of the Mint), by demonstrating that he was unable to solve a problem that Leibniz and the Bernoullis had solved. The story as recounted by Newton's biographer Conduitt is that Sir Isaac "in the midst of the hurry of the great recoinage did not come home till four from the Tower very much tired, but did not sleep till he had solved it, which was by 4 in the morning." In all, Bernoulli received only three solutions to his challenge problem, one from Leibniz, one from l'Hospital, and one anonymous solution from England. Bernoulli supposedly said he knew who the anonymous author must be, "as the lion is recognized by his print". Newton was obviously proud of his solution, although he commented later that "I do not love to be dunned & teezed by forreigners about Mathematical things..." It's interesting that Jean Bernoulli apparently arrived at his result from his studies of the path of a light ray through a non-uniform medium. He showed how this problem is related in general to the mechanical problem of an object moving with varying speeds

due to any cause. For example, he compared the mechanical problem with the optical problem of a ray passing through a medium whose density is inversely proportional to the speed that a heavy body acquires in gravitational freefall. "In this way", he wrote, "I have solved two important problems, an optical and a mechanical one...". Then he specialized this to Galileo's law of falling bodies, according to which the speeds of two falling bodies are to each other as the square roots of the altitudes traveled. He concluded

    Before I end I must voice once more the admiration I feel for the unexpected identity of Huygens' tautochrone and my brachistochrone. I consider it especially remarkable that this coincidence can take place only under the hypothesis of Galileo, so that we even obtain from this a proof of its correctness. Nature always tends to act in the simplest way, and so it here lets one curve serve two different functions, while under any other hypothesis we should need two curves...

Presumably his enthusiasm would have been even greater had he known that the same curve describes radial gravitational freefall versus proper time in general relativity. We see from Bernoulli's work that the variational techniques developed to solve problems like the brachistochrone also found physical application in what came to be called the principle of least action, a principle usually attributed to Maupertuis, or perhaps Leibniz (if one accepts the contention that "the best of all possible worlds" represents an expression of this principle). One particularly striking application of this variational approach was Fermat's principle of least time for light rays, as discussed in Section 3.4. Essentially the same technique is used to determine the equations of a geodesic path in the curved spacetime of general relativity. In the twentieth century, Planck was the most prominent enthusiast for the variational approach, asserting that "the principle of least action is perhaps that which, as regards form and content, may claim to come nearest to that ideal final aim of theoretical research". Indeed he even (at times) argued that the principle manifests a deep teleological aspect of nature, since it can be interpreted as a global imperative, i.e., systems evolve locally in a way that extremizes (or makes stationary) certain global measures in a temporally symmetrical way, as if the final state were already determined. He wrote

    In fact, the least-action principle introduces an entirely new idea into the concept of causality: The causa efficiens, which operates from the present into the future and makes future situations appear as determined by earlier ones, is joined by the causa finalis for which, inversely, the future – namely, a definite goal – serves as the premise from which there can be deduced the development of the processes which lead to this goal.

It's surprising to see this called "an entirely new idea", considering that causa finalis was among the four fundamental kinds of causation enunciated by Aristotle. In any case, throughout his life the normally austere and conservative Planck continued to have an almost mystical reverence for the principle of least action, arguing that it is not only "the most comprehensive of all physical laws", but that it actually represents the purest

expression of the thoughts of God. Interestingly, Fermat himself was much less philosophically committed to the principle that he himself originated (somewhat like Einstein's ambivalence toward the quantum theory). After being challenged on the fundamental truth of the "least time" principle as a law of nature by the Cartesian Clerselier, Fermat replied in exasperation

    I do not pretend and I have never pretended to be in the secret confidence of nature. She moves by paths obscure and hidden...

Fermat was content to regard the principle of least time as a purely abstract mathematical theorem, describing - though not necessarily explaining - the behavior of light.

8.4 Refractions on Relativity

    For now we see through a glass, darkly; but then face to face. Now I know in part, but then shall I know even as also I am known.
                                        I Corinthians 13:12

We saw in Section 3.4 that Fermat's Principle of least time predicts the paths of light rays passing through a plane boundary between regions of constant refractive index, but to more fully appreciate this principle it's useful to develop the equations of motion for light rays in a medium with arbitrarily varying refractive index. First, notice that Snell's law enables us to determine the paths of optical rays passing through a discrete boundary between regions of constant refractive index, but doesn't explicitly tell us the path of light in a medium of continuously varying refractivity. To determine this, we can refer to Fresnel's equations, which give the intensities of the reflected and transmitted
rays at a boundary. For unpolarized light incident at the angle θ1, with the angle of refraction θ2, the fraction R of the incident energy that is reflected is

    R = (1/2) [ sin²(θ1 - θ2)/sin²(θ1 + θ2) + tan²(θ1 - θ2)/tan²(θ1 + θ2) ]
Consequently, the fraction of incident energy that is transmitted is 1 - R. However, this formula assumes the thickness of the boundaries between regions of constant refractive index is small in comparison with the wavelength of the light, whereas in many real circumstances the density of the medium does not change abruptly at well-defined boundaries, but varies continuously as a function of position. Therefore, we would like a means of tracing rays of light as they pass through a medium with a continuously varying index of refraction. Notice that if we approximate a continuously changing index of refraction by a sequence of thin uniform plates, as we add more plates the ratio n2/n1 from one region to the next approaches 1, and so according to Snell's Law the value of θ2 approaches the value of θ1. From Fresnel's equations we see that in this case the fraction of incident energy that is reflected goes to zero, and we find that a light ray with a given trajectory proceeds in just one direction through the continuous medium (provided the gradient of the scalar field n(x,y) is never too great relative to the wavelength of the light). So, it should be possible

to predict the unique path of transmission of a light ray in a medium with continuously varying index of refraction. Perhaps the most direct approach is via the usual calculus of variations. (For convenience we'll just work in 2 dimensions, but all the formulas can immediately be generalized to three dimensions.) We know that the index of refraction n at a point (x,y) equals c/v, where v is the velocity of light at that point. Thus, if we parameterize the path by the equations x = x(u) and y = y(u), the "optical path length" from point A to point B (i.e., the time taken by a light beam to traverse the path) is given by the integral
    T = ∫ (n(x,y)/c) √(ẋ² + ẏ²) du
where dots signify derivatives with respect to the parameter u. To make this integral an extremum, let f denote the integrand function
    f(x, y, ẋ, ẏ) = (n(x,y)/c) √(ẋ² + ẏ²)
Then the Euler equations (introduced in Section 5.4) are
    d/du (∂f/∂ẋ) = ∂f/∂x          d/du (∂f/∂ẏ) = ∂f/∂y
which gives
    d/du [ n ẋ / √(ẋ² + ẏ²) ] = (∂n/∂x) √(ẋ² + ẏ²)          d/du [ n ẏ / √(ẋ² + ẏ²) ] = (∂n/∂y) √(ẋ² + ẏ²)
Now, if we define our parameter u as the spatial path length s, then we have ẋ² + ẏ² = 1, and so the above equations reduce to
    ∂n/∂x = d/ds ( n dx/ds )                            (1a)

    ∂n/∂y = d/ds ( n dy/ds )                            (1b)
These are the "equations of motion" for a photon in a heterogeneous medium, as they are usually formulated, in terms of the spatial path parameter s. However, another approach to this problem is to define a temporal metric on the space, i.e., a metric the represents the time taken by a light beam to travel from one point to another. This temporal approach has remarkable formal similarities to Einstein's metrical theory of gravity.

According to Fermat's Principle, the path taken by a ray of light from one point to another is such that the time is minimal (for slight perturbations of the path). Therefore, if we define a metric in the x,y space such that the metrical "distance" between any two infinitesimally close points is proportional to the time required by a photon to travel from one point to the other, then the paths of photons in this space will correspond to the geodesics. Since the refractive index n is a smooth continuous function of x and y, it can be regarded as constant in a sufficiently small region surrounding any particular point (x,y). The incremental spatial distance from this point to the nearby point (x+dx, y+dy) is given by ds² = dx² + dy², and the incremental time dτ for a photon to travel the incremental distance ds is simply ds/v where v = c/n. Therefore, we have dτ = (n/c)ds, and so our metrical line element for this space is
    (dτ)² = (n/c)² [ (dx)² + (dy)² ]                    (2)
If, instead of x and y, we name our two spatial coordinates x¹ and x² (where these superscripts denote indices, not exponents) we can express equation (2) in tensor form as
    (dτ)² = g_uv dx^u dx^v                              (3)
where g_uv is the covariant metric tensor
            |  (n/c)²     0     |
    g_uv =  |                   |                       (4)
            |    0      (n/c)²  |
Note that in equation (3) we have invoked the usual summation convention. The contravariant form of the metric tensor, denoted by g^uv, is the matrix inverse of (4). According to Fermat's Principle, the path of a light ray must be a geodesic path based on this metric. As discussed in Section 5.4, the equations of a geodesic path are
    d²x^σ/dτ² + Γ^σ_uv (dx^u/dτ)(dx^v/dτ) = 0           (5)
Based on the metric of our 2D optical space we have the eight Christoffel symbols
    Γ^x_xx = n_x/n      Γ^x_xy = Γ^x_yx = n_y/n      Γ^x_yy = -n_x/n
    Γ^y_yy = n_y/n      Γ^y_yx = Γ^y_xy = n_x/n      Γ^y_xx = -n_y/n
Inserting these into (5) gives the equations for geodesic paths, which define the paths of light rays in this region. Reverting back to our original notation of x,y for our spatial coordinates, the differential equations for ray paths in this medium of continuously varying refractive index are
    d²x/dτ² = (n_x/n) [ (dy/dτ)² - (dx/dτ)² ] - 2(n_y/n)(dx/dτ)(dy/dτ)        (6a)

    d²y/dτ² = (n_y/n) [ (dx/dτ)² - (dy/dτ)² ] - 2(n_x/n)(dx/dτ)(dy/dτ)        (6b)
where n_x and n_y denote partial derivatives of n with respect to x and y respectively. These are the equations of motion for light based on the temporal metric approach. To show that these equations, based on the temporal path parameter τ, are equivalent to equations (1a) and (1b) based on the spatial path parameter s, notice that s and τ are linked by the relation ds/dτ = c/n where c is the velocity of light. Multiplying both inside and outside the right hand side expression of (1a) by the unity of (n/c)(ds/dτ) we get
    ∂n/∂x = (n/c) d/dτ [ (n²/c) (dx/dτ) ]
Expanding the derivative on the right side gives
    ∂n/∂x = (n³/c²) d²x/dτ² + (2n²/c²) (dn/dτ) (dx/dτ)
Since n is a function of x and y, we can express the derivative dn/dτ using the total derivative
    dn/dτ = n_x (dx/dτ) + n_y (dy/dτ)
Substituting this into the previous equation and factoring gives
    ∂n/∂x = (n²/c²) [ n d²x/dτ² + 2 ( n_x (dx/dτ) + n_y (dy/dτ) ) (dx/dτ) ]
Recalling that c/n = ds/dτ, we can multiply both sides of this equation by (ds/dτ)² to give
    n_x (ds/dτ)² = n d²x/dτ² + 2 ( n_x (dx/dτ) + n_y (dy/dτ) ) (dx/dτ)
Since s is the spatial path length, we have (ds)² = (dx)² + (dy)², so we can substitute for ds on the left hand side and rearrange terms to give the result
    d²x/dτ² = (n_x/n) [ (dy/dτ)² - (dx/dτ)² ] - 2(n_y/n)(dx/dτ)(dy/dτ)
which is the same as the geodesic equation (6a). A similar derivation shows that (1b) is equivalent to the geodesic equation (6b), so the two sets of equations of motion for light rays are identical. With these equations we can compute the locus of rays emanating from any given point in a medium with arbitrarily varying index of refraction. Of course, if the index of refraction is constant then the right hand sides of equations (6) vanish and the equations for light rays reduce to
    d²x/dτ² = 0          d²y/dτ² = 0
which are simply the equations of straight lines. For a less trivial case, suppose the index of refraction in this region is a linear function of the x parameter, i.e., we have n(x) = Ax + B for some constants A and B. In this case the equations of motion reduce to
    d²x/dτ² = [A/(Ax+B)] [ (dy/dτ)² - (dx/dτ)² ]          d²y/dτ² = -[2A/(Ax+B)] (dx/dτ)(dy/dτ)
With A=5 and B=1/5 the locus of rays emanating from a point is as shown in Figure 1.

Figure 1

The correctness of the rays in Figure 1 is easily verified by noting that in a medium with n varying only in the horizontal direction it follows immediately from Snell's law that the product n sin(θ) must be constant, where θ is the angle which the ray makes with the horizontal axis. We can verify numerically that the rays shown in Figure 1, generated by the geodesic equations, satisfy Snell's Law throughout. We've placed the origin of these rays at the location where n = 5. The left-most point on this family of curves emanating from that point is at the x location where n = 0. Of course, in reality we could not construct a medium with n = 0, since that represents an infinite speed of light. It is, however, possible for the index of refraction of a medium to be less than 1 for certain frequencies, such as x-rays in glass. This implies that the velocity of light exceeds c, which may seem to conflict with relativity. However, the "velocity of light" that appears in the denominator of the refractive index is actually the phase velocity, rather than the group velocity, and the latter is typically the speed of energy transfer and signal propagation. (The phenomenon of "anomalous dispersion" can actually result in a group velocity greater than c, but in all cases the signal velocity is less than or equal to c.) Incidentally, these ray lines, in a medium with linearly varying index of refraction, are called catenary curves, which is the shape made by a heavy cable slung between two attachment points in uniform gravity. To prove this, let's first rotate the medium so that the refractive index varies vertically instead of horizontally, and let's slide the vertical axis so that n = Ay for some constant A. The general form of a catenary curve (with vertical axis of symmetry) is
    y = m cosh(x/m)
for some constant m. It follows that dy/dx = sinh(x/m). Also, the incremental distance along the path is given by (ds)² = (dx)² + (dy)², so we can substitute for dy to give
    (ds)² = [ 1 + sinh²(x/m) ] (dx)² = cosh²(x/m) (dx)²
Therefore, we have ds = cosh(x/m) dx, which can be integrated to give s = m sinh(x/m). Interestingly, this implies that dy/dx = s/m, so the slope of a catenary (with vertical axis) is proportional to the distance along the curve from the minimum point. Also, from the relation x = m arcsinh(s/m) we have

    dx/ds = m / √(m² + s²)

so we can multiply this by dy/dx = s/m to give

    dy/ds = s / √(m² + s²)

Integrating this gives y as a function of s, so we have the parametric equations

    x = m arcsinh(s/m)          y = √(m² + s²)
Letting n0 denote the index of refraction at the minimum point of the catenary (where the curve is parallel to the lines of constant refractive index), and letting A denote dn/dy, we have m = n0/A. For other values of y we have n = Ay = n0 √(1 + (s/m)²). We can verify that the catenary represents the path of a light ray in a medium whose index of refraction varies linearly as a function of y by inserting these expressions for x, y, and n (and their derivatives) into equations of motion (1), as the numerical sketch below also confirms. The surface of revolution of one of these catenary curves about the vertical axis through the vertex of the envelope is called a catenoid. Each point inside the envelope of this family of curves is contained in exactly two curves, and the catenoid given by the shorter of these two curves is a minimal surface. It's also interesting to note that the "envelope" of rays emanating from a given point approaches a parabola whose focus is the given point. This parabola and focus are shown as a dotted line in Figure 1.
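Here is a minimal Python check of the catenary claim (the constants A = 2, m = 1/2, and the step size are merely illustrative). It integrates the ray equations (1) in a medium with n = Ay, starting horizontally at the vertex (0, m), and compares the resulting path against y = m cosh(x/m).

    import numpy as np

    A, m, ds = 2.0, 0.5, 1e-4            # so n0 = A*m = 1.0 at the vertex
    p = np.array([0.0, m])               # start at the lowest point of the path
    q = A*m*np.array([1.0, 0.0])         # q = n*(unit tangent), initially horizontal
    worst = 0.0
    for _ in range(20000):
        q = q + np.array([0.0, A])*ds    # d/ds (n dr/ds) = grad n = (0, A)
        p = p + (q/np.linalg.norm(q))*ds
        worst = max(worst, abs(p[1] - m*np.cosh(p[0]/m)))
    print(worst)                         # small, and shrinking as ds -> 0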

For a less trivial example, the figure below shows the rays in a medium where the index of refraction is spherically symmetrical and drops off linearly with distance from some central point, which gives ray paths that are hypocycloidal loops.

Figure 2

It's also possible to arrange for the light rays to be loxodromic spirals, as shown below.

Figure 3

Finally, Figure 4 shows that the rays can circulate from one point to a central point in accord with "circles of Apollonius", much like the iterations of Mobius transformations in the complex plane.

Figure 4

This occurs with n varying inversely as the square of the distance from the central point. Theoretically, the light from any point, with an initial trajectory in any direction, will eventually turn around and head toward the singularity of infinite density at the center, which the ray approaches asymptotically slowly. Thus, it might be called a "black sphere" lens that refracts all incident light toward its center. Of course, there are obvious practical difficulties with actually constructing an object like this, not least of which is the infinite density at the center, as well as the problems of reflection and dispersion. As an aside, it's interesting to compare the light deflection predicted by the Schwarzschild solution with the deflection that would be given by a simple "refractive medium" with a scalar index of refraction defined at each point. We've seen that the "least time" metric in a plane is
    (dτ)² = n(x,y)² [ (dx)² + (dy)² ]
where we have set c=1, and n(x,y) is the index of refraction at the point (x,y). If we write this in polar coordinates r,θ, and if we assume that both n and dτ/dt depend only on r, this can be written as
    (dτ)² = n(r)² [ (dr)² + r² (dθ)² ]
for some function n(r). In order to match the Schwarzschild radial speed of light dr/dt we must have n(r) = r/(r - 2m), which completely determines the "refractive model" metric for light rays on the plane. The corresponding geodesic equations are
    d²r/dt² = [2m/(r(r - 2m))] (dr/dt)² + [r(r - 4m)/(r - 2m)] (dθ/dt)²

    d²θ/dt² = -[2(r - 4m)/(r(r - 2m))] (dr/dt)(dθ/dt)
These are similar, but not identical, to the geodesic equations based on the Schwarzschild metric, as can be seen by comparing them with equations (2) in Section 6.2. The weak field deflection is almost indistinguishable. To see this, we proceed as we did with the Schwarzschild metric, integrating the second geodesic equation and determining the constant of integration from the perihelion condition at r = r0 to give
    dθ/dt = n(r0) r0 / (n² r²) = [r0²/(r0 - 2m)] (r - 2m)²/r⁴
Substituting this into the metric divided by (dt)² and solving for dr/dt gives
    dr/dt = (1/n) √( 1 - [n(r0) r0 / (n r)]² )
Dividing dθ/dt by dr/dt gives dθ/dr. Then, making the substitution ρ = r0/r as before we arrive at the integral for the angular travel from the perihelion to infinity
    Δθ = ∫₀¹ n0 dρ / √( n² - n0² ρ² )          with n = r0/(r0 - 2mρ),  n0 = r0/(r0 - 2m)
Doubling this gives the total angular travel between the incoming and outgoing asymptotes, and subtracting π from this travel gives the deflection δ. Expanding the integral in powers of m/r0, we have the result
$$\delta = \frac{4m}{r_0} + O\!\left(\frac{m^2}{r_0^2}\right)$$
Thus the first-order deflection for this simple refraction model is the same as for the Schwarzschild solution. The solutions differ in the second order, but this difference is much too small to be measured in the weak gravitational fields found in our solar system. (For the Sun, with m about 1.5 km and r0 equal to the solar radius of about 7.0 × 10⁵ km, the first-order deflection 4m/r0 is about 8.5 × 10⁻⁶ radians, or 1.75 seconds of arc.) However, the difference would be significant near a "black hole", because the radius for lightlike circular orbits in this refractive model is 4m, as opposed to 3m for the Schwarzschild metric. (By Bouguer's invariant, nr sin(ψ) = constant along a ray, a circular path requires d(nr)/dr = 0, and with n = r/(r - 2m) we have d(nr)/dr = r(r - 4m)/(r - 2m)², which vanishes at r = 4m.)

On the other hand, it's important to keep in mind that the physical significance of the usual Schwarzschild coordinates can't be taken for granted when translated into a putative model based on simple refraction. The angular coordinates are fairly unambiguous, but we have various reasonable choices for the radial parameter. One common choice gives the so-called isotropic coordinates. For the radial coordinate we use ρ, defined with respect to the Schwarzschild coordinate r by the relation
$$r = \rho\left(1 + \frac{m}{2\rho}\right)^2$$
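Expanding the square makes the relation between the two radial parameters explicit:

$$r = \rho + m + \frac{m^2}{4\rho}\,, \qquad\qquad 2\pi r = 2\pi\rho\left(1 + \frac{m}{\rho} + \frac{m^2}{4\rho^2}\right)$$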
Note that the perimeter of a circular orbit of radius r is 2πr, consistent with Euclidean geometry, whereas the perimeter of a circle of radius ρ is roughly 2πρ(1 + m/ρ). In terms of this radial parameter, the Schwarzschild metric takes the form
$$(d\tau)^2 = \left(\frac{1 - m/2\rho}{1 + m/2\rho}\right)^2 (dt)^2 - \left(1 + \frac{m}{2\rho}\right)^4\left[(d\rho)^2 + \rho^2\,(d\theta)^2 + \rho^2\sin^2\theta\,(d\phi)^2\right]$$
This leads to the positive-definite metric for light paths
$$(dt)^2 = \frac{\left(1 + m/2\rho\right)^6}{\left(1 - m/2\rho\right)^2}\left[(d\rho)^2 + \rho^2\,(d\theta)^2 + \rho^2\sin^2\theta\,(d\phi)^2\right]$$
Hence if we postulate a Euclidean space with the coordinates ρ,θ,ϕ centered on the mass m, and a refractive index varying with ρ according to the formula
$$n(\rho) = \frac{\left(1 + m/2\rho\right)^3}{1 - m/2\rho}$$
then the equations of motion for light are formally identical to those predicted by general relativity. However, when we postulate a Euclidean space with the radial parameter ρ we are neglecting the fact that the physically measured perimeter of a circle of radius ρ is not 2πρ, so this is not an entirely self-consistent interpretation, as opposed to the usual "curvature" interpretation of general relativity. In addition, physical refraction is ordinarily dependent on the frequency of the light, whereas gravitational deflection is not, so in order to achieve the formal match between the two we must make the physically implausible assumption of a refractive index that is independent of frequency. Furthermore, it isn't self-evident that a refractive model can correctly account for the motions of timelike objects, whereas the curved-spacetime interpretation handles all these motions in a unified and self-consistent manner.

8.5 Scholium

I earnestly ask that all this be appraised honestly, and that defects in matters so very difficult be not so much reprehended as investigated and kindly supplemented by new endeavors of my readers.
                                                       Isaac Newton, 1687

Considering that the first Scholium of Newton's Principia begins with the famous assertion "absolute, true, and mathematical time...flows equably, without relation to

anything external", it's ironic that Newton's theory of universal gravitation can be interpreted as a theory of variations in the flow of time. Suppose in Newton's absolute space we establish the Cartesian coordinates x,y,z, and then assign a fourth coordinate, t, to every point. We will call this the coordinate time parameter, but we don't necessarily identify this with the "true time" of events. Instead we postulate that the true lapse of time along an incremental timelike path is dτ, given by
$$(d\tau)^2 = g_{00}\,(dt)^2 + k\left[(dx)^2 + (dy)^2 + (dz)^2\right] \qquad (1)$$
From the Galilean standpoint, we assume that a single set of assignments of the time coordinate t to events corresponds to the lapses of proper time dτ along any and all paths, which implies that g00 = 1 and k = 0. However, this can only be known to within some observational tolerance. Strictly speaking we can say only that g00 is extremely close to 1, and the constant k is very close to zero (in conventional units of measure). Using indices with x^0 = t, x^1 = x, x^2 = y, and x^3 = z, we can re-write (1) as the summation
$$(d\tau)^2 = \sum_{\mu,\nu=0}^{3} g_{\mu\nu}\, dx^\mu\, dx^\nu$$
where
$$g_{\mu\nu} = \begin{bmatrix} g_{00} & 0 & 0 & 0\\ 0 & k & 0 & 0\\ 0 & 0 & k & 0\\ 0 & 0 & 0 & k \end{bmatrix}$$
Now let's define a four-dimensional array of numbers representing the second partial derivatives of the components g_{bd} with respect to every pair of coordinates x^a, x^c
$$R_{abcd} = \frac{\partial^2 g_{bd}}{\partial x^a\,\partial x^c}$$
Also, we define the "contraction" of this array (using the summation convention for repeated indices) as
$$R_{bd} = g^{ac}\,R_{abcd}$$
Since the only non-zero components of R_{abcd} are those of the form R_{a0c0}, it follows that the only non-zero component of R_{bd} is
$$R_{00} = g^{00}\,\frac{\partial^2 g_{00}}{\partial t^2} + \frac{1}{k}\left(\frac{\partial^2 g_{00}}{\partial x^2} + \frac{\partial^2 g_{00}}{\partial y^2} + \frac{\partial^2 g_{00}}{\partial z^2}\right)$$
If we assume g00 is independent of the coordinate t (meaning that the metrical configuration is static), the first term vanishes and we find that R00 is just the Laplacian of g00 (up to the constant factor 1/k). Hence if we take our vacuum field equations to be Rµν = 0, this is equivalent to requiring that the Laplacian of g00 vanish, i.e.,
$$\nabla^2 g_{00} = \frac{\partial^2 g_{00}}{\partial x^2} + \frac{\partial^2 g_{00}}{\partial y^2} + \frac{\partial^2 g_{00}}{\partial z^2} = 0$$
For convenience let us define the scalar φ = g00/2. If we consider just spherically symmetrical fields about the origin, we have φ = φ(r) and so
$$\frac{\partial \varphi}{\partial x} = \frac{d\varphi}{dr}\,\frac{\partial r}{\partial x}$$
and similarly for the partials with respect to y and z. Since
$$\frac{\partial r}{\partial x} = \frac{x}{r}$$
we have
$$\frac{\partial^2 \varphi}{\partial x^2} = \frac{d^2\varphi}{dr^2}\,\frac{x^2}{r^2} + \frac{d\varphi}{dr}\left(\frac{1}{r} - \frac{x^2}{r^3}\right)$$
and similarly for the y and z partials. Making these substitutions back into the Laplace equation gives
$$\frac{d^2\varphi}{dr^2} + \frac{2}{r}\,\frac{d\varphi}{dr} = 0$$
This simple linear differential equation has the unique solution dφ/dr = J/r², where J is a constant of integration, and so we have φ = -J/r + K for some constants J and K. (Indeed, with dφ/dr = J/r² we have d²φ/dr² = -2J/r³, which exactly cancels (2/r)(J/r²).) Incidentally, it's worth noting that this applies only in three dimensions. If we were working in just two dimensions, the constant "2" in the above equation would be "1", and the unique solution would be dφ/dr = J/r, giving φ = J ln(r) + K. This shows that Newtonian gravity "works" only with three space dimensions, just as general relativity works only with four spacetime dimensions.

Now that we've solved for the g00 field we need the equations of motion. We assume that objects in gravitational free-fall follow geodesics through the spacetime, so the equations of motion are just the geodesic equations
$$\frac{d^2 x^\alpha}{d\tau^2} + \Gamma^{\alpha}_{\mu\nu}\,\frac{dx^\mu}{d\tau}\,\frac{dx^\nu}{d\tau} = 0$$
where x^α denote the quasi-Euclidean coordinates t, x, y, z defined above. Since we have assumed that the scale factor k between spatial and temporal coordinates is virtually zero,

and that g00 is nearly equal to unity, it's clear that all the speed components dx/dτ, dy/dτ, dz/dτ are extremely small, whereas the derivative dt/dτ is virtually equal to 1. Neglecting all terms containing one or more of the speed components, we're left with the zeroth-order approximation for the spatial accelerations
$$\frac{d^2 x}{d\tau^2} = -\Gamma^{x}_{tt}\left(\frac{dt}{d\tau}\right)^2 \qquad \frac{d^2 y}{d\tau^2} = -\Gamma^{y}_{tt}\left(\frac{dt}{d\tau}\right)^2 \qquad \frac{d^2 z}{d\tau^2} = -\Gamma^{z}_{tt}\left(\frac{dt}{d\tau}\right)^2$$
From the definition of the Christoffel symbols we have
$$\Gamma^{x}_{tt} = \frac{1}{2}\sum_{\sigma} g^{x\sigma}\left(\frac{\partial g_{\sigma t}}{\partial t} + \frac{\partial g_{t\sigma}}{\partial t} - \frac{\partial g_{tt}}{\partial x^\sigma}\right)$$
and similarly for the Christoffel symbols in the y and z equations. Since the metric components are independent of time, the partials with respect to t are all zero. Also, the metric tensor g_{µν} and its inverse g^{µν} are both diagonal, and the non-zero components of the latter are virtually equal to 1, 1/k, 1/k, 1/k. All the mixed components of g^{µν} vanish, so we are left with
$$\Gamma^{x}_{tt} = -\frac{1}{2k}\,\frac{\partial g_{tt}}{\partial x}$$
and similarly for Γytt and Γztt. As a result, the equations of motion in the weak slow limit are closely approximated by
$$\frac{d^2 x}{dt^2} = \frac{1}{2k}\,\frac{\partial g_{tt}}{\partial x} \qquad \frac{d^2 y}{dt^2} = \frac{1}{2k}\,\frac{\partial g_{tt}}{\partial y} \qquad \frac{d^2 z}{dt^2} = \frac{1}{2k}\,\frac{\partial g_{tt}}{\partial z}$$
We've seen that the Laplace equation requires gtt to be of the form 2K - 2J/r for some constants K and J in a spherically symmetrical field, and since we expect dt/dτ to approach 1 as r increases, we can set 2K = 1. With gtt = 1 - 2J/r we have
$$\frac{\partial g_{tt}}{\partial x} = \frac{2J}{r^2}\,\frac{x}{r} = \frac{2J\,x}{r^3}$$
and similarly for the partials with respect to y and z. Therefore the approximate equations of motion in the weak slow limit are
$$\frac{d^2 x}{dt^2} = \frac{J}{k}\,\frac{x}{r^3} \qquad \frac{d^2 y}{dt^2} = \frac{J}{k}\,\frac{y}{r^3} \qquad \frac{d^2 z}{dt^2} = \frac{J}{k}\,\frac{z}{r^3}$$
If we set J/k = -m, i.e., to the negative of the mass of the gravitating source, these are exactly the equations of motion for Newton's inverse-square attraction, since the acceleration vector -m(x, y, z)/r³ has magnitude m/r² and points toward the origin. Interestingly, this implies that precisely one of J, k is negative. If we choose to make J negative, then the

gravitational "potential" has the form gtt = 1 + 2|J|/r, which signifies that the potential would increase as we approach the source, as would the rate of proper time along a stationary worldline with respect to coordinate time. In such a universe the value of k would need to be positive in order for gravity to be attractive, i.e., in order for geodesics to converge on the gravitating source. On the other hand, if we choose to make J positive, so that the potential and the rate of proper time decrease as we approach the source, then the constant k must be negative. Referring back to the original line element, this implies an indefinite metric. Naturally we can scale our units so that |k| = 1, but the sign of k is significant. Thus from the observation that "things fall down" we can nearly infer the Minkowski metrical structure of spacetime. The fact that we can derive the correct trajectories of free-falling objects based on either of two diametrically opposed assumptions is not without precedent. This is very closely related to how Descartes and Newton were able to deduce the correct law of refraction based on the assumption that light travels more rapidly in denser media, while Fermat deduced the same law from the opposite assumption. In any case, taking k = 1 and J = m, we see that Newton's law of gravitation in the vacuum is Rµν = 0, closely paralleling the vacuum field equations of general relativity, which represents the vanishing of the Laplacian of g00/2. At a point with non-zero mass density we simply set this equal to 4πρ to give Poisson's equation. Hence is we define the energy-momentum array
$$T_{\mu\nu} = \begin{bmatrix} \rho & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}$$
we can express Newton's geometrical spacetime law of gravitation as
$$R_{\mu\nu} = 4\pi\,T_{\mu\nu}$$
This can be compared with Einstein's field equations
$$R_{\mu\nu} - \frac{1}{2}\,g_{\mu\nu}R = 8\pi\,T_{\mu\nu}$$
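Note that in vacuum the two laws coincide in form: contracting Einstein's equations (in the form written above) with g^{µν} gives R = -8πT, so wherever T_{µν} = 0 we have R = 0 and hence

$$R_{\mu\nu} - \frac{1}{2}\,g_{\mu\nu}R = 0 \quad\Longrightarrow\quad R_{\mu\nu} = 0$$

just as in the Newtonian version.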
Of course the "R" and "T" arrays in Newton's law are based on simple partial derivatives, rather than covariant differentiation, so they are not precisely identical to the Ricci tensor and the energy-momentum tensor of general relativity. However, the definitions are close enough that the tensors of general relativity can rightly be viewed as the natural generalizations of the simple Newtonian arrays. The above equations show that the acceleration of gravity is proportional to the rate of change of gtt as a function of r. At any given r we have dτ/dt =

, so gtt corresponds to the squared "rate of proper time"

(with respect to coordinate time) at the given r. It follows that our feet are younger than our heads, because time advances more slowly as we get closer to the center of the field. So, despite Newton's conception of the perfectly equable flow of time, his theory of gravitation can well be interpreted as a description of the effects of the inequable flow of time. In essence, the effect of Newtonian gravity can be explained in terms of the flow of time being slower near massive objects, and just as a refracted ray of light veers toward the medium in which light goes more slowly (and as a tank veers in the direction of the slower tread-track), objects progressing in time veer in the direction of slower proper time, causing them to accelerate toward massive objects.
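As a rough numerical illustration (using values not given above: in geometric units the Earth's mass is about m = 4.4 × 10⁻³ meters, and its radius about r = 6.4 × 10⁶ meters), the fractional difference in clock rate per meter of height near the Earth's surface is

$$\frac{\partial}{\partial r}\sqrt{g_{tt}} \approx \frac{m}{r^2} \approx \frac{4.4\times 10^{-3}}{\left(6.4\times 10^{6}\right)^2} \approx 1.1\times 10^{-16}\ \text{per meter}$$

so the head and feet of a standing person run at rates differing by roughly 2 × 10⁻¹⁶, which amounts to about half a microsecond over an 80-year lifetime.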

8.6 On Gauss's Mountains

Grossmann is getting his doctorate on a topic that is connected with fiddling around and non-Euclidean geometry. I don't know exactly what it is.
                                                       Einstein to Mileva Maric, 1902

One of the most famous stories about Gauss depicts him measuring the angles of the great triangle formed by the mountain peaks of Hohenhagen, Inselberg, and Brocken for evidence that the geometry of space is non-Euclidean. It's certainly true that Gauss acquired geodetic survey data during his ten-year involvement in mapping the Kingdom of Hanover during the years from 1818 to 1832, and this data included some large "test triangles", notably the one connecting those three mountain peaks, which could be used to check for accumulated errors in the smaller triangles. It's also true that Gauss understood how the intrinsic curvature of the Earth's surface would theoretically result in slight discrepancies when fitting the smaller triangles inside the larger triangles, although in practice this effect is negligible, because the Earth's curvature is so slight relative to even the largest triangles that can be visually measured on the surface. Still, Gauss computed the magnitude of this effect for the large test triangles because, as he wrote to Olbers, "the honor of science demands that one understand the nature of this inequality clearly". (The government officials who commissioned Gauss to perform the survey might have recalled Napoleon's remark that Laplace as head of the Department of the Interior had "brought the theory of the infinitely small to administration".)

It is sometimes said that the "inequality" which Gauss had in mind was the possible curvature of space itself, but taken in context it seems he was referring to the curvature of the Earth's surface. On the other hand, if the curvature of space was actually great enough to be observed in optical triangles of this size, then presumably Gauss would have noticed it, so we may still credit him with having performed an empirical observation of geometry, but in this same sense every person who ever lived has made such observations. It might be more meaningful to name people who have explicitly argued against the empirical status of geometry, i.e., who have claimed that the character of spatial relations could be known

without empirical observation. In his "Critique of Pure Reason", Kant famously declared that Euclidean geometry is the only possible way in which the mind can organize information about extrinsic spatial relations. One could also cite Plato and other idealists and a priorists. On the other hand, Poincare advocated a conventionalist view of geometry, arguing that we can always, if we wish, cast our physics within a Euclidean spatial framework - provided we are prepared to make whatever adjustments in our physical laws are necessary to preserve this convention. In any case, it seems reasonable to agree with Buhler, who concludes in his biography of Gauss that "the oft-told story according to which Gauss wanted to decide the question [of whether space is perfectly Euclidean] by measuring a particularly large triangle is, as far as we know, a myth."

The first person to publicly propose an actual test of the geometry of space was apparently Lobachevski, who suggested that one might "investigate a stellar triangle for an experimental resolution of the question." The "stellar triangle" he proposed was the star Sirius and two different positions of the Earth at 6-month intervals. This was used by Lobachevski as an example to show how we could place limits on the deviation from flatness of actual space, based on the fact that, in a hyperbolic space of constant curvature, there is a limit to how small a star's parallax can be, even for the most distant star. Gauss had already (in private correspondence with Taurinus in 1824) defined the "characteristic length" of a hyperbolic space, which he called "k", and had derived several formulas for the properties of such a space in terms of this parameter. For example, the circumference of a circle of radius r in a hyperbolic space whose "characteristic length" is k is given by
$$C = 2\pi k \,\sinh\!\left(\frac{r}{k}\right)$$
Since sinh(x) = x + x³/3! + ..., it follows that C approaches 2πr as k increases to infinity. From the fact that the maximum parallax of Sirius (as seen from the Earth at various times) is 1.24 seconds of arc, Lobachevski deduced that the value of k for our space must be at least 166,000 times the radius of the Earth's orbit. (This follows because in hyperbolic geometry the parallax of a star, however distant, cannot fall below roughly a/k radians for a baseline a, and 1.24 seconds of arc is about 6.0 × 10⁻⁶ radians, the reciprocal of which is about 166,000.) Naturally the same analysis for more distant stars gives an even larger lower bound on k. The first definite measurement of parallax for a fixed star was performed by Friedrich Bessel (a close friend of Gauss) in 1838, on the star 61 Cygni. Shortly thereafter he measured Sirius (and discovered its binary nature). Lobachevski's first paper on "the new geometry" was presented as a lecture at Kasan in 1826, followed by publications in 1829, 1835, 1840, and 1855 (a year before his death). He presented his lower bound on "k" in the later editions based on the still fairly recent experimental results of stellar parallax measurements. In 1855 Lobachevski was completely blind, so he dictated his exposition.

The other person credited with discovering non-Euclidean geometry, Janos Bolyai, was the son of Wolfgang Bolyai, who was a friend (almost the only friend) of Gauss during their school days at Gottingen in the late 1790's. The elder Bolyai had also been interested in the foundations of geometry, and spent many years trying to prove that Euclid's parallel postulate is a consequence of the other postulates. Eventually he

concluded that it had been a waste of time, and he became worried when his son Janos became interested in the same subject. The alarmed father wrote to his son:

For God's sake, I beseech you, give it up. Fear it no less than sensual passions because it, too, may take all your time, and deprive you of your health, peace of mind, and happiness in life.

Undeterred, Janos continued to devote himself to the study of the parallel postulate, and in 1829 he succeeded in proving just the opposite of what his father (and so many others) had tried in vain to prove. Janos found (as had Gauss, Taurinus, and Lobachevski just a few years earlier) that Euclid's parallel postulate is not a consequence of the other postulates, but is rather an independent assumption, and that alternative but equally consistent geometries based on different assumptions may be constructed. He called this the "Absolute Science of Space", and wrote to his father that "I have created a new universe from nothing". The father then, forgetting his earlier warnings, urged Janos to publish his findings as soon as possible, noting that

...ideas pass easily from one to another, and secondly... many things have an epoch, in which they are found at the same time in several places, just as violets appear on every side in spring.

Naturally the elder Bolyai sent a copy of his son's spectacular discovery to Gauss, in June of 1831, but it was apparently lost in the mail. Another copy was sent in January of 1832, and then seven weeks later Gauss sent a reply to his old friend:

If I commenced by saying that I am unable to praise this work, you would certainly be surprised for a moment. But I cannot say otherwise. To praise it would be to praise myself. Indeed the whole contents of the work, the path taken by your son, the results to which he is led, coincide almost entirely with my meditations, which have occupied my mind partly for the last thirty or thirty-five years. So I remained quite stupefied. So far as my own work is concerned, of which up till now I have put little on paper, my intention was not to let it be published during my lifetime. ... I have found very few people who could regard with any special interest what I communicated to them on this subject. ...it was my idea to write down all this later so that at least it should not perish with me. It is therefore a pleasant surprise for me that I am spared this trouble, and I am very glad that it is just the son of my old friend, who takes the precedence of me in such a remarkable manner.

In his later years Gauss' response to many communications of new mathematical results was similar to the above. For example, he once remarked that a paper of Abel's saved him the trouble of having to publish about a third of his results concerning elliptic integrals. Likewise he confided to friends that Jacobi and Eisenstein had "spared him the trouble" of publishing important results that he (Gauss) had possessed since he was a teenager, but had never bothered to publish. Dedekind even reports that Gauss made a similar comment about Riemann's dissertation. It's true that Gauss' personal letters and

notebooks substantiate to some extent his private claims of priority for nearly every major mathematical advance of the 19th century, but the full extent of his early and unpublished accomplishments did not become known until after his death, and in any case it wouldn't have softened the blow to his contemporaries. Janos Bolyai was so embittered by Gauss's backhanded response to his non-Euclidean geometry that he never published again.

As another example of what Wolfgang Bolyai called "violets appearing on every side", Maxwell's great 1865 triumph of showing that electromagnetic waves propagate at the speed of light was, to some degree, anticipated by others. In 1848 Kirchhoff had noted that the ratio of electromagnetic and electrostatic units was equal to the speed of light, although he gave no explanation for this coincidence. In 1858 Riemann presented a theory based on the hypothesis that electromagnetic effects propagate at a fixed speed, and then deduced that this speed must equal the ratio of electromagnetic and electrostatic units, i.e.,
$$c = \frac{1}{\sqrt{\mu_0\,\epsilon_0}}$$
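In modern SI terms, with µ0 = 4π × 10⁻⁷ H/m and ε0 ≈ 8.854 × 10⁻¹² F/m, this ratio works out to

$$\frac{1}{\sqrt{\mu_0\,\epsilon_0}} \approx 2.998\times 10^{8}\ \text{m/s}$$

in agreement with the measured speed of light.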

Even in this field we find that Gauss can plausibly claim priority for some interesting developments. Recall that, in addition to being the foremost mathematician of his day, Gauss was also prominent in studying the phenomena of electricity and magnetism (in fact the unit of magnetism is called a Gauss), and even dabbled in electrodynamics. As mentioned in Section 3.5, he reached the conclusion that the keystone of electrodynamics would turn out to depend on an understanding of how electric effects propagate in time. In 1835 he wrote (in an unpublished paper, discovered after his death) that

Two elements of electricity in a state of relative motion attract or repel one another, but not in the same way as if they are in a state of relative rest.

He even suggested the following mathematical form for the complete electromagnetic force F between two particles with charges q1 and q2 in arbitrary states of motion
$$F = \frac{q_1 q_2}{r^2}\left[1 + \frac{1}{c^2}\left(\mathbf{u}\cdot\mathbf{u} - \frac{3}{2}\,\dot{r}^2\right)\right]$$
where r is the scalar distance, r is the vector distance, u is the relative velocity between the particles, and dots signify derivatives with respect to time. This formula actually gives the correct results for particles in uniform (inertial) motion, in which case the second derivative of the vector r is zero. However, the dot product in Gauss’s formula violates conservation of energy for general motions. A few years later (in 1845), Gauss’s friend Wilhelm Weber proposed a force law identical to Gauss’s, except he excluded the dot product, i.e., he proposed the formula
$$F = \frac{q_1 q_2}{r^2}\left(1 - \frac{\dot{r}^2}{2c^2} + \frac{r\,\ddot{r}}{c^2}\right) \qquad (1)$$
Weber pointed out that, unlike Gauss’s original formula, this force law satisfies conservation of energy, as shown by the fact that it can be derived from the potential function
$$\psi = \frac{q_1 q_2}{r}\left(1 - \frac{\dot{r}^2}{2c^2}\right)$$
In terms of this potential, the force given by F = dψ/dr is precisely Weber’s force law. Equation (1) was used by Weber as the basis of his theory of electrodynamics published in 1846. Indeed this formula served as the basis for most theoretical studies of electromagnetism until it was finally superseded by Maxwell's theory beginning in the 1870s. It’s interesting that in order for energy to be conserved it was necessary to eliminate the vectors from Gauss’s formula, making the result entirely in terms of the scalar distance and its derivatives. Compare this with the separation equations discussed in Sections 4.2 and 4.4. Note that according to (1) the condition for the force between two charged particles to vanish is that the quantity in parentheses equals zero, i.e.,
$$1 - \frac{\dot{r}^2}{2c^2} + \frac{r\,\ddot{r}}{c^2} = 0$$
Differentiating both sides and dividing by r gives the condition d³r/dt³ = 0, which is the same as equation (4) of Section 4.2 if we set N = 0. (The vanishing of the third derivative is also the condition for zero radiation reaction according to the Lorentz-Dirac equations of classical electrodynamics.) Interestingly, Karl Schwarzschild published a paper in 1903 describing in detail how the Gauss-Weber approach could actually have been developed into a viable theory. In any case, if the two charged particles are separating (without rotation) at a uniform speed u, Gauss' formula relates the electrostatic force F0 = q1q2/r² to the dynamic force as
$$F = F_0\left(1 - \frac{u^2}{2c^2}\right)$$
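Taking Gauss's expression in the form given above, this follows directly, since for uniform separation without rotation the relative velocity is purely radial (dr/dt = u), so that u·u = u², and hence

$$F = \frac{q_1 q_2}{r^2}\left[1 + \frac{1}{c^2}\left(u^2 - \frac{3}{2}\,u^2\right)\right] = F_0\left(1 - \frac{u^2}{2c^2}\right)$$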
So, to press the point, one could argue that Gauss' offhand suggestion for the formula expressing electrodynamic force already represents the seeds of Lorentz's molecular force hypothesis, from which follows the length contraction and time dilation of the Lorentz transformations and special relativity. In fact, pursuing this line of thought, Riemann (one of Gauss’ successors at Gottingen) proposed in 1858 that the electric potential should satisfy the equation
$$\nabla^2\varphi - \frac{1}{c^2}\,\frac{\partial^2\varphi}{\partial t^2} = -4\pi\rho$$
where ρ is the charge density. This equation does indeed give the retarded electrostatic

potential, which, combined with the similar equation for the vector potential, serves as the basis for the whole classical theory of electromagnetism. Assuming conservation of charge, the invariance of the Minkowski spacetime metric clearly emerges from this equation, as does the invariance of the speed of light in terms of any suitable (i.e., inertially homogeneous and isotropic) system of coordinates.

8.7 Strange Meeting

It seemed that out of battle I escaped
Down some profound dull tunnel...
                                                       Wilfred Owen (1893-1918)

In the summer of 1913 Einstein accepted an offer of a professorship at the University of Berlin and membership in the Prussian Academy of Sciences. He left Zurich in the spring of 1914, and his inaugural address before the Prussian Academy took place on July 2, 1914. A month later, Germany was at war with Belgium, Russia, France, and Britain. Surprisingly, the world war did not prevent Einstein from continuing his intensive efforts to generalize the theory of relativity so as to make it consistent with gravitation - but his marriage almost did. By April of 1915 he was separated from his wife Mileva and their two young sons, who had once again taken up residence in Zurich. The marriage was not a happy one, and he later wrote to his friend Besso that if he had not kept her at a distance, he would have been worn out, physically and emotionally. Besso and Fritz Haber (Einstein's close friend and colleague) both made efforts to reconcile Albert and Mileva, but without success. It was also during this period that Haber was working for the German government to develop poison gas for use in the war. On April 22, 1915 Haber directed the release of chlorine gas on the Western Front at Ypres in Belgium.

On May 23rd Italy declared war on Austria-Hungary, and subsequently against Germany itself. Meanwhile an Allied army was engaged in a disastrous campaign to take the Gallipoli Peninsula from Germany's ally, the Turks. Germany shifted the weight of its armies to the Eastern Front during this period, hoping to knock Russia out of the war while fighting a holding action against the French and British in the West. In a series of huge battles from May to September the Austro-German armies drove the Russians back 300 miles, taking Poland and Lithuania and eliminating the threat to East Prussia. Despite these defeats, the Russians managed to re-form their lines and stay in the war (at least for another two years). The astronomer Karl Schwarzschild was stationed with the German Army in the East, but still kept close watch on Einstein's progress, which was chronicled like a serialized Dickens novel in almost weekly publications of the Berlin Academy.

Toward the end of 1915, having failed to drive Russia out of the war, the main German armies were shifted back to the Western Front. Falkenhayn (the chief of the German general staff) was now convinced that a traditional offensive breakthrough was not feasible, and that Germany's only hope of ultimately ending the war on favorable terms was to engage the French in a war of attrition. His plan was to launch a methodical and

sustained assault on a position that the French would feel honor-bound to defend to the last man. The ancient fortress of Verdun ("they shall not pass") was selected, and the plan was set in motion early in 1916. Falkenhayn had calculated that only one German soldier would be killed in the operation for every three French soldiers, so they would "bleed the French white" and break up the Anglo-French alliance. However, the actual casualty ratio turned out to be four Germans for every five French. By the end of 1916 a million men had been killed at Verdun, with no decisive change in the strategic position of either side, and the offensive was called off.

At about the same time that Falkenhayn was formulating his plans for Verdun, on Nov 25, 1915, Einstein arrived at the final form of the field equations for general relativity. After a long and arduous series of steps (and mis-steps), he was able to announce that "finally the general theory of relativity is closed as a logical structure". Given the subtlety and complexity of the equations, one might have expected that rigorous closed-form solutions for non-trivial conditions would be difficult, if not impossible, to find. Indeed, Einstein's computations of the bending of light, the precession of Mercury's orbit, and the gravitational redshift were all based on approximate solutions in the weak field limit. However, just two months later, Schwarzschild had the exact solution for the static isotropic field of a mass point, which Einstein presented on his behalf to the Prussian Academy on January 16, 1916. Sadly, Schwarzschild lived only another four months. He became ill at the front and died on May 11 at the age of 42.

It's been said that Einstein was scandalized by Schwarzschild's solution, for two reasons. First, he still imagined that the general theory might be the realization of Mach's dream of a purely relational theory of motion, and Einstein realized that the fixed spherically symmetrical spacetime of a single mass point in an otherwise empty universe is highly non-Machian. That such a situation could correspond to a rigorous solution of his field equations came as something of a shock, and probably contributed to his eventual rejection of Mach's ideas and positivism in general. Second, the solution found by Schwarzschild - which was soon shown by Birkhoff to be the unique spherically symmetric solution to the field equations (barring a non-zero cosmological constant) - contained what looked like an unphysical singularity. Of course, since the source term was assumed to be an infinitesimal mass point, a singularity at r = 0 is perhaps not too surprising (noting that Newton's inverse square law is also singular at r = 0). However, the Schwarzschild solution was also (apparently) singular at r = 2m, where m is the mass of the gravitating object in geometric units. Einstein and others argued that it wasn't physically realistic for a configuration of particles of total mass m to reside within their joint Schwarzschild radius r = 2m, and so this "singularity" cannot exist in reality. However, subsequent analyses have shown that (barring some presently unknown phenomenon) there is nothing to prevent a sufficiently massive object from collapsing to within its Schwarzschild radius, so it's worthwhile to examine the formal singularity at r = 2m to understand its physical significance.
We find that the spacetime manifold at this boundary need not be considered as singular, because it can be shown that the singularity is removable, in the sense that all the invariant measures of the field smoothly approach fixed finite values as r approaches 2m from

either direction. Thus we can analytically continue the solution through the singularity. Now, admittedly, describing the Schwarzschild boundary as an "analytically removable singularity" is somewhat unorthodox. It's customary to assert that the Schwarzschild solution is unequivocally non-singular at r = 2m, and that the intrinsic curvature and proper time of a free-falling object are finite and well-behaved at that radius. Indeed we derived these facts in Section 6.4. However, it's worth remembering that even with respect to the proper frame of an infalling test particle, we found that there remains a formal singularity at r = 2m. (See the discussion following equation 5 of Section 6.4.) The free-falling coordinate system does not remove the singularity, but it makes the singularity analytically removable. Similarly our derivation in Section 6.4 of the intrinsic curvature K of the Schwarzschild solution at r = 2m tacitly glossed over the intermediate result

Strictly speaking, the middle term on the right side is 0/0 (i.e., undefined) at r = 2m. Of course, we can divide the numerator and denominator by (r - 2m), but this step is unambiguously valid only if (r - 2m) is not equal to zero. If (r - 2m) does equal zero, this cancellation is still possible, but it amounts to the analytic removal of a singularity. In addition, once we have removed this singularity, the resulting term is infinite, formally equal to the third term, which is also infinite, but with opposite sign. We then proceed to subtract the infinite third term from the infinite second term to arrive at the innocuous-looking finite result K = -2m/r³ at r = 2m.

Granted, the form of the metric coefficients and their derivatives depends on the choice of coordinates, and in a sense we can attribute the troublesome behavior of the metric components at r = 2m to the unsuitability of the traditional Schwarzschild coordinates r,t at this location. From this we might be tempted to conclude that the Schwarzschild radius has no physical significance. This is true locally, but globally the Schwarzschild radius is physically significant, as the event horizon between two regions of the manifold. Hence it isn't surprising that, in terms of the r,t coordinates, we encounter singularities and infinities, because these coordinates are globally unique, viz., the Schwarzschild coordinate t is the essentially unique time coordinate for which the manifold is globally static.

Interestingly, the solution in Schwarzschild's 1916 paper was not presented in terms of what we today call Schwarzschild coordinates. Those were introduced a year later by Droste. Schwarzschild presented a line element that is formally identical to the one for which he is known, viz.,
$$(d\tau)^2 = \left(1 - \frac{\alpha}{R}\right)(dt)^2 - \frac{(dR)^2}{1 - \alpha/R} - R^2\left[(d\theta)^2 + \sin^2\theta\,(d\phi)^2\right] \qquad (1)$$
In this formula the coordinates t, θ, and ϕ have their usual meanings, and the parameter α is to be identified with 2m as usual. However, he did not regard "R" as the physically

significant radial distance from the center of the field. He begins by declaring a set of rectangular space coordinates x,y,z, and then defines the radial parameter r such that r² = x² + y² + z². Accordingly he relates these parameters to the angular coordinates θ and ϕ by the usual polar definitions
$$x = r\sin\theta\cos\phi \qquad y = r\sin\theta\sin\phi \qquad z = r\cos\theta$$
He wishes to make use of the truncated field equations
$$\frac{\partial \Gamma^{\alpha}_{\mu\nu}}{\partial x^{\alpha}} + \Gamma^{\alpha}_{\mu\beta}\,\Gamma^{\beta}_{\nu\alpha} = 0$$
which (as discussed in Section 5.8) requires that the determinant of the metric be constant. Remember that this was written in 1915 (formally conveyed by Einstein to the Prussian Academy on 13 January 1916), and apparently Schwarzschild was operating under the influence of Einstein's conception of the condition g = -1 as a physical principle, rather than just a convenience enabling the use of the truncated field equations. In any case, this is the form that Schwarzschild set out to solve, and he realized that the metric components of the most general spherically symmetrical static polar line element
$$(d\tau)^2 = f(r)\,(dt)^2 - h(r)\,(dr)^2 - r^2\left[(d\theta)^2 + \sin^2\theta\,(d\phi)^2\right]$$
where f and h are arbitrary functions of r, has the determinant g = f(r)h(r)r⁴sin²(θ). (Schwarzschild actually included an arbitrary function of r on the angular terms of the line element, but that was superfluous.) To simplify the determinant condition he introduces the transformation
$$x_1 = \frac{r^3}{3} \qquad x_2 = -\cos\theta \qquad x_3 = \phi$$
from which we get the differentials
$$dx_1 = r^2\,dr \qquad dx_2 = \sin\theta\,d\theta \qquad dx_3 = d\phi$$
Substituting these into the general line element gives the transformed line element
$$(d\tau)^2 = f\,(dt)^2 - \frac{h}{r^4}\,(dx_1)^2 - \frac{r^2}{1 - x_2^2}\,(dx_2)^2 - r^2\left(1 - x_2^2\right)(dx_3)^2$$
which has the determinant g = f(r)h(r). Schwarzschild then requires this to equal -1, so his derivation essentially assumes a priori that h(r) = 1/f(r). Interestingly, with this

assumption it's easy to see that there is really only one function f(r) that can yield Kepler's laws of motion, as discussed in Section 5.5. Hence it could be argued that the field equations were superfluous to the determination of the spherically symmetrical static spacetime metric. On the other hand, the point of the exercise was to verify that this one physically viable metric is actually a solution of the field equations, thereby supporting their general applicability.

In any case, noting that r = (3x₁)^(1/3) and sin²(θ) = 1 - (x₂)², and with the stipulation that h(r) = 1/f(r), and that the metric go over to the Minkowski metric as r goes to infinity, Schwarzschild essentially showed that Einstein's field equations are satisfied by the above line element if f(r) = 1 - α/r, where α is a constant of integration that "depends on the value of the mass at the origin". Naturally we take α = 2m for agreement with observation in the Newtonian limit. However, in the process of integrating the conditions on f(r) there appears another constant of integration, which Schwarzschild calls ρ. So the general solution is actually
$$f = 1 - \frac{\alpha}{\left(r^3 + \rho\right)^{1/3}}$$
We ordinarily take α = 2m and ρ = 0 to give the usual result f(r) = 1 - α/r, but Schwarzschild was concerned to impose an additional constraint on the solution (beyond spherical symmetry, staticality, asymptotic flatness, and the field equations), which he expressed as "continuity of the [metric coefficients], except at r = 0". The metric coefficient h(r) = 1/f(r) is obviously discontinuous when f(r) vanishes, which is to say when r³ + ρ = α³. With the usual choice ρ = 0 this implies that the metric is discontinuous when r = α = 2m, which of course it is. This is the infamous Schwarzschild radius, where the usual Schwarzschild time coordinate becomes singular, representing the event horizon of a black hole.

In retrospect, Schwarzschild's requirement for "continuity of the metric coefficients" is obviously questionable, since a discontinuity or singularity of a coordinate system is not generally indicative of a singularity in the manifold - the classical example being the singularity of polar coordinates at the North pole. Probably Schwarzschild meant to impose continuity on the manifold itself, rather than on the coordinates, but as Einstein remarked, "it is not so easy to free one's self from the idea that coordinates must have a direct metric significance". It's also somewhat questionable to impose continuity and absence of singularities except at the origin, because if this is a matter of principle, why should there be an exception, and why at the "origin" of the spherically symmetrical coordinate system? Nevertheless, following along with Schwarzschild's thought, he obviously needs to require that the equality r³ + ρ = α³ be satisfied only when r = 0, which implies ρ = α³. Consequently he argues that the expression (r³ + ρ)^(1/3) should not be reduced to r. Instead, he defines the parameter R = (r³ + ρ)^(1/3), in terms of which the metric has the familiar form (1). Of course, if we put ρ = 0 then R = r and equation (1) reduces to the usual form of the Schwarzschild/Droste solution. However, with ρ = α³ we appear to have a physically distinct result, free of any coordinate singularity except at r = 0, which corresponds to the

location R = α. The question then arises as to whether this is actually a physically distinct solution from the usual one. From the definitions of the quasi-orthogonal coordinates x,y,z we see that x = y = z = 0 when r = 0, but of course the x,y,z coordinates also take on negative values at various points of the manifold, and nothing prevents us from extending the solution to negative values of the parameter r, at least not until we arrive at the condition R = 0, which corresponds to r = -α. At this location it can be shown that we have a genuine singularity in the manifold, because the curvature scalar becomes infinite. In terms of these coordinates the entire surface of the Schwarzschild horizon has the same spatial coordinates x = y = z = 0, but nothing prevents us from passing through this point into negative values of r. It may seem that by passing into negative values of x,y,z we are simply increasing r again, but this overlooks the duality of solutions to
$$r^2 = x^2 + y^2 + z^2$$
The distinction between the regions of positive and negative r is clearly shown in terms of polar coordinates, because the point in the equatorial plane with polar coordinates r,0 need not be identified with the point -r,π. Essentially polar coordinates cover two separate planes, one with positive r and the other with negative r, and the only smooth path between them is through the boundary point r = 0. According to Schwarzschild's original conception of the coordinates, this boundary point is the event horizon, whereas the physical singularity in the manifold occurs at the surface of a sphere of radius r = -2m (i.e., at R = 0). In other words, the singularity at the "center" of the Schwarzschild solution occurs just on the other side of the boundary point r = 0 of these polar coordinates. We can shift this boundary point arbitrarily by simply shifting the "zero point" of the complete r scale, which actually extends from -∞ to +∞. However, none of this changes any of the proper intervals along any physical paths, because those are invariant under arbitrary (diffeomorphic) transformations. So Schwarzschild's version of the solution is not physically distinct from the usual interpretation introduced by Droste in 1917.

It's interesting that as late as 1936 (two decades after Schwarzschild's death) Einstein proposed to eliminate the coordinate singularity in the (by then) conventional interpretation of the Schwarzschild solution by defining a radial coordinate ρ in terms of the Droste coordinate r by the relation ρ² = r - 2m. In terms of this coordinate the line element is
$$(d\tau)^2 = \frac{\rho^2}{\rho^2 + 2m}\,(dt)^2 - 4\left(\rho^2 + 2m\right)(d\rho)^2 - \left(\rho^2 + 2m\right)^2\left[(d\theta)^2 + \sin^2\theta\,(d\phi)^2\right]$$
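This form can be checked by direct substitution: with r = ρ² + 2m we have dr = 2ρ dρ and 1 - 2m/r = ρ²/(ρ² + 2m), so the radial term of the Schwarzschild line element becomes

$$\left(1 - \frac{2m}{r}\right)^{-1}(dr)^2 = \frac{\rho^2 + 2m}{\rho^2}\,\left(2\rho\,d\rho\right)^2 = 4\left(\rho^2 + 2m\right)(d\rho)^2$$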
Einstein notes that as ρ ranges from -∞ to +∞ the corresponding values of r range from +∞ down to 2m and then back to +∞, so he conceives of the complete solution as two identical sheets of physical space connected by the "bridge" at the boundary ρ = 0, where r = 2m and the determinant of the metric vanishes. This is called the Einstein-Rosen bridge. For values of r less than 2m he argues that "there are no corresponding real

values of ρ". On this basis he asserts that the region r < 2m has been excluded from the solution. However, this is really just another re-expression of the original Schwarzschild solution, describing the "exterior" portions of the solution, but neglecting the interior portion, where ρ is imaginary. However, just as we can allow Schwarzschild's r to take on negative values, we can allow Einstein's ρ to take on imaginary values. The maximal analytic extension of the Schwarzschild solution necessarily includes the interior region, and it can't be eliminated simply by a change of variables. Ironically, the reason the manifold seems to be well-behaved across Einstein's "bridge" between the two exterior regions while jumping over the interior region is precisely that the ρ coordinate is locally ill-behaved at ρ = 0. Birkhoff proved that the Schwarzschild solution is the unique spherically symmetrical solution of the field equations, and it has been shown that the maximal analytic extension of this solution (called the Kruskal extension) consists of two exterior regions connected by the internal region, and contains a genuine manifold singularity. On the other hand, just because the maximally extended Schwarzschild solution satisfies the field equations, it doesn't necessarily follow that such a thing exists. In fact, there is no known physical process that would produce this configuration, since it requires two asymptotically flat regions of spacetime that happen to become connected at a singularity, and there is no reason to believe that such a thing would ever happen. In contrast, it's fairly plausible that some part of the complete Schwarzschild solution could be produced, such as by the collapse of a sufficiently massive star. The implausibility of the maximally extended solutions doesn't preclude the existence of black holes - although it does remind us to be cautious about assuming the actual existence of things just because they are solutions of the field equations. Despite the implausibility of an Einstein-Rosen bridge connecting two distinct sheets of spacetime, this idea has recently gained widespread attention, the term "bridge" having been replaced with "wormhole". It's been speculated that under certain conditions it might be possible to actually traverse a wormhole, passing from one region of spacetime to another. As discussed above this is definitely not possible for the Schwarzschild solution, because of the unavoidable singularity, but people have recently explored the possibilities of traversable wormholes. Naturally if such direct conveyance between widely separate regions of spacetime were possible, and if those regions were also connected by (much longer) ordinary timelike paths, this raises the prospect of various kinds of "time travel", assuming a wormhole connected to the past was somehow established and maintained. However, these rather far-fetched scenarios all rely on the premise of negative energy density, which of course violates so-called "null energy condition", not to mention the weak, strong, and dominant energy conditions of classical relativity. In other words, on the basis of classical relativity and the traditional energy conditions we could rule out traversable wormholes altogether. It is only the fact that some quantum phenomena do apparently violate these energy conditions (albeit very slightly) that leaves open the remote possibility of such things.

8.8 Who Invented Relativity?

All beginnings are obscure.
                                                       H. Weyl

There have been many theories of relativity throughout history, from the astronomical speculations of Heraclides to the geometry of Euclid to the classical theory of space, time, and dynamics developed by Galileo, Newton and others. Each of these was based on one or more principles of relativity. However, when we refer to the "theory of relativity" today, we usually mean one particular theory of relativity, namely, the body of ideas developed near the beginning of the 20th century and closely identified with the work of Albert Einstein. These ideas are distinguished from previous theories not by relativity itself, but by the way in which relativistically equivalent coordinate systems are related to each other.

One of the interesting historical aspects of the modern relativity theory is that, although often regarded as the highly original and even revolutionary contribution of a single individual, almost every idea and formula of the theory had been anticipated by others. For example, Lorentz covariance and the inertia of energy were both (arguably) implicit in Maxwell's equations. Also, Voigt formally derived the Lorentz transformations in 1887 based on general considerations of the wave equation. In the context of electrodynamics, Fitzgerald, Larmor, and Lorentz had all, by the 1890s, arrived at the Lorentz transformations, including all the peculiar "time dilation" and "length contraction" effects (with respect to the transformed coordinates) associated with Einstein's special relativity. By 1905, Poincare had clearly articulated the principle of relativity and many of its consequences, had pointed out the lack of empirical basis for absolute simultaneity, had challenged the ontological significance of the ether, and had even demonstrated that the Lorentz transformations constitute a group in the same sense as do Galilean transformations. In addition, the crucial formal synthesis of space and time into spacetime was arguably the contribution of Minkowski in 1907, and the dynamics of special relativity were first given in modern form by Lewis and Tolman in 1909. Likewise, the Riemann curvature and Ricci tensors for n-dimensional manifolds, the tensor formalism itself, and even the crucial Bianchi identities, were all known prior to Einstein's development of general relativity in 1915.

In view of this, is it correct to regard Einstein as the sole originator of modern relativity? The question is complicated by the fact that relativity is traditionally split into two separate theories, the special and general theories, corresponding to the two phases of Einstein's historical development, and the interplay between the ideas of Einstein and those of his predecessors and contemporaries is different in the two cases. In addition, the title of Einstein's 1905 paper ("On the Electrodynamics of Moving Bodies") encouraged the idea that it was just an interpretation of Lorentz's theory of electrodynamics. Indeed, Wilhelm Wien proposed that the Nobel prize of 1912 be awarded jointly to Lorentz and Einstein, saying

The principle of relativity has eliminated the difficulties which existed in electrodynamics and has made it possible to predict for a moving system all electrodynamic phenomena which are known for a system at rest... From a purely logical point of view the relativity principle must be considered as one of the most significant accomplishments ever achieved in theoretical physics... While Lorentz must be considered as the first to have found the mathematical content of relativity, Einstein succeeded in reducing it to a simple principle. One should therefore assess the merits of both investigators as being comparable.

As it happens, the physics prize for 1912 was awarded to Nils Gustaf Dalen (for the "invention of automatic regulators for lighting coastal beacons and light buoys during darkness or other periods of reduced visibility"), and neither Einstein, Lorentz, nor anyone else was ever awarded a Nobel prize for either the special or general theories of relativity. This is sometimes considered to have been an injustice to Einstein, although in retrospect it's conceivable that a joint prize for Lorentz and Einstein in 1912, as Wien proposed, assessing "the merits of both investigators as being comparable", might actually have diminished Einstein's subsequent popular image as the sole originator of both special and general relativity.

On the other hand, despite the somewhat misleading title of Einstein's paper, the second part of the paper ("The Electrodynamic Part") was really just an application of the general theoretical framework developed in the first part of the paper ("The Kinematic Part"). It was in the first part that special relativity was founded, with consequences extending far beyond Lorentz's electrodynamics. As Einstein later recalled,

The new feature was the realization that the bearing of the Lorentz transformation transcended its connection with Maxwell's equations and was concerned with the nature of space and time in general.

To give just one example, we may note that prior to the advent of special relativity the experimental results of Kaufmann and others involving the variation of an electron's mass with velocity were thought to imply that all of the electron's mass must be electromagnetic in origin, whereas Einstein's kinematics revealed that all mass – regardless of its origin – would necessarily be affected by velocity in the same way. Thus an entire research program, based on the belief that the high-speed behavior of objects represented dynamical phenomena, was decisively undermined when Einstein showed that the phenomena in question could be interpreted much more naturally on a purely kinematic basis. Now, if this interpretation applied only to electrodynamics, its significance might be debatable, but already by 1905 it was clear that, as Einstein put it, "the Lorentz transformation transcended its connection with Maxwell's equations", and must apply to all physical phenomena in order to account for the complete inability to detect absolute motion. Once this is recognized, it is clear that we are dealing not just with properties of electricity and magnetism, or any other specific entities, but with the nature of space and time themselves. This is the aspect of Einstein's 1905 theory that prompted Witkowski, after reading vol. 17 of Annalen der Physik, to exclaim: "A new Copernicus is born! Read Einstein's paper!" The comparison is apt, because the

contribution of Copernicus was, after all, essentially nothing but an interpretation of Ptolemy’s astronomy, just as Einstein's theory was an interpretation of Lorentz's electrodynamics. Only subsequently did men like Kepler, Galileo, and Newton, taking the Copernican insight even more seriously than Copernicus himself had done, develop a substantially new physical theory. It's clear that Copernicus was only one of several people who jointly created the "Copernican revolution" in science, and we can argue similarly that Einstein was only one of several individuals (including Maxwell, Lorentz, Poincare, Planck, and Minkowski) responsible for the "relativity revolution". The historical parallel between special relativity and the Copernican model of the solar system is not merely superficial, because in both cases the starting point was a preexisting theoretical structure based on the naive use of a particular system of coordinates lacking any inherent physical justification. On the basis of these traditional but eccentric coordinate systems it was natural to imagine certain consequences, such as that both the Sun and the planet Venus revolve around a stationary Earth in separate orbits. However, with the newly-invented telescope, Galileo was able to observe the phases of Venus, clearly showing that Venus moves in (roughly) a circle around the Sun. In this way the intrinsic patterns of the celestial bodies became better understood, but it was still possible (and still is possible) to regard the Earth as stationary in an absolute extrinsic sense. In fact, for many purposes we continue to do just that, but from an astronomical standpoint we now almost invariably regard the Sun as the "center" of the solar system. Why? The Sun too is moving among the stars in the galaxy, and the galaxy itself is moving relative to other galaxies, so on what basis do we decide to regard the Sun as the "center" of the solar system? The answer is that the Sun is the inertial center. In other words, the Copernican revolution (as carried to its conclusion by the successors of Copernicus) can be summarized as the adoption of inertia as the prime organizing principle for the understanding and description of nature. The concept of physical inertia was clearly identified, and the realization of its significance evolved and matured through the works of Kepler, Galileo, Newton, and others. Nature is most easily and most perspicuously described in terms of inertial coordinates. Of course, it remains possible to adopt some non-inertial system of coordinates with respect to which the Earth can be regarded as the stationary center, but there is no longer any imperative to do this, especially since we cannot thereby change the fact that Venus circles the Sun, i.e., we cannot change the intrinsic relations between objects, and those intrinsic relations are most readily expressed in terms of inertial coordinates. Likewise the pre-existing theoretical structure in 1905 described events in terms of coordinate systems that were not clearly understood and were lacking in physical justification. It was natural within this framework to imagine certain consequences, such as anisotropy in the speed of light, i.e., directional dependence of light speed resulting from the Earth's motion through the (assumed stationary) ether. This was largely motivated by the idea that light consists of a wave in the ether, and therefore is not an inertial phenomenon. 
However, experimental physicists in the late 1800's began to discover facts analogous to the phases of Venus, e.g., the symmetry of electromagnetic

induction, the "partial convection" of light in moving media, the isotropy of light speed with respect to relatively moving frames of reference, and so on. Einstein accounted for all these results by showing that they were perfectly natural if things are described in terms of inertial coordinates - provided we apply a more profound understanding of the definition and physical significance of such coordinate systems and the relationships between them. As a result of the first inertial revolution (initiated by Copernicus), physicists had long been aware of the existence of a preferred class of coordinate systems - the inertial systems - with respect to which inertial phenomena are isotropic. These systems are equivalent up to orientation and uniform motion in a straight line, and it had always been tacitly assumed that the transformation from one system in this class to another was given by a Galilean transformation. The fundamental observations in conflict with this assumption were those involving electric and magnetic fields that collectively implied Maxwell's equations of electromagnetism. These equations are not invariant under Galilean transformations, but they are invariant under Lorentz transformations. The discovery of Lorentz invariance was similar to the discovery of the phases of Venus, in the sense that it irrevocably altered our awareness of the intrinsic relations between events. We can still go on using coordinate systems related by Galilean transformations, but we now realize that only one of those systems (at most) is a truly inertial system of coordinates. Incidentally, the electrodynamic theory of Lorentz was in some sense analogous to Tycho Brahe's model of the solar system, in which the planets revolve around the Sun but the Sun revolves around a stationary Earth. Tycho's model was kinematically equivalent to Copernicus' Sun-centered model, but expressed – awkwardly – in terms of a coordinate system with respect to which the Earth is stationary, i.e., a non-inertial coordinate system. It's worth noting that we define inertial coordinates just as Galileo did, i.e., systems of coordinates with respect to which inertial phenomena are isotropic, so our definition hasn't changed. All that has changed is our understanding of the relations between inertial coordinate systems. Einstein's famous "synchronization procedure" (which was actually first proposed by Poincare) was expressed in terms of light rays, but the physical significance of this procedure is due to the empirical fact that it yields exactly the same synchronization as does Galileo's synchronization procedure based on mechanical inertia. To establish simultaneity between spatially separate events while floating freely in empty space, throw two identical objects in opposite directions with equal force, so that the thrower remains stationary in his original frame of reference. These objects then pass equal distances in equal times, i.e., they serve to assign inertially simultaneous times to separate events as they move away from each other. In this way we can theoretically establish complete slices of inertial simultaneity in spacetime, based solely on the inertial behavior of material objects. Someone moving uniformly relative to us can carry out this same procedure with respect to his own inertial frame of reference and establish his own slices of inertial simultaneity throughout spacetime. 
The unavoidable intrinsic relations that were discovered at the end of the 19th century show that these two sets of simultaneity slices are not identical. The two main approaches to the interpretation of

these facts were discussed in Sections 1.5 and 1.6. The approach advocated by Einstein was to adhere to the principle of inertia as the basis for organizing our understanding and descriptions of physical phenomena - which was certainly not a novel idea. In his later years Einstein observed "there is no doubt that the Special Theory of Relativity, if we regard its development in retrospect, was ripe for discovery in 1905". The person (along with Lorentz) who most nearly anticipated Einstein's special relativity was undoubtedly Poincare, who had already in 1900 proposed an explicitly operational definition of clock synchronization and in 1904 suggested that the ether was in principle undetectable to all orders of v/c. Those two propositions and their consequences essentially embody the whole of special relativity. Nevertheless, as late as 1909 Poincare was not prepared to say that the equivalence of all inertial frames combined with the invariance of (two-way) light speed were sufficient to infer Einstein's model. He maintained that one must also stipulate a particular contraction of physical objects in their direction of motion. This is sometimes cited as evidence that Poincare still failed to understand the situation, but there's a sense in which he was actually correct. The two famous principles of Einstein's 1905 paper are not sufficient to uniquely identify special relativity, as Einstein himself later acknowledged. One must also stipulate, at the very least, homogeneity, memorylessness, and isotropy. Of these, the first two are rather innocuous, and one could be forgiven for failing to explicitly mention them, but not so the assumption of isotropy, which serves precisely to single out Einstein's simultaneity convention from all the other - equally viable - interpretations. (See Section 4.5). This is also precisely the aspect that is fixed by Poincare's postulate of contraction as a function of velocity. In a sense, the failure of Poincare to found the modern theory of relativity was not due to a lack of discernment on his part (he clearly recognized the Lorentz group of space and time transformations), but rather to an excess of discernment and philosophical sophistication, preventing him from subscribing to the young patent examiner's inspired but perhaps slightly naive enthusiasm for the symmetrical interpretation, which is, after all, only one of infinitely many possibilities. Poincare recognized too well the extent to which our physical models are both conventional and provisional. In retrospect, Poincare's scruples have the appearance of someone arguing that we could just as well regard the Earth rather than the Sun as the center of the solar system, i.e., his reservations were (and are) technically valid, but in some sense misguided. Also, as Max Born remarked, to the end of Poincare’s life his expositions of relativity “definitely give you the impression that he is recording Lorentz’s work”, and yet “Lorentz never claimed to be the author of the principle of relativity”, but invariably attributed it to Einstein. Indeed Lorentz himself often expressed reservations about the relativistic interpretation. Regarding Born’s impression that Poincare was just “recording Lorentz’s work”, it should be noted that Poincare habitually wrote in a self-effacing manner. He named many of his discoveries after other people, and expounded many important and original ideas in writings that were ostensibly just reviewing the works of others, with “minor amplifications and corrections”. 
So, we shouldn’t be misled by Born’s impression. Poincare always gave the impression that he was just recording someone else’s work – in

contrast with Einstein, whose style of writing, as Born said, “gives you the impression of quite a new venture”. Of course, Born went on to say, when recalling his first reading of Einstein’s paper in 1907, “Although I was quite familiar with the relativistic idea and the Lorentz transformations, Einstein’s reasoning was a revelation to me… which had a stronger influence on my thinking than any other scientific experience”. Lorentz’s reluctance to fully embrace the relativity principle (that he himself did so much to uncover) is partly explained by his belief that "Einstein simply postulates what we have deduced... from the equations of the electromagnetic field". If this were true, it would be a valid reason for preferring Lorentz's approach. However, if we closely examine Lorentz's electron theory we find that full agreement with experiment required not only the invocation of Fitzgerald's contraction hypothesis, but also the assumption that mechanical inertia is Lorentz covariant. It's true that, after Poincare complained about the proliferation of hypotheses, Lorentz realized that the contraction could be deduced from more fundamental principles (as discussed in Section 1.5), but this was based on yet another hypothesis, the so-called molecular force hypothesis, which simply asserts that all physical forces and configurations (including the unknown forces that maintain the shape of the electron) transform according to the same laws as do electromagnetic forces. Needless to say, it obviously cannot follow deductively "from the equations of the electromagnetic field" that the necessarily non-electromagnetic forces which hold the electron together must transform according to the same laws. (Both Poincare and Einstein had already realized by 1905 that the mass of the electron cannot be entirely electromagnetic in origin.) Even less can the Lorentz covariance of mechanical inertia be deduced from electromagnetic theory. We still do not know to this day the origin of inertia, so there is no sense in which Lorentz or anyone else can claim to have deduced Lorentz covariance in any constructive sense, let alone from the laws of electromagnetism. Hence Lorentz's molecular force hypothesis and his hypothesis of covariant mechanical inertia together are simply a disguised and piecemeal way of postulating universal Lorentz invariance - which is precisely what Lorentz claims to have deduced rather than postulated. The whole task was to reconcile the Lorentzian covariance of electromagnetism with the Galilean covariance of mechanical dynamics, and Lorentz simply recognized that one way of doing this is to assume that mechanical dynamics (i.e., inertia) is actually Lorentz covariant. This is presented as an explicit postulate (not a deduction) in the final edition of his book on the Electron Theory. In essence, Lorentz’s program consisted of performing a great deal of deductive labor, at the end of which it was still necessary, in order to arrive at results that agreed with experiment, to simply postulate the same principle that forms the basis of special relativity. (To his credit, Lorentz candidly acknowledged that his deductions were "not altogether satisfactory", but this is actually an understatement, because in the end he simply postulated what he claimed to have deduced.)
In contrast, Einstein recognized the necessity of invoking the principle of relativity and Lorentz invariance at the start, and then demonstrated that all the other "constructive" labor involved in Lorentz's approach was superfluous, because once we have adopted

these premises, all the experimental results arise naturally from the simple kinematics of the situation, with no need for molecular force hypotheses or any other exotic and dubious conjectures regarding the ultimate constituency of matter. On some level Lorentz grasped the superiority of the purely relativistic approach, as is evident from the words he included in the second edition of his "Theory of Electrons" in 1916: If I had to write the last chapter now, I should certainly have given a more prominent place to Einstein's theory of relativity by which the theory of electromagnetic phenomena in moving systems gains a simplicity that I had not been able to attain. The chief cause of my failure was my clinging to the idea that the variable t only can be considered as the true time, and that my local time t' must be regarded as no more than an auxiliary mathematical quantity. Still, it's clear that neither Lorentz nor Poincare ever whole-heartedly embraced special relativity, for reasons that may best be summed up by Lorentz when he wrote Yet, I think, something may also be claimed in favor of the form in which I have presented the theory. I cannot but regard the aether, which can be the seat of an electromagnetic field with its energy and its vibrations, as endowed with a certain degree of substantiality, however different it may be from all ordinary matter. In this line of thought it seems natural not to assume at starting that it can never make any difference whether a body moves through the aether or not, and to measure distances and lengths of time by means of rods and clocks having a fixed position relatively to the aether. This passage implies that Lorentz's rationale for retaining a substantial aether and attempting to refer all measurements to the rest frame of this aether (without, of course, specifying how that is to be done) was the belief that it might, after all, make some difference whether a body moves through the aether or not. In other words, we should continue to look for physical effects that violate Lorentz invariance (by which we now mean local Lorentz invariance), both in new physical forces and at higher orders of v/c for the known forces. A century later, our present knowledge of the weak and strong nuclear forces and the precise behavior of particles at 0.99999c have vindicated Einstein's judgment that Lorentz invariance is a fundamental principle whose significance and applicability extends far beyond Maxwell's equations, and apparently expresses a general attribute of space and time, rather than a specific attribute of particular physical entities. In addition to the formulas expressing the Lorentz transformations, we can also find precedents for other results commonly associated with special relativity, such as the equivalence of mass and energy. In fact, the general idea of associating mass with energy in some way had been around for about 25 years prior to Einstein's 1905 papers. Indeed, as Thomson and even Einstein himself noted, this association is already implicit in Maxwell's theory. With electric and magnetic fields e and b, the energy density is (e2 + b2)/(8π) and the momentum density is (e x b)/(4πc), so in the case of radiation (when e and b are equal and orthogonal) the energy density is E = e2/(4π) and the momentum density is p = e2/(4πc). Taking momentum p as the product of the radiation's "mass" m

times its velocity c, we have

m = p/c = e2/(4πc2) = E/c2

and so E = mc2. Indeed, in the 1905 paper containing his original deduction of mass-energy equivalence, Einstein acknowledges that it was explicitly based on "Maxwell's expression for the electromagnetic energy of space". We can also mention the pre-1905 work of Poincare and others on the electron mass arising from its energy, and the work of Hasenohrl on how the mass of a cavity increases when it is filled with radiation. However, these suggestions were all very restricted in their applicability, and didn't amount to the assertion of a fundamental equivalence such as emerges so clearly from Einstein's relativistic interpretation. Hardly any of the formulas in Einstein's two 1905 papers on relativity were new, but what Einstein provided was a single conceptual framework within which all those formulas flow quite naturally from a simple set of general principles. Occasionally one hears of other individuals who are said to have discovered one or more aspects of relativity prior to Einstein. For example, in November of 1999 there appeared in newspapers around the world a story claiming that "The mathematical equation that ushered in the atomic age was discovered by an unknown Italian dilettante two years before Albert Einstein used it in developing the theory of relativity...". The "dilettante" in question was named Olinto De Pretto, and the implication of the story was that Einstein got the idea for mass-energy equivalence from "De Pretto's insight". There are some obvious difficulties with this account, only some of which can be blamed on the imprecision of popular journalism. First, the story claimed that Einstein used the idea of mass-energy equivalence to develop special relativity, whereas in fact the suggestion that energy has inertia appeared in a very brief note that Einstein submitted for publication toward the end of 1905, after the original paper on special relativity. The report went on to say that "De Pretto had stumbled on the equation, but not the theory of relativity... It was republished in 1904 by Veneto's Royal Science Institute... A Swiss Italian named Michele Besso alerted Einstein to the research and in 1905 Einstein published his own work..." Now, it's certainly true that Besso was Italian, and worked with Einstein at the Bern Patent Office during the years leading up to 1905, and it's true that they discussed physics, and Besso provided Einstein with suggestions for reading (for example, it was Besso who introduced him to the works of Ernst Mach). However, the idea that Einstein's second relativity paper in 1905 (let alone the first) was in any way prompted by De Pretto's obscure and unfounded comments is bizarre. In essence, De Pretto's "insight" was the (hardly novel) idea that matter consists of tiny particles (of what, he does not say), agitated by their exposure to the ultra-mundane ether particles of Georges Le Sage's "shadow theory" of gravity. Since the particles in every aggregate of matter are in motion, every quantity of mass contains an amount of energy equal to Leibniz's "vis viva", the living force, which Leibniz defined as mv2. Oddly enough, De Pretto seems to have been under the impression that mv2 was the kinetic

energy of macroscopic bodies moving at the speed v. On this (erroneous) basis, and despite the fact that De Pretto did not regard the speed of light as a physically limiting speed, he noted that Le Sage's ether particles were thought to move at approximately the speed of light, and so (he reasoned) the particles comprising a stationary aggregate of matter may also be vibrating internally at the speed of light. In that case, the vis viva of each quantity of mass m would be mc2, which, he alertly noted, is a lot of energy. Needless to say, this bears no resemblance at all to the path that Einstein actually followed to mass-energy equivalence. Moreover, there were far more accessible and authoritative sources available to him for the idea of mass-energy equivalence, including Thomson, Lorentz, Poincare, etc. (not to mention Isaac Newton, who famously asked "Are not gross bodies and light convertible into one another...?"). After all, the idea that the electron's mass was electromagnetic in origin was one of the leading hypotheses of research at that time. It would be like saying that some theoretical physicist today had never heard of string theory! Also, the story requires us to believe that Einstein got this information after submitting the paper on Electrodynamics of Moving Bodies in the summer of 1905 (which contained the complete outline of special relativity but no mention of E = mc2) but prior to submitting the follow-up note just a few months later. Readers can judge for themselves from a note that Einstein wrote to his close friend Conrad Habicht as he was preparing the mass-energy paper whether this idea was prompted by the inane musings of an obscure Italian dilettante on Leibnizian vis viva: One more consequence of the paper on electrodynamics has also occurred to me. The principle of relativity, in conjunction with Maxwell's equations, requires that mass be a direct measure of the energy contained in a body; light carries mass with it. A noticeable decrease of mass should occur in the case of radium [as it emits radiation]. The argument [which he intends to present in the paper] is amusing and seductive, but for all I know the Lord might be laughing over it and leading me around by the nose. These are clearly the words of someone who is genuinely working out the consequences of his own recent paper, and wondering about their validity, not someone who has gotten an idea from seeing a formula in someone else's paper. Of course, the most obvious proof that special relativity did not arise from any Leibnizian or Le Sagean ideas is simply the wonderfully lucid thought process presented by Einstein in his 1905 paper, beginning from first principles and a careful examination of the physical significance of time and space, and leading to the kinematics of special relativity, from which the inertia of energy follows naturally. Nevertheless, we shouldn't underestimate the real contributions to the development of special relativity made by Einstein's predecessors, most notably Lorentz and Poincare. In addition, although Einstein was remarkably thorough in his 1905 paper, there were nevertheless important contributions to the foundations of special relativity made by others in the years that followed. For example, in 1907 Max Planck greatly clarified relativistic mechanics, basing it on the conservation of momentum with his "more

advantageous" definition of force, as did Tolman and Lewis. Planck also critiqued Einstein's original deduction of mass-energy equivalence, and gave a more general and comprehensive argument. (This led Johannes Stark in 1907 to cite Planck as the originator of mass-energy equivalence, prompting an angry letter from Einstein saying that he "was rather disturbed that you do not acknowledge my priority with regard to the connection between mass and energy". In later years Stark became an outspoken critic of Einstein's work.) Another crucially important contribution was made by Hermann Minkowski (one of Einstein's former professors), who recognized that what Einstein had described was simply ordinary kinematics in a four-dimensional spacetime manifold with the pseudometric (dτ)2 = (dt)2  (dx)2  (dy)2  (dz)2 Poincare had also recognized this as early as 1905. This was vital for the generalization of relativity which Einstein – with the help of his old friend Marcel Grossmann – developed on the basis on the theory of curved manifolds developed in the 19th century by Gauss and Riemann. The tensor calculus and generally covariant formalism employed by Einstein in his general theory had been developed by Gregorio Ricci-Curbastro and Tullio Levi-Civita around 1900 at the University of Padua, building on the earlier work of Gauss, Riemann, Beltrami, and Christoffel. In fact, the main technical challenge that occupied Einstein in his efforts to find a suitable field law for gravity, which was to construct from the metric tensor another tensor whose covariant derivative automatically vanishes, had already been solved in the form of the Bianchi identities, which lead directly to the Einstein tensor as discussed in Section 5.8. Several other individuals are often cited as having anticipated some aspect of general relativity, although not in any sense of contributing seriously to the formulation of the theory. John Mitchell wrote in 1783 about the possibility of "dark stars" that we so massive light could not escape from them, and Laplace contemplated the same possibility in 1796. Around 1801 Johann von Soldner predicted that light rays passing near the Sun would be deflected by the Sun’s gravity, just like a small corpuscle of matter moving at the speed of light. (Ironically, although Newton’s theory implies a deflection of just half the relativistic value, Soldner erroneously omitted a factor of 1/2 from his calculation, so he arrived at the relativistic value, albeit by a computational error.) William Clifford wrote about a possible connection between matter and curved space in 1873. Interestingly, the work of Soldner had been virtually forgotten until being rediscovered and publicized by Philipp Lenard in 1921, along with the claim that Hasenohrl should be credited with the mass-energy equivalence relation. Similarly in 1917 Ernst Gehrcke arranged for the re-publication of a 1898 paper by a secondary school teacher named Paul Gerber which contained a formula for the precession of elliptical orbits identical to the one Einstein had derived from the field equations of general relativity. Gerber's approach

was based on the premise that the gravitational potential propagates at the speed of light, and that the effect of the potential on the motion of a body depends on the body's velocity through the potential field. His potential was similar in form to the Gauss-Weber theories. However, Gerber's "theory" was (and still is) regarded as unsatisfactory, mainly because his conclusions don’t follow from his premises, but also because the combination of Gerber's proposed gravitational potential with the rest of (nonrelativistic) physics results in predictions (such as 3/2 the relativistic prediction for the deflection of light rays near the Sun) which are inconsistent with observation. In addition, Gerber's free mixing of propagating effects with some elements of action-at-a-distance tended to undermine the theoretical coherence of his proposal. The writings of Michell, Soldner, Gerber, and others were, at most, anticipations of some of the phenomenology later associated with general relativity, but had nothing to do with the actual theory of general relativity, i.e., a theory that conceives of gravity as a manifestation of the curvature of spacetime. A closer precursor can be found in the notional writings of William Kingdon Clifford, but like Gauss and Riemann he lacked the crucial idea of including time as one of the dimensions of the manifold. As noted above, the formal means of treating space and time as a single unified spacetime manifold was conceived by Poincare and Minkowski, and the tensor calculus was developed by Ricci and Levi-Civita, with whom Einstein corresponded during the development of general relativity. It’s also worth mentioning that Einstein and Grossmann, working in collaboration, came very close to discovering the correct field equations in 1913, but were diverted by an erroneous argument that led them to believe no fully covariant equations could be consistent with experience. In retrospect, this accident may have been all that prevented Grossmann from being perceived as a co-creator of general relativity. On the other hand, Grossmann had specifically distanced himself from the physical aspects of the 1913 paper, and Einstein wrote to Sommerfeld in July 1915 (i.e., prior to arriving at the final form of the field equations) that Grossmann will never lay claim to being co-discoverer. He only helped in guiding me through the mathematical literature but contributed nothing of substance to the results. In the summer of 1915 Einstein gave a series of lectures at Gottingen on the general theory, and apparently succeeded in convincing both Hilbert and Klein that he was close to an important discovery, despite the fact that he had not yet arrived at the final form of the field equations. Hilbert took up the problem from an axiomatic standpoint, and carried on an extensive correspondence with Einstein until the 19th of November. On the 20th, Hilbert submitted a paper to the Gesellschaft der Wissenschaften in Gottingen with a derivation of the field equations. Five days later, on 25 November, Einstein submitted a paper with the correct form of the field equations to the Prussian Academy in Berlin. The exact sequence of events leading up to the submittal of these two papers – and how much Hilbert and Einstein learned from each other – is somewhat murky, especially since Hilbert’s paper was not actually published until March of 1916, and seems to have undergone some revisions from what was originally submitted.
However, the question of who first wrote down the fully covariant field equations (including the trace term) is less

significant than one might think, because, as Einstein wrote to Hilbert on 18 November after seeing a draft of Hilbert’s paper The difficulty was not in finding generally covariant equations for the gµν’s; for this is easily achieved with the aid of Riemann’s tensor. Rather, it was hard to recognize that these equations are a generalization – that is, a simple and natural generalization – of Newton’s law. It might be argued that Einstein was underestimating the mathematical difficulty, since he hadn’t yet included the trace term in his published papers, but in fact he repeated the same comment in a letter to Sommerfeld on 28 November, this time explicitly referring to the full field equations, with the trace term. He wrote It is naturally easy to set these generally covariant equations down; however, it is difficult to recognize that they are generalizations of Poisson’s equations, and not easy to recognize that they fulfill the conservation laws. I had considered these equations with Grossmann already 3 years ago, with the exception of the [trace term], but at that time we had come to the conclusion that it did not fulfill Newton’s approximation, which was erroneous. Thus he regards the purely mathematical task of determining the most general fully covariant expression involving the gµν’s and their first and second derivatives as comparatively trivial and straightforward – as indeed it is for a competent mathematician. The Bianchi identities were already known, so there was no new mathematics involved. The difficulty, as Einstein stressed, was not in writing down the solution of this mathematical problem, but in conceiving of the problem in the first place, and then showing that it represents a viable law of gravitation. In this, Einstein was undeniably the originator, not only in showing that the field equations reduce to Newton’s law in the first approximation, but also in showing that they yield Mercury’s excess precession in the second approximation. Hilbert was suitably impressed when Einstein showed this in his paper of 18 November, and it’s important to note that this was how Einstein was spending his time around the 18th of November, establishing the physical implications of the fully covariant field equations, while Hilbert was busying himself with elaborating the mathematical aspects of the problem that Einstein had outlined the previous summer. Whatever the true sequence of events, it seems that Einstein initially had some feelings of resentment toward Hilbert, perhaps thinking that Hilbert had acted ungraciously and stolen some of his glory. Already on November 20 he had written to a friend The theory is incomparably beautiful, but only one colleague understands it, and that one works skillfully at "nostrification". I have learned the deplorableness of humans more in connection with this theory than in any other personal experience. But it doesn't bother me. (Literally the word “nostrification” refers to the process by which a country accepts foreign academic degrees as if they had been granted by one of its own universities, but

the word has often been used to suggest the appropriation and re-packaging of someone else’s ideas and making them one’s own.) However, by December 20 he was able to write a conciliatory note to Hilbert, saying There has been between us a certain unpleasantness, whose cause I do not wish to analyze. I have struggled against feelings of bitterness with complete success. I think of you again with untroubled friendliness, and ask you to do the same with me. It would be a shame if two fellows like us, who have worked themselves out from this shabby world somewhat, cannot enjoy each other. Thereafter they remained on friendly terms, and Hilbert never publicly claimed any priority in the discovery of general relativity, and always referred to it as Einstein’s theory. As it turned out, Einstein can hardly have been dissatisfied with the amount of popular credit he received for the theories of relativity, both special and general. Nevertheless, one senses a bit of annoyance when Max Born mentioned to Einstein in 1953 (two years before Einstein's death) that the second volume of Edmund Whittaker's book “A History of the Theories of Aether and Electricity” had just appeared, in which special relativity is attributed to Lorentz and Poincare, with barely a mention of Einstein except to say that "in the autumn of [1905] Einstein published a paper which set forth the relativity theory of Poincare and Lorentz with some amplifications, and which attracted much attention". In the same book Whittaker attributes some of the fundamental insights of general relativity to Planck and a mathematician named Harry Bateman (a former student of Whittaker’s). Einstein replied to his old friend Born Everybody does what he considers right... If he manages to convince others, that is their own affair. I myself have certainly found satisfaction in my efforts, but I would not consider it sensible to defend the results of my work as being my own 'property', as some old miser might defend the few coppers he had laboriously scraped together. I do not hold anything against him [Whittaker], nor of course, against you. After all, I do not need to read the thing. On the other hand, in the same year (1953), Einstein wrote to the organizers of a celebration honoring the upcoming fiftieth anniversary of his paper on the electrodynamics of moving bodies, saying I hope that one will also take care on that occasion to suitably honor the merits of Lorentz and Poincare.

8.9 Paths Not Taken

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood

And looked down one as far as I could
To where it bent in the undergrowth…
Robert Frost, 1916

The Archimedean definition of a straight line as the shortest path between two points was an early expression of a variational principle, leading to the modern idea of a geodesic path. In the same spirit, Hero explained the paths of reflected rays of light based on a principle of least distance, which Fermat reinterpreted as a principle of least time, enabling him to account for refraction as well. Subsequently, Maupertuis and others developed this approach into a general principle of least action, applicable to mechanical as well as optical phenomena. Of course, as discussed in Chapter 3.4, a more correct statement of these principles is that systems evolve along stationary paths, which may be maximal, minimal, or neither (at an inflection point). This is a tremendously useful principle, but as a realistic explanation it has always been at least slightly suspect, because (for example) it isn't clear how a single ray of light (or a photon) moving along a particular path can "know" that it is an extremal path in the variational sense. To illustrate the problem, consider a photon traveling from A to B through a transparent medium whose refractive index n increases in the direction of travel, as indicated by the solid vertical lines in the drawing below:

Since the path AB is parallel to the gradient of the refractive index, it undergoes no refraction. However, if the lines of constant refractive index were tilted as shown by the dashed diagonal lines in the figure, a ray of light initially following the path AB will be refracted and arrive at C, even though the index of refraction at each point along the path AB is identical to what it was before, when there was no refraction. This shows that the path of a light ray cannot be explained solely in terms of the value of the refractive index along the path. We must also consider the transverse values of the refractive index along neighboring paths, i.e., along paths not taken. The classical wave explanation, proposed by Huygens, resolves this problem by denying that light can propagate in the form of a single ray. According to the wave interpretation, light propagates as a wave front possessing transverse width. A small section of a propagating wave front is shown in the figure below, with the gradient of the refractive index perpendicular to the initial trajectory of light:

Clearly the wave front propagates more rapidly on the side where the refractive index is low (viz, the speed of light is high) than on the side where the refractive index is high. As a result, the wave front naturally turns in the direction of higher refractive index (i.e., higher density). It's easy to see that the amount of deflection of the normal to the wave front agrees precisely with the result of applying Fermat's principle, because the wave front represents a locus of points that are at an equal phase distance from the point of emission. Thus the normal to the wave front is, by definition, a stationary path in the variational sense. More generally, Huygens articulated the remarkable principle that every point of a wave front can be regarded as the origin of a secondary spherical wave, and the envelope of all these secondary waves constitutes the propagated wave front. This is illustrated in the figure below:

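The turning of the wave front toward the region of higher refractive index can also be checked numerically. The following sketch (Python with numpy; the index profile n(y) = 1 + y/10 is a hypothetical choice made purely for illustration) integrates the standard gradient-index ray equation d/ds(n dr/ds) = ∇n with a simple Euler scheme, and confirms that a ray launched along the x axis bends toward higher n, just as the wave-front picture predicts.

import numpy as np

def n(y):
    # Hypothetical refractive index, increasing transversely to the ray.
    return 1.0 + 0.1 * y

grad_n = np.array([0.0, 0.1])   # gradient of n (constant for this profile)

r = np.array([0.0, 0.0])        # ray starts at the origin...
t = np.array([1.0, 0.0])        # ...heading along +x (unit tangent)
ds = 0.001                      # arc-length step

for _ in range(5000):
    p = n(r[1]) * t             # "optical momentum" n * dr/ds
    p = p + grad_n * ds         # ray equation: d(n dr/ds)/ds = grad n
    t = p / np.linalg.norm(p)   # renormalize the tangent
    r = r + t * ds

print(r)   # the y-coordinate is positive: the ray has bent toward higher n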
Huygens also assumed the secondary wave originating at any point has the same speed and frequency as the primary wave at that point. The main defect in Huygens' wave theory of optics was its failure to account for the ray-like properties of light, such as the casting of sharp shadows. Because of this failure (and also the inability of the wave theory to explain polarization), the corpuscular theory of light favored by Newton seemed more viable throughout the 18th century. However, early in the 19th century, Young and Fresnel modified Huygens' principle to include the crucial element of interference. The modified principle asserts that the amplitude of the propagated wave is determined by the superposition of all the (unobstructed) secondary wavelets originating on the wave front at any prior instant. (Young also proposed that light was a transverse rather than longitudinal wave, thereby accounting for polarization - but only at the expense of making it very difficult to conceive of a suitable material medium, as discussed in Chapter 3.5.) In his critique of the wave theory of light Newton (apparently) never realized that waves actually do exhibit "rectilinear motion", and cast sharp shadows, etc., provided that the wavelength is small on the scale of the obstructions. In retrospect, it's surprising that

Newton, the superb experimentalist, never noticed this effect, since it can be seen in ordinary waves on the surface of a pool of water. Qualitatively, if the wavelength is large relative to an aperture, the phases of the secondary wavelets emanating from every point in the mouth of the aperture to any point in the region beyond will all be within a fraction of a cycle from each other, so they will (more or less) constructively reinforce each other. On the other hand, if the wavelength is very small in comparison with the size of the aperture, the region of purely constructive interference on the far side of the aperture will just be a narrow band perpendicular to the aperture. The wave theory of light is quite satisfactory for a wide range of optical phenomena, but when examined on a microscopic scale we find the transfer of energy and momentum via electromagnetic waves exhibits a granularity, suggesting that light comes in discrete quanta (packets). Planck had originated the quantum theory in 1900 by showing that the so-called ultra-violet catastrophe entailed by the classical theory of blackbody radiation (which predicted infinite energy at the high end of the spectrum) could be avoided - and the actual observed radiation could be accurately modeled - if we assume oscillators lining the walls of the cavity can absorb and emit electromagnetic energy only in discrete units proportional to the frequency, ν. The constant of proportionality is now known as Planck's constant, denoted by h, and has the incredibly tiny value 6.626 × 10^−34 joule-seconds. Thus a physical oscillator with frequency ν emits and absorbs energy in integer multiples of hν. Planck's interpretation was that the oscillators were quantized, i.e., constrained to emit and absorb energy in discrete units, but he did not (explicitly) suggest that electromagnetic energy itself was inherently quantized. However, in a sense, this further step was unavoidable, because ultimately light is nothing but its emissions and absorptions. It's not possible to "see" an isolated photon. The only perceivable manifestation of photons is their emissions and absorptions by material objects. Thus if we carry Planck's assumption to its logical conclusion, it's natural to consider light itself as being quantized in tiny bundles of energy hν. This was explicitly proposed by Einstein in 1905 as a heuristic approach to understanding the photoelectric effect. Incidentally, it was this work on the photoelectric effect, rather than anything related to special or general relativity, that was cited by the Nobel committee in 1921 when Einstein was finally awarded the prize. Interestingly, the divorce settlement of Albert and Mileva Einstein, negotiated through Einstein's faithful friend Besso in 1918, included the provision that the cash award of any future Nobel prize which Albert might receive would go to Mileva for the care of the children, as indeed it did. We might also observe that Einstein's work on the photoelectric effect was much more closely related to the technological developments leading to the invention of television than his relativity theory was to the unleashing of atomic energy. Thus, if we wish to credit or blame Einstein for laying the scientific foundations of a baneful technology, it might be more accurate to cite television rather than the atomic bomb. In any case, it had been known for decades prior to 1905 that if an electromagnetic wave shines on a metallic substance, which possesses many free valence electrons, some of

those electrons will be ejected from the metal. However, the classical wave theory of light was unable to account for several features of this observed phenomenon. For example, according to the wave theory the kinetic energy of the ejected electrons should increase as the intensity of the incident light is increased (at constant frequency), but in fact we observe that the ejected electrons invariably possess exactly the same kinetic energy for a given frequency of light. Also, the wave theory predicts that the photoelectric effect should be present (to some degree) at all frequencies, whereas we actually observe a definite cutoff frequency, below which no electrons are ejected, regardless of the intensity of the incident light. A more subtle point is that the classical wave theory predicts a smooth continuous transfer of energy from the wave to a particle, and this implies a certain time lag between when the light first strikes the metal and when electrons begin to be ejected. No such time lag is observed. Einstein's proposal for explaining the details of the photoelectric effect was to take Planck's quantum theory seriously, and consider the consequences of assuming that light of frequency ν consists of tiny bundles - later given the name photons - of energy hν. Just as Planck had said, each material "oscillator" emits and absorbs energy in integer multiples of this quantity, which Einstein interpreted as meaning that material particles (such as electrons) emit and absorb whole photons. This is an extraordinary hypothesis, and might seem to restore Newton's corpuscular theory of light. However, these particles of light were soon found to possess properties and exhibit behavior quite unlike ordinary macroscopic particles. For example, in 1924 Bose gave a description of blackbody radiation using the methods of statistical thermodynamics based on the idea that the cavity is filled with a "gas" of photons, but the statistical treatment regards the individual photons as indistinguishable and interchangeable, i.e., not possessing distinct identities. This leads to the Bose-Einstein distribution
n(E) = 1 / (A e^(E/kT) − 1)
which gives, for a system in equilibrium at temperature T, the expected number of particles in a quantum state with energy E. In this equation, k is Boltzmann's constant and A is a constant determined by the number of particles in the system. Particles that obey Bose-Einstein statistics are called bosons. Compare this distribution with the classical Boltzmann distribution, which applies to a collection of particles with distinct identities (such as complex atoms and molecules)
n(E) = (1/A) e^(−E/kT)
A third equilibrium distribution arises if we consider indistinguishable particles that obey the Pauli exclusion principle, which precludes more than one particle from occupying any given quantum state in a system. Such particles are called fermions, the most prominent example being electrons. It is the exclusion principle that accounts for the variety and complexity of atoms, and their ability to combine chemically to form molecules. The energy distribution in an equilibrium gas of fermions is
n(E) = 1 / (A e^(E/kT) + 1)
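The three occupancy formulas are easy to compare numerically. The following sketch (Python; the normalization A = 1 is a hypothetical choice made purely for illustration) evaluates all three at several energies, showing that they agree when E is large compared with kT and diverge at low energies, where bosons crowd into states and fermions are excluded.

import math

def bose_einstein(E, kT, A=1.0):
    return 1.0 / (A * math.exp(E / kT) - 1.0)

def boltzmann(E, kT, A=1.0):
    return (1.0 / A) * math.exp(-E / kT)

def fermi_dirac(E, kT, A=1.0):
    return 1.0 / (A * math.exp(E / kT) + 1.0)

for E in (0.5, 1.0, 2.0, 4.0):    # energies in units of kT
    print(E, bose_einstein(E, 1.0), boltzmann(E, 1.0), fermi_dirac(E, 1.0))
# At E = 4kT the three values nearly coincide; at E = 0.5kT the
# Bose-Einstein occupancy exceeds the classical value, while the
# Fermi-Dirac occupancy stays below one, per the exclusion principle.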
The reason photons obey Bose-Einstein rather than Fermi statistics is that they do not satisfy the Pauli exclusion principle. In fact, multiple bosons actually prefer to occupy the same quantum state, which led to Einstein's prediction of stimulated emission, the principle of operation behind lasers, which have become so ubiquitous today in CD players, fiber optic communications, and so on. Thus the photon interpretation has become an indispensable aspect of our understanding of light. However, it also raises some profound questions about our most fundamental ideas of space, time, and motion. First, the indistinguishability and interchangeability of fundamental particles (fermions as well as bosons) challenges the basic assumption that distinct objects can be identified from one instant of time to the next, which (as discussed in Chapter 1.1) underlies our intuitive concept of motion. Second, even if we consider the emission and absorption of just a single particle of light, we again face the question of how the path of this particle is chosen from among all possible paths between the emission and absorption events. We've seen that Fermat's principle of least time seems to provide the answer, but it also seems to imply that the photon somehow "knows" which direction at any given point is the quickest way forward, even though the knowledge must depend on the conditions at points not on the path being followed. Also, the principle presupposes either a fixed initial trajectory or a defined destination, neither of which is necessarily available to a photon at the instant of emission. In a sense, the principle of least time is backwards, because it begins by positing particular emission and absorption events, and infers the hypothetical path of a photon connecting them, whereas we should like (classically) to begin with just the emission event and infer the time and location of the absorption event. The principle of Fermat can only assist us if we assume a particular definite trajectory for the photon at emission, without reference to any absorption. Unfortunately, the assignment of a definite trajectory to a photon is highly problematical because, as noted above, a photon really is nothing but an emission and an associated absorption. To speak about the trajectory of a free photon is to speak about something that cannot, even in principle, ever be observed. Moreover, many optical phenomena are flatly inconsistent with the notion of free photons with definite trajectories. The wavelike behavior of light, such as that demonstrated in Young's two-slit interference experiment, defies explanation in terms of free particles of light moving along free trajectories independent of the emission and absorption events. The figure below gives a schematic of Young's experiment, showing that the intensity of light striking the collector screen exhibits the interference effects of the light emanating from the two slits in the intermediate screen.

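A short sketch can make the screen intensity concrete (Python with numpy; the wavelength, slit spacing, and screen distance below are hypothetical values chosen for illustration): the wave calculation sums the complex amplitudes contributed by the two slits at each point of the screen, and can be compared with the flat result of simply adding the two intensities.

import numpy as np

lam, d, L = 500e-9, 50e-6, 1.0        # wavelength, slit spacing, screen distance
x = np.linspace(-0.05, 0.05, 1001)    # positions on the collector screen

r1 = np.hypot(L, x - d/2)             # path length from one slit
r2 = np.hypot(L, x + d/2)             # path length from the other slit

amp = np.exp(2j*np.pi*r1/lam) + np.exp(2j*np.pi*r2/lam)
wave = np.abs(amp)**2                 # interfering waves: fringes
particles = 2.0 * np.ones_like(x)     # naive sum of two equal intensities: flat

print(wave.max(), wave.min(), particles.max())   # ~4, ~0, 2: full-contrast fringes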
This interference pattern is easily explained in terms of interfering waves, but for light particles we expect the intensity on the collector screen to be just the sum of the intensities given by each slit individually. Still, if we regard the flow of light as consisting of a large number of photons, each with their own phases, we might be able to imagine that they somehow mingle with each other while passing from the source to the collector, thereby producing the interference pattern. However, the problem becomes more profound if we reduce the intensity of the light source to a sufficiently low level that we can actually detect the arrival of individual photons, like clicks on a Geiger counter, by an array of individual photo-detectors lining the collector screen. Each arrival is announced by just a single detector. We can even reduce the intensity to such a low level that no more than one photon is "in flight" at any given time. Under these conditions there can be no "mingling" of various photons, and yet if the experiment is carried on long enough we find that the number of arrivals at each point on the collector screen matches the interference pattern. The modern theory of quantum electrodynamics explains this behavior by denying that photons follow definite trajectories through space and time. Instead, an emitter has at each instant along its worldline a particular complex amplitude for emitting a photon, and a potential absorber has a complex amplitude for absorbing that photon. The amplitude at the absorber is the complex sum of the emission amplitudes of the emitter at various times in the past, corresponding to the times required to traverse each of the possible paths from the emitter to the absorber. At each of those times the light source had a certain complex amplitude for emitting a photon, and the phase of that amplitude advances steadily along the timeline of the emitter, giving a frequency equal to the frequency of the emitted light. For example, when we look at the reflection of a light source on a mirror our eye is at one end of a set of rays, each of slightly different length, which implies that the amplitude for each path corresponds to the amplitude of the emitter at a slightly different time in the past. Thus, we are actually receiving an image of the light source from a range of times in the past. This is illustrated in the drawing below:

If the optical path lengths of the bundle of incoming rays in a particular direction are all nearly equal (meaning that the path is "stationary" in the variational sense), their amplitudes will all be nearly in phase, so they reinforce each other, yielding a large complex sum. On the other hand, if the lengths of the paths arriving from a particular direction differ significantly, the complex sum of amplitudes will be taken over several whole cycles of the oscillating emitter amplitude, so they largely cancel out. This is why most of the intensity of the incoming ray arrives from the direction of the stationary path, which conforms with Hero's equi-angular reflection. To test the reality of this interpretation, notice that it claims the absence of reflected light at unequal angles is due to the canceling contributions of neighboring paths, so in theory we ought to be able to delete the paths corresponding to all but one phase angle of the emitter, thereby enabling us to see non-Heronian reflected light. This is actually the principle of operation of a diffraction grating, where alternating patches of a reflecting surface are scratched away, at intervals in proportion to the wavelength of the light. When this is done, it is indeed possible to see light reflected at highly non-Heronian angles, as illustrated below.

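This cancellation argument can be tested directly with a small phasor sum (Python with numpy; the geometry and wavelength below are hypothetical values). We add the complex amplitudes for paths that bounce off each point of a mirror strip located well away from the equal-angle point, first for the intact strip and then with alternating half-cycle zones "scratched away":

import numpy as np

lam = 0.5                                       # wavelength (arbitrary units)
src = np.array([-5.0, 3.0])                     # source position
det = np.array([ 5.0, 2.0])                     # detector position

# For this geometry the stationary (equal-angle) point on the mirror
# (the line y = 0) is at x = 1, so the strip below samples only
# "non-Heronian" paths.
xs = np.linspace(4.0, 8.0, 40001)
paths = np.hypot(xs - src[0], src[1]) + np.hypot(xs - det[0], det[1])
phasors = np.exp(2j * np.pi * paths / lam)

plain = np.abs(phasors.mean())                  # intact strip: near-total cancellation

keep = np.cos(2 * np.pi * paths / lam) > 0      # remove alternate half-cycle zones
grating = np.abs(phasors[keep].sum()) / len(xs) # a crude "diffraction grating"

print(plain, grating)   # the grating value is far larger: reflection appears
                        # at a non-Heronian angle once cancellation is blocked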
All of this suggests that the conveyance of electromagnetic energy from an emitter to an absorber is not well-described in terms of a classical free particle following a free path through spacetime. It also suggests that what we sometimes model as wave properties of electromagnetic radiation are really wave properties of the emitter. This is consistent with the fact that the wave function of a putative photon does not advance along its null worldline. See Section 9.10, where it is argued that the concept of a "free photon" is meaningless, because every photon is necessarily emitted and absorbed. If we compare a photon to a clap, then a "free photon" is like clapping with no hands.

9.1 In the Neighborhood

Nothing puzzles me more than time and space; and yet nothing troubles me less, as I never think about them.
Charles Lamb (1775-1834)

It's customary to treat the relativistic spacetime manifold as an ordinary topological space with the same topology as a four-dimensional Euclidean manifold, denoted by R4. This is typically justified by noting that the points of spacetime can be parameterized by a set of four coordinates x,y,z,t, and defining the "neighborhood" of a point somewhat informally as follows (quoted from Ohanian and Ruffini): "...the neighborhood of a given point is the set of all points such that their coordinates differ only a little from those of the given point." Of course, the neighborhoods given by this definition are not Lorentz-invariant, because the amount by which the coordinates of two points differ is highly dependent on the frame of reference. Consider, for example, two spacetime points in the xt plane with the coordinates {0,0} and {1,1} with respect to a particular system of inertial coordinates. If we consider these same two points with respect to the frame of an observer moving in the positive x direction with speed v (and such that the origin coincides with the former coordinate origin), the differences in both the space and time coordinates are reduced by a factor of √((1−v)/(1+v)), which can range anywhere between 0 and ∞. Thus there exist valid inertial reference systems with respect to which both of the coordinates of these points differ (simultaneously) by as little or as much as we choose. Based on the above definition of neighborhood (i.e., points whose coordinates “differ only a little”), how can we decide if these two points are in the same neighborhood? It might be argued that the same objection could be raised against this coordinate-based definition of neighborhoods in Euclidean space, since we're free to scale our coordinates arbitrarily, which implies that the numerical amount by which the coordinates of two given (distinct) points differ is arbitrary. However, in Euclidean space this objection is unimportant, because we will arrive at the same definition of limit points, and thus the same topology, regardless of what scale factor we choose. In fact, the same applies even if we choose unequal scale factors in different directions, provided those scale factors are all finite and non-zero. From a strictly mathematical standpoint, the usual way of expressing the arbitrariness of metrical scale factors for defining a topology on a set of points is to say that if two systems of coordinates are related by a diffeomorphism (a differentiable mapping that possesses a differentiable inverse), then the definition of neighborhoods in terms of "coordinates that differ only a little" will yield the same limit points and thus the same topology. However, from the standpoint of a physical theory it's legitimate to ask whether the set of distinct points (i.e., labels) under our chosen coordinate system actually corresponds one-to-one with the distinct physical entities whose connectivities

we are trying to infer. For example, we can represent formal fractions x/y for real values of x and y as points on a Euclidean plane with coordinates (x,y), and conclude that the topology of formal fractions is R2, but of course the value of every fraction lying along a single line through the origin is the same, and the values of fractions have the natural topology of R1 (because the reals are closed under division, aside from divisions by zero). If the meanings assigned to our labels are arbitrary, then these are simply two different manifolds with their own topologies, but for a physical theory we may wish to decide whether the true objects of our study - the objects with ontological status in our theory - are formal fractions or the values of fractions. When trying to infer the natural physical topology of the points of spacetime induced by the Minkowski metric we face a similar problem of identifying the actual physical entities whose mutual connectivities we are trying to infer, and the problem is complicated by the fact that the "Minkowski metric" is not really a metric at all (as explained below). Recall that for many years after general relativity was first proposed by Einstein there was widespread confusion and misunderstanding among leading scientists (including Einstein himself) regarding various kinds of singularities. The main source of confusion was the failure to clearly distinguish between singularities of coordinate systems as opposed to actual singularities of the manifold/field. This illustrates how we can be misled by the belief that the local topology of a physical manifold corresponds to the local topology of any particular system of coordinates that we may assign to that physical manifold. It’s entirely possible for the “manifold of coordinates” to have a different topology than the physical manifold to which those coordinates are applied. With this in mind, it’s worthwhile to consider carefully whether the most physically meaningful local topology of spacetime is necessarily the same as the topology of the usual four-dimensional systems of coordinates that are conventionally applied to it. Before examining the possible topologies of Minkowski spacetime in detail, it's worthwhile to begin with a review of the basic definitions of point set topologies and topological spaces. Given a set S, let P(S) denote the set of all subsets of S. A topology for the set S is a mapping T from the Cartesian product S × P(S) to the discrete set {0,1}. In other words, given any element e of S, and any subset A of S, the mapping T(A,e) returns either 0 or 1. In the usual language of topology, we say that e is a limit point of A if and only if T(A,e) = 1. As an example, we can define a topology on the set of points of 2D Euclidean space equipped with the usual Pythagorean metric d(e,u) = √((xe − xu)2 + (ye − yu)2) by saying that the point e is a limit point of any subset A of points of the plane if and only if for every positive real number ε there is an element u (other than e) of A such that d(e,u) < ε. Clearly this definition relies on prior knowledge of the "topology" of the real numbers, which is denoted by R1. The topology of 2D Euclidean space is called R2, since it is just the Cartesian product R1 × R1.
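Here is a minimal sketch of the formal-fraction example just mentioned (Python; the only assumption is the representation of a formal fraction as an ordered pair), using the "distance" between two formal fractions defined as the absolute difference of their values, a definition taken up again below as a pseudometric:

def value(f):
    # Value of a formal fraction represented as an ordered pair (x, y).
    x, y = f
    return x / y

def d(a, b):
    # Absolute difference of values: a pseudometric, not a metric.
    return abs(value(a) - value(b))

a, b, c = (1, 3), (2, 6), (1, 2)
print(d(a, b))            # 0.0, although (1,3) and (2,6) are distinct points
print(d(a, c), d(b, c))   # equal, as they must be whenever d(a,b) = 0
assert d(a, c) <= d(a, b) + d(b, c)   # the triangle inequality still holds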

The topology of a Euclidean space described above is actually a very special kind of topology, called a topological space. The distinguishing characteristic of a topological space (S,T) is that S contains a collection of subsets, called the open sets (including S itself and the empty set) which is closed under unions and finite intersections, and such that a point p is a limit point of a subset A of S if and only if every open set containing p also contains a point of A distinct from p. For example, if we define the collection of open spherical regions in Euclidean space, together with any regions that can be formed by the union or finite intersection of such spherical regions, as our open sets, then we arrive at the same definition of limit points as given previously. Therefore, the topology we've described for the points of Euclidean space constitutes a topological space. However, it's important to realize that not every topology is a topological space. The basic sets that we used to generate the Euclidean topology were spherical regions defined in terms of the usual Pythagorean metric, but the same topology would also be generated by any other metric. In general, a basis for a topological space on the set S is a collection B of subsets of S whose union comprises all of S and such that if p is in the intersection of two elements Bi and Bj of B, then there is another element Bk of B which contains p and which is entirely contained in the intersection of Bi and Bj, as illustrated below for circular regions on a plane.

Given a basis B on the set S, the unions of elements of B satisfy the conditions for open sets, and hence serve to define a topological space. (This relies on the fact that we can represent non-circular regions, such as the intersection of two circular open sets, as the union of an infinite number of circular regions of arbitrary sizes.) If we were to substitute the metric d(a,b) = |xa − xb| + |ya − yb| in place of the Pythagorean metric, then the basis sets, defined as loci of points whose "distances" from a fixed point p are less than some specified real number r, would be square-shaped diamonds instead of circles, but we would arrive at the same topology, i.e., the same definition of limit points for the subsets of the Euclidean plane E2. In general, any true metric will induce this same local topology on a manifold. Recall that a metric is defined as a distance function d(a,b) for any two points a,b in the space satisfying the three axioms

(1) d(a,b) = 0 if and only if a = b

(2) d(a,b) = d(b,a) for each a,b (3) d(a,c)  d(a,b) + d(b,c) for all a,b,c It follows that d(a,b)  0 for all a,b. Any distance function that satisfies the conditions of a metric will induce the same (local) topology on a set of points, and this will be a topological space. However, it's possible to conceive of more general "distance functions" that do not satisfy all the axioms of a metric. For example, we can define a distance function that is commutative (axiom 2) and satisfies the triangle inequality (axiom 3), but that allows d(a,b) = 0 for distinct points a,b. Thus we replace axiom (1) with the weaker requirement d(a,a) = 0. Such a distance function is called a pseudometric. Obviously if a,b are any two points with d(a,b) = 0 we must have d(a,c) = d(b,c) for every point c, because otherwise the points a,b,c would violate the triangle inequality. Thus a pseudometric partitions the points of the set into equivalence classes, and the distance relations between these equivalence classes must be metrical. We've already seen a situation in which a pseudometric arises naturally, if we define the distance between two points in the plane of formal fractions as the absolute value of the difference in slopes of the lines from the origin to those two points. The distance between any two points on a single line through the origin is therefore zero, and these lines represent the equivalence classes induced by the pseudometric. Of course, the distances between the slopes satisfy the requirements of a metric. Therefore, the absolute difference of value is a pseudometric for the space of formal fractions. Now, we know that the points of a two-dimensional plane can be assigned the R2 topology, and the values of fractions can be assigned the R1 topology, but what kind of local topology is induced on the two-dimensional space of formal fractions by the pseudometric? We can use our pseudometric distance function to define a basis, just as with a metrical distance function, and arrive at a topological space, but this space will not generally possess all the separation properties that we commonly expect for distinct points of a topological space. It's convenient to classify the separation properties of topological spaces according to the "trennungsaxioms", also called the Ti axioms, introduced by Alexandroff and Hopf. These represent a sequence of progressively stronger separation axioms to be met by the points of a topological space. A space is said to be T0 if for any two distinct points at least one of them is in a neighborhood that does not include the other. If each point is contained in a neighborhood that does not include the other, then the space is called T1. If the space satisfies the even stronger condition that any two points are contained in disjoint open sets, then the space is called T2, also known as a Hausdorff space. There are still more stringent separation axioms that can be applied, corresponding to T3 (regular), T4 (normal), and so on. Many topologists will not even consider a topological space which is not at least T2 (and some aren't interested in anything which is not at least T4), and yet it's clear that the topology of the space of formal fractions induced by the pseudometric of absolute values

is not even T0, because two distinct fractions with the same value (such as 1/3 and 2/6) cannot be separated into different neighborhoods by the pseudometric. Nevertheless, we can still define the limit points of the set of formal fractions based on the pseudometric distance function, thereby establishing a perfectly valid topology. This just illustrates that the distinct points of a topology need not exhibit all the separation properties that we usually associate with distinct points of a Hausdorff space (for example). Now let's consider 1+1 dimensional Minkowski spacetime, which is physically characterized by an invariant spacetime interval whose magnitude is

d(a,b) = |(ta − tb)^2 − (xa − xb)^2|^(1/2)                (2)

Empirically this appears to be the correct measure of absolute separation between the points of spacetime, i.e., it corresponds to what clocks measure along timelike intervals and what rulers measure along spacelike intervals. However, this distance function clearly does not satisfy the definition of a metric, because it can equal zero for distinct points. Moreover, it is not even a pseudometric, because the interval between points a and b can be greater than the sum of the intervals from a to c and from c to b, contradicting the triangle inequality. For example, it's quite possible in Minkowski spacetime to have two sides of a "triangle" equal to zero while the remaining side is billions of light years in length. Thus, the absolute interval of space-time does not provide a metrical measure of distance in the strict sense. Nevertheless, in other ways the magnitude of the interval d(a,b) is quite analogous to a metrical distance, so it's customary to refer to it loosely as a "metric", even though it is neither a true metric nor even a pseudometric. We emphasize this fact to remind ourselves not to prejudge the topology induced by this distance function on the points of Minkowski spacetime, and not to assume that distinct events possess the separation properties or connectivities of a topological space. The ε-neighborhood of a point p in the Euclidean plane based on the Pythagorean metric (1) consists of the points q such that d(p,q) < ε. Thus the ε-neighborhoods of two points in the plane are circular regions centered on the respective points, as shown in the left-hand illustration below. In contrast, the ε-neighborhoods of two points in Minkowski spacetime induced by the Lorentz-invariant distance function (2) are the regions bounded by the hyperbolic envelope containing the light lines emanating from those points, as shown in the right-hand illustration below.
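The failure of the triangle inequality, and the resulting strangeness of Minkowski ε-neighborhoods, are easy to exhibit numerically. The following sketch (Python, our own helper names) evaluates the distance function (2) for events written as (t, x) pairs, in units with c = 1:

    def d_mink(a, b):
        # absolute spacetime interval (2) between events a = (t,x), b = (t,x)
        return abs((a[0] - b[0])**2 - (a[1] - b[1])**2) ** 0.5

    A = (0.0, 0.0)
    B = (1.0e9, 1.0e9)   # on the light cone of A
    C = (2.0e9, 0.0)     # two billion years later, at the same place as A

    print(d_mink(A, B), d_mink(B, C))   # 0.0 0.0 -- both "sides" are null
    print(d_mink(A, C))                 # 2e+09 -- far exceeds their sum

    # the entire null line through A lies within every ε-neighborhood of A,
    # so each such neighborhood contains events at arbitrarily large
    # coordinate distances from A
    print(d_mink(A, (1.0e30, 1.0e30)) < 1.0e-6)   # True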

This illustrates the important fact that the concept of "nearness" implied by the Minkowski metric is non-transitive. In a metric (or even a pseudometric) space, the triangle inequality ensures that if A and B are close together, and B and C are close together, then A and C cannot be very far apart. This transitivity obviously doesn't apply to the absolute magnitudes of the spacetime intervals between events, because it's possible for A and B to be null-separated, and for B and C to be null-separated, while A and C are arbitrarily far apart. Interestingly, it is often suggested that the usual Euclidean topology of spacetime might break down on some sufficiently small scale, such as over distances on the order of the Planck length of roughly 10^−35 meters, but the system of reference for evaluating that scale is usually not specified. As noted previously, the spatial and temporal components of two null-separated events can both simultaneously be regarded as arbitrarily large or arbitrarily small (including less than 10^−35 meters), depending on which system of inertial coordinates we choose. This null-separation condition permeates the whole of spacetime (recall Section 1.9 on Null Coordinates), so if we take seriously the possibility of non-Euclidean topology on the Planck scale, we can hardly avoid considering the possibility that the effective physical topology ("connectedness") of the points of spacetime may be non-Euclidean along null intervals in their entirety, which span all scales of spacetime. It's certainly true that the topology induced by a direct application of the Minkowski distance function (2) is not even a topological space, let alone Euclidean. To generate this topology, we simply say that the point e is a limit point of any subset A of points of Minkowski spacetime if and only if for every positive real number ε there is an element u (other than e) of A such that d(e,u) < ε. This is a perfectly valid topology, and arguably the one most consistent with the non-transitive absolute intervals that seem to physically characterize spacetime, but it is not a topological space. To see this, recall that in order for a topology to be a topological space it must be possible to express the limit point mapping in terms of open sets such that a point e is a limit point of a subset A of S if and only if every open set containing e also contains a point of A distinct from e. If we define

our topological neighborhoods in terms of the Minkowski absolute intervals, our open sets would naturally include complete Minkowski neighborhoods, but these regions don't satisfy the condition for a topological space, as illustrated below, where e is a limit point of A, but e is also contained in Minkowski neighborhoods containing no point of A.

The idea of a truly Minkowskian topology seems unsatisfactory to many people, because they worry that it implies every two events are mutually "co-local" (i.e., their local neighborhoods intersect), and so the entire concept of "locality" becomes meaningless. However, the fact that a set of points possesses a non-positive-definite line element does not imply that the set degenerates into a featureless point (which is fortunate, considering that the spacetime we inhabit is characterized by just such a line element). It simply implies that we need to apply a more subtle understanding of the concept of locality, taking account of its non-transitive aspect. In fact, the overlapping of topological neighborhoods in spacetime suggests a very plausible approach to explaining the "nonlocal" quantum correlations that seem so mysterious when viewed from the viewpoint of Euclidean topology. We'll consider this in more detail in subsequent chapters. It is, of course, possible to assign the Euclidean topology to Minkowski spacetime, but only by ignoring the non-transitive null structure implied by the Lorentz-invariant distance function. To do this, we can simply take as our basis sets all the finite intersections of Minkowski neighborhoods. Since the contents of an ε-neighborhood of a given point are invariant under Lorentz transformations, it follows that the contents of the intersection of the ε-neighborhoods of two given points are also invariant. Thus we can define each basis set by specifying a finite collection of events with a specific value of ε for each one, and the resulting set of points is invariant under Lorentz transformations. This is a more satisfactory approach than defining neighborhoods as the set of points whose coordinates (with respect to some arbitrary system of coordinates) differ only a little, but the fact remains that by adopting this approach we are still tacitly abandoning the Lorentz-invariant sense of nearness and connectedness, because we are segregating

null-separated events into disjoint open sets. This is analogous to saying, for the plane of formal fractions, that 4/6 is not a limit point of every set containing 2/3, which is certainly true on the formal level, but it ignores the natural topology possessed by the values of fractions. In formulating a physical theory of fractions we would need to decide at some point whether the observable physical phenomena actually correspond to pairings of numerators and denominators, or to the values of fractions, and then select the appropriate topology. In the case of a spacetime theory, we need to consider whether the temporal and spatial components of intervals have absolute significance, or whether it is only the absolute intervals themselves that are significant.

It's worth reviewing why we ever developed the Euclidean notion of locality in the first place, and why it's so deeply engrained in our thought processes, when the spacetime which we inhabit actually possesses a Minkowskian structure. This is easily attributed to the fact that our conscious experience is almost exclusively focused on the behavior of macro-objects whose overall world-lines are nearly parallel relative to the characteristic of the metric. In other words, we're used to dealing with objects whose mutual velocities are small relative to c, and for such objects the structure of spacetime does approach very near to being Euclidean. On the scales of space and time relevant to macro human experience the trajectories of incoming and outgoing light rays through any given point are virtually indistinguishable, so it isn't surprising that our intuition reflects a Euclidean topology. (Compare this with the discussion of Postulates and Principles in Section 3.1.)

Another important consequence of the non-positive-definite character of Minkowski spacetime concerns the qualitative nature of geodesic paths. In a genuine metric space the geodesics are typically the shortest paths from place to place, but in Minkowski spacetime the timelike geodesics are the longest paths, in terms of the absolute value of the invariant intervals. Of course, if we allow curvature, there may be multiple distinct "maximal" paths between two given events. For example, if we shoot a rocket straight up (with less than escape velocity), and it passes an orbiting satellite on the way up, and passes the same satellite again on the way back down, then each of them has followed a geodesic path between their meetings, but they have followed very different paths. From one perspective, it's not surprising that the longest paths in spacetime correspond to physically interesting phenomena, because the shortest path between any two points in Minkowski spacetime is identically zero. Hence the structure of events was bound to involve the longest paths. However, it seems rash to conclude that the shortest paths play no significant role in physical phenomena. The shortest absolute timelike path between two events follows a "dog leg" path, staying as close as possible to the null cones emanating from the two events. Every two points in spacetime are connected by a contiguous set of lightlike intervals whose absolute magnitudes are zero.

Minkowski spacetime provides an opportunity to reconsider the famous "limit paradox" from freshman calculus in a new context. Recall the standard paradox begins with a two-part path in the xy plane from point A to point C by way of point B as shown below:

If the real segment AC has length 1, then the dog-leg path ABC has length √2, as does each of the zig-zag paths ADEFC, AghiEjklC, and so on. As we continue to subdivide the path into more and smaller zigzags the envelope of the path converges on the straight line from A to C. The "paradox" is that the limiting zigzag path still has length √2, whereas the line to which it converges (and from which we might suppose it is indistinguishable) has length 1. Needless to say, this is not a true paradox, because the limit of a set of convergents does not necessarily possess all the properties of the convergents. However, from a physical standpoint it teaches a valuable lesson, which is that we can't necessarily assess the length of a path by assuming it equals the length of some curve from which it never differs by any measurable amount. To place this in the context of Minkowski spacetime, we can simply replace the y axis with the time axis, and replace the Euclidean metric with the Minkowski pseudo-metric. We can still assume the length of the interval AC is 1, but now each of the diagonal segments is a null interval, so the total path length along any of the zigzag paths is identically zero. In the limit, with an infinite number of infinitely small zigzags, the jagged "null path" is everywhere practically coincident with the timelike geodesic path AC, and yet its total length remains zero. Of course, the oscillating acceleration required to propel a massive particle on a path approaching these light-like segments would be enormous, as would the frequency of oscillation.

9.2 Up To Diffeomorphism

The mind of man is more intuitive than logical, and comprehends more than it can coordinate.
Vauvenargues, 1746

Einstein seems to have been strongly wedded to the concept of the continuum described

by partial differential equations as the only satisfactory framework for physics. He was certainly not the first to hold this view. For example, in 1860 Riemann wrote

As is well known, physics became a science only after the invention of differential calculus. It was only after realizing that natural phenomena are continuous that attempts to construct abstract models were successful… In the first period, only certain abstract cases were treated: the mass of a body was considered to be concentrated at its center, the planets were mathematical points… so the passage from the infinitely near to the finite was made only in one variable, the time [i.e., by means of total differential equations]. In general, however, this passage has to be done in several variables… Such passages lead to partial differential equations… In all physical theories, partial differential equations constitute the only verifiable basis. These facts, established by induction, must also hold a priori. True basic laws can only hold in the small and must be formulated as partial differential equations.

Compare this with Einstein's comments (see Section 3.2) over 70 years later about the unsatisfactory dualism inherent in Lorentz's theory, which expressed the laws of motion of particles in the form of total differential equations while describing the electromagnetic field by means of partial differential equations. Interestingly, Riemann asserted that the continuous nature of physical phenomena was "established by induction", but immediately went on to say it must also hold a priori, referring somewhat obscurely to the idea that "true basic laws can only hold in the small". He may have been trying to convey by these words his rejection of "action at a distance". Einstein attributed this insight to the special theory of relativity, but of course the Newtonian concept of instantaneous action at a distance had always been viewed skeptically, so it isn't surprising that Riemann in 1860 – like his contemporary Maxwell – adopted the impossibility of distant action as a fundamental principle. (It's interesting to consider whether Einstein might have taken this, rather than the invariance of light speed, as one of the founding principles of special relativity, since it immediately leads to the impossibility of rigid bodies, etc.) In his autobiographical notes (1949) Einstein wrote

There is no such thing as simultaneity of distant events; consequently, there is also no such thing as immediate action at a distance in the sense of Newtonian mechanics. Although the introduction of actions at a distance, which propagate at the speed of light, remains feasible according to this theory, it appears unnatural; for in such a theory there could be no reasonable expression for the principle of conservation of energy. It therefore appears unavoidable that physical reality must be described in terms of continuous functions in space.

It's worth noting that while Riemann and Maxwell had expressed their objections in terms of "action at a (spatial) distance", Einstein can justly claim that special relativity revealed that the actual concept to be rejected was instantaneous action at a distance. He acknowledged that "distant action" propagating at the speed of light – which is to say, action over null intervals – remains feasible. In fact, one could argue that such "distant action" was made more feasible by special relativity, especially in the context of

Minkowski's spacetime, in which the null (light-like) intervals have zero absolute magnitude. For any two light-like separated events there exist perfectly valid systems of inertial coordinates in terms of which both the spatial and the temporal measures of distance are arbitrarily small. It doesn't seem to have troubled Einstein (nor many later scientists) that the existence of non-trivial null intervals potentially undermines the identification of the topology of pseudo-metrical spacetime with that of a true metric space. Thus Einstein could still write that the coordinates of general relativity express the "neighborliness" of events "whose coordinates differ but little from each other". As argued in Section 9.1, the assumption that the physically most meaningful topology of a pseudo-metric space is the same as the topology of continuous coordinates assigned to that space, even though there are singularities in the invariant measures based on those coordinates, is questionable. Given Einstein's aversion to singularities of any kind, including even the coordinate singularity at the Schwarzschild radius, it's somewhat ironic that he never seems to have worried about the coordinate singularity of every lightlike interval and the non-transitive nature of "null separation" in ordinary Minkowski spacetime.

Apparently unconcerned about the topological implications of Minkowski spacetime, Einstein inferred from the special theory that "physical reality must be described in terms of continuous functions in space". Of course, years earlier he had already considered some of the possible objections to this point of view. In his 1936 essay on "Physics and Reality" he considered the "already terrifying" prospect of quantum field theory, i.e., the application of the method of quantum mechanics to continuous fields with infinitely many degrees of freedom, and he wrote

To be sure, it has been pointed out that the introduction of a space-time continuum may be considered as contrary to nature in view of the molecular structure of everything which happens on a small scale. It is maintained that perhaps the success of the Heisenberg method points to a purely algebraical method of description of nature, that is to the elimination of continuous functions from physics. Then, however, we must also give up, on principle, the space-time continuum. It is not unimaginable that human ingenuity will some day find methods which will make it possible to proceed along such a path. At the present time, however, such a program looks like an attempt to breathe in empty space.

In his later search for something beyond general relativity that would encompass quantum phenomena, he maintained that the theory must be invariant under a group that at least contains all continuous transformations (represented by the symmetric tensor), but he hoped to enlarge this group.

It would be most beautiful if one were to succeed in expanding the group once more in analogy to the step that led from special relativity to general relativity. More specifically, I have attempted to draw upon the group of complex transformations of the coordinates. All such endeavours were unsuccessful. I also gave up an open or concealed increase in the number of dimensions, an endeavor that … even today has its adherents.

The reference to complex transformations is an interesting fore-runner of more recent efforts, notably Penrose's twistor program, to exploit the properties of complex functions (cf Section 9.9). The comment about increasing the number of dimensions certainly has relevance to current "string theory" research. Of course, as Einstein observed in an appendix to his Princeton lectures, "In this case one must explain why the continuum is apparently restricted to four dimensions". He also mentioned the possibility of field equations of higher order, but he thought that such ideas should be pursued "only if there exist empirical reasons to do so". On this basis he concluded

We shall limit ourselves to the four-dimensional space and to the group of continuous real transformations of the coordinates.

He went on to describe what he (then) considered to be the "logically most satisfying idea" (involving a non-symmetric tensor), but added a footnote that revealed his lack of conviction, saying he thought the theory had a fair probability of being valid "if the way to an exhaustive description of physical reality on the basis of the continuum turns out to be at all feasible". A few years later he told Abraham Pais that he "was not sure differential geometry was to be the framework for further progress", and later still, in 1954, just a year before his death, he wrote to his old friend Besso (quoted in Section 3.8) that he considered it quite possible that physics cannot be based on continuous structures. The dilemma was summed up at the conclusion of his Princeton lectures, where he said

One can give good reasons why reality cannot at all be represented by a continuous field. From the quantum phenomena it appears to follow with certainty that a finite system of finite energy can be completely described by a finite set of numbers… but this does not seem to be in accordance with a continuum theory, and must lead to an attempt to find a purely algebraic theory for the description of reality. But nobody knows how to obtain the basis of such a theory.

Current research involving "spin networks" might be regarded as an attempt to obtain an algebraic basis for a theory of space and time, but so far these efforts have not achieved much success. The current field of "string theory" has some algebraic aspects, but it seems to entail much the same kind of dualism that Einstein found so objectionable in Lorentz's theory. Of course, most modern research into fundamental physics is based on quantum field theory, about which Einstein was never enthusiastic – to put it mildly. (Bargmann told Pais that Einstein once "asked him for a private survey of quantum field theory, beginning with second quantization. Bargmann did so for about a month. Thereafter Einstein's interest waned.") Of all the various directions that Einstein and others have explored, one of the most intriguing (at least from the standpoint of relativity theory) was the idea of "expanding the group once more in analogy to the step that led from special relativity to general relativity". However, there are many different ways in which this might conceivably be done. Einstein referred to allowing complex transformations, or non-symmetric tensors, or increasing the number of dimensions, etc., but all these retain the continuum hypothesis.

He doesn't seem to have seriously considered relaxing this assumption, and allowing completely arbitrary transformations (unless this is what he had in mind when he referred to an "algebraic theory"). Ironically, in his expositions of general relativity he often proudly explained that it gave an expression of physical laws valid for completely arbitrary transformations of the coordinates, but of course he meant arbitrary only up to diffeomorphism, which in the absolute sense is not very arbitrary at all.

We mentioned in the previous section that diffeomorphically equivalent sets can be assigned the same topology, but from the standpoint of a physical theory it isn't self-evident which diffeomorphism is the right one (assuming there is one) for a particular set of physical entities, such as the events of spacetime. Suppose we're able to establish a 1-to-1 correspondence between certain physical events and the sets of four real-valued numbers (x^0, x^1, x^2, x^3). (As always, the superscripts are indices, not exponents.) This is already a very strong supposition, because the real numbers are uncountable, even over a finite range, so we are supposing that physical events are also uncountable. However, I've intentionally not characterized these physical events as points in a certain contiguous region of a smooth continuous manifold, because the ability to place those events in a one-to-one correspondence with the coordinate sets does not, by itself, imply any particular arrangement of those events. (We use the word arrangement here to signify the notions of order and nearness associated with a specific topology.) In particular, it doesn't imply an arrangement similar to that of the coordinate sets interpreted as points in the four-dimensional space denoted by R4.

To illustrate why the ability to map events with real coordinates does not, by itself, imply a particular arrangement of those events, consider the coordinates of a single event, normalized to the range 0-1, and expressed in the form of their decimal representations, where x_mn denotes the nth most significant digit of the mth coordinate, as shown below

x^0 = 0. x_01 x_02 x_03 x_04 x_05 x_06 x_07 x_08 ...
x^1 = 0. x_11 x_12 x_13 x_14 x_15 x_16 x_17 x_18 ...
x^2 = 0. x_21 x_22 x_23 x_24 x_25 x_26 x_27 x_28 ...
x^3 = 0. x_31 x_32 x_33 x_34 x_35 x_36 x_37 x_38 ...

We could, as an example, assign each such set of coordinates to a point in an ordinary four-dimensional space with the coordinates (y^0, y^1, y^2, y^3) given by the diagonal sets of digits from the corresponding x coordinates, taken in blocks of four, as shown below

y^0 = 0. x_01 x_12 x_23 x_34 x_05 x_16 x_27 x_38 ...
y^1 = 0. x_02 x_13 x_24 x_31 x_06 x_17 x_28 x_35 ...
y^2 = 0. x_03 x_14 x_21 x_32 x_07 x_18 x_25 x_36 ...
y^3 = 0. x_04 x_11 x_22 x_33 x_08 x_15 x_26 x_37 ...
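The following sketch (Python) implements this block-diagonal digit interleaving for coordinates truncated to sixteen decimal digits (the truncation, and the function name, are ours; the infinite-digit map is the one displayed above). It also previews a point made at the end of this section: two x-points that differ by less than 10^−7 can be sent to y-points separated by about 0.1, so the map, though 1-to-1, is everywhere discontinuous:

    def interleave(xs, ndigits=16):
        # the block-diagonal digit scrambling displayed above, truncated to
        # ndigits decimal places (ndigits must be a multiple of 4)
        digs = [f"{x:.{ndigits}f}"[2:2 + ndigits] for x in xs]
        ys = []
        for j in range(4):
            out = []
            for p in range(ndigits):
                block, i = divmod(p, 4)
                m = i                         # which coordinate supplies digit p
                n = 4 * block + (i + j) % 4   # which of its digits is taken
                out.append(digs[m][n])
            ys.append(float("0." + "".join(out)))
        return ys

    a = interleave([0.2000000, 0.5, 0.5, 0.5])
    b = interleave([0.1999999, 0.5, 0.5, 0.5])
    print(abs(a[0] - b[0]))   # ~0.1, although the x-points differ by only 1e-7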

We could also transpose each consecutive pair of blocks, or scramble the digits in any number of other ways, provided only that we ensure a 1-to-1 mapping. We could even imagine that the y space has (say) eight dimensions instead of four, and we could construct those eight coordinates from the odd and even numbered digits of the four x

coordinates. It's easy to imagine numerous 1-to-1 mappings between a set of abstract events and sets of coordinates such that the actual arrangement of the events (if indeed they possess one) bears no direct resemblance to the arrangement of the coordinate sets in their natural space. So, returning to our task, we've assigned coordinates to a set of events, and we now wish to assert some relationship between those events that remains invariant under a particular kind of transformation of the coordinates. Specifically, we limit ourselves to coordinate mappings that can be reached from our original x mapping by means of a smooth transformation applied on the natural space of x. In other words, we wish to consider transformations from x to X given by a set of four continuous functions f^i with continuous partial first derivatives. Thus we have

X^0 = f^0 (x^0, x^1, x^2, x^3)
X^1 = f^1 (x^0, x^1, x^2, x^3)
X^2 = f^2 (x^0, x^1, x^2, x^3)
X^3 = f^3 (x^0, x^1, x^2, x^3)

Further, we require this transformation to possess a differentiable inverse, i.e., there must exist differentiable functions F^i such that

x^0 = F^0 (X^0, X^1, X^2, X^3)
x^1 = F^1 (X^0, X^1, X^2, X^3)
x^2 = F^2 (X^0, X^1, X^2, X^3)
x^3 = F^3 (X^0, X^1, X^2, X^3)

A mapping of this kind is called a diffeomorphism, and two sets are said to be equivalent up to diffeomorphism if there is such a mapping from one to the other. Any physical theory, such as general relativity, formulated in terms of tensor fields in spacetime automatically possesses the freedom to choose the coordinate system from among a complete class of diffeomorphically equivalent systems. From one point of view this can be seen as a tremendous generality and freedom from dependence on arbitrary coordinate systems. However, as noted above, there are infinitely many systems of coordinates that are not diffeomorphically equivalent, so the limitation to equivalent systems up to diffeomorphism can also be seen as quite restrictive. For example, no such functions can possibly reproduce the digit-scrambling transformations discussed previously, such as the mapping from x to y, because those mappings are everywhere discontinuous. Thus we cannot get from x coordinates to y coordinates (or vice versa) by means of continuous transformations. By restricting ourselves to differentiable transformations we're implicitly focusing our attention on one particular equivalence class of coordinate systems, with no a priori guarantee that this class of systems includes the most natural parameterization of physical events. In fact, we don't even know if physical events possess a natural parameterization, or if they do, whether it is unique.

Recall that the special theory of relativity assumes the existence and identifiability of a preferred equivalence class of coordinate systems called the inertial systems. The laws of physics, according to special relativity, should be the same when expressed with respect to any inertial system of coordinates, but not necessarily with respect to non-inertial systems of reference. It was dissatisfaction with having given a preferred role to a particular class of coordinate systems that led Einstein to generalize the "gauge freedom" of general relativity, by formulating physical laws in pure tensor form (general covariance) so that they apply to any system of coordinates from a much larger equivalence class, namely, those that are equivalent to an inertial coordinate system up to diffeomorphism. This includes accelerated coordinate systems (over suitably restricted regions) that are outside the class of inertial systems. Impressive though this achievement is, we should not forget that general relativity is still restricted to a preferred class of coordinate systems, which comprise only an infinitesimal fraction of all conceivable mappings of physical events, because it still excludes non-diffeomorphic transformations. It's interesting to consider how we arrive at (and agree upon) our preferred equivalence class of coordinate systems. Even from the standpoint of special relativity the identification of an inertial coordinate system is far from trivial (even though it's often taken for granted). When we proceed to the general theory we have a great deal more freedom, but we're still confined to a single topology, a single pattern of coherence. How is this coherence apprehended by our senses? Is it conceivable that a different set of senses might have led us to apprehend a different coherent structure in the physical world? More to the point, would it be possible to formulate physical laws in such a way that they remain applicable under completely arbitrary transformations?

9.3 Higher-Order Metrics

A similar path to the same goal could also be taken in those manifolds in which the line element is expressed in a less simple way, e.g., by a fourth root of a differential expression of the fourth degree…
Riemann, 1854

Given three points A,B,C, let dx1 denote the distance between A and B, and let dx2 denote the distance between B and C. Can we express the distance ds between A and C in terms of dx1 and dx2? Since dx1, dx2, and ds all represent distances with commensurate units, it's clear that any formula relating them must be homogeneous in these quantities, i.e., they must appear to the same power. One possibility is to assume that ds is a linear combination of dx1 and dx2 as follows

ds = g1 dx1 + g2 dx2                (1)

where g1 and g2 are constants. In a simple one-dimensional manifold this would indeed be the correct formula for ds, with |g1| = |g2| = 1, except for the fact that it might give a

negative sign for ds, contrary to the idea of an interval as a positive magnitude. To ensure the correct sign for ds, we might take the absolute value of the right hand side, which suggests that the fundamental equality actually involves the squares of the two sides of the above equation, i.e., the quantities ds, dx1, dx2 satisfy the relation

(ds)^2 = g11 (dx1)^2 + 2 g12 dx1 dx2 + g22 (dx2)^2                (2)

where we have put gij = gi gj. Thus we have g11 g22 − (g12)^2 = 0, which is the condition for factorability of the expanded form as the square of a linear expression. This will be the case in a one-dimensional manifold, but in more general circumstances we find that the values of the gij in the expanded form of (2) are such that the expression is not factorable into linear terms with real coefficients. In this way we arrive at the second-order metric form, which is the basis of Riemannian geometry. Of course, by allowing the second-order coefficients gij to be arbitrary, we make it possible for (ds)^2 to be negative, analogous to the fact that ds in equation (1) could be negative, which is what prompted us to square both sides of (1), leading to equation (2). Now that (ds)^2 can be negative, we're naturally led to consider the possibility that the fundamental relation is actually the equality of the squares of both sides of (2). This gives

(ds)^4 = Σ gµναβ dxµ dxν dxα dxβ

where the sum is evaluated for µ, ν, α, β each ranging from 1 to n, where n is the dimension of the manifold. Once again, having arrived at this form, we immediately dispense with the assumption of factorability, and allow general fourth-order metrics. These are non-Riemannian metrics, although Riemann actually alluded to the possibility of fourth and higher order metrics in his famous inaugural dissertation. He noted that

The line element in this more general case would not be reducible to the square root of a quadratic sum of differential expressions, and therefore in the expression for the square of the line element the deviation from flatness would be an infinitely small quantity of degree two, whereas for the former manifolds [i.e., those whose squared line elements are sums of squares] it was an infinitely small quantity of degree four. This peculiarity [i.e., this quantity of the second degree] in the latter manifolds therefore might well be called the planeness in the smallest parts…

It's clear even from his brief comments that he had given this possibility considerable thought, but he never published any extensive work on it. Finsler wrote a dissertation on this subject in 1918, so such metrics are now often called Finsler metrics.
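Before considering the Minkowskian case, it may help to see the simplest Euclidean-signature example of such a line element, (ds)^4 = (dx)^4 + (dy)^4. A short sketch (Python; this particular construction is merely illustrative, not taken from Riemann or Finsler) computes the locus of points at unit distance from the origin, which is not a conic:

    import math

    def ds_quartic(dx, dy):
        # fourth-order line element (ds)^4 = (dx)^4 + (dy)^4
        return (dx**4 + dy**4) ** 0.25

    # radius of the unit "circle" in the direction theta, i.e. the r such
    # that ds_quartic(r cos(theta), r sin(theta)) = 1
    for deg in (0, 15, 30, 45):
        t = math.radians(deg)
        r = 1.0 / ds_quartic(math.cos(t), math.sin(t))
        print(deg, round(r, 4))
    # r grows from 1.0 at 0 degrees to 2**(1/4) ~ 1.1892 at 45 degrees,
    # whereas a second-order (Pythagorean) metric would give r = 1 throughout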

To visualize the effect of higher-order metrics, recall that for a second-order metric the locus of points at a fixed distance ds from the origin must be a conic, i.e., an ellipse, hyperbola, or parabola. In contrast, a fourth-order metric allows more complicated loci of equi-distant points. When applied in the context of Minkowskian metrics, these higher-order forms raise some intriguing possibilities. For example, instead of a spacetime structure with a single light-like characteristic c, we could imagine a structure with two null characteristics, c1 and c2. Letting x and t denote the spacelike and timelike coordinates respectively, this means (ds/dt)^4 vanishes for two values (up to sign) of dx/dt. Thus there are four roots, given by ±c1 and ±c2, and we have

(ds)^4 = (dx − c1 dt)(dx + c1 dt)(dx − c2 dt)(dx + c2 dt)

The resulting metric is

(ds)^4 = [(dx)^2 − c1^2 (dt)^2] [(dx)^2 − c2^2 (dt)^2]                (3)

The physical significance of this "metric" naturally depends on the physical meaning of the coordinates x and t. In Minkowski spacetime these represent what physical rulers and clocks measure, and we can translate these coordinates from one inertial system to another according to the Lorentz transformations while always preserving the form of the Minkowski metric with a fixed numerical value of c. The coordinates x and t are defined in such a way that c remains invariant, and this definition happily coincides with the physical measures of rulers and clocks. However, with two distinct light-like "eigenvalues", it's no longer possible for a single family of spacetime decompositions to preserve the values of both c1 and c2. Consequently, the metric will take the form of (3) only with respect to one particular system of xt coordinates. In any other frame of reference at least one of c1 and c2 must be different. Suppose that with respect to a particular inertial system of coordinates x,t the spacetime metric is given by (3) with c1 = 1 and c2 = 2. We might also suppose that c1 corresponds to the null surfaces of electromagnetic wave propagation, just as in Minkowski spacetime. Now, with respect to any other system of coordinates x',t' moving with speed v relative to the x,t coordinates, we can decompose the absolute intervals into space and time components such that c1 = 1, but then the values of the other lightlines (corresponding to c2') must be (v + c2)/(1 + v c2) and (v − c2)/(1 − v c2). Consequently, for states of motion far from the one in which the metric takes the special form (3), the metric will become progressively more asymmetrical. This is illustrated in the figure below, which shows contours of constant magnitude of the squared interval.
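The transformed values quoted above for the second pair of null lines are just the relativistic composition of ±c2 with the frame velocity v (in units where c1 = 1). A minimal numeric check (Python, our own helper name):

    def add_vel(u, v):
        # relativistic velocity composition in units where c1 = 1
        return (u + v) / (1.0 + u * v)

    v, c2 = 0.3, 2.0
    print(add_vel( 1.0, v), add_vel(-1.0, v))   #  1.0 -1.0   : c1 is preserved
    print(add_vel( c2, v), add_vel(-c2, v))     #  1.4375 -4.25 : c2 is not

    # the two transformed values equal (v + c2)/(1 + v c2) and
    # (v - c2)/(1 - v c2) as quoted above; note also the singular case
    # v = 1/c2, for which one super-light line is boosted to infinite speed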

Clearly this metric does not correspond to the observed spacetime structure, even in the symmetrical case with v = 0, because it is not Lorentz-invariant. As an alternative to this structure containing "super-light" null surfaces we might consider metrics with some finite number of "sub-light" null surfaces, but the failure to exhibit even approximate Lorentz-invariance would remain. However, it is possible to construct infinite-order metrics with infinitely many super-light and/or sub-light null surfaces, and in so doing recover a structure that in many respects is virtually identical to Minkowski spacetime, except for a set (of spacetime trajectories) of measure zero. This can be done by generalizing (3) to include infinitely many discrete factors

(ds)^(2n) = [(dx)^2 − c1^2 (dt)^2] [(dx)^2 − c2^2 (dt)^2] … [(dx)^2 − cn^2 (dt)^2]                (4)

where the values of ci represent an infinite family of sub-light parameters given by

ci = tanh(iµ)

for a fixed increment µ of rapidity.

A plot showing how this spacetime structure develops as n increases is shown below.
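A numerical check of this convergence is straightforward (Python; the displacement and the increment µ = 0.1 are arbitrary illustrative choices, and the tanh parameterization is the one assumed above):

    import math

    def ds_product(dx, dt, n, mu=0.1):
        # n-factor metric (4) with sub-light characteristics ci = tanh(i*mu),
        # evaluated in magnitude, like the absolute spacetime interval
        total = 0.0
        for i in range(1, n + 1):
            ci = math.tanh(i * mu)
            total += math.log(abs(dx * dx - ci * ci * dt * dt))
        return math.exp(total / (2.0 * n))

    dx, dt = 0.5, 1.0
    for n in (10, 100, 1000, 10000):
        print(n, round(ds_product(dx, dt, n), 4))
    print(round(abs(dx * dx - dt * dt) ** 0.5, 4))   # Minkowski value 0.866

    # the values climb toward 0.866 as n grows; convergence is slowest for
    # displacements like this one, whose slope dx/dt lies near one of the ci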

This illustrates how, as the number of sub-light cones goes to infinity, the structure of the manifold goes over to the usual Minkowski pseudometric, except for the discrete null sub-light surfaces which are distributed throughout the interior of the future and past light cones, and which accumulate on the light cones. The sub-light null surfaces become so thin that they no longer show up on these contour plots for large n, but they remain present to all orders. In the limit as n approaches infinity they become discrete null trajectories embedded in what amounts to ordinary Minkowski spacetime. To see this, notice that if none of the factors on the right hand side of (4) is exactly zero we can take the natural log of both sides to give

ln (ds)^2 = (1/n) Σ ln |(dx)^2 − ci^2 (dt)^2|

Thus the natural log of (ds)^2 is the asymptotic average of the natural logs of the quantities (dx)^2 − ci^2 (dt)^2. Since the values of ci accumulate on 1, it's clear that this converges on the usual Minkowski metric (provided we are not precisely on any of the discrete sub-light null surfaces). The preceding metric was based purely on sub-light null surfaces. We could also include n super-light null surfaces along with the n sub-light null surfaces, yielding an asymptotic family of metrics which, again, goes over to the usual Minkowski metric as n goes to infinity (except for the discrete null surface structure). This metric is given by the

formula

(ds)^(4n) = Π [(dx)^2 − ci^2 (dt)^2] [ci^2 (dx)^2 − (dt)^2]

where the product is taken over i = 1 to n, so the null surfaces occur at the speeds ci and 1/ci, and the values of ci are generated as before. The results for various values of n are illustrated in the figure below.

Notice that the quasi Lorentz-invariance of this metric has a subtle periodicity, because any one of the sublight null surfaces can be aligned with the time axis by a suitable choice of velocity, or the time axis can be placed "in between" two null surfaces. In a 1+1 dimensional spacetime the structure is perfectly symmetrical modulo this cycle from one null surface to the next. In other words, the set of exactly equivalent reference systems corresponds to a cycle with a period of µ, which is the increment between each ci and ci+1. However, with more spatial dimensions the sub-light null structure is subtly less symmetrical, because each null surface represents a discrete cone, which associates two of the trajectories in the xt plane as the sides of a single cone. Thus there must be an absolutely innermost cone, in the topological sense, even though that cone may be far off center, i.e., far from the selected time axis. Similarly for the super-light cones (or spheres), there would be a single state of motion with respect to which all of those null surfaces would be spherically symmetrical. Only the accumulation shell, i.e., the actual light-cone itself, would be spherically symmetrical with respect to all states of motion.

9.4 Spin and Polarization

Every ray of light has therefore two opposite sides… And since the crystal by this disposition or virtue does not act upon the rays except when one of their sides of unusual refraction looks toward that coast, this argues a virtue or disposition in those sides of the rays which answers to and sympathizes with that virtue or disposition of the crystal, as the poles of two magnets answer to one another…
Newton, 1717

The spin of a particle is quantized, so when we make a measurement at any specific angle we get only one of the two results UP or DOWN. This was shown by the famous Stern/Gerlach experiment, in which a beam of particles (atoms of silver) was passed through an oriented magnetic field, and it was found that the beam split into two beams, one deflected UP (relative to the direction of the magnetic field) and the other deflected DOWN, with about half of the particles in each.

This behavior implies that the state-vector for spin has just two components, vUP and vDOWN, for any given direction v. These components are weighted and the sum of the squares of the weights equals 1. (The overall state-vector for the particle can be decomposed into the product of a non-spin vector times the spin vector.) The observable "spin" then corresponds to three operators that are proportional to the Pauli spin matrices:

Sx = (ħ/2) | 0  1 |        Sy = (ħ/2) | 0  −i |        Sz = (ħ/2) | 1   0 |
           | 1  0 |                   | i   0 |                   | 0  −1 |

These operators satisfy the commutation relations

[Sx, Sy] = iħ Sz        [Sy, Sz] = iħ Sx        [Sz, Sx] = iħ Sy

as we would expect by the correspondence principle from ordinary (classical) spin. Not surprisingly, this non-commutation is closely related to the non-commutation of ordinary spatial rotations of a classical particle, in the sense that they're both related to the cross-product of orthogonal vectors. Given an orthogonal coordinate system [x,y,z] the angular momentum of a classical particle with momentum [px, py, pz] is (in component form)

Lx = y pz − z py        Ly = z px − x pz        Lz = x py − y px

Guided by the correspondence principle, we replace the classical components px, py, pz with their quantum mechanical equivalents, the differential operators −iħ ∂/∂x, −iħ ∂/∂y, −iħ ∂/∂z, leading to the S operators noted above. Photons too have quantum spin (they are spin-1 particles), but since photons travel at the speed c, the "spin axis" of a photon is always parallel to its direction of motion, pointing either forward or backward. These two states correspond to left-handed and right-handed photons. Whenever a photon is absorbed by an object, an angular momentum of either +h/2π or −h/2π is imparted to the object. Each photon "in transit" may be considered to possess, in addition to its phase, a certain propensity to exhibit each of the two possible states of spin when it interacts with an object, and a beam of light can be characterized by the spin propensities (polarization) and phase relations of its constituent photons. Polarization behaves in a way that is formally very similar to the spin of massive particles. In a sense, the Schrodinger wave of a photon corresponds to the electromagnetic wave of light, and this wave is governed by Maxwell's equations, which tell us that the electric and magnetic fields oscillate transversely in the plane normal to the direction of motion (and perpendicular to each other). Thus a photon coming directly toward us "looks" something like this:

where E signifies the oscillating electric field and B the magnetic field. (This orientation is not necessarily fixed - it's possible for it to rotate like a windmill - but it's simplest to concentrate on "plane-polarized" photons.) The photon is said to be polarized in the direction of E. A typical beam of ordinary light has photons with all different polarizations mixed together, but certain substances (such as calcite crystals or a sheet of Polaroid) allow photons to pass through only if their electric field is oscillating in one particular direction. Therefore, when we pass a beam of light through a polarizing material, the light that passes through is "polarized", because all the photons have their electric fields aligned. Since only photons with one particular alignment are allowed to pass, and since the incident beam has photons whose polarizations are distributed uniformly in all directions, one might expect to find that only a very small fraction of the photons would pass

through a perfect polarizing substance. (In fact, the fraction of photons from a uniform distribution with polarizations exactly aligned with the polarizing axis of the substance should be vanishingly small.) However, we actually find that a sheet of Polaroid cuts the intensity of an ordinary light beam about in half. Just as in the Stern/Gerlach experiment with massive particles, the Polaroid sheet acts as a measurement for each photon, and gives one of two answers, as if the incoming photons were all polarized in one of just two directions, exactly parallel to the polarizing axis of the substance, or exactly perpendicular to it. This is analogous to the binary UP/DOWN results for spin-1/2 particles such as electrons. If we place a second sheet of Polaroid behind the first, and orient its axis in the same direction, then we find that all the light which passes through the first sheet also passes through the second. If we rotate the second sheet it will start to cut down on the photons allowed through. When we get the second sheet axis at 90 degrees to the first, it will essentially block all the photons. In general, if the two sheets (i.e., measurements) are oriented at an angle of θ relative to each other, then the intensity of the light passing all the way through is (I/2) cos²(θ), where I is the intensity of the original beam. The thickness of the polarizing substance isn't crucial (assuming the polarization axis is perfectly uniform throughout the substance), because the first surface effectively "selects" the suitably aligned photons, which then pass freely through the rest of the substance. The light emerging from the other side is plane-polarized with half the intensity of the incident light. On the other hand, to convert circularly polarized incident light into plane-polarized light of the same intensity, the traditional method is to use a "quarter-wave plate" thickness of a crystal substance such as mica. In this case we're not masking out the non-aligned components, but rather introducing a relative phase shift between them so as to force them into alignment. Of course, a particular thickness of plate only "works" this way for a particular frequency. Incidentally, most people have personal "hands on" knowledge of polarized electromagnetic waves without even realizing it. The waves broadcast by a radio or television tower are naturally polarized, and if you've ever adjusted the orientation of "rabbit ears" and found that your reception is better at some orientations than at others (for a particular station) you've demonstrated the effects of electromagnetic wave polarization. It may be worth noting that light polarization and photon spin, although intimately related, are not precisely synonymous. The photon's spin axis is always parallel to the direction of travel, whereas the polarization axis of a wave of light is perpendicular to the direction of travel. It happens that the polarization affects the behavior of photons in a formally similar way to the effect of spin on the behavior of massive particles. Polarization itself is often not regarded as a quantum phenomenon, and it takes on quantum behavior only because light is quantized into photons. Regarding the parallel between Schrodinger's equations and Maxwell's equations, it's interesting to draw the further parallel between the real/imaginary complexity of the

Schrodinger wave and the electric/magnetic complexity of light waves.
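The two-sheet statistics described in this section are easy to mimic with a toy model of photon polarization measurements (a minimal Monte Carlo sketch in Python, not a quantum mechanical calculation; the pass rule is the standard cos² law, and each photon that passes is assumed to emerge aligned with the analyzer):

    import math, random

    def passes(photon_angle, axis):
        # binary polarization measurement: the photon passes with
        # probability cos^2 of the angle between its axis and the analyzer
        return random.random() < math.cos(photon_angle - axis) ** 2

    def run(theta, n=200000):
        passed = 0
        for _ in range(n):
            a = random.uniform(0.0, math.pi)   # unpolarized incident beam
            # first sheet at angle 0; a photon that passes emerges aligned
            # with the sheet, and is then measured by the second sheet
            if passes(a, 0.0) and passes(0.0, theta):
                passed += 1
        return passed / n

    for deg in (0, 30, 60, 90):
        t = math.radians(deg)
        print(deg, round(run(t), 3), round(0.5 * math.cos(t) ** 2, 3))
    # the simulated fraction tracks (1/2)cos^2(theta), as described above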

9.5 Entangled Events

Anyone who is not shocked by quantum theory has not understood it.
Niels Bohr, 1927

A paper written by Einstein, Podolsky, and Rosen (EPR) in 1935 described a thought experiment which, the authors believed, demonstrated that quantum mechanics does not provide a complete description of physical reality, at least not if we accept certain common notions of locality and realism. Subsequently the EPR experiment was refined by David Bohm (so it is now called the EPRB experiment) and analyzed in detail by John Bell, who highlighted a fascinating subtlety that Einstein, et al, may have missed. Bell showed that the outcomes of the EPRB experiment predicted by quantum mechanics are inherently incompatible with conventional notions of locality and realism combined with a certain set of assumptions about causality. The precise nature of these causality assumptions is rather subtle, and Bell found it necessary to revise and clarify his premises from one paper to the next. In Section 9.6 we discuss Bell's assumptions in detail, but for the moment we'll focus on the EPRB experiment itself, and the outcomes predicted by quantum mechanics. Most actual EPRB experiments are conducted with photons, but in principle the experiment could be performed with massive particles. The essential features of the experiment are independent of the kind of particle we use. For simplicity we'll describe a hypothetical experiment using electrons (although in practice it may not be feasible to actually perform the necessary measurements on individual electrons). Consider the decay of a spin-0 particle resulting in two spin-1/2 particles, an electron and a positron, ejected in opposite directions. If spin measurements are then performed on the two individual particles, the correlation between the two results is found to depend on the difference between the two measurement angles. This situation is illustrated below, with α and β signifying the respective measurement angles at detectors 1 and 2.

Needless to say, the mere existence of a correlation between the measurements on these two particles is not at all surprising. In fact, this would be expected in most classical models, as would a variation in the correlation as a function of the absolute difference θ = |α − β| between the two measurement angles. The essential strangeness of the quantum mechanical prediction is not the mere existence of a correlation that varies with θ, it is the

non-linearity of the predicted variation. If the correlation varied linearly as θ ranged from 0 to π, it would be easy to explain in classical terms. We could simply imagine that the decay of the original spin-0 particle produced a pair of particles with spin vectors pointing oppositely along some randomly chosen axis. Then we could imagine that a measurement taken at any particular angle gives the result UP if the angle is within π/2 of the positive spin axis, and gives the result DOWN otherwise. This situation is illustrated below:

Since the spin axis is random, each measurement will have an equal probability of being UP or DOWN. In addition, if the measurements on the two particles are taken in exactly the same direction, they will always give opposite results (UP/DOWN or DOWN/UP), and if they are taken in the exact opposite directions they will always give equal results (UP/UP or DOWN/DOWN). Also, if they are taken at right angles to each other the results will be completely uncorrelated, meaning they are equally likely to agree or disagree. In general, if θ denotes the absolute value of the angle between the two spin measurements, the above model implies that the correlation between these two measurements would be C(θ) = (2/π)θ − 1, as plotted below.

This linear correlation function is consistent with quantum mechanics (and confirmed by experiment) if the two measurement angles differ by θ = 0, π/2, or π, giving the correlations −1, 0, and +1 respectively. However, for intermediate angles, quantum theory predicts (and experiments confirm) that the actual correlation function for spin-1/2 particles is not the linear function shown above, but the non-linear function given by C(θ) = −cos(θ), as shown below
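The hidden-spin-axis model described above can be simulated directly, and compared with the quantum prediction −cos(θ); the following minimal Monte Carlo sketch (Python, our own function names) reproduces the linear correlation:

    import math, random

    def measure(spin_axis, angle):
        # hidden-variable rule: UP (+1) if the measurement direction lies
        # within pi/2 of the spin axis, DOWN (-1) otherwise
        return 1 if math.cos(angle - spin_axis) > 0 else -1

    def correlation(theta, n=200000):
        total = 0
        for _ in range(n):
            axis = random.uniform(0.0, 2.0 * math.pi)  # random spin axis
            r1 = measure(axis, 0.0)                    # particle 1 at angle 0
            r2 = measure(axis + math.pi, theta)        # particle 2, opposite spin
            total += r1 * r2
        return total / n

    for deg in (0, 45, 90, 135, 180):
        t = math.radians(deg)
        print(deg, round(correlation(t), 2),
              round((2.0 / math.pi) * t - 1.0, 2), round(-math.cos(t), 2))
    # the simulation tracks the linear function (2/pi)theta - 1, matching
    # the quantum value -cos(theta) only at 0, 90, and 180 degrees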

On this basis, the probabilities of the four possible joint outcomes of spin measurements performed at angles differing by θ are as shown in the table below. (The same table would apply to spin-1 particles such as photons if we replace θ with 2θ.)

    Joint outcome        Probability
    UP, UP               (1 − cos θ)/4
    UP, DOWN             (1 + cos θ)/4
    DOWN, UP             (1 + cos θ)/4
    DOWN, DOWN           (1 − cos θ)/4

To understand why the shape of this correlation function defies explanation within the classical framework of local realism, suppose we confine ourselves to spin measurements along one of just three axes, at 0, 120, and 240 degrees. For convenience we will denote these axes by the symbols A, B, and C respectively. Several pairs of particles are produced and sent off to two distant locations in opposite directions. In both locations a spin measurement along one of the three allowable axes is performed, and the results are recorded. Our choices of measurements (A, B, or C) may be arbitrary, e.g., by flipping coins, or by any other means. In each location it is found that, regardless of which measurement is made, there is an equal probability of spin UP or spin DOWN, which we will denote by "1" and "0" respectively. This is all that the experimenters at either site can determine separately. However, when all the results are brought together and compared in matched pairs, we find the following joint correlations

         A      B      C
    A    0     3/4    3/4
    B   3/4     0     3/4
    C   3/4    3/4     0

The numbers in this matrix indicate the fraction of times that the results agreed (both 0 or both 1) when the indicated measurements were made on the two members of a matched pair of objects. Notice that if the two distant experimenters happened to have chosen to make the same measurement for a given pair of particles, the results never agreed, i.e., they were always the opposite (1 and 0, or 0 and 1). Also notice that, if both measurements are selected at random, the overall probability of agreement is 1/2. The remarkable fact is that there is no way (within the traditional view of physical processes) to prepare the pairs of particles in advance of the measurements such that they will give the joint probabilities listed above. To see why, notice that each particle must be ready to respond to any one of the three measurements, and if it happens to be the same measurement as is selected on its matched partner, then it must give the opposite answer. Hence if the particle at one location will answer "0" for measurement A, then the particle at the other location must be prepared to give the answer "1" for measurement A. There are similar constraints on the preparations for measurements B and C, so there are really only eight ways of preparing a pair of particles

    Particle 1 (A,B,C)    Particle 2 (A,B,C)
        0 0 0                 1 1 1
        0 0 1                 1 1 0
        0 1 0                 1 0 1
        0 1 1                 1 0 0
        1 0 0                 0 1 1
        1 0 1                 0 1 0
        1 1 0                 0 0 1
        1 1 1                 0 0 0
    preparation:    a     b     c     d     e     f     g     h
    particle 1:    000   001   010   011   100   101   110   111
    particle 2:    111   110   101   100   011   010   001   000

where the three binary digits denote the pre-programmed responses ("1" or "0") to measurements A, B, and C respectively.
These preparations - and only these - will yield the required anti-correlation when the same measurement is applied to both objects. Therefore, assuming the particles are preprogrammed (at the moment when they separate from each other) to give the appropriate result for any one of the nine possible joint measurements that might be performed on them, it follows that each pair of particles must be pre-programmed in one of the eight ways shown above. It only remains now to determine the probabilities of these eight preparations. The simplest state of affairs would be for each of the eight possible preparations to be equally probable, but this yields the measurement correlations shown below
            A       B       C
    A       0      1/2     1/2
    B      1/2      0      1/2
    C      1/2     1/2      0
Not only do the individual joint probabilities differ from the quantum mechanical predictions, but this distribution also gives an overall probability of agreement of 1/3, rather than 1/2 (as quantum mechanics says it must be), so clearly the eight possible preparations cannot be equally likely. We might think that some other weighting of these eight preparation states would give the right overall results, but in fact no such weighting is possible. The overall preparation process must yield some linear convex combination of the eight mutually exclusive cases, i.e., each of the eight possible preparations must have some fixed long-term probability, which we will denote by a, b, ..., h, respectively. These probabilities are all non-negative values in the range 0 to 1, and their sum is identically 1. It follows that the sum of the six probabilities b through g must be less than or equal to 1. This is a simple form of "Bell's inequality", which must be satisfied by any local realistic model of the sort that Bell had in mind. However, the joint probabilities in the correlation table predicted by quantum mechanics imply
    c + d + e + f = 3/4
    b + c + f + g = 3/4
    b + d + e + g = 3/4
Adding these three expressions together gives 2(b + c + d + e + f + g) = 9/4, so the sum of the probabilities b through g is 9/8, which exceeds 1. Hence the results of the EPRB experiment predicted by quantum mechanics (and empirically confirmed) violate Bell's inequality. This shows that there does not exist a linear combination of those eight preparations that can yield the joint probabilities predicted by quantum mechanics, so there is no way of accounting for the actual experimental results by means of any realistic local physical model of the sort that Bell had in mind. (The brute-force check sketched below confirms this impossibility numerically.)
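As a concrete check on this counting argument, the following minimal Python sketch (all names are merely illustrative) enumerates the eight preparations, reproduces the uniform-weighting table above, and confirms that the three cross-axis agreement probabilities can never sum to the 9/4 required by quantum mechanics, no matter how the eight preparations are weighted.

    import itertools, random

    # The eight preparations: particle 1 answers (A,B,C); particle 2 gives
    # the complementary answers, guaranteeing disagreement on equal axes.
    preps = list(itertools.product((0, 1), repeat=3))   # a, b, ..., h

    def agreement(weights):
        # P(agree) for the measurement pairs (A,B), (B,C), (A,C).  Results
        # agree when particle 1's answer on axis i equals particle 2's
        # answer on axis j, i.e. when p[i] != p[j].
        return [sum(w for w, p in zip(weights, preps) if p[i] != p[j])
                for i, j in ((0, 1), (1, 2), (0, 2))]

    print(agreement([1/8] * 8))        # uniform weighting -> [0.5, 0.5, 0.5]

    # Loading all the weight on the six "mixed" preparations b..g does best,
    # but the three probabilities still only sum to 2 (quantum needs 9/4):
    print(sum(agreement([0, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6, 0])))   # 2.0

    worst = 0.0
    for _ in range(100_000):
        w = [random.random() for _ in range(8)]
        total = sum(w)
        worst = max(worst, sum(agreement([x/total for x in w])))
    print(worst)                       # never exceeds 2.0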
The observed violations of Bell's inequality in EPRB experiments imply that Bell's conception of local realism is inadequate to represent the actual processes of nature. The causality assumptions underlying Bell's analysis are inherently problematic (see Section 9.7), but the analysis is still important, because it highlights the fundamental inconsistency between the predictions of quantum mechanics and certain conventional ideas about causality and local realism. In order to maintain those conventional ideas, we would be forced to conclude that information about the choice of measurement basis at one detector is somehow conveyed to the other detector, influencing the outcome at that detector, even though the measurement events are space-like separated. For this reason, some people have been tempted to think that violations of Bell's inequality imply superluminal communication, contradicting the principles of special relativity. However, there is actually no effective transfer of information from one measurement to the other in an EPRB experiment, so the principles of special relativity are safe.

One of the most intriguing aspects of Bell's analysis is that it shows how the workings of quantum mechanics (and, evidently, nature) involve correlations between space-like separated events that seemingly could only be explained by the presence of information from distant locations, even though the separate events themselves give no way of inferring that information. In the abstract, this is similar to "zero-information proofs" in mathematics. To illustrate, consider a "twins paradox" involving a pair of twin brothers who are separated and sent off to distant locations in opposite directions. When twin #1 reaches his destination he asks a stranger there to choose a number x1 from 1 to 10, and the twin writes this number down on a slip of paper along with another number y1 of his own choosing. Likewise twin #2 asks someone at his destination to choose a number x2, and he writes this number down along with a number y2 of his own choosing. When the twins are re-united, we compare their slips of paper and find that |y2 − y1| = (x2 − x1)². This is really astonishing. Of course, if the correlation were some linear relationship of the form y2 − y1 = A(x2 − x1) + B for pre-established constants A and B, the result would be quite easy to explain. We would simply surmise that the twins had agreed in advance that twin #1 would write down y1 = Ax1 − B/2, and twin #2 would write down y2 = Ax2 + B/2. However, no such explanation is possible for the observed non-linear relationship, because there do not exist functions f1 and f2 such that f2(x2) − f1(x1) = (x2 − x1)². Thus if we assume the numbers x1 and x2 are independently and freely selected, and there is no communication between the twins after they are separated, then there is no "locally realistic" way of accounting for this non-linear correlation. (A numerical illustration of this impossibility is sketched at the end of this section.) It seems as though one or both of the twins must have had knowledge of his brother's numbers when writing down his own number, despite the fact that it is not possible to infer anything about the individual values of x2 and y2 from the values of x1 and y1, or vice versa. In the same way, the results of EPRB experiments imply a greater degree of interdependence between separate events than can be accounted for by traditional models of causality.

One possible idea for adjusting our conceptual models to accommodate this aspect of quantum phenomena would be to deny the existence of any correlations until they become observable. According to the most radical form of this proposal, the universe is naturally partitioned into causally compact cells, and only when these cells interact do their respective measurement bases become reconciled, in such a way as to yield the quantum mechanical correlations. This is an appealing idea in many ways, but it's far from clear how it could be turned into a realistic model. Another possibility is that the preparation of the two particles at the emitter and the choices of measurement bases at the detectors may be mutually influenced by some common antecedent event(s). This can never be ruled out, as discussed in Section 9.6. Lastly, we mention the possibility that the preparation of the two particles may be conditioned by the measurements to which they are subjected. This is discussed in Section 9.10.
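The non-existence of functions f1 and f2 satisfying f2(x2) − f1(x1) = (x2 − x1)² is easy to exhibit numerically. The minimal sketch below (our own construction, using numpy's least-squares solver) treats the twenty values f1(1), ..., f1(10), f2(1), ..., f2(10) as unknowns, imposes the hundred pairwise constraints as a linear least-squares problem, and shows that even the best possible choice of functions leaves a large residual.

    import numpy as np

    xs = np.arange(1, 11)
    rows, rhs = [], []
    for i, x1 in enumerate(xs):
        for j, x2 in enumerate(xs):
            row = np.zeros(20)          # unknowns: f1(1..10) then f2(1..10)
            row[i] = -1.0               # -f1(x1)
            row[10 + j] = 1.0           # +f2(x2)
            rows.append(row)
            rhs.append((x2 - x1) ** 2)

    A, b = np.array(rows), np.array(rhs)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)   # best achievable f1, f2
    print(np.linalg.norm(A @ sol - b))            # residual is far from zero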

9.6 Von Neumann's Postulate and Bell’s Freedom If I have freedom in my love, And in my soul am free, Angels alone, that soar above, Enjoy such liberty. Richard Lovelace, 1649 In quantum mechanics the condition of a physical system is represented by a state vector, which encodes the probabilities of each possible result of whatever measurements we may perform on the system. Since the probabilities are usually neither 0 nor 1, it follows that for a given system with a specific state vector, the results of measurements generally are not uniquely determined. Instead, there is a set (or range) of possible results, each with a specific probability. Furthermore, according to the conventional interpretation of quantum mechanics (the "Copenhagen Interpretation" advocated by Niels Bohr, et al), the state vector is the most complete possible description of the system, which implies that nature is fundamentally probabilistic (i.e., non-deterministic). However, it's natural to question whether this interpretation is correct, or whether there might be some more complete description of a system, such that a fully specified system would respond deterministically to any measurement we might perform. Such proposals are called 'hidden variable' theories. In his assessment of hidden variable theories in 1932, John von Neumann pointed out a set of five assumptions which, if we accept them, imply that no hidden variable theory can possibly give deterministic results for all measurements. The first four of these assumptions are fairly unobjectionable, but the fifth seems much more arbitrary, and has been the subject of much discussion. (The parallel with Euclid's postulates, including the controversial fifth postulate discussed in Chapter 3.1, is striking.) To understand von Neumann's fifth postulate, notice that although the conventional interpretation does not uniquely determine the outcome of a particular measurement for a given state, it does predict a unique 'expected value' for that measurement. Let's say a measurement of X on a system with a state vector ϕ has an expected value denoted by <X;ϕ>, computed by simply adding up all the possible results multiplied by their respective probabilities. Not surprisingly, the expected values of observables are additive, in the sense that
    <X+Y;ϕ> = <X;ϕ> + <Y;ϕ>                                     (1)
In practice we can't generally perform a measurement of X+Y without disturbing the measurements of X and Y, so we can't measure all three observables on the same system. However, if we prepare a set of systems, all with the same initial state vector ϕ, and perform measurements of X+Y on some of them, and measurements of X or Y on the others, then the averages of the measured values of X, Y, and X+Y (over sufficiently many systems) will be related in accord with (1).

Remember that according to the conventional interpretation the state vector ϕ is the most complete possible description of the system. On the other hand, in a hidden variable theory the premise is that there are additional variables, and if we specify both the state vector ϕ AND the "hidden vector" H, the result of measuring X on the system is uniquely determined. In other words, if we let <X;ϕ,H> denote the expected value of a measurement of X on a system in the state (ϕ,H), then the claim of the hidden variable theorist is that the variance of individual measured values around this expected value is zero. Now we come to von Neumann's controversial fifth postulate. He assumed that, for any hidden variable theory, just as in the conventional interpretation, the averages of X+Y, X and Y evaluated over a set of identical systems are additive. (Compare this with Galileo's assumption of simple additivity for the composition of incommensurate speeds.) Symbolically, this is expressed as
    <X+Y;ϕ,H> = <X;ϕ,H> + <Y;ϕ,H>                               (2)
for any two observables X and Y. On this basis he proved that the variance ("dispersion") of at least one observable's measurements must be greater than zero. (Technically, he showed that there must be an observable X such that <X²> is not equal to <X>².) Thus, no hidden variable theory can uniquely determine the results of all possible measurements, and we are compelled to accept that nature is fundamentally nondeterministic.

However, this is all based on (2), the assumption of additivity for the expectations of identically prepared systems, so it's important to understand exactly what this assumption means. Clearly the words "identically prepared" mean something different under the conventional interpretation than they do in the context of a hidden variable theory. Conventionally, two systems are said to be identically prepared if they have the same state vector (ϕ), but in a hidden variable theory two states with the same state vector are not necessarily "identical", because they may have different hidden vectors (H). Of course, a successful hidden variable theory must satisfy (1) (which has been experimentally verified), but must it necessarily satisfy (2)? Relation (1) requires only that the averages of <X;ϕ,H>, <Y;ϕ,H>, and <X+Y;ϕ,H>, evaluated over all applicable hidden vectors H, be additive; does it necessarily follow that (2) is satisfied for every (or even for ANY) specific value of H? To give a simple illustration, consider the following trivial set of data:
    System    ϕ    H    X    Y    X+Y
      1       3    1    2    5     5
      2       3    2    4    3     9
      3       3    1    2    5     5
      4       3    2    4    3     9
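The arithmetic of this little table can be verified mechanically; in the minimal Python sketch below (with the table simply hard-coded, and all names our own) relation (1) holds for the averages over all four systems, while additivity fails in each of the two fully-specified hidden states.

    # Each row is (phi, H, X, Y, X+Y) for one system.
    systems = [(3, 1, 2, 5, 5),
               (3, 2, 4, 3, 9),
               (3, 1, 2, 5, 5),
               (3, 2, 4, 3, 9)]

    avg = lambda vals: sum(vals) / len(vals)
    X  = avg([s[2] for s in systems])     # <X;3>   = 3
    Y  = avg([s[3] for s in systems])     # <Y;3>   = 4
    XY = avg([s[4] for s in systems])     # <X+Y;3> = 7
    print(XY == X + Y)                    # True: relation (1) holds on average

    for h in (1, 2):
        x, y, xy = next((s[2], s[3], s[4]) for s in systems if s[1] == h)
        print(h, xy == x + y)             # False in each fully-defined state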
The averages over these four "conventionally indistinguishable" systems are <X;3> = 3, <Y;3> = 4, and <X+Y;3> = 7, so relation (1) holds. However, if we examine the "identically prepared" systems taking into account the hidden components of the state, we really have two different states (those with H=1 and those with H=2), and we find that the results are not additive (but they are deterministic) in these fully-defined states. Thus, equation (1) clearly doesn't imply equation (2). (If it did, von Neumann could have said so, rather than taking it as an axiom.) Of course, if our hidden variable theory is always going to satisfy (1), we must have some constraints on the values of H that arise among "conventionally indistinguishable" systems. For example, in the above table if we happened to get a sequence of systems all in the same condition as System #1 we would always get the results X=2, Y=5, X+Y=5, which would violate (1). So, if (2) doesn't hold, then at the very least we need our theory to ensure a distribution of the hidden variables H that will make the average results over a set of "conventionally indistinguishable" systems satisfy relation (1). (In the simple illustration above, we would just need to ensure that the hidden variables are equally distributed between H=1 and H=2.)

In Bohm's 1952 theory the hidden variables consist of precise initial positions for the particles in the system (more precise than the uncertainty relations would typically allow us to determine), and the distribution of those variables within the uncertainty limits is governed as a function of the conventional state vector, ϕ. It's also worth noting that, in order to make the theory work, it was necessary for ϕ to be related to the values of H for separate particles instantaneously in an explicitly non-local way. Thus, Bohm's theory is a counter-example to von Neumann's theorem, but not to Bell's (see below).

Incidentally, it may be worth noting that if a hidden variable theory is valid, and the variances of all measurements around their expectations are zero, then the terms of (2) are not only the expectations, they are the unique results of measurements for a given ϕ and H. This implies that they are eigenvalues of the respective operators, whereas the expectations for those operators are generally not equal to any of the eigenvalues. Thus, as Bell remarked, "[von Neumann's] 'very general and plausible postulate' is absurd".

Still, Gleason showed that we can carry through von Neumann's proof even on the weaker assumption that (2) applies to commuting variables. This weakened assumption has the advantage of not being self-evidently false. However, careful examination of Gleason's proof reveals that the non-zero variances again arise only because of the existence of non-commuting observables, but this time in a "contextual" sense that may not be obvious at first glance. To illustrate, consider three observables X, Y, Z. If X and Y commute and X and Z commute, it doesn't follow that Y and Z commute. We may be able to measure X and Y using one setup, and X and Z using another, but measuring the value of X and Y simultaneously will disturb the value of Z. Gleason's proof leads to non-zero variances precisely for measurements in such non-commuting contexts.
It's not hard to understand this, because in a sense the entire non-classical content of quantum mechanics is the fact that some observables do not commute. Thus it's inevitable that any "proof" of the inherent non-classicality of quantum mechanics must at some point invoke noncommuting measurements, but it's precisely at that point where linear additivity can only be empirically verified on an average basis, not a specific basis. This, in turn, leaves the door open for hidden variables to govern the individual results.

Notice that in a "contextual" theory the result of an experiment is understood to depend not only on the deterministic state of the "test particles" but also on the state of the experimental apparatus used to make the measurements, and these two can influence each other. Thus, Bohm's 1952 theory escaped the no-hidden-variable theorems essentially by allowing the measurements to have an instantaneous effect on the hidden variables, which, of course, made the theory essentially non-local as well as non-relativistic (although Bohm and others later worked to relativize his theory). Ironically, the importance of considering the entire experimental setup (rather than just the arbitrarily identified "test particles") was emphasized by Niels Bohr himself, and it's a fundamental feature of quantum mechanics (i.e., objects are influenced by measurements no less than measurements are influenced by objects). As Bell said, even Gleason's relatively robust line of reasoning overlooks this basic insight. Of course, it can be argued that contextual theories are somewhat contrived and not entirely compatible with the spirit of hidden variable explanations, but, if nothing else, they serve to illustrate how difficult it is to categorically rule out "all possible" hidden variable theories based simply on the structure of the quantum mechanical state space.

In 1963 John Bell sought to clarify matters, noting that all previous attempts to prove the impossibility of hidden variable interpretations of quantum mechanics had been "found wanting". His idea was to establish rigorous limits on the kinds of statistical correlations that could possibly exist between spatially separate events under the assumption of determinism and what might be called "local realism", which he took to be the premises of Einstein, et al. At first Bell thought he had succeeded, but it was soon pointed out that his derivation implicitly assumed one other crucial ingredient, namely, the possibility of free choice. To see why this is necessary, notice that any two spatially separate events share a common causal past, consisting of the intersection of their past light cones. This implies that we can never categorically rule out some kind of "pre-arranged" correlation between spacelike-separated events, at least not unless we can introduce information that is guaranteed to be causally independent of prior events. The appearance of such "new events", whose information content is at least partially independent of their causal past, constitutes a free choice. If no free choice is ever possible, then (as Bell acknowledged) the Bell inequalities do not apply.
In summary, Bell showed that quantum mechanics is incompatible with a quite peculiar pair of assumptions, the first being that the future behavior of some particles (i.e., the "entangled" pairs) involved in the experiment is mutually conditioned and coordinated in advance, and the second being that such advance coordination is in principle impossible for other particles involved in the experiment (e.g., the measuring apparatus). These are
not quite each other's logical negations, but close to it. One is tempted to suggest that the mention of quantum mechanics is almost superfluous, because Bell's result essentially amounts to a proof that the assumption of a strictly deterministic universe is incompatible with the assumption of a strictly non-deterministic universe. He proved, assuming the predictions of quantum mechanics are valid (which the experimental evidence strongly supports), that not all events can be strictly consequences of their causal pasts, and in order to carry out this proof he found it necessary to introduce the assumption that not all events are strictly consequences of their causal pasts! In the paper "Atomic-Cascade Photons, Quantum Mechanical Non-Locality", Bell listed three possible positions that he thought could be taken with respect to the Aspect experiments. (Actually he lists four, the fourth being "Just ignore it".) These alternatives are

Regarding the third possibility, Bell wrote:

...if our measurements are not independently variable as we supposed... even if chosen by apparently free-willed physicists... then Einstein local causality can survive. But apparently separate parts of the world become deeply entangled, and our apparent free will is entangled with them.

The third possibility clearly shows that Bell understood the necessity of assuming free acausal events for his derivation, but since this amounts to assuming precisely that which he was trying to prove, we must acknowledge that the significance of Bell's inequalities is less clear than many people originally believed. In effect, after clarifying the lack of significance of von Neumann's "no hidden variables proof" due to its assumption of what it meant to prove, Bell proceeded to repeat the mistake, albeit in a more subtle way. Perhaps Bell's most perspicacious remark was (in reference to von Neumann's proof) that the only thing proved by impossibility proofs is the author's lack of imagination.

This all just illustrates that it's extremely difficult to think clearly about causation, and the reasons for this can be traced back to the Aristotelian distinction between natural and violent motion. Natural motion consisted of the motions of non-living objects, such as the motions of celestial objects, the natural flows of water and wind, etc. These are the kinds of motion that people (like Bell) apparently have in mind when they think of determinism. Following the ancients, people tend to instinctively exempt "violent motions", i.e., motions resulting from acts of living volition, when considering determinism. It's psychologically very difficult for us to avoid bifurcating the world into inanimate objects that obey strict laws of causality, and animate objects (like ourselves) that do not. This dichotomy was historically appealing, but it always left the nagging question of how or why we (and our constituent atoms) manage to evade the iron hand of
determinism that governs everything else. This view affects our conception of science by suggesting to us that the experimenter is not himself part of nature, and is exempt from whatever determinism is postulated for the system being studied. Thus we imagine that we can "test" whether the universe is behaving deterministically by turning some dials and seeing how the universe responds, overlooking the fact that we and the dials are also part of the universe. This immediately introduces "the measurement problem", i.e., where do we draw the boundaries between separate phenomena? What is an observation? How do we distinguish "nature" from "violence", and is this distinction even warranted?

It's worth noting that when people say they're talking about a deterministic world, they're almost always not. What they're usually talking about is a deterministic sub-set of the world that can be subjected to freely chosen inputs from a non-deterministic "exterior". But just as with the measurement problem in quantum mechanics, when we think we've figured out the constraints on how a deterministic test apparatus can behave in response to arbitrary inputs, someone says "but isn't the whole lab a deterministic system?", and then the whole building, and so on. At what point does "the collapse of determinism" occur, so that we can introduce free inputs to test the system? Just as the infinite regress of the measurement problem in quantum mechanics leads us into bewilderment, so too does the infinite regress of determinism.

The other loop-hole that can never be closed is what Bell called "correlation by post-arrangement" or "backwards causality". I'd prefer to say that the system may violate the assumption of strong temporal asymmetry, but the point is the same. Clearly the causal pasts of the spacelike separated arms of an EPR experiment overlap, so all the objects involved share a common causal past. Therefore, without something to "block off" this region of common past from the emission and absorption events in the EPR experiment, we're not justified in asserting causal independence, which is required for Bell's derivation. The usual and, as far as I know, only way of blocking off the causal past is by injecting some "other" influence, i.e., an influence other than the deterministic effects propagating from the causal past. This "other" may be true randomness, free will, or some other concept of "free occurrence". In any case, Bell's derivation requires us to assert that each measurement is a "free" action, independent of the causal past, which is inconsistent with even the most limited construal of determinism.

There is a fascinating parallel between the ancient concepts of natural and violent motion and the modern quantum mechanical concepts of the linear evolution of the wave function and the collapse of the wave function. These modern concepts are sometimes termed U, for unitary evolution of the quantum mechanical state vector, and R, for reduction of the state vector onto a particular basis of measurement or observation. One could argue that the U process corresponds closely with Aristotle's natural (inanimate) evolution, while the R process represents Aristotle's violent evolution, triggered by some living act. As always, we face the question of whether this is an accurate or meaningful bifurcation of events. Today there are several "non-collapse" interpretations of quantum mechanics, including the famous "many worlds" interpretation of Everett and DeWitt.
However, to date, none of these interpretations has succeeded in giving a completely
satisfactory account of quantum mechanical processes, so we are not yet able to dispense with Aristotle's distinction between natural and violent motion.

9.7 The Gestalt of Determinism Then assuredly the world was made not in time, but simultaneously with time. St. Augustine, 400 AD

Determinism is commonly defined as the proposition that each event is the necessary and unique consequence of prior events. This implies that events transpire in a temporally ordered sequence, and that a wave of implication somehow flows along this sequence, fixing or deciding each successive event based on the preceding events, in accord with some definite rule (which may or may not be known to us). This description closely parallels the beginning of Laplace’s famous remarks on the subject: We ought then to regard the present state of the universe as the effect of the anterior state and as the cause of the one that is to follow…

However, at this point Laplace introduces a gestalt shift (like the sudden realignment of meaning that Donne often placed at the end of his "metaphysical" poems). After describing the temporally ordered flow of events, he notes a profound shift in the perception of "a sufficiently vast intelligence" ...nothing would be uncertain, and the future, as the past, would be present to its eyes.

This shows how we initially conceive of determinism as a temporally ordered chain of implication, but when carried to its logical conclusion we are led inevitably to the view of an atemporal "block universe" that simply exists. At some point we experience a gestalt shift from a universe that is occurring to a universe that simply is. The concepts of time and causality in such a universe can be (at most) psychological interpretations, lacking any active physical significance. In order for time and causality to be genuinely active, a degree of freedom is necessary, because without freedom we immediately regress to an atemporal block universe, in which there can be no absolute direction of implication. Of course, it may well be that certain directions in a deterministic block universe are preferred based on the simplicity with which they can be described and conceptually grasped. For example, it may be possible to completely specify the universe based on the contents of a particular cross-sectional slice, together with a simple set of fixed rules for recursively inferring the contents of neighboring slices in a particular sequence, whereas other sequences may require a vastly more complicated “rule”. However, in a deterministic universe this chain of implication is merely a descriptive convenience, and cannot be regarded as the effective mechanism by which the events “come into being”. The static view is fully consistent not only with the Newtonian universe that Laplace imagined, but also with the theory of relativity, in which the worldlines of objects
(through spacetime) can be considered to be already existent in their entirety. (Indeed this is a necessary interpretation if we are to incorporate worldlines actually crossing event horizons.) In this sense relativity is a purely classical theory. On the other hand, quantum mechanics is widely regarded as decidedly non-deterministic. Indeed, we saw in Section 9.6 the famous theorem of von Neumann purporting to rule out determinism (in the form of hidden variables) in the realm of quantum mechanics. However, as Einstein observed Whether objective facts are subject to causality is a question whose answer necessarily depends on the theory from which we start. Therefore, it will never be possible to decide whether the world is causal or not.

Note that the word “causal” is being used here as a synonym for deterministic, since Einstein had in mind strict causality, with no free choices, as summarized in his famous remark that “God does not play dice with the universe”. We've seen that von Neumann’s proof was based on a premise which is effectively equivalent to what he was trying to prove, nicely illustrating Einstein’s point that the answer depends on the theory from which we start.

In other words, an assertion about what is recursively possible can be meaningful only if we place some constraint on the complexity of the allowable recursive "algorithm". For example, the nth state vector of a system may be given by digits kn+1 through k(n+1) of π. This would be a perfectly deterministic system, but the relations between successive states would be extremely obscure. In fact, assuming the digits of the two transcendental numbers π and e are normally distributed (as is widely believed, though not proven), any finite string of decimal digits occurs infinitely often in their decimal expansions, and each string occurs with the same frequency in both expansions. (It's been noted that, assuming normality, the digits of π would make an inexhaustible source of high-quality "random" number sequences, higher quality than anything we can get out of conventional pseudorandom number generators.) Therefore, given any finite number of digits (observations), we could never even decide whether the operative “algorithm” was π or e, nor whether we had correctly identified the relevant occurrence in the expansion. Thus we can easily imagine a perfectly deterministic universe that is also utterly unpredictable. (Interestingly, the recent innovation that enables computation of the nth hexadecimal digit of π (with much less work than required to compute the first n digits) implies that we could present someone with a sequence of digits and challenge them to determine where it first occurs in the hexadecimal expansion of π, and it may be practically impossible for them to find the answer. A sketch of this digit-extraction scheme appears at the end of this section.)

Even worse, there need be no simple rule of any kind relating the events of a deterministic universe. This highlights the important distinction between determinism and the concepts of predictability and complexity. There is no requirement for a deterministic universe to be predictable, or for its complexity to be limited in any way. Thus, we can never prove that any finite set of observations could only have been produced non-deterministically. In a sense, this is trivially true, because a finite Turing machine can always be written to generate any given finite string, although the algorithm necessary to generate a very irregular string may be nearly as long as the string itself. Since determinism is inherently undecidable, we may try to define a more tractable
notion, such as predictability, in terms of the exhibited complexity manifest in our observations. This could be quantified as the length of the shortest Turing machine required to reproduce our observations, and we might imagine that in a completely random universe, the size of the required algorithm would grow in proportion to the number of observations (as we are forced to include ad hoc modifications to the algorithm to account for each new observation). On this basis it might seem that we could eventually assert with certainty that the universe is inherently unpredictable (on some level of experience), i.e., that the length of the shortest Turing machine required to duplicate the results grows in proportion with the number of observations. In a sense, this is what the "no hidden variables" theorems try to do. However, we can never reach such a conclusion, as shown by Chaitin's proof that there exists an integer k such that it's impossible to prove that the complexity of any specific string of binary bits exceeds k (where "complexity" is defined as the length of the smallest Turing program that generates the string). This is true in spite of the fact that "almost all" strings have complexity greater than k. Therefore, even if we (sensibly) restrict our meaningful class of Turing machines to those of complexity less than a fixed number k (rather than allowing the complexity of our model to increase in proportion to the number of observations), it's still impossible for any finite set of observations (even if we continue gathering data forever) to be provably inconsistent with a Turing machine of complexity less than k. (Naturally we must be careful not to confuse the question of whether "there exist" sequences of complexity greater than k with the question of whether we can prove that any particular sequence has complexity greater than k.)
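The digit-extraction scheme alluded to above is the Bailey-Borwein-Plouffe formula. The following minimal Python sketch (our own rendering; the function names are merely illustrative) returns the hexadecimal digit of π at position n without computing any of the preceding digits.

    def _series(j, n):
        # Fractional part of the sum over k of 16^(n-k)/(8k+j).
        s = 0.0
        for k in range(n + 1):
            s = (s + pow(16, n - k, 8 * k + j) / (8 * k + j)) % 1.0
        k, term = n + 1, 1.0
        while term > 1e-17:
            term = 16.0 ** (n - k) / (8 * k + j)
            s += term
            k += 1
        return s % 1.0

    def pi_hex_digit(n):
        # Hex digit of pi at position n after the point (n = 0 is the first).
        x = (4 * _series(1, n) - 2 * _series(4, n)
             - _series(5, n) - _series(6, n)) % 1.0
        return "0123456789ABCDEF"[int(16 * x)]

    print("".join(pi_hex_digit(n) for n in range(10)))   # 243F6A8885

Notice that finding a given digit string in the expansion would still require scanning positions one by one, which is the source of the practical impossibility mentioned above.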

9.8 Quaedam Tertia Natura Abscondita The square root of 9 may be either +3 or -3, because a plus times a plus or a minus times a minus yields a plus. Therefore the square root of -9 is neither +3 nor -3, but is a thing of some obscure third nature. Girolamo Cardano, 1545 In a certain sense the peculiar aspects of quantum spin measurements in EPR-type experiments can be regarded as a natural extension of the principle of special relativity. Classically a particle has an intrinsic spin about some axis with an absolute direction, and the results of measurements depend on the difference between this absolute spin axis and the absolute measurement axis. In contrast, quantum theory says there are no absolute spin angles, only relative spin angles. In other words, the only angles that matter are the differences between two measurements, whose absolute values have no physical significance. Furthermore, the relations between measurements vary in a non-linear way, so it's not possible to refer them to any absolute direction. This "relativity of angular reference frames" in quantum mechanics closely parallels the relativity of translational reference frames in special relativity. This shouldn’t be too
surprising, considering that velocity “boosts” are actually rotations through imaginary angles. Recall from Section 2.4 that the relationship between the frequencies of a given signal as measured by the emitter and absorber depends on the two individual speeds v_e and v_a relative to the medium through which the signal propagates at the speed c_s, but as this speed approaches c (the speed of light in a vacuum), the frequency shift becomes dependent only on a single variable, namely, the mutual speed between the emitter and absorber relative to each other. This degeneration of dependency from two independent “absolute” variables down to a single “relative” variable is so familiar today that we take it for granted, and yet it is impossible to explain in classical Newtonian terms. Schematically we can illustrate this in terms of three objects in different translational frames of reference as shown below:
[Figure: three objects in different translational frames, with A and C moving in opposite directions relative to the stationary object B]
The object B is stationary (corresponding to the presumptive medium of signal propagation), while objects A and C move relative to B in opposite directions at high speed. Intuitively we would expect the velocity of A in terms of the rest frame of C (and vice versa) to equal the sum of the velocities of A and C in terms of the rest frame of B. If we allowed the directions of motion to be oblique, we would still have the “triangle inequality” placing limits on how the mutual speeds are related to each other. This could be regarded as something like a “Bell inequality” for translational frames of reference. When we measure the velocity of A in terms of the rest frame of C we find that it does not satisfy this additive property, i.e., it violates "Bell's inequality" for special relativity. Compare the above with the actual Bell's inequality for entangled spin measurements in quantum mechanics. Two measurements of the separate components of an entangled pair may be taken at different orientations, say at the angles A and C, relative to the presumptive common spin axis of the pair, as shown below:
[Figure: spin measurements of the two members of an entangled pair taken at the angles A and C]
We then determine the correlations between the results for various combinations of measurement angles at the two ends of the experiment. Just as in the case of frequency measurements taken at two different boost angles, the classical expectation is that the correlation between the results will depend on the two measurement angles relative to some reference direction established by the mechanism. But again we find that the
correlations actually depend only on the single difference between angles A and C, not on their two individual values relative to some underlying reference. The close parallel between the “boost inequalities” in special relativity and the Bell inequalities for spin measurements in quantum mechanics is more than just superficial. In both cases we find that the assumption of an absolute frame (angular or translational) leads us to expect a linear relation between observable qualities, and in both cases it turns out that in fact only the relations between one realized event and another, rather than between a realized event and some absolute reference, govern the outcomes. Recall from Section 9.5 that the correlation between the spin measurements (of entangled spin-1/2 particles) is simply -cos(θ) where θ is the relative spatial angle between the two measurements. The usual presumption is that the measurement devices are at rest with respect to each other, but if they have some non-zero relative velocity v, we can represent the "boost" as a complex rotation through an angle ϕ = arctanh(v) where arctanh is the inverse hyperbolic tangent (see Part 6 of the Appendix). By analogy, we might expect the "correlation" between measurements performed with respect to two basis systems with this relative angle would be
    cos(iϕ) = cosh(ϕ) = 1/√(1 − v²)
which of course is the Lorentz-FitzGerald factor that scales the transformation of space and time intervals from one system of inertial coordinates to another, leading to the relativistic Doppler effect, and so on. In other words, this factor represents the projection of intervals in one frame onto the basis axes of another frame, just as the correlation between the particle spin measurements is the projection of the spin vector onto the respective measurement bases. Thus the "mysterious" and "spooky" correlations of quantum mechanics can be placed in close analogy with the time dilation and length contraction effects of special relativity, which once seemed equally counterintuitive. (A numerical check of this correspondence is sketched below.)
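The correspondence is immediate to verify; the short Python sketch below (illustrative only) confirms that the cosine of the imaginary boost angle reproduces the Lorentz-FitzGerald factor.

    import cmath, math

    for v in (0.1, 0.5, 0.9, 0.99):
        phi = math.atanh(v)               # boost "angle" for speed v
        gamma = 1 / math.sqrt(1 - v*v)    # Lorentz-FitzGerald factor
        # cos(i*phi) = cosh(phi) = gamma
        print(v, cmath.cos(1j*phi).real, math.cosh(phi), gamma)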
The spinor representation, which uses complex numbers to naturally combine spatial rotations and "boosts" into a single elegant formalism, was discussed in Section 2.6. In this context we can formulate a generalized "EPR experiment" allowing the two measurement bases to differ not only in spatial orientation but also by a boost factor, i.e., by a state of relative motion. The resulting unified picture shows that the peculiar aspects of quantum mechanics can, to a surprising extent, be regarded as aspects of special relativity.

In a sense, relativity and quantum theory could be summarized as two different strategies for accommodating the peculiar wave-particle duality of physical phenomena. One of the problems this duality presented to classical physics was that apparently light could either be treated as an inertial particle emitted at a fixed speed relative to the source, ala Newton and Ritz, or it could be treated as a wave with a speed of propagation fixed relative to the medium and independent of the source, ala Maxwell. But how can it be both? Relativity essentially answered this question by proposing a unified spacetime structure with an indefinite metric (viz, a pseudo-Riemannian metric). This is sometimes described by saying time is imaginary, so its square contributes negatively to the line element, and yields an invariant null-cone structure for light propagation, yielding invariant light speed. But waves and particles also differ with regard to interference effects, i.e., light can be treated as a stream of inertial particles with no interference (though perhaps with "fits and starts") ala Newton, or as a wave with fully wavelike interference effects, ala Huygens. Again the question was how to account for the fact that light exhibits both of these characteristics. Quantum mechanics essentially answered this question by proposing that observables are actually expressible in terms of probability amplitudes, and these amplitudes contain an imaginary component which, upon taking the norm, can contribute negatively to the probabilities, yielding interference effects. Thus we see that both of these strategies can be expressed in terms of the introduction of imaginary (in the mathematical sense) components in the descriptions of physical phenomena, yielding the possibility of cancellations in, respectively, the spacetime interval and superposition probabilities (i.e., interference). They both attempt to reconcile aspects of the wave-particle duality of physical entities.

The intimate correspondence between relativity and quantum theory was not lost on Niels Bohr, who remarked in his Warsaw lecture in 1938:

Even the formalisms, which in both theories within their scope offer adequate means of comprehending all conceivable experience, exhibit deep-going analogies. In fact, the astounding simplicity of the generalisation of classical physical theories, which are obtained by the use of multidimensional [nonpositive-definite] geometry and non-commutative algebra, respectively, rests in both cases essentially on the introduction of the conventional symbol sqrt(-1). The abstract character of the formalisms concerned is indeed, on closer examination, as typical of relativity theory as it is of quantum mechanics, and it is in this respect purely a matter of tradition if the former theory is considered as a completion of classical physics rather than as a first fundamental step in the thorough-going revision of our conceptual means of comparing observations, which the modern development of physics has forced upon us.

Of course, Bernhard Riemann, who founded the mathematical theory of differential geometry that became general relativity, also contributed profound insights to the theory of complex functions, the Riemann sphere (Section 2.6), Riemann surfaces, and so on. (Here too, as in the case of differential geometry, Riemann built on and extended the ideas of Gauss, who was among the first to conceive of the complex number plane.) More recently, Roger Penrose has argued that some “complex number magic” seems to be at work in many of the most fundamental physical processes, and his twistor formalism is an attempt to find a framework for physics that exploits the special properties of complex functions at a fundamental level.

Modern scientists are so used to complex numbers that, in some sense, the mystery is now reversed. Instead of being surprised at the physical manifestations of imaginary and complex numbers, we should perhaps wonder at the preponderance of realness in the world. The fact is that, although the components of the state vector in quantum mechanics
are generally complex, the measurement operators are all required, by fiat, to be Hermitian, meaning that they have strictly real eigenvalues. In other words, while the state of a physical system is allowed to be complex, the result of any measurement is always necessarily real. So we can’t claim that nature is indifferent to the distinction between real and imaginary numbers. This suggests to some people a connection between the “measurement problem” in quantum mechanics and the ontological status of imaginary numbers.

The striking similarity between special relativity and quantum mechanics can be traced to the fact that, in both cases, two concepts that were formerly regarded as distinct and independent are found not to be so. In the case of special relativity, the two concepts are space and time, whereas in quantum mechanics the two concepts are position and momentum. Not surprisingly, these two pairs of concepts are closely linked, with space corresponding to position, and time corresponding to momentum (the latter representing the derivative of position with respect to time). Considering the Heisenberg uncertainty relation, it’s tempting to paraphrase Minkowski’s famous remark, and say that henceforth position by itself, and momentum by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.

9.9 Locality and Temporal Asymmetry

All these fifty years of conscious brooding have brought me no nearer to the answer to the question, 'What are light quanta?' Nowadays every Tom, Dick and Harry thinks he knows it, but he is mistaken. Einstein, 1954

We've seen that the concept of locality plays an important role in the EPR thesis and the interpretation of Bell's inequalities, but what precisely is the meaning of locality, especially in a quasi-metric spacetime in which the triangle inequality doesn't hold? The general idea of locality in physics is based on some concept of nearness or proximity, and the assertion that physical effects are transmitted only between suitably "nearby" events. From a relativistic standpoint, locality is often defined as the proposition that all causal effects of a particular event are restricted to the interior (or surface) of the future null cone of that event, which effectively prohibits communication between spacelike-separated events (i.e., no faster-than-light communication). However, this restriction clearly goes beyond a limitation based on proximity, because it specifies the future null cone, thereby asserting a profound temporal asymmetry in the fundamental processes of nature. What is the basis of this asymmetry? It certainly is not apparent in the form of the Minkowski metric, nor in Maxwell's equations. In fact, as far as we know, all the fundamental processes of nature are perfectly time-symmetric, with the single exception of certain processes involving the decay of neutral kaons. However, even in that case, the original experimental evidence in 1964 for violation of temporal symmetry was actually a
demonstration of asymmetry in parity and charge conjugacy, from which temporal asymmetry is indirectly inferred on the basis of the CPT Theorem. As recently as 1999 there were still active experimental efforts to demonstrate temporal asymmetry directly. In any case, aside from the single rather subtle peculiarity in the behavior of neutral kaons, no one has ever found any evidence at all of temporal asymmetry in any fundamental interaction. How, then, do we justify the explicit temporal asymmetry in our definition of locality for all physical interactions? As an example, consider electromagnetic interactions, and recall that the only invariant measure of proximity (nearness) in Minkowski spacetime is the absolute interval
    (Δτ)² = (Δt)² − (Δx)² − (Δy)² − (Δz)²
which is zero between the emission and absorption of a photon. Clearly, any claim that influence can flow from the emission event to the absorption event but not vice versa cannot be based on an absolute concept of physical nearness. Such a claim amounts to nothing more or less than an explicit assertion of temporal asymmetry for the most fundamental interactions, despite the complete lack of justification or evidence for such asymmetry in photon interactions.

Einstein commented on the unnaturalness of irreversibility in fundamental interactions in a 1909 paper on electromagnetic radiation, in which he argued that the asymmetry of the elementary process of radiation according to the classical wave theory of light was inconsistent with what we know of other elementary processes.

While in the kinetic theory of matter there exists an inverse process for every process in which only a few elementary particles take part (e.g., for every molecular collision), according to the wave theory this is not the case for elementary radiation processes. According to the prevailing theory, an oscillating ion produces an outwardly propagated spherical wave. The opposite process does not exist as an elementary process. It is true that the inwardly propagated spherical wave is mathematically possible, but its approximate realization requires an enormous number of emitting elementary structures. Thus, the elementary process of light radiation as such does not possess the character of reversibility. Here, I believe, our wave theory is off the mark. Concerning this point the Newtonian emission theory of light seems to contain more truth than does the wave theory, since according to the former the energy imparted at emission to a particle of light is not scattered throughout infinite space but remains available for an elementary process of absorption.

In the same paper he wrote

For the time being the most natural interpretation seems to me to be that the occurrence of electromagnetic fields of light is associated with singular points just like the occurrence of electrostatic fields according to the electron theory. It is not out of the question that in such a theory the entire energy of the electromagnetic field might be viewed as localized in these singularities, exactly like in the old
theory of action at a distance. This is a remarkable statement coming from Einstein, considering his deep commitment to the ideas of locality and the continuum. The paper is also notable for containing his premonition about the future course of physics: Today we must regard the ether hypothesis as an obsolete standpoint. It is undeniable that there is an extensive group of facts concerning radiation that shows that light possesses certain fundamental properties that can be understood far more readily from the standpoint of Newton's emission theory of light than from the standpoint of the wave theory. It is therefore my opinion that the next stage in the development of theoretical physics will bring us a theory of light that can be understood as a kind of fusion of the wave and emission theories of light. Likewise in a brief 1911 paper on the light quantum hypothesis, Einstein presented reasons for believing that the propagation of light consists of a finite number of energy quanta which move without dividing, and can be absorbed and generated only as a whole. Subsequent developments (quantum electrodynamics) have incorporated these basic insights, leading us to regard a photon (i.e., an elementary interaction) as an indivisible whole, including the null-separated emission and absorption events on a symmetrical footing. This view is supported by the fact that once a photon is emitted, its quantum phase does not advance while "in flight", because quantum phase is proportional to the absolute spacetime interval, which, as discussed in Section 2.1, is what gives the absolute interval its physical significance. If we take seriously the spacetime interval as the absolute measure of proximity, then the transmission of a photon is, in some sense, a single event, coordinated mutually and symmetrically between the points of emission and absorption. This image of a photon as a single unified event with a coordinated emission and absorption seems unsatisfactory to many people, partly because it doesn't allow for the concept of a "free photon", i.e., a photon that was never emitted and is never absorbed. However, it's worth remembering that we have no direct experience of "free photons", nor of any "free particles", because ultimately all our experience is comprised of completed interactions. (Whether this extends to gravitational interactions is an open question.) Another possible objection to the symmetrical view of elementary interactions is that it doesn't allow for a photon to have wave properties, i.e., to have an evolving state while "in flight", but this objection is based on a misconception. From the standpoint of quantum electrodynamics, the wave properties of electromagnetic radiation are actually wave properties of the emitter. All the potential sources of a photon have a certain (complex) amplitude for photon emission, and this amplitude evolves in time as we progress along the emitter's worldline. However, as noted above, once a photon is emitted, its phase does not advance. In a sense, the ancients who conceived of sight as something like a blind man's incompressible cane, feeling distant objects, were correct, because our retinas actually are in "direct" contact, via null intervals, with the sources of light. The null interval plays the role of the incompressible cane, and the wavelike properties we "feel" are really the advancing quantum phases of the source.

One might think that the reception amplitude for an individual photon must evolve as a function of its position, because if we had (contra-factually) encountered a particular photon one meter further away from its source than we did, we would surely have found it with a different phase. However, this again is based on a misconception, because the photon we would have received one meter further away (on the same timeslice) would necessarily have been emitted one light-meter earlier, carrying the corresponding phase of the emitter at that point on its worldline. When we consider different spatial locations relative to the emitter, we have to keep clearly in mind which points they correspond to along the worldline of the emitter.

Taking another approach, it might seem that we could "look at" a single photon at different distances from the emitter (trying to show that its phase evolves in flight) by receding fast enough from the emitter so that the relevant emission event remains constant, but of course the only way to do this would be to recede at the speed of light (i.e., along a null interval), which isn't possible. This is just a variation of the young Einstein's thought experiment about how a "standing wave" of light would appear to someone riding alongside it. The answer, of course, is that it’s not possible for a material object to move alongside a pulse of light (in vacuum), because light exists only as completed interactions on null intervals. If we attempted such an experiment, we would notice that, as our speed of recession from the source gets closer to c, the difference between the phases of the photons we receive becomes smaller (i.e., the "frequency" of the light gets red-shifted), and approaches zero, which is just what we should expect based on the fact that each photon is simply the lightlike null projection of the emitter's phase at a point on the emitter's worldline. Hence, if we stay on the same projection ray (null interval), we are necessarily looking at the same phase of the emitter, and this is true everywhere on that null ray.

This leads to the view that the concept of a "free photon" is meaningless, and a photon is nothing but the communication of an emitter event's phase to some null-separated absorber event, and vice versa. More generally, since the Schrodinger wave function propagates at c, it follows that every fundamental quantum interaction can be regarded as propagating on null surfaces.

Dirac gave an interesting general argument for this strong version of Huygens' Principle in the context of quantum mechanics. In his "Principles of Quantum Mechanics" he noted that a measurement of a component of the instantaneous velocity of a free electron must give the value c, which implies that electrons (and massive particles in general) always propagate along null intervals, i.e., on the local light cone. At first this may seem to contradict the fact that we observe massive objects to move at speeds much less than the speed of light, but Dirac points out that observed velocities are always average velocities over appreciable time intervals, whereas the equations of motion of the particle show that its velocity oscillates between +c and -c in such a way that the mean value agrees with the observed value.
He argues that this must be the case in any relativistic theory that incorporates the uncertainty principle, because in order to measure the velocity of a particle we must measure its position at two different times, and then divide the change in position by the elapsed time. To approximate as closely as possible to the instantaneous velocity, the time interval must go to zero, which implies that the position measurements
must approach infinite precision. However, according to the uncertainty principle, the extreme precision of the position measurement implies an approach to infinite indeterminacy in the momentum, which means that all values of momentum, from zero to infinity, become almost equally probable. Hence the momentum is almost certainly infinite, which corresponds to a speed of c. This is obviously a very general argument, and applies to all massive particles (not just fermions). This oscillatory propagation on null cones is discussed further in Section 9.10.

Another argument that seems to favor a temporally symmetric view of fundamental interactions comes from consideration of the exchange of virtual photons. (Whether virtual particles deserve to be called "real" particles is debatable; many people prefer to regard them only as sometimes useful mathematical artifacts, terms in the expansion of the quantum field, with no ontological status. On the other hand, it's possible to regard all fundamental particles that way, so in this respect virtual particles are not unique.) The emission and absorption points of virtual particles may be spacelike separated, and we therefore can't say unambiguously that one happened "before" the other. The temporal order is dependent on the reference frame. Surely in these circumstances, when it's not even possible to say absolutely which side of the interaction was the emission and which was the absorption, those who maintain that fundamental interactions possess an inherent temporal asymmetry have a very difficult case to make. Over limited ranges, a similar argument applies to massive particles, since there is a non-negligible probability of a particle traversing a spacelike interval if its absolute magnitude is less than about h/(2πmc), where h is Planck's constant and m is the mass of the particle. So, if virtual particle interactions are time-symmetric, why not all fundamental particle interactions? (Needless to say, time-symmetry of fundamental quantum interactions does not preclude asymmetry for macroscopic processes involving huge numbers of individual quantum interactions evolving from some, possibly very special, boundary conditions.)

Experimentally, those who argue that the emission of a photon is conditioned by its absorption can point to the results from tests of Bell's inequalities, because the observed violations of those inequalities are exactly what the symmetrical model of interactions would lead us to expect. Nevertheless, the results of those experiments are rarely interpreted as lending support to the symmetrical model, apparently because temporal asymmetry is so deeply ingrained in people's intuitive conceptions of locality, despite the fact that there is very little (if any) direct evidence of temporal asymmetry in any fundamental laws or interactions.

Despite the preceding arguments in favor of symmetrical (reversible) fundamental processes, there are clearly legitimate reasons for being suspicious of unrestricted temporal symmetry. If it were possible for general information to be transmitted efficiently along the past null cone of an event, this would seem to permit both causal loops and causal interactions with spacelike-separated events, as illustrated below.

On such a basis, it might seem as if the Minkowskian spacetime manifold would be incapable of supporting any notion of locality at all. The triangle inequality fails in this manifold, so there are null paths connecting any two points, and this applies even to spacelike separated points if we allow the free flow of information in either direction along null surfaces. Indeed this seems to have been the main source of Einstein's uneasiness with the "spooky" entanglements entailed by quantum theory. In a 1948 letter to Max Born, Einstein tried to clearly articulate his concern with entanglement, which he regarded as incompatible with "the confidence I have in the relativistic group as representing a heuristic limiting principle".

It is characteristic of physical objects [in the world of ideas] that they are thought of as arranged in a space-time continuum. An essential aspect of this arrangement of things in physics is that they lay claim, at a certain time, to an existence independent of one another, provided these objects 'are situated in different parts of space'. Unless one makes this kind of assumption about the independence of the existence (the 'being-thus') of objects which are far apart from one another in space… the idea of the existence of (quasi) isolated systems, and thereby the postulation of laws which can be checked empirically in the accepted sense, would become impossible.

In essence, he is arguing that without the assumption that it is possible to localize physical systems, consistent with the relativistic group, in such a way that they are causally isolated, we cannot hope to analyze events in any effective way, such that one thing can be checked against another. After describing how quantum mechanics leads unavoidably to entanglement of potentially distant objects, and therefore dispenses with the principle of locality (in Einstein's view), he says

When I consider the physical phenomena known to me, even those which are being so successfully encompassed by quantum mechanics, I still cannot find any fact anywhere which would make it appear likely that the requirement [of localizability] will have to be abandoned.

At this point the precise sense in which quantum mechanics entails non-classical "influences" (or rather, correlations) for spacelike separated events had not yet been clearly formulated, and the debate between Born and Einstein suffered (on both sides)
from this lack of clarity. Einstein seems to have intuited that quantum mechanics does indeed entail distant correlations that are inconsistent with very fundamental classical notions of causality and independence, but he was unable to formulate those correlations clearly. For his part, Born outlined a simple illustration of quantum correlations occurring in the passage of light rays through polarizing filters – which is exactly the kind of experiment that, twenty years later, provided an example of the very thing that Einstein said he had been unable to find, i.e., a fact which makes it appear that the requirement of localizability must be abandoned. It's unclear to what extent Born grasped the nonclassical implications of those phenomena, which isn't surprising, since the Bell inequalities had not yet been formulated. Born simply pointed out that quantum mechanics allows for coherence, and said that "this does not go too much against the grain with me". Born often argued that classical mechanics was just as probabilistic as quantum mechanics, although his focus was on chaotic behavior in classical physics, i.e., exponential sensitivity to initial conditions, rather than on entanglement. Born and Einstein often seemed to be talking past each other, since Born focused on the issue of determinism, whereas Einstein's main concern was localizability. Remarkably, Born concluded his reply by saying

I believe that even the days of the relativistic group, in the form you gave it, are numbered.

One might have thought that experimental confirmation of quantum entanglement would have vindicated Born's forecast, but we now understand that the distant correlations implied by quantum mechanics (and confirmed experimentally) are of a subtle kind that do not violate the "relativistic group". This seems to be an outcome that neither Einstein nor Born anticipated; Born was right that the distant entanglement implicit in quantum mechanics would be proven correct, but Einstein was right that the relativistic group would emerge unscathed.

But how is this possible? Considering that non-classical distant correlations have now been experimentally established with high confidence, thereby undermining the classical notion of localizability, how can we account for the continued ability of physicists to formulate and test physical laws? The failure of the triangle inequality (actually, the reversal of it) does not necessarily imply that the manifold is unable to support non-trivial structure. There are absolute distinctions between the sets of null paths connecting spacelike separated events and the sets of null paths connecting timelike separated events, and these differences might be exploited to yield a structure that conforms with the results of observation. There is no reason this cannot be a "locally realistic" theory, provided we understand that locality in a quasi-metric manifold is non-transitive. Realism is simply the premise that the results of our measurements and observations are determined by an objective world, and it's perfectly possible that the objective world might possess a non-transitive locality, commensurate with the non-transitive metrical aspects of Minkowski spacetime. Indeed, even before the advent of quantum mechanics and the tests of Bell's inequality, we should have learned from special relativity that locality is not transitive, and this should have led
us to expect non-Euclidean connections and correlations between events, not just metrically, but topologically as well. From this point of view, many of the seeming paradoxes associated with quantum mechanics and locality are really just manifestations of the non-intuitive fact that the manifold we inhabit does not obey the triangle inequality (which is one of our most basic spatial intuitions), and that elementary processes are temporally reversible.

On the other hand, we should acknowledge that the Bell correlations can't be explained in a locally realistic way simply by invoking the quasi-metric structure of Minkowski spacetime, because if the timelike processes of nature were ontologically continuous it would not be possible to regard them as propagating on null surfaces. We also need our fundamental physical processes to consist of irreducible discrete interactions, as discussed in Section 9.10.

9.10 Spacetime Mediation of Quantum Interactions

No reasonable definition of reality could be expected to permit this.
                                        Einstein, Podolsky, and Rosen, 1935

According to general relativity the shape of spacetime determines the motions of objects while those objects determine (or at least influence) the shape of spacetime. Similarly in electrodynamics the fields determine the motions of charges in spacetime while the charges determine the fields in spacetime. This dualistic structure naturally arises when we replace action-at-a-distance with purely local influences in such a way that the interactions between "separate" objects are mediated by an entity extending between them. We must then determine the dynamical attributes of this mediating entity, e.g., the electromagnetic field in electrodynamics, or spacetime itself in general relativity. However, many common conceptions regarding the nature and extension of these mediating entities are called into question by the apparently "non-local" correlations in quantum mechanics, as highlighted by EPR experiments. The apparent non-locality of these phenomena arises from the fact that although we regard spacetime as metrically Minkowskian, we continue to regard it as topologically Euclidean. As discussed in the preceding sections, the observed phenomena are more consistent with a completely Minkowskian spacetime, in which physical locality is directly induced by the pseudometric of spacetime.

According to this view, spacetime operates on matter via interactions, and matter defines for spacetime the set of allowable interactions, i.e., consistent with conservation laws. A quantum interaction is considered to originate on (or be "mediated" by) the locus of spacetime points that are null-separated from each of the interacting sites. In general this locus is a quadratic surface in spacetime, and its surface area is inversely proportional to the mass of the transferred particle. For two timelike-separated events A and B the mediating locus is a closed surface as illustrated below (with one of the spatial dimensions suppressed)

The mediating surface is shown here as a dotted circle, but in 4D spacetime it's actually a closed surface, spherical and purely spacelike relative to the frame of the interval AB. This type of interaction corresponds to the transit of massive real particles. Of course, relative to a frame in which A and B are in different spatial locations, the locus of intersection has both timelike and spacelike extent, and is an ellipse (or rather an ellipsoidal surface in 4D) as illustrated below

The surface is purely spacelike and isotropic only when evaluated relative to its rest frame (i.e., the frame of the interval AB), whereas this surface maps to a spatial ellipsoid, consisting of points that are no longer simultaneous, relative to any relatively moving frame. The directionally asymmetric aspects of the surface area correspond precisely to the "relativistic mass" components of the corresponding particles as a function of the relative velocity of the frames. The propagation of a free massive particle along a timelike path through spacetime can be regarded as involving a series of surfaces, from which emanate inward-going "waves" along the null cones in both the forward and backward direction, deducting the particle from the past focal point and adding it to the future focal point, as shown below for particles with different masses.

Recall that the frequency ν of the de Broglie matter wave of a particle of mass m is

ν = (c/h) √( (mc)² + px² + py² + pz² )
where px, py, pz are the components of momentum in the three directions. For a (relatively) stationary particle the momenta vanish and the frequency is just ν = mc²/h per second. Hence the time per cycle is inversely proportional to the mass. So, since each cycle consists of an advanced and a retarded cone, the surface of intersection is a sphere (for a stationary mass particle) of radius r = h/mc, because this is how far along the null cones the wave propagates during one cycle. Of course, h/mc is just the Compton scattering wavelength of a particle of mass m (about 2.43 × 10⁻¹² meters for an electron), which characterizes the spatial expanse over which a particle tends to "scatter" incident photons in a characteristic way. This can be regarded as the effective size of a particle when "viewed" by means of gamma-rays. We may conceive of this effect being due to a high-energy photon getting close enough to the nominal worldline of the massive particle to interfere with the null surfaces of propagation, upsetting the phase coherence of the null waves and thereby diverting the particle from its original path. For a massless particle the quantum phase frequency is zero, and a completely free photon (if such a thing existed) would just be represented by an entire null cone. On the other hand, real photons are necessarily emitted and absorbed, so they correspond to bounded null intervals. Consistent with quantum electrodynamics, the quantum phase of a photon does not advance while in transit between its emission and absorption (unlike massive particles). According to this view, the oscillatory nature of macroscopic electromagnetic waves arises from the advancing phase of the source, rather than from any phase activity of an actual photon. The spatial volume swept out by a mediating surface is a maximum when evaluated with respect to its rest frame. When evaluated relative to any other frame of reference, the spatial contraction causes the swept volume to be reduced. This is consistent with the idea that the effective mass of a particle is inversely proportional to the swept volume of the propagating surface, and it's also consistent with the effective range of mediating particles being inversely proportional to their mass, since the electromagnetic force
mediated by massless photons has infinite range, whereas the strong nuclear force has a very limited range because it is mediated by massive particles. Schematics of a stationary and a moving particle are shown below.

This is the same illustration that appeared in the discussion of Lorentz's "corresponding states" in Section 1.5, although in that context the shells were understood to be just electromagnetic waves, and Lorentz simply conjectured that all physical phenomena conform to this same structure and transform similarly. In a sense, the relativistic Schrodinger wave equation and Dirac's general argument for light-like propagation of all physical entities based on the combination of relativity and quantum mechanics (as discussed in Section 9.9) provide the modern justification for Lorentz's conjecture. Looking back even further, we see that by conceiving of a particle as a sequence of surfaces of finite extent, it is finally possible to answer Zeno's question about how a moving particle differs from a stationary particle in "a single instant". The difference is that the mediating surfaces of a moving particle are skewed in spacetime relative to those of a stationary particle, corresponding to their respective planes of simultaneity. Some quantum interactions involve more than two particles. For example, if two coupled particles separate at point A and interact with particles at points B and C respectively, the interaction (viewed straight from the side) looks like this:

The mediating surface for the pair AB intersects with the mediating surface for the pair AC at the two points of intersection of the dotted circles, but in full 4D spacetime the intersection of the two mediating spheres is a closed circle. (It's worth noting that these two surfaces intersect if and only if B and C are spacelike separated. This circle enforces a particular kind of consistency on any coherent waves that are generated on the two mediating surfaces, and this consistency is responsible for "EPR" type correlation effects.) The locus of null-separated points for two lightlike-separated events is a degenerate quadratic surface, namely, a straight line as represented by the segment AB below:

The "surface area" of this locus (the intersection of the two cones) is necessarily zero, so these interactions represent the transits of massless particles. For two spacelike-separated events the mediating locus is a two-part hyperboloid surface, represented by the hyperbola shown at the intersection of two null cones below

This hyperboloid surface has infinite area, which suggests that any interaction between spacelike separated events would correspond to the transit of an infinitely massive particle. On this basis it seems that these interactions can be ruled out. There is, however, a limited sense in which such interactions might be considered. Recall that a pseudosphere can be represented as a sphere with purely imaginary radius. It's conceivable that observed interactions involving virtual (conjugate) pairs of particles over spacelike intervals (within the limits imposed by the uncertainty relations) may correspond to hyperboloid mediating surfaces. (It's also been suggested that in a closed universe the "open" hyperboloid surfaces might need to be regarded as finite, albeit extremely huge. For example, they might be 35 orders of magnitude larger than the mediating surfaces for timelike interactions. This is related to vague notions that "h" is in some sense the "inverse" of the size of a finite universe. In a much smaller closed universe (as existed immediately following the big bang) there may have been an era in which the "hyperboloid" surfaces had areas comparable to the ellipsoid surfaces, in which case the distinction between spacelike and timelike interactions would have been less significant.)

An interesting feature of this interpretation is that, in addition to the usual 3+1 dimensions, spacetime requires two more "curled up" dimensions of angular orientation to represent the possible directions in space. The need to treat these as dimensions in their own right arises from the non-transitive topology of the pseudo-Riemannian manifold. Each point [t,x,y,z] actually consists of a two-dimensional orientation space, which can be parameterized (for any fixed frame) in terms of ordinary angular coordinates θ and ϕ. Then each point in the six-dimensional space with coordinates [x,y,z,t,θ,ϕ] is a terminus for a unique pair of spacetime rays, one forward and one backward in time. A simple mechanistic visualization of this situation is to imagine a tiny computer at each of these points, reading its input from the two rays and sending (matched conservative) outputs on
the two rays. This is illustrated below in the xyt space:

The point at the origin of these two views is on the mediating surface of events A and B. Each point in this space acts purely locally on the basis of purely local information. By specifying a preferred polarity for the two null rays terminating at each point in the 6D space, we automatically preclude causal loops and restrict information flow to the future null cone, while still preserving the symmetry of wave propagation. (Note that an essential feature of spacetime mediation is that both components of a wave-pair are "advanced", in the sense that they originate on a spherical surface, one emanating forward and one backward in time, but both converge inward on the particles involved in the interaction.) According to this view, the "unoccupied points" of spacetime are elements of the 6D space, whereas an event or particle is an element of the 4D space (t,x,y,z). In effect an event is the union of all the pairs of rays terminating at the point (t,x,y,z). We saw in Section 2.6 that the transformations of θ and ϕ under Lorentzian boosts are beautifully handled by linear fractional functions applied to their stereographic mappings on the complex plane.

One common objection to the idea that quantum interactions occur locally between null-separated points is based on the observation that, although every point on the mediating surface is null-separated from each of the interacting events, they are spacelike-separated from each other, and hence unable to communicate or coordinate the generation of two equal and opposite outgoing quantum waves (one forward in time and one backward in time). The answer to this objection is that no communication is required, because the "coordination" arises naturally from the context. The points on the mediating locus are not communicating with each other, but each of them is in receipt of identical bits of information from the two interaction events A and B. Each point responds independently based on its local input, but the combined effect of the entire locus responding to the same information is a coherent pair of waves.

Another objection to the "spacetime mediation" view of quantum mechanics is that it relies on temporally symmetric propagation of quantum waves. Of course, this objection
can't be made on strictly mathematical grounds, because both Maxwell's equations and the (relativistic) Schrodinger equation actually are temporally symmetric. The objection seems to be motivated by the idea that the admittance of temporally symmetric waves automatically implies that every event is causally implicated in every other event, if not directly by individual interactions then by a chain of interactions, resulting in a nonsensical mess. However, as we've seen, the spacetime mediation view leads naturally to the conclusion that interactions between spacelike-separated events are either impossible or else of a very different (virtual) character than interactions along timelike intervals. Moreover, the stipulation of a preferred polarity for the ray pairs terminating at each point is sufficient to preclude causal loops.

Conclusion

I have made no more progress in the general theory of relativity. The electric field still remains unconnected. Overdeterminism does not work. Nor have I produced anything for the electron problem. Does the reason have to do with my hardening brain mass, or is the redeeming idea really so far away?
                                        Einstein to Ehrenfest, 1920

Despite the spectacular success of Einstein's theory of relativity, it is sometimes said that tests of Bell's inequalities and similar quantum phenomena have demonstrated that nature is, on a fundamental level, incompatible with the local realism on which relativity is based. However, as we saw in Section 9.7, Bell's inequalities apply only to strictly nondeterministic theories, so, as Bell himself noted, they do not preclude "local realism" for a fully deterministic theory. The entire framework of classical relativity, with its unified spacetime and partial ordering of events, is founded on a strictly deterministic basis, so Bell's inequalities do not apply. Admittedly the phenomena of quantum mechanics are incompatible with at least some aspect of our classical (metrical) idea of locality, but this should not be surprising, because (as discussed in the preceding sections) our metrical idea of locality is already inconsistent with the pseudo-Riemannian metrical structure of spacetime itself, which forms the basis of modern relativity. It's tempting to conclude that while modern relativity initiated a revolution in our thinking about the (pseudo-Riemannian) metrical structure of spacetime, with its singular null rays and non-transitive equivalencies, the concomitant revolution in our thinking about the topology of spacetime has lagged behind. Although we long ago decided that the physically measurable intervals between the events of spacetime cannot be accurately represented as the distances between the points of a Euclidean metric space, we continue to assume that the topology of the set of spacetime events is (locally) Euclidean. This incongruous state of affairs may be due in part to the historical circumstance that Einstein's special relativity was originally viewed as simply an elegant interpretation of the existing Lorentz ether theory. According to Lorentz, spacetime really was a Euclidean manifold with the metric and topology of E4, on top of which was superimposed a set of functions representing the operational temporal and spatial components of intervals. It was possible to conceive of this because the singularities in the mapping between the
"real" and "operational" components along null directions implied by the Minkowski line element were not necessarily believed to be physical. The validity of Lorentz invariance was just being established "one order at a time", and it wasn't clear that it would be valid to all orders. The situation was somewhat akin to the view of some people today, who believe that although the field equations of general relativity predict a genuine singularity at the center of a black hole, we may imagine that somehow the laws break down at some point, or some other unknown effect takes over and the singularity is averted. Around 1905 people could think similar things about the implied singularity in the full (all orders) Lorentz-Fitzgerald mapping between Lorentz's "real spacetime" and his operational electromagnetic spacetime, i.e., they could imagine that the Lorentz invariance might break down at some point short of the singularities. On this basis, we can make sense of continuing to use the topology of E4. The original Euclidean topology of Lorentz's absolute spacetime still lurks just beneath the surface of modern relativity. However, if we make the judgement that Lorentz invariance applies strictly to all orders (as Poincare suggested and Einstein brashly asserted in 1905), and the light-like singularities of the Lorentz-Fitzgerald mapping are genuine physical singularities, albeit in some unfamiliar non-transitive sense, and if we thoroughly disavow Lorentz's underlying "real spacetime" (which plays no role in the theory) and treat the "operational spacetime" itself as the primary ontological entity, then there seems reason to question whether the assumption of E4 topology is still suitable. This is particularly true if a topology more in accord with Lorentz invariance would also help to clarify some of the puzzling phenomena of quantum mechanics.

Of course, it's entirely possible that the theory of relativity is simply wrong on some fundamental level where quantum mechanics "takes over". In fact, this is probably the majority view among physicists today, who hope that eventually a theory uniting gravity and quantum mechanics will be found which will explain precisely how and in what circumstances the classical theory of relativity fails to accurately represent the operations of nature, while at the same time explaining why it seems to work as well as it does. However, it may be worthwhile to remember previous periods in the history of physics when the principle of relativity was judged to be fundamentally inadequate to account for the observed phenomena. Recall Ptolemy's arguments against a moving Earth, or the 19th century belief that electromagnetism necessitated a luminiferous ether, or the early-20th century view that special relativity could never be reconciled with gravity. In each case a truly satisfactory resolution of the difficulties was eventually achieved, not by discarding relativity, but by re-interpreting and extending it, thereby gaining a fuller understanding of its logical content and consequences.

Appendix: Mathematical Miscellany

1. Vector Products

The dot and cross products are often introduced via trigonometric functions and/or matrix operations, but they also arise quite naturally from simple considerations of Pythagoras'
theorem. Given two points a and b in the three-dimensional vector space with Cartesian coordinates (ax,ay,az) and (bx,by,bz) respectively, the squared distance between these two points is
|a − b|² = (ax − bx)² + (ay − by)² + (az − bz)²
If (and only if) these two vectors are perpendicular, the distance between them is the hypotenuse of a right triangle with edge lengths equal to the lengths of the two vectors, so we have
(ax − bx)² + (ay − by)² + (az − bz)² = (ax² + ay² + az²) + (bx² + by² + bz²)
if and only if a and b are perpendicular. Equating these two expressions and canceling terms, we arrive at the necessary and sufficient condition for a and b to be perpendicular
ax bx + ay by + az bz = 0
This motivates the definition of the left hand quantity as the "dot product" (also called the scalar product) of the arbitrary vectors a = (ax,ay,az) and b = (bx,by,bz) as the scalar quantity
a · b = ax bx + ay by + az bz
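As a quick numerical sanity check, here is a minimal Python sketch (the sample vectors and tolerance are arbitrary choices for this illustration, not anything from the text):

```python
# The dot product vanishes exactly when the Pythagorean condition
# |a - b|^2 = |a|^2 + |b|^2 holds, i.e. when a and b are perpendicular.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = (1.0, 2.0, 2.0)
b = (2.0, 1.0, -2.0)            # chosen so that a . b = 2 + 2 - 4 = 0
dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
assert abs(dist_sq - (dot(a, a) + dot(b, b))) < 1e-12
print(dot(a, b))                 # 0.0
```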
At the other extreme, suppose we seek an indicator of whether or not the vectors a and b are parallel. In any case we know the squared length of the vector sum of these two vectors is
S² = (ax + bx)² + (ay + by)² + (az + bz)²
We also know that S = |a| + |b| if and only if a and b are parallel, in which case we have
S² = (|a| + |b|)² = (ax² + ay² + az²) + 2|a||b| + (bx² + by² + bz²)
Equating these two expressions for S², canceling terms, and squaring both sides gives the necessary and sufficient condition for a and b to be parallel
(ax bx + ay by + az bz)² = (ax² + ay² + az²)(bx² + by² + bz²)
Expanding these expressions and canceling terms, this becomes
2(ax ay bx by + ax az bx bz + ay az by bz) = ax²by² + ax²bz² + ay²bx² + ay²bz² + az²bx² + az²by²
Notice that we can gather terms and re-write this equality as
(ay bz − az by)² + (az bx − ax bz)² + (ax by − ay bx)² = 0
Obviously a sum of squares can equal zero only if each term is individually zero, which of course was to be expected, because two vectors are parallel if and only if their components are in the same proportions to each other, i.e.,
ax/bx = ay/by = az/bz
which represents the vanishing of the three terms in the previous expression. This motivates the definition of the cross product (also known as the vector product) of two vectors a = (ax,ay,az) and b = (bx,by,bz) as consisting of those three components, ordered symmetrically, so that each component is defined in terms of the other two components of the arguments, as follows
a × b = (ay bz − az by) ux + (az bx − ax bz) uy + (ax by − ay bx) uz
By construction, this vector is null if and only if a and b are parallel. Furthermore, notice that the dot products of this cross product and each of the vectors a and b are identically zero, i.e.,
a · (a × b) = ax(ay bz − az by) + ay(az bx − ax bz) + az(ax by − ay bx) = 0
b · (a × b) = bx(ay bz − az by) + by(az bx − ax bz) + bz(ax by − ay bx) = 0
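Again as a small illustration, a Python sketch (sample vectors are arbitrary) confirms both properties of this construction:

```python
# The cross product as defined above is perpendicular to both arguments,
# and it is identically zero when the arguments are parallel.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

a = (1.0, 2.0, 3.0)
b = (4.0, 5.0, 6.0)
c = cross(a, b)
print(c, dot(a, c), dot(b, c))    # (-3.0, 6.0, -3.0) 0.0 0.0
print(cross(a, (2.0, 4.0, 6.0)))  # (0.0, 0.0, 0.0), since (2,4,6) is parallel to a
```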
As we saw previously, the dot product of two vectors is 0 if and only if the vectors are perpendicular, so this shows that a × b is perpendicular to both a and b. There is, however, an arbitrary choice of sign, which is conventionally resolved by the "right-hand rule". It can be shown that if θ is the angle between a and b, then a × b is a vector with magnitude |a||b|sin(θ) and direction perpendicular to both a and b, according to the right-hand rule. Similarly the scalar a · b equals |a||b|cos(θ).

2. Differentials

In Section 5.2 we gave an intuitive description of differentials such as dx and dy as incremental quantities, but strictly speaking the actual values of differentials are arbitrary,
because only the ratios between them are significant. Differentials for functions of multiple variables are just a generalization of the usual definitions for functions of a single variable. For example, if we have z = f(x) then the differentials dz and dx are defined as arbitrary quantities whose ratio equals the derivative of f(x) with respect to x. Consequently we have dz/dx = f '(x) where f '(x) signifies the partial derivative ∂z/∂x, so we can express this in the form
dz = f '(x) dx
In this case the partial derivative is identical to the total derivative, because this f is entirely a function of the single variable x. If, now, we consider a differentiable function z = f(x,y) with two independent variables, we can expand this into a power series consisting of a sum of (perhaps infinitely many) terms of the form A xᵐ yⁿ. Since x and y are independent variables we can suppose they are each functions of a parameter t, so we can differentiate the power series term-by-term, with respect to t, and each term will contribute a quantity of the form
(mA xᵐ⁻¹ yⁿ) dx/dt + (nA xᵐ yⁿ⁻¹) dy/dt
where, again, the differentials dx, dy, dz, dt are arbitrary variables whose ratios only are constrained by this relation. The coefficient of dy/dt is the partial derivative of A xᵐ yⁿ with respect to y, and the coefficient of dx/dt is the partial with respect to x, and this will apply to every term of the series. So we can multiply through by dt to arrive at the result
dz = (∂z/∂x) dx + (∂z/∂y) dy
The same approach can be applied to functions of arbitrarily many independent variables. A simple application of total differentials occurs in Section 3 of Einstein's 1905 paper "On the Electrodynamics of Moving Bodies". In the process of deriving the function τ (x',y,z,t) as part of the Lorentz transformation, Einstein arrives at his equation 3.1
(1/2)[ τ(0,0,0,t0) + τ(0,0,0, t0 + x'/(c−v) + x'/(c+v)) ] = τ(x',0,0, t0 + x'/(c−v))
where I've replaced his "t" with t0 to emphasize that this is just the arbitrary value of t at the origin of the light pulse. At this point Einstein says "Hence, if x' be chosen infinitesimally small," and then he writes his equation 3.2
(1/2)[ 1/(c−v) + 1/(c+v) ] ∂τ/∂t = ∂τ/∂x' + [1/(c−v)] ∂τ/∂t
Various explications of this step have appeared in the literature. For example, Miller says "Einstein took x' to be infinitesimal and expanded both sides of [3.1] into a series in x'. Neglecting terms higher than first order the result is [3.2]." To put this differently, Einstein simply evaluated the total differentials of both sides of the equation. For any arbitrary continuous function τ(x',y,z,t) we have
dτ = (∂τ/∂x') dx' + (∂τ/∂y) dy + (∂τ/∂z) dz + (∂τ/∂t) dt
Since the arguments of the first τ function on the left hand side of 3.1 are all constants, we have dx' = dy = dz = dt = 0, so it contributes nothing to the total differential of the left hand side. The arguments of the second τ function on the left are all constants except for the t argument, which equals
t0 + x'/(c−v) + x'/(c+v)
so we have
dt = [ 1/(c−v) + 1/(c+v) ] dx'
It follows that the total differential of the second τ function is
dτ = (∂τ/∂t) [ 1/(c−v) + 1/(c+v) ] dx'
Likewise the total differential of the τ function on the right hand side of 3.1 is
dτ = (∂τ/∂x') dx' + (∂τ/∂t) dx'/(c−v)
So, equating the total differentials of the two sides of 3.1 gives
(1/2)(∂τ/∂t)[ 1/(c−v) + 1/(c+v) ] dx' = (∂τ/∂x') dx' + (∂τ/∂t) dx'/(c−v)
and dividing out the factor of dx' gives Einstein's equation 3.2.
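For readers who wish to check this step symbolically, here is a minimal sketch using the sympy library (a tool chosen for this illustration, not anything used in the text); τ is written as a function of x' and t only, since the y and z arguments are constant throughout:

```python
# Verify that equating the first-order terms in x' on both sides of
# Einstein's equation 3.1 yields his equation 3.2.
import sympy as sp

xp, t0, c, v = sp.symbols('xp t0 c v')   # xp stands for x'
tau = sp.Function('tau')

lhs = sp.Rational(1, 2) * (tau(0, t0) + tau(0, t0 + xp/(c - v) + xp/(c + v)))
rhs = tau(xp, t0 + xp/(c - v))

# The zeroth-order terms agree trivially; the linear coefficients give eq. 3.2:
print(sp.Eq(sp.diff(lhs, xp).subs(xp, 0), sp.diff(rhs, xp).subs(xp, 0)))
```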
3. Differential Operators

The standard differential operators are commonly expressed as formal "vector products" involving the ∇ ("del") symbol, which is defined as
∇ = ux (∂/∂x) + uy (∂/∂y) + uz (∂/∂z)
where ux, uy, uz are again unit vectors in the x,y,z directions. The scalar product of ∇ with an arbitrary vector field V is called the divergence of V, and is written explicitly as
∇ · V = ∂Vx/∂x + ∂Vy/∂y + ∂Vz/∂z
The vector product of ∇ with an arbitrary vector field V is called the curl, given explicitly by
∇ × V = (∂Vz/∂y − ∂Vy/∂z) ux + (∂Vx/∂z − ∂Vz/∂x) uy + (∂Vy/∂x − ∂Vx/∂y) uz
Note that the curl is applied to a vector field and returns a vector, whereas the divergence is applied to a vector field but returns a scalar. For completeness, we note that a scalar field Q(x,y,z) can be simply multiplied by the ∇ operator to give a vector, called the gradient, as follows
∇Q = (∂Q/∂x) ux + (∂Q/∂y) uy + (∂Q/∂z) uz
Another common expression is the sum of the second derivatives of a scalar field with respect to the three directions, since this sum appears in the Laplace and Poisson equations. Using the "del" operator this can be expressed as the divergence of the gradient (or the "div grad") of the scalar field, as shown below.
∇ · ∇Q = ∂²Q/∂x² + ∂²Q/∂y² + ∂²Q/∂z²
For convenience, this operation is often written as ∇², and is called the Laplacian operator. All the above operators apply to 3-vectors, but when dealing with 4-vectors in Minkowski spacetime the analog of the Laplacian operator is the d'Alembertian operator
□ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z² − (1/c²) ∂²/∂t²
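The following minimal sympy sketch (the library choice is mine, not the text's) spells out these operators componentwise and confirms the familiar identities div(curl V) = 0 and curl(grad Q) = 0 for arbitrary smooth fields:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
Vx, Vy, Vz = [sp.Function(n)(x, y, z) for n in ('Vx', 'Vy', 'Vz')]
Q = sp.Function('Q')(x, y, z)

def grad(f):
    # del Q: the vector of partial derivatives
    return [sp.diff(f, x), sp.diff(f, y), sp.diff(f, z)]

def div(F):
    # del . V: the sum of the partials of the components
    return sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

def curl(F):
    # del x V, componentwise as in the displayed formula above
    return [sp.diff(F[2], y) - sp.diff(F[1], z),
            sp.diff(F[0], z) - sp.diff(F[2], x),
            sp.diff(F[1], x) - sp.diff(F[0], y)]

print(sp.simplify(div(curl([Vx, Vy, Vz]))))            # 0
print([sp.simplify(comp) for comp in curl(grad(Q))])   # [0, 0, 0]
```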
4. Differentiation of Vectors and Tensors

The easiest way to understand the motivation for the definitions of absolute and covariant differentiation is to begin by considering the derivative of a vector field in three-dimensional Euclidean space. Such a vector can be expressed in either contravariant or covariant form as a linear combination of, respectively, the basis vectors u_1, u_2, u_3 or the dual basis vectors u^1, u^2, u^3, as follows
A = A^1 u_1 + A^2 u_2 + A^3 u_3 = A_1 u^1 + A_2 u^2 + A_3 u^3
where A^i are the contravariant components and A_i are the covariant components of A, and the two sets of basis vectors satisfy the relations
u_i · u_j = g_ij,     u^i · u^j = g^ij,     u_i · u^j = δ_i^j
where g_ij and g^ij are the covariant and contravariant metric tensors. The differential of A can be found by applying the chain rule to either of the two forms, as follows
dA = u_i dA^i + A^i du_i = u^i dA_i + A_i du^i          (1)
If the basis vectors u_i and u^i have a constant direction relative to a fixed Cartesian frame, then du_i = du^i = 0, so the second term on the right vanishes, and we are left with the familiar differential of a vector as the differential of its components. However, if the basis vectors vary from place to place, the second term on the right is non-zero, so we must not neglect this term if we are to allow curvilinear coordinates. As we saw in Part 2 of this Appendix, for any quantity Q = f(x) and coordinate x^i we have
dQ = (∂Q/∂x^i) dx^i
so we can substitute for the three differentials in (1) and re-arrange terms to write the resulting expressions as
[ ∂A/∂x^k − u_i (∂A^i/∂x^k) − A^i (∂u_i/∂x^k) ] dx^k = 0
[ ∂A/∂x^k − u^i (∂A_i/∂x^k) − A_i (∂u^i/∂x^k) ] dx^k = 0
Since these relations must hold for all possible combinations of dx^k, the quantities inside parentheses must vanish, so we have the following relations between partial derivatives
∂A/∂x^k = u_i (∂A^i/∂x^k) + A^i (∂u_i/∂x^k)          (2a)
∂A/∂x^k = u^i (∂A_i/∂x^k) + A_i (∂u^i/∂x^k)          (2b)
If we now let A^i_j and A_ij denote the projections of the ith components of (2a) and (2b) respectively onto the jth basis vector, we have
A^i_j = (∂A/∂x^j) · u^i,     A_ij = (∂A/∂x^j) · u_i
and it can be verified that these are the components of second-order tensors of the types indicated by their indices (superscripts being contravariant indices and subscripts being covariant indices). If we multiply through (using the dot product) each term of (2a) by u^i, and each term of (2b) by u_i, and recall that u^i · u_j = δ^i_j, we have
(∂A/∂x^k) · u^i = ∂A^i/∂x^k + A^j (∂u_j/∂x^k) · u^i
(∂A/∂x^k) · u_i = ∂A_i/∂x^k + A_j (∂u^j/∂x^k) · u_i          (3)
For convenience we now define the three-index symbol
Γ^i_jk = u^i · (∂u_j/∂x^k)
which is called the Christoffel symbol of the second kind. Although the Christoffel symbol is not a tensor, it is very useful for expressing results on a metrical manifold with a given system of coordinates. We also note that since the components of u^i · u_j are constants (either 0 or 1), it follows that ∂(u^i · u_j)/∂x^k = 0, and expanding this partial derivative by the chain rule we find that
u_j · (∂u^i/∂x^k) = −u^i · (∂u_j/∂x^k) = −Γ^i_jk
Therefore, equations (3) can be written in terms of the Christoffel symbol as
A^i_;k = ∂A^i/∂x^k + Γ^i_jk A^j,     A_i;k = ∂A_i/∂x^k − Γ^j_ik A_j          (4)
These are the covariant derivatives of, respectively, the contravariant and covariant forms of the vector A. Obviously if the basis vectors are constant (as in Cartesian or oblique coordinate systems) the Christoffel symbols vanish, and we are left with just the first terms on the right sides of these equations. The second terms are needed only to account for the change in basis with position of general curvilinear coordinates. It might seem that these definitions of covariant differentiation depend on the fact that we worked in a fixed Euclidean space, which enabled us to assign absolute meaning to the components of the basis vectors in terms of an underlying Cartesian coordinate system. However, it can be shown that the Christoffel symbols we've used here are the same as the ones defined in Section 5.4 in the derivation of the extremal (geodesic) paths on a curved manifold, wholly in terms of the intrinsic metric coefficients g_ij and their partial derivatives with respect to the general coordinates on the manifold. This should not be surprising, considering that the definition of the Christoffel symbols given above was in terms of the basis vectors u_j and their derivatives with respect to the general coordinates, and noting that the metric tensor is just g_ij = u_i · u_j. Thus, with a bit of algebra we can show that
Γ^i_jk = (1/2) g^ia ( ∂g_aj/∂x^k + ∂g_ak/∂x^j − ∂g_jk/∂x^a )          (5)
in agreement with Section 5.4. We regard equations (4) as the appropriate generalization of differentiation on an arbitrary Riemannian manifold essentially by formal analogy with the flat manifold case, by the fact that applying this operation to a tensor yields another tensor, and perhaps most importantly by the fact that in conjunction with the developments of Section 5.4 we find that the extremal metrical path (i.e., the geodesic path) between two points is given by using this definition of "parallel transport" of a vector pointed in the direction of the path, so the geodesic paths are locally "straight". Of course, when we allow curved manifolds, some new phenomena arise. On a flat manifold the metric components may vary from place to place, but we can still determine that the manifold is flat, by means of the Riemann curvature tensor described in Section 5.7. One consequence of flatness, obvious from the above derivation, is that if a vector is transported parallel to itself around a closed path, it assumes its original orientation when it returns to its original location. However, if the metric coefficients vary in such a way that the Riemann curvature tensor is non-zero, then in general a vector that has been transported parallel to itself around a closed loop will undergo a change in orientation. Indeed, Gauss showed that the amount of deflection experienced by a vector as a result of being parallel-transported around a closed loop is exactly proportional to the integral of the curvature over the enclosed region. The above definition of covariant differentiation immediately generalizes to tensors of any order. In general, the covariant derivative of a mixed tensor T consists of the ordinary partial derivative of the tensor itself with respect to the coordinates xk, plus a term involving a Christoffel symbol for each contravariant index of T, minus a term
involving a Christoffel symbol for each covariant index of T. For example, if r is a contravariant index and s is a covariant index, we have
T^r_s;k = ∂T^r_s/∂x^k + Γ^r_ak T^a_s − Γ^a_sk T^r_a
It's convenient to remember that each Christoffel symbol in this expression has the index of x^k in one of its lower positions, and also that the relevant index from T is carried by the corresponding Christoffel symbol at the same level (upper or lower), and the remaining index of the Christoffel symbol is a dummy that matches with the relevant index position in T. One very important result involving the covariant derivative is known as Ricci's Theorem. The covariant derivative of the metric tensor g_ij is
g_ij;k = ∂g_ij/∂x^k − Γ^a_ik g_aj − Γ^a_jk g_ia
If we substitute for the Christoffel symbols from equation (5), and recall that
g^ab g_aj = δ^b_j
we find that all the terms cancel out and we're left with g_ij;k = 0. Thus the covariant derivative of the metric tensor is identically zero, which is what prompted Einstein to identify it with the gravitational potential, whose divergence vanishes, as discussed in Section 5.8.
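As a concrete check of Ricci's theorem, here is a short sympy sketch (the tool and the example metric, the Euclidean plane in polar coordinates, are my choices for this illustration) that computes the Christoffel symbols from equation (5) and verifies that the covariant derivative of the metric vanishes identically:

```python
# For the 2D Euclidean plane in polar coordinates, g = diag(1, r^2), compute
# the Christoffel symbols of the second kind and verify g_{ij;k} = 0.
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
X = [r, th]
g = sp.Matrix([[1, 0], [0, r**2]])
ginv = g.inv()

def Gamma(a, b, c):
    # Christoffel symbol Gamma^a_{bc}, per equation (5)
    return sum(sp.Rational(1, 2) * ginv[a, d] *
               (sp.diff(g[d, b], X[c]) + sp.diff(g[d, c], X[b]) - sp.diff(g[b, c], X[d]))
               for d in range(2))

for i in range(2):
    for j in range(2):
        for k in range(2):
            cov = sp.diff(g[i, j], X[k]) \
                - sum(Gamma(a, i, k) * g[a, j] for a in range(2)) \
                - sum(Gamma(a, j, k) * g[i, a] for a in range(2))
            assert sp.simplify(cov) == 0
print("g_ij;k = 0 for all i, j, k")
```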
5. Notes on Curvature Derivations

Direct substitution of the principal q values into the curvature formula of Section 5.3 gives a somewhat complicated expression, and it may not be obvious that it reduces to the expression given in the text. Even some symbolic processors seem to be unable to accomplish the reduction. So, to verify the result, recall that we have
k = 2(a + bq + cq²)/(1 + q²),     q² − 2mq − 1 = 0
where m = (c − a)/b. The roots of the quadratic in q are
q = m + √(m² + 1),     q' = m − √(m² + 1)
and of course qq' = −1. From the 2nd equation we have q² = 1 + 2mq, so we can
substitute this into the curvature equation to give
k = (a + c + bq + 2cmq)/(1 + mq)
Adding and subtracting c in the numerator, this can be written as
k = 2c + [(a − c) + bq]/(1 + mq)
Now, our assertion in the text is that this quantity equals (a+c) + b√(m² + 1). If we subtract 2c from both of these quantities and multiply through by 1 + mq, our assertion is
(a − c) + bq = [ (a − c) + b√(m² + 1) ](1 + mq)
Since q = m + √(m² + 1), the right hand term in the square brackets can be written as bq − bm, so we claim that
(a − c) + bq = [ (a − c) + bq − bm ](1 + mq)
Expanding the right hand side and cancelling terms and dividing by m gives
(a − c)q + bq² − b − bmq = 0
Now we multiply by the conjugate quantity q' to give
−(a − c) − bq − bq' + bm = 0
Since q + q' = 2m, the quantities bq and bq' combine to give −2bm, and we are left with m = (c − a)/b, which is the definition of m. Of course the same derivation applies to the other principal curvature if we swap q and q'. Section 5.3 also states that the Gaussian curvature of the surface of a sphere of radius R is 1/R². To verify this, note that the surface of a sphere of radius R is described by x² + y² + z² = R², and we can consider a point at the South pole, tangent to a plane of constant z. Then we have
z = ±√(R² − x² − y²)
Taking the negative root (for the South Pole), factoring out R, and expanding the radical into a power series in the quantity (x² + y²)/R² gives
z = −R[ 1 − (x² + y²)/(2R²) − (x² + y²)²/(8R⁴) − … ]
Without changing the shape of the surface, we can elevate the sphere so the South pole is just tangent to the xy plane at the origin by adding R to all the z values. Omitting all powers of x and y above the 2nd, this gives the quadratic equation of the surface at this point
z = (x² + y²)/(2R)
Thus we have z = ax² + bxy + cy² where
a = c = 1/(2R),     b = 0
from which we compute the curvature of the surface
K = 4ac − b² = 1/R²
as expected.

6. Odd Compositions

It's interesting to review the purely formal constraints on a velocity composition law (such as discussed in Section 1.9) to clarify what distinguishes the formulae that work from those that don't. Letting v12, v23, and v13 denote the pairwise velocities (in geometric units) between three co-linear particles P1, P2, P3, a composition formula relating these speeds can generally be expressed in the form
f(v13) = f(v12) + f(v23)
where f is some function that transforms speeds into a domain where they are simply additive. It's clear that f must be an "odd" function, i.e., f(−x) = −f(x), to ensure that the same composition formula works for both positive and negative speeds. This rules out transforms such as f(x) = x², f(x) = cos(x), and all other "even" functions. The general "odd" function expressed as a power series is a linear combination of odd powers, i.e.,
f(x) = c1 x + c3 x³ + c5 x⁵ + c7 x⁷ + …
so we can express any such function in terms of the coefficients [c1,c3,...]. For example, if we take the coefficients [1,0,0,...] we have the simple transform f(x) = x, which gives the Galilean composition formula v13 = v12 + v23. For another example, suppose we "weight" each term in inverse proportion to the exponent by using the coefficients [1, 1/3, 1/5, 1/7,...]. This gives the transform
f(x) = x + x³/3 + x⁵/5 + x⁷/7 + … = atanh(x)
leading to Einstein's relativistic composition formula
v13 = (v12 + v23)/(1 + v12 v23)
From the identity atanh(x) = ln[(1+x)/(1−x)]/2 we also have the equivalent multiplicative form
(1 + v13)/(1 − v13) = [(1 + v12)/(1 − v12)] [(1 + v23)/(1 − v23)]
which is arguably the most natural form of the relativistic speed composition law. The velocity parameter p = (1+v)/(1−v) also gives very natural expressions for other observables as well, including the relativistic Doppler shift, which equals √p, and the spacetime interval between two inertial particles each one unit of proper time past their point of intersection, which equals p^(1/4) − p^(−1/4). Incidentally, to give an equilateral triangle in spacetime, this last equation shows that two particles must have a mutual speed of v = 0.745...
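These claims are easy to test numerically; the following Python sketch (the sample speeds are arbitrary choices for this illustration) checks the additivity of atanh, the multiplicative p-form, and the equilateral-triangle speed quoted above:

```python
import math

v12, v23 = 0.6, 0.7
v13 = math.tanh(math.atanh(v12) + math.atanh(v23))
assert abs(v13 - (v12 + v23) / (1 + v12 * v23)) < 1e-12   # Einstein's formula

p = lambda u: (1 + u) / (1 - u)                  # the velocity parameter p
assert abs(p(v13) - p(v12) * p(v23)) < 1e-9      # multiplicative form

# The interval p^(1/4) - p^(-1/4) equals 1 when p^(1/4) is the golden ratio:
phi = (1 + 5 ** 0.5) / 2
v_eq = (phi**4 - 1) / (phi**4 + 1)
assert abs(p(v_eq) ** 0.25 - p(v_eq) ** -0.25 - 1) < 1e-12
print(v13, v_eq)    # 0.91549...  0.74535...
```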
7. Independent Components of the Curvature Tensor

As shown in Section 5.7, the fully covariant Riemann curvature tensor at the origin of Riemann normal coordinates, or more generally in terms of any "tangent" coordinate system with respect to which the first derivatives of the metric coefficients are zero, has the symmetries
Rabcd = −Rbacd = −Rabdc = Rcdab,     Rabcd + Racdb + Radbc = 0
These symmetries imply that although the curvature tensor in four dimensions has 256 components, there are only 20 algebraic degrees of freedom. To prove this, we first note that the anti-symmetry in the first two indices and in the last two indices implies that all the components of the form Raaaa, Raabb, Raabc, Rabcc, and all permutations of Raaab are zero, because they equal the negation of themselves when we transpose either the first two or
the last two indices. The only remaining components with fewer than three distinct indices are of the form Rabab and Rabba, but these are the negatives of each other by transposition of the last two indices, so we have only six independent components of this form (which is the number of ways of choosing two of four indices). The only non-zero components with exactly three distinct indices are of the forms Rabac = Rbaac = Rabca = Rbaca, so we have twelve independent components of this form (because there are four choices for the excluded index, and then three choices for the repeated index). The remaining components have four distinct indices, but each component with a given permutation of indices actually determines the values of eight components because of the three symmetries and anti-symmetries of order two. Thus, on the basis of these three symmetries there are only 24/8 = 3 independent components of this form, which may be represented by the three components R1234, R1342, and R1423. However, the skew symmetry implies that these three components sum to zero, so they represent only two degrees of freedom. Hence we can fully specify the Riemann curvature tensor (with respect to "tangent" coordinates) by giving the values of the six components of the form Rabab, the twelve components of the form Rabac, and the values of R1234 and R1342, which implies that the curvature tensor (with respect to any coordinate system) has 6 + 12 + 2 = 20 algebraic degrees of freedom. The same reasoning can be applied in any number of dimensions. For a manifold of N dimensions, the number of independent non-zero curvature components with just two distinct indices is equal to the number of ways of choosing 2 out of N indices. Also, the number of independent non-zero curvature components with 3 distinct indices is equal to the number of ways of choosing the N-3 excluded indices out of N indices, multiplied by 3 for the number of choices of the repeated index. This leaves the components with 4 distinct indices, of which there are 4! times the number of ways of choosing 4 of N indices, but again each of these represents 8 components because of the symmetries and anti-symmetries. Also, these components can be arranged in sets of three that satisfy the three-way skew symmetry, so the number of independent components of this form is reduced by a factor of 2/3. Therefore, the total number of algebraically independent components of the curvature tensor in N dimensions is
N²(N² − 1)/12
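This count is easily confirmed by brute force. The sketch below (using numpy, an assumption of this illustration) imposes the symmetries and the first Bianchi identity as linear constraints on the N⁴ components and counts the surviving degrees of freedom:

```python
# Count the algebraically independent components of a tensor with the
# Riemann symmetries, and compare with N^2 (N^2 - 1)/12.
import itertools
import numpy as np

def independent_components(N):
    idx = {q: i for i, q in enumerate(itertools.product(range(N), repeat=4))}
    rows = []
    for (a, b, c, d) in idx:
        # R_abcd + R_bacd = 0,  R_abcd + R_abdc = 0,  R_abcd - R_cdab = 0
        for other, sign in (((b, a, c, d), 1),
                            ((a, b, d, c), 1),
                            ((c, d, a, b), -1)):
            row = np.zeros(N ** 4)
            row[idx[(a, b, c, d)]] += 1
            row[idx[other]] += sign
            rows.append(row)
        row = np.zeros(N ** 4)     # first Bianchi: R_abcd + R_acdb + R_adbc = 0
        for q in ((a, b, c, d), (a, c, d, b), (a, d, b, c)):
            row[idx[q]] += 1
        rows.append(row)
    return N ** 4 - np.linalg.matrix_rank(np.array(rows))

for N in (2, 3, 4):
    print(N, independent_components(N), N * N * (N * N - 1) // 12)
    # -> (2, 1, 1), (3, 6, 6), (4, 20, 20)
```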
Bibliography

Aristotle, "The Physics", (trans by Wicksteed and Cornford), Harvard Univ. Press, 1957. Armstrong, M. A., "Groups and Symmetry", Springer-Verlag, 1988. Baierlein, Ralph, "Newton to Einstein, The Trail of Light", Cambridge Univ Press, 1992. Barrow, John, "Theories of Everything, The Quest for Ultimate Explanation", Clarendon Press, 1991. Barut, A., "Electrodynamics and Classical Theory of Fields and Particles", Dover, 1964. Bate, Roger R., et al, "Fundamentals of Astrodynamics", Dover, 1971. Beck, Anna (translator), “The Collected Papers of Albert Einstein”, Princeton University Press, 1989. Bell, J. S., "Speakable and Unspeakable in Quantum Mechanics", Cambridge Univ. Press, 1993. Bergmann, Peter, "Introduction to the Theory of Relativity", Dover, 1976. Bergmann, Peter, "The Riddle of Gravitation", Dover, 1968. Bonola, Roberto, "Non-Euclidean Geometry", Dover, 1955. Borisenko, A.I., and Tarapov, I.E., "Vector and Tensor Analysis with Applications", Dover, 1968. Born, Max, "Einstein's Theory of Relativity", Dover, 1962. Boas, Mary, "Mathematical Methods in the Physical Sciences", 2nd ed., Wiley, 1983. Boyer, Carl, "A History of Mathematics", Princeton Univ Press, 1985. Bryant, Victor, "Metric Spaces", Cambridge Univ. Press, 1985. Buhler, W. K., "Gauss, A Biographical Study", Springer-Verlag, 1981. Caspar, Max, "Kepler", Dover, 1993. Christianson, Gale E., "In the Presence of the Creator, Isaac Newton and His Times", The Free Press, 1984. Ciufolini and Wheeler, "Gravitation and Inertia", Princeton Univ. Press, 1995. Clark, Ronald, "Einstein, The Life and Times", Avon Books, 1971. Copernicus, Nicolaus, "On the Revolutions of Heavenly Spheres", Prometheus Books, 1995. Cushing, James, "Philosophical Concepts in Physics", Cambridge Univ. Press, 1998. Das, Anadijiban, "The Special Theory of Relativity", Springer-Verlag, 1993. Davies, Paul, "The Accidental Universe", Cambridge Univ. Press, 1982. Davies, Paul, "About Time", Simon & Schuster, 1996. D'Inverno, Ray, "Introducing Einstein's Relativity", Clarendon Press, 1992. Dirac, P. A. M., "The Principles of Quantum Mechanics", 4th ed., Oxford Science Publications, 1957. Doughty, Noel, "Lagrangian Interaction", Perseus Books, 1990. Duncan, Ronald, and M. Weston-Smith (ed.), "The Encyclopedia of Ignorance", Pocket Books, 1977. Earman, John, "World Enough and Space-Time", MIT Press, 1989. Ebbinghaus, H.D.,et al., "Mathematical Logic", Springer-Verlag, 1994. Einstein, Albert, "The Meaning of Relativity", Princeton Univ. Press, 1956. Einstein, Albert, "Sidelights on Relativity", Dover, 1983. Einstein, Albert, "Relativity, The Special and General Theory", Crown Trade, 1961. Einstein, Albert, "The Theory of Relativity and Other Essays", Citadel Press, 1996. Einstein, et al, "The Principle of Relativity", Dover, 1952. Eisberg and Resnick, "Quantum Physics", John Wiley & Sons, 1985.

Euclid, "The Elements" (translated by Thomas Heath), Dover, 1956. Feynman, Richard, “Feynman Lectures on Gravitation”, Addison-Wesley Publishing, 1995. Feynman, Richard, "QED, The Strange Theory of Light and Matter", Princeton Univ Press, 1985. Feynman, Richard, "The Character of Physical Law", M.I.T. Press, 1965. Fowles, Grant, "Introduction to Modern Optics", Dover, 1975. Friedman, Michael, "Foundations of Spacetime Theories", Princeton Univ. Press, 1983. Frauenfelder, Hans, and Ernest, M. Henley, "Subatomic Physics", Prentice-Hall, Inc., 1974. Galilei, Galileo, "Sidereus Nuncius", Univ. of Chicago Press, 1989. Galilei, Galileo, "Dialogue Concerning the Two Chief World Systems", Univ. of Cal. Press, 2nd ed., 1967. Gemignani, Michael, "Elementary Topology", 2nd ed., Dover, 1972. Gibbins, Peter, "Particles and Paradoxes", Cambridge Univ. Press, 1987. Goldsmith, Donald, "The Evolving Universe", Benjamin/Cummings Publishing, 1985. Goodman, Lawrence E., and Warner, William H., "Dynamics", Wadsworth Publishing Co. Inc., 1965. Greenwood, Donald T., "Principles of Dynamics", Prentice-Hall, 1965. Guggenheimer, Heinrich, "Differential Geometry", Dover, 1977. Halliday and Resnick, "Physics", John Wiley & Sons, 1978. Hawking S.W. and Ellis G.F.R., "The Large Scale Structure of Spacetime", Cambridge Univ. Press, 1973. Hay, G.E., "Vector and Tensor Analysis", Dover, 1953. Heath, Thomas, "A History of Greek Mathematics", Dover, 1981. Hecht, Eugene, "Optics", 3rd ed.,Addison-Wesley, 1998. Heisenberg, Werner, "The Physical Principles of the Quantum Theory", Dover, 1949. Hilbert, David, "Foundations of Geometry", Open Court, 1992. Huggett and Tod, "An Introduction to Twistor Theory", Cambridge Univ Press, 1985. Hughes, R. I. G., "The Structure and Interpretation of Quantum Mechanics", Harvard Univ. Press, 1989. Joshi, A. W., "Matrices and Tensors In Physics", Halstead Press, 1975. Jones and Singerman, "Complex Functions", Cambridge Univ. Press, 1987. Judson, Lindsay (ed.), "Aristotle's Physics", Oxford Univ. Press, 1991. Kennnefick, D. , "Controversies in History of Reaction problem in GR", preprint gr-qc 9704002, Apr 1997. Kline, Morris, "Mathematical Throught from Ancient to Modern Times", Oxford Univ. Press, 1972. Kramer, Edna, "The Nature and Growth of Modern Mathematics", Princeton Univ. Press, 1982. Kuhn, Thomas S., "The Copernican Revolution", Harvard University Press, 1957. Liepmann, H. W., and Roshko, A., "Elements of Gas Dynamics", John Wiley & Sons, 1957. Lindley, David, “Degrees Kelvin”, Joseph Henry Press, 2004. Lindsay and Margenau, "Foundations of Physics", Ox Bow Press, 1981. Lloyd, G. E. R., “Greek Science After Aristotle”, W. W. Norton & Co., 1973.

Lorentz, H. A., “The Theory of Electrons”, 2nd edition (1915), Dover, 1952. Lovelock and Rund, "Tensors, Differential Forms, and Variational Principles", Dover, 1989. Lucas and Hodgson, "Spacetime & Electromagnetism", Oxford Univ Press, 1990. McConnell, A.J., "Applications of Tensor Analysis", Dover, 1957. Menzel, "Fundamental Formulas of Physics", Dover, 1960. Miller, Arthur, "Albert Einstein's Special Theory of Relativity", Springer-Verlag, 1998. Misner, Thorne, and Wheeler, "Gravitation", W.H. Freeman & Co, 1973. Mahoney, Michael, "The Mathematical Career of Pierre de Fermat", 2nd ed, Princeton Univ Press, 1994. Maxwell, James Clerk, "A Treatise on Electricity and Magnetism", Dover 1954. Nagel, Ernest and Newman, James R., "Godel's Proof", New York Univ. Press, 1958. Neumann, John von, "Mathematical Foundations of Quantum Mechanics", Princeton Univ. Press, 1955. Newton, Isaac, "Principia" (trans by Motte and Cajori), Univ of Calif Press, 1962. Newton, Isaac, "Principia" (trans by Cohen and Whitman), Univ of Calif Press, 1999. Newton, Isaac, "Opticks", Dover, 1979. Ohanian and Ruffini, "Gravitation and Spacetime", 2nd ed., W.W Norton & Co., 1994. Olson, Reuben, "Essentials of Engineering Fluid Mechanics", 3rd ed., Intext Press, 1973. Pais, Abraham, "Subtle is the Lord", Oxford Univ Press, 1982. Pannekoek, A. "A History of Astronomy", Dover, 1989. Peat, F. David, "Superstrings and the Search for the Theory of Everything", Contemporary Books, 1988. Pedoe, Dan, "Geometry, A Comprehensive Course", Dover, 1988. Penrose, Roger, "The Emperor's New Mind", Oxford Univ Press, 1989. Poincare, Henri, "Science and Hypothesis", Dover, 1952. Prakash, Nirmala, "Differential Geometry, An Integrated Approach", Tata McGraw-Hill, 1981. Price, Huw, "Time's Arrow and Archimedes Point", Oxford Univ Press, 1996. Ridley, B.K., "Space, Time, and Things", Penguin Books, 1976. Rindler, Wolfgang, "Essential Relativity", Springer-Verlag, 1977. Ray, Christopher, "Time, Space, and Philosophy", Routledge, 1992. Reichenbach, Hans, "The Philosophy of Space and Time", Dover, 1958. Reichenbach, Hans, "From Copernicus to Einstein", Dover, 1980. Ronchi, "Optics, The Science of Vision", Dover, 1991. Roseveare, N. T., "Mercury's Perihelion from Le Verrier to Einstein", Oxford Univ. Press, 1982. Savitt, Steven F., "Time's Arrow Today", Cambridge Univ Press, 1995. Schey, H. M., "Div, Grad, Curl, and All That", W.W.Norton & Co, 1973. Schwartz, Melvin, "Principles of Electrodynamics", Dover, 1987. Schwarzschild, Karl, "On the Gravitational Field of a Mass Point According to Einstein's Theory", Procedings of the Prussian Academy, 13 Jan 1916. Shilov, Georgi, "Linear Algebra", Dover, 1977. Smith, David Eugene, "A Source Book In Mathematics", Dover, 1959. Spivak, Michael, "Differential Geometry", Publish or Perish, 1979. Squires, Euan, "The Mystery of the Quantum World", 2nd ed., Institute of Physics, 1994.

Stachel, John (ed.), "Einstein's Miraculous Year", Princeton Univ. Press, 1998. Steen, Lynn Arthur, "Mathematics Today", Vintage Books, 1980. Stillwell, John, "Mathematics and Its History", Springer-Verlag, 1989. Synge and Schild, "Tensor Calculus", Dover, 1949. Taylor and Mann, "Advanced Calculus", Wiley, 3rd ed, 1983. Thorne, Kip, "Black Holes and Time Warps", W.W. Norton & Co, 1994. Torretti, Roberto, "Relativity and Geometry", Dover, 1996. Visser, Matt, "Lorentzian Wormholes", AIP Press, 1996. Wald, Robert, "General Relativity", Univ of Chicago Press, 1984. Weinberg, Steven, "Gravitation and Cosmology", John Wiley & Sons, 1972. Weinstock, Robert, "Calculus of Variations", Dover, 1974. Westfall, Richard S., "Never At Rest, A Biography of Isaac Newton", Cambridge Univ. Press, 1980. Weyl, "Space, Time, Matter", Dover, 1952. Whittaker, E. T., “A History of the Theories of Aether and Electricity”, 2nd ed., Harper & Brothers, 1951. Wick, David, "The Infamous Boundary", Birkhauser, 1995. Yourgrau and Mandelstam, "Variational Principles in Dynamics and Quantum Theory", Dover, 1979. Zahar, Elie, "Einstein's Revolution, A Study in Heuristic", Open Court, 1989.
