Special Relativity for Human Beings

The Basic Axiom
Coordinates and Coordinate Transformations
Simultaneity and Lorentz Transformations
The "boost": $\gamma (\beta )$
Space-Time and Invariants
Brief aside on the speed of light ($c$)
Time Dilation
Lorentz Contraction
Lorentz Transformations
Addition of Lorentz Transformations
Relativistic Doppler Shift
Relativistic Velocity
Energy and momentum
$E=mc^2$
$c=1$
Space-time Diagrams
Barn and Ladder paradox
Proper Time (and the "Twin Paradox")
Prelude to General Relativity

The Basic Axiom of Relativity

Relativity theory starts with considerations about how electromagnetic waves propagate. For physicists, this means using Maxwell's equations, which govern how electric and magnetic fields are related to their sources (charges and currents). In a vacuum with no sources, the equations describe traveling electromagnetic (EM) waves: $$\frac{\partial^2 E}{\partial z^2} - \mu_0\epsilon_0\frac{\partial^2 E}{\partial t^2} = 0\label{eqn_e}$$ $$\frac{\partial^2 B}{\partial z^2} - \mu_0\epsilon_0\frac{\partial^2 B}{\partial t^2} = 0\label{eqn_b}$$ where $\mu_0$ and $\epsilon_0$ are the scale factors for the magnetic and electric fields in MKS units: $$\mu_0 = 4\pi\times 10^{-7}N/A^2\nonumber$$ $$\frac{1}{4\pi\epsilon_0} = 9\times 10^9 Nm^2/C^2\nonumber$$ $N$ is for newtons, $A$ is for amperes, and $C$ is for coulombs. These 2 equations describe electric ($E$) and magnetic ($B$) waves traveling along the $z$-axis with a velocity $v$ set by the coefficient in front of the 2nd term (the time derivatives): $$\frac{1}{v^2} = \mu_0\epsilon_0 = 4\pi\times 10^{-7}N/A^2\frac{1}{4\pi\times 9\times 10^{9}}C^2/Nm^2 = \frac{1}{9\times 10^{16}}s^2/m^2\label{eqn_v}$$ which gives us $v=3\times 10^8 m/s$, recognizable as the speed of light.

That Maxwell's equations give us an EM wave (aka light, in the visible part of the frequency spectrum) with a fixed velocity did not really bother anyone, since what everyone knew at the time was that traveling waves are the manifestation of some disturbance in some medium, and the "stiffer" the medium, the faster the disturbance travels. So traveling EM waves must be traveling through some medium, called the "ether", which represented the "fabric" of the universe. That meant the ether had to be the special place where there was no motion: a "rest frame" where, if you were in that frame, you were at the one true absolute zero velocity in the universe, with everything else moving relative to it. All of these ideas came out in the late 19th century.
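As a quick check of the arithmetic in equation (\ref{eqn_v}), here is a minimal Python sketch that plugs in the rounded MKS values quoted above. The constant and function names are only illustrative, and the values are the text's rounded ones rather than precise reference values.

```python
import math

# Rounded MKS values quoted in the text
MU_0 = 4 * math.pi * 1e-7                      # N/A^2
ONE_OVER_4PI_EPS0 = 9e9                        # N m^2/C^2
EPS_0 = 1 / (4 * math.pi * ONE_OVER_4PI_EPS0)  # C^2/(N m^2)


def wave_speed(mu0: float, eps0: float) -> float:
    """Speed from the wave-equation coefficient: 1/v^2 = mu0 * eps0."""
    return 1 / math.sqrt(mu0 * eps0)


v = wave_speed(MU_0, EPS_0)
print(f"v = {v:.3e} m/s")  # v = 3.000e+08 m/s
```

The factors of $4\pi$ cancel, exactly as they do in equation (\ref{eqn_v}), leaving $1/v^2 = 1/(9\times 10^{16})\,s^2/m^2$.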

Ever since Galileo, people have accepted a "principle of relativity", which says that the laws of physics are the same in all reference frames moving with a constant velocity. That principle was just common sense: if you were in a train traveling at constant velocity and you bounced a ball or played catch in the train, it would be just like standing on the earth, so the constant-velocity frame must have the same laws of physics. Together with the common-sense assumption that time runs the same for everyone, that principle tells you that if you are in a train moving with some velocity $u$, and on the train you throw a ball in the direction of motion with velocity $v$ (relative to you standing still on the train), then the velocity of the ball as measured by a person standing on the tracks would be $u+v$. For instance, if you can throw a ball at $60mph$, and you do it while standing still on a train moving at $50mph$, releasing it as you pass someone standing still on a platform, then that person on the platform will see the ball moving away at $110mph$.

Putting these two ideas together (the ether, and rest frames with relative velocities), efforts started in the late 19th century to actually measure the effect of an ether by looking for the "ether wind" effect. This wind should make the velocity of light different in the direction of motion than it would be in a perpendicular direction, and this could be measured quite accurately with an interferometer (the Michelson-Morley experiment). By the early 20th century, the curious thing was that even though most people believed in the existence of an ether, and hence an absolute velocity frame of reference, there was no experimental evidence for it.

Meanwhile, a youthful Albert Einstein puzzled over what it would be like to run alongside a light wave at the speed of light: would you see an oscillating E and B field in the plane transverse to your motion? The lack of any experimental evidence to detect the effects of an ether, along with puzzles concerning the relative motion of magnets near wire loops led Einstein to form the theory of relativity in a 1905 paper based on 2 postulates:

There is no ether (Einstein says that the ether is "superfluous" - you don't need it for anything), and no such thing as absolute motion
The speed of light is the same in every rest frame (every frame moving at a constant velocity)

Why these two postulates? How are they related? The answer comes from thinking about the following situation, a so-called "thought experiment", or "gedanken experiment", a technique that Einstein used to great effect throughout his amazing career. Imagine that you are in a spaceship far away from anything that could generate a gravitational force (like a planet or a star), and you are moving at a constant velocity (no acceleration), but you don't know how fast you are going. Imagine also that the ship has no windows and no sensors, so you see nothing outside and you can't interact with the outside world at all. Now imagine that there's an ether, and you want to do an experiment to measure your absolute velocity relative to the cosmic stillness. You decide to do this by measuring the "ether wind", using the usual technique of shining beams of light in perpendicular directions inside your spaceship and comparing the light velocities. If there were an ether, you should be able to detect it this way. If there is no ether, then the velocities you measure should be the same. What would actually happen is that, since there is no ether and no absolute velocity, the value you get for the speed of light in your spaceship would be the same $3\times 10^8 m/s$ ($186,000$ miles per second) in all directions. So if there is no ether, the speed of light is the same in all reference frames. That means that if you are moving past an observer who is standing still (in his/her frame) and you shine a light beam along the direction of motion, you would both measure the same velocity, defying intuition. Adding the velocity of the train to the velocity of the ball in the train to get the velocity of the ball relative to an observer on the track is something we can measure, and at everyday speeds it is always verified.
Yet what Einstein was saying was that for light, it is not like that at all. It is as if, no matter how fast the train is moving, the observer on the ground would measure the ball's speed to be the same as you measure it on the train. That runs counter to intuition, which is an incredible thing, because intuition is built up from experience and deduction. How Einstein figured this out is another (very good) story (Walter Isaacson's book is my favorite on this), but all the problems that led to relativity were "in the air" by the time he took them up seriously, and solving them took someone willing to think hard and abandon orthodoxy.

This pretty much sums up the situation in the early 1900s when the young Einstein started thinking about how to reconcile things. And by doing so, and coming up with these two postulates, he brought about the revolution in science that was special relativity.

Coordinates and Coordinate Transformations

To understand special relativity, we must first get up to speed on how physicists deal with things like the Galilean velocity formula in formal, mathematical ways. This starts with the idea of a "coordinate system", a structure used to describe the position of any object. We need one in order to have a well-defined, unambiguous way to specify the location of everything inside the isolated spaceship, or anywhere else. To make it easy, let's imagine we are in the spaceship, which is built so that all the walls meet at right angles, as in the following figure:

The labels "x", "y", and "z" name the three axes. Imagine that you want to tell some small robot to go to some point inside the ship. How would you specify this? The way this is done is to get the robot to agree on the "origin" (marked as "O" in the figure) and the 3 directions. For instance, you might make the origin a corner of the box-shaped ship, take the 3 directions along the 3 edges that meet at that corner (along the dotted lines in the figure), and have the floor be in the $xy$ plane. Then all you have to do is specify how far to walk along "x", how far along "y", and then how far above the floor, along "z". By specifying those 3 numbers, the robot can find its way to any location.

For example, if you just want to get the robot to the end of the "y" axis, then you tell it "0 along x", "how much along y", and "0 along z". So you have to give it 3 numbers: (0,y,0) to get to that specific point, or in general (x,y,z) to get to any arbitrary point inside the volume. This is what a coordinate system is, and this is how you would use it.

And so the ship is the framework for the coordinate system, and you use the coordinate system to reference any point inside the framework. Physicists usually call this a "reference frame". In our particular case the reference frame is moving at a constant velocity somewhere far away from any effects of gravity. This reference frame, like all reference frames, concerns itself with the 3 dimensions of space. This is important: space has 3 dimensions.

Back to your spaceship. You might decide that no matter what anyone else says, you think there's an ether and you want to measure your absolute velocity by measuring the speed of light: if you measured anything other than $3\times 10^8m/s$, that would tell you what your absolute velocity is! What you decide to do is measure the speed of light along your direction of motion, then briefly turn on your rocket engines, change your velocity, and measure it again. You could even turn the rocket around and do the experiment again. If there were an ether and an absolute reference frame, the measured speed of light would change from one run to the next, coming out as exactly $3\times 10^8m/s$ in every direction only once you had brought the rocket to rest in that frame, and that special frame would be "absolute zero" as far as velocity is concerned. Too bad, because you would find that no matter what velocity you gave the ship, the measurement of the speed of light inside the ship would always be $3\times 10^8m/s$.

Ok, now we can get a little more complicated. Imagine you have a reference frame called $O$ with $x$ and $y$ axes, as in the figure below, and another frame called $O'$ (in blue) that is moving parallel to the $x$-axis with some velocity $v$. We will invent a point in $O'$, labeled ($x',y'$) in blue, where $x'$ and $y'$ are the coordinates of the point in frame $O'$, measured along the $x'$ and $y'$ axes from the origin of $O'$. If we know the coordinates $x',y'$ in $O'$, how do we find the coordinates $x,y$ of the same point as measured in the $O$ frame? This is illustrated below: the yellow box plays the role of the train, and it is the rest frame $O'$, moving with velocity $v$.

It should be easy to come up with such a formula. The y-coordinates are the same in both frames and do not change (they are drawn in the demo above with a small offset only so that you can see the different frames clearly; just pretend the frames share exactly the same horizontal axis), so $y=y'$. If the origins of the two frames coincide at $t=0$, the x-coordinates differ only by the distance $D$ that the frame $O'$ has traveled in the time $t$, and since the velocity is uniform (constant), $D = vt$. So the equations for "transforming" from $O'$ to $O$ are $$x=x'+vt\label{eq1}$$ $$y=y'\label{eq2}$$ These equations are called the "Galilean transformation". Now here's the addition-of-velocity problem: if the point at ($x',y'$) were not stationary in $O'$, but instead moved with some velocity $u'$ along the $x'$ direction as measured in $O'$, what would someone in $O$ measure for that velocity ($u$)?

Using calculus, if you take the derivative with respect to time of equation (\ref{eq1}), you get $u = u' + v$. This illustrates why, if Einstein's 2 postulates of relativity are right, the Galilean transformation cannot be correct: if both $u'$ and $v$ were, say, $0.6c$, then $u=u'+v=1.2c$, larger than the speed of light. Worse, for a light beam moving in $O'$ ($u'=c$) it would give $u=c+v$, contradicting the second postulate. So this transformation can't be right. What to do? Where did we go wrong? A hint: we left out a 3rd equation above which was implicit, that time is measured the same in both frames: $$t=t'\label{eq3}$$ But is time the same in both frames? Here is a simple example, courtesy of Albert Einstein, that should make you question that. It was his recognition that something had to give that led him to consider whether time was, as everyone believed, absolute. And that changed everything.
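The Galilean rule is simple enough to write out directly. The Python sketch below (the helper names are hypothetical) implements equations (\ref{eq1})-(\ref{eq3}), assuming the origins of $O$ and $O'$ coincide at $t=t'=0$, and runs both the everyday train example and the $0.6c+0.6c$ case.

```python
C = 3e8  # speed of light in m/s, as used in the text


def galilean_to_O(x_p: float, y_p: float, t_p: float, v: float) -> tuple[float, float, float]:
    """Equations (eq1)-(eq3): map (x', y', t') in O' to (x, y, t) in O.

    Assumes the origins of O and O' coincide at t = t' = 0.
    """
    return x_p + v * t_p, y_p, t_p


def galilean_add(u_p: float, v: float) -> float:
    """Time derivative of x = x' + v t, using t = t': u = u' + v."""
    return u_p + v


print(galilean_add(60, 50))                 # 110 (mph): the ball on the train
print(galilean_add(0.6 * C, 0.6 * C) / C)   # about 1.2: faster than light
print(galilean_add(C, 0.6 * C) / C)         # about 1.6: light would not move at c
```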

Simultaneity and Lorentz Transformations

Imagine two physicists, both standing still, one of them (RED) prepared to run to the right. We place 2 light bulbs equidistant from the physicist who stays put (BLUE), one on the left and one on the right, and start the runner moving. The standing physicist holds a button connected to the two bulbs by two wires of equal length, so when the button is pushed, current reaches both bulbs at the same time and each bulb emits a very brief flash of light, which then travels outward from the bulb. We have the standing physicist push the button just as the runner passes him, and then we mark when the flash from each bulb reaches the runner and the standing physicist.

What you should notice is that for the non-runner, the waves from either side reach him at the same time: simultaneously. But will the runner agree? No, he will not. He will say that he saw the flash from the bulb on the right (the one he's running towards) first, and the one from the left second. In other words, he will say that the two flashes did not happen simultaneously. Now, we have to be very careful here or we can easily get confused. First we need to define the frames: let $O$ be the frame of the physicist standing still, which is also the rest frame of the 2 light bulbs. When the standing physicist sees both flashes, he and the bulbs are in the same frame, so he measures zero time difference between seeing the two flashes. That is what is meant by "simultaneous". The running physicist is in frame $O'$, moving with some velocity (call it $v$; we do not need to know its value) relative to $O$. The physicist in frame $O'$ is measuring the time between two events that take place in $O$, but he is doing the measurement in his own frame, $O'$. When he does this, he does not get a zero time difference: he will say that the two events did not happen simultaneously in his frame. Which physicist is correct? Both are. Simultaneity is a perfectly good idea, but it is relative: it depends on which frame you are referring to. So simultaneity exists, but absolute simultaneity does not. And this is telling us that equation (\ref{eq3}), $t=t'$, is exactly the assumption we should be questioning!
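The timing in the simulation can be worked out in frame $O$. The Python sketch below assumes a hypothetical setup not spelled out in the text: the bulbs sit at $x=\pm d$, both flash at $t=0$ in $O$, and the runner passes $x=0$ at $t=0$ with speed $v$. It returns the times, measured in $O$, at which each flash reaches the runner. Because both arrivals happen where the runner is, the earlier arrival of the right-hand flash is something he sees directly.

```python
C = 3e8  # m/s


def arrival_times(d: float, v: float, c: float = C) -> tuple[float, float]:
    """Times in frame O when the right (+d) and left (-d) flashes reach a
    runner who passes x = 0 at t = 0 moving right with speed v.

    Hypothetical setup: both bulbs flash at t = 0 in frame O.
    """
    t_right = d / (c + v)  # right flash moves left:  d - c t = v t
    t_left = d / (c - v)   # left flash moves right: -d + c t = v t
    return t_right, t_left


# Standing physicist (v = 0): both flashes arrive together
print(arrival_times(3e8, 0.0))       # (1.0, 1.0)
# Runner at half the speed of light: the right flash arrives first
print(arrival_times(3e8, 0.5 * C))   # right after about 0.667 s, left after 2.0 s
# Realistic runner (10 m/s, bulbs 10 m away): the gap is roughly 2e-15 s
t_r, t_l = arrival_times(10.0, 10.0)
print(t_l - t_r)
```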

Of course, this simulation is not very realistic, because in the world we are familiar with, you can't run very fast compared to the speed of light. When the runner passes the standing physicist and the bulbs flash, the two of them will, for all practical purposes, agree on simultaneity. This tells us that simultaneity holds very well for reference frames moving with velocities (relative to each other) that are small compared to the speed of light, which is the case here on earth. Light travels $186,000$ miles in a second, and the moon is about $234,000$ miles from the surface of the earth, so it takes only about $2.5$ seconds for light to go from the earth's surface to the moon and back. Compare that to terrestrial speeds: even the International Space Station, which moves at $17,100 mph = 4.75 mi/s$, is going only $0.0026\%$ of the speed of light.
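These numbers are easy to check by hand, but here they are as a short Python calculation using the rounded figures quoted in the text.

```python
LIGHT_MI_PER_S = 186_000   # speed of light, miles per second
MOON_MI = 234_000          # earth's surface to the moon, miles (as quoted)
ISS_MPH = 17_100           # International Space Station speed (as quoted)

round_trip_s = 2 * MOON_MI / LIGHT_MI_PER_S
iss_mi_per_s = ISS_MPH / 3600              # 3600 seconds per hour
beta_iss = iss_mi_per_s / LIGHT_MI_PER_S   # ratio v/c for the ISS

print(f"earth-moon round trip: {round_trip_s:.2f} s")  # 2.52 s
print(f"ISS speed: {iss_mi_per_s:.2f} mi/s")           # 4.75 mi/s
print(f"ISS v/c: {100 * beta_iss:.4f} %")              # 0.0026 %
```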

So, to come up with a new set of transformation equations, we clearly need to rethink equations $\ref{eq1}-\ref{eq3}$. What should we use as a guiding principle? How do we start? Given what we learned above in the simultaneity simulation, we know that any new equations we come up with for the transformation between two reference frames will have to reduce to the above equations in the limit $v\lt\!\lt c$. So let's write down a possible solution that might be the easiest and simplest way to go:

First, define $\beta\equiv v/c$, which will turn out to be a very useful quantity. $\beta$ is between 0 and 1, with $\beta\sim 0$ describing our world where things move slowly compared to light, and $\beta\sim 1$ describing the "relativistic" world. For light, $\beta=1$. Also define a new function $\gamma(\beta)$ (a function of $\beta$) such that in the limit $\beta\to 0$, then $\gamma\to 1$. Then as a guess to the form of the correct equations, the easiest thing would be to modify equations $\ref{eq1},\ref{eq2},\ref{eq3}$ something like this:

$$\begin{align} x & =\gamma (x'+\beta ct')\label{eq4}\\ y & =y'\label{eq5}\\ ct & = \gamma (ct'+\beta x')\label{eq6} \end{align}$$

with the constraints that $0\le \beta\equiv v/c\le 1$, and $\gamma\to 1$ as $\beta\to 0$ (so that we reproduce the Galilean equations $\ref{eq1}-\ref{eq3}$).

Notice several things here:

Everywhere there is a $t$ we replace it by $ct$ so that we have the same dimensions as distance, just as a convenience.

The transverse dimension $y$ (transverse to the direction of motion) is unchanged, as it should be.

Our new thing $\gamma$ is such that if we take the limit $\beta\to 0$, equations (\ref{eq4}) and (\ref{eq6}) reduce back to (\ref{eq1}) and (\ref{eq3}). In fact, $\gamma$ has to be a function of $\beta$ (actually it's a function of $\beta^2$) and not just $v$, because when we take a limit of a quantity that has a dimension, we have to ask "limit compared to what?" But since $\beta$ is dimensionless, it's easy to take the limit: just set $\beta$ to $0$, which is equivalent to answering the question "limit compared to what?" with "limit compared to the speed of light $c$".

These equations give $x(x',t')$ and $t(x',t')$. This is pretty amazing when you think about it - it mixes space and time between two frames, and treats time as an independent but equivalent dimensional variable. But then, that's what the whole idea of the relativity of simultaneity is telling us.

Given that we now have a guess for $x(x',t')$ and $t(x',t')$, what about the inverse equations $x'(x,t)$ and $t'(x,t)$? Easy. Remember the principle of relativity, that only relative velocities matter: to go from $O$, where we have $x(x',t')$ and $t(x',t')$, to $O'$, all we have to do is change $v$ to $-v$ in equations (\ref{eq4})-(\ref{eq6}) and swap primed for unprimed and vice versa. This gives the equations: $$\begin{align} x' & = \gamma (x-\beta ct)\label{eq4p} \\ y' & = y\label{eq5p} \\ ct' & = \gamma (ct-\beta x)\label{eq6p} \end{align}$$

An interesting way to look at equations (\ref{eq4})-(\ref{eq6}) is to take the differential, giving:

$$\begin{align} \Delta x & = \gamma (\Delta x'+\beta c\Delta t')\label{eq4pp}\\ \Delta y & = \Delta y'\label{eq5pp}\\ c\Delta t & = \gamma (c\Delta t'+\beta \Delta x')\label{eq6pp} \end{align}$$

This is telling us about intervals, and how measurements are made, and this is important (see below).

With these guesses, we should check to see how velocities transform, and even more importantly whether Einstein's second postulate holds, and whether that constrains anything. That is, if we have something moving with velocity $u'$ in $O'$, what would someone in $O$ measure for $u$? For example, if you are on an airplane moving with velocity $v$, and the airplane is the $O'$ frame, and you throw a ball with velocity $u'$ on the plane, what would a person on the ground ($O$ frame) measure for $u$?

First note that $u=dx\!/\!dt$ and $u'=dx'\!/\!dt'$, and what we are after is how to calculate $u$. Next take equations (\ref{eq4pp})-(\ref{eq6pp}) and replace each $\Delta$ by a differential $d$: $$dx=\gamma (dx'+\beta c\cdot dt')\label{eqd4}$$ $$dy=dy'\label{eqd5}$$ $$c\cdot dt=\gamma (c\cdot dt'+\beta dx')\label{eqd6}$$ Then by the chain rule: $$u = \frac{dx}{dt}=\frac{dx}{dt'}\cdot\frac{dt'}{dt}\nonumber$$ We can then calculate $$\frac{dx}{dt'}=\frac{d}{dt'}[\gamma(x'+\beta c t')]=\gamma(u'+\beta c)\nonumber$$ remembering that $u'=dx'/dt'$ and that $d\gamma/dt'=0$, because $\gamma$ depends only on $\beta$, which is constant. We can also calculate $dt'/dt$ by computing $dt/dt'$ from equation (\ref{eqd6}), which gives $c\cdot dt/dt'=\gamma(c+\beta u')$, and inverting. This gives:

$$u=\frac{dx}{dt}=\frac{\gamma(u'+\beta c)}{\gamma(c+\beta u')/c}=c\frac{u'+\beta c}{c+\beta u'} =c\frac{\beta'+\beta}{1+\beta\beta'}\label{eaddbeta}$$ where we are using the notation $\beta' = u'/c$ (for reasons more to do with what's in a section below).

Now we have to check whether the speed of light $c$ comes out the same in both frames: set $u'=c$ ($\beta'=1$) and equation (\ref{eaddbeta}) gives $u=c(1+\beta)/(1+\beta)=c$ as well. So it looks good: the minimal change to the Galilean transformation, adding the factor $\gamma(\beta)$ and treating time as an equivalent dimension, works.
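Equation (\ref{eaddbeta}) is easy to explore numerically. This Python sketch (the function names are illustrative) evaluates it directly, and also recovers the same velocity by pushing two events on the moving object's path through equations (\ref{eq4}) and (\ref{eq6}). Because $\gamma$ multiplies both $dx$ and $c\,dt$, it cancels in the ratio, so the sketch leaves $\gamma$ as a free parameter, mirroring the derivation, where its value is not yet known.

```python
def add_velocities(beta: float, beta_p: float) -> float:
    """Equation (eaddbeta) in units of c: u/c = (beta' + beta) / (1 + beta * beta')."""
    return (beta_p + beta) / (1 + beta * beta_p)


def velocity_from_transform(beta: float, beta_p: float, gamma: float) -> float:
    """Send two events on the object's path x' = beta' * (c t') through
    equations (eq4) and (eq6), and return (dx/dt)/c measured in O.

    gamma is a free parameter here: it multiplies both dx and c dt, so it
    cancels in the ratio, just as in the derivation.
    """
    # First event: the origin (0, 0), which maps to (0, 0) in O.
    ct_p = 1.0                        # c t' of the second event
    x_p = beta_p * ct_p               # where the object is at that moment, in O'
    x = gamma * (x_p + beta * ct_p)   # equation (eq4)
    ct = gamma * (ct_p + beta * x_p)  # equation (eq6)
    return x / ct


print(add_velocities(0.6, 0.6))                      # about 0.882: still below 1
print(add_velocities(0.6, 1.0))                      # 1.0: light still moves at c
print(velocity_from_transform(0.6, 0.6, gamma=1.7))  # matches the first line up to rounding
```

Compare the first line with the Galilean answer of $1.2c$: the relativistic rule keeps the combined speed below $c$.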

But we still need to figure out $\gamma(\beta)$! And by pulling that string the whole edifice of special relativity falls into place!

$\gamma (\beta )$

If simultaneity is relative, then we should be able to calculate how much two physicists in two different reference frames would disagree about how long things take. So we can do another simulation, also involving two reference frames: frame $O'$ (in blue) moves at some constant velocity $v$ relative to frame $O$ (in red). The red stick figure (RED) is standing still in $O$, and the blue stick figure (BLUE) is moving in frame $O$ with velocity $v$. In the moving frame $O'$ we have a laser and a mirror. Here we exaggerate the simulation and use a colored ball to represent the light beam. In each frame the observer sees the laser fire, and the light beam bounces off the mirror on the ceiling a distance $H$ above the laser, making a round trip. The light path is colored according to what each observer sees.


In frame $O'$, on the right, the light travels up and back, a total distance $\Delta y'=2H$ in a time period $\Delta t'$ ($\Delta y'$ and $\Delta t'$ are as measured in $O'$, the blue frame). Since the speed of light is constant in all reference frames, we would then have $c=\Delta y'\!/\!\Delta t'$, or $2H=c\Delta t'$ using $\Delta y'=2H$.

On the left we have frame $O$, and in that frame RED watches the path of the light as BLUE moves to the right with velocity $v$. RED sees the light go at an angle, whereas BLUE sees it go straight up and back down. For RED, the light travels a total distance $2L$ in a time period $\Delta t$. We can calculate the total distance $2L$ with the Pythagorean theorem, using the vertical distance $H$ (which is the same in both frames) and the horizontal distance $v\Delta t/2$ that BLUE moves by the time the light beam hits the ceiling:

$$L^2=(v\!\frac{\Delta t}{2}\!)^2+H^2\label{e3}$$

Using the same rule that the speed of light is the same in all reference frames, the total distance traveled in $O$ equals the velocity ($c$) times the time it takes ($\Delta t$), or $2L=c\Delta t$.

Here's the punchline: if the speed of light is the same in all reference frames, then if in the RED frame the light went a longer distance (along diagonals), then in his frame the light has to take a longer time getting there. And this is exactly what makes Einstein's special theory of relativity different from the Galilean theory.

To complete the calculation for $\gamma(\beta)$, we can substitute for $L=c\Delta t/2$ and $H=c\Delta t'/2$ into equation $\ref{e3}$ to get

$$(\!c\frac{\Delta t}{2}\!)^2=(\!v\frac{\Delta t}{2}\!)^2 + (\!c\frac{\Delta t'}{2}\!)^2\nonumber$$

Rearranging terms and getting rid of the factor of 1/2 gives:

$$(c^2-v^2)\Delta t^2 = c^2\Delta t'^2\nonumber$$

Now, divide by $c^2$ and use $\beta\equiv v/c$ to get

$$\Delta t'^2=(1-\beta^2)\Delta t^2\nonumber$$ or equivalently: $$\Delta t = \frac{\Delta t'}{\sqrt{1-\beta^2}}\label{etd}$$
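To make the light-clock argument concrete, the Python sketch below picks a hypothetical mirror height $H$ and speed $v$, computes $\Delta t'$ from $2H=c\Delta t'$ and $\Delta t$ from equation (\ref{etd}), and then checks that this $\Delta t$ satisfies the frame-$O$ geometry: equation (\ref{e3}) together with $2L=c\Delta t$.

```python
import math

C = 3e8  # m/s


def light_clock(H: float, v: float, c: float = C) -> tuple[float, float]:
    """Round-trip times for a mirror at height H above the laser.

    Returns (dt_prime, dt): dt_prime is measured in O' (light goes straight
    up and back), dt is the time in O from equation (etd).
    """
    dt_prime = 2 * H / c                    # 2H = c dt'
    beta = v / c
    dt = dt_prime / math.sqrt(1 - beta**2)  # equation (etd)
    return dt_prime, dt


# Hypothetical numbers: a 1 m tall clock moving at 0.6c
H, v = 1.0, 0.6 * C
dt_prime, dt = light_clock(H, v)

# Geometry in frame O: each diagonal leg obeys L^2 = (v dt/2)^2 + H^2 (e3),
# and the total path 2L must equal c dt.
L = math.hypot(v * dt / 2, H)
print(dt / dt_prime)   # about 1.25 for v = 0.6c
print(2 * L - C * dt)  # about 0, up to floating-point rounding
```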

This is a very interesting result, but how is it useful? To see that, let's do a calculation from first principles using equations (\ref{eq4})-(\ref{eq6}), and maybe we can use that to find what $\gamma(\beta)$ is.

Start with equation (\ref{eq6pp}), the differential form of equation $\ref{eq6}$: $$c\Delta t=\gamma (c\Delta t'+\beta \Delta x')\nonumber$$ Why? Because that equation relates the time interval $\Delta t$ in frame $O$ to both the time and space intervals in the moving frame $O'$. Here the intervals separate two "events": the time and place where the laser fires, and the time and place where the light arrives back. This equation is very convenient because in $O'$, $\Delta x'=0$ (the light returns to its original position), which gives us

$$\Delta t=\gamma \Delta t'\label{eq7}$$ Voila! Comparing to equation $\ref{etd}$ we now know the last piece: $$\gamma = \frac{1}{\sqrt{1-\beta^2}}\label{eqgamma}$$ This has the right form: $\gamma\to 1$ as $\beta\to 0$, as seen in the following plot. As you can see, even when the velocity of an object is as high as $v=0.2c$, or $6\times 10^7m/s = 37,200$ miles/sec (not exactly crawling!), $\gamma\approx 1.02$ is still within about $2\%$ of 1, which is basically the nonrelativistic Galilean regime.
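Here is equation (\ref{eqgamma}) as a small Python sketch, together with the transformations (\ref{eq4}), (\ref{eq6}) and their inverses (\ref{eq4p}), (\ref{eq6p}). With $\gamma$ now known, it checks that transforming an event from $O'$ to $O$ and back returns the starting coordinates. The function names are hypothetical.

```python
import math


def gamma(beta: float) -> float:
    """Equation (eqgamma): gamma = 1 / sqrt(1 - beta^2), valid for |beta| < 1."""
    if not -1 < beta < 1:
        raise ValueError("need |beta| < 1")
    return 1 / math.sqrt(1 - beta**2)


def to_O(x_p: float, ct_p: float, beta: float) -> tuple[float, float]:
    """Equations (eq4) and (eq6): (x', ct') in O' -> (x, ct) in O."""
    g = gamma(beta)
    return g * (x_p + beta * ct_p), g * (ct_p + beta * x_p)


def to_O_prime(x: float, ct: float, beta: float) -> tuple[float, float]:
    """Equations (eq4p) and (eq6p): the inverse, obtained by beta -> -beta."""
    return to_O(x, ct, -beta)


for beta in (0.0, 0.2, 0.6, 0.9, 0.99):
    print(f"beta = {beta:4.2f}   gamma = {gamma(beta):.4f}")
# gamma(0.2) is about 1.0206, i.e. within about 2% of 1

x, ct = to_O(2.0, 5.0, 0.6)
print(to_O_prime(x, ct, 0.6))  # approximately (2.0, 5.0)
```

Writing the inverse as `to_O(x, ct, -beta)` is exactly the "change $v$ to $-v$" rule from the text.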

Before moving on, it's important to understand what equation (\ref{eq7}) is telling us. In this situation, we are measuring a pure time interval in one reference frame (here $O'$). It's like looking at a stopwatch and measuring the time between two events in that frame. Equation (\ref{eq7}) tells us what someone in another frame would get if they were to measure the time between the same two events. Now be careful: the two events happen at the same place in $O'$. $O'$ has a velocity relative to $O$, but in $O'$ these events are stationary in space, so we say that $O'$ is the proper frame for these events, and the time difference $\Delta t'$ between them as measured in $O'$ is called the proper time. What equation (\ref{eq7}) says is that, since simultaneity is relative rather than absolute, someone in frame $O$ who measures the time between the events that take place in $O'$ will get a different answer, because 1) $O'$ is moving with respect to $O$, and 2) the transformations mix up space and time. This is the famous time dilation: the proper time between two events is always the smallest, so the time measured between the events in any frame other than the proper frame will be greater than the proper time.

Space-Time and Invariants

The above introduces us to the concept that space and time are all mixed up. The mathematician Hermann Minkowski wrote a beautiful paper in 1907 (he died of appendicitis in 1909, quite a loss), where he took Einstein's relativity theory ideas and mathematically unified the idea of space and time into a 4-dimensional space-time. That is, Minkowski showed how in space-time, "events" are things that happen at a particular 4 dimensional space-time point: $x,y,z,t$. Indeed, Minkowski wrote the following beautiful sentence that sums it up:

Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.

This broadening of our thinking from 3 dimensions plus time, to 4 dimensions, will be important to understanding General Relativity and gravity.

Let's consider how we go from 3-dimensional vectors to 4-dimensional objects. In regular space, we are familiar with the concept of vectors. These are objects that have a starting point, a direction, and a length. They point from one spatial location to another. What's important about vectors is that they can be represented in many different ways using many different coordinate systems (for instance, an infinite number of Cartesian coordinate systems that differ only by the angle between their x-axes), but they still have as their "invariant" that they point along some direction and have a definite length. So it doesn't matter how you represent the vector: the representation won't change its length, or the fact that it points from one point to another. In the figure below you can see for yourself: hit the blue up or down arrow and you will see the axes rotated (relative to the grey axes). The vector that points from the unchanging origin (0,0) to the point does not change, however, because that point is a real point in space. But since the axes change, the coordinates of the point change, because coordinates are always relative to the axes.


We can define the quantity $\Delta r$ as the distance between the origin and the space point: $$\Delta r^2=\Delta x^2+\Delta y^2\label{eq8}$$ What's important to note is that even though the values of the space point coordinates change as the axes rotate, the distance does not. We say that that distance is "invariant".
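The rotating-axes figure can be mimicked in a few lines of Python. This sketch (the helper name is hypothetical) expresses a fixed point in axes rotated counterclockwise by a given angle and shows that the coordinates change while $\Delta r^2$ from equation (\ref{eq8}) does not.

```python
import math


def coords_in_rotated_axes(x: float, y: float, angle_deg: float) -> tuple[float, float]:
    """Coordinates of the fixed point (x, y) measured along axes that have
    been rotated counterclockwise by angle_deg about the origin."""
    a = math.radians(angle_deg)
    return x * math.cos(a) + y * math.sin(a), -x * math.sin(a) + y * math.cos(a)


x, y = 3.0, 4.0
for angle in (0, 30, 45, 90):
    xr, yr = coords_in_rotated_axes(x, y, angle)
    # The coordinates change with the axes, but dr^2 = x^2 + y^2 stays 25
    print(f"angle {angle:3d}: ({xr:6.3f}, {yr:6.3f})  dr^2 = {xr**2 + yr**2:.6f}")
```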

Now, how do we extend this idea into space-time? We need to come up with an invariant! This is not hard to do, since we know a few properties of the new invariant:

It has to depend not only on $\Delta x$ and $\Delta y$, but also on $\Delta t$
For simultaneous measurements of spatial coordinates in a proper frame ($\Delta t = 0$), it has to reduce to the usual 3-d invariant of equation (\ref{eq8})
It can't depend on the relative velocity $v$, since it has to be an invariant!

There are several ways to figure out the true 4-dimensional invariant. One way that would give not only a space-time invariant but also a way to derive equation $\ref{eqgamma}$ starts with the Galilean transformation equations $\ref{eq1},\ref{eq2},\ref{eq3}$. Along the same lines as above, knowing that the Galilean transformation violates Einstein's postulates, and that since time is not absolute then time and space are mixed into a space-time, you introduce an unknown factor $\gamma$, and this gets you to equations $\ref{eq4},\ref{eq5},\ref{eq6}$. Then, following the lead in 3 spatial dimensions, you can postulate that the space-time invariance is given by $$\Delta s^2 = \Delta x^2 + a^2(c\Delta t)^2\nonumber$$ where $a^2$ is unknown, and we are using $s$ for the 4-dimensional invariant, and $r$ for the 3-dimensional length invariant.

Given that $\Delta s^2$ is invariant, if you use the transformation equations $\ref{eq4pp}-\ref{eq6pp}$ and substitute for $\Delta x$ and $c\Delta t$ you get $$\begin{align} \Delta s^2 &= \Delta x^2 + a^2(c\Delta t)^2\nonumber\\ &=[\gamma(\Delta x' + c\beta\Delta t')]^2 + a^2[\gamma(c\Delta t' + \beta\Delta x')]^2\nonumber\\ &=\gamma^2(\Delta x'^2 + 2c\beta\Delta x'\Delta t' + c^2\beta^2\Delta t'^2) + a^2\gamma^2(c^2\Delta t'^2 + 2c\beta\Delta x'\Delta t' + \beta^2\Delta x'^2)\nonumber\\ &=\gamma^2\Delta x'^2(1+a^2\beta^2) + \gamma^2c^2\Delta t'^2(\beta^2 + a^2) + \gamma^2(1+a^2)(2c\beta\Delta x'\Delta t')\nonumber\\ \end{align}$$ If $\Delta s^2 =\Delta x^2 + a^2c^2\Delta t^2$ is invariant, it must equal $\Delta x'^2 + a^2c^2\Delta t'^2$, which has no cross term, so the last term (proportional to $\Delta x'\Delta t'$) has to vanish. That gives you $a^2=-1$, which says that your space-time invariant is $\Delta s^2=\Delta x^2 - c^2\Delta t^2$. Substituting that gives: $$\Delta s^2 = \Delta x^2 - c^2\Delta t^2 = \gamma^2\Delta x'^2(1-\beta^2) + \gamma^2c^2\Delta t'^2(\beta^2 - 1)\nonumber$$ Now if you set $\gamma^2(1-\beta^2)=1$, you recover equation $\ref{eqgamma}$, and your guess as to the invariant holds: $$\Delta s^2 = \Delta x^2 - (c\Delta t)^2 = \Delta x'^2 - (c\Delta t')^2\nonumber$$ Note that this invariant can also be written as $$\Delta s^2 = (c\Delta t)^2 - \Delta x^2 = (c\Delta t')^2 - \Delta x'^2\nonumber$$ which differs from the version just before it only by an overall minus sign. But that's ok, because if $\Delta s^2$ is invariant, then so is $-\Delta s^2$, since of course $-1$ is the same in all reference frames! You will see both of these definitions of the space-time invariant, and of course they are equivalent. Many prefer the latter definition, only because it's convenient to have $\Delta s^2$ be a positive number, and since the speed of light $c$ is so large, the term $c\Delta t$ is for most purposes greater than the term $\Delta x$.
Whichever version you use is not all that important, as long as you are consistent.
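As a quick numerical sanity check of this derivation, the Python sketch below transforms one pair of intervals from $O'$ to $O$ and evaluates the trial invariant $\Delta x^2 + a^2(c\Delta t)^2$ for $a^2=+1$ and $a^2=-1$. The function names and the sample values $\Delta x'=3$, $c\Delta t'=5$, $\beta=0.6$ are illustrative choices, not taken from the text. Only $a^2=-1$ gives the same number in both frames.

```python
import math

def gamma(beta):
    """Lorentz factor 1/sqrt(1 - beta^2), valid for |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta**2)

def to_unprimed(dx_p, cdt_p, beta):
    """Intervals in O from intervals in O':
    dx = gamma (dx' + beta c dt'),  c dt = gamma (c dt' + beta dx')."""
    g = gamma(beta)
    return g * (dx_p + beta * cdt_p), g * (cdt_p + beta * dx_p)

def trial_interval(dx, cdt, a2):
    """The guessed invariant dx^2 + a^2 (c dt)^2."""
    return dx**2 + a2 * cdt**2

dx_p, cdt_p, beta = 3.0, 5.0, 0.6          # arbitrary illustrative values
dx, cdt = to_unprimed(dx_p, cdt_p, beta)
for a2 in (+1.0, -1.0):
    print(f"a^2 = {a2:+.0f}: O' gives {trial_interval(dx_p, cdt_p, a2):.4f}, "
          f"O gives {trial_interval(dx, cdt, a2):.4f}")
```

With these numbers $\Delta x = 7.5$ and $c\Delta t = 8.5$, so the $a^2=+1$ guess changes from 34 to 128.5 while the $a^2=-1$ guess stays at $-16$.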

To be completely explicit, the 4-dimensional space-time invariant is written as: $$\Delta s^2=c^2\Delta t^2-\Delta x^2-\Delta y^2-\Delta z^2\label{eq9}$$ (we added $\Delta y^2$ and $\Delta z^2$ since in fact there are 3 spatial dimensions!)

Since we are now forced to think in 4 dimensions (space-time), we need a new notation for a 4-dimensional vector, analogous to the 3-dimensional $\vec r = x\hat i + y\hat j + z\hat k$, or in even more compact form, $\vec r = \sum_{i=0}^2 r_i \hat e_i$ where $r_0 = x, r_1=y, r_2=z$ and the unit vectors are $\hat e_0=\hat i$, $\hat e_1=\hat j$, $\hat e_2=\hat k$. An even more compact form is to say that $r=(r_0,r_1,r_2)$, or even better, $r_i$, where it's understood that $i$ goes from $0$ to $2$. We can extend this easily to 4 dimensions, and define: $$x^\mu = (x^0,x^1,x^2,x^3)\label{4vec}$$ where $x^0 = ct, x^1=x, x^2=y, x^3=z$. One can think of the 4-vector $x^\mu$ as having a time component $x^0=ct$ and a spatial component consisting of the 3 directions $x$, $y$, $z$ that constitute the components of the space vector $\vec r = x\hat i + y\hat j + z\hat k$. This formulation of the 4-vector will become very useful below.

Space-like and Time-like

Now for some terminology. If $\Delta s^2 \gt 0$, then the two space-time events $1$ and $2$ are said to be "time-like separated". Here's why: event $1$ has coordinates $x_1^\mu=(ct_1,x_1,y_1,z_1)$ in frame $O$, and similarly for event $2$. Orient the axes so that the spatial separation between the events lies along $x$, and consider a frame $O'$ moving with respect to $O$ along that direction. Since $\Delta s^2$ is invariant, $\Delta s^2 = c^2\Delta t^2 - \Delta x^2 = c^2\Delta t'^2 - \Delta x'^2 \gt 0$, which means that $|\Delta x/\Delta t|\lt c$ and $|\Delta x'/\Delta t'|\lt c$. So if you were present at event $1$ in frame $O$, you could get to event $2$ by boosting to the velocity $\Delta x/\Delta t$, which is less than $c$; in the frame moving with that velocity, both events happen at the same place, one after the other. And vice versa if you start in $O'$.

But what if $\Delta s^2 \lt 0$? With the separation again along $x$, the condition $\Delta s^2 = c^2\Delta t^2 - \Delta x^2 = c^2\Delta t'^2 - \Delta x'^2 \lt 0$ means that $\Delta x^2\gt c^2\Delta t^2$ and $\Delta x'^2\gt c^2\Delta t'^2$, so $|\Delta x/\Delta t|\gt c$ and $|\Delta x'/\Delta t'|\gt c$. Such events are said to be "space-like separated": if you are in the rest frame of one of the events, it is impossible for you to boost into the rest frame of the other, because to do so you would have to travel faster than light, which violates Einstein's postulates.
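This terminology can be captured in a few lines of Python, using the sign convention of equation $\ref{eq9}$. The function name and the tolerance are illustrative assumptions; the boundary case $\Delta s^2=0$, where the events can only be connected by a light signal, is usually called "light-like".

```python
def classify(cdt, dx, dy=0.0, dz=0.0, tol=1e-12):
    """Classify two events by the sign of
    Delta s^2 = (c dt)^2 - dx^2 - dy^2 - dz^2.
    All four intervals must be in the same length units; tol is an
    arbitrary absolute tolerance for the light-like case."""
    s2 = cdt**2 - dx**2 - dy**2 - dz**2
    if s2 > tol:
        return "time-like"
    if s2 < -tol:
        return "space-like"
    return "light-like"   # s^2 = 0: on the light lines, |beta| = 1

print(classify(cdt=5.0, dx=3.0))   # |dx/dt| < c
print(classify(cdt=3.0, dx=5.0))   # |dx/dt| > c
print(classify(cdt=4.0, dx=4.0))   # |dx/dt| = c
```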

This can be seen clearly if we imagine space-time with just 2 dimensions, time $t$ and the spatial dimension $x$ along the direction of motion, and ignore the other 2 "transverse" directions $y$ and $z$. If we were to plot $ct$ along the vertical and $x$ along the horizontal of a 2-d graph, it would look like this:

The blue diagonal lines show the path in space-time for light: $\Delta x=\pm c\Delta t$, or $\beta=\pm 1$. Time-like regions are shown in light blue; these are the regions where $c^2(\Delta t)^2\gt \Delta x^2$. If this diagram described your particular frame, with you at $x=0$, then all possible points in your past are in the time-like region below $ct=0$ and all possible points in your future are in the time-like region above it. You have no hope of getting to any space-time point in the space-like region (outside of the time-like region) unless you could either move faster than light (which you cannot) or get there by some other method (which is, at the moment, science fiction!).

Proper Time and Invariants

In your rest frame, the proper time is simply the time on your own clock. Picturing the space-time figure above, if you are not moving in that frame, then you follow a vertical trajectory as time changes. Any two space-time events that occur at the same place in that frame (like where you are) and at different times will have the invariant $$\Delta s^2 = c^2\Delta t^2 - \Delta x^2 = c^2\Delta t^2 = c^2\Delta \tau^2\nonumber$$ where, as before, $\tau$ is the proper time and $\tau = t$. So in the proper frame, the invariant is just the proper time $\tau$ (multiplied by $c$, and squared). But since it's invariant, it has the same value in all inertial reference frames: every observer, however they are moving, agrees on the proper time between the two events. This is a very important concept in relativity, both special and general.
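To make this concrete, the Python sketch below computes the proper time between two time-like separated events from their intervals in some frame $O$, using $c^2\Delta\tau^2 = c^2\Delta t^2 - \Delta x^2$ with motion along $x$ only. The function name and the sample speed $v=0.6c$ are illustrative assumptions. For a clock at rest the result is just the coordinate time; for a moving clock it is smaller, equal to $\Delta t/\gamma$.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact by definition)

def proper_time(dt, dx):
    """Proper time between two time-like separated events in frame O,
    from c^2 dtau^2 = c^2 dt^2 - dx^2 (separation along x only)."""
    s2 = (C * dt)**2 - dx**2
    if s2 < 0:
        raise ValueError("events are space-like separated: no proper time")
    return math.sqrt(s2) / C

# A clock at rest in O: the proper time is just the coordinate time.
print(proper_time(1.0, 0.0))                 # 1 s (up to rounding)

# A clock moving at v = 0.6c for dt = 1 s in O covers dx = 0.6 c * 1 s.
tau = proper_time(1.0, 0.6 * C * 1.0)
print(tau, math.sqrt(1 - 0.6**2))            # both about 0.8 s, i.e. dt / gamma
```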

Brief aside on $c$

As discussed, and seen in equation (\ref{eq8}) (among others), space and time are unified into a thing called space-time. But space and time each have different dimensional units: meters and seconds (or take your pick). The key thing here is that the speed of light, $c$, actually does the unifying! One can think of $c$ as the conversion factor between your choice of unit for space and your choice of unit for time. So you can choose, for instance, the unit of space to be the meter, which at the end of the 18th century was set by fiat to be 1/10,000,000th of the distance from the north pole to the equator. Then you can choose the unit of time to be 1 second, and that's what humanity did (at least up until very recently), defining the second in terms of basic units of time like the length of a day. In fact, up until the 1960s, the second was defined as 1/86,400 of a mean solar day, so that 3600 of them make an hour and 24 hours make a day. Once you have the meter and second set, the speed of light is automatically determined: it tells you how many meters light travels in a vacuum in one second, which is very, very close to 300,000,000!

But these definitions are not very precise! The distance used to define the meter is hard to pin down, since the actual location of the poles can change, and the shape of the earth can change, all due to gravitational and geophysical effects. The length of a day can also change, from some of the same effects. An alternative approach, and in fact the approach used today, is to first accurately define the unit of time (the second) to be the duration of 9192631770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom (see http://www.bipm.org/en/publications/si-brochure/second.html).

Once you have the second well defined (and the above definition can be realized exceedingly accurately using atomic clocks, gadgets that are accurate to 1 second in 100,000,000 years), we need to define the meter. Here's what we do now: we decide that the speed of light is exactly 299,792,458 meters/sec, so the meter is defined as the distance that light travels in a vacuum in 1/299,792,458 of a second. But for most earthly purposes, $c=3\!\times\!10^8$m/s is quite good enough.

It's useful to have a feel for the speed of light in other units. One of the most useful to physicists is $c=0.3$m/nsec, and an even more useful value is $c\sim 1$foot/nsec (1 nsec is 1 billionth, or $10^{-9}$, of a second). That's pretty approximate, but it's very useful if you have to deal with electronics, since signals there propagate at around $\half c$ to $\twothirds c$, or at around $\half$ to $\twothirds$ of a foot per nsec.

Another useful way of quantifying $c$ has to do with electromagnetic waves, where we know that the frequency $\nu$ and wavelength $\lambda$ are related by $c=\lambda\nu$. Rewriting $c=0.3$m/ns as $0.3$m$\times\!10^{9}$/sec, that is, $0.3$m$\times\!1$GHz, is equivalent to $c\sim 1$foot$\times\!1$GHz. This is very useful in any field where we have to convert quickly from wavelength to frequency. For instance, a typical FM signal is around 100MHz=0.1GHz, so the wavelength is around 10ft (wavelength in feet times frequency in GHz has to come to $\sim 1$), whereas a typical AM signal is around 1MHz=0.001GHz, with a wavelength of around 1000ft. This is why you can get FM signals in cities and under bridge overpasses on highways, but not AM signals: the FM signals will "fit" whereas the AM signals have a harder time (this has to do with diffraction but that's another story).
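The rules of thumb above are easy to check with a few lines of Python. The sketch assumes the exact SI value of $c$ and the international foot (0.3048 m); the function name is illustrative. It shows that $c$ is about 0.98 ft/ns, and that wavelength in feet times frequency in GHz comes out near 0.98 for both the FM and AM examples.

```python
C = 299_792_458.0      # speed of light, m/s (exact)
FOOT = 0.3048          # meters per foot (exact, international foot)

c_ft_per_ns = C / FOOT * 1e-9
print(f"c = {c_ft_per_ns:.3f} ft/ns")

def wavelength_ft(freq_hz):
    """lambda = c / nu, expressed in feet."""
    return C / freq_hz / FOOT

for label, nu in [("FM, 100 MHz", 100e6), ("AM, 1 MHz", 1e6)]:
    lam = wavelength_ft(nu)
    # Rule of thumb from the text: lambda[ft] * nu[GHz] ~ 1
    print(f"{label}: {lam:.1f} ft, lambda*nu = {lam * nu / 1e9:.2f} ft*GHz")
```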

Time Dilation

The Lorentz equations $\ref{eq4pp}-\ref{eq6pp}$ relate time and space intervals $\Delta x$ and $\Delta t$ in frame $O$ to the time and space intervals $\Delta x'$ and $\Delta t'$ in a moving frame $O'$, where $O'$ is moving with velocity $v$ relative to $O$. In the section above, the simulation illustrated how time intervals $\Delta t$ in $O$ and $\Delta t'$ in $O'$ cannot have the same values if Einstein's postulates hold (especially number 2), and that the relationship can be extracted using the Lorentz equations. Here's how you do it; the key thing is to be clear on which frame is which. To discuss the relativity of simultaneity, dream up 2 "events" that happen at the same time in, for instance, frame $O'$. Let's say that $O'$ is a vehicle moving with velocity $v$ relative to the earth, which would be frame $O$. In $O'$, the events happen at the same time but at different locations, therefore $\Delta t'=0$ and $\Delta x'\ne 0$. If we want to know the time interval $\Delta t$ in $O$, we use the Lorentz equations $\ref{eq4pp}-\ref{eq6pp}$, reproduced here: $$\begin{align} \Delta x & = \gamma (\Delta x'+\beta c\Delta t') \nonumber \\ \Delta y & = \Delta y' \nonumber \\ c\Delta t & = \gamma (c\Delta t'+\beta \Delta x') \nonumber \\ \end{align}$$ The 3rd equation ($\ref{eq6pp}$) is especially useful since $\Delta t'=0$, which gives us $$c\Delta t = \gamma\beta\Delta x'$$ This equation tells us that if you have two events that happen simultaneously in $O'$ ($\Delta t'=0$) but at two different locations ($\Delta x'\ne 0$), then these two events will not be simultaneous in $O$: the time interval $\Delta t$ in $O$ is proportional to (generated by) the distance $\Delta x'$ between the events in $O'$.
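A small numerical illustration of the relativity of simultaneity, as a Python sketch: two flashes that are simultaneous on the vehicle $O'$ are fed through the time equation $c\Delta t = \gamma(c\Delta t'+\beta\Delta x')$. The 300 m separation and $\beta=0.6$ are hypothetical values chosen only for illustration. The result is $c\Delta t = \gamma\beta\Delta x' = 225$ m, i.e. the flashes are about 0.75 microseconds apart in $O$.

```python
import math

C = 299_792_458.0  # m/s

def gamma(beta):
    return 1.0 / math.sqrt(1.0 - beta**2)

def cdt_in_O(cdt_p, dx_p, beta):
    """c dt = gamma (c dt' + beta dx'), the time equation of the Lorentz transformation."""
    return gamma(beta) * (cdt_p + beta * dx_p)

# Hypothetical example: two flashes 300 m apart on a vehicle (O'),
# simultaneous in O' (dt' = 0), vehicle moving at beta = 0.6 relative to O.
beta, dx_p = 0.6, 300.0
cdt = cdt_in_O(0.0, dx_p, beta)
print(f"c*dt = {cdt:.1f} m  ->  dt = {cdt / C * 1e9:.0f} ns in O")
```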

Now let's see what happens when the space-time events in $O'$ happen at the same location there ($\Delta x'=0$) but at different times, so that $\Delta t'\ne 0$. Again using the 3rd equation above (equation $\ref{eq6pp}$), we set $\Delta x'=0$ and get $$\Delta t = \gamma\Delta t'$$ This is the remarkable effect known as "time dilation". It says that the time between events in the proper frame (the frame where $\Delta x'=0$) will be seen as "dilated" in any other frame moving at some relative velocity, so that the time $\Delta t$ measured there will be greater. In other words, the time interval in the proper frame is the smallest, and the interval in all other frames is larger. If we define the "proper time" as $\tau$, which we often do since it is a special time coordinate, then a useful equation is: $$\Delta t = \gamma\Delta\tau\label{eq10}$$ Given modern technology it's quite easy to test time dilation. Start with 2 synchronized atomic clocks. These clocks are amazingly precise: the latest ones (2020) are precise to 1 second every 100 million years, which comes to a precision of 10ns per year. Send one clock on an airplane ride while the other stays on the ground. After the ride, the clocks are out of sync by much more than any random drift. There is also an effect due to gravity (more on that later), which has to be taken into account as well. There have been numerous tests such as these, all confirming special relativity. See the following Wikipedia article on the Hafele-Keating experiment for more.
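To get a feel for the size of the effect in an airplane test, here is a rough Python estimate of the special-relativistic offset $\Delta t-\Delta\tau=(\gamma-1)\Delta\tau$. The flight speed (250 m/s) and duration (40 hours) are hypothetical round numbers, and the sketch deliberately ignores gravity and the Earth's rotation, which a real analysis such as Hafele-Keating must include. Computing $\gamma-1$ with `math.log1p`/`math.expm1` avoids losing precision when $\beta$ is tiny.

```python
import math

C = 299_792_458.0  # m/s

def gamma_minus_one(beta):
    """gamma - 1 = (1 - beta^2)^(-1/2) - 1, computed with log1p/expm1 so it
    stays accurate even when beta is tiny (no catastrophic cancellation)."""
    return math.expm1(-0.5 * math.log1p(-beta**2))

def dilated_interval(dtau, beta):
    """Delta t = gamma * Delta tau."""
    return (1.0 + gamma_minus_one(beta)) * dtau

# Hypothetical flight: 250 m/s for 40 hours of proper time on the airborne clock.
# Ignores gravity and the Earth's rotation.
beta = 250.0 / C
dtau = 40 * 3600.0
offset = gamma_minus_one(beta) * dtau     # Delta t - Delta tau
print(f"accumulated offset: {offset * 1e9:.0f} ns")
```

The result is roughly 50 ns, comfortably larger than the 10 ns per year precision of the clocks quoted above.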

Time dilation is indeed counterintuitive, but it is not simply an artifact of mathematics.

Lorentz Contraction

We discovered the concept of time dilation by considering a process where two events (light bouncing off of a mirror) occur in $O'$ at the same location in that frame, which means $\Delta x'=0$, but at different times $\Delta t'$. In $O$, both events happened at different times $\Delta t$ and in different locations $\Delta x$.

Now we want to investigate an analogous situation where this time, the two events occur at the same time (simultaneously) in one frame, and compare the spatial intervals between the two frames to see the effect. This is exactly what you do when you are in frame $O$, and frame $O'$ is moving past you, and there's an object in $O'$ that has length $L_0$ that you want to measure. Since the object is at rest in $O'$, then $O'$ is the proper frame and the object has a proper length which we call $L_0$. What you want to know is what length would you measure (would you experience) in your frame, $O$?

What does it mean for you to measure the length of an object that is moving? It means that as it goes by, you mark the locations of the endpoints simultaneously in your frame, and the distance between the marks tells you the length. But now we run up against the relativity of simultaneity: you can mark the ends simultaneously in your frame, but an observer in $O'$ won't agree that the marks were made at the same time.

We start with equations $\ref{eq4pp}-\ref{eq6pp}$, which tell us how to find space-time intervals in $O$ given intervals in $O'$. Since velocity is relative, we can get the equations that tell us how to find intervals in $O'$ given intervals in $O$ by swapping $x',y',z',t'$ with $x,y,z,t$ and setting $\beta\to -\beta$ to get: $$\begin{align} \Delta x' &=\gamma(\Delta x-\beta c\Delta t)\label{eq4ppp}\\ \Delta y'&=\Delta y\label{eq5ppp}\\ \Delta z'&=\Delta z\label{eq6ppp}\\ c\Delta t'&=\gamma(c\Delta t-\beta \Delta x)\label{eq7ppp}\\ \end{align}$$ (The equations for $y'$ and $z'$ are unchanged since both of those axes are perpendicular to the direction of motion, which is along $x'$.)

The equation we need now is the 1st of the 4 above, equation $\ref{eq4ppp}$, which relates the distance interval $\Delta x'$ in $O'$ to the distance and time intervals $\Delta x$ and $\Delta t$ in $O$. We use it because we are doing the measurement in frame $O$ ($O'$ is the proper frame), and to measure the length of anything we mark the endpoints at the same time. So here we have $\Delta t=0$, which using equation $\ref{eq4ppp}$ gives us the equation $$\Delta x' = \gamma\Delta x\label{eq11}$$ Since $\Delta x'=L_0$ is the proper length and $\Delta x=L$ is the length measured in $O$, the length $L$ as measured in any frame that is moving with respect to the proper frame will be smaller than the proper length $L_0$ by a factor $\gamma$: $$L = L_0/\gamma\nonumber$$ and this is the famous "Lorentz contraction".
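The measurement procedure can be checked numerically with a short Python sketch: take a hypothetical rod of proper length $L_0=1$ m moving at $\beta=0.8$ ($\gamma=5/3$), compute $L=L_0/\gamma$, and confirm that feeding the measured interval back into equation $\ref{eq4ppp}$ with $\Delta t=0$ recovers $L_0$. The numbers and function names are illustrative.

```python
import math

def gamma(beta):
    return 1.0 / math.sqrt(1.0 - beta**2)

def dx_prime(dx, cdt, beta):
    """dx' = gamma (dx - beta c dt): interval in O' from intervals in O."""
    return gamma(beta) * (dx - beta * cdt)

L0, beta = 1.0, 0.8           # hypothetical proper length (m) in O' and speed
L = L0 / gamma(beta)          # length O measures, marking both ends with dt = 0
print(L)                      # about 0.6 m
print(dx_prime(L, 0.0, beta)) # about 1.0 m: the proper length is recovered
```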

To get an intuitive understanding, the next simulation shows things from the point of view of the person in $O$ (red):

The important thing here is that the events that are simultaneous take place in a frame that is not the proper frame of the object. The relativity of simultaneity will guarantee that when the person in $O$ measures the length (of the object in $O'$) at times in $O$ such that $\Delta t=0$, the person in $O'$ will not agree that $O$ measured both ends at the same time, and would claim that $\Delta t'\ne 0$. Hence they will not agree on the answer, because of the relative motion.

The added complication: the person in $O'$ has the object ($O'$ is the proper frame) but the ruler that is used to measure the length is in $O$. Imagine that the ruler in $O$ is 1 foot long, and that before $O'$ started moving, they both had the same ruler so they agreed on the length. Now, as $O'$ is moving and $O$ tries to measure the length of the object, from the point of view of someone in $O'$, not only do they say that the person in $O$ did not measure the ends of the object at the same time, they used a ruler that isn't even 1 foot long anymore because of the Lorentz contraction!

In the following, the simulation attempts to show what the person in $O'$ would see going on in $O$, and why that $O'$ person would say that the measurement was not "proper". In the simulation, you will notice that the red ruler (ruler in $O$) is smaller than the red ruler in the simulation above. That is because the above simulation is what it would look like from inside frame $O$, whereas the situation below is what it would look like in frame $O'$. The ruler in $O$ as seen by the observer in $O'$ will be contracted.

Who is right? They are both right, or maybe it's more accurate to say that neither of them is wrong. Anyway, we will just have to get used to the relativity of simultaneity, and the concept of space-time with space and time coordinates mixed together. And when we do, we have the following new and amazing effects:

Time Dilation			$\Delta t=\gamma\cdot\Delta t_{proper}$
Lorentz Contraction			$\Delta x=\Delta x_{proper}/\gamma$

Or in words, time intervals are shortest, and space intervals are largest, in the proper frame relative to any other reference frame.

Is this really true or is this an artifact of some mathematics? As usual, we need to ground these amazing concepts in the reality of experiment, and a great example of how time dilation and Lorentz contraction are manifest is the case of the muon. This is a particle created when a high energy cosmic ray (mostly protons) hits the upper atmosphere and creates a shower of particles, including muons. The lifetime of this particle has been measured experimentally to be around $\tau = 2\mu$sec. The mass of the muon is pretty small compared to the energy of the cosmic ray, so muons created in the shower (especially in the early part of the shower) will have velocities near the speed of light. Without relativity, a particle traveling at $v\approx c$ would traverse an average distance $d=v\tau$ before decaying. For a $2\mu$sec lifetime, that means the muon should decay after around $2\times 10^{-6}s \cdot 3\times 10^8m/s = 600m$. Since the thickness of the earth's atmosphere is much more than 600m, one would not expect an experiment at the surface to see any muons. Yet in fact, they are quite plentiful: if you hold your hand out palm up, around 1 muon per second will go through it. To understand this, we need special relativity, and we can use either time dilation or Lorentz contraction to do so.

Using time dilation: note that when we say that the muon decays in $2\mu s$, we are saying something about what happens in the rest frame of the muon (its own proper frame). If the muon is moving at a velocity $\beta=0.999$, it will have $\gamma\approx 22$. In the rest frame of the earth, its lifetime will then be time dilated to $\Delta t = 22\times \tau=44\mu$sec, and it will travel a distance of about $22\times 600m=13.2km$. The muon starts where the cosmic ray interacts with the atmosphere, which is where the atmosphere becomes thick (dense) enough to cause an interaction, and most of the mass of the atmosphere is below $13.2km$ ($95\%$ is below $20km$ and around $85\%$ is below $13.2km$). So for someone standing on the ground, a clock started when the muon is born and stopped when it decays will record a dilated lifetime long enough for many muons to make it to the surface (in subsequent sections we will discuss the relativistic energy and momentum).

Using Lorentz contraction: in the muon's rest frame, it sees the earth's surface rushing up at it. If the muon were to start $13.2km$ up at a velocity of $\beta=0.999$, it would "measure" the thickness of the atmosphere it goes through to be Lorentz contracted by a factor $\gamma=22$, to $13.2km/22=600m$. And that means it has a good chance of making it to the surface.
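The muon numbers above can be reproduced with a short Python sketch. It uses the rounded lifetime $\tau=2\mu$s and $\beta=0.999$ from the text; the text rounds $\gamma$ to 22 and $v$ to $c$, so the unrounded results (about 22.4, 599 m, 13.4 km and 590 m) differ slightly.

```python
import math

C = 299_792_458.0   # speed of light, m/s
TAU = 2e-6          # muon proper lifetime used in the text, s
beta = 0.999

gamma = 1.0 / math.sqrt(1.0 - beta**2)

naive_range = beta * C * TAU          # no relativity
lab_lifetime = gamma * TAU            # time dilation, Earth frame
lab_range = beta * C * lab_lifetime   # distance covered in Earth frame
contracted = 13.2e3 / gamma           # 13.2 km of atmosphere seen in the muon frame

print(f"gamma = {gamma:.1f}")
print(f"naive range = {naive_range:.0f} m")
print(f"dilated lifetime = {lab_lifetime * 1e6:.1f} us, range = {lab_range / 1e3:.1f} km")
print(f"13.2 km of atmosphere in the muon frame = {contracted:.0f} m")
```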

Another very interesting manifestation of Lorentz contraction concerns magnetic fields due to currents in wires, and the force on a moving test charge. To see this in the context of relativity, keep in mind that metallic wires are electrically neutral to great accuracy. A current in a wire consists of negative conduction electrons (around 1 per atom in most metals) moving along the wire. Imagine the situation where the test charge is positive, moving parallel to the wire in the same direction and at the same speed as the electrons, and consider the Lorentz contraction of the spacing between the electrons in the wire, and the spacing between the positive ions in the wire. The test charge and the electrons are in the same reference frame, but from the point of view of the test charge, the positive ions are moving in the opposite direction. Therefore the spacing between the positive ions is Lorentz contracted, which gives a higher positive (linear) charge density than negative linear charge density. So in the test charge's frame the wire carries a net positive charge, which pushes the positive test charge away from the wire. If you work out the right hand rules, you will find that this is exactly what the $\vec v\times\vec B$ force would do - the Lorentz contraction is surely real!

Lorentz Transformations

Special relativity forces us to think in 4 dimensions. As we have seen above, we can describe coordinates in frame $O$ using the 4-vector $x^\mu$, where $\mu$ is the index that runs from 0 to 3, with 0 being the time component $x^0=ct$ and $1,2,3$ being the 3 spatial components of the vector $\vec r=x\hat i+y\hat j+z\hat k$. Equations $\ref{eq4}-\ref{eq6}$ tell us how the coordinates $x,y,z,ct$ in $O$ are related to the coordinates $x',y',z',ct'$ in frame $O'$, which moves with a velocity $\beta=v/c$ along the $x$ direction. This transformation is called the "Lorentz transformation": $$\begin{align} x & = \gamma(x'+\beta ct')\nonumber \\ ct & = \gamma(ct'+\beta x')\nonumber \\ y & = y' \nonumber\\ z & = z' \nonumber \\ \end{align}\nonumber$$ More generally, using the 4-vector notation $x^\mu$, we can write the Lorentz transformation relating the coordinates $x^\mu=(ct,x,y,z)$ in $O$ to the coordinates $x'^\mu=(ct',x',y',z')$ in frame $O'$, moving with velocity $\beta=v/c$ along the $x^1$ spatial direction, as $$\begin{align} x^0 & = \gamma(x'^0+\beta x'^1)\label{ltx0} \\ x^1 & = \gamma(x'^1+\beta x'^0)\label{ltx1} \\ x^2 & = x'^2 \label{ltx2}\\ x^3 & = x'^3 \label{ltx3}\\ \end{align}$$ Since $x^\mu$ is a 4-dimensional vector, it can be represented as a matrix with 1 row and 4 columns: $$x^\mu = \begin{pmatrix} ct & x & y & z\end{pmatrix}\label{xrow}$$ We can then write the invariant in 4-vector notation using matrices. However, there's a hitch. If $x^\mu$ is a 1x4 matrix, then $x^\mu x^\mu$ would be trying to multiply two 1x4 matrices together, and that won't give you a scalar quantity, which is necessary since the invariant is a scalar. So we would need to transpose $x^\mu$ from a 1x4 into a 4x1 (4 rows, 1 column) to make it work. It is easy to form the column matrix, but now we have a notation problem, since we can't have $x^\mu$ represent both a row and a column 4-vector.
So we introduce the new notation $x_\mu$ to be the column vector, which gives us: $$x_\mu = \begin{pmatrix} ct \\ x \\ y \\ z\end{pmatrix}\nonumber$$ That way, the quantity $\sum_{\mu=0}^3 x^\mu x_\mu$ would describe multiplying a 1x4 by a 4x1 to get a 1x1 scalar. However, there's another hitch: the 4-vector invariant is $s^2=(ct)^2-|\vec r|^2$, so in order to be able to write $s^2 = \sum_{\mu=0}^3 x^\mu x_\mu$, we have to define the column vector $x_\mu$ as $$x_\mu = \begin{pmatrix} ct \\ -x \\ -y \\ -z\end{pmatrix}\label{xcolumn}$$ Note: it is traditional to use the following notation (the "Einstein summation convention"): whenever you see 2 vectors multiplied together where an "upper" (row vector) and a "lower" (column vector) index is repeated (as in $x^\mu x_\mu$), it is assumed that this means a sum over $\mu=0,1,2,3$.

Another notation that is very useful is to introduce a 4x4 matrix (aka tensor) that can transform $x^\mu$ into $x_\mu$. We call this tensor the "Minkowski metric" for reasons that are not important now, and we use the symbol $\etamunu$, where $\mu$ and $\nu$ are 2 indices, both of which of course run from 0 to 3. If you define $\etamunu$ as: $$\etamunu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \\ \end{pmatrix}\label{minkow}$$ then it's easy to see that $x^\mu\etamunu = x_\nu$. Note the careful index gymnastics: $x^\mu$ times $\etamunu$ implies a sum over the index $\mu$ to get each component $\nu$ of the resulting 4x1 vector $x_\nu$. You can think of the Minkowski metric as a gadget that takes a vector with an upper index to one with a lower index. In the world of mathematics, a vector with an upper index is often referred to as a "contravariant" form and one with a lower index as a "covariant" form.

This allows us to write the invariant $s^2$ as $$s^2 = x^\mu x_\mu\label{s2}$$ where we use the implicit notation that the sum is over the index $\mu$.
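In code, lowering the index and forming $s^2=x^\mu x_\mu$ is a pair of matrix products. The NumPy sketch below uses the metric of equation $\ref{minkow}$ and an arbitrary sample 4-vector; the function names are illustrative. Note that NumPy 1-D arrays do not distinguish row from column vectors, so here the metric alone carries the sign flip that the row/column bookkeeping expresses in the text.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric, signature (+,-,-,-)

def lower(x_upper):
    """x_nu = x^mu eta_{mu nu}: flips the sign of the spatial components."""
    return x_upper @ eta

def s2(x_upper):
    """s^2 = x^mu x_mu, with the implicit sum over mu = 0..3."""
    return float(x_upper @ lower(x_upper))

x = np.array([5.0, 1.0, 2.0, 3.0])   # (ct, x, y, z), same length units
print(lower(x))   # components (5, -1, -2, -3)
print(s2(x))      # 25 - 1 - 4 - 9 = 11
```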

What do "contravariant" and "covariant" mean? To simplify, imagine that we have a coordinate system with axes labeled $\alpha$ and $\beta$ that are not necessarily perpendicular. Then, we specify a point in this weird plane as having some location a distance $r$ from the origin, as in the following figure:

There are two ways to specify the location of the point, and these correspond to the contravariant and covariant forms. The covariant way would be to drop a perpendicular from the point to each axis, and measure the distance along each axis from the origin to the foot of the perpendicular, calling these $x$ and $y$ as in the following figure:

The contravariant way would be to draw a line from the point parallel to each axis until it meets the other axis, and label the lengths of those lines $x'$ (the line parallel to $\alpha$) and $y'$ (the line parallel to $\beta$).

The following figure shows both.

The distance $r$ should be invariant. That is, it's the same whichever method we use to measure it. To show that $r$ is invariant, we start with the 2 right triangles that have sides perpendicular to each axis to get: $$\begin{align} r^2 & = a^2 + y^2\label{ecc1} \\ r^2 & = b^2 + x^2\label{ecc2} \\ \end{align}\nonumber$$ We also have 2 other right triangles using the blue lines, and these give us: $$\begin{align} x'^2 & = a^2 + (y-y')^2\label{ecc3} \\ y'^2 & = b^2 + (x-x')^2\label{ecc4} \\ \end{align}\nonumber$$ If we substitute equation $\ref{ecc1}$ into $\ref{ecc3}$ to eliminate $a^2$, and $\ref{ecc2}$ into $\ref{ecc4}$ to eliminate $b^2$, we get the following 2 equations: $$\begin{align} r^2 & = x'^2 - y'^2 + 2yy'\label{ecc5} \\ r^2 & = y'^2 - x'^2 + 2xx'\label{ecc6} \\ \end{align}\nonumber$$ Adding equations $\ref{ecc5}$ and $\ref{ecc6}$ and dividing by 2 yields $$r^2 = xx' + yy'\nonumber$$ which shows how the contravariant and covariant forms complement each other to give you the invariant quantity, which should be independent of whatever choice you make for $\alpha$ and $\beta$.
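The result $r^2 = xx' + yy'$ is easy to test numerically. In the Python sketch below, the $\alpha$ axis is assumed to lie along $(1,0)$ and the $\beta$ axis at an angle $\phi$ from it (the point, the angle and the function name are illustrative choices). Covariant components are computed as projections onto each axis, contravariant components by solving $\vec p = x'\hat\alpha + y'\hat\beta$; both intermediate equations $\ref{ecc5}$ and $\ref{ecc6}$ are checked as well.

```python
import math

def components(px, py, phi):
    """Covariant (perpendicular projection) and contravariant (parallelogram)
    components of the point (px, py), for axes alpha along (1, 0) and
    beta at angle phi from alpha (phi must not be 0 or pi)."""
    bx, by = math.cos(phi), math.sin(phi)
    # covariant: distance along each axis to the foot of the perpendicular
    x_co = px
    y_co = px * bx + py * by
    # contravariant: solve p = x' * alpha_hat + y' * beta_hat
    y_contra = py / by
    x_contra = px - y_contra * bx
    return x_co, y_co, x_contra, y_contra

px, py, phi = 1.0, 1.0, math.radians(60)
x, y, xp, yp = components(px, py, phi)
r2 = px**2 + py**2
print(r2, x * xp + y * yp)                     # both 2, up to rounding
print(xp**2 - yp**2 + 2 * y * yp, yp**2 - xp**2 + 2 * x * xp)
```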

This example gives us a more physical feeling for what contravariant and covariant mean. The original meaning comes from the mid 19th century, and has to do with how coordinates transform when you transform the axes. A coordinate that is "covariant" changes in the same way as the axes under a transformation. For instance, imagine that you rotated the $\beta$ axis so that the perpendicular distance to the red point reduced. In the covariant representation, where you have coordinates that measure perpendicular distance from the point to the axes, the value for the $y$ coordinate will get larger as the angle decreases, so it is as if it scales with some value: $y\to y\times g$ where $g$ is some scale factor. If you use the contravariant form, with coordinates derived from measuring the distance along the line parallel to each axis, as you make the angle smaller you will make the distances smaller, so it's as if you are dividing the coordinate by a number: $y\to y/g$. Contravariant means "opposite of covariant", but in special (and general) relativity, what you really need to know about are matrices and vectors, rows and columns, and upper and lower indices: covariant vectors have lower indices and are column vectors, contravariant vectors have upper indices and are row vectors.

Now we can write the Lorentz transformation in matrix form using equations $\ref{ltx0}-\ref{ltx3}$: $$x^\mu = \Lambda^\mu_\nu x'^\nu\label{elorentz}$$ Here again we sum over the repeated index $\nu$, so $\Lambda$ needs a lower index $\nu$ to be summed against the upper index of $x'^\nu$, and an upper index $\mu$ that survives to label the components of $x^\mu$. The specific form of the Lorentz transformation $\lambdamunu$ that gives us equations $\ref{ltx0}-\ref{ltx3}$ is: $$ \begin{pmatrix} ct & x & y & z\\ \end{pmatrix} = \begin{pmatrix} ct' & x' & y' & z'\\ \end{pmatrix} \begin{pmatrix} \gamma & \gamma\beta & 0 & 0 \\ \gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ \end{pmatrix} \label{lammunu}$$
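The matrix form can be checked with a small NumPy sketch that builds the boost matrix in the row-vector convention used here, $(ct,x,y,z) = (ct',x',y',z')\,\Lambda$, and applies it to a sample event. The function name and sample values are illustrative. Invariance of $s^2$ for every event is equivalent to $\Lambda\,\eta\,\Lambda^T=\eta$, which the sketch also tests.

```python
import numpy as np

def boost_matrix(beta):
    """Boost along x^1 in the row-vector convention
    (ct, x, y, z) = (ct', x', y', z') @ Lambda."""
    g = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = g
    L[0, 1] = L[1, 0] = g * beta
    return L

eta = np.diag([1.0, -1.0, -1.0, -1.0])
Lam = boost_matrix(0.6)

x_prime = np.array([5.0, 3.0, 1.0, 2.0])   # (ct', x', y', z')
x = x_prime @ Lam                          # (ct, x, y, z)
print(x)                                   # ct = 8.5, x = 7.5, y and z unchanged

# s^2 is the same in both frames, for any event, iff Lambda eta Lambda^T = eta
print(np.allclose(Lam @ eta @ Lam.T, eta))
print(x @ eta @ x, x_prime @ eta @ x_prime)   # both 11
```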

Addition of Lorentz Transformations

Equations $\ref{ltx0}-\ref{ltx3}$ show the Lorentz transformation equations needed to calculate the coordinates in the frames $O$ and $O'$ given a relative velocity $\beta$. For this section, let's change notation to make things easier, and say that frame $O_1$ is moving with velocity $\beta_1$ in frame $O$. The equations that relate coordinates in $O$ to coordinates in $O_1$ are: $$\begin{align} x & = \gamma_1(x_1+\beta_1 ct_1)\nonumber \\ ct & = \gamma_1(ct_1+\beta_1 x_1)\nonumber \\ y & = y_1 \nonumber\\ z & = z_1 \nonumber \\ \end{align}\nonumber$$ where $\gamma_1=1/\sqrt{1-\beta_1^2}$. You can also use those equations to calculate the coordinates in a frame $O_2$ that has a velocity $\beta_{12}$ with respect to the frame $O_1$: $$\begin{align} x_1 & = \gamma_{12}(x_2+\beta_{12} ct_2)\nonumber \\ ct_1 & = \gamma_{12}(ct_2+\beta_{12} x_2)\nonumber \\ y_1 & = y_2 \nonumber\\ z_1 & = z_2 \nonumber \\ \end{align}\nonumber$$ where again $\gamma_{12}=1/\sqrt{1-\beta_{12}^2}$. The question is, what is the transformation that takes you from frame $O$ to frame $O_2$ directly? Let's say frame $O_2$ moves with velocity $\beta_2$ relative to $O$. Then the transformation would have to be: $$\begin{align} x & = \gamma_2(x_2+\beta_2 ct_2)\label{ex2} \\ ct & = \gamma_2(ct_2+\beta_2 x_2)\label{et2} \\ y & = y_2 \nonumber\\ z & = z_2 \nonumber \\ \end{align}\nonumber$$ But we should also be able to get there by substituting the equations for $O_1\to O_2$ into those for $O\to O_1$. (We will leave out the equations for $y$ and $z$, since it's obvious that $y=y_2$ and $z=z_2$.)
This gives us: $$\begin{align} x & = \gamma_1(x_1+\beta_1 ct_1) \nonumber \\ & = \gamma_1[\gamma_{12}(x_2+\beta_{12} ct_2)+\beta_1\gamma_{12}(ct_2+\beta_{12} x_2)] \nonumber \\ & = \gamma_1\gamma_{12}(1+\beta_1\beta_{12})x_2 + \gamma_1\gamma_{12}(\beta_1+\beta_{12})ct_2 \nonumber \\ & = \gamma_1\gamma_{12}(1+\beta_1\beta_{12})(x_2 + \frac{\beta_1+\beta_{12}}{1+\beta_1\beta_{12}}ct_2)\label{ex12}\\ ct & = \gamma_1(ct_1+\beta_1 x_1) \nonumber \\ & = \gamma_1[\gamma_{12}(ct_2+\beta_{12} x_2)+\beta_1\gamma_{12}(x_2+\beta_{12} ct_2)] \nonumber \\ & = \gamma_1\gamma_{12}(1+\beta_1\beta_{12})ct_2 + \gamma_1\gamma_{12}(\beta_1+\beta_{12})x_2\nonumber \\ & = \gamma_1\gamma_{12}(1+\beta_1\beta_{12})(ct_2 + \frac{\beta_1+\beta_{12}}{1+\beta_1\beta_{12}}x_2)\label{et12} \end{align}\nonumber$$ Equating equation $\ref{ex2}$ with $\ref{ex12}$ (or equation $\ref{et2}$ with $\ref{et12}$) gives us $$\gamma_2 = \gamma_1\gamma_{12}(1+\beta_1\beta_{12})\label{egg}$$ $$\beta_2 = \frac{\beta_1+\beta_{12}}{1+\beta_1\beta_{12}}\label{ebb}$$ Equation $\ref{ebb}$ is the same equation for the addition of velocities as equation $\ref{eaddbeta}$, which we derived above. Equation $\ref{egg}$ can be rewritten as $\gamma_2 = \gamma_1\gamma_{12} + \gamma_1\gamma_{12}\beta_1\beta_{12}$.
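The same result falls out of multiplying boost matrices. The NumPy sketch below (sample speeds $\beta_1=0.6$, $\beta_{12}=0.5$ chosen for illustration) composes the two boosts in the row-vector convention of equation $\ref{lammunu}$ and compares the product with a single boost at the $\beta_2$ of equation $\ref{ebb}$; the top-left entry of the product is the $\gamma_2$ of equation $\ref{egg}$.

```python
import numpy as np

def boost_matrix(beta):
    """Boost along x in the row-vector convention (ct, x, y, z) = (ct', x', y', z') @ L."""
    g = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = g
    L[0, 1] = L[1, 0] = g * beta
    return L

beta1, beta12 = 0.6, 0.5
beta2 = (beta1 + beta12) / (1 + beta1 * beta12)       # velocity addition

# x = x1 @ L(beta1) and x1 = x2 @ L(beta12), so x = x2 @ L(beta12) @ L(beta1)
composed = boost_matrix(beta12) @ boost_matrix(beta1)

g1 = 1 / np.sqrt(1 - beta1**2)
g12 = 1 / np.sqrt(1 - beta12**2)
print(beta2)                                          # 1.1 / 1.3, about 0.846
print(np.allclose(composed, boost_matrix(beta2)))     # True
print(composed[0, 0], g1 * g12 * (1 + beta1 * beta12))   # gamma_2 two ways
```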

Equation $\ref{ebb}$ has a form similar to something familiar in the equations for hyperbolic functions: $$\sinh\theta = \frac{e^\theta-e^{-\theta}}{2}\nonumber$$ $$\cosh\theta = \frac{e^\theta+e^{-\theta}}{2}\nonumber$$ $$\tanh\theta = \frac{\sinh\theta}{\cosh\theta}\nonumber$$ The equations for the hyperbolic sums of $\theta_1+\theta_2$ can be easily derived to be: $$\sinh(\theta_1\pm\theta_2) = \sinh\theta_1\cosh\theta_2\pm\cosh\theta_1\sinh\theta_2\nonumber$$ $$\cosh(\theta_1\pm\theta_2) = \cosh\theta_1\cosh\theta_2\pm\sinh\theta_1\sinh\theta_2\nonumber$$ $$\tanh(\theta_1\pm\theta_2) = \frac{\tanh\theta_1\pm\tanh\theta_2}{1\pm\tanh\theta_1\tanh\theta_2}\nonumber$$ So if we write $\beta_1\equiv\tanh\eta_1$, $\beta_2\equiv\tanh\eta_2$, and $\beta_{12}\equiv\tanh\eta_{12}$, then we have the simple formula $$\eta_2 = \eta_1 + \eta_{12}\nonumber$$ which looks like the Galilean equation for the addition of velocities. If $\beta\equiv\tanh\eta$, then $\gamma$ is given by $$\begin{align} \gamma & = \frac{1}{\sqrt{1-\beta^2}}\nonumber \\ & = \frac{1}{\sqrt{1-\tanh^2\eta}}\nonumber \\ & = \frac{\cosh\eta}{\sqrt{\cosh^2\eta-\sinh^2\eta}}\nonumber \\ & = \cosh\eta \end{align}$$ The quantity $\gamma_2$ is given by $$\begin{align} \gamma_2 =\cosh\eta_2 & = \gamma_1\gamma_{12}(1+\beta_1\beta_{12})\nonumber \\ &= \cosh\eta_1\cosh\eta_{12}(1+\tanh\eta_1\tanh\eta_{12})\nonumber \\ &= \cosh\eta_1\cosh\eta_{12} + \sinh\eta_1\sinh\eta_{12}\nonumber\\ &= \cosh(\eta_1+\eta_{12})\end{align}$$ which again says $\eta_2 = \eta_1 + \eta_{12}$.

The quantity $\eta$ is known as "rapidity", and becomes quite useful in the world of particle physics.
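As a quick numerical check that adding rapidities reproduces equations $\ref{ebb}$ and $\ref{egg}$, here is a minimal Python sketch. It assumes collinear boosts and speeds given in units of $c$; the values $\beta_1=0.6$ and $\beta_{12}=0.7$ are arbitrary choices for illustration.

```python
import math


def gamma(beta: float) -> float:
    """Lorentz factor for a speed beta given in units of c."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def add_velocities(beta1: float, beta12: float) -> float:
    """Relativistic addition of collinear velocities, equation (ebb)."""
    return (beta1 + beta12) / (1.0 + beta1 * beta12)


def add_via_rapidity(beta1: float, beta12: float) -> float:
    """Same composition, done by adding rapidities eta = atanh(beta)."""
    return math.tanh(math.atanh(beta1) + math.atanh(beta12))


beta1, beta12 = 0.6, 0.7
b_formula = add_velocities(beta1, beta12)
b_rapidity = add_via_rapidity(beta1, beta12)

# gamma_2 from equation (egg) versus cosh of the summed rapidity
g2_formula = gamma(beta1) * gamma(beta12) * (1.0 + beta1 * beta12)
g2_cosh = math.cosh(math.atanh(beta1) + math.atanh(beta12))

print(f"beta_2:  {b_formula:.6f} (formula)  {b_rapidity:.6f} (rapidity)")
print(f"gamma_2: {g2_formula:.6f} (formula)  {g2_cosh:.6f} (cosh)")
```

Even though $0.6+0.7$ exceeds 1, the combined $\beta_2$ stays below 1, while the rapidities simply add.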

We can use $\beta=\tanh\eta$, $\gamma=\cosh\eta$, and $\gamma\beta=\sinh\eta$ to write the Lorentz equations as $$\begin{align} x &= x'\cosh\eta + ct'\sinh\eta \nonumber\\ ct &=x'\sinh\eta + ct'\cosh\eta \nonumber\\ \end{align}$$ This looks very much like the transformation equations for a rotation in the $xy$ plane, which suggests that rapidity is related to the angle of a "rotation" in 4-dimensional space-time, and in this space, rapidities add linearly. What is also fun is to make the substitution $\eta=i\theta$ where $i$ is the imaginary number $\sqrt{-1}$. Remembering that we can write the trig functions as $$\cos\theta = \frac{e^{i\theta}+e^{-i\theta}}{2}\nonumber$$ $$\sin\theta = \frac{e^{i\theta}-e^{-i\theta}}{2i}\nonumber$$ we then have $$\cosh\eta=\frac{e^{i\theta}+e^{-i\theta}}{2}=\cos\theta\nonumber$$ $$\sinh\eta=\frac{e^{i\theta}-e^{-i\theta}}{2}=i\sin\theta\nonumber$$ This gives us the complex Lorentz equations: $$\begin{align} x &= x'\cos\theta + ict'\sin\theta \nonumber\\ ct &= ix'\sin\theta + ct'\cos\theta \nonumber \end{align}$$ Writing the time coordinate as $w\equiv ict$ turns these into $x = x'\cos\theta + w'\sin\theta$ and $w = -x'\sin\theta + w'\cos\theta$, an ordinary rotation by the angle $\theta$. This is an elegant way to talk about how the Lorentz equations are a rotation in a complex 4-dimensional space; however, it is not all that useful in particle physics when it comes to actually making measurements (more on that below). But it is beautiful!
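The identities for an imaginary rapidity, and the claim that the rapidity form of the boost leaves $(ct)^2-x^2$ unchanged, are easy to check numerically. This is a minimal sketch using Python's standard `cmath` and `math` modules; the angle $\theta=0.4$, the rapidity $\eta=\tanh^{-1}(0.8)$, and the event $(x', ct')=(2, 5)$ are arbitrary values chosen for illustration.

```python
import cmath
import math

theta = 0.4          # arbitrary real angle
eta = 1j * theta     # imaginary rapidity eta = i*theta

# Hyperbolic functions of an imaginary argument turn into trig functions
cosh_eta = cmath.cosh(eta)   # expected: cos(theta)
sinh_eta = cmath.sinh(eta)   # expected: i*sin(theta)
print(cosh_eta, math.cos(theta))
print(sinh_eta, 1j * math.sin(theta))


def boost(rapidity: float, x_p: float, ct_p: float) -> tuple[float, float]:
    """Lorentz boost from (x', ct') to (x, ct) written with a real rapidity."""
    x = x_p * math.cosh(rapidity) + ct_p * math.sinh(rapidity)
    ct = x_p * math.sinh(rapidity) + ct_p * math.cosh(rapidity)
    return x, ct


x_p, ct_p = 2.0, 5.0
x, ct = boost(math.atanh(0.8), x_p, ct_p)
# (ct)^2 - x^2 is preserved, the hyperbolic analogue of a rotation preserving length
print(ct_p**2 - x_p**2, ct**2 - x**2)
```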

Relativistic Doppler

The Doppler effect is a well-known phenomenon that describes how the motion of a source of any kind of wave, including electromagnetic waves, changes the frequency that an observer measures. To derive the relativistically correct formula that relates the observed frequency to the frequency emitted by the source, we first consider the wave in the frame of the emitter. In that frame, the wave propagates away from the source at the speed of light, with a frequency $f$ and wavelength $\lambda$, related through the expression $c=f\lambda$. The period of the wave is $T$, defined as the time between repetitions. In the figure below, frame $O'$ (the blue emitter) is moving with velocity $\beta$ with respect to the red stick figure in frame $O$ on the bottom, while on the top $O'$ is not moving with respect to $O$. The waves are emitted at the same time in $O'$, top and bottom, and travel to the left with velocity $c$. (Things are exaggerated in the simulation; the velocities are not to scale!) The red figure will see successive waves arrive separated by $\Delta t_1$ on the top and $\Delta t_2$ on the bottom, where $\Delta t_2\gt\Delta t_1$.

In frame $O'$, the period $T'$ of the wave is the time between pulses, or $T'\equiv\Delta t'$. On the top, where $O'$ is not moving, the distance between pulses is simply $\Delta x_1=c\Delta t'$. On the bottom, the blue figure moves to the right, away from the red figure, while the pulses travel to the left. Seen from frame $O$, the emitter's clock runs slow (time dilation), so the pulses leave the source separated by $\Delta t=\gamma\Delta t'$. During that time the previous pulse travels a distance $c\Delta t$ to the left while the source moves a distance $\beta c\Delta t$ to the right, so the distance between pulses on the bottom is larger than on the top: $$\begin{align} \Delta x_2 & = c\Delta t+\beta c\Delta t \nonumber \\ & = \gamma c\Delta t'(1+\beta)\nonumber \\ \end{align}\nonumber$$

$\Delta x_2$ is the wavelength $\lambda$ as measured in frame $O$, and $\lambda = c/f$. In frame $O'$, $\Delta t' = 1/f'$, so we can write $$\frac{c}{f} = \frac{c}{f'}\gamma(1+\beta)\nonumber$$ Finally, using $\gamma=1/\sqrt{1-\beta^2}$ and the fact that we can write $1-\beta^2=(1-\beta)(1+\beta)$, and inverting the above equation, we have $$f_- = \frac{f'}{\gamma(1+\beta)} = f'\sqrt{\frac{1-\beta}{1+\beta}}\label{edoppa}$$ where we denote the frequency in $O$ as $f_-$ because $O'$ is moving away from $O$, and that reduces the frequency as measured in $O$. If $O'$ were moving towards $O$, we just reverse the sign of $\beta$ to get $$f_+ = f'\sqrt{\frac{1+\beta}{1-\beta}}\label{edoppt}$$ If we expand $f_+$ in the low velocity limit $\beta \lt\lt 1$, we get $$\begin{align} f_+ & = f'\sqrt{\frac{1+\beta}{1-\beta}}\nonumber \\ & = f'(1+\beta)^{\half}(1-\beta)^{-\half}\nonumber\\ &\approx f'(1+\half\beta)(1+\half\beta)\nonumber\\ & = f'(1+\half\beta)^2\nonumber\\ & \approx f'(1+\beta)\nonumber \end{align}\nonumber$$ which gives us what we would expect classically (see any decent undergraduate text in physics).
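The two Doppler formulas can be checked numerically: the receding result should agree with the equivalent form $f'/(\gamma(1+\beta))$, the product $f_+f_-$ should equal $f'^2$, and at small $\beta$ the approaching result should approach the classical $f'(1+\beta)$. This is a minimal Python sketch; the source frequency of $5\times 10^{14}$ Hz is an arbitrary visible-light value assumed for illustration.

```python
import math


def doppler_receding(f_source: float, beta: float) -> float:
    """Observed frequency f_- for a source moving away at speed beta (units of c)."""
    return f_source * math.sqrt((1.0 - beta) / (1.0 + beta))


def doppler_approaching(f_source: float, beta: float) -> float:
    """Observed frequency f_+ for an approaching source: reverse the sign of beta."""
    return doppler_receding(f_source, -beta)


f_src = 5.0e14  # Hz, arbitrary value for illustration
for beta in (0.001, 0.1, 0.5):
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    f_minus = doppler_receding(f_src, beta)
    alt = f_src / (gamma * (1.0 + beta))      # same receding result, other form
    f_plus = doppler_approaching(f_src, beta)
    classical = f_src * (1.0 + beta)          # low-velocity approximation of f_+
    print(f"beta={beta}: f-={f_minus:.4e} alt={alt:.4e} "
          f"f+={f_plus:.4e} classical={classical:.4e}")
```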

Relativistic Velocity

When we talk about the relative velocities of frames, we describe the velocity of a frame (e.g. frame $O'$) moving in frame $O$. For instance, $O'$ can be a vehicle moving at 70mph in frame $O$. The velocity is given by: $$\vec v = \frac{d\vec r}{dt}\nonumber$$ where the displacement $d\vec r$ and the time interval $dt$ are measured in frame $O$ by someone standing on the road watching the vehicle speed by. Now consider this from the perspective of someone sitting in the vehicle, in frame $O'$. What that person might be interested in is their velocity in $O$, but as measured by their own clock. The velocity they are interested in is $$\vec u = \frac{d\vec r}{dt'}\nonumber$$ where $d\vec r$ is still the distance that they travel in $O$, but now "how fast" is relative to the clock time $t'$ in frame $O'$. Since $O'$ is the proper frame (of the vehicle and the people in it), we can use the proper time $\tau$, which gives $$\vec u = \frac{d\vec r}{d\tau}\nonumber$$ Time dilation (equation $\ref{eq10}$) tells us that time intervals are minimal when measured in the proper frame, or $dt = \gamma d\tau$, which tells us that $$\vec u = \gamma \frac{d\vec r}{dt} = \gamma\vec v\nonumber$$ If we go back to the 4-dimensional definition $x^\mu = (ct,\vec r)$, then we recognize $\vec u$ as being the derivative of the spatial components of $x^\mu$ with respect to proper time. 
The equivalent derivative of the time component will be $$\frac {dx^0}{d\tau} = \frac{d(ct)}{d\tau} = c\gamma\nonumber$$ This completes the 4-vector for the relativistic velocity as $$u^\mu = (\gamma c,\gamma\vec v)\label{ev4}$$ To be a real 4-vector, however, we have to know what the invariant is such that for any transformation, we get the same answer for $$\begin{align} u^2 & =(u^0)^2 - (\vec u)^2 & \nonumber\\ &= \gamma^2 c^2 - \gamma^2 \vec v^2 \nonumber\\ &= c^2\gamma^2(1-\beta^2) \nonumber\\ &= c^2\nonumber \end{align}$$ which is consistent with Einstein's postulate that the speed of light is the same in all reference frames.
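A short numerical check that $u^\mu u_\mu = c^2$ regardless of the 3-velocity, sketched in Python with the $(+,-,-,-)$ sign convention used above. The sample velocities are arbitrary values (all below $c$) chosen for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def four_velocity(v: tuple[float, float, float]) -> tuple[float, float, float, float]:
    """Return u^mu = (gamma c, gamma v_x, gamma v_y, gamma v_z) for a 3-velocity in m/s."""
    v2 = sum(comp * comp for comp in v)
    gamma = 1.0 / math.sqrt(1.0 - v2 / C**2)
    return (gamma * C, gamma * v[0], gamma * v[1], gamma * v[2])


def minkowski_square(a: tuple[float, float, float, float]) -> float:
    """(a^0)^2 - (a^1)^2 - (a^2)^2 - (a^3)^2."""
    return a[0] ** 2 - a[1] ** 2 - a[2] ** 2 - a[3] ** 2


for v in [(0.0, 0.0, 0.0), (1.0e8, 0.0, 0.0), (1.2e8, -1.5e8, 0.9e8)]:
    u = four_velocity(v)
    print(v, minkowski_square(u) / C**2)  # the ratio should be 1 every time
```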

Energy and Momentum

What we have learned above is that space and time merge into space-time, that coordinates in reference frames that move at constant relative velocity are related through the Lorentz transformation, that true space-time coordinates are described by a 4-vector that has time and spatial components, and that 4-vectors have invariant "lengths" in space-time analogous to the invariant length of a 3-vector in space described by some coordinate system.

We can explore this a bit more, and it will take us somewhere great, starting with equation $\ref{eqgamma}$: $$\gamma = \frac{1}{\sqrt{1-\beta^2}}\nonumber$$ If we square and get rid of the fraction, we have the following equation: $$\gamma^2-\gamma^2\beta^2=1\nonumber$$ This is beginning to look like an invariant when compared to equation $\ref{eq9}$. In fact, if we multiply by $c^2$ and use the fact that $\beta c=v$ ($v$ is the relative velocity), where by $v$ we mean $\sqrt{v_x^2+v_y^2+v_z^2}$, we have the following equation: $$c^2 = (\gamma c)^2 - (\gamma v_x)^2- (\gamma v_y)^2- (\gamma v_z)^2\nonumber$$ This strongly hints that we can invent a 4-vector that has units of velocity: $$u^\mu = (\gamma c,\gamma \vec v)\nonumber$$ with an invariant $c^2$. But this is the relativistic velocity as defined in the preceding section, so what we have shown here just gives more justification for the 4-velocity being a real 4-vector. But does it have physical significance? That is not so clear - what does it mean that a 4-vector invariant is the speed of light?

To see that significance more clearly, imagine that the 4-velocity describes a particle with mass $m$ moving with velocity $\beta$ along some direction. If we multiply the 4-velocity by the mass $m$, we have the following quantity: $$p^\mu \equiv m u^\mu = (\gamma mc,\gamma m\vec v)\nonumber$$ Actually, the way most people define the 4-momentum is with one more factor of $c$, so that every component has units of energy: $$p^\mu \equiv mc\,u^\mu = (\gamma mc^2,\gamma mc\vec v)\nonumber$$ If we write the time component as $$E\equiv p^0 = \gamma mc^2\label{eq20}$$ and the spatial momentum as $\vec p=\gamma m\vec v$, with magnitude $$p=m\gamma\beta c\label{eq21}$$ we would have the 4-vector $$p^\mu = (E,\vec pc)\label{e4p}$$ and the very useful equation $$\beta = pc/E\label{ebep}$$ $p^\mu$ then has units of energy (mass times velocity squared), and its invariant is $p^\mu p_\mu = (mc)^2u^\mu u_\mu = (mc^2)^2$. So if you take $p^\mu p_\mu$ you get the equation: $$(mc^2)^2 = E^2 - (pc)^2\nonumber$$ or more usefully: $$E^2 = (pc)^2 + (mc^2)^2\label{eepm}$$
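Equations $\ref{eq20}$, $\ref{eq21}$, $\ref{ebep}$ and $\ref{eepm}$ can be checked together with a few lines of Python. This sketch assumes energies and momenta expressed in MeV (momentum quoted as $pc$) and uses the electron rest energy of about 0.511 MeV purely as an example particle.

```python
import math


def energy_momentum(m_c2: float, beta: float) -> tuple[float, float]:
    """Return (E, pc) from E = gamma m c^2 and p = gamma m beta c, in the units of m_c2."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * m_c2, gamma * m_c2 * beta


M_ELECTRON = 0.51099895  # electron rest energy in MeV
for beta in (0.1, 0.9, 0.999):
    E, pc = energy_momentum(M_ELECTRON, beta)
    invariant = math.sqrt(E**2 - pc**2)  # should recover m c^2 at every speed
    print(f"beta={beta}: E={E:.4f} MeV, pc={pc:.4f} MeV, "
          f"sqrt(E^2-(pc)^2)={invariant:.6f} MeV, pc/E={pc / E:.4f}")
```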

This is a very different form than the classical equation $$KE = \half mv^2\nonumber$$ but then that should not be a surprise, because the equation for $KE$ is really only valid for the non-relativistic situation where $v\lt\lt c$. Now, equation $\ref{eepm}$ is pretty interesting when you consider the scale of the 3 terms, especially in the nonrelativistic limit. Let's say you have a particle with a mass of $1 kg$ moving with velocity $v=3,600mph = 1mi/sec$. The momentum of such a particle would be pretty large: $$p = 1kg\times 1mi/s\times 5280ft/mi\times 1m/3.281ft = 1,609 kg\cdot m/s\nonumber$$ To compare with the mass term, we would take the quantity $mc=1kg\times 3\times 10^8m/s=3\times 10^8 kg\cdot m/s$, which is around $10^5$ times larger. The energy associated with the mass would be $mc^2=9\times 10^{16}Joules$, or $25\times 10^9 kW\cdot hr$ (kilowatt-hours) of energy, roughly equivalent to the total amount of energy consumed in the US in 8 hours in the year 2018, an unbelievably enormous amount of energy from $1kg$!
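The arithmetic in the 1 kg example can be reproduced in a few lines. This Python sketch uses the same rounded inputs as the text ($c=3\times 10^8\,m/s$, 5280 ft/mi, 3.281 ft/m) plus the standard conversion $1\,kW\cdot hr = 3.6\times 10^6\,J$.

```python
C = 3.0e8              # m/s, the rounded value used in the text
m = 1.0                # kg
v = 1.0 * 5280 / 3.281 # 1 mi/s converted to m/s
p = m * v              # classical momentum is fine here since v << c
mc = m * C
rest_energy_J = m * C**2
rest_energy_kWh = rest_energy_J / 3.6e6   # 1 kWh = 3.6e6 J

print(f"p = {p:.0f} kg m/s, mc = {mc:.1e} kg m/s, mc/p = {mc / p:.2e}")
print(f"mc^2 = {rest_energy_J:.1e} J = {rest_energy_kWh:.2e} kWh")
```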

Another example of the incredible consequence of $mc^2$ is the energy in nuclear explosions. In WW2, Hiroshima, Japan was subjected to a uranium bomb in which approximately 700 mg ($0.7\,g$, or $7\times 10^{-4}\,kg$) of mass was converted to energy. Using Einstein's famous formula, that released $E=7\times 10^{-4}\times (3\times 10^8)^2\approx 6.3\times 10^{13} Joules$ of energy. That is a very large amount of energy, equivalent to the output of a 2-gigawatt nuclear power plant running for about 9 hours.

So it makes sense to expand equation $\ref{eepm}$ in the limit $p\lt\lt mc$, which we can do by writing it as $$\begin{align} E & = mc^2\sqrt{1+(p/mc)^2} \nonumber \\ & = mc^2(1+(p/mc)^2)^{\half} \nonumber \\ & \sim mc^2(1+\half(p/mc)^2)\nonumber \\ & = mc^2 + \frac{p^2}{2m}\nonumber \end{align}$$ If we use $KE=p^2/2m$, then we have the relativistic relation $E=mc^2 + KE$ in the nonrelativistic limit. The quantity $mc^2$ is usually referred to as the "rest energy" (with $m$ the "rest mass"), but it can also be referred to as the "proper" energy, since in the rest frame of the particle $\vec v=0$, so $\vec p=0$ and the energy is given by $E=mc^2$. More generally, the total energy of a particle is given by $$E=\gamma mc^2\label{emc2}$$ Some people like to write $m=m_0\gamma$, where $m_0$ denotes the rest mass, in which case $E=mc^2$ is correct for all values of $\gamma$, and then say that as you approach the speed of light your mass increases, so that at the speed of light your mass would be infinite. This is a bit of a stretch: your mass certainly does not increase as you go faster; however, if someone on the ground were to find a way to measure your mass, they would measure a "time-dilated" mass that increases as $m\gamma$. What's really happening is that as you add energy to a system to make it go faster, the velocity creeps ever closer to $c$ but barely increases, and nearly all of the added energy goes into increasing $\gamma$.
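One way to see how good the approximation $E\approx mc^2+p^2/2m$ is: compare the exact kinetic energy $E-mc^2$ with $p^2/2m$ at a few speeds. This Python sketch works in units of $mc^2$ (an assumption made for convenience). The exact value is computed as $(pc)^2/(E+mc^2)$, which is algebraically identical to $E-mc^2$ but avoids the round-off from subtracting two nearly equal numbers.

```python
import math


def kinetic_energies(beta: float, m_c2: float = 1.0) -> tuple[float, float]:
    """Return (exact KE, p^2/2m) for a particle with speed beta, in the units of m_c2."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    E = gamma * m_c2
    pc = gamma * m_c2 * beta
    ke_exact = pc**2 / (E + m_c2)      # equals E - mc^2 since E^2 - (mc^2)^2 = (pc)^2
    ke_approx = pc**2 / (2.0 * m_c2)   # p^2/2m written as (pc)^2 / (2 m c^2)
    return ke_exact, ke_approx


for beta in (0.001, 0.1, 0.5, 0.9):
    exact, approx = kinetic_energies(beta)
    print(f"beta={beta}: exact/approx = {exact / approx:.7f}")
```

The ratio works out to exactly $2/(\gamma+1)$: equal to 1 within a few parts in $10^7$ at $\beta=0.001$, but only about 0.61 at $\beta=0.9$.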

Momentum 4-vector

Given the 4-dimensional momentum, we can apply the Lorentz equations to find the relationship between the momentum in frame $O$ and the momentum in frame $O'$. We do this by using the same Lorentz transformation that we used for the position 4-vector $x^\mu = (ct,x,y,z)$ to get the equations: $$\begin{align} p_x &=\gamma (p'_x+\beta E'/c)\label{eq15p}\\ p_y &=p'_y\label{eq16p}\\ p_z &=p'_z\label{eq17p}\\ E/c&=\gamma (E'/c+\beta p'_x)\label{eq18p} \end{align}$$ and the corresponding reverse transformation $$\begin{align} p'_x &=\gamma (p_x-\beta E/c)\label{eq15}\\ p'_y &=p_y\label{eq16}\\ p'_z &=p_z\label{eq17}\\ E'/c &=\gamma (E/c-\beta p_x)\label{eq18} \end{align}$$ Let's check this by considering the decay of particle (1) into two particles (2) and (3): 1→2+3. Let the frame $O'$ be the proper frame of particle (1), which would mean that we would have $E'=mc^2$ and $\vec p\!'=0$ in that frame, or in our notation $p'^\mu =(mc^2,\vec 0)$. In the lab frame, where we measure the momenta of the two "daughter" particles, we would have

$$p^\mu_2=(E_2,\vec p_2c)\nonumber$$ $$p^\mu_3=(E_3,\vec p_3c)\nonumber$$

In frame $O$, the daughter 4-momenta add as 4-vectors to give the total 4-momentum $p_{tot}^\mu = (E_2+E_3,(\vec p_2 + \vec p_3)c)$. If momentum is conserved in 3 spatial dimensions, and if the 4-momentum has an invariant, then it's natural to expect that the 4-momentum is also conserved (which is really just a shorthand way of saying that energy and the momentum vector are both conserved). In frame $O'$ the invariant is $(mc^2)^2$, so we can now make use of the postulates of relativity and the property of invariants to get $$(mc^2)^2=(E_2+E_3)^2- c^2[(p_{x2}+p_{x3})^2+ (p_{y2}+p_{y3})^2+ (p_{z2}+p_{z3})^2]\nonumber$$

This can easily be checked by particle physicists measuring, for example, the decay $\psi\to\mu^+ \mu^-$, where we measure the 4-momenta of the 2 muons and see if they form the "invariant mass" of the neutral $\psi$ meson. As you can imagine, this has been verified to a very high precision for any measurable decay in such experiments, and particle physicists have tested special relativity to great accuracy.
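The invariant-mass check can be simulated end to end: decay a $\psi$ at rest into two back-to-back muons, boost both muons into a "lab" frame with equations $\ref{eq15p}$-$\ref{eq18p}$, and recombine them. This Python sketch assumes approximate masses (about 3096.9 MeV for the $J/\psi$ and 105.66 MeV for the muon), energies and momenta in MeV with momenta quoted as $pc$, and an arbitrary decay angle and boost $\beta=0.6$.

```python
import math

M_PSI = 3096.9   # J/psi rest energy in MeV (approximate)
M_MU = 105.66    # muon rest energy in MeV (approximate)


def boost_x(E: float, pc: tuple[float, float, float], beta: float):
    """Transform (E', p'c) from O' to O for O' moving with +beta along x (c kept as pc)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    px, py, pz = pc
    return gamma * (E + beta * px), (gamma * (px + beta * E), py, pz)


def invariant_mass(*particles) -> float:
    """sqrt((sum E)^2 - |sum pc|^2) for (E, (px, py, pz)) tuples."""
    E_tot = sum(E for E, _ in particles)
    p_tot = [sum(p[i] for _, p in particles) for i in range(3)]
    return math.sqrt(E_tot**2 - sum(comp * comp for comp in p_tot))


# In the psi rest frame the muons share the energy equally and fly back to back
E_mu = M_PSI / 2
p_mu = math.sqrt(E_mu**2 - M_MU**2)
angle = 0.7  # arbitrary decay angle in the x-y plane
mu_plus = (E_mu, (p_mu * math.cos(angle), p_mu * math.sin(angle), 0.0))
mu_minus = (E_mu, (-p_mu * math.cos(angle), -p_mu * math.sin(angle), 0.0))

# "Lab" frame: the psi moves along x with beta = 0.6
lab = [boost_x(E, p, 0.6) for E, p in (mu_plus, mu_minus)]
print(f"invariant mass in the lab frame: {invariant_mass(*lab):.3f} MeV")
```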

One interesting aside: if a particle has no mass, then $E=pc$. This is consistent with $\beta =\frac{pc}{E}=1$, but then $\gamma\to\infty$ while $m=0$, so we cannot use $p=\gamma mv$. Evidently the correct way to think of momentum is via equations (\ref{eq20}) and (\ref{eq21}): instead of $p=mv$ we can use $pc=E\beta$, where $E$ is the relativistic energy, which for a massive particle reduces to $E=mc^2$ in the proper frame.

$c=1$

That the speed of light is the same in all reference frames is remarkable. That its numerical value is so large is simply a reflection of the units that we humans are used to: meters and seconds. If we were to define a unit of length $300,000,000$ times larger than the meter we use today (let's call it an albert, abbreviated as 1 "al", with $1 al = 3\times 10^8 m$), then the speed of light would be $c=1 al/s$. And since $c=1$, we can leave it out of any equation and not have to write it. This is actually what we do in physics all the time, just to make things easier. This gives us the 4-momentum $$p^\mu = (E,\vec p)\nonumber$$ and the invariant equation $$E^2 = p^2 + m^2\nonumber$$ Easy! The way we keep track of things is that when we want to actually calculate, and we get an answer that has units of mass, we multiply by $c^2$, and if we have units of momentum, we multiply by $c$ to get units of energy. So mass, energy, and momentum all have the same units. The same goes for position and time: time has units of length, and if we calculate a formula that we know is a time (as in $t=...$), we put back the factors of $c$ as needed. So to keep things simple, for everything below, we will work in the units $c=1$.
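Here is what a calculation in $c=1$ units looks like in practice, sketched in Python. The proton mass (about 938.272 MeV) and the momentum of 1000 MeV are example values; $c$ only reappears at the end, when the dimensionless $\beta$ is turned back into a speed in m/s.

```python
import math

C = 299_792_458.0  # m/s, used only to restore units at the end

# Natural units (c = 1): mass, energy and momentum all in MeV
m = 938.272   # proton mass, approximate
p = 1000.0    # example momentum
E = math.sqrt(p**2 + m**2)   # E^2 = p^2 + m^2
beta = p / E                 # beta = p/E once c = 1
gamma = E / m                # E = gamma m

# beta is dimensionless, so restoring c gives the speed in SI units
print(f"E = {E:.1f} MeV, beta = {beta:.4f}, gamma = {gamma:.4f}, v = {beta * C:.3e} m/s")
```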

Space-time Diagrams

The demise of both absolute simultaneity and the concept of an absolute reference frame means that the universe is full of an infinite number of equally relevant reference frames, all moving with relative velocity, and all with their own proper times. Seems like a mess, especially when it comes to understanding accelerations. What Minkowski did in his seminal paper in 1907 was to address this and come up with a framework for understanding it, built on the concept of not just space and time, but a unification called space-time.

As discussed above, in space-time, we deal with "events" as having 4 coordinates: 3 spatial, 1 temporal. As noted in equations such as (\ref{eq4})-(\ref{eq6}), the spatial and time coordinates are mixed up, but not completely: the coordinates transverse to the direction of motion remain unchanged when the reference frame is changed (when you "boost" into a different frame). So it's really the longitudinal coordinate (along the direction of motion) and the time coordinate that are mixed up, and this suggests that we can analyze space-time and understand the Lorentz transformations visually by plotting distance (along the direction of motion) against time. For historical and esthetic reasons (and some not very important technical reasons), we show the plot as $t$ vs $x$ instead of $x$ vs $t$, where $x$ is along the direction of motion (see the diagram below).

The velocity of a particle in frame $O$ is given by the ratio of the distance $\delta x$ traveled over a time interval $\delta t$, as measured in $O$. If you were to plot distance along the vertical and time along the horizontal, the velocity would be the slope of the curve (for constant velocities, the curve would be a straight line). In our space-time plot, since we are plotting time along the vertical and distance along the horizontal, the velocity is the inverse of the slope of the curve. This should be pretty easy to picture - a particle that has stopped will be at constant position (constant "x"), and with time ticking on, the curve that traces out such a path would be vertical with an infinite slope.

Interestingly, a particle at constant time that is also moving would trace out a horizontal line parallel to the $x$-axis. Such a particle would be traveling at an infinite velocity - this is not allowed! If we use units of $c=1$, then the fastest velocity would be $v=1$, which means a line with a slope of 1: $\delta x = \delta t$ (remember, $c=1$).

Let's look at all the features of this new way of visualizing space-time, as in the animation below. The dashed blue lines show the path of a beam of light, bisecting the x and t axes at 45° ($\delta x=\delta t$). The dashed lines go through the origin ($x=t=0$) by construction. In the proper frame of the particle, it is standing still, so if we set $x=0$ in that frame, it will stay that way. Time, however, always keeps marching on even in the proper frame, and so the origin represents a particular space-time event for this particle. The beam of light could be going from the origin towards positive or negative positions, so we need to draw 2 bisectors. And since the beams come from the past and go into the future, they have to cover times for which $t\lt 0$ and $t\ge 0$. So the upper yellow part shows all the points in space-time that a particle at the origin at $t=0$ could conceivably reach if it could go fast enough, but never faster than light. That's why all of the yellow positions satisfy $v\lt 1$, or $|x| \lt t$ (remember, $c=1$), and represent the possible future space-time positions of a particle that started at our origin. The bottom yellow area shows all of the positions that could have reached the origin of our plot with $v \lt 1$, so it represents the past. In the figure below, you can vary the value of $\beta$, which represents the velocity of $O'$ in the $O$ frame.


The space-time plot above can be used to visualize Lorentz transformation in a geometrical way. That plot shows perpendicular space and time axes (we are suppressing the other 2 spatial dimensions to make things manageable). What "perpendicular" means is that space and time are independent - constant space coordinates and constant time coordinates are both possible independently of each other. For instance, in the above plot, a vertical line at some position $x$ shows the collection of space-time events that all happen at the same spatial location. Similarly, a horizontal line shows the space-time curve for a series of events that all happen at the same time. The 45° line shows the curve for a collection of space-time points that are all connected to the (arbitrary) origin by the velocity of light: all points along the curve are where a light ray could get to (along this 1 dimension) in any given time.

Now what we want to do is to understand what Lorentz transformations look like in terms of space-time curves. Let's start with the stationary reference frame $O$, and the moving frame $O'$ just like in the section above (moving with velocity $\beta$ along the $x$ axis). The equations relating position and time between the two frames are the same equations (\ref{eq4})-(\ref{eq6}) above, and the inverse equations (\ref{eq4p})-(\ref{eq6p}), where we are using the units where $c=1$.

Let's start with an easy, straightforward question: What is the locus of all points in $O$ that are at a constant position $x'$ in $O'$?

Or in other words, if we have a curve in $O'$ where all of the points are at a constant position, how does the curve transform to $O$? Of course in frame $O'$, that locus would be a vertical line - and so could represent the $t'$ axis - but we want to see the space-time curve in frame $O$, since we know that space and time are "mixed up" in going from one reference frame to another.

An easy way to answer this question is to start with equation (\ref{eq4p}) and form the difference equation:

$\delta x' = \gamma (\delta x - \beta \delta t)$

If $x'$ is constant, then $\delta x'=0$ and the above reduces to the equation

$\delta x = \beta \delta t$

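This is clearly a straight line with slope $\beta$, or in the $t-x$ plane, a straight line with slope $1\!/\!\beta$: the $t'$ axis as seen in $O$. To find the $x'$ axis, the locus of constant $t'$, start instead from the inverse transformation for time and form the difference equation: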

$\delta t' = \gamma (\delta t - \beta \delta x)$

and set $\delta t'=0$ (constant $t'$) to get the equation

$\delta t = \beta \delta x$

This is another straight line, this time with slope $\beta $ in the $t-x$ plane. Click the button labeled "Toggle $t$" above to see such a line drawn in red onto the space-time plot above. (Also, $\beta$ is set arbitrarily to 0.2 here.)

A few things worth pointing out here:

The two red curves represent constant $x'$ and constant $t'$, so we can arbitrarily set them at $x'=0$ and $t'=0$; they are then the $t'$ and $x'$ axes of $O'$ as drawn in $O$, which shows that these two straight lines represent the Lorentz transformation! Evidently, a Lorentz transformation is not a rotation in the $t-x$ plane, but more of a "squeezing" of the space-time axes of $O'$ relative to the vantage point of $O$.

This makes sense if you want to keep one of the basic premises of relativity intact: the speed of light $c$ is the same in all frames. You can see this clearly here: the light line bisects both the angle between the $t$ and $x$ axes and the angle between the $t'$ and $x'$ axes!

In the above, we are only drawing the positive $t'$ and $x'$ axes.

What do events that are simultaneous in $O'$ look like on the space-time plot of frame $O$? Simultaneous means that they all happen "at the same time" (in that frame). That means that the time $t'$ is constant, no longer a variable, and that simplifies the Lorentz equations tremendously. To see this, start with equations (\ref{eq4}) and (\ref{eq6}), holding $t'$ as some constant number, eliminate $x'$, and solve for $t=t(x)$. When you do this, you should get the following interesting equation:

$t = \beta x + t'/\gamma$

This equation is completely understandable: if we are considering frames where the relative velocity $\beta $ is very small, even 0, then we get $t=t'$ as we should. If we consider simultaneous events in $O'$ where we set $t'=0$, then we recover the boosted $x'$ axis as derived above. As we change $t'$ to some other arbitrary value, the higher the value the higher the "y-intercept" of the function $t(x)$, but the slope is the same: $\beta $. This makes perfect sense - after all, $O'$ is moving with velocity $\beta $ with respect to $O$!
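The constant-$t'$ and constant-$x'$ lines can be checked numerically, much like the simulation described next. This Python sketch (with $c=1$) maps random events that are simultaneous in $O'$ into $O$ and confirms that they all lie on the line $t=\beta x+t'/\gamma$, obtained by eliminating $x'$ from the two Lorentz equations; it also checks that a world line at fixed $x'$ has $\delta x/\delta t=\beta$ in $O$. The values $\beta=0.2$ and $t'=0.2$ are arbitrary, and `to_O` is a helper name introduced only for this sketch.

```python
import math
import random


def to_O(x_p: float, t_p: float, beta: float) -> tuple[float, float]:
    """Map an event (x', t') in O' to (x, t) in O with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (x_p + beta * t_p), gamma * (t_p + beta * x_p)


beta, t_prime = 0.2, 0.2
gamma = 1.0 / math.sqrt(1.0 - beta**2)
random.seed(1)
events = [to_O(random.uniform(-1.0, 1.0), t_prime, beta) for _ in range(5)]

for x, t in events:
    # Events simultaneous in O' should lie on t = beta*x + t'/gamma in O
    print(f"x={x:+.4f}  t={t:+.4f}  beta*x + t'/gamma={beta * x + t_prime / gamma:+.4f}")

# A world line at fixed x' = 0.5, sampled at two values of t'
(x0, t0), (x1, t1) = to_O(0.5, 0.0, beta), to_O(0.5, 1.0, beta)
print("dx/dt along constant x':", (x1 - x0) / (t1 - t0))
```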

The following simulation generates a bunch of random points $x'$ (in yellow), with a fixed $t'$ and boost $\beta$, both programmable (use the sliders below). You can get a good feel for the idea of simultaneity by playing with the parameters. The button labelled "World Lines" will draw the world line of each point (in red). Each world line stays at a constant value of $x'$ in $O'$, so in $O$ it is a straight line parallel to the $t'$ axis (just as a world line at constant $x$ in $O$ is a vertical straight line).


Before leaving this subject, it is interesting to consider space-time curves relative to the speed of light $c$. Since the principles of relativity tell us that $c$ is the maximum velocity in space-time, we should characterize space-time intervals according to whether the ratio $|\delta x|\!/\!\delta t$ is greater or less than $c$ (or $1$ in the units $c=1$). The former are called "space-like", and the latter "time-like". Why these names? Because for paths that are space-like, you could always boost into a reference frame where the entire interval is along the $x$-axis, and for paths that are time-like, you could always boost into a reference frame (the "proper frame") where the entire interval is along the $t$-axis. One can restate the principle of relativity to say that massive objects are only allowed to follow time-like curves.
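The classification can be written as a sign test on the invariant $\delta t^2-\delta x^2-\delta y^2-\delta z^2$ (with $c=1$). This Python sketch includes the boundary case $|\delta x|/\delta t=1$, usually called "light-like" (or null); the name `classify_interval` and the small tolerance are choices made for this sketch.

```python
def classify_interval(dt: float, dx: float, dy: float = 0.0, dz: float = 0.0,
                      tol: float = 1e-12) -> str:
    """Classify a space-time separation by the sign of s^2 = dt^2 - dx^2 - dy^2 - dz^2 (c = 1)."""
    s2 = dt**2 - dx**2 - dy**2 - dz**2
    if s2 > tol:
        return "time-like"   # slower than light: a proper frame exists
    if s2 < -tol:
        return "space-like"  # faster than light: some frame makes the events simultaneous
    return "light-like"      # exactly on the 45 degree light line


print(classify_interval(2.0, 1.0))  # time-like
print(classify_interval(1.0, 2.0))  # space-like
print(classify_interval(1.0, 1.0))  # light-like
```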

Barn and Ladder Paradox

The classic special relativity paradox involves a guy running with a ladder into a barn, with the ladder held horizontal along the direction of motion. When you lay the ladder down beside the barn, it is longer than the barn along that direction. Like this:

Both of the barn doors are closed (thick blue lines). The ladder is clearly longer than the barn along the ladder's direction, so one can conclude that the ladder will not fit inside the barn.

Now comes the paradox. Let the barn be in the $O$ frame, and the ladder in frame $O'$, moving along the $x$ axis (here left to right) with velocity $\beta$. In frame $O$, an observer would measure the ladder to be shorter along the $x$ axis by a factor of $\gamma$ from the Lorentz contraction. That person would conclude that it is indeed possible to have the ladder completely enclosed inside the barn. But the person in frame $O'$ who is running with the ladder would see the barn moving towards them with velocity $\beta$, and so would measure the barn to be Lorentz contracted by the same factor $\gamma$. That person would conclude that there's no way the ladder can fit inside the barn! Such is the paradox, and such is the power of space-time diagrams to resolve it!

The key to understanding the paradox has to do with the idea of simultaneity, since being "entirely inside the barn" means that the ladder is inside the barn with both doors closed at the same time.

Below, we can draw the space-time situation for the ladder and the barn, reviewing what we've learned about how to visualize the Lorentz transformation.

Frame $O$ is drawn with perpendicular $x$ and $t$ axes. Frame $O'$ is moving with velocity $\beta$ with respect to $O$.
The $x'$ axis will have a slope $\beta$ with respect to the $x$ axis, and $t'$ will have the same slope $\beta$ with respect to the $t$ axis, as drawn in $O$.
Objects that are sitting still in $O'$ are drawn parallel to the $x'$ axis, reflecting the fact that their lengths are determined by measuring the endpoints in $O'$ at the same time. Those objects will sweep out "world sheets" that have slopes parallel to the $t'$ axis. Note that this is just saying that objects sitting still in $O'$ will be moving with velocity $\beta$ relative to $O$.
Objects that are sitting still in $O$ will be drawn parallel to the horizontal $x$ axis in the space-time plot. Note that the endpoints of those objects are determined by measuring the coordinates of the endpoints in $O$ at the same time (same time in $O$).


The $x'$ and $t'$ axes are drawn in dashed lines. The barn is an object that is not moving with respect to $O$ (it is stationary in $O$), so we draw the world lines of both sides of the barn as vertical dotted blue lines. The ladder, which has a proper length that is longer than the width of the barn (as above), is stationary in $O'$, so we draw its endpoints as dotted red lines with slope $\beta$, parallel to the $t'$ axis. Now comes the important part: when we say that the ladder is completely inside the barn, we mean that both doors can be closed at the same time (in $O$, the proper frame of the barn) with the ladder inside. Let's define the "front" and "back" of the ladder relative to the direction of motion, which is along increasing $x$, and the same for the barn. We set up the initial situation where the front door of the barn is closed and the back door is open, so the ladder can enter the barn front end first. Just as the front of the ladder is about to crash into the front door (which is initially closed), we look to see where the back of the ladder is. If it's between the vertical blue lines, we can close the back door, the ladder is completely inside the barn, and then we can open the front door to let it keep going before it crashes into the door! Checking this means noting the position of the back of the ladder at the same time $t$ as the front reaches the front door: the two observations are "simultaneous", which of course is relative. What this amounts to is looking at where the world line of the back of the ladder intersects the line of simultaneity through that event (constant $t$), drawn in yellow; if that intersection is between the vertical blue lines, the ladder is inside the barn with both doors closed.

You can play with the simulation, change the velocity $\beta$, and see that unless the ladder is going fast enough, the Lorentz contraction is not great enough to have both doors closed at the same time in $O$ (the default value of $\beta =0.25$ is not fast enough!). The trick is to increase the velocity so that the horizontal yellow line is completely between the vertical blue lines. But the bottom line is that the ladder $can$ fit into the barn, because of the relativity of simultaneity: the person in frame $O$ will say that both doors were shut at the same time with the ladder inside, whereas the person in frame $O'$ will say that the ladder entered the barn with the front door shut, then the front door opened, the ladder moved on and poked out of the front of the barn, and only $then$, once the back of the ladder was inside, did the back door shut: the back door shut at a $later$ time $t'$ (in $O'$)! Relativity of simultaneity makes things weird!
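The barn and ladder bookkeeping can be done with a few lines of Python ($c=1$). The proper lengths below (ladder 1.2, barn 1.0) are hypothetical, since the figure's actual values are not given; the back door is assumed to sit at $x=0$ and the front door at $x=L_{barn}$. The sketch finds the minimum $\beta$ for the contracted ladder to fit in $O$, then uses $t'=\gamma(t-\beta x)$ to show that two door events simultaneous in $O$ happen in the opposite order described above when seen in $O'$.

```python
import math


def ladder_fits(L_ladder: float, L_barn: float, beta: float) -> bool:
    """True if the contracted ladder length L_ladder/gamma fits in the barn, as judged in O."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return L_ladder / gamma <= L_barn


def door_times_in_O_prime(L_barn: float, t0: float, beta: float) -> tuple[float, float]:
    """t' of two events simultaneous in O at time t0, using t' = gamma (t - beta x)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    t_back_closes = gamma * t0                    # back door at x = 0
    t_front_opens = gamma * (t0 - beta * L_barn)  # front door at x = L_barn
    return t_back_closes, t_front_opens


L_ladder, L_barn = 1.2, 1.0  # hypothetical proper lengths
beta_min = math.sqrt(1.0 - (L_barn / L_ladder) ** 2)
print(f"minimum beta for the ladder to fit: {beta_min:.3f}")
print("fits at beta=0.25?", ladder_fits(L_ladder, L_barn, 0.25))
print("fits at beta=0.6? ", ladder_fits(L_ladder, L_barn, 0.6))

back, front = door_times_in_O_prime(L_barn, t0=2.0, beta=0.6)
print(f"in O': front door opens at t'={front:.3f}, back door closes at t'={back:.3f}")
```

With these assumed lengths the ladder needs $\beta$ of at least about 0.55, and in $O'$ the front door opens before the back door closes.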

Proper Time (and the "Twin Paradox")

We now have a decent idea of what the 4-dimensional space-time coordinates are, how they transform, and the implications of mixing space and time in any reference frame. This section concerns the interval between two space-time events.

Let's start with 2 events in frame $O$. Event 1 is at coordinate ($x_1,t_1$) and event 2 is at coordinate ($x_2,t_2$). We can then define the intervals $\delta x = x_2-x_1$, $\delta t = t_2-t_1$. Now let's introduce frame $O'$, moving with velocity $\beta$ with respect to $O$ along the $x$-axis, with the $x$ and $x'$ axes parallel. From the point of view of an observer in $O'$, the two events will be measured to be at coordinates ($x'_1,t'_1$) and ($x'_2,t'_2$), and similarly we can define the intervals $\delta x' = x'_2-x'_1$, $\delta t' = t'_2-t'_1$. We know that the relationship between an event in $O$ and $O'$ is given by the Lorentz equations:

$x=\gamma(x'+\beta t')$
$t=\gamma(t'+\beta x')$

We also know that there is an invariant $\delta R$ such that

$\delta R^2 = \delta t'^2-\delta x'^2-\delta y'^2-\delta z'^2$

Consider the very specific case where the two events occur at the same place in $O'$, so that $\delta x' = 0$. Then $O'$ is the proper frame, and $\delta\tau = \delta t'$ is the proper time. Setting $\delta x'=0$ in the Lorentz equation for $t$ gives $\delta t=\gamma\cdot\delta\tau$, and since $\gamma\ge 1$, the time interval between the events is smallest in the proper frame. This minimal time is the proper time, and the equation $\delta t=\gamma\cdot\delta\tau$ describes the phenomenon called time dilation.

We can calculate the relativistic invariant in the frame $O'$ where the position does not change ($\delta x'=0$, and ignoring the $y'$ and $z'$ coordinates):

$\delta R^2 = \delta t'^2 - \delta x'^2 = \delta t'^2 \equiv \delta \tau^2$

Since the relativistic invariant $\delta R$ is the same no matter which reference frame you choose, we could just as well use frame $O$ to get:

$\delta R^2 = \delta t^2 - \delta x^2$

which means that $\delta\tau^2=\delta t'^2 - \delta x'^2 = \delta t^2 - \delta x^2$

This is the origin of the common notion that the proper time is the relativistic invariant, which is accurate if you define the proper time as the time in the frame where the object is not moving.
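A quick numerical check of this invariance, as a minimal sketch: it applies the Lorentz equations exactly as written above (with $c=1$ and only the $x$ direction), using hypothetical helper names `boost_to_O` and `interval_sq`. A clock at rest in $O'$ ($\delta x'=0$) shows $\delta t=\gamma\,\delta\tau$, and $\delta t^2-\delta x^2$ comes out equal to $\delta\tau^2$.

```python
import math

def boost_to_O(dx_p: float, dt_p: float, beta: float) -> tuple[float, float]:
    """Apply x = gamma(x' + beta t'), t = gamma(t' + beta x') to an interval (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return g * (dx_p + beta * dt_p), g * (dt_p + beta * dx_p)

def interval_sq(dx: float, dt: float) -> float:
    """delta R^2 = delta t^2 - delta x^2 (y and z ignored)."""
    return dt**2 - dx**2

# Two ticks of a clock at rest in O': delta x' = 0, so delta t' is the proper time.
dtau = 2.0
dx, dt = boost_to_O(0.0, dtau, beta=0.6)       # gamma = 1.25 at beta = 0.6
print(round(dx, 6), round(dt, 6))              # 1.5 2.5  (delta t = gamma * delta tau)
print(round(interval_sq(dx, dt), 6))           # 4.0      (= delta tau squared)
```

The same check works for an interval with $\delta x'\neq 0$: `interval_sq` gives the same number before and after `boost_to_O`, up to floating-point rounding.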

Note that in full 4-dimensional space, we can write the infinitesimal proper time interval $d\tau$ as

$d\tau = \sqrt{dt^2 - dx^2 - dy^2 - dz^2}$

One consequence, which the twin paradox below makes concrete, is that straight time-like paths (see above) maximize the proper time $\delta\tau$ between two events.

As an example, consider two observers, one in $O$ and the other in $O'$, where $O'$ moves with velocity $\beta_+$ relative to $O$ along parallel $x$-axes. The observer in $O'$ sits in some kind of spaceship, so the position $x'$ in $O'$ doesn't change: all space-time motion in $O'$ is along the $t'$ axis. The spaceship travels for a proper time $\delta\tau$ and then turns around, and the return velocity $\beta_-$ has the same magnitude as $\beta_+$ but the opposite direction (in $O$): $\vec{\beta_-}=-\vec{\beta_+}$. The spaceship eventually comes back to its starting place. This is the famous "twin paradox", and the space-time diagram is shown directly below.

In the figure below, we see the path of the spaceship: it starts at the origin ("A") and moves with constant velocity $\beta$ for some time $\delta t'$ (as measured in the frame of the spaceship, $O'$), reaching point "B", the destination. At that point, the spaceship decelerates to $\beta=0$ (in $O$), turns around, accelerates back up to speed $\beta$, and heads back to the origin, $x=0$, arriving at point "C". (Note: the deceleration would bend the spaceship's world line until it is momentarily vertical in the space-time diagram, but this is not shown; we simply reverse direction at point "B".)

As you can see in the diagram, $\delta t'$ is the "distance" from A to B along the $t'$ axis (this is the proper time for the person in the spaceship), $\delta x$ is the distance along the horizontal $x$-axis (how far the person traveled as measured in the $O$ frame, presumably on earth), and $\delta t$ is the "distance" along the vertical $t$-axis (how long the trip took as measured in the $O$ frame).

The units here are $c=1$, which means 1 second of time is equivalent to $3\times 10^8$m, or about 186,000 miles. Starting from the default velocity $\beta=0.25$ (changeable via the blue arrows), we see that for every 50 years that pass for the person in the spaceship in $O'$, the person in $O$ ages 51.64 years ($\gamma=1.0328$). If you increase the velocity to $\beta=0.95$, then $\gamma=3.20$ and the ratio becomes about 50 years to 160 years. So if the astronaut has a twin who is left behind, and both are 20 years old when the journey starts, an astronaut who returns at age 120 (100 years of ship time) would find the twin, if still alive, about 340 years old ($20+3.20\times 100\approx 340$). At the default $\beta=0.25$, a 50-year leg of ship time covers a distance of $\delta x=\beta\,\delta t=0.25\times 51.64\approx 12.9$ light-years. Alpha Centauri is around 4.35 light-years from earth and Sirius A is 8.6, and given a density of stars near the sun of about 0.004 per cubic light-year, there should be around 36 stars within 12.9 light-years.
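The numbers above follow from $\delta t=\gamma\,\delta\tau$ and $\delta x=\beta\,\delta t$ on each leg. Here is a minimal sketch that reproduces them, assuming (our choice, matching the arithmetic above) 50 years of ship time per leg, an instantaneous turnaround at B, years and light-years with $c=1$, and hypothetical function names `twin_legs` and `stars_within`. The star count is just density times the volume of a sphere, $\frac{4}{3}\pi r^3$.

```python
import math

def gamma(beta: float) -> float:
    return 1.0 / math.sqrt(1.0 - beta**2)

def twin_legs(beta: float, tau_leg: float) -> dict[str, tuple[float, float, float]]:
    """(delta tau, delta x, delta t) per leg in years / light-years, c = 1.
    Assumes an instantaneous turnaround at B (acceleration ignored, as in the figure)."""
    dt_leg = gamma(beta) * tau_leg   # time dilation: delta t = gamma * delta tau
    dx_leg = beta * dt_leg           # distance covered, measured in O
    return {
        "A->B": (tau_leg, dx_leg, dt_leg),
        "B->C": (tau_leg, -dx_leg, dt_leg),       # return leg, opposite direction
        "A->C": (2 * tau_leg, 0.0, 2 * dt_leg),   # back at x = 0
    }

def stars_within(radius_ly: float, density_per_ly3: float = 0.004) -> float:
    """Expected number of stars inside a sphere of the given radius."""
    return density_per_ly3 * 4.0 / 3.0 * math.pi * radius_ly**3

print([round(v, 2) for v in twin_legs(0.25, 50.0)["A->B"]])  # [50.0, 12.91, 51.64]
print(round(twin_legs(0.95, 50.0)["A->C"][2], 1))            # 320.3
print(round(stars_within(12.9)))                             # 36
```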

(Note: The total energy of a spaceship moving at velocity $\beta$ is $E=\gamma mc^2$, and the kinetic energy is the total minus the rest mass energy: $KE=(\gamma -1)mc^2$. At $\beta=0.95$, $\gamma=3.20$, so even for a spaceship as small as 1000kg (about 1 ton, the size of a VW bug), you would need around $2\times 10^{20}$ joules, or about $5.5\times 10^{13}$kW-hr, or over $6,000$GW-years, meaning 6,000 GW of power delivered for a full year. For comparison, humanity's total primary energy use in 2010 averaged roughly 16,000 GW, so this single trip would take over a third of everything humans consumed that year.)
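The unit conversions in this note are easy to get wrong, so here is a short sketch of the arithmetic. It assumes $c=3\times 10^8$ m/s as in the text, $1\,$kW-hr $=3.6\times 10^6$ J, and a 365.25-day year for the GW-year conversion (the year length is our choice); `kinetic_energy` is a hypothetical helper name.

```python
import math

C = 3.0e8                                  # speed of light in m/s, as used in the text
KWH = 3.6e6                                # joules per kilowatt-hour
GW_YEAR = 1e9 * 365.25 * 24 * 3600         # joules delivered by 1 GW for one year

def kinetic_energy(mass_kg: float, beta: float) -> float:
    """Relativistic kinetic energy KE = (gamma - 1) m c^2, in joules."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return (gamma - 1.0) * mass_kg * C**2

ke = kinetic_energy(1000.0, 0.95)
# about 2e20 J, about 5.5e13 kWh, a bit over 6,000 GW-years
print(f"{ke:.2e} J, {ke / KWH:.2e} kWh, {ke / GW_YEAR:,.0f} GW-years")
```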

The paradox part of the "twin paradox" comes from the fact that along each leg, A→B and B→C, each observer would measure the same time dilation effect: the observer in $O$ would measure the clock in $O'$ to run "slower", and the observer in $O'$ would measure the clock in $O$ to run "slower", yet at the end of the round trip the clocks clearly disagree. The resolution of the paradox comes from realizing that the two frames are not equivalent: at point B, the spaceship decelerates until it is no longer moving with respect to an observer in $O$, turns around, and accelerates back up to speed $\beta$, moving towards the observer at point A in $O$. The observer in $O$ never decelerates or accelerates at any point, and this makes all the difference, resolving the "paradox".

[Interactive twin-paradox diagram: adjustable $\beta$ (default 0.25, $\gamma$=1.03), with a table of $\delta t'=\delta\tau$, $\delta x$, and $\delta t$ along A→B, B→C, and A→C.]

Prelude to General Relativity

The first thing to do to prepare for general relativity is a bit of mathematics, and to start we consider the 4 components of space-time as a 4-dimensional vector. We can write the 4-vector as $x^\mu \equiv (t,\vec x)$, where $x^0=t$ and $x^i$ with $i=1,2,3$ are the 3 components of the 3-dimensional vector $\vec x$. As is traditional for vectors, we write the index as a superscript.

It is interesting to write the space-time invariant definition equation (\ref{eq9}) for intervals in terms of some kind of sum over the components of the interval 4-vector:

$\delta x^\mu\equiv(\delta t,\delta \vec{x}\!)$

The invariant is given by:

$\delta R^2 = \delta t^2-\delta x^2-\delta y^2 - \delta z^2 = (\delta x^0)^2-(\delta x^1)^2-(\delta x^2)^2-(\delta x^3)^2$

where we have to keep straight that in writing something like $(\delta x^m)^n$, the $m$ is an index and the $n$ is a power.

So the exercise here is to form the invariant $\delta R^2$ using these new 4-vectors, which means we need to have a way to mix $\delta x^\mu$ and $\delta x^\nu$ with a matrix $\eta_{\mu\nu}$ where when we sum over the indices $\mu$ and $\nu$ we get the right answer for the invariant $\delta R^2$:

$\delta R^2 = \sum\limits_{\mu,\nu=0}^3\eta_{\mu\nu}\delta x^\mu\delta x^\nu$, where the indices $\mu$ and $\nu$ each run from 0 to 3.

Equating the two formulae for $\delta R^2$ we can see that $\eta_{00}=+1$, $\eta_{11}=\eta_{22}=\eta_{33}=-1$, and all other components of $\eta$ are identically $0$.

The matrix $\eta$ is called the metric, and we will follow the usual convention (the Einstein summation convention) of leaving off the $\sum$ symbol whenever the same index appears as both a subscript and a superscript, and assuming the sum is there. In other words, when we write something like $a^\mu b_\mu$, the repeated index tells us it really means $\sum\limits_{\mu=0}^3 a^\mu b_\mu$.

Note that it would be perfectly ok to choose the opposite overall sign for the invariant, that is, $\delta R^2 = -\delta t^2+\delta x^2+\delta y^2 + \delta z^2$, in which case the metric would have $\eta_{00}=-1$, $\eta_{11}=\eta_{22}=\eta_{33}=+1$, and all other components of $\eta$ identically $0$. The way we've defined it is often referred to as the "West Coast metric" ($+---$), with the "East Coast metric" being ($-+++$). I like the West Coast metric since it favors time over distance, and given the large value of the speed of light, that makes sense: $c\delta t$ is pretty much always bigger than $\delta r$. It also means that in the proper frame the invariant $\delta R$ is the same as the proper time $\delta\tau$, whereas with the East Coast metric $\delta\tau^2=-\delta R^2$. In the end, either one is ok; the important thing is to be consistent.

If we want to use real matrix notation, we have to be a bit specific. The vector $x^\mu $ is a regular vector, and has either 4 rows and 1 column (4x1) or 1 row and 4 columns (1x4). Which one? What people usually do is to define the 4-vector $x^\mu $ as a 4x1 object:

$x^\mu = \begin{pmatrix} t \\ x \\ y \\ z\end{pmatrix}$

and write the metric $\eta $ as a 4x4 object: $\eta_{\mu\nu}= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}$

To form an invariant, which is a scalar (a 1x1 object), we need to use the transpose:

$(x^\mu)^T=\begin{pmatrix} t & x & y & z\end{pmatrix}$

so that we get a scalar: (1x1) = (1x4)(4x4)(4x1), or equivalently:

$\delta R^2 = x^T\cdot\eta\cdot x$.
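To see that the double sum $\eta_{\mu\nu}\delta x^\mu\delta x^\nu$ and the matrix product $x^T\cdot\eta\cdot x$ are the same thing, here is a small plain-Python sketch (no external libraries; the function names are hypothetical). It uses the West Coast signature $(+,-,-,-)$ with $c=1$ and treats a 4-vector as a list $(t,x,y,z)$ playing the role of the 4x1 column.

```python
# Minkowski metric eta_{mu nu}, "West Coast" signature (+,-,-,-), c = 1.
ETA = [[1, 0, 0, 0],
       [0, -1, 0, 0],
       [0, 0, -1, 0],
       [0, 0, 0, -1]]

def interval_sq_sum(dx):
    """delta R^2 = sum over mu, nu of eta_{mu nu} dx^mu dx^nu (explicit double sum)."""
    return sum(ETA[mu][nu] * dx[mu] * dx[nu] for mu in range(4) for nu in range(4))

def interval_sq_matrix(dx):
    """The same invariant as x^T . eta . x: first (4x4)(4x1), then (1x4)(4x1)."""
    eta_dx = [sum(ETA[mu][nu] * dx[nu] for nu in range(4)) for mu in range(4)]
    return sum(dx[mu] * eta_dx[mu] for mu in range(4))

dx = [5.0, 1.0, 2.0, 3.0]   # (dt, dx, dy, dz): an arbitrary time-like interval
print(interval_sq_sum(dx), interval_sq_matrix(dx))   # 11.0 11.0
```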

As stated above, some people call $\delta R$ the "proper time", and it is certainly true that $\delta R=\delta\tau$ in the "proper frame", but otherwise this is just a semantic convention.

Notice that the metric $\eta_{\mu\nu}$ "lowers the index" of the vector $x^\mu$ in the following way:

$\eta_{\mu\nu}x^\nu = x_\mu$

so that we can write the invariant as

$\delta R^2 = \delta x_\nu\delta x^\nu$

which means that

$x_\mu\equiv(t,-\vec{x})$

or in matrix form:

$x_\mu = \begin{pmatrix} t & -x & -y & -z\end{pmatrix}$

Remember, to get a scalar invariant, we want to multiply a (1x4) and a (4x1), so $x_\nu$ is a (1x4) and $x^\mu$ is a (4x1).
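Lowering an index and then contracting can be sketched the same way. This is a minimal plain-Python illustration under the West Coast convention used here; `lower` and `contract` are hypothetical names, and the repeated-index sum in `contract` is the Einstein convention written out explicitly.

```python
ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]   # (+,-,-,-)

def lower(x_up):
    """x_mu = eta_{mu nu} x^nu: the covariant partner of a contravariant 4-vector."""
    return [sum(ETA[mu][nu] * x_up[nu] for nu in range(4)) for mu in range(4)]

def contract(a_low, b_up):
    """a_mu b^mu, summed over the repeated index."""
    return sum(a * b for a, b in zip(a_low, b_up))

x_up = [5.0, 1.0, 2.0, 3.0]     # (t, x, y, z)
x_low = lower(x_up)
print(x_low)                    # [5.0, -1.0, -2.0, -3.0], i.e. (t, -x, -y, -z)
print(contract(x_low, x_up))    # 11.0 = t^2 - x^2 - y^2 - z^2
```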

We can also write the infinitesimal proper time interval $d\tau$ as

$d\tau = \sqrt{\eta_{\mu\nu}dx^\mu dx^\nu}$.

This will turn out to be a very important formula in GR, but we can get started with it here by considering paths in spacetime.

Motion in Space-time

Let's go back to the twin paradox. The path in spacetime of the person who stays on the earth yields a larger time interval (larger proper time in that person's frame) than the path of the spaceship: the person on the spaceship ages less. So the straight path (earth) has a larger proper time than the path that is not straight (the spaceship turns around at the far end). Is this a general rule? If it is, then we should be able to write down the proper time along a path, apply variational principles, require an extremum, and see what comes out.

It's easiest to restrict the motion to 1 space dimension, so we start there:

$d\tau = \sqrt{dt^2 - dx^2}$

and apply the usual machinery to find the extremum of

$\delta\tau(A\to B)=\int_A^B d\tau = \int_A^B\sqrt{dt^2 - dx^2}= \int_{t_A}^{t_B}\sqrt{1-(\frac{dx}{dt})^2}dt$

over some path. The machinery is, of course, the Euler-Lagrange equations:

$\frac{d}{dt}\frac{\partial f(\dot x,x)}{\partial\dot x}-\frac{\partial f(\dot x,x)}{\partial x}=0$

where here, $f(\dot x,x)=\sqrt{1-\dot x^2}$

Since $f$ has no explicit $x$ dependence, $\partial f/\partial x=0$, and the Euler-Lagrange equation says that $\partial f/\partial\dot x=-\dot x/\sqrt{1-\dot x^2}$ does not change with time. Using the notation $\beta = \dot x$, this gives:

$\frac{\beta}{\sqrt{1-\beta^2}}=$constant.

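Recognizing that $\frac{1}{\sqrt{1-\beta^2}}=\gamma$, and multiplying both sides by the rest mass $m$ of the particle, we have $m\gamma\beta\equiv p=$constant, which is nothing more than momentum conservation. Since $\gamma\beta$ increases monotonically with $\beta$, a constant $\gamma\beta$ also means a constant velocity $\beta$: the extremal path is a straight world line, just like the earth-bound twin's.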
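The variational argument can also be checked numerically: approximate a world line by straight segments, add up $\sqrt{dt^2-dx^2}$ along it, and compare paths between the same two events. This sketch is an illustration, not a proof; the piecewise-straight discretization, the sinusoidal family of bent paths, and the names `proper_time` and `wiggle` are all our own assumptions ($c=1$).

```python
import math

def proper_time(events):
    """Sum of d tau = sqrt(dt^2 - dx^2) along a piecewise-straight world line (c = 1).
    `events` is a list of (t, x) points; every segment must be time-like."""
    tau = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        d2 = (t1 - t0)**2 - (x1 - x0)**2
        if d2 < 0:
            raise ValueError("space-like segment: not a possible world line")
        tau += math.sqrt(d2)
    return tau

def wiggle(amplitude, n=1000, T=10.0):
    """World line x(t) = amplitude * sin(pi t / T) from (0, 0) to (T, 0)."""
    return [(T * k / n, amplitude * math.sin(math.pi * k / n)) for k in range(n + 1)]

stay_home = [(0.0, 0.0), (10.0, 0.0)]
round_trip = [(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)]   # out and back at beta = 0.6
print(proper_time(stay_home), proper_time(round_trip))   # 10.0 8.0

# The proper time shrinks as the path bends further away from the straight line.
print([round(proper_time(wiggle(a)), 3) for a in (0.0, 0.5, 1.0)])
```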

Lorentz Transformation

The Lorentz transformation (see equations (\ref{eq4})-(\ref{eq6})) can also be written in similar notation like this:

$x^\nu = \Lambda^\nu_\mu x'^\mu$

where the object $\Lambda$ is also a 4x4 matrix given by:

$\Lambda^\nu_\mu = \begin{pmatrix} \gamma & \gamma\beta & 0 & 0 \\ \gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

Note that the Lorentz matrix has both an upper and lower index, and is a symmetric matrix so that in matrix notation $\Lambda = \Lambda^T$.

In matrix form, we would have $x=\Lambda\cdot x'$ where the vectors $x$ and $x'$ are column vectors (a 4x1 object) and $\Lambda$ is of course 4x4. If we want to apply the Lorentz transformation to a row vector (1x4 object), then the equation has to be $x^T=(x')^T\Lambda ^T=(x')^T\Lambda$.

We can form the Lorentz invariant and compare it in the two frames $O$ and $O'$ (leaving the transpose of $\Lambda$ in there just to be explicit) to get:

$x^T\eta x=(x')^T\Lambda^T\eta\Lambda x'=(x')^T\eta x'$, which means that

$\Lambda^T\eta\Lambda = \eta$.

What does this mean? Best to look at it as 3 matrices, and remember that $\Lambda$ is a symmetric matrix ($\Lambda=\Lambda^T$):

$\Lambda^T\eta\Lambda = \begin{pmatrix} \gamma & \gamma\beta & 0 & 0 \\ \gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} \gamma & \gamma\beta & 0 & 0 \\ \gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \gamma & \gamma\beta & 0 & 0 \\ \gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \gamma & \gamma\beta & 0 & 0 \\ -\gamma\beta & -\gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}$

as it should (making use of the definition of $\gamma$, which can be written as $\gamma^2 -\gamma^2\beta^2 = 1$).
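The identity $\Lambda^T\eta\Lambda=\eta$ can be checked numerically for several speeds. A minimal plain-Python sketch, assuming the boost along $x$ written above and hypothetical helper names (`boost`, `matmul`, `transpose`); the comparison uses a tolerance because of floating-point rounding.

```python
import math

ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def boost(beta):
    """Lambda^nu_mu for a boost along x (c = 1), mapping x' coordinates to x."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, g * beta, 0, 0],
            [g * beta, g, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transpose(a):
    return [list(row) for row in zip(*a)]

for beta in (0.1, 0.5, 0.8, 0.99):
    L = boost(beta)
    result = matmul(transpose(L), matmul(ETA, L))   # Lambda^T . (eta . Lambda)
    ok = all(math.isclose(result[i][j], ETA[i][j], abs_tol=1e-9)
             for i in range(4) for j in range(4))
    print(beta, ok)   # True for every beta
```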

What is really interesting here, among other things, is the significance of 4-vectors that have upper indices ("contravariant" vectors) and those that have lower indices ("covariant" vectors). On top of that, there is the correspondence between the 4-vector notation here (e.g. $x^\mu$ is a "thing" in Minkowski space) and the matrix notation, where we suppress the indices but have to keep track of rows and columns and use the transpose "T". For instance, to form a scalar from two Minkowski 4-vectors $a^\mu$ and $b^\mu$, both contravariant, we first have to lower the index of one of them to make a covariant object, e.g. $a_\mu=\eta_{\mu\nu}a^\nu$, and then form the scalar $a_\mu b^\mu$. In matrix notation, to make a scalar from two 4-vectors we need to multiply a 1x4 object on the left by a 4x1 object on the right. The column vector $x$ is the contravariant one, because we defined lowering the index as multiplying by $\eta_{\mu\nu}$ on the left, which is a 4x4 object times a 4x1 object (in that order). So 4x1 column vectors are contravariant and 1x4 row vectors are covariant, and the row vector that goes with the column $x$ is $(\eta x)^T=x^T\eta$, not just $x^T$. In Minkowski space the metric converts one kind of vector into the other, so in matrix language the metric is intimately tied up with taking the transpose.

Why is this the case? Ultimately it boils down to the metric being needed to form a scalar in the first place, or in other words, the fact that the metric is not equal to the unit matrix (all 1's on the diagonal and 0's everywhere else).
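A small experiment makes the role of the metric concrete: boost two 4-vectors and compare the naive sum $\sum_\mu a^\mu b^\mu$ (the unit matrix in place of $\eta$) with $a_\mu b^\mu$. This is a sketch with hypothetical names and arbitrary example vectors, using the boost $t=\gamma(t'+\beta x')$, $x=\gamma(x'+\beta t')$ and $c=1$; only the metric version survives the boost unchanged.

```python
import math

def boost_vec(beta, v):
    """Apply the x-boost Lambda to a contravariant 4-vector (t, x, y, z), c = 1."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    t, x, y, z = v
    return [g * (t + beta * x), g * (x + beta * t), y, z]

def dot_euclid(a, b):
    """Naive sum of a^mu b^mu, i.e. using the unit matrix instead of the metric."""
    return sum(p * q for p, q in zip(a, b))

def dot_minkowski(a, b):
    """a_mu b^mu = eta_{mu nu} a^mu b^nu with signature (+,-,-,-)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

a = [2.0, 1.0, 0.5, 0.0]
b = [3.0, -1.0, 0.0, 2.0]
a_p, b_p = boost_vec(0.6, a), boost_vec(0.6, b)
print(dot_minkowski(a, b), dot_minkowski(a_p, b_p))   # both 7.0, up to rounding
print(dot_euclid(a, b), dot_euclid(a_p, b_p))         # 5.0 vs about 12.5: not invariant
```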