It is not uncommon to hear physicists or mathematicians talk about the “beauty”, “simplicity” or “elegance” of equations or theorems, and even claim that they are sometimes led to a correct formula (or away from an incorrect one) by considering what is “simple” or “elegant”. Consider, for example, the words of the Nobel Prize-winning physicist Murray Gell-Mann:
“Three or four of us in 1957 put forward a partially complete theory of the weak [nuclear] force, in disagreement with the results of seven experiments. It was beautiful and so we dared to publish it, believing that all those experiments must be wrong. In fact, they were all wrong.”
and Albert Einstein’s remark:
“I have deep faith that the principle of the universe will be beautiful and simple.”
Could there be something to these remarkable claims? Is beauty in physics evidence for some kind of “intelligent” universe, or are there more mundane explanations? Is elegance in mathematics evidence for an underlying structure to reality? Or can this be explained away by psychological or practical considerations?
To begin answering these questions, an important thing to notice about the aesthetics of equations is that what appears to be simple or elegant may only be so because of the way that symbols are defined. For example, consider the remarkable and rather minimalist “heat equation”
Δf = f ′
which, when solved for the function f with a given condition on its boundary, will describe how heat would actually flow over time on any specified surface in any number of dimensions. Is it not astounding that we can describe such a powerful physical law with just these 5 characters? Even if you don’t understand the mathematics or physics, bear with me because you will still be able to understand my point.
A deeper look at this equation shows us that the apparent simplicity here is in large part an illusion. First of all, the Δ, which is known in this context as the Laplace operator, can be thought of as simply a shorthand notation. If we replace Δf with its definition (here in three dimensions of space), we are left with the markedly less simple equation:
d^2/dx^2 f + d^2/dy^2 f + d^2/dz^2 f = f ′
where d^2/dx^2, d^2/dy^2, and d^2/dz^2 are second derivatives with respect to the x, y, and z dimensions of space. The right-hand side of the equation can now be replaced with its definition, where the prime mark (′) applied to f is understood to mean that we are taking one derivative with respect to time. This gives us:
d^2/dx^2 f + d^2/dy^2 f + d^2/dz^2 f = d/dt f.
Even without knowing what this equation means, you can see that things are starting to get fairly complicated and are looking quite a bit less elegant. Derivative operations (which are taken a total of seven times in the above equation) are not themselves trivial operations, and are (typically) defined via a limiting procedure. If we apply the definition of the derivative to the d/dt on the right hand side, we get:
d^2/dx^2 f + d^2/dy^2 f + d^2/dz^2 f = lim h→0 (f(x,y,z,t+h) – f(x,y,z,t))/h.
Now, if we are crazy enough to replace the remaining six derivative operations with the definition of the derivative, we are left with an equation which is just plain long and ugly, even after performing some simplification:
lim h→0 (1/h^2) * ( 3 f(x,y,z,t) + f(x+2h,y,z,t) – 2 f(x+h,y,z,t) + f(x,y+2h,z,t) – 2 f(x,y+h,z,t) + f(x,y,z+2h,t) – 2 f(x,y,z+h,t) )
= lim h→0 (f(x,y,z,t+h) – f(x,y,z,t))/h.
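To see that this long form still says the same thing as the five-character version, one can evaluate both sides numerically with a small but finite h. The sketch below is only an illustration under stated assumptions: the test function f = e^(-3t) sin(x) sin(y) sin(z) is not from the discussion above; it was picked because it satisfies the heat equation exactly (both sides equal -3f). The sample point and step size are arbitrary choices as well.

```python
import math

def f(x, y, z, t):
    # Assumed test function (chosen for illustration): its Laplacian and its
    # time derivative are both exactly -3 * f, so it solves the heat equation.
    return math.exp(-3.0 * t) * math.sin(x) * math.sin(y) * math.sin(z)

def expanded_lhs(x, y, z, t, h):
    # Left-hand side of the fully expanded equation, with finite h
    # instead of the limit h -> 0.
    return (3 * f(x, y, z, t)
            + f(x + 2 * h, y, z, t) - 2 * f(x + h, y, z, t)
            + f(x, y + 2 * h, z, t) - 2 * f(x, y + h, z, t)
            + f(x, y, z + 2 * h, t) - 2 * f(x, y, z + h, t)) / h ** 2

def expanded_rhs(x, y, z, t, h):
    # Right-hand side: forward difference in time, again with finite h.
    return (f(x, y, z, t + h) - f(x, y, z, t)) / h

point = (0.3, 0.5, 0.7, 0.1)   # arbitrary sample point (x, y, z, t)
h = 1e-4                       # small step standing in for the limit

lhs = expanded_lhs(*point, h)
rhs = expanded_rhs(*point, h)
exact = -3.0 * f(*point)       # value both sides approach as h -> 0

print(lhs, rhs, exact)
```

With a step this small the two printed approximations should agree with each other and with the exact value to several decimal places, which is all the ugly equation is really asserting.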
The point to realize here is that mathematicians and physicists make very careful choices when selecting their notation to vastly compress very complicated ideas. Typically, they define symbols in such a way as to make important formulas easy to write down and work with. However, if they chose to, they could always pick notations which would make even the “simplest” formula look nasty. For example, whenever we find ‘1’ in an equation, we could (if we were completely crazy) replace it using the following formula:
1 = ∑ k≥0 (-1)^k (π/2)^(2k+1)/(2k+1)!.
Doing so would not change any of our results, but it sure would confuse a lot of people and make the formulas much harder to work with.
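For the skeptical reader, the series above is the Taylor series of the sine function evaluated at π/2, so it really does sum to 1. A minimal numerical check, where the function name and the number of terms are arbitrary choices made for this sketch:

```python
import math

def one_the_hard_way(terms=20):
    # Partial sum of sum_{k>=0} (-1)^k (pi/2)^(2k+1) / (2k+1)!
    x = math.pi / 2
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

value = one_the_hard_way()
print(value)  # agrees with 1 up to floating-point rounding
```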
All of this being said, notation is not the end of the story. Another important point to consider is that in many cases a single physical law can cause a multitude of different effects which may not, at first, appear to be related. To give some classic examples, before Newton’s era it was not at all obvious that the force that causes us to fall to the ground when we jump is the same force that keeps planets in orbit in our solar system. Likewise, before the 1800s it was not known that electric fields, magnetic fields and light are in fact manifestations of a single phenomenon now known as electromagnetism. Similarly, before the era of Einstein it was not understood that conservation of energy and conservation of momentum could be thought of as effectively being part of a single conservation law.
There are a number of cases in physics where simpler and more elegant theories have won out over more complex theories because they correctly identify seemingly unrelated phenomena as having a single cause. Theories which treat inherently connected ideas as being wholly different are destined to be replaced, since their lack of unification creates redundancy and therefore unnecessary complexity in the theory. This is one important reason why ugly, complicated theories can often be outdone by what seem to be more beautiful ones. We find it more beautiful to have one explanation for two results than to have two distinct explanations, and if the results really are caused by just one phenomenon, the single explanation will typically be easier to express and work with mathematically than the two separate ones.
Another, related reason why we might expect simplicity to win out over complexity comes from a rule of thumb known as Occam’s Razor. This idea, which is often bandied about as if it were obviously and unquestionably true, states that when given many possible explanations for something that are otherwise equally plausible, we should prefer the one that is the simplest (or that makes the fewest assumptions). While Occam’s Razor certainly makes some intuitive sense, we can place the idea on a slightly more rigorous footing by considering results from the now blossoming field of machine learning, which concerns itself (in large part) with getting computers to make intelligent predictions by learning from past examples.
When computer scientists attempt to estimate how good a particular learning algorithm is at making predictions, typically what they find is that the expected future error of the algorithm is dependent on what might be called an “Occam term”, which punishes models based on their complexity. The more complex a model is, the more of this kind of penalty it will incur, and so the less accurate the algorithm will tend to be when making predictions. Here, depending on the mathematical analysis carried out, “complexity” can be measured in a variety of different ways, including the number of free parameters in the model, the number of bits of information required to specify the model, or the maximum number of points the model will always be able to categorize without making an error. The idea is that while very complex models are good at explaining past data (i.e. data that is used to train the models), they tend to (all else being equal) make more errors than simple models on future data (i.e. data that is not available at the time when the models are trained).
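The trade-off described above can be seen in a toy experiment. The sketch below is only an illustration under assumptions of my own choosing: the “true law” is a straight line, the ten training measurements are perturbed by a fixed alternating error of ±0.1 (a deterministic stand-in for noise, so the run is reproducible), the simple model is a least-squares line with two free parameters, and the complex model is the degree-9 polynomial that passes through every training point (ten free parameters). All names are hypothetical.

```python
def truth(x):
    # Assumed "true law" generating the data.
    return 2.0 * x + 1.0

def fit_line(xs, ys):
    # Ordinary least-squares straight line: 2 free parameters.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_interpolant(xs, ys):
    # Lagrange polynomial through every training point: len(xs) free parameters.
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model

def mse(model, xs, ys):
    # Mean squared error of a model on a data set.
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

n = 10
train_x = [i / (n - 1) for i in range(n)]
# "Past data": the true law plus a fixed alternating measurement error.
train_y = [truth(x) + 0.1 * (-1) ** i for i, x in enumerate(train_x)]
# "Future data": error-free values at points halfway between training points.
test_x = [(a + b) / 2 for a, b in zip(train_x, train_x[1:])]
test_y = [truth(x) for x in test_x]

line = fit_line(train_x, train_y)
interp = fit_interpolant(train_x, train_y)

print("train error  line:", mse(line, train_x, train_y),
      " polynomial:", mse(interp, train_x, train_y))
print("test error   line:", mse(line, test_x, test_y),
      " polynomial:", mse(interp, test_x, test_y))
```

The flexible polynomial reproduces the training data perfectly, errors included, and pays for it with much larger errors between the training points, while the two-parameter line stays close to the true law. Random noise would typically show the same qualitative pattern, though the exact numbers depend on the draw.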
Now, since physicists are in the business of trying to guess (or predict) the rules of the universe from experiments (which are just like the “past examples” in the machine learning setting), it is intuitive to think that an “Occam term” will apply to them as well. Hence, while this is not by any means an airtight argument, we have some reason to think that in the scientific method, just as in the machine learning setting, simpler theories do truly tend to be more useful than complex ones, so long as both explain all of the currently available experimental evidence.
A good example of Occam’s Razor which came up in practice is the Ptolemaic explanation of the motion of the planets, which apparently was the “accepted theory” in some places for “over 13 centuries”†. The basic idea of this theory was that planetary motion consists of “epicycles” around the fixed planet earth. This means that planets were thought to make circular orbits around earth, but that during these circular orbits the planets orbited in smaller circles along the orbits, and along those smaller circular orbits they orbited in still smaller circles, etc. This model was intrinsically very complex because by adjusting the epicycles so that there were a sufficient number of circular orbits within circular orbits at appropriate speeds one could have described pretty much any shape of orbit whatsoever, real or imagined. In other words, the model had a large number of free variables which gave it enormous flexibility and therefore complexity. Copernicus eventually laid waste to the Ptolemaic model by replacing it with a far simpler model with far fewer free parameters, which he accomplished merely by shifting the center of the circular planetary orbits to be the sun rather than the earth. However, the basic form of his new theory still did not agree perfectly with observation, and so required some ad hoc refinements that introduced extra complexity. This final complexity was eventually removed by Kepler, who refined the model yet again by allowing for elliptical rather than circular orbits, which is now known to be an excellent explanation for the orbits that are observed.
The key difference in these explanations for orbits is that the theory of epicycles is complex enough to explain almost any conceivable orbit you could ever think of, whereas Kepler’s idea of elliptical orbits with the sun at one focus of the ellipses was just complex enough to explain what was actually observed but without being complex enough to explain the universe had we observed substantively different orbits than actually exist. In other words, Kepler’s theory is precisely as complicated as it needs to be to explain reality.
There are a few more points about the relationship between beauty and truth in physics and math that I feel are worth mentioning. To begin with, as physicist Murray Gell-Mann (quoted above) mentions in his TED talk on beauty and truth in physics, symmetry plays a key role in simplicity. For example, since all of the known laws of physics treat the three dimensions of space equally, we can often greatly simplify equations by writing things such as
∇ f = some expression
rather than having to write an equivalent but much more cumbersome set of equations where we treat each dimension of space separately, as in:
df/dx = some expression
df/dy = some expression
df/dz = some expression.
The point here is that symmetry makes it easy to simplify equations. Of course, this argument goes beyond just the symmetry of the three dimensions of space, and applies also to symmetry in time, rotation, etc.
Another idea that should be mentioned is that typically mathematical expressions have a number of different equivalent forms. For example, we could define the exponential function e^x using any of the following equivalent definitions:
f(x) = lim h→∞ (1 + x/h)^h
f(x) = f ′(x) & f(0) = 1
f(x) = ∑ k≥0 x^k/k!
f(ln(x)) = x
f(x+y) = f(x) f(y) & f(1) = e
f(x) = cosh(x) + sinh(x)
None of these definitions for e^x is intrinsically better than any other. Mathematicians have the choice to use whichever definition is more useful for any given purpose, and often it is precisely the simpler or more “elegant” definitions that are used most commonly because they are easier to understand and manipulate.
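That several of these definitions really do describe the same function can be checked numerically. The following is a rough sketch rather than a proof: the function names, the sample value x = 1.5, the finite h used in place of the limit, and the number of series terms are all arbitrary choices made for illustration, and Python's built-in math.exp serves as the reference.

```python
import math

def exp_limit(x, h=1_000_000):
    # Limit definition with a large but finite h (so only approximate).
    return (1 + x / h) ** h

def exp_series(x, terms=30):
    # Power-series definition, truncated after a fixed number of terms.
    return sum(x ** k / math.factorial(k) for k in range(terms))

def exp_hyperbolic(x):
    # Definition via hyperbolic functions.
    return math.cosh(x) + math.sinh(x)

x = 1.5
reference = math.exp(x)

print(exp_limit(x), exp_series(x), exp_hyperbolic(x), reference)

# The functional-equation and inverse-of-logarithm properties hold as well,
# up to rounding error.
print(exp_series(0.4 + 1.1), exp_series(0.4) * exp_series(1.1))
print(exp_series(math.log(2.0)))
```

The limit form converges slowly, so it matches the others only to about five or six significant digits at this value of h, while the series and hyperbolic forms agree with the reference to near machine precision. That difference in practical convenience is one reason a particular definition gets picked for a particular job.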
As a final point, it is worth noting that much of the most theoretical mathematical work is driven more by the aesthetic and psychological appeal of the theorems produced than by the importance of those theorems in solving practical problems that arise in the real world. One prime example of this phenomenon is the field of number theory, which, while popular and very elegant, found almost no practical applications before it was (unexpectedly) linked to the field of cryptography and secure online banking. It should also be noted that it is likely easier to publish results that strike the reviewers as elegant rather than clumsy and awkward. Keeping these ideas in mind, it is no surprise to find that some of the most researched areas of math even today have great beauty but few real-world applications.
In conclusion, the relationship between beauty and truth in physics and math is a complicated one, which relates to practical considerations such as choices for notation and definitions, psychological phenomena such as the personal preferences and aesthetic sensibilities of the practitioners, and deeper physical or mathematical ideas such as symmetry, the unification of seemingly unrelated results, and Occam’s razor. In the end, it is clear that beauty is an important, if not fundamental, part of math and science.