What Galois Did


I think this is worth saving:

This is a question where historical context helps a lot, I think.

Certainly, one can write about the specific results that Galois proved without any reference to the context in which they appeared, and they are beautiful. I really love Galois theory. But I fear that if one asks about the applications of Galois theory, the answers may seem very niche. And that does a grave injustice to a man (barely a man—he was only 20 when he died) who helped revolutionize how we think about mathematics.

So, let’s set the scene a little.

The 19th century was a time of revolution in Europe. The political revolutions of that time are certainly more famous, but it was a tumultuous time in science and mathematics as well.

Prior to the 19th century, mathematics was mostly very concrete. Whatever mathematical objects were considered were primarily pulled from real-world experience: all of Euclidean geometry arose from physical drawings with straightedges and compasses; the real numbers arose from ideas about magnitudes and measurements; e arose from studies of compound interest, and so on, and so on. What didn’t come from such concrete considerations was viewed with suspicion—complex numbers, for instance, were often treated as black magic, despite the fact that the rules for working with them are dead simple and they are incredibly useful.

In the 19th century, all of this changed. Mathematicians began exploring objects that didn’t seem like they could be pulled from real-world experience, even if you could describe their mathematical properties perfectly well. This was the era in which non-Euclidean geometry was discovered, for example. While, today, we recognize that hyperbolic geometry is, in some ways, even more closely related to the actual geometry of space than Euclidean geometry is, at the time it seemed like something utterly devoid of application. Mathematicians as a whole had to be dragged kicking and screaming to embrace these kinds of ideas, but ultimately they did, and the field became so much richer for it.

And at the same time as Gauss, Lobachevsky, Bolyai, and Riemann were reinventing what it means to study geometry, Galois was doing the same thing for algebra.

Picture it. It’s France, 1829. Politically, it is tense: a year from now, Charles X will be deposed—to be replaced by his cousin, Louis Philippe I. But, for the moment, there is something more pressing. A young man—only 17 years old—storms out of his examinations; he has been denied entry to the prestigious École Polytechnique for a second time. He finds his examiners inane and plodding; they find him sloppy and explosively hot-headed. He is filled with fire and rage; his father had committed suicide just days before. His head is bubbling with ideas that are decades ahead of his time.

This is Évariste Galois.

He had already published a paper the previous year; he would publish three more over the next few years while in and out of prison. (He was a fierce republican and made no attempts to hide his disdain for the monarchy.) In 1832, at just 20 years old, he was pulled into a duel and shot to death.

Now that we have a sense of the man and the environment in which he lived, let’s actually discuss his work.

What was algebra prior to the work of Galois? As a field, it was almost entirely about solving equations. When can one write down a solution to a polynomial equation in terms of radicals? When can one get integer solutions to such an equation? These were the kinds of questions that mathematicians were asking (and sometimes answering). To the extent that new algebraic objects were introduced, it was solely in the service of such concrete goals—this is how complex numbers came to be, and modular arithmetic, and so on.

Galois did produce results and solutions in this vein, but he also intuitively understood—in a way almost no one else did at that time—that one could study these algebraic objects themselves, their properties, and how they related to one another. When he published his paper on what is now called Galois theory, it did provide a way to prove that the roots of various polynomials cannot be written down in terms of radicals, yes—but Galois himself wrote that this was merely an application of the theory, and not what it was principally about. This was completely missed by his contemporaries, who tried to make sense of Galois’ work by focusing on this one element that they judged as actually important. It was only decades later that this fundamental shift in perspective was understood and appreciated.

The core of Galois theory—what he is principally remembered for—can be roughly explained thus, in modern language:

There are algebraic structures called groups and fields, and there is a fundamental connection between the two that allows you to transform difficult problems about one into easy problems about the other.

Of course, this is a little anachronistic, since Galois did not have a modern definition of either groups or fields—those didn’t come until the 1880s and 1890s, respectively. He still thought about them in much more concrete terms. But if I may be allowed to continue this anachronism for the ease of exposition, I can give a little insight into what these are.

What are groups? These are pretty much the most fundamental algebraic structures there are, with many, many, many examples in very varied fields—they just seem to worm their way in pretty much everywhere. Instead of giving the general definition, though, let’s think about them very concretely, in a manner similar to how Galois himself would have. Start with a collection of points.

Now, consider any way of shuffling these points around—we call this a permutation. Here are some examples.

Given any two permutations, we can get a new one by just doing one and then the next. To help illustrate this, let’s draw our permutations a bit differently. For example, here is the second one from our list above.

We’re moving each point to the left, other than the last one, which we move back to the beginning. But now, we can do this twice, and this amounts to stacking this permutation on top of itself. And this gives a new permutation.

Here’s an example with two different permutations.

If we call the bottom permutation g and the top permutation h, then we call the permutation obtained by doing g and then h their composition, or product, written h∘g.

Observe that in the second example above, we got that h∘g became the permutation where we don’t move any of our points at all. We call this “do-nothing” permutation the identity, and we say that if h∘g is the identity then h is the inverse of g—we write h = g⁻¹. It is readily checked that g∘g⁻¹ is then the identity as well.

We are finally ready to introduce groups. A group is a collection of permutations with the properties that:

  1. if g, h are permutations in the group, then so is g∘h, and
  2. if g is a permutation in the group, then so is g⁻¹.

That is, a group is closed under composition and inverses. You can check that the example we gave previously is a group.

I should note that this is not the modern definition of a group, although it does turn out to be equivalent to it, in the sense that any group (in the modern sense) can be realized as a group of permutations.
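Since the diagrams from the original answer are not reproduced here, a concrete sketch may help. The following Python snippet is my own illustration (not from the original answer): it represents a permutation of five points as a tuple p, with p[i] the position that point i is sent to, and checks that the cyclic shifts form a group.

    def compose(h, g):
        """The permutation 'do g, then h' -- written h∘g in the text."""
        return tuple(h[g[i]] for i in range(len(g)))

    def inverse(g):
        """The permutation undoing g, so compose(inverse(g), g) is the identity."""
        inv = [0] * len(g)
        for i, j in enumerate(g):
            inv[j] = i
        return tuple(inv)

    identity = (0, 1, 2, 3, 4)
    shift = tuple((i - 1) % 5 for i in range(5))  # each point one step left, the first wrapping to the end

    # Collect all repeated compositions of the shift with itself.
    group, g = {identity}, shift
    while g not in group:
        group.add(g)
        g = compose(shift, g)

    # A group is closed under composition and inverses -- properties 1 and 2 above.
    assert all(compose(a, b) in group for a in group for b in group)
    assert all(inverse(a) in group for a in group)
    print(len(group))  # 5: the identity plus four nontrivial cyclic shifts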

Let me give some additional examples of groups, aside from the one above:

  1. The integers …,−3,−2,−1,0,1,2,3,… are a group, if we think of any integer as being a shift left/right—e.g. we think of 3 as a shift of the number line three units to the right, and −3 as a shift of the number line three units to the left.
  2. Rotations of the plane around the origin form a group.
  3. Rotations of 3D space around the origin form a group.
  4. The real numbers are a group, if we think of any real number as being a shift left/right of the number line.
  5. Lorentz transformations of space-time form a group.

  6. The symmetries of a molecule form a group.

These examples should hopefully convince you that groups are everywhere, in mathematics, physics, chemistry, and beyond. Which is, perhaps, not very surprising: one way that you can think of a group is that it is a collection with an operation on it satisfying some very simple, very natural constraints. As I said in the introduction, they are just about the most fundamental algebraic structures there are.

What of fields?

A field is also a very fundamental algebraic structure, but unlike a group, where you just have one operation, for a field you have two, usually called addition and multiplication. These have the properties that you expect—they have to be associative, commutative, etc., and multiplication must distribute over addition. You can find formal definitions easily, but it is good to see some examples: the rational numbers, the real numbers, and the complex numbers are all different fields. (Whole numbers are not—you have to be able to perform division in any field.) There are infinitely many different fields of many different varieties, although this was not yet known to mathematicians in the 19th century. Arguably, one of Galois’ important contributions was introducing finite fields (as opposed to the infinite ones listed here)—arguably not because the contribution was unimportant (finite fields are very important in computer science today, and they are often called Galois fields in his memory), but because, since Galois lacked the formal definition of a field, it is a little difficult to say definitively whether or not he properly described finite fields.
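To make the finite-field idea concrete, here is a minimal sketch of my own (not from the original answer), using arithmetic modulo a prime, which is the simplest example of a Galois field:

    # Arithmetic modulo a prime p satisfies all the field axioms; in particular,
    # unlike the whole numbers, every nonzero element can be divided by.
    p = 7  # must be prime: e.g. mod 6, the element 2 has no multiplicative inverse

    def add(a, b):
        return (a + b) % p

    def mul(a, b):
        return (a * b) % p

    def inv(a):
        # Fermat's little theorem: a^(p-2) is the multiplicative inverse of a mod p.
        return pow(a, p - 2, p)

    assert all(mul(a, inv(a)) == 1 for a in range(1, p))  # division always works
    print(inv(3))  # 5, since 3 * 5 = 15 ≡ 1 (mod 7)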

One way to think about fields is that they are the sort of algebraic structure that you want to work with if you want to define polynomials of some kind, or study polynomial equations—this was certainly how Galois found himself studying them. But they have importance far outside of that—in the modern era, the concept of a field is central to linear algebra, which underpins an unfathomable amount of numerical methods and algorithms.

In any case, we are now ready to very loosely give the central idea of Galois theory. Suppose that you have two fields, one containing the other (such as the real numbers containing the rational numbers, or the complex numbers containing the real numbers). It turns out that to any such pair you can associate a group (called the automorphism group), and if two very natural and very common conditions are met, then we call this the Galois group, and it has many wonderful properties. To start, there is a direct correspondence between the groups contained inside this Galois group and the fields contained in the larger field and containing the smaller one. Various algebraic properties of the fields can be rephrased as algebraic properties of the groups.
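As a tiny concrete instance (my own sketch, not part of the original answer): for the rationals sitting inside the field of numbers a + b√2 with a, b rational, the Galois group has just two elements—the identity and the conjugation a + b√2 ↦ a − b√2—and one can verify directly that conjugation respects both field operations:

    from fractions import Fraction
    from itertools import product

    # Represent a + b·√2 (a, b rational) as the pair (a, b).
    def add(x, y):
        return (x[0] + y[0], x[1] + y[1])

    def mul(x, y):
        # (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2
        return (x[0] * y[0] + 2 * x[1] * y[1], x[0] * y[1] + x[1] * y[0])

    def conj(x):
        # The nontrivial automorphism: a + b√2 ↦ a - b√2 (fixes the rationals)
        return (x[0], -x[1])

    samples = [(Fraction(1), Fraction(2)), (Fraction(-3), Fraction(1, 2)), (Fraction(0), Fraction(5))]
    for x, y in product(samples, repeat=2):
        assert conj(add(x, y)) == add(conj(x), conj(y))  # respects addition
        assert conj(mul(x, y)) == mul(conj(x), conj(y))  # respects multiplication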

And this is useful! For one thing, while the fields are usually infinite, the Galois groups are often finite. Thus, intractable problems about fields can sometimes get turned into straightforward computational problems about groups. This is, for example, how Galois proved that there exist polynomials with rational coefficients such that their roots cannot be written down in terms of radicals. You can also use it to prove that various geometric constructions are impossible to do via straightedge and compass (such as angle trisection), to quickly show that various seemingly complicated expressions are/are not rational, etc.

But, again, as Galois himself said, all of these are just applications of his work. They are not the main idea—which Galois was never quite able to properly formulate, but which we understand very well now—that there are fundamental algebraic structures with deep connections between them, and that it is important for us to study them for their own sake. This central lesson went squarely over the heads of his contemporaries, and it wasn’t until decades after his death that Galois’ work was properly understood in that context. There was a philosophical revolution that had to happen to make that possible.

And that, I think, is the real point of Galois’ importance: he was among the first algebraists in the modern sense of the word, and he showed where mathematics had to go—imperfectly, impatiently, but still brilliantly.

Source: Senia Sheydvasser’s answer to In layman’s terms, what did Evariste Galois give to humanity and how has it helped us? – Quora

The Cyclic Identity for Partial Derivatives


In this article on his ‘Azimuth’ website, John Baez considers the fact that, for any three related variables, $#\frac{\partial u}{\partial v}|_{w}\frac{\partial v}{\partial w}|_{u}\frac{\partial w}{\partial u}|_{v} = -1#$ (which has an extra minus sign compared to what one might naively expect from “cancelling differentials”).

After giving an argument for the case where $#u#$, $#v#$, and $#w#$ are linear functions of two other variables $#x#$ and $#y#$, he asks for “a more symmetrical, conceptual proof” and goes on to promote one put forward by Jules Jacobs based on the anticommutativity of the wedge product. But I think the wedge product argument adds unnecessary formalism and infrastructure, without really clarifying the intuitive concept that makes things work out as they do.

First, the linear argument can easily be expressed more symmetrically in a way that generalizes the identity to any number of variables as follows:

Let the relation be $#\Sigma_{i\in{I}} a_i u_i =c#$. Then $#u_i =\frac{c-\Sigma_{j\ne i\in{I}} a_j u_j}{a_i}#$, and so $#\frac{\partial u_i}{\partial u_j}|_{u_k:k\ne i,j}=-\frac{a_j}{a_i}#$, and the cyclic identity follows easily (with the product being $#(-1)^n#$ for the case of $#n#$ variables).

And of course any second-year calculus student should know that the linear argument can be applied locally to the case of any smooth relationship $#f(u_i, i\in I)=c#$, giving $#\frac{\partial u_i}{\partial u_j}|_{f,u_k:k\ne i,j}=-\frac{\frac{\partial f}{\partial u_j}|_{u_k:k\ne j}}{\frac{\partial f}{\partial u_i}|_{u_k:k\ne i}}#$.

So the intuition for where the minus signs come from is just the act of “moving variables to the other side of the equation”. And if the idea of partial differentials is to have any meaning then the place to start worrying about those minus signs is not in the general cyclical identity but in the simple two-variable case of implicit differentiation where $#\frac{du}{dv}|_{f(u,v)=c}=-\frac{\frac{\partial f}{\partial v}|_u}{\frac{\partial f}{\partial u}|_v}#$.
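As a quick sanity check of the above (my own sketch, using sympy and an arbitrary sample relation, not anything from the cited posts):

    import sympy as sp

    u, v, w = sp.symbols('u v w')
    f = u * v + sp.exp(w) * u + v * w**2  # an arbitrary smooth relation f(u,v,w) = c

    # Each constrained partial via implicit differentiation: -f_j / f_i.
    du_dv = -sp.diff(f, v) / sp.diff(f, u)  # du/dv at constant w
    dv_dw = -sp.diff(f, w) / sp.diff(f, v)  # dv/dw at constant u
    dw_du = -sp.diff(f, u) / sp.diff(f, w)  # dw/du at constant v

    print(sp.simplify(du_dv * dv_dw * dw_du))  # -1: the three minus signs survive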

I need to think some more about this – in particular how the minus signs from implicit diff in “my” argument (or from solving linear equations in the Baez linearization) relate to those from reversing wedge products in the Jacobs argument. (But looking at Jacobs’ X-post there is a reference to an article by Peter Joot on solving equations by use of wedge product which probably makes it all clear.)

Source: The Cyclic Identity for Partial Derivatives | Azimuth

Why SD uses squares (rather than abs val)


It’s not just for computational convenience.

The rms deviation is minimized when taken from the mean, while the average absolute deviation is minimized when taken from the median—which is why the SD (an rms deviation) is measured from the mean.
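A quick numerical illustration of this (my own sketch): scan candidate centre points c for a skewed sample and see which c minimizes each measure of spread.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(size=2000)  # skewed, so the mean and median differ

    cs = np.linspace(0, x.max(), 4001)
    rms = [np.sqrt(np.mean((x - c) ** 2)) for c in cs]  # root-mean-square deviation
    mad = [np.mean(np.abs(x - c)) for c in cs]          # mean absolute deviation

    print(cs[np.argmin(rms)], x.mean())      # rms deviation is minimized near the mean
    print(cs[np.argmin(mad)], np.median(x))  # absolute deviation, near the median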

Source: Nikolas Scholz’s answer to Why are standard deviations calculated the way they are? I understand the method (subtract the mean from each value, then get the square-root of the mean of the squared differences), but what fundamental principle does this method derive from? – Quora

Alan Cooper’s answer to Question: A bar of length ℓ is broken into three pieces at two random spots. What is the probability that the length of at least one piece is less than ℓ/20? Can anyone hlp me? Thks a lot – Quora


The question is ill-posed because there are many different ways of choosing two spots “at random”. BUT if we assume that the cuts are made independently, with each chosen according to a uniform probability per unit length, then (plotting the two cut positions as coordinates in a square) the answer is the fraction of the square outside the region where all three pieces have length at least ℓ/20, and that region takes up (17/20)^2 of the square. (ie p = 1-(17/20)^2)
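A quick Monte Carlo check of that answer (my own sketch, with the bar scaled to unit length):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1_000_000
    x, y = rng.random(n), rng.random(n)          # two independent uniform cuts
    lo, hi = np.minimum(x, y), np.maximum(x, y)
    shortest = np.minimum(np.minimum(lo, hi - lo), 1 - hi)

    print(np.mean(shortest < 1 / 20))  # ~ 0.2775
    print(1 - (17 / 20) ** 2)          # = 0.2775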

Source: Alan Cooper’s answer to Question: A bar of length ℓ is broken into three pieces at two random spots. What is the probability that the length of at least one piece is less than ℓ/20? Can anyone hlp me? Thks a lot – Quora

Alan Cooper’s answer to Alice and Bob flip a biased coin, best [math]n[/math] out of [math]2n-1[/math] win. If the probability of Alice winning a flip is [math]p[/math], what is her chance of winning the series? – Quora


Since this is tagged with “Puzzles and Trick Questions” it may be that I am missing something. But my answer would be [math]\Sigma_{m=0}^{n-1}\binom{2n-1}{m}p^{2n-1-m}(1-p)^m[/math].

This follows the pattern of the best 2 out of 3 case where Alice has to win either two or three games – which happens in cases lww, wlw, wwl or www with probability [math]3p^2(1-p)+p^3=3p^2-2p^3[/math] (where the fact that the game may be stopped when she wins twice just corresponds to the fact that [math]pp(1-p)+ppp=p^2[/math], and the same answer is obtained by taking the complement of the cases where Bob wins either 2 or 3 games).
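Here is a short check of the formula (my own sketch), confirming that the n = 2 case reduces to the best-2-of-3 answer above:

    from math import comb

    def p_series(p, n):
        # Sum over m = number of flips Bob wins; Alice wins the series iff m <= n-1.
        return sum(comb(2 * n - 1, m) * p ** (2 * n - 1 - m) * (1 - p) ** m
                   for m in range(n))

    p = 0.6
    print(p_series(p, 2))       # 0.648
    print(3 * p**2 - 2 * p**3)  # 0.648, matching the 2-out-of-3 formula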

Source: Alan Cooper’s answer to Alice and Bob flip a biased coin, best [math]n[/math] out of [math]2n-1[/math] win. If the probability of Alice winning a flip is [math]p[/math], what is her chance of winning the series? – Quora

Alan Cooper’s answer to How do I find the period of [math]e^{ix}[/math] without using trigonometry? – Quora


This question has been around for a while and has some decent answers. But I want to suggest a simpler and more intuitive version. (And it will be easier to follow if I replace the variable [math]x[/math] by [math]t[/math] so as not to confuse it with the real part of the complex function value.)

First, to define [math]f(z)=e^z[/math] without using trigonometry or ever mentioning trig functions, we can use either the power series or the complex differential equation [math]f'=f[/math] with [math]f(0)=1[/math]. And either way we get [math]\frac{d}{dt}e^{it}=ie^{it}[/math].

Now multiplication by [math]i[/math] just rotates the complex plane by a right angle, so the curve in the plane given parametrically by [math](x(t),y(t))[/math] with [math]x(t)+iy(t)=e^{it}[/math] has a tangential velocity vector which is always perpendicular to its position vector and equal in magnitude.

Since the velocity is always perpendicular to the position vector, the distance from the origin never changes; and since the curve starts at [math]t=0[/math] at [math](x,y)=(1,0)[/math], it is just the unit circle centred at the origin.

And since its velocity vector is always of length 1, if we think of the parameter [math]t[/math] as representing time, then the point moves with speed 1 and so the time taken to complete a circuit, ie the period of [math]e^{it}[/math], is just the same as the circumference of the unit circle (commonly denoted by [math]2\pi[/math] ).
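For anyone who wants to see this numerically, here is a small sketch of my own (not part of the original answer): integrate f' = if starting from f(0) = 1, with no trig functions anywhere, and time one full circuit.

    # Euler integration of f' = i*f with f(0) = 1; detect when the curve
    # completes a circuit (the imaginary part crosses zero from below).
    dt = 1e-5
    f, t, prev_im = 1 + 0j, 0.0, 0.0
    while True:
        f += 1j * f * dt
        t += dt
        if prev_im < 0 <= f.imag:  # one full trip around the unit circle
            break
        prev_im = f.imag

    print(t)  # ~ 6.2832, the circumference of the unit circle, i.e. 2*pi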

Source: Alan Cooper’s answer to How do I find the period of [math]e^{ix}[/math] without using trigonometry? – Quora

Hello world!


Welcome to Alan’s Math Notes. This is where I am planning to restore and make available various on-line notes and learning resources that I either developed myself or found useful and freely available from other sources.