Dickson and Robinson

I’ve profited from the AMS Publisher’s Sale to finally buy Dickson’s History of the Theory of Numbers and the Collected Works of Julia Robinson (for, respectively, 37$ instead of 146$ and 19$ compared with 61$, both are very good bargains), and I’ve started browsing randomly around both texts.

Looking in Volume I of Dickson work (which deals more particularly with “multiplicative” number theory and prime numbers in particular), I was happy to find a reference for the cute proof that there exist infinitely many prime numbers that follows from the combination of the following two facts:

(1) Euler’s identities (dating to around 1730)

$latex \prod_{p}{\frac{1}{1-p^{-2}}}=\sum_{n\geq 1}{\frac{1}{n^2}}=\frac{\pi^2}{6},$

(2) Legendre’s theorem: π2 is an irrational number (going back to around 1790).

I only heard this proof in 2005 (in a lecture by Iwaniec), and although I’ve asked a few people, no one seemed to know how far back this went. Dickson mentions an obscure survey of J. Braun in Wiss. Beilage Jahresbericht, Gymn. Trier, 1899, who

… attributes to Hacks a proof by means of Π(1-1/p2)-1=Σs-22/6 …

(Chapter XVIII, page 414, of Volume I). Nothing is said here about who Hacks was, but he reappears on page 427 for a paper in Acta Mathematica of 1893, where he proved for instance that for a prime p, we have

$latex \sum_{y=1}^{p-1}\sum_{s=1}^{p-1}\left[\frac{ys}{p}\right]=((p-1)/2)^2(p-2),$

and moreover this is characteristic of primes. (He is also referenced in a few other places in Volume I for similar facts).

Of course, although somewhat pointless, this argument is “obvious” enough that is it possible (in fact, likely) that this was discovered independently many times. But at least, I’m happy to know that this was noticed a long time ago.

A few people have (also independently) wondered if this proof could be made quantitative. The idea is to use explicit measures of irrationality of ζ(2): for some constants C>0 and α, we have

$latex (*)\ \ \ \ \ \ \ \ \ \ \ \ |\zeta(2)-p/q|\geq Cq^{-\alpha}$

for all coprime integers p, q. Then we look at the numerators and denominators of the Euler product over primes < X, and compare the quality of the rational approximation obtained with what is permitted by (*) (the best known version of this holds with α=5.441243…, due to Rhin and Viola).

This is also clearly art for art sake, but amusing facts emerge. First, the known arguments leading to (*) use information about primes, typically of the same quality as the Chebychev bounds. Thus the argument is dangerously close to circular, because the consequences are typically weaker! However, Miller, Schiffman and Wieland have looked at the arguments of Rhin and Viola and played cleverly with it to obtain non-circular consequences about the distribution of primes. The second amusing fact is that it is in a sense much better to prove that there are infinitely many primes using the irrationality of

$latex \frac{1}{\zeta(2)}=\prod_{p}{(1-p^{-2})}=\prod_{p}{\frac{p^2-1}{p^2}},$

because the denominator is not as hard to control: this was the idea of J. Sondow (although he didn’t deduce any explicit consequence, it follows that using an irrationality measure one gets

$latex \pi(X)\gg \log\log X,$

which my own trial only produced by appealing to Linnk’s theorem on the size of the smallest prime in an arithmetic progression — a much deeper theorem than even Chebychev’s bound!)

As for Julia Robinson, I am not even an amateur logician, and I don’t understand her purely logical works, but she is renowned for works that mix logical ideas and number-theoretic results, and these may be appreciated without much logical background.

Her contributions to the solution of Hilbert’s 10th Problem are probably the best known, but not far from this is her remarkably beautiful characterization of the integers as a subset of the rationals in the language of first order theory of rings. In other words, what she found was a logical formula φ(x), with one free variable x, involving only addition, multiplication, 0, 1, and the logical connectors “and”, “or”, “not” and quantifiers “for all” and “there exists” (which are interpreted as ranging only over variables restricted to the rationals), such that, given a rational number x, the formula φ(x) is “true” (with the standard interpretation of the various logical constructs) if and only if x is in fact an integer.

This is not easy at all; in fact, here is the formula in her original work (part of her PhD thesis; there may, of course, be simpler versions known today):

$latex \forall a,b\ (\{\psi(a,b,0)\ \text{and}\ \forall m\ [\psi(a,b,m)\rightarrow \psi(a,b,m+1)]\}\rightarrow \psi(a,b,x))$

where ψ(a,b,n) is the formula

$latex \exists x,y,z\ (2+abn^2+bz^2=x^2+ay^2).$

Proving that it does what it is supposed to do involves some substantial number theory (namely, the Hasse principle for quadratic forms in three variables with rational coefficients), but is a beautiful elementary argument apart from that.

Inaugural lecture

A nice tradition of ETH Zürich (and some other institutions) is the inaugural lecture that newly hired members of the faculty give to describe to a large audience some of their research topics (indeed, the lectures are open to the general public). I just gave my own Einführungsvorlesung, entitled “The art of sieving”. The actual lecture can be seen online, but it may also be interesting to have a look at the full beamer presentation, since I ended up presenting only about three-quarters of what I had prepared, and some of the most interesting things (at least, the closest to current research) were located towards the end. (Precisely, I stopped at the end of page 74 of the PDF).

Of course, my lecture does not compete — in terms of sheer viewability value — with the presentation on Monday of Marc Pollefeys, who showed the current state of the art of 3D reconstruction of objects and scenes using (2D) video and pictures, or with the (possibly apocryphal)  inaugural lecture of P. L. Nageoire for the Chaire d’études des alcools forts of Collège de France.

I still hope that some of the excitement of studying primes will come through. Encouragingly, two colleagues who are now retired, and who were not previously working in “Pure” Mathematics told me during the Apéro afterwards that they are happy to study prime numbers now as a hobby, and had liked my lecture.

Relations betweens roots of polynomial equations

It does not seem to be so well-known that there can be only very few linear relations between the roots of an irreducible polynomial which has a “large” Galois group (in a certain sense, which is made clearer below). For instance, given a monic polynomial P in Q[T], written

$latex P(T)=T^d+a_{d-1}T^{d-1}+\cdots+a_1T+a_0,$

it is easy to create a polynomial Q of the form

$latex Q(T)=T^d+b_{d-2}T^{d-2}+\cdots+b_1T+b_0$

whose roots generate the same field. If βi are the roots of Q, the vanishing of the coefficient of Td-1 means that they satisfy the Q-linear relation

$latex \beta_1+\cdots+\beta_d=0.$

If the Galois group of the splitting field of Q (or P) is the full symmetric group on d letters, then in fact this is the only possible relation (up to multiplication by a fixed scalar)!

In other words, it is impossible, for instance, that the roots αi of P satisfy

$latex \alpha_1-\alpha_2+\cdots +(-1)^d\alpha_d=0,$

or

$latex \alpha_1+2\alpha_2+\cdots +d\alpha_d=0.$

Proving this type of fact can be seen either as a fun exercise in the representation theory of finite groups, or as a good way of motivating its usefulness. (The basic knowledge needed is the existence of decomposition into irreducibles, and unicity of the isotypical space corresponding to a given irreducible representation; the treatment by J-P. Serre is probably the clearest introduction).

Indeed, if the question was presented to a student in the form of showing that either of the previous two relations are incompatible with the αi being roots of a polynomial with symmetric Galois group G=Sd, it is likely that it would be natural to expand the hypothetical relation by stating that, for any σ in G, we have (for the last case, say):

$latex \sigma(\alpha_1)+2\sigma(\alpha_2)+\cdots +d\sigma(\alpha_d)=0.$

If the Galois group is indeed Sd, we can choose σ so that

$latex \sigma(\alpha_1)=\alpha_2,\ \sigma(\alpha_2)=\alpha_1$

and the other roots are fixed; then subtracting the new relation from the first one, we get

$latex -\alpha_2+\alpha_1=0,$

which is impossible.

This was particularly easy, but to refute the possibility of all relations (except the sum of the roots being 0), something more systematic is needed. The type of argument used suggests to consider the vector space of all possible relations:

$latex R=\{(\lambda_1,\ldots,\lambda_d)\,\mid\,\lambda_1\alpha_1+\cdots+\lambda_d\alpha_d=0\}$

and the basic idea is that G acts on R by its action on the roots, in other words, R is a linear representation of G (over C, or a smaller field if one wants).

If we then take the point of view of trying to identify this representation among the representations of G, we are naturally led to remark that it is defined as a subrepresentation of the permutation representation of G acting on the roots

$latex F=\bigoplus_{1\leq i\leq d}{\mathbf{C}\cdot [\alpha_i]},\text{ with } \sigma([\alpha_i])=[\sigma(\alpha_i)],$

which, as a representation, doesn’t depend on the particular polynomial (and its roots), but only on the group G, as a transitive permutation group acting on d objects.

So, for any particular transitive group G acting on d objects, one can construct the representation F, decompose it into sums of irreducibles πi (this is a problem of group theory), and then it follows that R can only be isomorphic to a direct sum of those representations occuring in R, with multiplicities at most equal to that in F: if

$latex F\simeq n_1\pi_1\oplus\cdots \oplus n_k\pi_k,\ \text{ with } n_i\geq 1,$

then R can only be of the form

$latex R\simeq m_1\pi_1\oplus\cdots \oplus m_k\pi_k,\ \text{ with }\ 0\leq m_i\leq n_i.$

Then one can look at each possibility in turn, and try to see which ones can actually occur as relations of roots of a polynomial. This is clearly easiest if F has few summands, particularly if there is no multiplicity (i.e., if each ni is 1), for then the subspace corresponding to a summand that occurs in R must be equal to that in F, which is itself uniquely determined.

[If there is multiplicity, say n1=2, then in the π1-isotypic component F1 in F, there are infinitely many subspaces isomorphic to π1, with multiplicity one, for instance

$latex V_{\lambda}=\{(v,\lambda v)\,\mid\,v\in V\}$

for each (fixed) scalar λ, where V is a fixed subspace of F1, G stable and isomorphic to π1 under this action. Each of these subspaces can lead to different relations.]

In the case of the symmetric group, the decomposition of the natural permutation representation is well-known:

$latex F=\mathbb{1}\oplus \pi$

where 1 is the subspace generated by

$latex [\alpha_1]+\cdots +[\alpha_d]$

while the complement subspace

$latex \pi=\{(\lambda_1,\ldots,\lambda_d)\,\mid\, \sum_{i}{\lambda_i}=0\}$

is irreducible. Then, 1 is in R precisely when the sum of the roots is zero (and we already know that this possibility can occur), and π can not be (contained in) a relation module, because in particular

$latex (1,-1,0,\ldots,0)\in \pi$

which leads (as before, but now in full generality) to the absurd relation

$latex \alpha_1=\alpha_2.$

In my paper on relations between zeros of zeta functions over finite fields, I applied this method to understand the possible additive relations (and the multiplicative ones, using the same technique with the module of multiplicative relations instead of R) when the Galois group is the Weyl group of Sp(2g). Here also the situation was simple, with three summands in F, each with multiplicity 1.

In fact, experimenting with Magma, I can see that most transitive permutation groups of small degree seem to have the property of not having multiplicity in the permutation representation. The first counter-example is the symmetric group S3, acting on itself by multiplication (there is no multiplicity in the natural action on three letters); indeed, if G acts on itself by multiplication, the permutation representation is isomorphic to the regular representation, and each irreducible occurs with multiplicity equal to its dimension, so such a transitive action always has multiplicity if the group is non-abelian. But most transitive groups are given with an action on a smaller set, and one finds that among the 173 non-trivial transitive groups of degree at most 11, only 5 exhibit some multiplicity. (Note that it is not clear at all if this impression can be quantified precisely).

This type of problem has been investigated a lot by K. Girstmair, and this paper of his (in Acta Arithmetica, 1999) is probably the most complete introduction to the subject.

Projection to subspaces and a theorem of Chebychev

I just taught at ETH the basic result of the theory of Hilbert spaces that states that, given such a space H, a vector v in H and a closed, non-empty, convex subset C, there exists a unique vector y in C such that

$latex ||v-y||=\inf_{w\in C}{||v-w||},$

and which deserves to be called the orthogonal projection of v on C, as the picture tries to illustrate:

Projection

Having done this, I looked for counterexamples to illustrate that this natural statement does not extend to the Banach space setting; it is easy enough to find situations where the minimum is achieved more than once (in fact, infinitely often), but slightly more challenging to find one where the minimum is not achieved at all. Here is one (it was in one of the books I use for my course): consider the Banach space c0 of sequences (indexed by n>0) of complex numbers converging to 0, with the sup norm

$latex ||(u_n)||=\sup_{n\geq 1}{|u_n|}.$

Consider then the linear map

$latex \lambda(u)=\sum_{n\geq 1}{2^{-n}u_n}$

on c0. It is clear that this is a continuous linear map since it is bounded by 1 on the unit ball, and a moment’s thought shows that in fact we have

$latex |\lambda(u)|<1\quad\text{ if } \quad ||u||\leq 1$

because equality would require that u be a constant sequence, and this is impossible if the sequence is to converge to zero, except (of course) when the constant is zero, but then λ(0)=0 anyway….

This is a case where a supremum is not achieved, and so it is easily transformed to what we want by taking, e.g., v=0 and the convex set

$latex C=\{v\in c_0\,\mid\, \lambda(v)=1\}.$

Then there does not exist a vector in C minimizing the distance to the origin.

Now, this is probably not too surprising to anyone, but while preparing these counterexamples for exercises for the students of my course, I was reminded of a beautiful approximation result of Chebychev which is probably not well-known (outside the numerical analysis community, where it seems to be standard knowledge) and which gives a particular (important) case outside of the Hilbert space context where the minimization process works.

Consider the (real) Banach space B of real-valued continuous functions on [0,1] with the supremum norm (i.e., the norm of uniform convergence of functions). Fix then a degree d>0. In B, the subspace Pold of real polynomials of degree at most d is a closed (non-empty) convex subset. The approximation problem is therefore, for a given continuous function f, to find the polynomials P of degree at most d which minimize the distance to f:

$latex ||f-P||=\sup_{0\leq x\leq 1}\ \ |f(x)-P(x)|$

must be equal to

$latex D_d=\inf_{P\in \text{Pol}_d}{||f-P||}.$

It is not very difficult to prove the existence of at least one such polynomial P (the idea being to show that the minimization can be restricted to a compact subset of the subspace Pold). It is more difficult to prove unicity, and even more subtle to find the following characterization (which was found by Chebychev): a polynomial P (of degree at most d) is the best approximation to f in this sense if and only if, there exist at least d+2 points, say

$latex x_1<x_2<\ldots<x_n,\quad n\geq d+2,$

such that

$latex |f(x_i)-P(x_i)|=||f-P||,\quad\text{ for all } i,$

and moreover the signs of f(xi)-P(xi) alternate when i increases. Below, I call the existence of points with these two properties the “d+2 alternation property”; note that it says nothing about the size of the distance from f to P. Here is an illustration:

Chebychev approximation

the red function f is the cosine function, and the blue graph is the best approximation of degree (at most) 3; the gray line is the difference cos(x)-P(x), and the 5 alternating extrema are clearly visible. (I used Maple to construct the approximation, and Sage to plot the graph).

The unicity seems to be typically proved by first showing that there is at most one polynomial with the d+2 alternation property (which is fairly easy; in fact, alternation with d+1 points would be enough), and then showing that minimizing polynomials are characterized by it. The most technical part is the proof that a minimizing polynomial has the required property (if there are less than d+2 alternations, one shows how to improve the approximation).

But the converse (an alternating polynomial is minimizing) looks also quite difficult at first since, as we observed, the defining condition doesn’t say anything about the size of

$latex ||f-P||,$

only that it is achieved at d+2 points. But here is the trick: suppose another polynomial Q (of degree at most d) is such that

$latex (*)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ||f-Q||<||f-P||.$

Then, at the points xi, the difference Q-P has the same sign as f-P (by writing

$latex Q(x_i)-P(x_i)=(f(x_i)-P(x_i)) – (f(x_i)-Q(x_i)),$

and the second summand is not big enough, by (*), to provoke a change of sign). By the assumption of alternation of signs of the d+2 values

$latex f(x_i)-P(x_i),$

(which hasn’t been used yet), this means the polynomial Q-P, of degree at most d, has at least d+1 changes of signs, hence that many zeros, so we must have Q=P, which is a contradiction.

Incidentally, the only reason I know this result is that it was the subject of one of the entrance exams at the French École Polytechnique, and that this exam text was (because of this) used by the teacher of my Math. Spé for one of our weekly take-home exams; the last questions were set up in a challenging way (which I don’t remember, but it must have been the proof of the clever part above), and the result was fixed in my mind because I succeeded in finding the trick…

For a full account, see for instance Analysis of Numerical Methods, by Isaacson and Keller (which I only have in my personal library because it was in the bargains bin in the English bookstore in Bordeaux one day, for some mysterious reason).

Reading Buffon, I

I’ve started reading the volume of Buffon’s Histoire Naturelle where the Buffon needle problem is discussed, partly out of curiosity (why is it called Essai d’arithmétique morale? why is it included in a book of natural history?) and partly, I must say, because the book’s format is remarkably convenient to put in a coat pocket and read while travelling in the Zürich tramways. (It’s certainly much easier to carry along than any modern scientific book I’ve ever seen).

A few pages suffice to understand the meaning of the title and, at least, why probability theory is related. Buffon is interested here in various types of “truths” or “certainties” (vérités et certitudes, in French). In fact, he distinguishes three kinds: (1) intellectual (geometric or mathematical) certainties; (2) physical certainties; and (3) moral certainties, and the discussion of the third kind is really the topic of his essay (and probably it is indeed the most original — of course I am not knowledgeable about his possible precursors here, but whereas I could recognize at least partly his ideas on the first two kinds of truths, I do not remember encountering something similar to the third one before).

Geometric certainties, for Buffon, are those of mathematics. He states that there is not much to be said about them, because they refer, or derive, ultimately, from accepted truths (axioms, though I didn’t see the word in the text) using obvious rules of inference (… (elles) se réduisent toutes à des vérités de définition). In effect, he takes the point of view (which I’ve seen in other places) that mathematics, properly considered, is tautological and really should be trivial, except if irrelevant or badly-posed questions are raised. (I’ll probably come back to this, and other mathematical topics in the essay, because there are amusing remarks there).

Physical and moral truths are different because they refer to the “real” world. The distinction between the two is that a typical physical certainty is that “the sun will rise tomorrow”, which is knowledge based on experience and considered to be “true” because it has been verified in a very large number of cases, going back to the beginning of the world. Moral certainties, on the other hand, are based on much less data, and are more a matter of personal conviction and analogies.

Where things really become interesting is when Buffon decides to quantify these two types of truths, which are not absolutely certain, by assigning a numerical probability to them — indeed, this is were probability theory comes in. Geometric certainties have absolute probability (l’évidence n’est pas susceptible de mesure), being immune to any kind of doubt. But Buffon says that physical certainties are not absolute, rather they are those which have a probability comparable to that of the sun rising tomorrow (puisque le Soleil s’est toujours levé, il est dès-lors physiquement certain qu’il se lèvera demain). The latter, he estimates rather curiously, by (implicitly) assigning the probability 1/2 to each event “the sun rises” on any given day, and assuming that those events are independent. Since, he says, the sun has risen every day for about 6000 years (!), the probability of the negation of a physically certain event is about

$latex 2^{-6000\cdot 365+1}=2^{-2189999},$

and hence it is zero for any practical purpose. (It’s not clear what Buffon really thinks of this estimate of the age of the earth; he remarks that 2000 more years would not change anything to the qualitative feature of this bound, but doesn’t go further).

Now for moral certainties, things are much more interesting. Buffon states that he considers that a good estimate for the probability of the negation of a “morally certain” event is the probability p that a person, aged 56 years old and in good health, will die within the next twenty four hours. He argues that this is a correct value because people typically dismiss this probability by going about on their normal activities every day. So, he says, if some other event has probability less than p, it can be considered to be morally impossible, and therefore should be dismissed from consideration in any practical matter. (Or comme tout homme de cet âge, où la raison a acquis toute sa maturité, & l’expérience toute sa force, n’a néanmoins nulle crainte de la mort dans les vingt-quatre heures … j’en conclus que toute probabilité égale ou plus petite, doit être regardée comme nulle…)

Then Buffon goes about finding a value for p. For this, he has very detailed and scrupulous mortality tables and statistics (about 350 pages at the end of this volume, in fact), and from this he concludes that p is of size roughly 1/10189 (which he rounds up to 1/10000; note that “homme” means “person” in his discussion, since the mortality tables he uses do not distinguish between men and women). There is then an interesting footnote (click on the image below to see it on the right-hand page), where he informs the reader of the comments of Daniel Bernoulli on this matter: Bernoulli is said to have approved of the principle, but objected that the correct value should be 1/100000 instead, because one should take into account that sick people are included in the statistics, and are likely to be much more afraid of dying on a given day.

Extract of the “Essai d’arithmetique morale”

After establishing this value of p, more developments follow of how knowing this could (or should) affect people’s actions, and in particular a long discussion involving gambling, which is mathematically rather weird — so I’ll discuss it in the next post… (Readers may also, of course, still be wondering what the needle has to do with any of this…)

(Note: The tables at the end make for melancholy reading; Buffon observes that

One fourth of the human race perishes, so to say, before having seen the light, since about a quarter dies within the first eleven months of life… A third perishes, before the age of twenty three months, that is to say, before having had the use of their limbs, and of most of their other organs…

There is no indication that he thinks these numbers to be anything but “moral certainties” (he calls them vérités) — facts of life, that will not change during the course of humanity’s existence on Earth.)