Bad writing advice: write the introduction first!

A classic piece of advice about writing papers and books is to write the introduction last. I must admit that it makes excellent sense, and in fact, I’m sure I’ve said as much to students. However, I find that I’m usually sorely tempted to write the introduction first, and that I end up doing this quite often (especially when the project involved is not a joint paper).

There is an advantage to this approach: if I write the introduction early, most often I do not know the precise technical statements that will come out of the arguments, so I am forced to try to explain the motivation, the main points and the qualitative interest of the paper, instead of focusing on the minutiae of the actual theorem, which may well be of less importance. Of course, this is partly a consequence of working in a field (analytic number theory) where it happens very frequently that the final theorem involves (for instance) some parameters whose precise values are not particularly important, while it is instead crucial that they are positive, etc. Some other fields afford much cleaner statements: something like “for every elliptic curve E/Q, the group E(Q) is a finitely generated abelian group”, or like “two compact hyperbolic manifolds M and N of dimension at least 3 are isometric if and only if they have isomorphic fundamental groups”, cannot really be made clearer by trying to focus on any larger picture…

The disadvantages, on the other hand, are in fact quite real: one may write and polish with enthusiasm an introduction (so that it becomes suitable for an O. Henry Award), only to realize when coming to the point of writing the proofs that a fatal mistake lurked somewhere in the arguments only sketched previously. Or one may find new ideas or points of view when writing the proofs in question that lead to a complete change of emphasis of the paper (e.g., going from proving a special case of a statement to a more general one), and require a complete overhaul of the finely chiseled prose of the already completed introduction…

Indeed, both have happened to me, except of course that the literary quality of my drafts is far from deserving any award. The elephant cemetery section of my LaTeX directories contains at least three sad and melancholy beginnings of papers that will most likely never be revived, and I don’t know how many times I ended up re-working the introduction to my book on the large sieve (the final version of which states, quite accurately, that this project started as a planned short paper on extending previous results about the large sieve for Frobenius over finite fields to work in small-sieve contexts…).

More mathematical terminology: friable

Today’s terminological post will be a contribution to the French-led insurgency that tries to replace the denomination “smooth number” (or “smooth integer”) with the much better “friable number” (in French, “nombre friable” instead of “nombre lisse”).

Of course, many readers may wonder “what is this anyway?”. And part of the point is that the better choice may lead such a reader to guess fairly accurately what is meant (possibly with a hint that this has to do with multiplicative properties of integers), whereas playing a game of “Define a smooth number” with a wide group of mathematicians would probably lead to wildly different interpretations.

So here is the definition: a positive integer n is called y-friable (or smooth, if you still insist) if all the prime divisors of n are at most y. The idea is that y should be much smaller than n, so that this means (intuitively) that n only has “small” prime factors. But the definition makes sense for all y, and for instance, any integer n is n-friable, a 2-friable integer is a power of 2, etc.
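
For readers who like to see a definition in action, here is a minimal Python sketch (the function names and the brute-force trial division are my own, purely for illustration) checking y-friability directly from the definition:

def largest_prime_factor(n):
    """Largest prime factor of n, computed by trial division (returns 1 for n = 1,
    which has no prime factor at all and is therefore y-friable for every y)."""
    largest = 1
    d = 2
    while d * d <= n:
        while n % d == 0:
            largest = d
            n //= d
        d += 1
    if n > 1:            # whatever is left over is itself a prime factor
        largest = n
    return largest

def is_friable(n, y):
    """n is y-friable if all prime divisors of n are at most y."""
    return largest_prime_factor(n) <= y

# Sanity checks matching the examples above: every n is n-friable,
# and the 2-friable integers are exactly the powers of 2.
assert is_friable(720, 5)                      # 720 = 2^4 * 3^2 * 5
assert not is_friable(722, 5)                  # 722 = 2 * 19^2
assert all(is_friable(n, n) for n in range(1, 100))
assert [n for n in range(1, 50) if is_friable(n, 2)] == [1, 2, 4, 8, 16, 32]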

I do not wish to discuss the properties of those integers (only their name), so let me just refer to this survey by Granville for a discussion of their basic properties and of their applications to computational number theory.

The adjective “friable” (“capable of being easily crumbled or reduced to powder”, OED) seems perfect to describe this type of integer: it is evocative and conveys not only something of the technical definition, but also a lot of the intuitive meaning and applicability. The other contender, “smooth”, has several problems (in fairness, it has at least one positive aspect: whatever we call them, the integers without large prime factors are extremely useful in many parts of analytic and algorithmic number theory, and the underlying suggestion that smoothness is something desirable is not undeserved): (1) it is much too overloaded (search for “smooth” without more precision in Math Reviews: 68635 hits as of tonight; for “friable”, only 19); (2) whichever meaning of smooth you want to carry over from another field, it does not really mean anything here; (3) not to mention that, chronologically speaking, the terminology was already preempted by the smooth integers of Moerdijk and Reyes, which are the solutions of the equation sin(πx)=0 in the real line of suitable topoi (such as the smooth Zariski topos, apparently).

The chronology of the use of these words, as it appears from Math Reviews at least, seems quite interesting: the first mention it finds of “smooth numbers” in the number-theoretic meaning is in the title of a paper of Balog and Pomerance, published in 1992. However, the notion is of course quite a bit older: the standard paraphrase was “integers without large prime factors”, with many variants (as can be seen from the bibliography of Granville’s survey, e.g., A. A. Buchstab, “On those numbers in an arithmetic progression all prime factors of which are small in order of magnitude”, 1949; Balog and Sarkozy, “On sums of integers having small prime factors, I”, 1984; Harman, “Short intervals containing numbers without large prime factors”, 1991; etc.: clearly, something needed to be done…).

As for “friable”, the first number-theoretic use (interestingly, the six oldest among the 19 occurrences of “friable” in Math Reviews also refer to contexts other than number theory, namely some studies of models of friable materials, from 1956 to 1987) is in a review (by G. Martin in 2005) of a paper of Pomerance and Shparlinski from 2002, though “smooth” is used instead in the paper. The first occurrence in a paper (and so, possibly, in print) is in one by G. Tenenbaum and J. Wu, published in 2003. It must be said that, for the moment, only French writers seem to use the right word (Tenenbaum, de la Bretèche, Wu, and their students)… G. Martin consistently uses it in his reviews, despite having to recall that this is the same as smooth numbers; however, he uses “smooth” in the title and body of his paper on friable values of polynomials (published in 2002, admittedly, and the abstract on his web page uses mostly “friable” instead…).

Beno Eckmann, 1917–2008

The Mathematics Department of ETH was saddened last week to learn of the death of Beno Eckmann, on Tuesday, Nov. 25, aged 91. He was one of the major historical figures in the department, linking us to the “classical” times of Pólya, Weyl, Hopf and others through his long activity and involvement in the international mathematical community. In Zürich, part of his importance came from his role as the initiator and first director, in 1964, of the Forschungsinstitut für Mathematik.

He was a student of Heinz Hopf at ETH, obtaining his degree in 1942, and his genealogy lists 73 students and 1040 descendants (!), among whom can be recognized many well-known names from topology (e.g., M. Kervaire, 1956, or G. Mislin, 1968), analytic geometry (H. Grauert, 1956), algebra (e.g., M.A. Knus, 1967), and even probability theory (E. Bolthausen, 1973), and so on. (Part of the genealogy is presented in a more impressive way in this genealogical tree.)

Although I didn’t meet him after my arrival in January, my colleagues have told me he was active until quite recently (and indeed, his last research paper on Math Reviews is dated 2004, though it is claimed to originate from a 2002 lecture…). His 90th birthday was celebrated last year at FIM.

Topologists in particular will probably enjoy some of the essays collected in these notes, notably the reminiscences of the antesagittarian days of algebraic topology. (Apparently, the very first occurrence of a “physical” arrow to indicate a map between spaces is to be found in a Research Announcement by W. Hurewicz from 1941.)

The W(E_8) polynomial, graphically expressed

My colleague Richard Pink had the idea of illustrating the matrix that F. Jouve, D. Zywina and I had found, whose characteristic polynomial has Galois group W(E_8), by plotting it on a grid with colors indicating the size of the coefficients. The picture was then produced by Leopold Talirz using Matlab, with the following result:

[Figure: graphical view of the matrix with W(E_8) characteristic polynomial]

The square grid represents the matrix graphically, and the scale for colors is indicated on the right. I think this is an intriguing picture. Can anyone suggest an explanation for its structure? In fact, since the matrix arises by a product

$latex m=x_1x_2\cdots x_{16}$

it would probably be even more interesting to show the evolution of the matrix from the identity as more terms are added in the product (and even to continue beyond this matrix in a random walk, as this was the motivation for the construction…). If this leads to a nice result, I’ll post it later…
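
In the meantime, here is a rough Python/matplotlib sketch (my own, not the Matlab code mentioned above) of how such a sequence of pictures could be produced; the small random integer matrices below are mere placeholders, since the actual generators x_1, …, x_16 are not reproduced here:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Placeholder generators: random integer matrices standing in for the actual
# x_1, ..., x_16 (floats are enough here, since only the order of magnitude
# of the entries is plotted).
dim = 20
generators = [rng.integers(-2, 3, size=(dim, dim)).astype(float) for _ in range(16)]

# Partial products m_k = x_1 x_2 ... x_k, starting from the identity.
partial_products = [np.eye(dim)]
for x in generators:
    partial_products.append(partial_products[-1] @ x)

# Plot log10(1 + |entry|) of a few partial products as color grids.
steps = [0, 4, 8, 16]
fig, axes = plt.subplots(1, len(steps), figsize=(4 * len(steps), 4))
for ax, k in zip(axes, steps):
    im = ax.imshow(np.log10(1.0 + np.abs(partial_products[k])), cmap='viridis')
    ax.set_title(f'm_{k} = x_1 ... x_{k}' if k else 'identity')
    fig.colorbar(im, ax=ax, fraction=0.046)
plt.tight_layout()
plt.show()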

English comparative and the sieve

One of my favorite constructions in the English language is that bizarre form of comparative that makes it possible to speak of the “Shorter Oxford English Dictionary”, without any mention of what this estimable dictionary (two long and heavy volumes…) is actually compared to. Does this grammatical construction have a name? Does it exist in other languages? Certainly it is completely nonexistent in French, and it makes for rather thorny translation puzzles: how should a number theorist translate, in French, the name of Gallagher’s remarkably clever larger sieve? [The construction is actually particularly twisted here, since the implicit comparison point of Gallagher is, of course, already known as the large sieve…]

For those readers who have never heard of the larger sieve, here is the idea, together with the explanation of the name (both are presented very clearly in Gallagher’s paper): recall that a basic sieve problem (for integers) is to estimate the number of integers remaining from (say) an interval

$latex 1,2,\ldots, N$

after removing all those n whose reduction modulo p lies in a given subset Ω_p of the residue classes modulo p, for some prime p in a given set (for instance, the set of all primes up to z = N^δ for some δ>0): in other words, one wishes to know the cardinality of the sifted set

$latex S=\{n\leq N\,\mid\, n\bmod p\notin \Omega_p\text{ for all }p\leq z\}.$
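
To make the definition concrete, here is a small brute-force Python sketch (the names are mine, for illustration only); taking Ω_p = {0} for every p up to √N recovers the sieve of Eratosthenes, and the survivors are 1 together with the primes between √N and N:

from math import isqrt

def primes_up_to(z):
    """Primes p <= z, by a simple sieve of Eratosthenes."""
    sieve = [True] * (z + 1)
    sieve[0:2] = [False, False]
    for p in range(2, isqrt(z) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p in range(2, z + 1) if sieve[p]]

def sifted_set(N, z, Omega):
    """S = { 1 <= n <= N : (n mod p) not in Omega(p) for all primes p <= z }."""
    ps = primes_up_to(z)
    return [n for n in range(1, N + 1) if all(n % p not in Omega(p) for p in ps)]

# Excluding the single class 0 modulo every p <= sqrt(N) removes every n
# having a prime factor <= sqrt(N); only 1 and the primes in (sqrt(N), N] survive.
print(sifted_set(100, 10, lambda p: {0}))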

Classically (and also not so classically), the first examples were those where one tries to get S to be essentially made of primes, or twin primes, etc. In that case, the size of Ω_p is bounded as p grows. These situations are called small sieves.

Then Linnik introduced the large sieve, which is efficient for situations where the size of Ω_p is not bounded, and typically grows to infinity with p: basic examples are the set of quadratic residues (or non-residues), or the set of primitive roots modulo p.

And then came the larger sieve: Gallagher’s method works better than the large sieve when Ω_p is extremely large, so that the integers in S have few possible reductions modulo primes (roughly speaking, the larger sieve is better when the number of excluded classes is larger than half of the residue classes modulo p; so quadratic non-residues are borderline, and indeed both the large and the larger sieve give the correct upper bound, up to a constant, for the number of squares up to N). More precisely, Gallagher shows that

$latex |S|\leq A/D$

where

$latex A=\sum_{p\leq z}{\log p}-\log N$

and

$latex D=\sum_{p\leq z}{\frac{\log p}{p-|\Omega_p|}}-\log N,$

provided the denominator D is positive.
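
As a quick sanity check (my own back-of-the-envelope experiment, not taken from Gallagher’s paper), one can evaluate this bound numerically in the borderline example of the squares: the squares up to N occupy the quadratic residues together with 0 modulo p, that is (p+1)/2 classes for odd p, so that p − |Ω_p| = (p+1)/2, and taking z to be a suitable constant times √N gives a bound which is a bounded multiple of √N:

from math import log, isqrt

def primes_up_to(z):
    # same helper as in the previous sketch
    sieve = [True] * (z + 1)
    sieve[0:2] = [False, False]
    for p in range(2, isqrt(z) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p in range(2, z + 1) if sieve[p]]

def larger_sieve_bound(N, z, allowed):
    """Gallagher's bound A/D, with A = (sum of log p) - log N and
    D = (sum of (log p)/allowed(p)) - log N; returns None when D <= 0."""
    ps = primes_up_to(z)
    A = sum(log(p) for p in ps) - log(N)
    D = sum(log(p) / allowed(p) for p in ps) - log(N)
    return A / D if D > 0 else None

def allowed_classes_for_squares(p):
    # squares mod p land in the quadratic residues together with 0:
    # (p + 1) / 2 classes for odd p, and both classes for p = 2
    return 2 if p == 2 else (p + 1) // 2

N = 10**6
true_count = isqrt(N)   # number of squares up to N
for c in (2, 4, 8, 16, 32):
    bound = larger_sieve_bound(N, c * isqrt(N), allowed_classes_for_squares)
    # None means the denominator D is not positive for this choice of z;
    # otherwise the bound is a bounded multiple of sqrt(N).
    print(f"z = {c}*sqrt(N): bound = {bound}, true count = {true_count}")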

As the number of classes excluded increases, the efficiency of this inequality becomes extremely impressive: if

$latex |\Omega_p|>p-p^{\theta}$

with 0<θ<1, the number of elements of S becomes at most a power of log N, whereas the large sieve only gives a power of N. For an arithmetico-geometric application of a new variant of the larger sieve in number fields, in a situation where the numerology is of this type, you can read a recent paper of J. Ellenberg, C. Elsholtz, C. Hall and myself.
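
Here is, roughly, where the power of the logarithm comes from; this is only a quick sketch of the standard computation, with no attempt at good constants (all implied constants may depend on θ). If p − |Ω_p| < p^θ for every p ≤ z, then by Chebyshev-type estimates

$latex D+\log N\geq \sum_{p\leq z}\frac{\log p}{p^{\theta}}\gg z^{1-\theta},$

so choosing z to be a suitably large constant times (log N)^{1/(1-θ)} ensures that D ≫ log N, while the numerator satisfies A ≪ z (by Chebyshev again); hence

$latex |S|\leq \frac{A}{D}\ll \frac{z}{\log N}\ll (\log N)^{\theta/(1-\theta)},$

which is indeed a fixed power of log N.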

[I should mention that it was C. Elsholtz who first mentioned the larger sieve to me a few years ago: the method is not as well known as it should be, since it is extremely simple (Gallagher deals with it in nine lines, and our version is not much more complicated, though it is a bit more involved since it works with heights in the number field to sieve elements which are not necessarily integers). The basic argument and its applications can provide excellent exercises and problems for any introductory number-theory course.]