The Mertens formula

The Mertens formula is the name given to the asymptotic determination of the behavior of the partial products of the Euler product for ζ(s) at s=1, where the zeta function has a pole: precisely, it is the statement that

$latex \prod_{p\leq x}{\Bigl(1-\frac{1}{p}\Bigr)}\sim \frac{e^{-\gamma}}{\log x}$

as x tends to infinity (the product being of course restricted to primes), with γ being the Euler constant:

$latex \gamma=\quad \lim_{x\rightarrow +\infty}{\Bigl(\sum_{j\leq x}{\frac{1}{j}}-\log x\Bigr)}=0.57721566490153286060651209008240243104\ldots$

This formula has a special place in elementary prime number theory in that, although the asymptotic behavior is not so deep (at the level roughly of Chebychev-type estimates), and it is very easy to prove, from the Prime Number Theorem, in the form

$latex \prod_{p\leq x}{\Bigl(1-\frac{1}{p}\Bigr)}\sim \frac{e^{-c}}{\log x}$

for some constant c, the fact that this constant is none other than γ is not a purely formal consequence of the Prime Number Theorem. And if one looks at the standard treatments (e.g., in Hardy & Wright, and in Tenenbaum’s book, which follow more or less the same reasoning), it’s hard to come back with a clear explanation of why this is so, and what this constant may possibly mean.

I’ve been thinking about this the last few days because of something that looked very much like a serious problem in my joint work with A. Nikeghbali (which I’ve described earlier — the paper will be posted soon) on analogies between random matrices, zeta, and the Erdös-Kác theorem. This was pointed out to us by P. Bourgade: in the post above, I said

Well, one can clearly expect a finite-field version of the mod-Poisson convergence for ω(f), where f ranges over monic polynomials of degree d tending to infinity over a fixed finite field:

$latex \frac{1}{q^d}\quad\sum_{\deg(f)=d}{e^{iu(\omega(f)-1)}}\quad\sim_{d\rightarrow +\infty}\quad \exp((\log d)(e^{iu}-1))\gamma_q\Phi_1(u)\Phi_3(u),$

where there is an innocuous constant γq>0, the function Φ1 is the same as before (i.e., it comes from random permutations), and

$latex \Phi_3(u)=\prod_{\pi}{(1-1/q^{\deg(\pi)})^{e^{iu}}(1+e^{iu}/(q^{\deg(\pi)}-1))},$

where the product runs over irreducible monic polynomials in A (so it is completely analogous to the Euler product for Φ2…)

and although I didn’t write down the “innocuous” constant, its value, as we had it in our paper, was given by

$latex \log \gamma_q=\sum_{\pi}{\Bigl(\log(1-\frac{1}{q^{\deg(\pi)}})+\frac{1}{q^{\deg(\pi)}}\Bigr)}+\sum_{j\geq 1}{\Bigl(\frac{\Pi_q(j)}{q^j}-\frac{1}{j}\Bigr)}$

where the sum ranges over irreducible monic polynomials in Fq[T] and

$latex \Pi_q(j)=\sum_{\deg(\pi)=j}{1}$

is the number of such polynomials of degree j.

The problem with this, as Bourgade pointed out, is that if one takes u=0, then obviously everything that depends on u can be evaluated, and the purported formula becomes

$latex 1=\gamma_q,$

and this doesn’t sound reasonable at all at first sight, in view of the expression we have for γq. In particular, because the origin of this constant is really the (or an) analogue of Mertens’s formula for Fq[T], namely

$latex \prod_{\deg(\pi)\leq d}{\Bigl(1-\frac{1}{q^{\deg(\pi)}}\Bigr)}\sim_{d\rightarrow +\infty} \gamma_q\exp\Bigl(-\sum_{1\leq j\leq d}{\frac{1}{j}}\Bigr)\sim \gamma_q\frac{e^{-\gamma}}{d},$

this sounded particularly implausible (to me at least) since it did not seem reasonable to expect that the very much simpler arithmetic of polynomials over a fixed finite field could lead so easily to the same highly transcendental-looking constant that occurs for the ordinary primes: recall that the analogue of the zeta function is the much simpler function

$latex \zeta_q(s)=\frac{1}{1-q^{1-s}}.$

So we looked for mistakes elsewhere, and this being the nature of preprints, we found a few minorish ones around this part of the proof, which however didn’t lead to any resolution of that problem.

And in fact, there is no problem: it is true that γq=1 for all q. It is psychologically amusing that I convinced myself of this not by searching Google for “Mertens function field” (which immediately gives suitable references), but by deciding to test numerically the behavior of the expressions

$latex \exp\Bigl(\sum_{1\leq j\leq d}{\frac{1}{j}}\Bigr)\times \prod_{\deg(\pi)\leq d}{\Bigl(1-\frac{1}{q^{\deg(\pi)}}\Bigr)}$

for various d and q: the convergence to 1 is unmistakable.

At that point I did look for references (it being even less likely to be new than the remark on the spectrum of multiplication operators). Possibly from “post-internet bias”, the earliest reference found is a paper of M. Rosen from 1999, in the J. Ramanujan Math. Soc., which unfortunately I have not been able to access. However, at least three other papers contain a proof (but I have only looked at two of them): this one by M. Car (2007), this one by P. Lebacque, and a possibly unpublished version by K. Conrad which is quoted by M. Car. Judging from the (short) Math Review, it seems that Rosen adapts the classical proof, and M. Car certainly does the same.

However, having these references is only a double-check: since our original argument was, in fact, correct, the remark of Bourgade amounts to saying that we have a (different) proof of the Mertens formula for polynomials over a finite field, in the form above:

$latex \prod_{\deg(\pi)\leq d}{\Bigl(1-\frac{1}{q^{\deg(\pi)}}\Bigr)}\sim_{d\rightarrow +\infty} \exp\Bigl(-\sum_{1\leq j\leq d}{\frac{1}{j}}\Bigr).$

And because of the way this is done, we understand much better where the constant comes from. In fact, there is really no constant: the right quantity to consider for product over irreducible polynomials of degree at most d is

$latex \exp\Bigl(-\sum_{j\leq d}{\frac{1}{j}}\Bigr),$

and for primes up to x, it is

$latex \exp\Bigl(\quad-\sum_{1\leq j\leq \log x}{\ \ \ \ \ \frac{1}{j}}\Bigr).$

And this factor is explained by the similar behavior of random permutations: it arises because the probability that a random permutation of n letters has no cycle of length at most d is approximately

$latex \exp\Bigl(\quad-\sum_{j\leq d}{\ \ \ \ \ \frac{1}{j}}\Bigr)$

when n gets large. (For d=1, this is the density of derangements, which is well known to be asymptotically 1/e).

In fact, this should have led us to understand immediately that our constant γq has to be 1, since this explanation of the way the Mertens formula looks is explained (by analogy with permutations, but without proof) in the survey of Granville which I had already mentioned in the earlier post (see the “Academic’s aside” on the bottom of page 5, and the discussion before). Our own proof is very much related: we relate polynomials with no small factors to permutations without short cycles, and use suitable equidistribution statements to get the same density.

Power function

Today’s Word of the day for the OED is Power function (go there fast, as the page will disappear after one day!).
The first quotation given is from 1914, which seems a bit late, so it wouldn’t be surprising if simple searches were to yield earlier instances (as happened for algorithms).

Workflow

I have to admit I am still not quite happy with my typical workflow when I write my posts. The web-based text editor of WordPress (as it exists on the ETH blogs, at least) is not very convenient (or efficient), but on the other hand writing HTML in emacs, as I would like to do, doesn’t work so well because it is does not seem possible to preview the inline LaTeX formulas from emacs.

So I’ve looked for alternatives. And I was surprised to see that on Linux, it is not so easy to find a solution. There are a number of blog editors, which would be more or less satisfactory (including, of course, an emacs package…), except that most seem to take the point of view that one should type a blog post entirely, then publish it, and then make only minor corrections (if any). But I usually try to save locally what I type after each line (almost), and I consider it important to be able to such local copies for backup and reference purpose.

After trying a few programs, the only one I found to work reasonably (and that I use now) is the fairly obscure QTM. The text editor is not the best in the world (it has a search feature, but no replace), but one can save the posts to text files (which contain some very easily understandable and removable metadata), and publish them separately later on. One fairly obvious feature of the editor (which is unaccountably missing from the WordPress editor) is the possibility to insert a link — for which the target has been previously selected in a web browser — by a simple command (CTRL+U), after highlighting the words that should refer to the link. (On WordPress, the highlighting has the effect or replacing the link address in the clipboard with the linking words; this is standard X11 behavior, of course, but here this turns out to be a drawback).

Of course, the program itself does not preview the LaTeX formulas, but this can be done decently enough by sending the post as draft to the ETH blog. A bit more annoying is the fact that I don’t see how to insert pictures in the WordPress account from QTM itself, and then refer to them directly from its editor. So sometimes I write posts without pictures and add them in from WordPress later.

There’s also an apparent bug where sometimes the publishing date sent to WordPress gets wrong, and I have to manually change that. I haven’t tried to investigate deeply if I can locate it and correct in the source code.

150 and 36

The 150th Birthday of Darwin’s “On the origin of species” is being amply celebrated all over the world, but let’s not forget another scientific milestone sharing the same birthday: the Riemann Hypothesis was formulated 150 years ago, in Riemann’s famous short paper Ueber die Anzahl der Primzahlen unter einer gegebenen Grosse

in the Monatsberichte der Berliner Akademie, November 1859. This should also be celebrated in various places, one of which will be a conference held in Verbania, Italy, later this month, which I am very much looking forward to.

I will give a lecture during the conference on the topic of Some aspects and applications of the Riemann Hypothesis over finite fields. It’s during the last day, so I must admit I haven’t yet gone very far in selecting exactly which topics I will mention.

However, this led me to have another look at Deligne’s paper giving the first proof of the Riemann Hypothesis in the form conjectured by Weil. And this brings me to the second topic of this post: by a nice coincidence, Deligne’s paper

is essentially contemporary with Szemerédi’s paper

proving the existence of arbitrarily long arithmetic progressions in sets of integers with positive density. (The publication years are different, but both were submitted in 1973, Szemerédi’s on July 1, and Deligne’s on September 20, and Deligne states in the introduction that he had lectured on the proof in July in Cambridge).

To my mind, it’s hard to imagine a more striking illustration of the varieties of the arithmetic experience than these two papers. At the time, and probably still today, the number of readers able to understand both of them in depth must have been dangerously close to zero (as far as I’m concerned, I’ve looked at Deligne’s quite closely, but I still have no serious experience with Szemerédi’s theorem, alas). Both are well-known to have been the source of inspiration for many people, and new proofs of their main results have been found, and have also had enormous importance.

The two papers can also be taken as good experimental tests (at least in number theory) of the “Two worlds of mathematics” issue: ask anyone which one he or she prefers (or better, which one he or she would rather have written…), and from the answer you can probably guess pretty accurately whether the person in question considers him(her)self to be of the theory-building or problem-solving kind…

I may as well state that I am not, myself, quite sure if this type of distinction really makes sense. After all, Deligne ends his paper with two applications which can be stated completely concretely: there is the Ramanujan-Petersson conjecture, which for the Ramanujan delta function is just the statement that

$latex |\tau(p)|\leq 2p^{11/2}$

for the coefficients at prime indices of the (formal) power series

$latex \sum_{n\geq 1}{\tau(n)X^n}=X\prod_{n=1}^{\infty}{(1-X^n)^{24}},$

and there is his estimate for exponential sums of the type

$latex \left|\sum_{1\leq x_1,\ldots,x_k\leq p}{\ \ \ \ \ \exp(2i\pi P(x_1,\ldots,x_k)/p)}\right|\leq (d-1)^kp^{k/2}$

if the polynomial P in k variables of degree d has the property that the zero set of the homogeneous part of degree d defines a smooth hypersurface (take

$latex P(x_1,\ldots,x_n)=x_1^d+\cdots +x_n^d+x_1\cdots x_n+x_1^2-1,$

and n>1 if you want to try your hand at a specific case…)

It is probably another interesting poll, from the sociological point of view, to ask: “Which do you think is deepest?” (Partly because so few answers will come from people who can really judge, so there will be a divide between those who answer, say: “Deligne’s”, because they understand it better, and the other is just combinatorics, after all; and those that say: “Szemerédi’s”, because they understand Deligne’s work better, and therefore the mysterious incredible combinatorial games and the theorem of Szemerédi must be mystically deeper).

This question of depth, interestingly, was probably considered “solved” at the time the papers appeared: compare the (well-deserved) long and detailed review of Deligne’s paper by N. Katz (“This is without question the most important paper in algebraic geometry to have appeared in the last ten years… Deligne has proved the Riemann hypothesis for varieties over finite fields!”), with the seven lines devoted to Szemerédi’s paper (still laudatory, of course: “By an exceedingly ingenious and complicated elementary method the author proves the following theorem, thus settling a celebrated conjecture…”).

Here are two further remarks on these papers: (1) both authors acknowledge that they wrote the papers using auditor’s notes (Katz, in the case of Deligne, and Graham and Hajnal, in the case of Szemerédi’s); and both have quite short reference lists (15 references for Szemerédi, and 8 for Deligne).

Finally, I hadn’t realized that Szemerédi’s paper was published in the volume of Acta Arithmetica in memory of Linnik. Since, like many analytic number theorists, I believe that Linnik is one of the greatest mathematicians of the 20th century, but is somewhat under-appreciated (compared with his achievements!), I found this a well-deserved tribute.

From the “This can’t be new” department

Here is a fundamentally obvious observation which must be well-known, but which seems to be overlooked in the three or four texts of functional analysis I’ve looked at (for instance, here). This came up during my Spectral Theory class). Consider a (bounded) multiplication operator Mg acting on

$latex H=L^2(X,\mu),$

where (X,μ) is a finite measure space, i.e., we have

$latex M_g(\varphi)=g\varphi\text{ for all } \varphi \in H,$

where the multiplier g is a measurable bounded function. A basic question is to understand the spectrum of Mg, which is the set

$latex \sigma(M_g)=\{\lambda\in\mathbf{C}\,\mid\, (M_g-\lambda)\text{ is invertible}\},$

where “invertible” is meant in the Banach algebra of bounded/continuous linear maps from H to itself (which means the same as “bijective” here because Mg is continuous and H is a Hilbert space, so by the Closed Graph Theorem for instance, its set-theoretic inverse, if it exists, is continuous).

The intuitive answer is that the spectrum should be the image of g in C; however, this can’t be right in general, because the spectrum must be closed, and there is no reason that g(X) be closed. So the standard correct answer is that

$latex \sigma(M_g)=\mathrm{Essim}(g),$

the essential range (or image) of g, defined by

$latex \lambda\in \mathrm{Essim}(g)\text{ if and only if } \mu(\{x\,\mid\, |g(x)-\lambda|<\epsilon\})>0\text{ for all }\epsilon>0.$

This is a fine answer as it goes, but there is a feeling that one should minimize the number of definitions involving too many epsilons and quantifiers, if possible. And the observation is that this is perfectly possible here: this definition exactly means that

$latex \sigma(T)=\mathrm{Supp}\ g_*(\mu)$

is the support of the image measure; we recall that this measure is defined by

$latex g_*(\mu)(A)=\mu(\{x\in X\,\mid\, g(x)\in A\})$

for any measurable set A in C, and that a point λ belongs to its support if and only if any of its open neighborhoods, say the discs with radius ε>0 around it, have (strictly) positive measure, which exactly translates to

$latex (g_*\mu)(\{z\,\mid\, |z-\lambda|>\epsilon\})=\mu(\{x\,\mid\, |g(x)-\lambda|<\epsilon\})>0,\text{ for all } \epsilon>0,$

namely, this is equivalent to λ belonging to the spectrum of Mg, as claimed.

This coincidence means in particular that there is no need to try to remember how to prove that this essential range is closed in C: everyone knows that the support of something has to be closed…

Another point where this is useful is in the definition of the functional calculus for these multiplication operators: suppose

$latex f\,:\, \sigma(M_g)\rightarrow \mathbf{C}$

is given and is continuous. The (continuous) functional calculus defines a bounded operator f(Mg) in a natural way, which means that if f is a polynomial, this operator is the “obvious” one. Since one checks immediately by induction and linearity that

$latex f(M_g)=M_{f\circ g}$

if f is a polynomial, it is natural to expect the same formula to hold in general. The problem with this, is that it is not necessarily the case that

$latex g(X)\subset \sigma(M_g),$

since g is only measurable (and bounded), so the composition is not immediately well-defined. But the point is of course that

$latex \mu(\{x\,\mid\, g(x)\notin \mathrm{Supp}(g_*(\mu))\})=(g_*\mu)(\mathbf{C}-\mathrm{Supp}(g_*\mu))=0$

(using the characterization of the support as the complement of the largest open set with zero measure), so the composition makes sense “almost everywhere”.

In this interpretation, the spectral mapping theorem

$latex \sigma(f(M_g))=f(\sigma(M_g))$

becomes the fact that

$latex \mathrm{Supp}((f\circ g)_*(\mu))=f(\mathrm{Supp}\ g_*(\mu))$

(which makes sense because f is continuous and the support is compact here, so its image under f is closed, as it should be); this can be checked directly, of course.

Note: The reason why multiplication operators are important is, of course, that the spectral theorem shows that any normal operator on a Hilbert space is unitarily equivalent to an operator of this type.