Kowalski – Page 67 – E. Kowalski's blog

A “Buffon needle” for e

Suppose we want to compute the constant e (the basis of the natural exponential) in a probabilistic way, similar to the Buffon needle experiment that leads to a probability directly related to π; how should we proceed? (Of course, the game is only valid if we restrict in some way to solutions where the experiment itself is described without using e; otherwise, picking a real number between, say 0 and 3, uniformly and independently and checking whether it is at most e will lead to an uninteresting solution).

There’s at least two ways that I know of, one of which I learnt only recently from looking at a paper of Rényi in the journal Annales Scientifiques de l’Université Clermont-Ferrand 2, the archives of which were recently added to the outstanding Numdam collection.

But first, the one I already knew: it is based on the probability that a random (uniformly chosen) permutation σ in S_n has at least one fixed point. Indeed, using the inclusion-exclusion formula, one gets that this probability is exactly

$latex p_n=1-\frac{1}{2!}+\frac{1}{3!}-\cdots +(-1)^{n-1}\frac{1}{n!}$

and so

$latex \lim_{n\rightarrow +\infty}{p_n}=1-\frac{1}{e}.$

From the Strong Law of Large Numbers, it follows that this limit can be obtained (almost surely) by taking larger and larger values of n, and repeating the experiment of drawing independently and uniformly at random a permutation of n letters, and calculating the proportion of those that have a fixed point. However, note that I’m hiding something in this informal description: to be sure to have convergence almost surely, we must be careful to determine how many experiments to do with each n — otherwise, the convergence may fail to hold.

Another objection from the point of view of the analogy with Buffon’s needle, is that we are not repeating the same experiment here: we need to change the size of the permutations to reach the limit.

Here’s then Rényi’s process: consider an arbitrary experiment with result described by a real-valued random variable with a continuous density (e.g., picking a real number in [0,1] uniformly according to Lebesgue measure), and consider independent random variables distributed in this way

$latex X_1,\ X_2,\ldots, X_n,\ $

Then, among the sequence of values observed in such a sample, say that an index k corresponds to a rising point (élément saillant is the terminology used by Rényi, in French) if

$latex X_k=\max_{1\leq j\leq k}{X_j}.$

Almost surely, there will be infinitely many rising points and corresponding indices, denoted

$latex 1=\nu_1<\nu_2<\cdots <\nu_n<\cdots$

which are clearly themselves random variables. Here is Rényi’s theorem:

We have almost surely
$latex \lim_{k\rightarrow +\infty}{\sqrt[k]{\nu_k}}=e.$

(Rényi observes the amusing possibility of seeing this as analogue of the Buffon needle, but of course his paper contains more than this and is more interesting than this…).

The fact that the answer is universal among all possible (repeated) experiences with continuous density is maybe surprising, but in fact easy to see: if F is the distribution function, then

$latex Y_n=F(X_n)$

are independent random variables now uniformly distributed on [0,1], and the rising points for this new sequence are the same as those for the original one (since F is increasing).

Rényi’s proof is quite short, after the following re-interpretation: instead of studying ν_k, consider the “dual” random variables

$latex \mu_N=|\{n\leq N\,\mid\, X_n \text{ is a rising point}\}|$

which is at most k if and only if ν_k is strictly larger than N. Then the result is equivalent with

$latex \lim_{N\rightarrow +\infty}{\frac{\mu_N}{\log N}}=1$

almost surely. But Rényi observes that if we write

$latex \mu_N=\varepsilon_1+\cdots+\varepsilon_N$

where the i-th summand is the characteristic function of the event “X_i is a rising point”, then (somewhat surprisingly) those summands are independent Bernoulli variables with

$latex \mathbf{P}(\varepsilon_i=1)=\frac{1}{i}.$

Then, although those are not identically distributed random variables, a general version of the law of large numbers (due to Kolmogorov) suffices to show that

$latex \frac{\mu_N}{\mathbf{E}(\varepsilon_1+\cdots+\varepsilon_N)}$

converges almost surely to 1, and this is the desired statement.

New Yorker, bis

When I discussed the New Yorker digital edition, I mentioned that the inclusion of the whole magazine (and not only the actual “content”) just adds to its charm. Here is one I came across recently (and randomly) in the issue of November 19, 1960. It is so strange in some subtle and ironic sense that it could probably be a Shouts and Murmurs piece in the next issue of the Magazine without any eyebrows raising…

Images des mathématiques

A few years ago, the French CNRS had produced two nice glossy brochures for a general (scientific) audience to explain current developments in mathematics (similar in principle to the AMS “What’s happening in the mathematical sciences“). Some of the mathematicians involved have now continued this project analytically on the internet, in a very nice web site dedicated to

présenter la recherche mathématique — en particulier française — et le métier de mathématicien, à l’extérieur de la communauté scientifique

present mathematical research — in particular French research — and what mathematicians do, outside of the scientific community

It’s not clear when the web site was created: there are two articles dated November 2007, but most of the other content goes back no later than October 2008. In any case, there is already some very nice articles, which will not surprise those who see that Étienne Ghys is one of the persons most involved in the web site. One can discover a nice theorem of Falconer, introduced through the device of presenting a digital solar clock, and a beautiful discussion of the Menger curve, which begins with a remarkably accessible discussion of what is a curve, in topological terms, with illustrations of the covering definition. One notes a very nice feature of the web site: some slightly more technical descriptions or definitions are “hidden” behind a “Précisions” link (with turnkey arrow, as in outliner software). One can also see, in a billet written by A. Valette, the coat of arms of the Vicomte Deligne.

Finally, the articles of the original CNRS publications are also archived on the web site.

Blog software upgrade

As some readers may have noticed earlier, the underlying software of the ETH blogging service was upgraded today. It seems to have worked fine (if any problem arises, please indicate them in comments), the only problem being that the upgraded theme I use has switched its rendering of the <em> tag from italic to bold. And as you can see, the default WordPress editor automatically transforms italic tags into emphatic ones. Unfortunately, I still do minor changes of most posts on this editor, so it will probably require some thought how to correct this (hopefully the support people will accept to change the CSS code of the theme)…
At worse, I can switch to another theme, since not all seem to have this problem.
(Note that if you read the posts through Google Reader or something similar, the problem will not appear).

Tying loose ends

I don’t know if there is a single English word for the action of tying loose ends, but this is precisely what I’ve just done by putting together three different short notes and remarks in a single text, entitled “Two remarks on the large sieve”, now available on my home page. (The is ample precedent for a title to mention one less than the actual number of main characters). I’ve already discussed two of those notes: the first one is the new amplification-based proof of the arithmetic large sieve inequality, and the last is the amusing result that two primitive cusp forms are equal if and only if the signs of their respective Hecke eigenvalues coincide for all primes (or all primes with few exceptions). The reader will not fail to notice a certain distance between those two results, and therefore the third note, the most recent (which is not available separately, only in bundled form, like some journals) provides the glue from one to the other.

The gluing is in two parts (though there is rather less precedent for a title to forget about half of the characters of a tale…): there is a result that tries to answer statistically the question of how many primitive forms (of a given level q) may have Hecke eigenvalues with identical signs for the first few primes, which links with the second note, and there is a large-sieve inequality that gives a way to obtain this result. In turn, this inequality turns out to be much easier to prove using the amplification method than using the classical proofs of the arithmetic large sieve, thus relating it with the first result! To finally explain the title, I should say that I think those large sieve results are overall more interesting than the particular application, so I chose to emphasize them. (In fact, I wouldn’t be surprised if the statistical result could be improved by more refined or devious arguments).

The main attraction from the sieve point of view is to have a version of the large sieve which looks like a sieve, but is based directly on the “spectral” inequalities for coefficients of modular forms (originating in work of Iwaniec, and much developped later by Deshouillers and Iwaniec). Those were, until then, mostly considered as distinct beasts, with only a vague connection to sieve, coming from their resemblance with the “harmonic” large sieve inequality

$latex \sum_{q\leq Q}{\sum_{a\text{ mod } q, (a,q)=1}{\left|\sum_{n\leq N}{a(n)\exp(2i\pi an/q)}\right|^2}}\leq (N-1+Q^2)\sum_{n}{|a(n)|^2},$

which also looks like a purely analytic statement.

Here’s the idea for our unification: we consider as our set of objects to “sieve” the finite set S(q) of primitive cusp forms of level q (with a fixed weight, say 1728). As “reductions modulo primes”, we select the maps

$latex \rho_p\,:\, f\mapsto \lambda_f(p)\in \mathbf{R}$

sending each f to its p-th Hecke eigenvalue (and we omit primes dividing q for simplicity). [More intrinsically, this should be seen as mapping each f to its p-component when factored as an automorphic representation, seen as an element of the “dual” of the local group GL(2,Q_p), but for classical modular forms, this amounts to the same thing if we only consider unramified primes]. In fact, from the Ramanujan-Petersson conjecture (proved by Deligne) we know that

$latex \rho_p(f)\in [-2,2]$

for all primes.

Now what should a sifted set look like? Recall that for integers, we would look at sets defined by

$latex \{n\leq N\,\mid\, n\text{ mod } p\notin \Omega_p\text{ for } p\leq Q\}$

so we may want to do the same and look at sets of forms given by

$latex T=\{f\in S(q)\,\mid\, \lambda_f(p)\notin \Omega_p\text{ for } p\leq Q\}$

for some subsets Ω_p of [-2,2]. If we take

$latex \Omega_p=[0,2]$

we get the forms with negative Hecke eigenvalues for all primes p up to Q.

At the moment at least, I can not get a bound for such a set T in this generality, because if we try to approach the question by harmonic analysis (as seems natural), the fact that the target sets are compact intervals [-2,2] means that we need infinitely many harmonics to analyze functions defined on it, such as the characteristic function of Ω_p. Maybe one can chop down the higher frequencies efficiently enough, but for the moment I use a subterfuge: the natural basis on (square-integrable) functions on [-2,2] is given here by the Chebychev polynomials defined by

$latex X_n(2\cos \theta)=\frac{\sin (n+1)\theta}{\sin\theta},\text{ for } \theta\in [0,\pi],\ n\geq 0.$

Precisely, this is the natural basis if we count the forms in S(q) with a suitable weight ω(f) related to the inverse of their Petersson norm, as I will do; otherwise, one needs to use a different basis, which will vary with p: this is related to the fact that, for fixed p, the ρ_p(f) become equidistributed, when q increases, with respect to different measures depending on which weight is used; here the X_n have the feature of being an orthonormal basis of

$latex L^2([-2,2],\nu)$

where ν is the Sato-Tate measure:

$latex \nu=\frac{1}{\pi}\sqrt{1-\frac{x^2}{4}}dx.$

Then I consider sifted sets of the simpler type

$latex U=\{f\in S(q)\,\mid\, \lambda_f(p)\notin \Omega_p\text{ for } p\leq Q\}$

where

$latex \Omega_p=\{x\in [-2,2]\,\mid\, Y_p(x)\leq \beta_{p,0}-\delta_p\},\text{ with } \delta_p\geq 0,$

for some polynomials

$latex Y_p(x)=\sum_{i=0}^s{\beta_{p,i}X_i(x)}$

of degree at most s for all p. Note that, because of orthogonality of the Chebychev polynomials for the Sato-Tate measure, the constant coefficient is

$latex \beta_{p,0}=\int_{-2}^2{Y_p(x)d\nu(x)},$

and therefore the sets U above correspond to forms where, for each prime p up to Q, the polynomials take a value “far” from its average, determined according to the Sato-Tate measure. Using polynomials of bounded degree is what avoids the analytic difficulty with the infinite basis (X_n).

Now it is an easy matter to find a polynomial Y (in fact, of degree 2) and a suitable δ>0 such that

$latex \lambda_f(p)<0\text{ implies } Y(\lambda_f(p))\leq \beta_0-\delta.$

So controlling the sieve with sets of type U is enough to control the forms with negative Fourier coefficients. Then it turns out that if we unravel the sieve framework, the underlying inequality to prove is a consequence of

$latex \sum_{f\in S(q)}{\omega(f)\left|\sum_{m\leq Q^s}{\alpha(m)\lambda_f(m)}\right|^2}\ll (1+Q^sq^{-1})\sum_{m}{|\alpha(m)|^2}$

and this is a special case of the Deshouillers-Iwaniec inequalities! (Actually, a rather simple case). But going from this to the sieve inequality which bounds the (weighted) size of the sifted set U, although it is very simple with the amplification approach, does not seem to be doable with the standard proof (at least, I puzzled on it for a while without success — but of course I have a vested interest in my own argument being better…)