Encadrement, continued

Enlightenment came from a somewhat unexpected source (the manual for my digital camera), and a comment on the earlier post also suggested it: what may be the most natural English rendering of the French encadrement is “bracket”, or “bracketing”.

Indeed, it seems the term “bracket” is used in photography for the operation of taking simultaneously (or as nearly so as possible) three pictures, one with a given selected exposure, and two with higher and lower settings, so that the amateur photographer can then select which is best.

The ever-helpful OED confirms that this is a good choice: we find for Bracket, n., 5(b)

The (specified) distance between a pair of shots fired, one beyond the target and one short of it, in order to find the range for artillery; chiefly in the phrase to establish a bracket.

The quotations that go with this sense are convincing (if somewhat martial); here is the first one:

1899 Daily News 6 Dec. 5/7 At first I fire at 3100 yards, and if I find that my shot is short I fire a second round, say at 3300, in order to go beyond the object. If I see that my shot does go over I am satisfied that I have established what is called ‘a long bracket’, that is to say, I have found two ranges, 200 yards apart, between which the object must lie… I..fire another shot to shorten the distance within which I can then know that the target must be. This we call, on the same principle as the other, ‘a short bracket’.

There is then a further sense 5(c) with similar meaning:

A group bracketed together as of equal standing in some graded system, as income bracket: a class of persons grouped according to income.

And then, finally, bracketing is defined as “The action of furnishing, coupling, uniting, with brackets”. Altogether, it seems one can quite correctly state, in demotic English, something like:

… and so we have the bracket

$latex 1-\frac{x^2}{2}\leq \cos(x)\leq 1-\frac{x^2}{2}+\frac{x^4}{24}$

(although, to my ear, the variant “we have the bracketing” seems better).

A tricks-wiki article: smooth sums

Here is a first trial at writing a possible article for the mathematical wiki that T. Gowers has been discussing (starting with this post).

For comparison, he has himself written a few sample articles, and so has T. Tao:

    Recognising countable sets (T. Gowers)
    Dimension arguments in combinatorics (T. Gowers)
    The tensor product trick (T. Tao)

I follow the general outline of these articles, but I’ve added, after the general discussion at the beginning, a list of links to the specific examples which are discussed further on. This seems to improve navigability for readers who might feel that one particular example is closer to what they are looking for.

The writing is quite rough in places, as it wasn’t easy to get the right feel for the level and detail of exposition (for instance, the later examples are treated more briefly). Also, there are probably not enough references and links, and so I will certainly make minor adjustments over the next few days to remedy some of this.

—————————————————————
Title: Smooth sums

Quick description: It is often difficult to evaluate asymptotically sums of the type

$latex \sum_{n=1}^N{a(n)},$

but much simpler to deal with

$latex \sum_{n\geq 1}{a(n)\varphi(n)}$

where φ is a smooth function, which vanishes or decays very fast for n larger than N. Such sums are called smoothed sums.

This trick is useful because it often turns out either that the original problem leading to the first type of sum can be treated using the second type, or that understanding the smoothed sums may be the right path to understanding the first ones.

General discussion: The underlying facts behind this trick are two elementary properties of harmonic analysis: (1) a Fourier (or similar) transform of a product is the convolution of the Fourier transforms of the arguments; (2) in the sums above, one of the arguments in the product is an (implicit) characteristic function of the interval of summation, and Fourier transforms exchange properties of regularity with properties of decay at infinity, so that after applying the Fourier transform to the product, the smoothing function becomes a rapidly decaying factor which simplifies many further analytic manipulations. Thus, this technique may be interpreted also as some form of regularization.
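In symbols (a schematic statement of these two facts, with the same normalization as the Fourier coefficients used below): for suitable functions f and φ,

$latex \widehat{f\varphi}=\hat{f}\ast\hat{\varphi},$

and if φ has k integrable derivatives, then

$latex \hat{\varphi}(\xi)\ll (1+|\xi|)^{-k},$

so the smoother the cutoff φ, the faster its transform decays, and the better behaved the convolution becomes.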

The effect of the trick is thus to eliminate some purely analytic problems of convergence which are otherwise unrelated to the main issue of interest. This trick is often especially useful in situations where the other argument of the product involved is non-negative. In particular, it is very relevant for asymptotic counting problems in analytic number theory and related fields. It may also fruitfully be combined with dyadic partition arguments (see this (unwritten) tricki article).

Prerequisites: Real analysis and integration theory, complex analysis for some applications. Elementary harmonic analysis (such as Fourier transforms, or Mellin transforms, depending on the type of applications). In particular, the two facts indicated above (behavior with respect to product, and exchange of regularity and decay) are important.

Here are links to the examples to be found below:

Example 1. (Summation of Fourier series)

Example 2. (Fourier inversion formula)

Example 3. (Estimating sums of arithmetic functions)

Example 4. (The Prime Number Theorem)

Example 5. (The analytic large-sieve inequality)

Example 6. (The approximate functional equation for ζ(s))

Example 7. (The explicit formula)

——————————————————————————————–

Example 1. (Summation of Fourier series) If f is a periodic function with period 1 on the real line, which is integrable (in the Lebesgue sense, say) on the closed interval [0,1], the Fourier coefficients of f are given by

$latex a_n=\int_0^1{f(t)\exp(-2i\pi nt)dt}$

for any integer n. A basic problem of Fourier analysis is to determine when the Fourier series

$latex x\mapsto \sum_{n\in\mathbf{Z}}{a_n \exp(2i\pi n x)}$

exists and coincides with f.

Let

$latex S_N(x)=\sum_{-N\leq n\leq N}{a_n \exp(2i\pi n x)}$

be the partial sum of this series. We can also interpret the formula as

$latex S_N(x)=\sum_{n\in\mathbf{Z}}{a_n\exp(2i\pi nx)\varphi_N(n)}$

where the sum is again unbounded, but the Fourier coefficients are “twisted” with the characteristic function of the finite interval of summation

$latex \varphi_N(x)=0\ \text{if}\ |x|>N,\ \varphi_N(x)=1\ \text{if}\ |x|\leq N$

The multiplication-convolution relation of harmonic analysis and the definition of the Fourier coefficients lead to

$latex S_N(x)=\int_0^1{f(t)D_N(x-t)dt}$

where

$latex D_N(x)=\sum_{|n|\leq N}{\exp(2i\pi nx)}= \frac{\sin((2N+1)\pi x)}{\sin \pi x}$

is the so-called Dirichlet kernel of order N. This convolution expression is very useful to investigate when the partial sums converge to f(x), but it is well-known that some regularity assumptions on f are needed: there exist integrable functions f such that the Fourier series diverges almost everywhere, as first shown by Kolmogorov.

A significantly better behavior, for continuous functions f, is obtained if, instead of the partial sums above, the Fejér sums are considered:

$latex \sigma_N(x)=\sum_{-N\leq n\leq N}{a_n \exp(2i\pi n x)\left(1-\frac{|n|}{N}\right)}=\frac{1}{N}(S_0(x)+\cdots+S_{N-1}(x)).$

These are our first examples of smoothed sums, since the first expression shows that they differ from the partial sums by the replacement of the characteristic function φN(x) by the function

$latex \psi_N(x)=(1-|x|/N)\varphi_N(x)$

which is more regular: it is continuous on R. Intuitively, one may expect a better analytic behavior of those modified sums because of the averaging involved (by the second expression for the Fejér sums), or because the “smoother” cut-off at the end of the interval of summation implies that those sums should be less susceptible to sudden violent oscillations of the size of the Fourier coefficients.

This is visible analytically from the integral expression which is now

$latex \sigma_N(x)=\int_0^1{f(t)F_N(x-t)dt}$

where

$latex F_N(x)=\frac{1}{N}\frac{\sin^2(N\pi x)}{\sin^2\pi x}$

is the Fejér kernel. Because this kernel is non-negative, it is not very difficult to prove that

$latex \lim_{N\rightarrow +\infty}{\sigma_N(x)}=f(x)$

for all x if f is continuous. (And in fact, if f is of class C1, it is also easy to deduce that the original partial sums do converge, proving Dirichlet’s theorem on the convergence of Fourier series for such functions.)
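To see the difference concretely, here is a small numerical sketch (my own illustration, not part of the proposed article) comparing the partial sums S_N and the Fejér sums σ_N for the 1-periodic sawtooth f(t)=t-1/2 on (0,1), whose Fourier coefficients are a_n=-1/(2iπn) for n≠0; since f has a jump at the integers, the sharp cutoff produces the familiar Gibbs oscillations, while the smoothed sums do not overshoot:

```python
import numpy as np

# Compare Dirichlet partial sums S_N and Fejer sums sigma_N for the
# 1-periodic sawtooth f(t) = t - 1/2 on (0, 1), with a_n = -1/(2 i pi n).

def dirichlet_and_fejer(x, N):
    """Return (S_N(x), sigma_N(x)) for the sawtooth function."""
    S = np.zeros_like(x, dtype=complex)
    sigma = np.zeros_like(x, dtype=complex)
    for n in range(-N, N + 1):
        if n == 0:
            continue
        a_n = -1.0 / (2j * np.pi * n)
        term = a_n * np.exp(2j * np.pi * n * x)
        S += term                           # sharp cutoff phi_N(n)
        sigma += term * (1.0 - abs(n) / N)  # smoothed cutoff psi_N(n)
    return S.real, sigma.real

x = np.linspace(0.02, 0.98, 400)
f = x - 0.5
S_N, sigma_N = dirichlet_and_fejer(x, 50)

# The partial sums overshoot near the jump (Gibbs phenomenon); the Fejer
# sums do not.
print("max |S_50 - f|     =", np.abs(S_N - f).max())
print("max |sigma_50 - f| =", np.abs(sigma_N - f).max())
```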

Example 2. (Fourier inversion formula) If we consider a function f which is integrable on R, the Fourier transform of f is the function defined by

$latex g(x)=\int_{\mathbf{R}}{f(t)\exp(-2i\pi xt)dt}$

for x in R. The Fourier inversion formula states that if g is itself integrable, then we can recover the function f by the formula

$latex f(t)=\int_{\mathbf{R}}{g(x)\exp(2i\pi xt)dx}.$

A first idea to prove this is to replace the values g(x) on the right-hand side of this expression by their definition as an integral, and apply Fubini’s theorem to exchange the two integrals; this leads formally to

$latex \int_{\mathbf{R}}{f(u)\left(\int_{\mathbf{R}}{\exp(2i\pi x(t-u))dx}\right)du}$

but the problem is that the inner integral over x does not exist in the Lebesgue sense. However, if g is replaced by a smoother version obtained by multiplying with a function φ which decays fast enough at infinity, then the same computation leads to

$latex \int_{\mathbf{R}}{g(x)\varphi(x)\exp(2i\pi xt)dx}=\int_{\mathbf{R}}{f(u)\left(\int_{\mathbf{R}}{\varphi(x)\exp(2i\pi x(t-u))dx}\right)du},$

which is now legitimate. Selecting a sequence of such functions φ which converge (pointwise) to the constant function 1, the left-hand side converges to

$latex \int_{\mathbf{R}}{g(x)\exp(2i\pi xt)dx}$

while a standard Dirac-sequence argument shows that

$latex \int_{\mathbf{R}}{\varphi(x)\exp(2i\pi xu)dx}$

behaves like a Dirac function at the origin, hence the right-hand side of the smoothed formula converges to f(t).
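For instance (one standard choice among many), one can take the Gaussians φ(x)=exp(-πε²x²), which converge pointwise to 1 as ε goes to 0; then

$latex \int_{\mathbf{R}}{\exp(-\pi\epsilon^2x^2)\exp(2i\pi xu)dx}=\frac{1}{\epsilon}\exp(-\pi u^2/\epsilon^2),$

which is a Dirac sequence concentrating at u=0, with total integral 1, as ε goes to 0.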

Note. In these first two examples, one can see the “regularization” aspect very clearly, and one could also invoke more abstract ideas of the theory of distributions to express the same arguments. (For instance, for Example 2, one can say that Fourier inversion holds for the Dirac measure with Fourier transform 1, and that the formal properties involving convolution then extend the formula to more general functions). In the next examples, involving analytic number theory, it is often of the greatest importance to have explicit quantitative estimates at all steps, and this is often more easily done using computations with functions instead of distributions. However, which functions are used for smoothing is often of little importance.

Example 3. (Estimating sums of arithmetic functions) Let a(n) be an arithmetic function, or in other words a complex-valued function defined for positive integers, and let A(x) denote the summatory function

$latex A(x)=\sum_{1\leq n\leq x}{a(n)}.$

Many different problems of analytic number theory can be expressed as the problem of understanding the asymptotic behavior of such sums as x goes to infinity. In Example 4, we describe a celebrated example, the Prime Number Theorem, where a(n) is the characteristic function of the set of primes, but here we look at a slightly simpler situation where we assume that

$latex a(n)\geq 0$

and that we want to have an upper bound for A(x), instead of an asymptotic formula. (This weakened goal might be natural, either because we do not need an asymptotic formula for further applications, or because some uniformity in parameters is needed which is easier to arrange with upper bounds). Quite often, this is approached by trying to use the properties of the Dirichlet generating series

$latex D(s)=\sum_{n\geq 1}{a(n)n^{-s}}$

provided it converges at least in some half-plane where the real part of s is larger than some C.

Then we can replace the implicit characteristic function of the interval [1,x] in A(x) by any smooth(er) function which is pointwise larger. For instance, if we choose a function φ as in the graph below

[Figure: graph of the smoothing function φ]

so that it is smooth, non-negative, compactly supported on [0,2], and such that

$latex \varphi(x)=1\ \text{for}\ 0\leq x\leq 1,$

then we have

$latex A(x)\leq A_{\varphi}(x)=\sum_{n\geq 1}{a(n)\varphi(n/x)}.$

Using the Mellin transform (the multiplicative counterpart to the Fourier transform), we can write

$latex \varphi(y)=\frac{1}{2i\pi}\int_{\sigma-i\infty}^{\sigma+i\infty}{\hat{\varphi}(s)y^{-s}ds}$

where we integrate over a vertical line in the complex plane with real part σ>0, and

$latex \hat{\varphi}(s)=\int_0^{+\infty}{\varphi(x)x^{s-1}dx}.$

Because we use a compactly supported function, it is easy to justify the exchange of the sum over n and the integral representing φ(n/x), and obtain

$latex A(x)\leq \frac{1}{2i\pi}\int_{\sigma-i\infty}^{\sigma+i\infty}{\hat{\varphi}(s)x^sD(s)ds}.$
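In more detail, the step is simply to insert the Mellin inversion formula in each term and exchange the sum and the integral:

$latex A_{\varphi}(x)=\sum_{n\geq 1}{a(n)\frac{1}{2i\pi}\int_{\sigma-i\infty}^{\sigma+i\infty}{\hat{\varphi}(s)(n/x)^{-s}ds}}=\frac{1}{2i\pi}\int_{\sigma-i\infty}^{\sigma+i\infty}{\hat{\varphi}(s)x^s\left(\sum_{n\geq 1}{a(n)n^{-s}}\right)ds},$

where σ is taken larger than C, so that D(s) converges absolutely on the line of integration.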

Now we are free to use any technique of complex analysis to try to estimate this integral. The usual idea is to move the line of integration (using Cauchy’s theorem) as far to the left as possible (since the modulus of x^s diminishes when the real part of s does). For this, clearly, one needs to know how far D(s) can be analytically continued, but it is equally important to have some control on the integrand high up along vertical lines to justify the change of the line of integration, and this depends on the decay properties of the Mellin transform at infinity, which amount exactly to the regularity properties of φ itself. If φ were the characteristic function of the interval [0,1] (corresponding to the sharp cutoff n≤x), the Mellin transform would only decay like 1/s high up in a vertical strip, and multiplying this with any function D(s) which does not tend to 0 leads, at best, to conditionally convergent integrals. On the other hand, if φ is smooth and compactly supported in [0,+∞[, it is very easy to check that the Mellin transform is not only holomorphic when the real part of s is positive (which allows changing the contour of integration that far), but also decays faster than any polynomial in vertical strips, so that any D(s) which has at most polynomial growth in vertical strips can be involved in this manipulation.

(A prototype of this, though it doesn’t involve a compactly supported function, is φ(x)=exp(-x), for which we have

$latex \hat{\varphi}(s)=\Gamma(s),$

the Gamma function, which decays exponentially fast in vertical strips by the complex Stirling formula: we have

$latex |\Gamma(\sigma+it)|\leq C|t|^{\sigma-1/2}\exp(-\pi|t|/2)$

for some constant C, uniformly for σ in any bounded interval and |t|>1).

Finally, even if an asymptotic formula for A(x) is desired, it may be much easier to prove it by using upper and lower bounds

$latex A_1(x)\leq A(x)\leq A_2(x)$

where

$latex A_i(x)=\sum_{n\geq 1}{a(n)\varphi_i(n)}$

and the smoothing functions, this time, have graphs as described below

[Figure: graphs of the two smoothing functions φ1 and φ2]

with a parameter y<x which is left free to optimize later on. In a concrete case, one may (for instance) prove — using the smoothness of the sums — that

$latex A_1(x)=x-y+O(x^{2/3}),\ \ \ \ \ A_2(x)=x+y+O(x^{2/3})$

and then the choice of y=x^{2/3} and the encadrement above imply that

$latex A(x)=x+O(x^{2/3}).$
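For concreteness (this is my guess at the intended shape of the two graphs, which are not reproduced here), one can take φ1 and φ2 smooth and non-negative, with

$latex \varphi_1(t)=1\ \text{for}\ t\leq x-y,\ \ \varphi_1(t)=0\ \text{for}\ t\geq x,\ \ \ \ \ \varphi_2(t)=1\ \text{for}\ t\leq x,\ \ \varphi_2(t)=0\ \text{for}\ t\geq x+y,$

and values between 0 and 1 in the transition ranges; since a(n) is non-negative, this gives the bracketing A_1(x)≤A(x)≤A_2(x) used above.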

Example 4. (The Prime Number Theorem) The first proofs of the Prime Number Theorem

$latex |\{p\leq x\ |\ p\text{ is prime}\}|\sim \frac{x}{\log x}$

as x goes to infinity, due (independently) to J. Hadamard and C.J. de la Vallée Poussin in 1896, were implementations of the previous example, either with a(n) the characteristic function of the primes, or

$latex a(p)=\log p\ \ \ \text{if } p \text{ is prime},\ \ \ \ a(n)=0\text{ otherwise}$

(which leads to slightly simpler formulas, as already discovered by Chebychev when working with elementary methods). Both proofs used smoothing in a somewhat implicit form. Hadamard’s proof, for instance, amounted roughly to considering

$latex B(x)=\sum_{n\leq x}{a(n)\log (x/n)}$

(where the smoothing is present because the function which is inserted is zero at the end of the summation), and rewriting it as

$latex B(x)=\frac{1}{2i\pi}\int_{3-i\infty}^{3+i\infty}{-\frac{\zeta^{\prime}(s)}{\zeta(s)}x^s\frac{ds}{s^2}}$

where the point is that the smoothing in B(x) is the reason for the appearance of the factor s^{-2}, which itself allows the integral to converge absolutely, even after shifting the integration to a contour close to but to the left of the line where the real part of s is 1. Hadamard adds (see the last page) a remark that, in so doing, he avoids the well-deserved criticisms levelled by de la Vallée Poussin against those arguments which, being based on A(x) itself, involve an integral with merely a factor s^{-1}, which does not converge absolutely.
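The identity behind this representation is the classical formula

$latex \frac{1}{2i\pi}\int_{c-i\infty}^{c+i\infty}{y^s\frac{ds}{s^2}}=\max(\log y,0)\ \ \ \ \ (c>0,\ y>0),$

applied with y=x/n and summed over n; the Dirichlet series that then appears is -ζ′(s)/ζ(s) when the sum is taken against the von Mangoldt function Λ(n), which differs from the choice of a(n) above only by the harmless contribution of prime powers.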

The last step of the proof is then an easy argument, using the fact that A(x) is increasing, that shows that the asymptotic formula for B(x) implies the expected asymptotic formula for A(x), i.e., the Prime Number Theorem.

There is nothing special about using this particular form of smoothing; almost any kind would work here.

Note. It is of course the case that the proof of the Prime Number Theorem involves much more than this trick of smoothing! However, it is clearly the case that trying to dispense with it — as can be done — means spending a lot of energy dealing with purely analytic issues of convergence and exchange of sums and integrals, which become completely obvious after smoothing.

Example 5. (The analytic large-sieve inequality) Sometimes smoothing can be used efficiently for estimating sums involving oscillating terms (like exponential sums), although no direct comparison holds as it does in Example 3. As an example, consider the dual analytic large sieve inequality

$latex \sum_{n=1}^N{\left|\sum_{r=1}^R{a(r)\exp(2i\pi n\xi_r)}\right|^2}\leq \Delta \sum_{r=1}^R{|a(r)|^2}$

which can be proved to hold with

$latex \Delta=N-1+\delta^{-1}$

for all N, all complex coefficients a(r), provided the real numbers ξr are δ-spaced modulo 1, i.e., we have

$latex \min_{m\in\mathbf{Z}}{|m+\xi_r-\xi_s|}\geq \delta$

if r is distinct from s.

To prove this inequality, one is tempted to open the square on the left-hand side and bring the sum over n inside, to obtain

$latex \sum_{r,s}{a(r)\overline{a(s)}W(r,s)}$

with

$latex W(r,s)=\sum_{n=1}^N{\exp(2i\pi (\xi_r-\xi_s)n)}.$

This is a geometric sum, and is easily summed, but it is not so easy to get a good grip on it afterwards in summing over r and s. Although Montgomery and Vaughan succeeded, it is instructive to note that a much quicker argument — though with a weaker result — can be obtained by smoothing W(r,s). However, after opening the square, it is not clear how to insert a smoothing factor without (maybe) changing the sums completely — since the coefficients a(r) are arbitrary. So one must insert the smoothing beforehand: the left-hand side of the inequality is non-negative and can be bounded by

$latex \sum_{n\in\mathbf{Z}}{\left|\sum_{r=1}^R{a(r)\exp(2i\pi n\xi_r)}\right|^2\varphi(n)}$

for any smooth, non-negative function φ which is equal to 1 on [1,N] and decays rapidly. This leads to the modified sums

$latex W_{\varphi}(r,s)=\sum_{n\in\mathbf{Z}}{\exp(2i\pi (\xi_r-\xi_s)n)\varphi(n)}$

which can be evaluated and transformed, for instance, by Poisson’s formula to exploit the rapid decay of the Fourier transform of φ.
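For instance (a sketch of how the Poisson summation step goes, with the sum over all integers as above): writing θ=ξr-ξs, one gets

$latex W_{\varphi}(r,s)=\sum_{n\in\mathbf{Z}}{\varphi(n)\exp(2i\pi n\theta)}=\sum_{m\in\mathbf{Z}}{\hat{\varphi}(m-\theta)},\ \ \ \ \ \hat{\varphi}(u)=\int_{\mathbf{R}}{\varphi(t)\exp(-2i\pi ut)dt},$

and since the points ξr are δ-spaced modulo 1, each m-θ with r≠s has absolute value at least δ, so the rapid decay of the Fourier transform of φ makes the off-diagonal terms small.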

Example 6. (The “approximate functional equation”) This next example shows how to use smoothing to represent an analytic function given by a Dirichlet series outside of the region of absolute convergence, so that it can be analyzed further. As an example, the series

$latex L(s)=1-\frac{1}{3^s}+\frac{1}{5^s}-\cdots=\sum_{n\geq 1}{\frac{(-1)^{n+1}}{(2n-1)^s}}$

converges absolutely only when the real part of s is larger than 1 (it converges conditionally when the real part is positive). It is known to have analytic continuation to an entire function, but in order to investigate its properties in the critical strip, say when the real part of s is 1/2, it is necessary to find some convenient alternative expression valid in this region. [Note that what is explained below applies also to the Riemann zeta function, with a small complication in presentation due to the presence of a pole at s=1; this is why we describe the case of L(s).]

So assume s has real part between 0 and 1. A common procedure is to pick a non-negative smooth function φ on [0,+∞[, with compact support or rapid decay, equal to one on a neighborhood of 0, and with

$latex \int_0^{+\infty}{\varphi(t)dt}=1$

and then consider, for a parameter X>1, the smoothed sum

$latex S(X)=\sum_{n\geq 1}{\frac{(-1)^{n-1}}{(2n-1)^s}\varphi(n/X)}$

which makes sense for all s if φ decays fast enough. The role of X is, intuitively, as the effective length of the sum.

Using Mellin inversion as in Example 3, we get

$latex S(X)=\frac{1}{2i\pi}\int_{3-i\infty}^{3+i\infty}{L(s+w)\hat{\varphi}(w)X^wdw}.$

Now shift the vertical line of integration to the left; because L(s) has at most polynomial growth in vertical strips (a non-obvious fact, but one that can be proved without requiring much specific information because of the general Phragmen-Lindelöf principle) and because the Mellin transform of the smooth function φ decays faster than any polynomial, this shift is easy to justify.

Now where do we get poles while performing this shift? The recipe for φ (specifically, that it equals 1 close to 0 and has integral 1) easily implies that the Mellin transform of φ has a simple pole at w=0 with residue 1, and this leads to a contribution equal to

$latex L(s)X^0\cdot\mathrm{Res}_{w=0}\hat{\varphi}(w)=L(s),$

so that

$latex S(X)=L(s)+R(X)$

where the remainder R(X), if we shift the integration to the line with real part c, is

$latex R(X)=\frac{1}{2i\pi}\int_{c-i\infty}^{c+i\infty}{L(s+w)\hat{\varphi}(w)X^wdw}.$

This can be estimated fairly easily if c is very negative, but one can also use the functional equation of L(s) to express R(X) as a sum very similar to S(X), except that s is replaced with 1-s, and the new sum has effective length roughly equal to (|t|+1)/X, where t is the imaginary part of s (the smoothing function is also different). Thus one gets, roughly, an expression for L(s) as the sum of two sums, the product of whose lengths is roughly |t|+1. This is a very general fact in the study of L-functions, and one of the basic tools for the study of their analytic properties.

Example 7. (The “explicit formula” of prime number theory) In his paper on the Riemann zeta function, Riemann stated that the function π(x) which counts primes up to x is given by

$latex \pi(x)=\mathrm{Li}(x)+\sum_{\rho}{\mathrm{Li}(x^{\rho})}+\text{(other elementary terms)}$

where the sum runs over zeros of ζ(s). This “explicit formula” and similar ones can be very useful to understand many aspects of the distribution of primes, but this particular one is hard to justify for convergence reasons. However, provided one looks at a smoothed sum

$latex \sum_{p}{(\log p)\varphi(p/x)}$

(which has effective length x), it is a simple exercise in contour integration to obtain an explicit formula which converges very well, involving a sum over the zeros of the zeta function weighted with the Mellin (or Laplace, or Fourier) transform of φ.
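For orientation, here is roughly the shape it takes, stated (this is a sketch, with the secondary terms not made precise) for the smoothed sum weighted by the von Mangoldt function Λ(n), which is the most convenient normalization:

$latex \sum_{n\geq 1}{\Lambda(n)\varphi(n/x)}=x\hat{\varphi}(1)-\sum_{\rho}{x^{\rho}\hat{\varphi}(\rho)}+\text{(smaller terms)},$

where the hat denotes the Mellin transform and the sum runs over the non-trivial zeros ρ of ζ(s); the rapid decay of the Mellin transform of φ in vertical strips is what makes the sum over zeros converge very well.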

Generalizations of it to primes in arithmetic progressions or Dirichlet L-functions are very efficient, for instance, in finding very strong estimates — assuming the Generalized Riemann Hypothesis — for the smallest prime in an arithmetic progression, or similar problems.

—————————————————————————————–

References: The technique of smoothing is not usually discussed in detail in textbooks. One exception is in my introductory book on analytic number theory (in French, so smooth sums are called sommes lisses or sommes lissées): see Section 2.3 in particular where the technique is introduced; it is then used throughout the book quite systematically even when — sometimes — it could be dispensed with.

For details of Examples 4, 5, 6, and 7, one can look in my book with Henryk Iwaniec (see Chapters 5 and 7 in particular), although the technique is used there without particular comments.

Encadrement

Is there a particular word in English for a string of two inequalities which together give both a lower and an upper bound for a certain quantity?  French has encadrement for things like this, as in

Pour tout réel x, on a l’encadrement [for every real number x, we have the encadrement]

$latex 1-\frac{x^2}{2}\leq \cos(x)\leq 1-\frac{x^2}{2}+\frac{x^4}{24},$

This is often quite convenient — of course, one can say “we have the following inequalities…”, but the extra information is useful to have, and it helps avoid too many repetitions.

Note: The OED only lists encadré as an English word, as a technical term in crystallography:

A crystal is named encadré, when it has facets which form kinds of squares around the planes of a more simple form already existing in the same species, R. Jameson, A treatise on the external characters of minerals, 1805.

Demystifying (a bit) the arithmetic large sieve

In the development of the large sieve, leading to its arithmetic applications, there is one step which — I must admit shamefully — I never completely understood as a natural argument, rather than as a somewhat mysterious clever trick. Or at least, not until now: just this week-end, one vaguely related thing leading to another, I stumbled on a very transparent proof of this result.

I’ve written a short note with the details, but here is the outline. For background, it may be useful to read my earlier post on the subject on T. Tao’s blog, though I’ll use the classical setting of the sieve for integers below (as I did in the note, although it extends mutatis mutandis to the general setting described in my recent book).

The point is to provide the link between the analytic and arithmetic large sieve inequalities. The first one of these is the inequality

$latex (*)\ \ \ \ \ \ \ \ \ \ \sum_{q\leq Q}{\sum_{(a,q)=1}{|\sum_{n\leq N}{a_n\exp(2i\pi na/q)}|^2}}\leq (N-1+Q^2)\sum_{n\leq N}{|a_n|^2}$

which is valid for all complex numbers (a_n); here the sum over a is over all residue classes modulo q, coprime with q, and the resulting exponential is independent of the choice of representatives of those classes. This inequality is discussed in great detail in Montgomery’s survey article; I’ll just say here that the constant N-1+Q^2, which is not particularly easy to obtain, is of no importance for our purpose, and I will just write

$latex \Delta=N-1+Q^2$

below to emphasize this.

The second inequality, the arithmetic large sieve inequality, states that, for any choice of subsets Ωp for primes p, we have

$latex (**)\ \ \ \ \ \ \ \ \ \ |\{n\leq N\ |\ n\ \text{mod}\ p\ \notin \Omega_p\ \text{for}\ p\leq Q\}|\leq \Delta H^{-1}$

where

$latex H=\sum_{q\leq Q}{\mu(q)^2\prod_{p\mid q}{\frac{|\Omega_p|}{p-|\Omega_p|}}}.$

This is thus, recognizably, a sieve estimate: we bound from above the number of integers remaining in a segment after removing all those which fail to satisfy some constraints on their reductions modulo primes, constraints which are imposed for all primes p up to Q. This Q is a parameter which is adjusted for applications, depending on N.

—————————————————————–

Example. If sieve theory is completely unfamiliar, here is one of the most traditional examples: let Ωp be the set consisting of the residue classes 0 and -2 modulo p. Among the integers surviving the sieve process are then all prime numbers r larger than Q such that r+2 is also prime. Hence the arithmetic large sieve inequality implies an upper bound for the number of twin primes up to N, namely

$latex \pi_2(N)\leq \Delta H^{-1},\ \ \ \ \ \text{where}\ \ \ \ \ H=\sum_{q\leq Q}{\mu(q)^2\prod_{p\mid q}{\frac{2}{p-2}}}$

(where in fact the coefficients 2 must be replaced by 1 when p=2).

Using elementary techniques (see this treatment by Ben Green for example), or general results on sums of multiplicative functions, it follows that

$latex H\geq c(\log Q)^2$

for Q>2 and some constant c>0. This leads (choosing, for instance, Q=N^{1/2}) to an estimate for the number of twin primes which is (conjecturally) of the right order of magnitude

$latex \pi_2(N)\ll \frac{N}{(\log N)^2}$

and by partial summation, to the (weaker, but still spectacular) conclusion that the series of inverses of twin primes converges:

$latex \sum_{p,p+2\ \text{primes}}{\frac{1}{p}}<+\infty,$

which was one of V. Brun’s first achievements in building the beginnings of the modern theory of sieve methods in the early 20th Century.
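Purely as a sanity check (a small script of my own, not part of the original discussion), the inequality (**) can be verified numerically in this twin-prime setting, computing the sifted set and H by brute force for small values of N and Q:

```python
from itertools import combinations

# Check |S| <= Delta / H for the twin-prime sieve: Omega_p = {0, -2 mod p},
# so |Omega_p| = 2 for p > 2 and |Omega_2| = 1.

def primes_up_to(Q):
    sieve = [True] * (Q + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(Q ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, Q + 1, p):
                sieve[m] = False
    return [p for p in range(2, Q + 1) if sieve[p]]

def check_large_sieve(N, Q):
    ps = primes_up_to(Q)
    omega = {p: {0, (-2) % p} for p in ps}
    # Sifted set: n <= N whose reduction mod p avoids Omega_p for all p <= Q.
    sifted = [n for n in range(1, N + 1)
              if all(n % p not in omega[p] for p in ps)]
    # H = sum over squarefree q <= Q of prod_{p | q} |Omega_p|/(p - |Omega_p|).
    H = 0.0
    for k in range(len(ps) + 1):
        for subset in combinations(ps, k):
            q = 1
            for p in subset:
                q *= p
            if q <= Q:
                term = 1.0
                for p in subset:
                    term *= len(omega[p]) / (p - len(omega[p]))
                H += term
    Delta = N - 1 + Q ** 2
    return len(sifted), Delta / H

size, bound = check_large_sieve(N=10000, Q=30)
print("sifted set size:", size, "   large sieve bound:", bound)
assert size <= bound
```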

————————————————————————–

So I will now describe how to derive (**) from (*). Or rather, I will derive (**) from the dual inequality

$latex (***)\ \ \ \ \ \ \sum_{n\leq N}{|\sum_{q\leq Q}{\sum_{(a,q)=1}{\beta(q,a)\exp(\frac{2i\pi na}{q})}}|^2}\leq \Delta\sum_{q\leq Q}{\sum_{(a,q)=1}{|\beta(q,a)|^2}}$

where the complex coefficients β(q,a) are still arbitrary, and Δ has the same value as before. The equivalence of (*) and (***) is a consequence of the elementary duality theory in finite-dimensional Hilbert spaces. But more importantly, (*) is often proved by means of (***) and this duality principle; in more general contexts, this is because (***) is usually easier to approach. So it is a natural starting point to argue towards the arithmetic inequality (**).

But before we begin, a notational convention: below, q (and sums over q) will always refer to squarefree (positive) integers.

The idea is quite simple, in particular if you’ve ever seen the Chebychev inequality in probability: let S denote the sifted set

$latex S=\{n\leq N\ |\ n\ \text{mod}\ p\ \notin \Omega_p\ \text{for}\ p\leq Q\}$.

Comparing with the structure of (***), we are going to describe an “amplifier” A(n) which is of the form

$latex A(n)=\sum_{q\leq Q}{\sum_{(a,q)=1}{\beta(q,a)\exp(\frac{2i\pi na}{q})}}$

and has the property that |A(n)| is “large” whenever n is in S. If we have

$latex |A(n)|\geq B,$

for n in S, then by positivity we deduce

$latex B^2|S|\leq \Delta \sum_{q}{\sum_{a}{|\beta(q,a)|^2}}$

and if the last sum can be expressed conveniently and is not too large, we get an upper bound for |S| in this manner.
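Spelled out, the chain of inequalities is

$latex B^2|S|\leq \sum_{n\in S}{|A(n)|^2}\leq \sum_{n\leq N}{|A(n)|^2}\leq \Delta\sum_{q\leq Q}{\sum_{(a,q)=1}{|\beta(q,a)|^2}},$

where the last step is exactly the dual inequality (***) applied to the coefficients β(q,a).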

To build the amplifier, consider first a single prime p at most Q: if n is in S, we know that the reduction of n modulo p is constrained to not be in Ωp. We can express this analytically by saying that the characteristic function of this set, evaluated at n, is zero. But now expand this characteristic function (say fp) in terms of the additive characters (the “discrete Fourier transform”): we have

$latex f_p(x)=\sum_{a\ mod\ p}{\alpha(p,a)\exp(2i\pi ax/p)}$

with

$latex \alpha(p,a)=\frac{1}{p}\sum_{x\in\Omega_p}{\exp(-2i\pi ax/p)}.$

Thus for n in S, we have

$latex 0=f_p(n)=\sum_{a\ mod\ p}{\alpha(p,a)\exp(2i\pi an/p)}$

and we rewrite this by isolating the contribution of the 0-th harmonic (the constant function 1, whose coefficient α(p,0) is the probability that an integer modulo p lies in Ωp):

$latex \frac{|\Omega_p|}{p}=\alpha(p,0)=\sum_{(a,p)=1}{(-\alpha(p,a))\exp(2i\pi an/p)}.$

This is the basis of our detector! First, this defines the coefficients

$latex \beta(p,a)=-\alpha(p,a)$

and then, to extend this to all squarefree integers q up to Q, we use the Chinese Remainder Theorem and multiply the detectors for primes dividing q, getting coefficients β(q,a) for a coprime with q, such that

$latex \prod_{p\mid q}{\frac{|\Omega_p|}{p}}=\sum_{(a,q)=1}{\beta(q,a)\exp(2i\pi an/q)}.$
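Concretely (spelling out this step a bit more than strictly necessary), if for (a,q)=1 one writes, as the Chinese Remainder Theorem allows,

$latex \frac{a}{q}\equiv\sum_{p\mid q}{\frac{a_p}{p}}\ (\text{mod}\ 1),\ \ \ \ \ (a_p,p)=1,$

then one can take

$latex \beta(q,a)=\prod_{p\mid q}{\beta(p,a_p)},$

the product running over the primes dividing q.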

For notational simplicity, we write

$latex c_p=\frac{|\Omega_p|}{p}.$

Thus the final detector for S is

$latex A(n)=\sum_{q\leq Q}{\sum_{(a,q)=1}{\beta(q,a)\exp(2i\pi an/q)}},$

and, for n in S, we have

$latex |A(n)|=\sum_{q\leq Q}{\prod_{p\mid q}{c_p}}.$

Calling this quantity B, as previously, the resulting inequality after applying (***), from the sketch above, is

$latex B^2|S|\leq \Delta A$

with

$latex A=\sum_{q\leq Q}{\sum_{(a,q)=1}{|\beta(q,a)|^2}}.$

To compute this, we invoke the orthogonality of additive characters, and their compatibility with the Chinese Remainder Theorem: we have first

$latex A=\sum_{q\leq Q}{\prod_{p\mid q}{\sum_{(a,p)=1}{|\beta(p,a)|^2}}},$

and then, using the (discrete) Parseval formula

$latex \sum_{a\ mod\ p}{|\alpha(p,a)|^2}=\frac{1}{p}\sum_{x\ mod\ p}{|f_p(x)|^2}=c_p,$

we get by removing the contribution of a=0 that

$latex A=\sum_{q\leq Q}{\prod_{p\mid q}{c_p(1-c_p)}}.$

Now we have an inequality

$latex |S|\leq \Delta \frac{A}{B^2},$

which is similar, but not quite identical, to (**). In general, it is somewhat weaker — though barely so for some problems like the twin-prime example above, where the sets Ωp are quite small (i.e., what are called small sieves…)

To improve this inequality and get (**), we notice that we can still try other amplifiers. In particular, it is natural to observe that the inequality we obtained is not invariant under multiplying the coefficients β(q,a) by constants (depending only on q, multiplicatively). Indeed, if we consider now

$latex \gamma(q,a)=(\prod_{p\mid q}{\lambda_p})\beta(q,a)$

we replace B with

$latex B_1=\sum_{q\leq Q}{\prod_{p\mid q}{\lambda_pc_p}}$

while A is replaced with

$latex A_1=\sum_{q\leq Q}{\prod_{p\mid q}{\lambda_p^2c_p(1-c_p)}}$

and we can try to minimize the ratio A_1/B_1^2 when the coefficients λp are allowed to vary. This is the same classical problem that occurs in the Selberg sieve (see for instance the write-up by Ben Green of the application of the Selberg sieve to the twin-prime problem, already mentioned, for an introduction): minimize a quadratic form subject to a constraint given by a linear form, and it is easily and elegantly solved: we apply Cauchy’s inequality,

$latex B_1^2\leq (\sum_{q\leq Q}{\prod_{p\mid q}{\lambda_p^2c_p(1-c_p)}})\times (\sum_{q\leq Q}{\prod_{p\mid q}{c_p/(1-c_p)}})=A_1H,$

so that, first of all, we can not do better (i.e., make the ratio smaller) than

$latex \frac{A_1}{B_1^2}=\frac{1}{H}$

and second, by the equality case in Cauchy’s inequality, we can do something this good by taking

$latex \lambda_p=\frac{1}{1-c_p}.$

This leads to the optimal value of the ratio and therefore proves the arithmetic large sieve inequality (**), as described previously.

A final remark: the simplicity of the argument suggests that it is probably not new. There are (in Montgomery’s survey article, for instance) a number of earlier derivations of the arithmetic large sieve inequality from the dual form of the analytic inequality. However, those I have seen (I must say I had not really looked at any of them until after writing the gist of my argument…) are based on, or inspired by, the connection with the Selberg sieve, and are therefore less elementary (and less well motivated). Still, the ingredients always look much the same, and I think this write-up is more a matter of re-arranging them than a really new proof.

Can L^p be the same as L^q?

A fairly standard question that students are asked after a course in real analysis (involving Lebesgue integration) is to show that there is no inclusion between Lp spaces on R with respect to Lebesgue measure. This is usually done (sometimes by hinting or prompting) by considering for which values of α and p the functions

$latex x\mapsto x^{\alpha}$

are in Lp (after truncating either for |x|<1 or for |x|>1, to avoid divergences).
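For the record, the computation behind this classical exercise reduces to

$latex \int_0^1{x^{\alpha p}dx}<+\infty \iff \alpha p>-1,\ \ \ \ \ \int_1^{+\infty}{x^{\alpha p}dx}<+\infty\iff \alpha p<-1,$

so that, for p<q, any α with -1/p<α<-1/q gives a function on [1,+∞[ which is in Lq but not in Lp, while the same α used on ]0,1] gives a function in Lp but not in Lq.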

Once, I was present as an observer at an oral examination for such a course, and the professor in charge had raised this question more or less by asking

Is it possible that Lp is the same as Lq, if p is not equal to q?

where the meaning was clearly, implicitly, the one above; and I wondered (aloud) whether the answer was the same in a more abstract setting: is it possible that Lp(X,μ) is isomorphic to Lq(X,μ) if p is different from q? This is a question that makes sense for general measure spaces, of course, and one where one must be careful to specify what “isomorphic” is taken to mean. In the algebraic sense, this is a question of cardinality only, and doesn’t seem very interesting, but isomorphism as topological vector spaces seems much more natural. Note that the answer is not immediately clear for the classical Lebesgue spaces over R: even if there is no inclusion, there might exist some clever linear renormalization map that sends (for instance) a square-integrable function bijectively — and continuously — to an integrable one.

But still, the answer is (not surprisingly) “No”, provided the only obvious reservation is made: if Lp and Lq are finite dimensional, then they are isomorphic as topological vector spaces, as is well known (they are not necessarily isometric, of course). But otherwise, we have

Theorem. Let (X,μ) be a measure space, and let p and q be real numbers at least 1 such that Lp(X,μ) has infinite dimension. Then Lp(X,μ) and Lq(X,μ) are isomorphic, as topological vector spaces, if and only if p=q.

I think this goes back to Banach, at least for the classical spaces of functions on R (if I understand correctly Chapter XII of his book Théorie des opérations linéaires). For the general case, although I didn’t find a reference, this must be well-known since it is a direct consequence of the computation of the type and cotype invariants of such Banach spaces. (The only reason I actually had the idea to look at these is that I was browsing with pleasure through the nice book of Li and Queffélec on the geometry of Banach spaces, where type and cotype are described in detail; this is overall very far from anything in which I can claim expertise…)

Indeed, for an infinite dimensional Banach space of the form Lp(X,μ), it is known that

$latex type(L^p(X,\mu))=\min(2,p),\ \ \ \ \ \ cotype(L^p(X,\mu))=\max(2,p)$

where the (best) type (denoted type(E)) of a Banach space E is defined to be the largest real number p such that

$latex \int_0^1{||\sum_{j=1}^n{r_j(t)x(j)}||dt}\leq M\left(\sum_{j=1}^n{||x(j)||^p}\right)^{1/p}$

for any n>1, any finite sequence of vectors x(j) in E, and some constant M>0 (independent of n), where

$latex r_j(t)=\mathrm{sign}\sin 2^j\pi t$

denotes the sequence of Rademacher functions. Dually, the cotype is the smallest real number q such that

$latex \int_0^1{||\sum_{j=1}^n{r_j(t)x(j)}||dt}\geq m\left(\sum_{j=1}^n{||x(j)||^q}\right)^{1/q}$

for some constant m>0. The results above on the type and cotype of Lp spaces are explained in Section III.3 of the book of Li and Queffélec.

These definitions show that the type and cotype are preserved under continuous linear isomorphisms, so if we have infinite-dimensional spaces Lp(X,μ) and Lq(X,μ) which are isomorphic, their types and cotypes must coincide, i.e., we must have

$latex \min(2,p)=\min(2,q),\ \ \ \text{and}\ \ \ \max(2,p)=\max(2,q),$

which means p=q (indeed, min(2,p)+max(2,p)=2+p for every p).