The Rademacher–Menshov maximal estimate for exponential series

Let us begin with the basic problem. Given an exponential series

\displaystyle \sum_{n=1}^{\infty}a_ne^{inx},

what information about the coefficients is enough to ensure that its partial sums actually converge at almost every point? The first condition one naturally encounters is square summability:

\displaystyle \sum_{n=1}^{\infty}|a_n|^2<\infty.

Because the exponentials are orthogonal, this condition immediately implies that the partial sums are Cauchy in L^2 . But L^2 convergence only says that the average squared difference between two partial sums is small. It does not prevent a particular point x from seeing large oscillations along some exceptional sequence of stopping points. Thus the real issue is not merely to control one partial sum S_m at a time. We must control all possible partial sums simultaneously.

Work on the circle \mathbb T:=\mathbb R/(2\pi\mathbb Z) , with normalized measure d\mu(x):=\frac{dx}{2\pi} . Then

\displaystyle \int_{\mathbb T}e^{inx}\overline{e^{imx}}d\mu(x)=\delta_{nm}.

Thus e^{inx} , for n\in\mathbb Z , is an orthonormal system in L^2(\mathbb T) . For a finite coefficient sequence, write

\displaystyle S_m(x):=\sum_{n=1}^{m}a_ne^{inx},\quad 1\le m\le N.

Writing ||\cdot|| for the norm of L^2(\mathbb T,\mu) , orthogonality gives the exact identity

\displaystyle ||S_m||^2=\sum_{n=1}^{m}|a_n|^2.

This formula says that, for every fixed endpoint m , the size of S_m is completely controlled by the coefficient energy. However, convergence asks a stronger question. At a given point x , the partial sums may reach their largest value at an endpoint which depends on x . Therefore we must study the maximal partial sum

\displaystyle S^*(x):=\max_{1\le m\le N}|S_m(x)|.

There is no direct way to insert this maximum into Parseval’s identity. The Rademacher–Menshov estimate supplies a substitute. It says that the price of controlling all possible stopping points is only logarithmic in the number of terms:

\displaystyle  ||  S^{*} ||  \le\Big(2+ \log_2N \Big) \Big(\sum_{n=1}^{N}|a_n|^2\Big)^{1/2}.

Equivalently,

\displaystyle \int_{\mathbb T} \max_{1\le m\le N} \Big|\sum_{n=1}^{m}a_ne^{inx}\Big |^2 d\mu(x) \le\Big(2+\log_2N \Big)^2 \sum_{n=1}^{N}|a_n|^2.

The important point is not just the inequality itself, but its meaning. The expression \sum_{n=1}^{N}|a_n|^2 measures the ordinary L^2 energy of the coefficient sequence. The extra logarithm measures the additional difficulty of allowing the endpoint to vary. Instead of asking about one predetermined sum, we ask for the largest among N correlated sums. The proof will organize all these possible endpoints by binary scale, showing that every initial interval can be assembled from only about \log_2N dyadic pieces.
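The inequality can be tested numerically. The following is a minimal sketch, not part of the argument: the coefficients a_n=1/n with N=32 and the grid size are arbitrary choices made for illustration, and the integral over \mathbb T is replaced by an average over equally spaced points.

```python
import cmath
import math

def partial_sums(a, x):
    """Return [S_1(x), ..., S_N(x)] for S_m(x) = sum_{n=1}^m a_n e^{inx}."""
    sums, total = [], 0j
    for n, coeff in enumerate(a, start=1):
        total += coeff * cmath.exp(1j * n * x)
        sums.append(total)
    return sums

def maximal_norm_squared(a, grid_size=1024):
    """Approximate the integral of max_m |S_m(x)|^2 d mu(x) by a grid average."""
    acc = 0.0
    for k in range(grid_size):
        x = 2 * math.pi * k / grid_size
        acc += max(abs(s) for s in partial_sums(a, x)) ** 2
    return acc / grid_size

a = [1.0 / n for n in range(1, 33)]      # hypothetical test coefficients, N = 32
energy = sum(abs(c) ** 2 for c in a)     # sum of |a_n|^2
lhs = maximal_norm_squared(a)            # approximates ||S^*||^2
rhs = (2 + math.log2(len(a))) ** 2 * energy
print(lhs, rhs)
```

The first printed number should lie between the coefficient energy and the Rademacher–Menshov bound. For smooth coefficient sequences like this one it is typically much closer to the energy, which reflects the fact that the logarithm is a worst-case price.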

This estimate is elementary but remarkably general. Its proof uses only orthogonality, so it applies to any orthonormal system, not merely to the exponentials. The logarithmic loss is the cost of ignoring all further structure. For the trigonometric system, the functions e^{inx} possess much more cancellation than arbitrary orthogonal functions, and Carleson’s theorem eventually removes the logarithm altogether. But before reaching that deeper result, the Rademacher–Menshov argument gives a clear and concrete answer to a fundamental question: how can square-summable coefficients be upgraded from an averaged L^2 statement to almost-everywhere convergence?

Dyadic Decomposition

First assume that N=2^r . Consider all dyadic intervals contained in \{1,\dots,2^r\} . At scale \ell , where 0\le\ell\le r , these intervals are

\displaystyle I_{\ell,q}:=\{q2^\ell+1,q2^\ell+2,\dots,(q+1)2^\ell\},\quad 0\le q<2^{r-\ell}.

The intervals at scale 0 are single points. The intervals at scale 1 have length 2 , those at scale 2 have length 4 , and so on. At the final scale r , there is only one interval, namely \{1,\dots,2^r\} . Now fix an integer m with 1\le m\le2^r . Its binary expansion has the form m=\sum_{\ell=0}^{r}\varepsilon_\ell2^\ell,\quad \varepsilon_\ell\in\{0,1\}. The binary digits tell us how to partition the initial interval \{1,\dots,m\} . Each digit equal to 1 instructs us to take one block at the corresponding dyadic scale, and the blocks are laid end to end in order of decreasing length. Each block is then a genuine dyadic interval I_{\ell,q} , because the blocks preceding it all have lengths that are multiples of 2^\ell . There is never more than one selected block at any scale.

For example, when m=13 , we have 13=8+4+1. Accordingly, \{1,\dots,13\} =\{1,\dots,8\}\sqcup \{9,\dots,12\}\sqcup \{13\}. The first piece has length 8 , the second has length 4 , and the final piece has length 1 . This is not merely an example: every initial interval can be decomposed in exactly this way, using at most one dyadic interval from each scale.
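The decomposition is easy to carry out mechanically. The sketch below reads off the binary digits of m from the top down; the function name `dyadic_pieces` and the (start, end) output format are choices made here for illustration.

```python
def dyadic_pieces(m):
    """Split {1, ..., m} into dyadic intervals, largest first.

    Returns a list of (start, end) pairs with inclusive endpoints. There is
    one piece of length 2**l for each binary digit of m that equals 1.
    """
    pieces, start = [], 1
    for l in reversed(range(m.bit_length())):
        if m & (1 << l):                      # digit epsilon_l equals 1
            length = 1 << l
            pieces.append((start, start + length - 1))
            start += length
    return pieces

print(dyadic_pieces(13))  # [(1, 8), (9, 12), (13, 13)]
```

Each returned piece starts one step after a multiple of its own length, which is exactly the condition for it to be one of the intervals I_{\ell,q} .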

For a dyadic interval I , define its block sum by

\displaystyle B_I(x):=\sum_{n\in I}a_ne^{inx}.

The dyadic decomposition gives

\displaystyle S_m(x)=\sum_{\ell=0}^{r}B_{I_\ell(m)}(x),

where I_\ell(m) is either the one selected dyadic interval of length 2^\ell or is absent. When no interval is selected at a scale, we interpret the corresponding summand as zero. There are at most r+1 nonzero terms in this decomposition. Cauchy–Schwarz therefore gives

\displaystyle |S_m(x)|^2 \le(r+1)\sum_{\ell=0}^{r}|B_{I_\ell(m)}(x)|^2.

Every interval selected in this decomposition belongs to the full dyadic family. Hence

\displaystyle |S_m(x)|^2 \le(r+1)\sum_{\ell=0}^{r}\sum_q|B_{I_{\ell,q}}(x)|^2.

The expression on the right does not depend on m . Taking the maximum over m yields the pointwise inequality

\displaystyle \max_{1\le m\le2^r}|S_m(x)|^2 \le(r+1)\sum_{\ell=0}^{r}\sum_q|B_{I_{\ell,q}}(x)|^2.

This is the first appearance of the logarithm. A partial interval may cut across every dyadic scale, and there are approximately \log_2N scales. Cauchy–Schwarz pays one factor for the number of such pieces.
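The pointwise inequality can be checked at sample points. This is a rough sketch under assumed data: the coefficients, the choice r=4 , and the 50 sample points are all arbitrary, and the small tolerance only absorbs floating-point rounding.

```python
import cmath
import math

def block_sum(a, start, end, x):
    """B_I(x) for I = {start, ..., end}, 1-based and inclusive, with a[n-1] = a_n."""
    return sum(a[n - 1] * cmath.exp(1j * n * x) for n in range(start, end + 1))

def dyadic_square_function(a, r, x):
    """Sum of |B_I(x)|^2 over every dyadic interval I_{l,q} inside {1, ..., 2^r}."""
    total = 0.0
    for l in range(r + 1):
        size = 1 << l
        for q in range(1 << (r - l)):
            total += abs(block_sum(a, q * size + 1, (q + 1) * size, x)) ** 2
    return total

def max_partial_sum_squared(a, x):
    """max over m of |S_m(x)|^2."""
    best, running = 0.0, 0j
    for n, coeff in enumerate(a, start=1):
        running += coeff * cmath.exp(1j * n * x)
        best = max(best, abs(running) ** 2)
    return best

r = 4
a = [(-1) ** n / math.sqrt(n) for n in range(1, 2 ** r + 1)]  # hypothetical coefficients
points = [2 * math.pi * k / 50 for k in range(50)]
checks = [
    max_partial_sum_squared(a, x) <= (r + 1) * dyadic_square_function(a, r, x) + 1e-12
    for x in points
]
print(all(checks))
```

The right side sums over every dyadic block, not only the ones selected by a particular m , which is why a single computation bounds all partial sums at once.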

We now integrate the preceding pointwise bound. Orthogonality gives an exact formula for every dyadic block:

\displaystyle ||B_I||^2 =\int_{\mathbb T}\Big|\sum_{n\in I}a_ne^{inx}\Big|^2 d\mu(x) =\sum_{n\in I}|a_n|^2.

Indeed,

\displaystyle \int_{\mathbb T}\Big|\sum_{n\in I}a_ne^{inx}\Big|^2 d\mu(x) =\sum_{n,m\in I}a_n\overline{a_m} \int_{\mathbb T}e^{i(n-m)x}d\mu(x),

and the integral is \delta_{nm} , so only the diagonal terms remain. At a fixed scale \ell , the intervals I_{\ell,q} partition \{1,\dots,2^r\} . Thus

\displaystyle \sum_q ||B_{I_{\ell,q}}||^2 =\sum_{n=1}^{2^r}|a_n|^2.

Every coefficient appears once at each scale. Since there are r+1 scales,

\displaystyle \sum_{\ell=0}^{r}\sum_q ||B_{I_{\ell,q}} ||^2 =(r+1)\sum_{n=1}^{2^r}|a_n|^2.
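Because of orthogonality, this counting identity involves only the coefficients, so it can be confirmed without any integration. A small sketch, with an arbitrary coefficient sequence assumed for illustration:

```python
def total_block_energy(a, r):
    """Sum over all scales l and positions q of sum_{n in I_{l,q}} |a_n|^2.

    The list a is 0-based, so a[n] holds the coefficient a_{n+1}.
    """
    total = 0.0
    for l in range(r + 1):
        size = 1 << l
        for q in range(1 << (r - l)):
            total += sum(abs(a[n]) ** 2 for n in range(q * size, (q + 1) * size))
    return total

r = 3
a = [complex(n, -1) / (n + 2) for n in range(1, 2 ** r + 1)]  # hypothetical coefficients
energy = sum(abs(c) ** 2 for c in a)
print(total_block_energy(a, r), (r + 1) * energy)
```

The two printed values should agree up to rounding, since each coefficient is counted exactly once at each of the r+1 scales.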

Combining this with the pointwise maximal estimate from above gives

\displaystyle  ||~~\max_{1\le m\le2^r}|S_m| ~~||^2 \le(r+1)^2\sum_{n=1}^{2^r}|a_n|^2.

Taking square roots,

\displaystyle ||~~\max_{1\le m\le2^r}|S_m| ~~|| \le(r+1)\Big(\sum_{n=1}^{2^r}|a_n|^2\Big)^{1/2}.

For a general positive integer N , choose r so that N\le2^r<2N , extend the coefficient sequence by setting a_n=0 for N<n\le2^r , and apply the estimate above. The partial sums S_m with m\le N are unchanged by this extension, and 2^r<2N gives r+1<2+\log_2N . This proves the Rademacher–Menshov maximal inequality

\displaystyle  ||  S^{*} ||  \le\Big(2+ \log_2N \Big) \Big(\sum_{n=1}^{N}|a_n|^2\Big)^{1/2}.
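The padding step can be sketched as follows. The helper names `padded_exponent` and `pad_coefficients` are made up for this illustration; the only fact used is that the smallest r with N\le2^r automatically satisfies 2^r<2N .

```python
import math

def padded_exponent(N):
    """Smallest r with N <= 2**r; for N >= 1 this gives N <= 2**r < 2N."""
    return (N - 1).bit_length()

def pad_coefficients(a):
    """Extend a_1, ..., a_N by zeros to length 2**r and return (padded, r)."""
    r = padded_exponent(len(a))
    return list(a) + [0] * (2 ** r - len(a)), r

padded, r = pad_coefficients([1, 2, 3, 4, 5])
print(len(padded), r)  # 8 3
```

With this choice the constant r+1 from the power-of-two case never exceeds 2+\log_2N , so the padded estimate implies the stated one.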

The origin of the squared logarithm in the integrated estimate is now completely visible. The first factor of \log N comes from breaking a single partial sum into roughly \log N dyadic pieces and applying Cauchy–Schwarz. The second factor comes from integrating the square: every coefficient is counted once at every dyadic scale, hence roughly \log N times.

The same argument applies when the frequencies begin at an arbitrary integer A rather than at 1 . If A\le B , then

\displaystyle \int_{\mathbb T} \max_{A\le m\le B} \Big|\sum_{n=A}^{m}a_ne^{inx}\Big|^2 d\mu(x) \le\Big(1+\lceil\log_2(B-A+1)\rceil\Big)^2 \sum_{n=A}^{B}|a_n|^2.

Nothing new needs to be proved. One simply applies the preceding dyadic decomposition to the finite orthonormal family e^{iAx},e^{i(A+1)x}, \dots,e^{iBx}. This block form is what turns the finite maximal estimate into an almost-everywhere convergence theorem for infinite series.

Almost-everywhere convergence

Suppose that \displaystyle \sum_{n=1}^{\infty}|a_n|^2\log_2^2(n+1)<\infty. Then the exponential series \displaystyle \sum_{n=1}^{\infty}a_ne^{inx} converges for almost every x\in\mathbb T .

The hypothesis is stronger than ordinary square summability, and so it automatically implies \sum_{n=1}^{\infty}|a_n|^2<\infty. Consequently, the partial sums are Cauchy in L^2(\mathbb T) . But L^2 convergence only provides an averaged limit. To prove almost-everywhere convergence, we must show that the partial sums are pointwise Cauchy outside a set of measure zero.

The strategy is to divide the frequency axis into dyadic blocks: B_j:=\{2^j,2^j+1,\dots,2^{j+1}-1\},\quad j\ge1. These blocks cover every n\ge2 ; the single term with n=1 has no effect on convergence. Let E_j:=\sum_{n=2^j}^{2^{j+1}-1}|a_n|^2. This is the coefficient energy contained in the j -th block. When n\in B_j , we have \log_2(n+1)\ge j , so j^2E_j\le\sum_{n\in B_j}|a_n|^2\log_2^2(n+1) , and for j\ge1 we also have (j+1)^2\le4j^2 . Hence the hypothesis implies

\displaystyle \sum_{j=1}^{\infty}(j+1)^2E_j<\infty.
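The comparison between the block-weighted sum and the logarithmically weighted sum can be seen on a truncated sequence. The sketch below assumes the sample coefficients a_n=1/n and cuts off after the block with j=10 ; both are arbitrary choices for illustration.

```python
import math

def block_energies(a, J):
    """E_j = sum_{n=2^j}^{2^{j+1}-1} |a_n|^2 for 1 <= j <= J, with a[n-1] = a_n."""
    return {
        j: sum(abs(a[n - 1]) ** 2 for n in range(2 ** j, 2 ** (j + 1)))
        for j in range(1, J + 1)
    }

J = 10
N = 2 ** (J + 1) - 1                      # last index of the block B_J
a = [1.0 / n for n in range(1, N + 1)]    # hypothetical coefficients a_n = 1/n
E = block_energies(a, J)
block_side = sum((j + 1) ** 2 * E[j] for j in E)
log_side = sum(abs(a[n - 1]) ** 2 * math.log2(n + 1) ** 2 for n in range(2, N + 1))
print(block_side, 4 * log_side)
```

The first number is bounded by the second for any coefficient sequence, because the inequality (j+1)^2\le4j^2\le4\log_2^2(n+1) holds term by term on each block.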

There are two distinct forms of movement to control. First, a partial sum can fluctuate while it passes through a single dyadic block. Second, the completed blocks can accumulate as we move farther and farther out in frequency. The maximal estimate controls the first phenomenon; a weighted Cauchy–Schwarz argument controls the second.

Define \displaystyle M_j(x):=\max_{2^j\le m<2^{j+1}}\Big|\sum_{n=2^j}^{m}a_ne^{inx}\Big|. This is the largest amount by which the partial sums can move while passing through the j -th dyadic block. Since this block has length 2^j , the block maximal estimate gives

\displaystyle ||M_j||^2\le(j+1)^2E_j.

Our coefficient hypothesis implies \displaystyle \sum_{j=1}^{\infty}(j+1)^2E_j<\infty, and therefore

\displaystyle \sum_{j=1}^{\infty}||M_j||^2<\infty.

We now derive carefully what this says at individual points. Fix \varepsilon>0 , and for each j let

\displaystyle A_j(\varepsilon):=\{x\in\mathbb T:M_j(x)>\varepsilon\}.

On this set, we have M_j(x)^2>\varepsilon^2 . Hence

\displaystyle \varepsilon^2\mu\Big(A_j(\varepsilon)\Big) \le\int_{A_j(\varepsilon)}M_j(x)^2 d\mu(x) \le\int_{\mathbb T}M_j(x)^2d\mu(x) =||M_j||^2.

Thus \displaystyle \mu\Big(A_j(\varepsilon)\Big) \le\varepsilon^{-2}||M_j||^2. After summing over j , we obtain

\displaystyle \sum_{j=1}^{\infty}\mu\Big(A_j(\varepsilon)\Big) \le\varepsilon^{-2}\sum_{j=1}^{\infty} ||M_j||^2 <\infty.
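Both the block maximal estimate and the Chebyshev step can be observed numerically. This is a sketch under assumptions: the coefficients a_n=1/n , the threshold \varepsilon=0.25 , the range of j , and the 512-point grid that stands in for the measure \mu are all chosen here for illustration.

```python
import cmath
import math

def block_maximal(a, j, x):
    """M_j(x): largest |sum_{n=2^j}^m a_n e^{inx}| over 2^j <= m < 2^{j+1}."""
    best, running = 0.0, 0j
    for n in range(2 ** j, 2 ** (j + 1)):
        running += a[n - 1] * cmath.exp(1j * n * x)
        best = max(best, abs(running))
    return best

J, grid_size, eps = 6, 512, 0.25
a = [1.0 / n for n in range(1, 2 ** (J + 1))]   # hypothetical coefficients a_n = 1/n
grid = [2 * math.pi * k / grid_size for k in range(grid_size)]

rows = []
for j in range(1, J + 1):
    values = [block_maximal(a, j, x) for x in grid]
    norm_sq = sum(v * v for v in values) / grid_size          # approximates ||M_j||^2
    measure = sum(1 for v in values if v > eps) / grid_size   # approximates mu(A_j(eps))
    E_j = sum(abs(a[n - 1]) ** 2 for n in range(2 ** j, 2 ** (j + 1)))
    rows.append((j, measure, norm_sq / eps ** 2, (j + 1) ** 2 * E_j))

for row in rows:
    print(row)
```

In each row the second entry should not exceed the third (Chebyshev), and the third multiplied by \varepsilon^2 should not exceed the fourth (the block maximal estimate).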

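Now let F_J(\varepsilon):=\bigcup_{j\ge J}A_j(\varepsilon) be the set of points where the inequality M_j(x)>\varepsilon holds for at least one j\ge J . Any point at which M_j(x)>\varepsilon occurs infinitely often belongs to F_J(\varepsilon) for every J . The union bound controls the measure of this set: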

\displaystyle \mu\Big(F_J(\varepsilon)\Big) \le\sum_{j\ge J}\mu\Big(A_j(\varepsilon)\Big).

The series on the right converges, so its tail tends to zero as J\to\infty . Therefore the set \displaystyle F_\infty(\varepsilon):=\bigcap_{J=1}^{\infty}F_J(\varepsilon) has measure zero: indeed, F_\infty(\varepsilon)\subseteq F_J(\varepsilon) for every J , and hence \displaystyle \mu\Big(F_\infty(\varepsilon)\Big) \le\mu\Big(F_J(\varepsilon)\Big)\longrightarrow0.

Thus, outside a set of measure zero, the inequality M_j(x)>\varepsilon can hold only for finitely many j . Equivalently, for almost every x , there is an index J_\varepsilon(x) such that \displaystyle M_j(x)\le\varepsilon whenever j\ge J_\varepsilon(x). Apply this conclusion successively with \varepsilon=1,1/2,1/3,\dots . The union of the corresponding exceptional sets still has measure zero. Hence, for every point outside one fixed null set and for every positive \varepsilon , the quantities M_j(x) are eventually smaller than \varepsilon . Therefore M_j(x)\longrightarrow0 for almost every x\in\mathbb T .

This conclusion has a very specific meaning: inside sufficiently far-out dyadic blocks, the partial sums barely move at almost every point. It does not yet prove that the full partial sums converge, because small movements in infinitely many different blocks could still accumulate. The next step must therefore control the sums of the completed blocks themselves.

Now define the sum of the full j -th block:

\displaystyle D_j(x):= \sum_{n=2^j}^{2^{j+1}-1}a_ne^{inx}.

Orthogonality gives \displaystyle ||D_j||^2=E_j. Since every term is nonnegative, the sum and the integral may be interchanged, and therefore

\displaystyle \int_{\mathbb T} \sum_{j=1}^{\infty}(j+1)^2|D_j(x)|^2 d\mu(x) =\sum_{j=1}^{\infty}(j+1)^2E_j<\infty.

A nonnegative function with finite integral is finite almost everywhere, so for almost every x , \displaystyle \sum_{j=1}^{\infty}(j+1)^2|D_j(x)|^2<\infty. At such a point, apply Cauchy–Schwarz in the form

\displaystyle \sum_{j=1}^{\infty}|D_j(x)| =\sum_{j=1}^{\infty}\frac{1}{j+1}(j+1)|D_j(x)|.

Since \sum_{j\ge1}(j+1)^{-2}<\infty , we obtain

\displaystyle \sum_{j=1}^{\infty}|D_j(x)| \le \Big(\sum_{j=1}^{\infty}\frac{1}{(j+1)^2}\Big)^{1/2} \Big(\sum_{j=1}^{\infty}(j+1)^2|D_j(x)|^2\Big)^{1/2}<\infty.

Thus \displaystyle \sum_{j=1}^{\infty}D_j(x) converges absolutely for almost every x . This shows that the partial sums at dyadic endpoints, S_{2^J-1}(x), converge almost everywhere.

The distinction between M_j and D_j is important. The quantities M_j control the worst local fluctuation inside a block. The quantities D_j control the total jump from one completed block to the next. Both controls are necessary.

Let p\le q , and suppose that p\in B_j and q\in B_k , where j\le k . When j=k , both indices lie in the same dyadic block. The difference S_q(x)-S_p(x) is the difference of two partial block sums. Hence

\displaystyle |S_q(x)-S_p(x)|\le2M_j(x).

When j<k , the interval of frequencies from p+1 to q consists of three parts: a final portion of the j -th block, the completed blocks between j and k , and an initial portion of the k -th block. Therefore

\displaystyle |S_q(x)-S_p(x)| \le2M_j(x)+\sum_{\ell=j+1}^{k-1}|D_\ell(x)|+M_k(x).

At every point where M_j(x)\to0 and \sum_j|D_j(x)|<\infty , the right side tends to zero whenever p,q\to\infty . The first and third terms vanish because the local block oscillations vanish. The middle term vanishes because it is a tail of an absolutely convergent series.
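The two estimates for |S_q(x)-S_p(x)| can be tested directly on a finite sequence. The sketch below assumes the coefficients a_n=1/n for n\le63 and a single sample point x=0.7 , both arbitrary; it lists every pair 2\le p\le q\le63 for which the bound would fail.

```python
import cmath
import math

def term(a, n, x):
    """a_n e^{inx}, with a[n-1] = a_n."""
    return a[n - 1] * cmath.exp(1j * n * x)

def S(a, m, x):
    """Partial sum S_m(x)."""
    return sum(term(a, n, x) for n in range(1, m + 1))

def M(a, j, x):
    """Largest partial block sum inside B_j = {2^j, ..., 2^{j+1}-1}."""
    best, running = 0.0, 0j
    for n in range(2 ** j, 2 ** (j + 1)):
        running += term(a, n, x)
        best = max(best, abs(running))
    return best

def D(a, j, x):
    """Sum of the full block B_j."""
    return sum(term(a, n, x) for n in range(2 ** j, 2 ** (j + 1)))

def cauchy_bound(a, p, q, x):
    """Right-hand side of the estimate for |S_q(x) - S_p(x)|, 2 <= p <= q."""
    j, k = p.bit_length() - 1, q.bit_length() - 1   # p in B_j, q in B_k
    if j == k:
        return 2 * M(a, j, x)
    middle = sum(abs(D(a, l, x)) for l in range(j + 1, k))
    return 2 * M(a, j, x) + middle + M(a, k, x)

a = [1.0 / n for n in range(1, 64)]   # hypothetical coefficients a_1, ..., a_63
x = 0.7
violations = [
    (p, q)
    for p in range(2, 64)
    for q in range(p, 64)
    if abs(S(a, q, x) - S(a, p, x)) > cauchy_bound(a, p, q, x) + 1e-12
]
print(violations)
```

The block index of p is read off from its bit length, since 2^j\le p<2^{j+1} means that p has exactly j+1 binary digits.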

Thus, for almost every x , the sequence of partial sums S_m(x) is Cauchy, and the result is proved: the series \sum_{n=1}^{\infty}a_ne^{inx} converges for almost every x whenever \sum_{n=1}^{\infty}|a_n|^2\log_2^2(n+1)<\infty.

The coefficient hypothesis implies \sum_n|a_n|^2<\infty . Hence, for N<M , ||S_M-S_N||^2 =\sum_{n=N+1}^{M}|a_n|^2\longrightarrow0. There is therefore an f\in L^2(\mathbb T) such that S_N\longrightarrow f in L^2(\mathbb T) . The Rademacher–Menshov argument gives a pointwise limit g(x) for almost every x . These are not two different functions. Convergence in L^2 implies almost-everywhere convergence along a subsequence, so some subsequence S_{N_k}(x) converges to f(x) for almost every x , and the same subsequence converges to g(x) almost everywhere. Therefore g(x)=f(x) for almost every x . Thus the theorem says not only that the series has an almost-everywhere pointwise limit, but that this pointwise limit agrees almost everywhere with the L^2 Fourier sum determined by the coefficient sequence.

Convergence of Fourier Series

Suppose that \displaystyle f(x)\sim\sum_{n\in\mathbb Z}\widehat f(n)e^{inx}, and assume \displaystyle \sum_{n\in\mathbb Z}|\widehat f(n)|^2\log^2(2+|n|)<\infty.

Apply the one-sided result separately to the positive-frequency series and the negative-frequency series; the argument for the latter is identical, because the functions e^{-inx} , n\ge1 , are also orthonormal. Both series converge almost everywhere. After adding the constant coefficient \widehat f(0) , it follows that the symmetric Fourier partial sums

\displaystyle \sum_{|n|\le N}\widehat f(n)e^{inx}

converge almost everywhere. Since these partial sums also converge to f in L^2(\mathbb T) , the almost-everywhere limit is f(x) .

The proof does not attempt to obtain pointwise convergence directly from

\displaystyle \sum_{n=1}^{\infty}|a_n|^2<\infty.

That condition gives an L^2 limit, but it does not by itself control the largest partial sum at a point. Instead, the proof organizes the frequency axis into dyadic blocks. The block B_j has length approximately 2^j , and so it contains many possible stopping points. The maximal estimate says that controlling every possible stopping point inside this one block costs a factor of approximately j in L^2 norm. But the index j is comparable with \log n on the block B_j . The condition \sum_{n=1}^{\infty}|a_n|^2\log^2(n+1)<\infty says exactly that the total coefficient energy is strong enough to pay the maximal-function cost on every dyadic scale. The proof then separates the pointwise convergence problem into two manageable statements. The quantities M_j(x) show that the oscillation inside the later blocks becomes negligible. The quantities D_j(x) show that the completed block jumps are summable. Together, these facts imply that the full sequence of partial sums is pointwise Cauchy almost everywhere.

The Rademacher–Menshov proof treats the exponential system only as an orthonormal system. It never uses the much richer interaction among the phases e^{inx} at different frequencies. This is why the proof pays a logarithmic price. For the actual trigonometric system, Carleson’s theorem, in its maximal form, gives the stronger estimate

\displaystyle ||~~ \sup_{N\ge1} |\sum_{n=1}^{N}a_ne^{inx}| ~~|| \le C\Big(\sum_{n=1}^{\infty}|a_n|^2\Big)^{1/2}.

There is no logarithm. Consequently, every L^2 Fourier series converges almost everywhere. The conceptual difference is important. Rademacher–Menshov decomposes every partial sum into dyadic pieces and pays for each possible scale separately. Carleson’s theorem exploits cancellation that is special to the trigonometric system and controls many possible stopping points at once. It sees analytic structure that pure orthogonality does not see.

Take a_n=\frac1n. Then \sum_{n=1}^{\infty}\frac{\log^2(n+1)}{n^2}<\infty. Therefore the Rademacher–Menshov theorem implies that \sum_{n=1}^{\infty}\frac{e^{inx}}{n} converges almost everywhere. It cannot converge everywhere, because at x=0 it becomes the harmonic series \sum_{n=1}^{\infty}\frac1n, which diverges. This example shows why “almost everywhere” is the natural conclusion of a general theorem based only on coefficient estimates.
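The behaviour of this example is easy to observe numerically. The sketch below compares a long partial sum at the sample point x=1 with the classical closed form -\log(1-e^{ix}) , which is not derived in this post and is used here only as a reference value; the cutoff N=20000 is an arbitrary choice.

```python
import cmath

def partial_sum(x, N):
    """S_N(x) = sum_{n=1}^N e^{inx} / n, computed by repeated multiplication."""
    z = cmath.exp(1j * x)
    total, power = 0j, 1 + 0j
    for n in range(1, N + 1):
        power *= z            # power now equals e^{inx}
        total += power / n
    return total

N = 20000
reference = -cmath.log(1 - cmath.exp(1j))   # closed form at x = 1 (assumed reference)
print(abs(partial_sum(1.0, N) - reference))  # small: the series converges at x = 1
print(partial_sum(0.0, N).real)              # harmonic number H_N, grows like log N
```

At x=1 the partial sums settle down, while at x=0 they keep growing, which matches the description above: convergence holds off an exceptional set that here contains the point 0 .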

For a contrasting borderline example, take, for n\ge2 , a_n=\frac{1}{\sqrt n\log(n+1)}. Then \sum_{n=2}^{\infty}|a_n|^2 =\sum_{n=2}^{\infty}\frac{1}{n\log^2(n+1)}<\infty, but \sum_{n=2}^{\infty}|a_n|^2\log^2(n+1) =\sum_{n=2}^{\infty}\frac1n=\infty. The Rademacher–Menshov theorem does not prove almost-everywhere convergence in this case. Carleson’s theorem does, because ordinary square summability is already enough for the trigonometric system.
