Let us begin with the basic problem. Given an exponential sum
$$\sum_{n=1}^{\infty} c_n e^{inx},$$
what information about the coefficients is enough to ensure that its partial sums actually converge at almost every point? The first condition one naturally encounters is square summability:
$$\sum_{n=1}^{\infty} |c_n|^2 < \infty.$$
Because the exponentials are orthogonal, this condition immediately implies that the partial sums are Cauchy in $L^2$. But $L^2$ convergence only says that the average squared difference between two partial sums is small. It does not prevent a particular point $x$ from seeing large oscillations along some exceptional sequence of stopping points. Thus the real issue is not merely to control one partial sum at a time. We must control all possible partial sums simultaneously.
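Work on the circle $\mathbb{T} = \mathbb{R}/2\pi\mathbb{Z}$, with normalized measure $d\mu(x) = dx/2\pi$. Then
$$\int_{\mathbb{T}} e^{imx}\,\overline{e^{inx}}\,d\mu(x) = \begin{cases} 1, & m = n,\\ 0, & m \neq n.\end{cases}$$
Thus $e_n(x) = e^{inx}$, for $n \in \mathbb{Z}$, is an orthonormal system in $L^2(\mathbb{T})$. For a finite coefficient sequence $c_1, \dots, c_N$, write
$$S_m(x) = \sum_{n=1}^{m} c_n e^{inx}, \qquad 1 \le m \le N.$$
Orthogonality gives the exact identity
$$\int_{\mathbb{T}} |S_m|^2\,d\mu = \sum_{n=1}^{m} |c_n|^2.$$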
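This formula says that, for every fixed endpoint $m$, the size of $S_m$ is completely controlled by the coefficient energy. However, convergence asks a stronger question. At a given point $x$, the partial sums may reach their largest value at an endpoint which depends on $x$. Therefore we must study the maximal partial sum
$$S_N^*(x) = \max_{1 \le m \le N} |S_m(x)|.$$
There is no direct way to insert this maximum into Parseval’s identity. The Rademacher–Menshov estimate supplies a substitute. It says that the price of controlling all possible stopping points is only logarithmic in the number of terms:
$$\int_{\mathbb{T}} \max_{1 \le m \le N} |S_m|^2\,d\mu \le (\log_2 N + 2)^2 \sum_{n=1}^{N} |c_n|^2.$$
Equivalently,
$$\Bigl\| \max_{1 \le m \le N} |S_m| \Bigr\|_{L^2(\mathbb{T})} \le (\log_2 N + 2) \Bigl( \sum_{n=1}^{N} |c_n|^2 \Bigr)^{1/2}.$$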
The important point is not just the inequality itself, but its meaning. The expression $\sum_{n=1}^{N} |c_n|^2$ measures the ordinary $L^2$ energy of the coefficient sequence. The extra logarithm measures the additional difficulty of allowing the endpoint to vary. Instead of asking about one predetermined sum, we ask for the largest among $N$ correlated sums. The proof will organize all these possible endpoints by binary scale, showing that every initial interval $\{1, \dots, m\}$ can be assembled from only about $\log_2 N$ dyadic pieces.
This estimate is elementary but remarkably general. Its proof uses only orthogonality, so it applies to any orthonormal system, not merely to the exponentials. The logarithmic loss is the cost of ignoring all further structure. For the trigonometric system, the functions possess much more cancellation than arbitrary orthogonal functions, and Carleson’s theorem eventually removes the logarithm altogether. But before reaching that deeper result, the Rademacher–Menshov argument gives a clear and concrete answer to a fundamental question: how can square-summable coefficients be upgraded from an averaged $L^2$ statement to almost-everywhere convergence?
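Before the proof, the contrast between a fixed endpoint and a variable endpoint can be seen numerically. The following Python sketch is only illustrative: the helpers `partial_sum` and `grid_average`, the seed, and the Gaussian test coefficients are our own choices. Averages over $P$ equally spaced points of $[0, 2\pi)$ stand in for integrals against $d\mu$. For a trigonometric polynomial whose frequencies are all smaller than $P$ in absolute value, this average is exact, so Parseval’s identity holds up to rounding; the maximum over endpoints is not a trigonometric polynomial, so its grid average only approximates the integral.

```python
import cmath
import math
import random


def partial_sum(coeffs, m, x):
    """S_m(x) = sum_{n=1}^m c_n e^{inx}, with coeffs[n-1] holding c_n."""
    return sum(coeffs[n - 1] * cmath.exp(1j * n * x) for n in range(1, m + 1))


def grid_average(func, num_points):
    """Average of func over num_points equally spaced points of [0, 2*pi)."""
    xs = [2 * math.pi * t / num_points for t in range(num_points)]
    return sum(func(x) for x in xs) / num_points


random.seed(1)
N = 32
coeffs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
energy = sum(abs(c) ** 2 for c in coeffs)

# Fixed endpoint: |S_N|^2 only has frequencies of size at most N - 1 < 64,
# so the 64-point average equals the integral (Parseval) up to rounding.
fixed = grid_average(lambda x: abs(partial_sum(coeffs, N, x)) ** 2, 64)

# Variable endpoint: the m maximizing |S_m(x)| is allowed to depend on x.
maximal = grid_average(
    lambda x: max(abs(partial_sum(coeffs, m, x)) ** 2 for m in range(1, N + 1)), 64
)
print(f"energy={energy:.4f}  fixed={fixed:.4f}  maximal={maximal:.4f}")
```

With $P = 64 \ge N = 32 = 2^5$, even the grid average of the maximal function stays below $(\log_2 N + 2)^2 \sum |c_n|^2$: the proof bounds the maximum pointwise by dyadic block energies, and those block energies are averaged exactly on this grid.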
Dyadic Decomposition
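First assume that $N = 2^k$. Consider all dyadic intervals contained in $\{1, \dots, N\}$. At scale $j$, where $0 \le j \le k$, these intervals are
$$I_{j,r} = \{(r-1)2^j + 1, \dots, r2^j\}, \qquad 1 \le r \le 2^{k-j}.$$
The intervals at scale $0$ are single points. The intervals at scale $1$ have length $2$, those at scale $2$ have length $4$, and so on. At the final scale $j = k$, there is only one interval, namely $\{1, \dots, N\}$. Now fix an integer $m$ with $1 \le m \le N$. Its binary expansion has the form
$$m = \sum_{j=0}^{k} \varepsilon_j 2^j, \qquad \varepsilon_j \in \{0, 1\}.$$
The binary digits tell us how to partition the initial interval $\{1, \dots, m\}$. Each digit $\varepsilon_j$ equal to $1$ instructs us to take one block of length $2^j$ at the corresponding dyadic scale. Reading the digits from the largest scale down, each block begins right after a multiple of its own length, so it is one of the intervals $I_{j,r}$. There is never more than one selected block at any scale.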
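For example, when $N = 16$ and $m = 13$, we have $13 = 2^3 + 2^2 + 2^0$. Accordingly,
$$\{1, \dots, 13\} = \{1, \dots, 8\} \cup \{9, \dots, 12\} \cup \{13\} = I_{3,1} \cup I_{2,3} \cup I_{0,13}.$$
The first piece has length $8$, the second has length $4$, and the final piece has length $1$. This is not merely an example: every initial interval can be decomposed in exactly this way, using at most one dyadic interval from each scale.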
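The binary-digit rule can be written as a short routine. This is a sketch; the function name `dyadic_pieces` and the inclusive `(first, last)` pair representation are our own choices. The loop reads the digits of $m$ from the most significant one down and appends one block per nonzero digit.

```python
def dyadic_pieces(m):
    """Split {1, ..., m} into dyadic intervals, largest first.

    Each binary digit of m equal to 1 at position j contributes one block of
    length 2**j. Because the digits are read from the top, every block starts
    right after a multiple of its own length, i.e. it is some I_{j,r}.
    Returns inclusive (first, last) pairs.
    """
    pieces = []
    start = 1
    for j in range(m.bit_length() - 1, -1, -1):
        if (m >> j) & 1:
            pieces.append((start, start + 2**j - 1))
            start += 2**j
    return pieces


print(dyadic_pieces(13))  # [(1, 8), (9, 12), (13, 13)]
```

The number of pieces is the number of ones in the binary expansion of $m$, which never exceeds $k + 1$ when $m \le 2^k$.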
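For a dyadic interval $I$, define its block sum by
$$S_I(x) = \sum_{n \in I} c_n e^{inx}.$$
The dyadic decomposition gives
$$S_m(x) = \sum_{j=0}^{k} S_{I_j(m)}(x),$$
where $I_j(m)$ is either the one selected dyadic interval of length $2^j$ or is absent. When no interval is selected at a scale, we interpret the corresponding summand as zero. There are at most $k + 1$ nonzero terms in this decomposition. Cauchy–Schwarz therefore gives
$$|S_m(x)|^2 \le (k+1) \sum_{j=0}^{k} |S_{I_j(m)}(x)|^2.$$
Every interval selected in this decomposition belongs to the full dyadic family. Hence
$$|S_m(x)|^2 \le (k+1) \sum_{j=0}^{k} \sum_{r=1}^{2^{k-j}} |S_{I_{j,r}}(x)|^2.$$
The expression on the right does not depend on $m$. Taking the maximum over $m$ yields the pointwise inequality
$$\max_{1 \le m \le N} |S_m(x)|^2 \le (k+1) \sum_{j=0}^{k} \sum_{r=1}^{2^{k-j}} |S_{I_{j,r}}(x)|^2.$$
This is the first appearance of the logarithm. A partial interval may cut across every dyadic scale, and there are $k + 1 = \log_2 N + 1$ scales. Cauchy–Schwarz pays one factor of $k + 1$ for the number of such pieces.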
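The pointwise inequality can be spot-checked at random points. In this sketch, `dyadic_family` and `pointwise_sides` are hypothetical helper names, and the random coefficients and sample points are arbitrary; the inequality itself holds exactly, so only rounding can disturb it.

```python
import cmath
import math
import random


def dyadic_family(k):
    """All dyadic intervals I_{j,r} of {1, ..., 2**k}, as inclusive (first, last) pairs."""
    return [((r - 1) * 2**j + 1, r * 2**j)
            for j in range(k + 1)
            for r in range(1, 2 ** (k - j) + 1)]


def pointwise_sides(coeffs, k, x):
    """Return (max_m |S_m(x)|^2, (k + 1) * sum over all dyadic I of |S_I(x)|^2).

    coeffs must have length N = 2**k, with coeffs[n-1] holding c_n.
    """
    terms = [c * cmath.exp(1j * n * x) for n, c in enumerate(coeffs, start=1)]
    running, best = 0j, 0.0
    for t in terms:  # after the m-th step, running equals S_m(x)
        running += t
        best = max(best, abs(running) ** 2)
    blocks = sum(abs(sum(terms[a - 1:b])) ** 2 for a, b in dyadic_family(k))
    return best, (k + 1) * blocks


random.seed(2)
k = 5
coeffs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2**k)]
samples = [pointwise_sides(coeffs, k, random.uniform(0, math.tau)) for _ in range(200)]
print(max(lhs / rhs for lhs, rhs in samples))  # never exceeds 1, up to rounding
```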
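We now integrate the preceding pointwise bound. Orthogonality gives an exact formula for every dyadic block:
$$\int_{\mathbb{T}} |S_I|^2\,d\mu = \sum_{n \in I} |c_n|^2.$$
Indeed,
$$|S_I(x)|^2 = \sum_{n \in I} \sum_{n' \in I} c_n \overline{c_{n'}}\, e^{i(n - n')x},$$
and the integral of $e^{i(n-n')x}$ is $0$ unless $n = n'$, so only the diagonal terms remain. At a fixed scale $j$, the intervals $I_{j,1}, \dots, I_{j,2^{k-j}}$ partition $\{1, \dots, N\}$. Thus
$$\sum_{r=1}^{2^{k-j}} \int_{\mathbb{T}} |S_{I_{j,r}}|^2\,d\mu = \sum_{n=1}^{N} |c_n|^2.$$
Every coefficient appears once at each scale. Since there are $k + 1$ scales,
$$\sum_{j=0}^{k} \sum_{r=1}^{2^{k-j}} \int_{\mathbb{T}} |S_{I_{j,r}}|^2\,d\mu = (k+1) \sum_{n=1}^{N} |c_n|^2.$$
Combining this with the pointwise maximal estimate from above gives
$$\int_{\mathbb{T}} \max_{1 \le m \le N} |S_m|^2\,d\mu \le (k+1)^2 \sum_{n=1}^{N} |c_n|^2.$$
Taking square roots,
$$\Bigl\| \max_{1 \le m \le N} |S_m| \Bigr\|_{L^2(\mathbb{T})} \le (k+1) \Bigl( \sum_{n=1}^{N} |c_n|^2 \Bigr)^{1/2}.$$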
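The counting step, that every coefficient appears once at each scale, can be checked mechanically. In this sketch the helper names `dyadic_family`, `multiplicity`, and `integrated_block_energy` are our own; the last one uses the orthogonality formula $\int |S_I|^2\,d\mu = \sum_{n \in I} |c_n|^2$ rather than numerical integration.

```python
def dyadic_family(k):
    """All dyadic intervals I_{j,r} of {1, ..., 2**k}, as inclusive (first, last) pairs."""
    return [((r - 1) * 2**j + 1, r * 2**j)
            for j in range(k + 1)
            for r in range(1, 2 ** (k - j) + 1)]


def multiplicity(k):
    """For each n in {1, ..., 2**k}, the number of dyadic intervals containing n."""
    counts = [0] * (2**k + 1)  # index 0 is unused
    for a, b in dyadic_family(k):
        for n in range(a, b + 1):
            counts[n] += 1
    return counts[1:]


def integrated_block_energy(coeffs, k):
    """Sum over all dyadic I of sum_{n in I} |c_n|^2, i.e. of the integrals of |S_I|^2."""
    return sum(sum(abs(c) ** 2 for c in coeffs[a - 1:b]) for a, b in dyadic_family(k))


print(multiplicity(3))  # [4, 4, 4, 4, 4, 4, 4, 4]
```

Since every multiplicity equals $k + 1$, the total block energy is $(k+1)\sum_n |c_n|^2$, which is the second factor of $k + 1$ in the integrated bound.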
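For a general positive integer $N$, choose $k \ge 0$ so that $2^{k-1} < N \le 2^k$, extend the coefficient sequence by setting $c_n = 0$ for $N < n \le 2^k$, and apply the estimate above. The added coefficients vanish, so the energy is unchanged, the maximum over $m \le N$ is at most the maximum over $m \le 2^k$, and $k + 1 < \log_2 N + 2$. This proves the Rademacher–Menshov maximal inequality
$$\Bigl\| \max_{1 \le m \le N} |S_m| \Bigr\|_{L^2(\mathbb{T})} \le (\log_2 N + 2) \Bigl( \sum_{n=1}^{N} |c_n|^2 \Bigr)^{1/2}.$$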
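A numerical sketch of the final inequality for an $N$ that is not a power of two. The helper names `max_partial_sum_sq` and `maximal_check`, the choice $N = 100$, and the grid of $256$ points are illustrative assumptions. The grid average of the maximal function only approximates its integral, but it must still respect the bound: the pointwise inequality holds at every grid point, and with at least $2^k = 128$ points the dyadic block energies are averaged exactly.

```python
import cmath
import math
import random


def max_partial_sum_sq(coeffs, x):
    """max_{1<=m<=N} |S_m(x)|^2, with coeffs[n-1] holding c_n."""
    running, best = 0j, 0.0
    for n, c in enumerate(coeffs, start=1):
        running += c * cmath.exp(1j * n * x)
        best = max(best, abs(running) ** 2)
    return best


def maximal_check(coeffs, num_points):
    """Grid average of max_m |S_m|^2 and the bound (log2 N + 2)^2 * sum |c_n|^2."""
    N = len(coeffs)
    xs = [2 * math.pi * t / num_points for t in range(num_points)]
    average = sum(max_partial_sum_sq(coeffs, x) for x in xs) / num_points
    energy = sum(abs(c) ** 2 for c in coeffs)
    return average, (math.log2(N) + 2) ** 2 * energy


random.seed(3)
coeffs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
average, bound = maximal_check(coeffs, num_points=256)
print(f"grid average {average:.2f} <= bound {bound:.2f}")
```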
The origin of the squared logarithm in the integrated estimate is now completely visible. The first factor of $k + 1$ comes from breaking a single partial sum into roughly $\log_2 N$ dyadic pieces and applying Cauchy–Schwarz. The second factor comes from integrating the square: every coefficient is counted once at every dyadic scale, hence roughly $\log_2 N$ times.
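The same argument applies when the frequencies begin at an arbitrary integer rather than at $1$. If $a \ge 0$ and $L \ge 1$, then
$$\Bigl\| \max_{1 \le \ell \le L} \Bigl| \sum_{n=a+1}^{a+\ell} c_n e^{inx} \Bigr| \Bigr\|_{L^2(\mathbb{T})} \le (\log_2 L + 2) \Bigl( \sum_{n=a+1}^{a+L} |c_n|^2 \Bigr)^{1/2}.$$
Nothing new needs to be proved. One simply applies the preceding dyadic decomposition to the finite orthonormal family $e_{a+1}, \dots, e_{a+L}$, relabeled as $1, \dots, L$. This block form is what turns the finite maximal estimate into an almost-everywhere convergence theorem for infinite series.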
Almost-Everywhere Convergence
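Suppose that
$$\sum_{n=1}^{\infty} |c_n|^2 \log^2(n+1) < \infty.$$
Then the exponential series
$$\sum_{n=1}^{\infty} c_n e^{inx}$$
converges for almost every $x \in \mathbb{T}$.
The hypothesis is stronger than ordinary square summability: since $\log^2(n+1) \ge \log^2 2 > 0$, it automatically implies $\sum_{n=1}^{\infty} |c_n|^2 < \infty$. Consequently, the partial sums $S_m(x) = \sum_{n=1}^{m} c_n e^{inx}$ are Cauchy in $L^2(\mathbb{T})$. But $L^2$ convergence only provides an averaged limit. To prove almost-everywhere convergence, we must show that the partial sums are pointwise Cauchy outside a set of measure zero.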
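The strategy is to divide the frequency axis into dyadic blocks:
$$B_k = \{2^k, 2^k + 1, \dots, 2^{k+1} - 1\}, \qquad k = 0, 1, 2, \dots$$
Let
$$E_k = \sum_{n \in B_k} |c_n|^2.$$
This is the coefficient energy contained in the $k$-th block. When $n \in B_k$, we have $n \ge 2^k$, so $\log_2(n+1) > k$; since also $\log_2(n+1) \ge 1$, and $k + 2 \le 3k$ for $k \ge 1$, we get $k + 2 \le 3\log_2(n+1)$ for every $k \ge 0$. Hence the hypothesis implies
$$\sum_{k=0}^{\infty} (k+2)^2 E_k \le 9 \sum_{n=1}^{\infty} |c_n|^2 \log_2^2(n+1) < \infty,$$
where the right side is finite because $\log_2$ differs from the natural logarithm only by the constant factor $1/\log 2$.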
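The regrouping by blocks and the weight comparison $k + 2 \le 3\log_2(n+1)$ can be checked directly. This sketch uses the hypothetical helpers `block_index`, `weighted_block_energy`, and `log_weighted_energy`, and the sample sequence $c_n = 1/n$ truncated at $10^4$ terms; summing $(k+2)^2|c_n|^2$ over $n$ is the same as summing $(k+2)^2 E_k$ over $k$.

```python
import math


def block_index(n):
    """The k with 2**k <= n < 2**(k+1), i.e. n lies in the dyadic block B_k."""
    return n.bit_length() - 1


def weighted_block_energy(coeffs):
    """sum_k (k+2)^2 E_k, with E_k = sum_{n in B_k} |c_n|^2 and coeffs[n-1] = c_n."""
    return sum((block_index(n) + 2) ** 2 * abs(c) ** 2
               for n, c in enumerate(coeffs, start=1))


def log_weighted_energy(coeffs):
    """sum_n |c_n|^2 log2(n+1)^2: the hypothesis written with base-2 logarithms."""
    return sum(abs(c) ** 2 * math.log2(n + 1) ** 2
               for n, c in enumerate(coeffs, start=1))


# Weight comparison k + 2 <= 3 log2(n + 1) for n in B_k, checked for n < 10**5.
assert all(block_index(n) + 2 <= 3 * math.log2(n + 1) for n in range(1, 10**5))

coeffs = [1 / n for n in range(1, 10**4 + 1)]
print(weighted_block_energy(coeffs), 9 * log_weighted_energy(coeffs))
```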
There are two distinct forms of movement to control. First, a partial sum can fluctuate while it passes through a single dyadic block. Second, the completed blocks can accumulate as we move farther and farther out in frequency. The maximal estimate controls the first phenomenon; a weighted Cauchy–Schwarz argument controls the second.
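Define
$$M_k(x) = \max_{2^k \le m < 2^{k+1}} \Bigl| \sum_{n=2^k}^{m} c_n e^{inx} \Bigr|.$$
This is the largest amount by which the partial sums can move while passing through the $k$-th dyadic block, measured from the last partial sum $S_{2^k - 1}$ before the block. Since this block has length $2^k$, the block maximal estimate, applied to the $2^k$ frequencies of $B_k$, gives
$$\int_{\mathbb{T}} M_k^2\,d\mu \le (k+2)^2 E_k.$$
Our coefficient hypothesis implies $\sum_k (k+2)^2 E_k < \infty$, and therefore
$$\sum_{k=0}^{\infty} \int_{\mathbb{T}} M_k^2\,d\mu < \infty.$$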
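We now derive carefully what this says at individual points. Fix $\varepsilon > 0$, and for each $k$ let
$$A_k(\varepsilon) = \{ x \in \mathbb{T} : M_k(x) > \varepsilon \}.$$
On this set, we have $M_k(x)^2 / \varepsilon^2 > 1$. Hence
$$\mu(A_k(\varepsilon)) \le \int_{A_k(\varepsilon)} \frac{M_k^2}{\varepsilon^2}\,d\mu \le \frac{1}{\varepsilon^2} \int_{\mathbb{T}} M_k^2\,d\mu.$$
Thus the measure of each exceptional set is controlled by the energy of the block maximal function. After summing over $k$, we obtain
$$\sum_{k=0}^{\infty} \mu(A_k(\varepsilon)) \le \frac{1}{\varepsilon^2} \sum_{k=0}^{\infty} \int_{\mathbb{T}} M_k^2\,d\mu < \infty.$$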
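The Chebyshev step can be illustrated on a grid. Here the uniform measure on $256$ equally spaced points replaces $\mu$; Chebyshev’s inequality holds for any measure, so the grid version holds exactly. The helper names `block_maximal` and `chebyshev_rows`, the sample sequence $c_n = 1/n$, and the threshold $\varepsilon = 0.25$ are arbitrary illustrative choices.

```python
import cmath
import math


def block_maximal(coeffs, k, x):
    """M_k(x) = max over 2^k <= m < 2^{k+1} of |sum_{n=2^k}^m c_n e^{inx}|."""
    running, best = 0j, 0.0
    for n in range(2**k, 2 ** (k + 1)):
        running += coeffs[n - 1] * cmath.exp(1j * n * x)
        best = max(best, abs(running))
    return best


def chebyshev_rows(coeffs, eps, num_points, kmax):
    """For each k: grid measure of A_k(eps) and the bound eps^-2 * (grid average of M_k^2)."""
    xs = [2 * math.pi * t / num_points for t in range(num_points)]
    rows = []
    for k in range(kmax + 1):
        values = [block_maximal(coeffs, k, x) for x in xs]
        fraction = sum(v > eps for v in values) / num_points
        bound = sum(v * v for v in values) / (num_points * eps**2)
        rows.append((k, fraction, bound))
    return rows


coeffs = [1 / n for n in range(1, 2**10)]  # c_1, ..., c_1023: enough for k <= 9
rows = chebyshev_rows(coeffs, eps=0.25, num_points=256, kmax=9)
for k, fraction, bound in rows:
    print(f"k={k}: measure {fraction:.3f} <= bound {bound:.3f}")
```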
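Now consider the set of points where the inequality $M_k(x) > \varepsilon$ occurs for infinitely many $k$:
$$A(\varepsilon) = \bigcap_{K=0}^{\infty} \bigcup_{k \ge K} A_k(\varepsilon).$$
For every $K$, the set $A(\varepsilon)$ is contained in $\bigcup_{k \ge K} A_k(\varepsilon)$, so the union bound gives
$$\mu(A(\varepsilon)) \le \sum_{k \ge K} \mu(A_k(\varepsilon)).$$
The series $\sum_k \mu(A_k(\varepsilon))$ converges, so its tail tends to zero as $K \to \infty$. Therefore $\mu(A(\varepsilon)) = 0$.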
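Thus, outside a set of measure zero, the inequality $M_k(x) > \varepsilon$ can hold only for finitely many $k$. Equivalently, for almost every $x$, there is an index $K = K(x, \varepsilon)$ such that
$$M_k(x) \le \varepsilon \quad \text{whenever } k \ge K.$$
Apply this conclusion successively with $\varepsilon = 1, \tfrac12, \tfrac13, \dots$. The union of the corresponding exceptional sets is a countable union of null sets, so it still has measure zero. Hence, for every point outside one fixed null set and for every positive $\varepsilon$, the quantities $M_k(x)$ are eventually smaller than $\varepsilon$ (choose $j$ with $1/j < \varepsilon$). Therefore
$$\lim_{k \to \infty} M_k(x) = 0$$
for almost every $x$.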
This conclusion has a very specific meaning: inside sufficiently far-out dyadic blocks, the partial sums barely move at almost every point. It does not yet prove that the full partial sums converge, because small movements in infinitely many different blocks could still accumulate. The next step must therefore control the sums of the completed blocks themselves.
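Now define the sum of the full $k$-th block:
$$D_k(x) = \sum_{n \in B_k} c_n e^{inx} = \sum_{n=2^k}^{2^{k+1}-1} c_n e^{inx}.$$
Orthogonality gives $\int_{\mathbb{T}} |D_k|^2\,d\mu = E_k$. Therefore, by monotone convergence and since $(k+1)^2 \le (k+2)^2$,
$$\int_{\mathbb{T}} \sum_{k=0}^{\infty} (k+1)^2 |D_k|^2\,d\mu = \sum_{k=0}^{\infty} (k+1)^2 E_k < \infty.$$
A nonnegative function with finite integral is finite almost everywhere, so for almost every $x$,
$$\sum_{k=0}^{\infty} (k+1)^2 |D_k(x)|^2 < \infty.$$
At such a point, apply Cauchy–Schwarz in the form
$$\sum_{k=0}^{\infty} |D_k(x)| = \sum_{k=0}^{\infty} \frac{1}{k+1} \cdot (k+1)|D_k(x)| \le \Bigl( \sum_{k=0}^{\infty} \frac{1}{(k+1)^2} \Bigr)^{1/2} \Bigl( \sum_{k=0}^{\infty} (k+1)^2 |D_k(x)|^2 \Bigr)^{1/2}.$$
Since $\sum_{k \ge 0} (k+1)^{-2} = \pi^2/6 < \infty$, we obtain
$$\sum_{k=0}^{\infty} |D_k(x)| < \infty.$$
Thus $\sum_k D_k(x)$ converges absolutely for almost every $x$. This shows that the partial sums at dyadic endpoints,
$$S_{2^K - 1}(x) = \sum_{k=0}^{K-1} D_k(x),$$
converge almost everywhere.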
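The weighted Cauchy–Schwarz step and the telescoping identity $S_{2^K-1} = \sum_{k<K} D_k$ can be checked at a single point. The helper names `block_sums` and `weighted_cauchy_schwarz`, the seed, and the decaying random coefficients are assumptions made for this sketch only.

```python
import cmath
import math
import random


def block_sums(coeffs, K, x):
    """D_k(x) = sum_{n=2^k}^{2^{k+1}-1} c_n e^{inx} for k = 0, ..., K-1."""
    return [sum(coeffs[n - 1] * cmath.exp(1j * n * x) for n in range(2**k, 2 ** (k + 1)))
            for k in range(K)]


def weighted_cauchy_schwarz(ds):
    """Return (sum |D_k|, sqrt(sum (k+1)^-2) * sqrt(sum (k+1)^2 |D_k|^2))."""
    lhs = sum(abs(d) for d in ds)
    weights = sum(1 / (k + 1) ** 2 for k in range(len(ds)))
    weighted = sum((k + 1) ** 2 * abs(d) ** 2 for k, d in enumerate(ds))
    return lhs, math.sqrt(weights * weighted)


random.seed(4)
K = 8
coeffs = [complex(random.gauss(0, 1), random.gauss(0, 1)) / n for n in range(1, 2**K)]
x = random.uniform(0, 2 * math.pi)
ds = block_sums(coeffs, K, x)
# The blocks B_0, ..., B_{K-1} tile {1, ..., 2^K - 1}, so sum(ds) equals S_{2^K - 1}(x).
direct = sum(c * cmath.exp(1j * n * x) for n, c in enumerate(coeffs, start=1))
lhs, rhs = weighted_cauchy_schwarz(ds)
print(abs(sum(ds) - direct), lhs, rhs)
```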
The distinction between $M_k$ and $D_k$ is important. The quantities $M_k$ control the worst local fluctuation inside a block. The quantities $D_k$ control the total jump from one completed block to the next. Both controls are necessary.
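Let $1 \le m < m'$, and suppose that $m \in B_k$ and $m' \in B_{k'}$, where $k \le k'$. When $k = k'$, both indices lie in the same dyadic block. The difference
$$S_{m'}(x) - S_m(x) = \sum_{n=2^k}^{m'} c_n e^{inx} - \sum_{n=2^k}^{m} c_n e^{inx}$$
is the difference of two partial block sums. Hence
$$|S_{m'}(x) - S_m(x)| \le 2M_k(x).$$
When $k < k'$, the interval of frequencies from $m + 1$ to $m'$ consists of three parts: a final portion of the $k$-th block, the completed blocks between $k$ and $k'$, and an initial portion of the $k'$-th block. The final portion equals $D_k$ minus a partial block sum, and $|D_k| \le M_k$, so its size is at most $2M_k$. Therefore
$$|S_{m'}(x) - S_m(x)| \le 2M_k(x) + \sum_{j=k+1}^{k'-1} |D_j(x)| + M_{k'}(x).$$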
At every point where $M_k(x) \to 0$ and $\sum_k |D_k(x)| < \infty$, the right side tends to zero as $m \to \infty$, since then $k \to \infty$ and $k' \ge k$. The first and third terms vanish because the local block oscillations vanish. The middle term vanishes because it is a tail of an absolutely convergent series.
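Thus, for almost every $x$, the sequence of partial sums $(S_m(x))_{m \ge 1}$ is Cauchy. We have proved the result: the series
$$\sum_{n=1}^{\infty} c_n e^{inx}$$
converges for almost every $x \in \mathbb{T}$ whenever
$$\sum_{n=1}^{\infty} |c_n|^2 \log^2(n+1) < \infty.$$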
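The three-part bound on $|S_{m'}(x) - S_m(x)|$ used in this proof can be spot-checked on random pairs $m < m'$ and random points. All helper names below (`block_of`, `partial_sum`, `block_maximal`, `block_sum`, `cauchy_bound`) are illustrative, and `m2` stands for $m'$. The $255 = 2^8 - 1$ random coefficients fill the blocks $B_0, \dots, B_7$ exactly.

```python
import cmath
import math
import random


def block_of(n):
    """k such that n lies in B_k = {2^k, ..., 2^{k+1} - 1}."""
    return n.bit_length() - 1


def partial_sum(coeffs, m, x):
    """S_m(x) = sum_{n=1}^m c_n e^{inx}, with coeffs[n-1] holding c_n."""
    return sum(coeffs[n - 1] * cmath.exp(1j * n * x) for n in range(1, m + 1))


def block_maximal(coeffs, k, x):
    """M_k(x): the largest |partial block sum| inside B_k."""
    running, best = 0j, 0.0
    for n in range(2**k, 2 ** (k + 1)):
        running += coeffs[n - 1] * cmath.exp(1j * n * x)
        best = max(best, abs(running))
    return best


def block_sum(coeffs, k, x):
    """D_k(x): the sum over the full block B_k."""
    return sum(coeffs[n - 1] * cmath.exp(1j * n * x) for n in range(2**k, 2 ** (k + 1)))


def cauchy_bound(coeffs, m, m2, x):
    """Right side of the bound for |S_{m2}(x) - S_m(x)| when m < m2."""
    k, k2 = block_of(m), block_of(m2)
    if k == k2:
        return 2 * block_maximal(coeffs, k, x)
    middle = sum(abs(block_sum(coeffs, j, x)) for j in range(k + 1, k2))
    return 2 * block_maximal(coeffs, k, x) + middle + block_maximal(coeffs, k2, x)


random.seed(5)
coeffs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(255)]
trials = []
for _ in range(300):
    m, m2 = sorted(random.sample(range(1, 256), 2))
    x = random.uniform(0, math.tau)
    gap = abs(partial_sum(coeffs, m2, x) - partial_sum(coeffs, m, x))
    trials.append((gap, cauchy_bound(coeffs, m, m2, x)))
print(max(gap / bound for gap, bound in trials))  # at most 1, up to rounding
```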
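The coefficient hypothesis implies $\sum_n |c_n|^2 < \infty$. Hence $(S_m)$ is a Cauchy sequence in $L^2(\mathbb{T})$. There is therefore an $f \in L^2(\mathbb{T})$ such that $S_m \to f$ in $L^2(\mathbb{T})$. The Rademacher–Menshov argument gives a pointwise limit
$$S(x) = \lim_{m \to \infty} S_m(x)$$
for almost every $x$. These are not two different functions. Convergence in $L^2$ implies almost-everywhere convergence along a subsequence $S_{m_j} \to f$, and along that subsequence the pointwise limit is also $S$. Therefore $S(x) = f(x)$ for almost every $x$. Thus the theorem says not only that the series has an almost-everywhere pointwise limit, but that this pointwise limit agrees almost everywhere with the $L^2$ Fourier sum determined by the coefficient sequence.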
Convergence of Fourier Series
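Suppose that $f \in L^2(\mathbb{T})$ has Fourier coefficients $\hat f(n) = \int_{\mathbb{T}} f(x) e^{-inx}\,d\mu(x)$, and assume
$$\sum_{n \in \mathbb{Z}} |\hat f(n)|^2 \log^2(|n| + 1) < \infty.$$
Apply the one-sided result separately to the positive-frequency series $\sum_{n \ge 1} \hat f(n) e^{inx}$ and the negative-frequency series $\sum_{n \ge 1} \hat f(-n) e^{-inx}$ (the latter after replacing $x$ by $-x$). Both series converge almost everywhere. After adding the constant coefficient $\hat f(0)$, it follows that the symmetric Fourier partial sums
$$S_N f(x) = \sum_{|n| \le N} \hat f(n) e^{inx}$$
converge almost everywhere to $f(x)$, the limit being identified with $f$ by the same subsequence argument as above.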
The proof does not attempt to obtain pointwise convergence directly from
$$\sum_{n=1}^{\infty} |c_n|^2 < \infty.$$
That condition gives an $L^2$ limit, but it does not by itself control the largest partial sum at a point. Instead, the proof organizes the frequency axis into dyadic blocks. The block $B_k$ has length $2^k$, and so it contains many possible stopping points. The maximal estimate says that controlling every possible stopping point inside this one block costs a factor of approximately $k$ in $L^2$ norm. But the index $k$ is comparable with $\log_2 n$ on the block $B_k$. The condition
$$\sum_{n=1}^{\infty} |c_n|^2 \log^2(n+1) < \infty$$
says exactly that the total coefficient energy is strong enough to pay the maximal-function cost on every dyadic scale. The proof then separates the pointwise convergence problem into two manageable statements. The quantities $M_k$ show that the oscillation inside the later blocks becomes negligible. The quantities $D_k$ show that the completed block jumps are summable. Together, these facts imply that the full sequence of partial sums is pointwise Cauchy almost everywhere.
The Rademacher–Menshov proof treats the exponential system only as an orthonormal system. It never uses the much richer interaction among the phases at different frequencies. This is why the proof pays a logarithmic price. For the actual trigonometric system, Carleson’s theorem proves the stronger maximal estimate
$$\Bigl\| \max_{1 \le m \le N} |S_m| \Bigr\|_{L^2(\mathbb{T})} \le C \Bigl( \sum_{n=1}^{N} |c_n|^2 \Bigr)^{1/2},$$
with an absolute constant $C$ independent of $N$. There is no logarithm. Consequently, the Fourier series of every $L^2$ function converges almost everywhere. The conceptual difference is important. Rademacher–Menshov decomposes every partial sum into dyadic pieces and pays for each possible scale separately. Carleson’s theorem exploits cancellation that is special to the trigonometric system and controls many possible stopping points at once. It sees analytic structure that pure orthogonality does not see.
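Take $c_n = 1/n$. Then
$$\sum_{n=1}^{\infty} |c_n|^2 \log^2(n+1) = \sum_{n=1}^{\infty} \frac{\log^2(n+1)}{n^2} < \infty.$$
Therefore the Rademacher–Menshov theorem implies that
$$\sum_{n=1}^{\infty} \frac{e^{inx}}{n}$$
converges almost everywhere. It cannot converge everywhere, because at $x = 0$ it becomes the harmonic series
$$\sum_{n=1}^{\infty} \frac{1}{n},$$
which diverges. This example shows why “almost everywhere” is the natural conclusion of a general theorem based only on coefficient estimates.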
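A quick numerical look at this example. As a reference value we bring in the classical identity $\sum_{n \ge 1} z^n/n = -\log(1 - z)$ for $|z| = 1$, $z \ne 1$ (principal branch), which is not part of the argument above; the helper names `series_partial_sum` and `closed_form` are our own. At $x = 1$ the partial sums settle near the closed form, while at $x = 0$ they grow like $\log N$, with $S_N(0) - \log N$ approaching Euler’s constant $\gamma \approx 0.5772$.

```python
import cmath
import math


def series_partial_sum(N, x):
    """S_N(x) = sum_{n=1}^N e^{inx} / n."""
    return sum(cmath.exp(1j * n * x) / n for n in range(1, N + 1))


def closed_form(x):
    """-log(1 - e^{ix}), the classical value of the series for x not a multiple of 2*pi."""
    return -cmath.log(1 - cmath.exp(1j * x))


N = 10**5
s_at_one = series_partial_sum(N, 1.0)
s_at_zero = series_partial_sum(N, 0.0)
print(abs(s_at_one - closed_form(1.0)))   # small: convergence at x = 1
print(s_at_zero.real - math.log(N))       # near Euler's gamma: divergence at x = 0
```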
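For a contrasting borderline example, take, for $n \ge 2$,
$$c_n = \frac{1}{\sqrt{n}\,\log n}, \qquad c_1 = 0.$$
Then
$$\sum_{n=2}^{\infty} |c_n|^2 = \sum_{n=2}^{\infty} \frac{1}{n \log^2 n} < \infty,$$
but
$$\sum_{n=2}^{\infty} |c_n|^2 \log^2(n+1) \ge \sum_{n=2}^{\infty} \frac{1}{n} = \infty.$$
The Rademacher–Menshov theorem does not prove almost-everywhere convergence in this case. Carleson’s theorem does, because ordinary square summability is already enough for the trigonometric system.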
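The divergence here is only logarithmic, so numerics are suggestive rather than conclusive. This sketch, with the hypothetical helper `energies`, compares the two sums between $N = 10^3$ and $N = 10^5$: the plain energy barely moves (its tail behaves like $1/\log N$), while the log-weighted sum gains at least $\log(10^5/10^3) \approx 4.6$, as the comparison with the harmonic series predicts.

```python
import math


def energies(N):
    """(sum |c_n|^2, sum |c_n|^2 log^2(n+1)) over 2 <= n <= N, for c_n = 1/(sqrt(n) log n)."""
    plain = weighted = 0.0
    for n in range(2, N + 1):
        c_sq = 1.0 / (n * math.log(n) ** 2)  # |c_n|^2
        plain += c_sq
        weighted += c_sq * math.log(n + 1) ** 2
    return plain, weighted


plain_small, weighted_small = energies(10**3)
plain_large, weighted_large = energies(10**5)
print(plain_large - plain_small, weighted_large - weighted_small)
```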