Hölder’s inequality is repeated Cauchy–Schwarz

Cauchy–Schwarz and Hölder’s inequality are the basic tools for controlling the interaction, or correlation, of two functions. For finite sequences f=(f_i) and g=(g_i), their correlation is measured by the inner product

\displaystyle \langle f,g\rangle=\sum_i f_i\overline{g_i}.

The triangle inequality reduces the problem to estimating \sum_i|f_i g_i|. Thus the central question is this: how can one control the total interaction \displaystyle \sum_i|f_i g_i| using only separate information about the sizes of f and g? Cauchy–Schwarz answers this when both sequences are measured using squares:

\displaystyle \sum_i|f_i g_i|\le\Big(\sum_i|f_i|^2\Big)^{1/2}\Big(\sum_i|g_i|^2\Big)^{1/2}.

Hölder’s inequality is the general version. It says that one may measure f and g using different powers, provided the powers fit together correctly. If \displaystyle 1<p,q<\infty and \displaystyle \frac1p+\frac1q=1, then

\displaystyle \Big|\langle f,g\rangle\Big|\le\sum_i|f_i g_i|\le\Big(\sum_i|f_i|^p\Big)^{1/p}\Big(\sum_i|g_i|^q\Big)^{1/q}.

Equivalently, in norm notation,

\displaystyle \Big|\langle f,g\rangle\Big|\le ||f||_{p}||g||_{q}.
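
As a quick sanity check, the chain of inequalities above can be tested numerically. The following is only a minimal sketch in Python: the helper names `p_norm` and `holder_gap` are hypothetical, and the random complex test data is an assumption made purely for illustration.

```python
import random

def p_norm(xs, p):
    """Return the l^p norm (sum |x_i|^p)^(1/p) of a finite sequence."""
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)

def holder_gap(f, g, p):
    """Return (lhs, mid, rhs) in |<f,g>| <= sum |f_i g_i| <= ||f||_p ||g||_q."""
    q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1
    lhs = abs(sum(a * b.conjugate() for a, b in zip(f, g)))
    mid = sum(abs(a * b) for a, b in zip(f, g))
    rhs = p_norm(f, p) * p_norm(g, q)
    return lhs, mid, rhs

# Illustrative data only: random complex sequences of length 50.
random.seed(0)
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]

for p in (1.5, 2.0, 4.0, 8.0):
    lhs, mid, rhs = holder_gap(f, g, p)
    print(p, lhs <= mid <= rhs)
```

A check of this kind proves nothing, of course, but it is a cheap way to catch a wrong exponent.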

The deeper point of this discussion is that Hölder is not an independent miracle that sits above Cauchy–Schwarz. Morally, it is Cauchy–Schwarz repeated at finer and finer interpolation scales. Cauchy–Schwarz controls a midpoint. Repeating midpoint estimates reaches dyadic fractions such as \displaystyle 1/2,1/4,3/4,1/8,3/8, while finite chains of such steps already produce every rational proportion. Continuity then fills in every real interpolation parameter.

Examples:

For p=q=2, Hölder is precisely Cauchy–Schwarz. The next case, (p,q)=(4,\frac43), already shows the main idea. Let

\displaystyle S:=\sum_i|f_i g_i|,\quad A:=\sum_i|f_i|^4,\quad B:=\sum_i|g_i|^{4/3}.

We want to move from the mixed product |f_i g_i| toward the two pure endpoint expressions |f_i|^4 and |g_i|^{4/3}. The first split is

\displaystyle |f_i g_i|=(|f_i|^2|g_i|^{2/3})^{1/2}(|g_i|^{4/3})^{1/2}.

Applying Cauchy–Schwarz gives

\displaystyle S^2\le\Big(\sum_i|f_i|^2|g_i|^{2/3}\Big)B.

This does not finish the proof, because the first factor is still mixed. But it is less mixed than before: the power of f has moved from 1 to 2, while the remaining power of g has been reduced. We apply Cauchy–Schwarz once more:

\displaystyle (\sum_i|f_i|^2|g_i|^{2/3} )^2\le AB.

Combining the two inequalities gives \displaystyle S^4\le AB^3, and hence

\displaystyle \sum_i|f_i g_i|\le\Big(\sum_i|f_i|^4\Big)^{1/4}\Big(\sum_i|g_i|^{4/3}\Big)^{3/4}.

Thus Hölder for \displaystyle \Big(4,\frac43\Big) is simply Cauchy–Schwarz applied twice. The proof works because each application moves the mixed expression one step nearer to a pure endpoint.
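
The two steps can be checked numerically. This is a small Python sketch, not part of the proof: the function name `two_step_sums` and the name `T` for the intermediate mixed sum are introduced here for illustration, and the random data is an arbitrary choice.

```python
import random

def two_step_sums(f, g):
    """Return (S, T, A, B) for the (4, 4/3) argument.

    T = sum |f_i|^2 |g_i|^(2/3) is the intermediate mixed sum.
    """
    S = sum(abs(a * b) for a, b in zip(f, g))
    T = sum(abs(a) ** 2 * abs(b) ** (2 / 3) for a, b in zip(f, g))
    A = sum(abs(a) ** 4 for a in f)
    B = sum(abs(b) ** (4 / 3) for b in g)
    return S, T, A, B

random.seed(1)
f = [random.uniform(-2, 2) for _ in range(40)]
g = [random.uniform(-2, 2) for _ in range(40)]
S, T, A, B = two_step_sums(f, g)

print(S ** 2 <= T * B)        # first Cauchy-Schwarz step
print(T ** 2 <= A * B)        # second Cauchy-Schwarz step
print(S ** 4 <= A * B ** 3)   # combined estimate
```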

The case (p,q)=(8,\frac87) is the same story, with one additional intermediate stage. Define

\displaystyle S_0:=\sum_i|f_i g_i|,\quad S_1:=\sum_i|f_i|^2|g_i|^{6/7},\qquad S_2:=\sum_i|f_i|^4|g_i|^{4/7},

and let A:=\sum_i|f_i|^8,\quad B:=\sum_i|g_i|^{8/7}. Cauchy–Schwarz supplies the chain

\displaystyle S_0^2\le S_1B,\quad S_1^2\le S_2B,\quad S_2^2\le AB.

The final inequality reaches the pure f endpoint. Working backwards through the chain gives

\displaystyle S_0^8\le AB^7,

and taking eighth roots gives Hölder for (p,q)=(8,\frac87):

\displaystyle \sum_i|f_i g_i|\le(\sum_i|f_i|^8)^{1/8}(\sum_i|g_i|^{8/7})^{7/8}.

The pattern is now visible. Every Cauchy–Schwarz step doubles the current power of f, while the unassigned power of g is moved into the fixed endpoint quantity \displaystyle \sum_i|g_i|^{8/7}. These are the easiest cases because the relevant interpolation parameters are dyadic.
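
The same ladder can be written down for any p=2^n. The sketch below is an extrapolation from the two worked cases rather than something stated above: it assumes the intermediate sums are S_j=\sum_i|f_i|^{2^j}|g_i|^{q(1-2^j/p)}, which reproduces the exponents 6/7 and 4/7 when p=8. The function name `dyadic_ladder` is hypothetical.

```python
import random

def dyadic_ladder(f, g, n):
    """Return ([S_0, ..., S_n], B) for p = 2^n (n >= 1) and q = p / (p - 1).

    S_j = sum_i |f_i|^(2^j) |g_i|^(q (1 - 2^j / p)), so S_0 = sum |f_i g_i|
    and S_n = sum |f_i|^p is the pure f endpoint (called A in the text).
    """
    p = 2 ** n
    q = p / (p - 1)
    B = sum(abs(b) ** q for b in g)
    S = [
        sum(abs(a) ** (2 ** j) * abs(b) ** (q * (1 - 2 ** j / p))
            for a, b in zip(f, g))
        for j in range(n + 1)
    ]
    return S, B

random.seed(4)
f = [random.uniform(0.1, 2) for _ in range(30)]
g = [random.uniform(0.1, 2) for _ in range(30)]
n = 3  # p = 8, q = 8/7
S, B = dyadic_ladder(f, g, n)
tol = 1e-12  # relative slack for floating-point rounding

# Each rung of the ladder: S_j^2 <= S_{j+1} B.
print(all(S[j] ** 2 <= S[j + 1] * B * (1 + tol) for j in range(n)))
# The combined estimate S_0^(2^n) <= A B^(2^n - 1).
print(S[0] ** (2 ** n) <= S[n] * B ** (2 ** n - 1) * (1 + tol))
```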

The pair \displaystyle (p,q)=(\frac32,3) has a different appearance. The proof no longer moves along one straight ladder toward an endpoint. Instead, two mixed sums arise and control each other, so that repeated Cauchy–Schwarz forms a short cycle. Let \displaystyle x_i,y_i\ge0 and set

\displaystyle S:=\sum_i x_i y_i,\quad A:=\sum_i x_i^{3/2},\quad B:=\sum_i y_i^3.

We begin from the product x_i y_i and split it as x_i y_i=x_i^{3/4}(x_i^{1/4}y_i), so Cauchy–Schwarz gives

\displaystyle S^2\le A U,\quad U:=\sum_i x_i^{1/2}y_i^2.

The first step has created a new mixed quantity U. It is not an endpoint norm, but it admits a complementary decomposition: x_i^{1/2}y_i^2=y_i^{3/2}\Big(x_i^{1/2}y_i^{1/2}\Big), and hence applying Cauchy–Schwarz again gives \displaystyle U^2\le BS.

The two estimates form a loop: S^2\le AU,\quad U^2\le BS. The first inequality carries us from S to U, while the second carries us back from U to S. Eliminating the intermediate quantity U gives

\displaystyle S^4\le A^2U^2\le A^2BS.

If S=0 there is nothing to prove. Otherwise, dividing by S and taking cube roots gives

\displaystyle \sum_i x_i y_i\le\Big(\sum_i x_i^{3/2}\Big)^{2/3}\Big(\sum_i y_i^3\Big)^{1/3}.

The point of this example is not merely that it proves one more case. It shows that “repeated Cauchy–Schwarz” need not mean a rigid one-directional iteration. It may produce a small system of inequalities between several mixed sums; solving that system gives the final exponent.
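
The loop for (3/2,3) is easy to test on sample data. The following Python sketch is illustrative only; the function name `loop_sums` is hypothetical and the nonnegative random inputs are an arbitrary choice.

```python
import random

def loop_sums(x, y):
    """Return (S, A, U, B) for nonnegative sequences x, y.

    S = sum x_i y_i, A = sum x_i^(3/2), U = sum x_i^(1/2) y_i^2, B = sum y_i^3.
    """
    S = sum(a * b for a, b in zip(x, y))
    A = sum(a ** 1.5 for a in x)
    U = sum(a ** 0.5 * b ** 2 for a, b in zip(x, y))
    B = sum(b ** 3 for b in y)
    return S, A, U, B

random.seed(6)
x = [random.uniform(0, 3) for _ in range(40)]
y = [random.uniform(0, 3) for _ in range(40)]
S, A, U, B = loop_sums(x, y)
tol = 1e-12  # relative slack for floating-point rounding

print(S ** 2 <= A * U * (1 + tol))        # S is controlled by U
print(U ** 2 <= B * S * (1 + tol))        # U is controlled by S
print(S ** 3 <= A ** 2 * B * (1 + tol))   # the closed loop: Holder for (3/2, 3)
```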

Every rational Hölder exponent

The preceding examples are all instances of one finite construction. Fix an integer \displaystyle m\ge2, let \displaystyle u_i,v_i\ge0, and define

\displaystyle M_r:=\sum_i u_i^{m-r}v_i^r,\qquad 0\le r\le m.

The endpoints are the two pure sums M_0=\sum_i u_i^m,\quad M_m=\sum_i v_i^m. The intermediate sums are obtained by transferring one power at a time from u_i to v_i. The crucial observation is that each intermediate monomial is exactly the geometric mean of its two neighbors:

\displaystyle u_i^{m-r}v_i^r=\Big(u_i^{m-r+1}v_i^{r-1}\Big)^{1/2}\Big(u_i^{m-r-1}v_i^{r+1}\Big)^{1/2}.

Therefore Cauchy–Schwarz gives, for every 1\le r\le m-1,

\displaystyle M_r^2\le M_{r-1}M_{r+1}.

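Thus the list M_0,M_1,\ldots,M_m is log-convex. This is the finite-chain version of midpoint convexity. Assume every M_r is positive (if some M_r vanishes, then u_iv_i=0 for every i, so all interior M_r vanish and the final estimate is trivial). Writing \displaystyle L_r:=\log M_r, log-convexity says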

\displaystyle 2L_r\le L_{r-1}+L_{r+1}.

Equivalently, the discrete slopes increase:

L_1-L_0\le L_2-L_1\le\cdots\le L_m-L_{m-1}.

Defining the successive slopes d_r:=L_r-L_{r-1}, we have d_1\le d_2\le\cdots\le d_m. This ordered list of slopes is the whole mechanism. To estimate L_k for 1\le k\le m-1, split the total change from L_0 to L_m at k:

\displaystyle  L_k-L_0=d_1+\cdots+d_k,\quad L_m-L_k=d_{k+1}+\cdots+d_m.

Every slope in the first sum is at most every slope in the second sum. Hence the average slope on the left is at most the average slope on the right:

\displaystyle  \frac{L_k-L_0}{k}\le\frac{L_m-L_k}{m-k}.

Rearranging gives mL_k\le(m-k)L_0+kL_m. In geometric language, the points (r,L_r) lie on or below the line segment joining the endpoints (0,L_0) and (m,L_m). Thus, for every 0\le k\le m,

\displaystyle L_k\le\frac{m-k}{m}L_0+\frac{k}{m}L_m.

Exponentiating gives M_k\le M_0^{(m-k)/m}M_m^{k/m}. In other words,

\displaystyle \sum_i u_i^{m-k}v_i^k\le\Big(\sum_i u_i^m\Big)^{(m-k)/m}\Big(\sum_i v_i^m\Big)^{k/m}.
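
The whole mechanism (log-convexity, increasing slopes, and the chord bound) can be checked on sample data. This is a minimal Python sketch with a hypothetical helper `mixed_sums`; the value m=7 and the positive random inputs are arbitrary choices.

```python
import math
import random

def mixed_sums(u, v, m):
    """Return [M_0, ..., M_m] with M_r = sum_i u_i^(m-r) v_i^r."""
    return [sum(a ** (m - r) * b ** r for a, b in zip(u, v))
            for r in range(m + 1)]

random.seed(2)
u = [random.uniform(0.1, 2) for _ in range(30)]
v = [random.uniform(0.1, 2) for _ in range(30)]
m = 7
M = mixed_sums(u, v, m)
L = [math.log(x) for x in M]
slopes = [L[r] - L[r - 1] for r in range(1, m + 1)]  # d_1, ..., d_m
tol = 1e-12  # slack for floating-point rounding

# Local Cauchy-Schwarz steps: M_r^2 <= M_{r-1} M_{r+1}.
print(all(M[r] ** 2 <= M[r - 1] * M[r + 1] * (1 + tol) for r in range(1, m)))
# The discrete slopes are nondecreasing.
print(all(slopes[r] <= slopes[r + 1] + tol for r in range(m - 1)))
# Chord bound: M_k^m <= M_0^(m-k) M_m^k for every k.
print(all(M[k] ** m <= M[0] ** (m - k) * M[m] ** k * (1 + tol)
          for k in range(m + 1)))
```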

This formula already contains Hölder. For 1\le k\le m-1, choose u_i:=|f_i|^{1/(m-k)},\quad v_i:=|g_i|^{1/k}. Then u_i^{m-k}v_i^k=|f_i g_i|, while u_i^m=|f_i|^{m/(m-k)},\quad v_i^m=|g_i|^{m/k}. The chain estimate becomes

\displaystyle \sum_i|f_i g_i|\le\Big(\sum_i|f_i|^{m/(m-k)}\Big)^{(m-k)/m}\Big(\sum_i|g_i|^{m/k}\Big)^{k/m}.

This is Hölder with p=\frac{m}{m-k},\quad q=\frac{m}{k},\quad \frac1p+\frac1q=1. Every rational exponent \displaystyle p>1 arises in this way. Indeed, write \displaystyle p=s/r with integers \displaystyle s>r\ge1. Taking m=s,\quad k=s-r, gives

\displaystyle p=\frac{m}{m-k}=\frac{s}{r},\quad q=\frac{m}{k}=\frac{s}{s-r}.
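
The bookkeeping from a rational p to the chain parameters (m,k) is mechanical. Here is a short Python sketch using exact rational arithmetic; the function name `chain_parameters` is hypothetical, and it takes p in lowest terms, which is a convenience rather than a requirement of the argument.

```python
from fractions import Fraction

def chain_parameters(p):
    """Map a rational exponent p > 1 to (m, k, q) with p = m/(m-k), q = m/k."""
    p = Fraction(p)
    if p <= 1:
        raise ValueError("p must be a rational number greater than 1")
    s, r = p.numerator, p.denominator  # p = s/r with s > r >= 1
    m, k = s, s - r
    q = Fraction(m, k)
    return m, k, q

for p in (Fraction(3, 2), Fraction(5, 3), Fraction(4), Fraction(8)):
    m, k, q = chain_parameters(p)
    # The last column confirms that p and q are conjugate.
    print(p, m, k, q, 1 / p + 1 / q == 1)
```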

There is also a completely multiplicative elimination procedure, which shows exactly how the local Cauchy–Schwarz inequalities combine. Define

\displaystyle D_r:=\frac{M_r^2}{M_{r-1}M_{r+1}}\le1,\qquad 1\le r\le m-1.

We want to multiply powers D_r^{c_r} so that every interior M_j cancels except M_k. The required powers are not guessed. Set c_0=c_m=0 and require that the exponent of M_j in the product vanish for every interior j\ne k. Since \log D_r=2L_r-L_{r-1}-L_{r+1}, the exponent of M_j in the product is 2c_j-c_{j-1}-c_{j+1}, so the cancellation condition is 2c_j-c_{j-1}-c_{j+1}=0,\quad 1\le j\le m-1,\ j\ne k. Thus the weights must change at a constant rate on either side of k, so, up to a common positive factor, they are forced to be the piecewise-linear sequence

\displaystyle c_r=\begin{cases}r(m-k),&1\le r\le k,\\ k(m-r),&k\le r\le m-1.\end{cases}

At r=k, the slope has a jump of size m; this produces precisely the desired power M_k^m. Expanding the product now gives the exact identity

\displaystyle \prod_{r=1}^{m-1}D_r^{c_r}=\frac{M_k^m}{M_0^{m-k}M_m^k}.

Since every D_r\le1 and every c_r\ge0, the product is at most 1, proving again that M_k^m\le M_0^{m-k}M_m^k.
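
Because the identity is exact, it can be verified in exact arithmetic. The Python sketch below uses positive integer inputs so that every M_r is an integer and the product can be compared as a fraction; the names `weights` and `check_identity` are hypothetical.

```python
from fractions import Fraction

def weights(m, k):
    """Return [c_1, ..., c_{m-1}]: r(m-k) for r <= k, and k(m-r) for r >= k."""
    return [r * (m - k) if r <= k else k * (m - r) for r in range(1, m)]

def check_identity(u, v, m, k):
    """Return (product of D_r^c_r, M_k^m / (M_0^(m-k) M_m^k)) as exact fractions.

    Assumes u and v are sequences of positive integers.
    """
    M = [sum(a ** (m - r) * b ** r for a, b in zip(u, v)) for r in range(m + 1)]
    c = weights(m, k)
    product = Fraction(1)
    for r in range(1, m):
        D_r = Fraction(M[r] ** 2, M[r - 1] * M[r + 1])
        product *= D_r ** c[r - 1]
    target = Fraction(M[k] ** m, M[0] ** (m - k) * M[m] ** k)
    return product, target

print(weights(5, 2))
product, target = check_identity([1, 2, 3], [2, 1, 4], 5, 2)
print(product == target, product <= 1)
```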

The (\frac32,3) cycle above is exactly the case \displaystyle m=3 and \displaystyle k=1. Indeed, taking \displaystyle u_i=x_i^{1/2} and \displaystyle v_i=y_i gives

\displaystyle M_0=\sum_i x_i^{3/2},\quad M_1=\sum_i x_i y_i,\quad M_2=\sum_i x_i^{1/2}y_i^2,\quad M_3=\sum_i y_i^3.

The two local midpoint inequalities are precisely

\displaystyle M_1^2\le M_0M_2,\qquad M_2^2\le M_1M_3.

Eliminating M_2 yields M_1^3\le M_0^2M_3.

Likewise, \displaystyle m=5 and \displaystyle k=2 give

\displaystyle {M_1^2}\le {M_0M_2},\quad {M_2^2}\le {M_1M_3},\quad {M_3^2}\le{M_2M_4},\quad {M_4^2}\le {M_3M_5}.

We want to start from M_2^2\le M_1M_3 and remove every quantity except M_2 itself and the endpoints M_0,M_5. First remove M_1 using M_1\le M_0^{1/2}M_2^{1/2}, which follows from M_1^2\le M_0M_2, obtaining M_2^{3/2}\le M_0^{1/2}M_3. The only remaining unwanted term is now M_3. To remove it, use its local relation M_3^2\le M_2M_4; this introduces M_4, but only as a temporary obstruction. Remove that new term immediately with M_4\le M_3^{1/2}M_5^{1/2}, which follows from M_4^2\le M_3M_5. Hence M_3^2\le M_2M_3^{1/2}M_5^{1/2}, so M_3\le M_2^{2/3}M_5^{1/3}. Substituting this into M_2^{3/2}\le M_0^{1/2}M_3 leaves only M_0,M_2,M_5: M_2^{3/2}\le M_0^{1/2}M_2^{2/3}M_5^{1/3}. Thus M_2^{5/6}\le M_0^{1/2}M_5^{1/3}, and raising to the sixth power gives M_2^5\le M_0^3M_5^2. Equivalently,

\displaystyle M_2\le M_0^{3/5}M_5^{2/5},

which is Hölder for \displaystyle (p,q)=(\frac53,\frac52).
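
Each intermediate inequality in this elimination can be tested separately, which helps when tracking the fractional exponents. The following Python sketch is illustrative only; `mixed_sums` and `elimination_steps` are hypothetical helper names, and the random positive inputs are arbitrary.

```python
import random

def mixed_sums(u, v, m):
    """Return [M_0, ..., M_m] with M_r = sum_i u_i^(m-r) v_i^r."""
    return [sum(a ** (m - r) * b ** r for a, b in zip(u, v))
            for r in range(m + 1)]

def elimination_steps(u, v, tol=1e-12):
    """Check the three stages of the m = 5, k = 2 elimination."""
    M = mixed_sums(u, v, 5)
    return {
        "after removing M_1": M[2] ** 1.5 <= M[0] ** 0.5 * M[3] * (1 + tol),
        "after removing M_4": M[3] <= M[2] ** (2 / 3) * M[5] ** (1 / 3) * (1 + tol),
        "final estimate": M[2] ** 5 <= M[0] ** 3 * M[5] ** 2 * (1 + tol),
    }

random.seed(5)
u = [random.uniform(0.1, 2) for _ in range(30)]
v = [random.uniform(0.1, 2) for _ in range(30)]
for name, holds in elimination_steps(u, v).items():
    print(name, holds)
```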

Thus finite repeated Cauchy–Schwarz arguments prove Hölder for every rational conjugate pair.

Cauchy–Schwarz as midpoint interpolation

The general mechanism is clearest when one removes the specific exponents. Let \displaystyle A_i,B_i\ge0 and define

\displaystyle H(t):=\sum_{i=1}^N A_i^{1-t}B_i^t,\quad 0\le t\le1.

The endpoints are H(0)=\sum_iA_i and H(1)=\sum_iB_i. The parameter t measures how much of the exponent has been transferred from A_i to B_i. Now take two parameters s,t\in[0,1]. At their midpoint,

\displaystyle H\Big(\frac{s+t}{2}\Big)=\sum_i\Big(A_i^{1-s}B_i^s\Big)^{1/2}\Big(A_i^{1-t}B_i^t\Big)^{1/2}.

Cauchy–Schwarz therefore gives the midpoint inequality

\displaystyle H\Big(\frac{s+t}{2}\Big)^2\le H(s)H(t).

Whenever H is positive, this is equivalent to

\displaystyle \log H\Big(\frac{s+t}{2}\Big)\le\frac{\log H(s)+\log H(t)}{2}.

Thus \displaystyle \log H(t) is midpoint-convex, which for a continuous function is the same as convex: the logarithm of the mixed sum lies on or below the straight line joining its endpoint values. This is the exact analytic content of repeated Cauchy–Schwarz.

Applying the midpoint estimate first to \displaystyle 0 and \displaystyle 1, then to the pairs \displaystyle 0,1/2 and \displaystyle 1/2,1, and so on through successive bisections, gives

\displaystyle H\Big(\frac{k}{2^m}\Big)\le H(0)^{1-k/2^m}H(1)^{k/2^m}

for every dyadic rational \displaystyle k/2^m in [0,1]. Since H(t) is continuous on (0,1) and the dyadic rationals are dense, the bound extends to every \displaystyle \theta\in[0,1] (at \theta=0 and \theta=1 it is an equality), and hence

\displaystyle \sum_iA_i^{1-\theta}B_i^\theta\le\Big(\sum_iA_i\Big)^{1-\theta}\Big(\sum_iB_i\Big)^\theta.

This is the basic interpolation inequality. Its proof is nothing more than Cauchy–Schwarz at all dyadic midpoint scales, followed by a limiting argument.
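
Both the midpoint inequality and the resulting dyadic bound are easy to test numerically. This is a minimal Python sketch under the assumption that all A_i,B_i are strictly positive, which avoids any 0^0 convention; the function name `H` mirrors the notation above, and the sample sizes are arbitrary.

```python
import random

def H(A, B, t):
    """Mixed sum H(t) = sum_i A_i^(1-t) B_i^t for positive A_i, B_i."""
    return sum(a ** (1 - t) * b ** t for a, b in zip(A, B))

random.seed(3)
A = [random.uniform(0.1, 5) for _ in range(25)]
B = [random.uniform(0.1, 5) for _ in range(25)]
tol = 1e-12  # relative slack for floating-point rounding

# Midpoint inequality H((s+t)/2)^2 <= H(s) H(t) at random parameter pairs.
pairs = [(random.random(), random.random()) for _ in range(100)]
print(all(H(A, B, (s + t) / 2) ** 2 <= H(A, B, s) * H(A, B, t) * (1 + tol)
          for s, t in pairs))

# Interpolation bound at the dyadic points k / 2^n.
n = 4
print(all(
    H(A, B, k / 2 ** n)
    <= H(A, B, 0) ** (1 - k / 2 ** n) * H(A, B, 1) ** (k / 2 ** n) * (1 + tol)
    for k in range(2 ** n + 1)
))
```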

Now let \displaystyle 1<p,q<\infty satisfy \displaystyle 1/p+1/q=1. Set

\displaystyle A_i:=|f_i|^p,\quad B_i:=|g_i|^q,\qquad \theta:=\frac1q.

Then \displaystyle 1-\theta=1/p, and the mixed monomial becomes

\displaystyle A_i^{1-\theta}B_i^\theta=\Big(|f_i|^p\Big)^{1/p}\Big(|g_i|^q\Big)^{1/q}=|f_i g_i|.

The interpolation inequality is therefore exactly Hölder’s inequality:

\displaystyle \sum_i|f_i g_i|\le\Big(\sum_i|f_i|^p\Big)^{1/p}\Big(\sum_i|g_i|^q\Big)^{1/q}.

Thus Hölder is the continuous completion of repeated Cauchy–Schwarz: Cauchy–Schwarz gives midpoint control, finite chains give rational positions, and continuity gives every real position.

Convexity proof

The standard proof uses convexity. For \displaystyle u,v\ge0 and conjugate exponents p,q, the weighted arithmetic–geometric mean inequality gives

\displaystyle u^{1/p}v^{1/q}\le\frac{u}{p}+\frac{v}{q}.

This is Young’s inequality, and it follows from convexity of the exponential function. Assuming f and g are not identically zero (otherwise Hölder is trivial), normalize the sequences by setting

\displaystyle u_i:=\frac{|f_i|^p}{||f||_{p}^p},\quad v_i:=\frac{|g_i|^q}{||g||_{q}^q}.

Then \sum_i u_i=\sum_i v_i=1. Summing Young’s inequality gives

\displaystyle \frac{\sum_i|f_i g_i|}{||f||_{p}||g||_{q}}=\sum_i u_i^{1/p}v_i^{1/q}\le\frac1p\sum_i u_i+\frac1q\sum_i v_i=1.

This proves Hölder immediately. It is often the quickest route, but the interpolation proof explains the deeper geometry: Hölder is the log-convex continuation of Cauchy–Schwarz from one midpoint to every point between two endpoints.
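
For comparison, the convexity proof can also be checked step by step. The Python sketch below tests Young’s inequality pointwise and then the normalized sum; the names `young_gap` and `normalized_sum` are hypothetical, and the inputs are assumed to be real sequences that are not identically zero.

```python
import random

def young_gap(u, v, p):
    """Return u/p + v/q - u^(1/p) v^(1/q); Young's inequality says this is >= 0."""
    q = p / (p - 1)
    return u / p + v / q - u ** (1 / p) * v ** (1 / q)

def normalized_sum(f, g, p):
    """Return sum_i u_i^(1/p) v_i^(1/q) for the normalized sequences u, v."""
    q = p / (p - 1)
    Fp = sum(abs(a) ** p for a in f)  # ||f||_p^p
    Gq = sum(abs(b) ** q for b in g)  # ||g||_q^q
    return sum((abs(a) ** p / Fp) ** (1 / p) * (abs(b) ** q / Gq) ** (1 / q)
               for a, b in zip(f, g))

random.seed(7)
tol = 1e-12  # slack for floating-point rounding

# Young's inequality at random points and exponents.
print(all(young_gap(random.uniform(0, 5), random.uniform(0, 5),
                    random.uniform(1.1, 6)) >= -tol for _ in range(1000)))

# The normalized sum is at most 1, which is Holder after clearing denominators.
f = [random.uniform(-2, 2) for _ in range(40)]
g = [random.uniform(-2, 2) for _ in range(40)]
print(all(normalized_sum(f, g, p) <= 1 + tol for p in (1.5, 2.0, 3.0, 8.0)))
```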

The same argument works for integrals. One replaces finite sums by integrals, applies the finite or simple-function case first, and then passes to general measurable functions by approximation. Thus Hölder is the fundamental principle that a product can be controlled by assigning complementary portions of its exponent to the two factors.
