Selberg’s Sieve

From a set {A}, assume that we want to pick the elements that are not divisible by any prime in a set of primes {P}. (Here and below, {d \mid P} means that {d} is a squarefree product of primes from {P}.) Let {A_d} be the set of elements of {A} divisible by {d}.

If we use inclusion-exclusion to sift out the elements divisible by some prime in {P}, the number of surviving elements is

\displaystyle S(A, P)=\sum_{d|P} \mu(d) |A_d|.
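As a sanity check, here is a minimal numerical sketch of this identity (the Legendre sieve) in Python; the data {A=\{1,\dots,30\}} and {P=\{2,3,5\}} are toy choices for illustration:

    from itertools import combinations
    from math import prod

    # Toy data (assumed): A = {1, ..., N}, P = {2, 3, 5}, so |A_d| = floor(N/d).
    N, P = 30, [2, 3, 5]

    # Squarefree divisors d of the product of primes in P, with mu(d) = (-1)^(number of prime factors).
    divisors = [(prod(c), (-1) ** len(c)) for r in range(len(P) + 1) for c in combinations(P, r)]

    S = sum(mu * (N // d) for d, mu in divisors)  # sum_{d|P} mu(d) |A_d|
    # Direct count of elements of A not divisible by any prime in P.
    assert S == sum(1 for n in range(1, N + 1) if all(n % p for p in P))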

Even if we know {|A_d|} very accurately, the above expression is not useful if {P} has a lot of primes, because the sum has too many terms and the error terms add up.

So we need approximations to {\mu(d)} that are supported on small {d}, so that the sifting sum has fewer terms.

Say we want an upper bound. One way to achieve this is to make sure that we always count the elements not divisible by any prime in {P} with weight at least {1}, while giving every other element a nonnegative weight.

So we want {\alpha_d} such that

\displaystyle \sum_{d|n} \mu(d) \le \sum_{d|n}\alpha_d.

\displaystyle \implies 1 \le \alpha_1,\quad 0 \le \sum_{d|n}\alpha_d

(the first condition comes from {n} with {(n,P)=1}, where only {d=1} contributes; the second from all other {n}).

Selberg's choice is the following square form, which ensures positivity:

\displaystyle \sum_{d|n}\alpha_d=\left(\sum_{d|n}\lambda_d\right)^2

with {\lambda_1=1}.
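Note that this choice satisfies both constraints: for {n} coprime to {P}, only {d=1} survives and the square equals {\lambda_1^2=1}, while for every other {n} the square is trivially nonnegative.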

Assume \displaystyle |A_d| = g(d) X+ r_d,

where {g(d)} is a multiplicative function telling us the probability of a random element of {A} being divisible by {d}, {r_d} is the error, and {X} is the total size of {A}.
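For example, if {A} is the set of integers in an interval of length {X}, then {|A_d| = X/d + O(1)}, so we can take {g(d)=1/d} and {|r_d| \le 1}.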

\displaystyle S(A, P) \le \sum_{n \in A} \left(\sum_{d|(n, P)}\lambda_d\right)^2 = \sum_{n \in A} \sum_{d_1, d_2|(n, P)} \lambda_{d_1}\lambda_{d_2}

\displaystyle = \sum_{d_1, d_2 |P} \lambda_{d_1}\lambda_{d_2} \sum_{\substack{n\in A \\ [d_1, d_2]|n}} 1 =\sum_{d_1, d_2|P} \lambda_{d_1}\lambda_{d_2} |A_{[d_1, d_2]}|

\displaystyle = X\sum_{d_1, d_2|P} \lambda_{d_1}\lambda_{d_2} g([d_1, d_2]) + \left( \sum_{d_1, d_2|P} \lambda_{d_1}\lambda_{d_2} r_{[d_1, d_2]} \right)

We now ignore the error terms and try to find {\lambda_d} that minimize the main term. The main term is a quadratic form in {\lambda_d} with coefficients {g([d_1, d_2])}.

\displaystyle G:= \sum_{d_1, d_2|P} \lambda_{d_1}\lambda_{d_2} g([d_1, d_2])= \sum_{\substack{abc \mid P \\ (a,b)=1}} g(abc) \lambda_{ac} \lambda_{bc}

where we denoted {d_1 = ac}, {d_2 = bc}, with the gcd {(d_1, d_2)=c} (so that {(a,b)=1} and {[d_1, d_2]=abc}).

\displaystyle =\sum_{c|P} \frac{1}{g(c)} \sum_{\substack{ac,\, bc \mid P \\ (a, b)=1}} g(ac) g(bc) \lambda_{ac} \lambda_{bc},

using multiplicativity: {g(abc)g(c) = g(ac)g(bc)} since {a, b, c} are pairwise coprime.

\displaystyle =\sum_{c|P} \frac{1}{g(c)} \sum_{ac,\, bc \mid P} \left(\sum_{d_1 \mid (a, b)} \mu(d_1)\right) g(ac) g(bc) \lambda_{ac} \lambda_{bc}

\displaystyle =\sum_{c|P} \frac{1}{g(c)} \sum_{d_1 \mid P} \mu(d_1) \sum_{\substack{ a=a_1d_1\\ b=b_1d_1}} g(a_1d_1c) g(b_1d_1c) \lambda_{a_1d_1 c} \lambda_{b_1d_1c}

\displaystyle =\sum_{c \mid P} \frac{1}{g(c)} \sum_{d_1 \mid P} \mu(d_1) \left(\sum_{k} g(kcd_1) \lambda_{kc d_1}\right)^{2}.

Let us define {h(d)} by

\displaystyle \frac{1}{h(d)}=\sum_{cd_1= d} \frac{\mu(d_1)}{g(c)}

Since {1/h} is a Dirichlet convolution of multiplicative functions, {h} is multiplicative; at primes we have

\displaystyle h(p)=\frac{g(p)}{1-g(p)} \quad \quad g(p)=\frac{h(p)}{1+h(p)} .
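For instance, in the interval example {g(p)=1/p}, this gives {h(p) = \frac{1/p}{1-1/p} = \frac{1}{p-1}}, and by multiplicativity {h(d) = \frac{\mu^2(d)}{\phi(d)}}.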

So we have

\displaystyle G=\sum_{d|P} \frac{1}{h(d)} \left(\sum_{k} g(dk) \lambda_{d k}\right)^{2}

\displaystyle =\sum_{d|P} h(d) \left( \frac{\mu(d)}{h(d)}\sum_{k} g(dk) \lambda_{d k}\right)^{2}.

With

\displaystyle x_d = \frac{\mu(d)}{h(d)}\sum_{k} g(dk) \lambda_{d k},

the form is diagonalized to

\displaystyle G=\sum_{d|P} h(d) x_d^2.

Now we rewrite {\lambda_d} in terms of {x_d}.

\displaystyle \lambda_{l}=\frac{\mu(l)}{g(l)} \sum_{l \mid d} h(d) x_{d}.
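Here is a quick numerical check (a sketch, with the toy choices {P=\{2,3,5\}}, {g(d)=1/d}, and random {\lambda_d}, all hypothetical) that the diagonalization and this inversion formula are consistent:

    import random
    from itertools import combinations
    from math import prod, lcm

    # Toy data (assumed): P = {2, 3, 5} and g(d) = 1/d.
    primes = [2, 3, 5]
    divs = [prod(c) for r in range(len(primes) + 1) for c in combinations(primes, r)]

    g = lambda d: 1.0 / d
    mu = lambda d: (-1) ** sum(d % p == 0 for p in primes)
    h = lambda d: prod(g(p) / (1 - g(p)) for p in primes if d % p == 0)  # h(p) = g(p)/(1-g(p))

    # Arbitrary weights with the normalization lambda_1 = 1.
    lam = {d: 1.0 if d == 1 else random.uniform(-1, 1) for d in divs}

    # Quadratic form G = sum_{d1, d2 | P} lambda_{d1} lambda_{d2} g([d1, d2]).
    G_quad = sum(lam[a] * lam[b] * g(lcm(a, b)) for a in divs for b in divs)

    # x_d = (mu(d)/h(d)) sum_{k : d | k | P} g(k) lambda_k; then G = sum_d h(d) x_d^2.
    x = {d: mu(d) / h(d) * sum(g(k) * lam[k] for k in divs if k % d == 0) for d in divs}
    G_diag = sum(h(d) * x[d] ** 2 for d in divs)
    assert abs(G_quad - G_diag) < 1e-9

    # Inversion: lambda_l = (mu(l)/g(l)) sum_{l | d | P} h(d) x_d recovers the weights.
    lam2 = {l: mu(l) / g(l) * sum(h(d) * x[d] for d in divs if d % l == 0) for l in divs}
    assert all(abs(lam[l] - lam2[l]) < 1e-9 for l in divs)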

The constraint {\lambda_1=1} gives

\displaystyle \sum_{d|P} h(d) x_{d} =1

Assume the support of {\lambda_d} is {d \le \sqrt{D}.}

Cauchy–Schwarz implies

\displaystyle 1= \left(\sum_{d|P} h(d) x_{d}\right)^2 \le \left(\sum_{\substack{d|P \\ d\le \sqrt{D}}} h(d) x_{d}^{2}\right) \left(\sum_{\substack{d|P\\ d \le \sqrt{D}}} h(d)\right)

Therefore

\displaystyle G \ge \frac{1}{J}.

where

\displaystyle J = \sum_{\substack{d|P\\ d \le \sqrt{D}}} h(d).

The optimal vector satisfies

\displaystyle x_d= \frac{1}{J},

since equality holds in Cauchy–Schwarz exactly when {x_d} is constant, and the constant is fixed by {\sum h(d) x_d = 1}.

And we get the main term {X/J}. So we have

\displaystyle S(A, P) \le \frac{X}{J} +R,

where

\displaystyle R \le \sum_{\substack{d_{1}, d_{2} \le \sqrt{D} \\ d_{1}, d_{2} \mid P}} |\lambda_{d_1}| |\lambda_{d_2}|\left|r_{\left[d_{1}, d_{2}\right]}\right|

We can see that

\displaystyle J \le \sum_{d|P} h(d) = \prod_{p \mid P}(1+h(p))=\prod_{p \mid P}(1-g(p))^{-1}

Thus

\displaystyle G \ge V(z)=\prod_{p \mid P}(1-g(p)).

Note that {X V(z)= X\prod_{p \mid P}(1-g(p))} would be the main term for the sifted set if the events {\{p \mid n\}}, {n \in A}, were independent.
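For example, for an interval with {g(p)=1/p} and {P} the set of primes below {z}, Mertens' theorem gives

\displaystyle V(z)=\prod_{p < z}\left(1-\frac{1}{p}\right) \sim \frac{e^{-\gamma}}{\log z}.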

To get good upper bounds, we need to show that {J} is large.

To get bounds on {R}, we need bounds on {\lambda_d}. If we prove {|\lambda_d| \le 1}, we get

\displaystyle R \le \sum_{\substack{d \le D \\ d \mid P}} 3^{\omega(d)}\left|r_{d}\right|,

since the number of pairs {(d_1, d_2)} with {[d_1, d_2]=d} is {3^{\omega(d)}} (each prime of {d} can divide {d_1} only, {d_2} only, or both).

So we need to look at what the optimal {\lambda_d} look like.

\displaystyle \lambda_{l}= \frac{\mu(l)}{g(l)} \sum_{l \mid d} h(d) x_{d} =\frac{\mu(l)}{Jg(l)} \sum_{\substack{l \mid d,\ d \mid P \\ d \le \sqrt{D}}} h(d).

\displaystyle J=\sum_{\substack{d|P\\ d \le \sqrt{D}}} h(d) = \sum_{k \mid l} \sum_{\substack{d \le \sqrt{D} \\(d, l)=k}} h(d) = \sum_{k \mid l} h(k) \sum_{\substack{d_1 \le \sqrt{D}/k\\ (d_1, l)=1}} h(d_1) \ge \sum_{k \mid l} h(k) \sum_{\substack{d_1 \le \sqrt{D}/l\\ (d_1, l)=1}} h(d_1)

\displaystyle = \sum_{k \mid l} \frac{h(k)}{h(l)} \sum_{\substack{d_1 \le \sqrt{D}/l\\ (d_1, l)=1}} h(ld_1) = \frac{1}{g(l)}\sum_{\substack{l \mid d,\ d \mid P \\ d \le \sqrt{D}}} h(d),

where we wrote {d = k d_1}, used {h(ld_1)=h(l)h(d_1)} for {(d_1, l)=1}, and {\sum_{k \mid l} h(k) = h(l)/g(l)}.

Hence we have { |\lambda_l | \le 1.}
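Continuing the toy example, here is a short sketch that computes the optimal weights and verifies {\lambda_1=1} and {|\lambda_l|\le 1}; the choices {g(d)=1/d}, the primes below {z=10}, and {D=100} are hypothetical:

    from itertools import combinations
    from math import prod, isqrt

    # Hypothetical choices: sift by the primes below z = 10, with D = 100.
    primes = [2, 3, 5, 7]
    sqrtD = isqrt(100)

    g = lambda d: 1.0 / d
    mu = lambda d: (-1) ** sum(d % p == 0 for p in primes)
    h = lambda d: prod(1.0 / (p - 1) for p in primes if d % p == 0)  # h(p) = 1/(p-1) for g = 1/d

    # Squarefree d | P with d <= sqrt(D): the support of lambda.
    divs = [d for d in (prod(c) for r in range(len(primes) + 1)
                        for c in combinations(primes, r)) if d <= sqrtD]
    J = sum(h(d) for d in divs)

    # Optimal weights: lambda_l = mu(l)/(J g(l)) * sum_{l | d, d | P, d <= sqrt(D)} h(d).
    lam = {l: mu(l) / (J * g(l)) * sum(h(d) for d in divs if d % l == 0) for l in divs}

    assert abs(lam[1] - 1.0) < 1e-12                        # normalization lambda_1 = 1
    assert all(abs(v) <= 1 + 1e-12 for v in lam.values())   # |lambda_l| <= 1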

The shape of the sieve weights {\lambda_d}:

We want to evaluate {J}. Without the restriction {d|P} we get

\displaystyle \sum_{d \le x} h(d) \sim c_h (\log x)^{\kappa}

if we have something like {g(p) \sim \frac{\kappa}{p}} in some average sense, where

\displaystyle c_h=\frac{1}{\Gamma(\kappa+1)} \prod_{p}\left(1-\frac{1}{p}\right)^{\kappa}(1+h(p)).

\displaystyle \lambda_{l}=\mu(l) \frac{h(l)}{g(l)} \dfrac{\displaystyle \sum_{\substack{d \le \sqrt{D}/l \\ (l, d)=1}} h(d)}{\displaystyle \sum_{\substack{d|P\\ d \le \sqrt{D}}} h(d) }

\displaystyle \implies \lambda_{l} \sim \mu(l)\left(\frac{\log ^{+} (\sqrt{D} / l)}{\log \sqrt{D}}\right)^{\kappa}
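For instance, for {\kappa = 1} this says {\lambda_l \approx \mu(l)\,\frac{\log(\sqrt D/l)}{\log \sqrt D}} for {l \le \sqrt D}: a linear (in {\log l}) interpolation between {\mu(l)} at {l=1} and {0} at {l=\sqrt{D}}.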

Thus {\lambda_d} behaves like a smooth truncation of {\mu(d)}. Instead of the optimal vectors, we could start with vectors of the shape

\displaystyle \mu(d) F\left(\frac{\log d}{\log D}\right),

where {F} is a smooth function supported on {(-\infty, 1]}, and we obtain similar results.

The smoothness is important because the sharp truncation {\left(\sum_{d|n,\, d\le D} \mu(d)\right)^2} gives a lot of weight to numbers that are not free of small prime factors. In fact, we have

\displaystyle \sum_{n \le X}\left(\sum_{\substack{d|n \\ d\le D}} \mu(d)\right)^2 \approx X,

whereas the smooth weights above give

\displaystyle \sum_{n \le X}\left(\sum_{d|n } \lambda_d\right)^2 \approx \frac{X}{\log \sqrt D}

and hence the main contribution comes from numbers with few small prime factors.

But we also have

\displaystyle \sum_{n \le X}\tau(n)\left(\sum_{d|n } \lambda_d\right)^2 \approx X

so the contribution of numbers with small prime factors is not too small: weighting by {\tau(n)} amplifies their contribution, and by the above estimate the weighted sum jumps from {X/\log \sqrt D} to {X}, so this amplified contribution dominates.

The sign of the resulting sieve weight:

\displaystyle \alpha_d =\sum_{[d_1, d_2]=d} \lambda_{d_1}\lambda_{d_2}

Looking at the formula for {\lambda_d } we saw that {\lambda_d} has the same sign as {\mu(d)}, but this doesn’t mean {\alpha_d} will also have the same sign.

But

\displaystyle \alpha_p = 2\lambda_1\lambda_p + \lambda_p^2 = \lambda_p (2+\lambda_p) \le 0,

since {-1 \le \lambda_p \le 0}. We can also prove that {\alpha_{pq} > 0}.

GPY Weights:

Instead of choosing {x_d} to be constant, if we choose

\displaystyle x_d =C(D)\left(\log \frac{\sqrt{D}}{d}\right)^{\ell}

the resulting sieve weights look like

\displaystyle \lambda_d=\mu(d)\left(1-\frac{\log d}{\log \sqrt{D}}\right)^{\kappa+\ell},

that is, like the weights for dimension {\kappa+\ell}. (Taking {\ell = 0} recovers the constant choice {x_d = 1/J} above.) These are helpful for getting small gaps between primes.

Dependence on {P}:

If the primes sifted are {p<z}, then for fixed {\kappa} (the dimension: the average number of residue classes mod {p} that need to be sifted from {A}), as {z} becomes small relative to {D} (which controls the support of the sieve weights), the main term {X/J} converges to {XV(z)}. But the rate of convergence is slow compared to other combinatorial sieves like the beta sieve.

The Selberg sieve can be used to construct a lower-bound sieve which, for large {\kappa}, gives a better lower bound in a wider range of {z} ({z= D^{1/2\kappa}}) compared to other combinatorial sieves.

Also, for large {\kappa}, the main term won't change much if we increase {z} from {D^{\frac{\log \kappa}{4\kappa}}} to bigger values. That is, primes larger than {D^{\frac{\log \kappa}{4\kappa}}} have essentially no effect on Selberg's upper bound.

Application to the Brun–Titchmarsh inequality:

If we take a short interval {[x, x+y]} and we want to count primes in the arithmetic progression {a \bmod q} (with {(a,q)=1}), we have

\displaystyle X =y/q, \quad g(d) =\frac{1}{d} \quad \text{for } (d,q)=1.

With {z=\sqrt y}, we have

\displaystyle J \ge \frac{\phi(q)}{q} \log {\sqrt D}.

So taking {D=y/q}, we get

\displaystyle \pi(x+y ; q, a)-\pi(x ; q, a)<\frac{2 y}{\varphi(q) \log (y / q)} + O\left(\frac{y}{q(\log (y / q))^{2}}\right).
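Indeed, unwinding the main term with these choices (and {\log \sqrt{D} = \tfrac{1}{2}\log D}),

\displaystyle \frac{X}{J} \le \frac{y/q}{\frac{\phi(q)}{q} \log \sqrt{y/q}} = \frac{2y}{\phi(q) \log (y/q)}.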

It’s possible to remove the error term by other methods (as Montgomery and Vaughan did using the large sieve).
