A note on the least prime divisor sequences of 2p plus or minus 1

Let p be the sequence of prime numbers: 2, 3, 5, 7… Define the sequences q such that q[n]=2p[n]\pm 1. Then sequence f_1 is defined such that f_1[n] is the lowest prime divisor (LPD) of q[n]=2p[n]+1 and sequence f_2 is defined so that f_2[n] is the LPD of q[n]=2p[n]-1.

f_1: 5, 7, 11, 3, 23, 3, 5, 3, 47, 59…
f_2: 3, 5, 3, 13, 3, 5, 3, 37, 3, 3…

primes2p_plus_1_1Figure 1. A plot of the first 100 terms of f_1, f_2

We observe that for f_1 the successive record values (i.e. successive maxima), M_{2p\pm 1}, are what are called safe primes and the corresponding p[n] is a Sophie Germain prime. For f_1[n] \ge 11 these M_{2p\pm 1} values are primes of the form 12n-1. In the case of f_2 when f_2[n] \ge 13 the successive M_{2p\pm 1} values are primes of the form 12n+1. From Figure 1 we observe that though the record values keep rising for these sequences for most part they assume low values. Obviously, the lowest value it can take is 3. We also observe that frequency of the occurrence of the n^{th} prime in these sequences from 3 upwards keeps decreasing. Below we tabulate the frequencies for the first 10 primes in f_1, f_2 for n \le 25997. The 3th column has the frequencies of the first 10 primes in the sequence of LPDs of odd numbers 2n+1 up to some large n.

primes2p_plus_1_tab

primes2p_plus_1_2Figure 2. Frequencies of the first 100 primes in f_1, f_2 (blue and red). The frequencies of the first 100 primes in the sequence of LPDs of odd numbers up to some large n (cyan). The curve y=\tfrac{1}{2x^2} is shown in green for comparison.

From the above we see that the frequencies of the n^{th} primes in the sequences f_1, f_2 are very similar and likely to asymptotically converge to the same value. We can easily calculate the exact frequencies of the n^{th} prime in the sequence of LPDs of odd numbers in general: e.g. 3 will occur at fr=1/3; 5 will occur at fr=(1-1/3)\times 1/5=.13333; 7 will occur at fr= (1-.\overline{3}-.1\overline{3})\times 1/7 =0.07619048; 11 will occur at fr= (1-.\overline{3}-.1\overline{3}-0.07619048)\times 1/11 =0.04155844 and so on. Thus, we observe that the frequencies of the n^{th} prime in f_1, f_2 notably differ from the frequencies of the same in the sequence of LPDs of odd numbers in general. We have not figured out if there is a means of exactly calculating the frequencies of the n^{th} prime in f_1, f_2. Strangely, the first few frequencies are close to reciprocals of the sequence 2, 8, 16, 32, 41, 78, 90, 128, which relates to a certain co-primality triangle. While this make no sense at all to us, it is unclear if it is all chance or some relationship exists (see postscript).

We then investigated how exactly the record values of f_1[n], f_2[n] grow with n. This is shown in Figure 3.

primes2p_plus_1_3Figure 3. f_1, f_2 plotted to 25997 terms

Visual examination of the plot showed that the record values M_{2p\pm 1} grow very similarly in but f_1 and f_2 and they are bounded by a smooth curve that appears to be of the form y=k x \log(x), where k is some constant. The original Gaussian form of the prime number counting function can be written as (using the asymptotic notation):
\pi(x) \sim \dfrac{x}{\log(x)}
From this we can write the expression for the n{th} prime p_n thus:
p_n \sim n \log(n)
The record values of the LPDs of f_1,f_2 will be primes of the form 2p_n \pm 1. From this we can infer that that the record values of the two sequences M_{2p\pm 1} will be fitted by the curve:
y= 2x \log(x)

In Figure 3 this is plotted as the cyan curve. While this reasonably captures the behavior of of the bounding curve of M_{2p\pm 1}, it systematically falls short of it. As we have seen before, the above Gaussian form of the prime counting function is only a crude approximation, which Gauss and Dirichlet eventually replaced with the logarithmic integral \textrm{Li}(x). In this regard Rosser had proved long ago that p_n \ge n\log(n); hence, what we see is a direct consequence of this. Inspired by the work of Chebyshev and Riemann, the obscure Russian village mathematician I.M. Pervushin (Pervouchine) investigated an exact formula for the n^{th} prime using a table of 25997 primes (for numbers \le 3 \times 10^5), which is coincidentally the same as the number we used in our investigation. Consequently he arrived at the remarkable formula:

p_n \approx n\left(\log(n)+\log(\log(n))-1 +\dfrac{5\log(n)}{12}-\dfrac{1}{24\left(\log(n)\right)^2}\right)

This formula inspired Ernesto Cesàro to discover the more correct formula for the n^{th} prime:

p_n=n\Bigg(\log(n)+\log(\log(n))-1 +\dfrac{\log(\log(n))-2}{\log(n)}-\dfrac{\left(\log(\log(n))\right)^2-6\log(\log(n))+11}{2\left(\log(n)\right)^2}\\ + o\left(\dfrac{1}{\left(\log(n)\right)^2}\right) \Bigg)

Here, the small-o notation can be interpreted to mean that the final error term is negligible compared to \tfrac{1}{\left(\log(n)\right)^2}

Searching the literature, we found that recently Pierre Dusart had proved that

p_n \le n\left(\log(n)+\log(\log(n))-1 +\dfrac{\log(\log(n))-2}{\log(n)}\right), \; n \ge 688383

Thus, for large n the first 4 terms are sufficient. Hence, based on Cesàro’s formula we arrived at the approximate function for the behavior of M_{2p\pm 1}:

y=2x\left(\log(x)+\log(\log(x))-1 +\dfrac{\log(\log(x))-2}{\log(x)}\right)

This uses only the first 4 terms of Cesàro’s formula but gives us a good fit as seen by the red curve in Figure 3. In numerical terms for the largest prime in both f_1, f_2 for the first 25997 terms this approximation gives an error fraction of .002 suggesting that it is indeed a good one.


After we posted this note on Twitter we rather quickly heard back from an acquaintence on that forum about his solution for exact form of the frequencies of the n^{th} prime in the above sequences. You can read his excellent post here.

This entry was posted in Scientific ramblings and tagged , , , , . Bookmark the permalink.