## A note on the least prime divisor sequences of 2p plus or minus 1

Let $p$ be the sequence of prime numbers: 2, 3, 5, 7… Define the sequences $q$ such that $q[n]=2p[n]\pm 1$. Then sequence $f_1$ is defined such that $f_1[n]$ is the lowest prime divisor (LPD) of $q[n]=2p[n]+1$ and sequence $f_2$ is defined so that $f_2[n]$ is the LPD of $q[n]=2p[n]-1$.

$f_1:$ 5, 7, 11, 3, 23, 3, 5, 3, 47, 59…
$f_2:$ 3, 5, 3, 13, 3, 5, 3, 37, 3, 3…

Figure 1. A plot of the first 100 terms of $f_1, f_2$

We observe that for $f_1$ the successive record values (i.e. successive maxima), $M_{2p\pm 1}$, are what are called safe primes and the corresponding $p[n]$ is a Sophie Germain prime. For $f_1[n] \ge 11$ these $M_{2p\pm 1}$ values are primes of the form $12n-1$. In the case of $f_2$ when $f_2[n] \ge 13$ the successive $M_{2p\pm 1}$ values are primes of the form $12n+1$. From Figure 1 we observe that though the record values keep rising for these sequences for most part they assume low values. Obviously, the lowest value it can take is 3. We also observe that frequency of the occurrence of the $n^{th}$ prime in these sequences from 3 upwards keeps decreasing. Below we tabulate the frequencies for the first 10 primes in $f_1, f_2$ for $n \le 25997$. The 3th column has the frequencies of the first 10 primes in the sequence of LPDs of odd numbers $2n+1$ up to some large $n$.

Figure 2. Frequencies of the first 100 primes in $f_1, f_2$ (blue and red). The frequencies of the first 100 primes in the sequence of LPDs of odd numbers up to some large $n$ (cyan). The curve $y=\tfrac{1}{2x^2}$ is shown in green for comparison.

From the above we see that the frequencies of the $n^{th}$ primes in the sequences $f_1, f_2$ are very similar and likely to asymptotically converge to the same value. We can easily calculate the exact frequencies of the $n^{th}$ prime in the sequence of LPDs of odd numbers in general: e.g. 3 will occur at $fr=1/3$; 5 will occur at $fr=(1-1/3)\times 1/5=.13333$; 7 will occur at $fr= (1-.\overline{3}-.1\overline{3})\times 1/7 =0.07619048$; 11 will occur at $fr= (1-.\overline{3}-.1\overline{3}-0.07619048)\times 1/11 =0.04155844$ and so on. Thus, we observe that the frequencies of the $n^{th}$ prime in $f_1, f_2$ notably differ from the frequencies of the same in the sequence of LPDs of odd numbers in general. We have not figured out if there is a means of exactly calculating the frequencies of the $n^{th}$ prime in $f_1, f_2$. Strangely, the first few frequencies are close to reciprocals of the sequence 2, 8, 16, 32, 41, 78, 90, 128, which relates to a certain co-primality triangle. While this make no sense at all to us, it is unclear if it is all chance or some relationship exists (see postscript).

We then investigated how exactly the record values of $f_1[n], f_2[n]$ grow with $n$. This is shown in Figure 3.

Figure 3. $f_1, f_2$ plotted to 25997 terms

Visual examination of the plot showed that the record values $M_{2p\pm 1}$ grow very similarly in but $f_1$ and $f_2$ and they are bounded by a smooth curve that appears to be of the form $y=k x \log(x)$, where $k$ is some constant. The original Gaussian form of the prime number counting function can be written as (using the asymptotic notation):
$\pi(x) \sim \dfrac{x}{\log(x)}$
From this we can write the expression for the $n{th}$ prime $p_n$ thus:
$p_n \sim n \log(n)$
The record values of the LPDs of $f_1,f_2$ will be primes of the form $2p_n \pm 1$. From this we can infer that that the record values of the two sequences $M_{2p\pm 1}$ will be fitted by the curve:
$y= 2x \log(x)$

In Figure 3 this is plotted as the cyan curve. While this reasonably captures the behavior of of the bounding curve of $M_{2p\pm 1}$, it systematically falls short of it. As we have seen before, the above Gaussian form of the prime counting function is only a crude approximation, which Gauss and Dirichlet eventually replaced with the logarithmic integral $\textrm{Li}(x)$. In this regard Rosser had proved long ago that $p_n \ge n\log(n)$; hence, what we see is a direct consequence of this. Inspired by the work of Chebyshev and Riemann, the obscure Russian village mathematician I.M. Pervushin (Pervouchine) investigated an exact formula for the $n^{th}$ prime using a table of 25997 primes (for numbers $\le 3 \times 10^5$), which is coincidentally the same as the number we used in our investigation. Consequently he arrived at the remarkable formula:

$p_n \approx n\left(\log(n)+\log(\log(n))-1 +\dfrac{5\log(n)}{12}-\dfrac{1}{24\left(\log(n)\right)^2}\right)$

This formula inspired Ernesto Cesàro to discover the more correct formula for the $n^{th}$ prime:

$p_n=n\Bigg(\log(n)+\log(\log(n))-1 +\dfrac{\log(\log(n))-2}{\log(n)}-\dfrac{\left(\log(\log(n))\right)^2-6\log(\log(n))+11}{2\left(\log(n)\right)^2}\\ + o\left(\dfrac{1}{\left(\log(n)\right)^2}\right) \Bigg)$

Here, the small-o notation can be interpreted to mean that the final error term is negligible compared to $\tfrac{1}{\left(\log(n)\right)^2}$

Searching the literature, we found that recently Pierre Dusart had proved that

$p_n \le n\left(\log(n)+\log(\log(n))-1 +\dfrac{\log(\log(n))-2}{\log(n)}\right), \; n \ge 688383$

Thus, for large $n$ the first 4 terms are sufficient. Hence, based on Cesàro’s formula we arrived at the approximate function for the behavior of $M_{2p\pm 1}$:

$y=2x\left(\log(x)+\log(\log(x))-1 +\dfrac{\log(\log(x))-2}{\log(x)}\right)$

This uses only the first 4 terms of Cesàro’s formula but gives us a good fit as seen by the red curve in Figure 3. In numerical terms for the largest prime in both $f_1, f_2$ for the first 25997 terms this approximation gives an error fraction of .002 suggesting that it is indeed a good one.

After we posted this note on Twitter we rather quickly heard back from an acquaintence on that forum about his solution for exact form of the frequencies of the $n^{th}$ prime in the above sequences. You can read his excellent post here.

This entry was posted in Scientific ramblings and tagged , , , , . Bookmark the permalink.