A sequence related to prime counting

The current note arose as an exploration branching off from the matter discussed in these earlier notes: this one and this one. As we saw before, Carl Gauss, while still in his teens, produced his first estimate of the prime number distribution in the form of the function:

\pi(n) \sim \dfrac{n}{\log(n)}

Here is \pi(n) is the prime counting function, which counts the number of prime numbers up to a given number n, and \log(n) is the natural logarithm of n. The \sim notation indicates that the prime counting function is asymptotic with \tfrac{n}{\log(n)}, i.e. as n\to \infty the ratio \pi(n)\big / \tfrac{n}{\log(n)} \to 1.

Subsequently, Gauss refined his fit for the prime counting function by using the famed logarithmic integral \textrm{Li}(x). We were curious if there was some arithmetic function, which was actually fitted by \tfrac{n}{\log(n)} rather it being merely a single term approximation of the \pi(n). In course of some arithmetic experiments, we stumbled upon a sequence, which we believe, without formal proof, is fitted by \tfrac{n}{\log(n)} in terms of average behavior.

This sequence f is defined thus: f[1]=1. Thereafter, add n-1 to all terms f[1:(n-1)]. Count how many of f[1:(n-1)]+(n-1) are primes. This count is f[n]. For example when n=2 we add 2-1=1 to 1 we get 2. Which is a single prime; hence, f[2]=1. Now for n=3 we add 3-1=2 to the first two terms and we get 3, 3. Thus, we have 2 primes; hence f[3]=2. For n=4, we add 4-1=3 to the prior terms and get 4, 4, 5, which yields a single prime, 5; hence, f[4]=1. Thus, the first few terms of the sequence goes: 1, 1, 2, 1, 3, 1, 4, 1, 1, 2, 7, 2, 7, 1, 1, 4, 11, 3, 9, 2, 4, 4, 11, 0, 2, 4, 4, 11, 11, 6. Figure 1 shows a plot of the first 20000 terms of the sequence.

prime_back_addition_fig1Figure 1

The blue line is the plot of this sequence and we notice right away that despite the fluctuations the average tendency is to grow with n. Via numerical experiments we were able to establish that this average growth is fitted best by the function \tfrac{n}{\log(n)} (red line in Figure 1). The green line in Figure 1 is the count of primes \pi(n). We observe that though some extreme values of f exceed \pi(n), the average behavior of f[n], i.e. \tfrac{n}{\log(n)} < \pi(n). This relates to a central development in the number theory: when Gauss conjectured the asymptotic relationship between \tfrac{n}{\log(n)} and \pi(n) the mathematical apparatus was not yet in place to prove it. This was finally developed by his last student Bernhard Riemann. Using those ideas, nearly century after Gauss’ conjecture, Hadamard and de la Vallée-Poussin proved it and it became known as the Prime Number Theorem. Further, de la Vallée-Poussin showed that \pi(n) was related to \tfrac{n}{\log(n)} thus:


Here, the second term is gives the error and is denoted using the big-O notation which was explained in an earlier note. This indicates that indeed \tfrac{n}{\log(n)} would be less than \pi(n). Thus, as can be seen in Figure 1 the average growth of f[n]<\pi(n).

We then used \tfrac{n}{\log(n)} to ‘rectify’ f[n] i.e. obtain:


Prime_back_addition_fig2Figure 2

This rectified f[n] is plotted in Figure 2 and provides a clear picture of fluctuations in f[n] once we have removed the average growth trend. We observe right away that the amplitude of the fluctuations grows with n. To determine this growth trend of the rectified f[n], we first noticed from Figure 1 that \pi(n) tends to run close to the maxima of f[n]. Hence, we utilized the asymptotic expansion of \textrm{Li}(n), which is a better approximation of \pi(n) and captures the behavior beyond the basic \tfrac{n}{\log(n)} term:

\textrm{Li}(n) \sim \dfrac{n}{\log(n)} \displaystyle \sum_{k=0}^\infty \dfrac{k!}{(\log(n))^k}

\textrm{Li}(n) \sim \dfrac{n}{\log(n)}+\dfrac{n}{\log^2(n)}+\dfrac{2n}{\log^3(n)}+\dfrac{6n}{\log^4(n)}...

Using the first 4 terms to approximate the growth of the amplitude of rectified f[n] we get the red bounding curves shown in Figure 2. Thus, we conjecture that while f[n] grows on an average as \tfrac{n}{\log(n)}, the amplitude of its fluctuations is roughly approximated by \textrm{Li}(n)-\tfrac{n}{\log(n)} (Green bounding curves in Figure 2).

This entry was posted in Scientific ramblings and tagged , , , . Bookmark the permalink.