From the genetic code to the proof of Fermat’s little theorem
Nucleic acids encode the 20 amino acids found in the sequence of a protein using just 4 bases: A, G, T, C in DNA. Thus, the 4-symbol nucleic acid alphabet encodes a 20-symbol protein alphabet. This is achieved by having 3 letters in the nucleic acid language (a codon) code for one letter in the protein language or for the stop sign to terminate the protein sequence. This is the famed genetic code. When we first learned of it at age 10, we became fascinated with the process of encoding and while playing around with codons, we learned not just the foundations of biology but also some of the basics of combinatorics. That was perhaps the reason why some years later in college we were quite agile with elementary combinatorics despite not having any special mathematical spark.
The first and most obvious thing we observed was that the total number of codons in the genetic code is the number of permutations with replacement that can be achieved in 3-letter words using a 4-symbol alphabet: $4^3 = 64$. A closer look then revealed that these 3-letter words, i.e. codons, could be classified into groups by arranging them as ring graphs (Figure 1). Since nucleic acids have a polarity imparted by the (deoxy)ribose ring of the sugar, i.e. $5' \rightarrow 3'$, each of these ring graphs is directed: they take the form of an uroboros. Thus, the codons fall into 24 distinct groups.
Of these, 4 rings are homopolymeric, i.e. AAA, GGG, TTT and CCC. Any circular permutation of them will yield the same codon again. Each of the remaining 20 rings is heteropolymeric. Hence, when circularly permuted, each will always yield 3 different codons. For example, the first ring in the second row (Figure 1) will yield AGA, GAA and AAG. Thus, we get the total number of permutations possible in the 3-letter words with a 4-symbol alphabet as: $4 + 20 \times 3 = 64$. This reveals a more important truth of combinatorics: if a word has prime length then, unless it is homotypic (equivalent to a homopolymeric codon), it will always have that same prime number of distinct circular permutations when its letters are arranged on a directed ring graph as in Figure 1. This is because the number of distinct circular permutations of a word must divide its length, and a prime has no divisors other than 1 and itself. 3 is the prime number in the case of the genetic code; hence, the circular permutations of the heteropolymeric rings yield 3 codons each. We can express this as a generalization thus: Let $a$ be the number of symbols in the alphabet. Let $p$ be a prime which is the length of the word in that alphabet. We also insist that $a$ is not divisible by $p$. Then the total number of $p$-letter words will be $a^p$. Of these the homotypic words will amount to $a$. Thus, the remainder will be $a^p - a$ words. Now, these remaining words, by the above principle of arranging on directed ring graphs, can be grouped into $m$ sets each of $p$ words. Thus, $a^p - a = mp$; therefore $a^p - a$ will be divisible by $p$. Alternatively, since $a^p - a = a(a^{p-1} - 1)$ and $p$ does not divide $a$, we have $a^{p-1} \equiv 1 \pmod{p}$.
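The ring argument above can be checked by brute force. The following is a minimal sketch in Python; the function name `rings` is our own label for illustration. It groups all words of a given length into classes under circular shift and counts the classes by size.

```python
from itertools import product

def rings(alphabet, length):
    """Group all words of the given length into rings (classes under circular shift)."""
    seen = set()
    groups = []
    for letters in product(alphabet, repeat=length):
        word = "".join(letters)
        if word in seen:
            continue
        # All circular shifts of one word make up a single directed ring.
        ring = {word[i:] + word[:i] for i in range(length)}
        seen |= ring
        groups.append(sorted(ring))
    return groups

codon_rings = rings("AGTC", 3)
sizes = [len(r) for r in codon_rings]
# Number of rings, homopolymeric rings, heteropolymeric rings.
print(len(codon_rings), sizes.count(1), sizes.count(3))  # 24 4 20

# For a prime word length p, every ring has either 1 or p words,
# so a**p - a is a multiple of p.
for a, p in [(2, 5), (3, 5), (4, 3), (10, 7)]:
    assert (a**p - a) % p == 0
```

With a composite word length the argument fails: `rings("ab", 4)` contains the ring of `abab`, which has only 2 words.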
This is the famous Fermat’s little theorem of arithmetic. Fermat had proposed it without a proof but it was subsequently proven by Leibniz. Euler published a proof more than 50 years later, apparently unaware of Leibniz’s manuscript, which is believed to have not been formally published. He then provided its general form, the theorem of Euler regarding the totient function $\varphi(n)$, which we had encountered in the previous note. The proof we presented above is the proof by combinatorics. It was apparently first published by the mathematician Golomb and is a variant of Euler’s original proof.
The periods of the reciprocals of prime numbers
Early tetrapods showed a wide range of finger-counts in their limbs: Acanthostega had an 8-fingered limb; Ichthyostega showed a 7-fingered hind-limb; Tulerpeton had 6 fingers. Some time thereafter, perhaps in a form like Crassigyrinus, the number 5 got fixed. While there are frequent deviations from this in particular lineages, like amphibians losing a finger in the forelimb, the 5-fingered state continued to be the common baseline in most surviving tetrapod clades. Thus, we got our 5-fingered hands. This combined with our bilateral symmetry gave us a number system based on the product of 2 primes: $2 \times 5 = 10$; i.e., the decimal system. While some islanders of Papua have apparently opted for the smaller senary system based on $2 \times 3 = 6$, the former system came to be the dominant usage of the world.
The peculiarities of the decimal system caught our fancy when, as a kid, we began learning decimal fractions from our father. We were fascinated by the observation that some decimal fractions terminated, e.g. $\tfrac{1}{4} = 0.25$, whereas others just fell into a cycle, e.g. $\tfrac{1}{7} = 0.142857142857\ldots$ We asked our father why this was so. He told us to: 1) focus on reciprocals, i.e. fractions of the type $\tfrac{1}{n}$, because all other fractions are integer multiples of such, and 2) look out for primes. Then, thinking it would be good for us to get some practice with division, he let us keep dividing 1 by various numbers.
If one were to do this arithmetic operation, sooner or later, one realizes the following:
1) Only fractions of the form $\tfrac{1}{2^i 5^j}$ or their multiples terminate. The number of decimal places after which they terminate is $\max(i, j)$. This is easily understood. One needs to divide as many times as the maximum number of times 2 or 5 appears in the denominator; hence, the decimal fraction terminates after that many digits after the point.
2) If the fraction is of the form $\tfrac{1}{2^i 5^j p_1 p_2 \cdots}$, where $p_1, p_2, \ldots$ are primes other than 2 and 5, then it always cycles after an initial run of digits whose length again depends on the $2^i 5^j$ part of the denominator as in the first case.
3) To better understand the cycles let us restrict ourselves to the basic situation where the fraction is of the form $\tfrac{1}{p}$, where $p$ is any prime other than 2 or 5. Such fractions are always pure cycles in the decimal form. We define the cycle-length $c$ as the length of the repeating pattern of digits. Since $\tfrac{1}{2}$ and $\tfrac{1}{5}$ terminate we can take their $c = 0$. For other primes $c$ takes various values: $c = 1$ for $p = 3$, $6$ for $7$, $2$ for $11$, $6$ for $13$, $16$ for $17$ and $18$ for $19$.
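The three observations can be reproduced by mechanizing long division. The sketch below is only an illustration; `decimal_expansion` is a name we have made up, and it assumes an integer denominator of at least 2. It records each remainder as it arises, so the expansion terminates when the remainder hits 0 and starts cycling when a remainder repeats.

```python
def decimal_expansion(n):
    """Return (prefix, cycle) digit strings of 1/n, for an integer n >= 2.

    prefix holds the non-repeating digits after the point;
    cycle holds the repeating block (empty if the fraction terminates).
    """
    digits = []
    first_seen = {}            # remainder -> position where it first occurred
    r = 1
    while r != 0 and r not in first_seen:
        first_seen[r] = len(digits)
        r *= 10
        digits.append(str(r // n))
        r %= n
    if r == 0:                 # terminating fraction
        return "".join(digits), ""
    start = first_seen[r]      # the cycle begins where this remainder first appeared
    return "".join(digits[:start]), "".join(digits[start:])

print(decimal_expansion(8))    # ('125', '')       1/8  = 0.125
print(decimal_expansion(12))   # ('08', '3')       1/12 = 0.08333...
print(decimal_expansion(7))    # ('', '142857')    1/7  = 0.142857142857...
```

Here $8 = 2^3$ terminates after 3 digits, $12 = 2^2 \times 3$ cycles after an initial run of 2 digits, and 7 gives a pure cycle.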
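What determines the length of the cycle for a given prime? We observe that the cycle is determined by when a multiple of the denominator becomes 1 less than a power of 10 for the first time. Thus we have: $3 \times 3 = 9 = 10^1 - 1$. Hence, the cycle for $\tfrac{1}{3}$ is $c = 1$. Similarly, $11 \times 9 = 99 = 10^2 - 1$. Hence, the cycle for $\tfrac{1}{11}$ is $c = 2$. Another example: $7 \times 142857 = 999999 = 10^6 - 1$. Hence, the cycle for $\tfrac{1}{7}$ is $c = 6$. This can be formally expressed thus: When $q \times p = 10^c - 1$ for the smallest possible $c$, then the length of the decimal cycle of $\tfrac{1}{p}$ is $c$. The pattern of repetition is then given by $q$ with the appropriate padding 0s, e.g. $q = 9$ padded to 09 for $\tfrac{1}{11} = 0.0909\ldots$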
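This rule translates directly into a few lines of code. The sketch below uses our own function names and assumes $p$ is a prime other than 2 and 5. It searches for the first power of 10 that leaves remainder 1 on division by $p$, then reads off the repeating pattern as the zero-padded quotient.

```python
def cycle_length(p):
    """Smallest c such that 10**c - 1 is divisible by p (p prime, not 2 or 5)."""
    c, power = 1, 10 % p
    while power != 1:
        power = (power * 10) % p
        c += 1
    return c

def repetend(p):
    """Repeating digit block of 1/p: the quotient (10**c - 1) / p, left-padded with 0s."""
    c = cycle_length(p)
    q = (10**c - 1) // p
    return str(q).zfill(c)

for p in (3, 7, 11, 13, 37):
    print(p, cycle_length(p), repetend(p))
# 3 1 3
# 7 6 142857
# 11 2 09
# 13 6 076923
# 37 3 027
```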
With this in place, one can now show that for a given prime $p$, the maximum length of a cycle can be $p - 1$. By Fermat’s little theorem (see above), $10^{p-1} \equiv 1 \pmod{p}$. Thus, we will reach a number 1 less than a power of 10, which is also divisible by $p$, at the latest by $10^{p-1} - 1$; when no smaller power works, $c = p - 1$. In the above examples we see this for 7, 17 and 19. An examination of the distribution of the digits from $0$ to $9$ for fractions with long cycles shows that they are all present in nearly equal frequency.
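Both claims are easy to probe numerically. The following sketch uses our own helper names and restricts itself to a handful of small primes. It checks the Fermat bound and tallies the digits of the repeating block for primes whose cycle has the maximal length $p - 1$.

```python
from collections import Counter

def cycle_length(p):
    """Smallest c such that 10**c - 1 is divisible by p (p prime, not 2 or 5)."""
    c, power = 1, 10 % p
    while power != 1:
        power = (power * 10) % p
        c += 1
    return c

def digit_counts(p):
    """How often each digit 0..9 occurs in the repeating block of 1/p."""
    c = cycle_length(p)
    block = str((10**c - 1) // p).zfill(c)
    counts = Counter(block)
    return [counts[str(d)] for d in range(10)]

for p in (7, 17, 19, 97):
    assert pow(10, p - 1, p) == 1          # Fermat's little theorem for base 10
    assert cycle_length(p) == p - 1        # these primes attain the maximum
    print(p, digit_counts(p))

print(digit_counts(17))  # [1, 2, 2, 1, 2, 2, 1, 2, 2, 1]
```

For these maximal-cycle primes the digit counts differ by at most 1, which is the sense in which the frequencies are nearly equal.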
This now leads us to an interesting sequence $c$ such that $c[n]$ is defined as the length of the decimal cycle of the reciprocal of the $n$th prime, i.e. $\tfrac{1}{p_n}$. The Englishman W. Shanks was the first to compute this sequence in the pre-computer era for every $p_n < 20000$, i.e., the first 2262 primes, and triumphantly presented his results before the Royal Society of England. Unlike Shanks, in our first attempt at this in our childhood we quickly ran out of steam, but we had at least obtained some basic picture of the “lay of the land” of the sequence $c$. Hence, we wondered how Shanks reached his goal: was it by brute force or by some trick of the kind which the late Śaṃkarācarya of the Govardhana-maṭha used? In any case, as time passed our father had grown richer and procured a computer for us. As a result, we revisited this problem and could now reach where old Shanks had gone. $c[n]$ runs thus:
0, 1, 0, 6, 2, 6, 16, 18, 22, 28, 15, 3, 5, 21, 46, 13, 58, 60, 33, 35…
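The opening terms listed above can be regenerated with a short script. This is a sketch with our own function names, using a plain sieve for the primes and setting the cycle length to 0 for the terminating cases 2 and 5.

```python
def primes_up_to(limit):
    """All primes <= limit by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]

def cycle_length(p):
    """Decimal cycle length of 1/p for a prime p; 0 for the terminating cases 2 and 5."""
    if p in (2, 5):
        return 0
    c, power = 1, 10 % p
    while power != 1:
        power = (power * 10) % p
        c += 1
    return c

c = [cycle_length(p) for p in primes_up_to(71)]   # the first 20 primes
print(c)
```

Raising the limit to 20000 covers the first 2262 primes.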
One notices that sometimes $c[n]$ is even and other times odd. If it is even, the corresponding $\tfrac{1}{p_n}$ has a curious property, which we illustrate with a few examples: $\tfrac{1}{7} = 0.\overline{142857}$, $\tfrac{1}{13} = 0.\overline{076923}$ and $\tfrac{1}{17} = 0.\overline{0588235294117647}$.
Thus, we see that for fractions with an even cycle the sum of each of the digits in the first $\tfrac{c[n]}{2}$ digits with the corresponding digit in the final $\tfrac{c[n]}{2}$ digits is 9; e.g. in the case of $\tfrac{1}{7}$: $1 + 8 = 4 + 5 = 2 + 7 = 9$. Likewise, for $\tfrac{1}{13}$: $0 + 9 = 7 + 2 = 6 + 3 = 9$. This knowledge halves the calculation as long as you know you have reached the midpoint of the cycle. But this leads to the question: what is the distribution of cycle lengths? A good way to approach this question is by plotting $c[n]$ against the corresponding $p_n$. This is shown in Figure 2.
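The digit-pairing property can be tested on as many primes as one likes. The sketch below uses our own function names and rejects primes whose cycle is odd. It splits the repeating block into two halves and adds them digit by digit.

```python
def repetend(p):
    """Repeating digit block of 1/p for a prime p other than 2 and 5."""
    c, power = 1, 10 % p
    while power != 1:
        power = (power * 10) % p
        c += 1
    return str((10**c - 1) // p).zfill(c)

def half_sums(p):
    """Digit-wise sums of the first and second halves of an even-length cycle."""
    block = repetend(p)
    if len(block) % 2:
        raise ValueError(f"1/{p} has an odd cycle length")
    half = len(block) // 2
    return [int(a) + int(b) for a, b in zip(block[:half], block[half:])]

print(half_sums(7))    # [9, 9, 9]
print(half_sums(13))   # [9, 9, 9]
print(half_sums(17))   # [9, 9, 9, 9, 9, 9, 9, 9]
```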
We observe that in this plot the terms of $c[n]$ fall on lines of the form $y = \tfrac{x-1}{k}$, where $k = 1, 2, 3, \ldots$ Those where the cycle length is of the form $c[n] = p_n - 1$ will fall on $y = x - 1$. Those for which $c[n] = \tfrac{p_n - 1}{2}$, e.g. $p_n = 13$, will fall on $y = \tfrac{x-1}{2}$ and so on. Since all primes other than 2 are odd, $p_n - 1$ will be even. Thus, $\tfrac{p_n - 1}{k}$ might be even or odd. Further, when $k$ is odd, $\tfrac{p_n - 1}{k}$ will be even. Thus, given that only when $k$ is even can we get odd-length cycles, the number of primes with odd-length cycles will be less than the number of primes with even-length cycles. For the first 2262 primes the ratio of the number of odd-length cycles to even-length cycles is 0.487. While it is close to $\tfrac{1}{2}$, it is not clear if it converges to a particular value.
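The odd-to-even ratio can be recomputed as follows. This is a sketch under two assumptions of ours: the primes 2 and 5, whose reciprocals terminate, are left out of the tally, and the cycle length is found by testing only the divisors of $p - 1$, which is valid because the cycle length always has the form $\tfrac{p-1}{k}$.

```python
from math import isqrt

def primes_up_to(limit):
    """All primes <= limit by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]

def cycle_length(p):
    """Decimal cycle length of 1/p (p prime, not 2 or 5): smallest divisor d of p - 1
    with 10**d leaving remainder 1 on division by p."""
    n = p - 1
    divisors = set()
    for i in range(1, isqrt(n) + 1):
        if n % i == 0:
            divisors.update((i, n // i))
    return next(d for d in sorted(divisors) if pow(10, d, p) == 1)

primes = [p for p in primes_up_to(20000) if p not in (2, 5)]
odd = sum(1 for p in primes if cycle_length(p) % 2 == 1)
even = len(primes) - odd
print(odd, even, round(odd / even, 3))
```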
In Figure 3 we depict the first 2262 primes colored according to whether they have an even- or odd-length cycle. There is no obviously discernible pattern. Yet, there could be a subtle one which we are unable to describe: whether such a pattern exists remains an open question to us.
The $p_n$ and the corresponding $c[n]$ can be classified into different families depending on which line of the form $y = \tfrac{x-1}{k}$ they fall on (Figure 2). Thus, for $k = 1$, we have the cycle 1 family; for $k = 2$ we have the cycle 2 family and so on. The frequency of the cycle $k$ families in the first 2262 primes is shown in Figure 4.
We observe that the cycle 1 primes are the most common: 7, 17, 19, 23, 29, 47, 59, 61, 97, 109…
Then cycle 2 primes: 3, 13, 31, 43, 67, 71, 83, 89, 107, 151…
Thereafter the cycles become rarer rapidly. Cycle 3: 103, 127, 139, 331, 349, 421, 457, 463, 607, 661…
cycle 4: 53, 173, 277, 317, 397, 769, 773, 797, 809, 853…
cycle 5 is anomalously rarer than the flanking even cycles: 11, 251, 1061, 1451, 1901, 1931, 2381, 3181, 3491, 3851…
This anomalous rarity continues for at least a few subsequent odd $k$ relative to the flanking even $k$. The frequencies of the primes belonging to each cycle appear to converge to particular values. Whether there is some systematic way of accounting for this distribution remains an open question to us.
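The families and their frequencies can be tabulated with a short script. The following sketch uses our own names, leaves out the primes 2 and 5, and assigns each prime $p$ below 20000 to the family $k = \tfrac{p-1}{c}$, where $c$ is its cycle length.

```python
from collections import defaultdict
from math import isqrt

def primes_up_to(limit):
    """All primes <= limit by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]

def cycle_length(p):
    """Decimal cycle length of 1/p (p prime, not 2 or 5), searched among divisors of p - 1."""
    n = p - 1
    divisors = set()
    for i in range(1, isqrt(n) + 1):
        if n % i == 0:
            divisors.update((i, n // i))
    return next(d for d in sorted(divisors) if pow(10, d, p) == 1)

primes = [p for p in primes_up_to(20000) if p not in (2, 5)]
families = defaultdict(list)          # k -> primes whose cycle length is (p - 1) / k
for p in primes:
    families[(p - 1) // cycle_length(p)].append(p)

for k in range(1, 11):
    members = families[k]
    print(k, len(members), round(len(members) / len(primes), 4), members[:10])
```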
We can also look at the first prime in each cycle: The first prime to show cycle 1 is 7; the first to show cycle 2 is 3; then we get a dramatic jump with the first to show cycle 3 being 103 and so on. Thus we can define a sequence which is the first prime in each cycle: 7, 3, 103, 53, 11, 79, 211, 41, 73, 281, 353, 37, 2393, 449, 3061, 1889, 137, 2467, 16189, 641. Figure 5 shows a plot of this sequence.
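The sequence of first primes can be recovered by scanning the primes in order and remembering the first one seen for each $k$. This is a sketch with our own names, limited to primes below 20000, so any family not yet started in that range is reported as `None`.

```python
from math import isqrt

def primes_up_to(limit):
    """All primes <= limit by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]

def cycle_length(p):
    """Decimal cycle length of 1/p (p prime, not 2 or 5), searched among divisors of p - 1."""
    n = p - 1
    divisors = set()
    for i in range(1, isqrt(n) + 1):
        if n % i == 0:
            divisors.update((i, n // i))
    return next(d for d in sorted(divisors) if pow(10, d, p) == 1)

first_prime = {}                      # k -> smallest prime in the cycle k family
for p in primes_up_to(20000):
    if p in (2, 5):
        continue
    k = (p - 1) // cycle_length(p)
    first_prime.setdefault(k, p)      # keeps only the first prime seen for each k

print([first_prime.get(k) for k in range(1, 21)])
```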
This plot leaves us with many unanswered questions: 1) Is there some way to decide a priori what will be the first prime to have a certain cycle? 2) Is there some pattern to the plot in Figure 5? 3) We observed that beyond a certain $k$ there are several $k$ for which the cycle is not initiated within the range of the first 2262 primes we used in this plot. Are there any $k$ for which a cycle is never initiated?
Finally, we could ask the question: what is the count of the primes of cycle $k$, i.e., the counting function for cycle $k$ primes? This is shown for cycle 1 primes in Figure 6.
We observe that this count can be described by a fraction of the prime counting function $\pi(x)$ or its asymptotic equivalent. In the above case, we use the logarithmic integral $\mathrm{Li}(x)$. The fraction is the convergent frequency of the cycle $k$, which in the case of cycle 1 can be computed to be approximately 0.3824 using the first 2262 primes. This again brings us to the issue of whether one can obtain a closed form description for these frequencies.
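A rough numerical comparison of the cycle 1 count with the scaled logarithmic integral might look like the sketch below. The names are our own. As an assumption, the logarithmic integral is evaluated from its standard series $\gamma + \ln\ln x + \sum_{j \ge 1} \tfrac{(\ln x)^j}{j \cdot j!}$, without the small constant offset that some definitions subtract, and the frequency 0.3824 is the empirical value quoted above.

```python
import math

EULER_GAMMA = 0.5772156649015329

def li(x, terms=100):
    """Logarithmic integral for x > 1 via its series in powers of ln(x)."""
    log_x = math.log(x)
    total = EULER_GAMMA + math.log(log_x)
    term = 1.0
    for j in range(1, terms + 1):
        term *= log_x / j             # term is now (ln x)**j / j!
        total += term / j
    return total

def primes_up_to(limit):
    """All primes <= limit by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]

def is_cycle1(p):
    """True if 1/p has the maximal cycle length p - 1 (p prime, not 2 or 5)."""
    n = p - 1
    # Maximal cycle means no proper divisor d of p - 1 gives 10**d = 1 (mod p).
    for i in range(1, math.isqrt(n) + 1):
        if n % i == 0:
            for d in (i, n // i):
                if d != n and pow(10, d, p) == 1:
                    return False
    return True

def count_cycle1(x):
    """Number of cycle 1 primes <= x."""
    return sum(1 for p in primes_up_to(x) if p not in (2, 5) and is_cycle1(p))

FREQUENCY = 0.3824                    # empirical value quoted in the text
for x in (1000, 5000, 10000, 20000):
    print(x, count_cycle1(x), round(FREQUENCY * li(x), 1))
```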