Zipf’s law

Some recent discussions on scripts and languages reminded me of Zipf’s law. It is a form of power law which states that the probability of encountering the nth most frequent word in a language is p(n) ∝ n^(-s), where s is a little more than 1. Thus the probability of finding the nth most frequent word is approximately proportional to 1/n. For command-line users it is of interest to note that Zipf-like distributions are also seen in the frequency with which the unix command ranked nth in usage is used. To the best of my knowledge nobody has performed a frequency analysis of the Vedic texts and studied the patterns of word and phrase occurrence in them. I am a little too lazy to write the programs to perform this analysis, but it would definitely be of some interest for comparative studies.
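A rough sketch of what the core of such an analysis might look like in Python, for anyone less lazy than I am. The function names are my own inventions for illustration, and the tokenizer is a loud assumption: it simply splits on whitespace, whereas real Vedic text would need sandhi splitting and a decision on how to treat inflected forms. The exponent is estimated by a plain least-squares fit on the log-log rank-frequency plot, which is a crude estimator but adequate for a first look.

```python
import math
import re
from collections import Counter


def rank_frequencies(text):
    """Return word counts sorted from most to least frequent."""
    # Assumption: tokens are whitespace-delimited and case-insensitive.
    words = re.findall(r"\S+", text.lower())
    return [count for _, count in Counter(words).most_common()]


def fit_zipf_exponent(counts):
    """Estimate s in p(n) ~ n^(-s) by least squares on log(rank) vs log(count)."""
    if len(counts) < 2:
        raise ValueError("need at least two ranked counts to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope on the log-log plot is -s.
    return -sxy / sxx
```

Feeding the output of rank_frequencies into fit_zipf_exponent gives a single number to compare across texts; a value near 1 would be consistent with Zipf’s law.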

It is interesting to note that in this respect we have already studied the biological parallels. Some time ago I discovered that the most frequent protein families in a genome, which emerge through lineage-specific gene expansion, follow a power law. In this light it would be interesting to see whether there are any lineage-specific expansions of words in the R^ig veda saMhitA with respect to the atharva veda saMhitA, and whether there are any “regime changes”. As a parallel we may note that the C6 alpha-helical zinc fingers are the rulers of the world in the fungi, whereas the nuclear hormone receptor-type zinc fingers rule the world in the nematode worms.
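One simple way to look for such expansions, by analogy with comparing family sizes between genomes, would be to score each word by the log ratio of its relative frequency in one text versus the other. The sketch below is only an illustration under my own assumptions: the function name is hypothetical, the inputs are already-tokenized word lists, and a pseudocount of 1 is added so that words absent from one text do not produce a division by zero.

```python
import math
from collections import Counter


def expansion_scores(words_a, words_b, pseudocount=1.0):
    """Log2 ratio of smoothed relative word frequencies, text A over text B."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    vocab = set(counts_a) | set(counts_b)
    # Smoothed totals, so each text's relative frequencies still sum to 1.
    total_a = len(words_a) + pseudocount * len(vocab)
    total_b = len(words_b) + pseudocount * len(vocab)
    scores = {}
    for word in vocab:
        freq_a = (counts_a[word] + pseudocount) / total_a
        freq_b = (counts_b[word] + pseudocount) / total_b
        scores[word] = math.log2(freq_a / freq_b)
    return scores
```

Words with large positive scores are candidates for expansion in the first text, and large negative scores for expansion in the second. A “regime change” would show up as the top-ranked words of one text scoring strongly negative in the other.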

The parallels between genomes, texts and mythologies run deep.
