A recent paper by Metspalu et al in AJHG adds new data to the growing body of material on the genetics of the Indians. The paper has several unsatisfactory aspects – chief among them is its attempt to hand-wave meaninglessly on OIT and AIT. The AIT is sitting right there in their data, yet they attempt to obfuscate the issue in somewhat amateurish ways [Appendix 1]. But that is not something we wish to discuss today, because there is new work that might be published relatively soon that will smash the OIT for good. However, their paper does generate some new and useful data on positive selection and its possible relationship to the insulin resistance-related conditions in India. The spectrum of conditions, including hyperinsulinemia accompanied by resistance of tissues to insulin, impaired glucose tolerance, type-2 diabetes, hypertension, obesity, atherosclerosis and coronary artery blockage, defines the well-known syndrome that afflicts a large number of Hindus, especially those from the southern part of the sub-continent. The data from this recent paper might be used as a starting point to explore some of these issues in greater detail. The authors recover several genes that are apparently under positive selection in the sub-continent and superficially analyze their significance. Of note, they point to the possibility that the selection of certain alleles at several loci in the Indian population might have a relationship with type-2 diabetes and obesity.
In this regard they conclude that: “In this context, it is tempting to hypothesize that past natural selection might have influenced genetic variation at these loci to increase infant survival, a change that became disadvantageous after changes in diet and lifestyle.” However, they do not attempt to investigate the biological implications of their results beyond this and simply state that: “Therefore, the loci we identify could be theoretically considered responsible for some of the present type-2 diabetes epidemic in India, making them worthy candidates for further functional examination. However, because relevant life-history traits, lipid metabolism and type 2 diabetes are all complex traits and the effect of natural selection would be expected to be fragmented across multiple genes, it would be naive to expect that a relationship between past selective processes and present-day disease would be mechanistically simple and explainable by variation at a handful of genetic loci.” Thus, they pass up the opportunity to carry out some interesting investigation of the biochemical properties of the genes they uncovered. While it is true that the actual phenotypic effects of the alleles positively selected in the sub-continent have not been tested, it is still possible to evaluate the genes in light of the various evolutionary hypotheses for the emergence of conditions such as the insulin resistance syndrome (IRS).
One of the first evolutionary hypotheses presented for the emergence of IRS was the feast-famine or thriftiness hypothesis. Its basic premise was that in premodern societies (at some point, perhaps going all the way back to the paleolithic phase) humans faced alternating periods of abundance and scarcity of food, and led a physically active life due to the need to forage actively for subsistence. In this kind of environment, individuals who were able to store fat were at an advantage in tiding over periods of “famine”. Hence, there was a selective advantage to individuals who rapidly took up nutrients during a “feast” phase, converted them to fat and hoarded it in the adipose tissue. This advantage in the feast-famine environment turned into disease in the modern environment, with its constant supply of fat-rich food coupled with a sedentary lifestyle that imposes no requirement to utilize the stored fat. This hypothesis would imply that certain genetic variants predisposing individuals to IRS were positively selected in the feast-famine period. As our acquaintance from college days had pointed out, in practice there are several problems with this hypothesis. Firstly, there is no correlation between IRS and geographical regions where food is limiting. Secondly, there is no evidence that the arrow of causality proceeds from IRS to obesity rather than the other way around. Importantly, in tropical India, where IRS is most prevalent, the food supply was relatively constant through the year, unlike in populations such as the Eskimos, Mongols and Northern Europeans, who have a relatively lower incidence of IRS despite much greater constraints on year-round food availability.
A related hypothesis is based on the observation that there is a significant association between low birth weight and IRS risk worldwide, across several ethnic groups. This hypothesis holds that scarcity of nutrients during intrauterine development (which leads to low birth weight) sends a predictive signal of future scarcity of nutrients and tips the physiology in the direction of hoarding fat. In premodern situations, where this predictive signal was likely to be accurate, the resulting physiological shift toward fat hoarding is likely to have favored survival. Today, however, it results in disease because intrauterine lack of nutrients is unlikely to mean absence of nutrients in the future. While this hypothesis appears reasonably consistent at face value, it presents an internal contradiction when extended to the modern situation: why should intrauterine nutrient deprivation fail to predict later starvation only in the modern world? The only way out is to postulate that low birth weight individuals are more likely to be provided with ample nutrients in modern settings than in premodern settings, where low birth weight was an honest predictor of scarcity.
A third set of hypotheses works from the viewpoint of obesity. They start from the observation that monogenic defects resulting in obesity predominantly affect hypothalamic pathways, impairing the regulation of satiety and food intake. Thus, hunger, satiety and food intake, rather than metabolic rate or nutrient partitioning, are key to the emergence of IRS. Accordingly, it is posited that rather than being primarily a metabolic disease, IRS is a metabolic manifestation of a neuro-behavioural disease. It is in this regard that our acquaintance from college presented the so-called soldier vs diplomat hypothesis. He posited that insulin resistance is a socio-ecological adaptation related to two phenotypic transitions, namely 1) a transition from an r-selected to a K-selected strategy and 2) a transition from a more muscle-dependent to a more brain-dependent lifestyle (the soldier to diplomat transition). Like the other hypotheses, his proposes that this adaptation developed in premodern societies but became pathological in modern societies with greater availability of calorie-rich food. He supported the r to K transition by suggesting that insulin resistance in the mother results in decreased fertility but heavier neonates. He also interpreted the mutations in the insulin/IGF signaling pathway that increase longevity as a possible move from an r to a K strategy. Finally, in support of the more brain-dependent strategy, he suggested that the increased blood glucose due to insulin resistance might supply more glucose for the brain to use. Even if this idea remains poorly supported, he points to the fact that insulin resistance is characterized by hyperinsulinemia and that insulin receptors are widely distributed in specific brain areas such as the olfactory bulb, pyriform cortex, amygdaloid nucleus, hippocampus, hypothalamic nucleus and cerebellar cortex. Thus, the increase in blood insulin might act via these brain receptors to positively affect spatial and verbal memory.
It is in the context of these hypotheses that the data from Metspalu et al become considerably interesting. First, let us consider Myostatin/GDF8 (MSTN), which might provide an explanation for a variety of factors, including perhaps why Indians need to be philosophical about their performance in athletic events such as the Olympics. MSTN is a member of the TGF-β family of cystine knot cytokines. Its evolutionary history is a long one, being traceable to basal animals such as the cnidarians (NEMVEDRAFT_v1g196852; GI:156407906). It is preserved in most animal lineages studied to date and shows a strong muscle-specific expression pattern, at least in vertebrates, arthropods and molluscs; probably this holds across all animals with muscles. It negatively regulates muscle development by signaling via the ActRIIB receptor. Its role in muscle development first came to light in studies in rodents, where deletion of MSTN resulted in a considerable increase in body size, with muscles weighing 2-3 times more than those of wild-type animals. Subsequently, it was seen that loss-of-function mutations in MSTN were behind the “double-muscled” phenotypes of European cattle bred for meat. Its equivalent role in humans was underscored by the dramatic case of a German boy who showed extraordinary muscle development at birth and by the age of 4.5 years was showing precocious strength and the ability to lift heavy weights. Several members of his family were noted for their great physical strength, and the boy’s maternal grandfather was known to lift heavy stones. Analysis of the boy’s myostatin gene revealed that he was homozygous for a mutation at the splice donor site of intron 1, which resulted in a mis-spliced MSTN mRNA producing a truncated, dysfunctional protein. Subsequently, the myostatin K153R variant was shown to be associated with decreased jumping performance in young males.
MSTN has also been shown to be under strong positive selection in humans, with certain coding-region SNPs affecting highly conserved regions present at frequencies greater than 14% only in sub-Saharan Africans. One wonders whether at least some of these SNPs might be related to selection for the explosive muscular activity needed for sprinting and short-distance speed (something in which sub-Saharan Africans excel). In addition to regulating muscle development, myostatin also appears to favor increased adiposity and insulin resistance. MSTN knockouts show reduced fat accumulation and decreased insulin resistance. Accordingly, when MSTN is knocked out in agouti lethal yellow and leptin knockout backgrounds, there is a partial suppression of the pro-obesity and insulin resistance effects. Thus, MSTN is an intrinsic muscle-inhibiting, fat- and blood sugar-enhancing factor. The positive selection at this locus in the Indian sub-continent should be seen in the following light: Indian neonates weigh on average 700 g less than European neonates, and the difference arises primarily from a reduced muscle to fat mass ratio. Indian neonates display relatively greater insulin resistance, and this effect carries into later life: the average age at which an Indian is diagnosed with IRS is significantly lower than in Europeans (~10 years earlier). The fastest Indian 100 m runners are often slower than club-level runners in Europe. Indians have rarely produced full-pace (as opposed to medium-pace) bowlers matching the speed of players of either European or African origin. Even when they do, these bowlers are much more prone to injury and recover slowly from it. Together these observations suggest that in India MSTN has been selected for being more rather than less active, consequently resulting in shortfalls in muscle development/repair and increased adiposity and insulin resistance.
Why would there be selection for poorer muscle development, IRS and the like, when the converse seems to enhance fitness in many ways? In a sense this is an extension of the question of why the MSTN pathway emerged in the first place. Though under laboratory conditions MSTN deletion has an apparently fitness-enhancing phenotype, MSTN appears to have been preserved throughout animal evolution and is strongly constrained in its conservation across vertebrates. This suggests that, despite the deletion phenotypes, its loss or divergence is not well tolerated under natural conditions. Why might this be so? We suspect that it might be explained by considering muscles to be “parasitic” protein hogs. If their development is not under a negative regulatory pathway, they compete with other organ systems for the available protein. If protein in the food is saturating, this might not result in any noticeable problem; however, when protein supplies are limiting there could be deleterious effects on other organs. This idea is compatible with the soldier-diplomat hypothesis because it allows brain development to proceed unaffected even under protein-limiting conditions. However, we suspect that this was not the original reason for the postulated selection for a strengthened MSTN effect in India. The invading Indo-Aryans are unlikely to have faced a protein limitation due to their focus on dairy farming. On the other hand, the proto-Indians (the so-called “ASI” of Reich et al) are likely to have faced protein deficits, especially among the peninsular tribal cultures. After all, even until recently the statistics suggested that more than half of non-urban Indians face a dietary deficit in protein. This might have resulted in selection for over-efficient MSTN signaling, which might have allowed fully functional organ development under low-protein conditions.
Further, myostatin deficiency makes certain muscles susceptible to contraction-induced injury and reduces their force-generating capacity – this might be a major disadvantage in tribal life, where stamina in certain muscular operations might have been critical. This was possibly another reason for emphasizing the MSTN system in certain Indian settings. In this regard, there might be certain specific stamina-related muscular activities where even today the Indian MSTN alleles might provide an edge. Of course, it should be stressed that these are speculations that cannot be taken for granted unless further testing shows that the MSTN alleles selected for in India indeed act in the above-suggested direction. Nevertheless, it is plausible that MSTN selection was critical for life in ancient tribal India, though disadvantageous today; a better understanding of this might give us some key clues regarding resource allocation to the muscles vs the rest of the body. Here, one may also consider two paralogous genes emerging in the Metspalu et al survey, POPDC3 and BVES, which encode transmembrane proteins with a C-terminal cyclic nucleotide-binding domain. POPDC3 is expressed both in striated muscle tissue and in certain brain regions. It is a cyclic nucleotide-dependent regulatory protein, which might play a role in muscle development. Testing the interaction, if any, between variants at this locus and the MSTN variants might throw light on the functional implications of possible selection for altered muscle development among Indians. Another neglected aspect that might be of interest is the relatively high expression of MSTN in cranial and cervical ganglia: does it have an additional neural function that is under selection?
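The MSTN argument above rests on the idea that alleles strengthening MSTN signaling rose in frequency under positive selection. As a rough illustration of how quickly even a modest advantage can do this, here is a minimal sketch of the textbook deterministic genic-selection recursion, p' = p(1 + s)/(1 + s·p). The function name, the starting frequency and the selection coefficient are hypothetical values chosen for illustration, not estimates for MSTN or for any Indian population, and the model ignores drift, dominance and migration.

```python
def allele_trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Deterministic genic selection: p' = p * (1 + s) / (1 + s * p).

    p0: starting frequency of the favoured allele (0 < p0 < 1).
    s: selective advantage of the allele (e.g. 0.01 = 1% per generation).
    Returns frequencies for generations 0..generations (inclusive).
    Ignores drift, dominance, migration and changing environments.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
        freqs.append(p)
    return freqs


# Hypothetical values for illustration only (not estimated for MSTN).
traj = allele_trajectory(p0=0.01, s=0.01, generations=1000)
first_majority = next(t for t, p in enumerate(traj) if p > 0.5)
print(f"frequency first exceeds 50% at generation {first_majority}")
print(f"frequency after 1000 generations: {traj[-1]:.3f}")
```

Under this model the odds p/(1 - p) of the favoured allele are multiplied by (1 + s) every generation, so with these assumed values an allele starting at 1% passes 50% after roughly 460 generations.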
The next gene found to be under positive selection in the sub-continent was DOK5, which encodes a receptor tyrosine kinase substrate with PH and PTB domains. It has been shown to function downstream of the receptors for both insulin and the cystine knot neurotrophic factor GDNF. It is strongly expressed in the amygdala, dorsal root ganglia and cranial ganglia, and lies within a signaling cascade that promotes neurite outgrowth. Outside the brain it shows prominent expression in T lymphocytes in the blood. The GGC haplotype of SNPs rs6068916, rs6064099 and rs873079 in DOK5 was previously shown to be associated with obesity and type-2 diabetes in a northern Indian sample. On the other hand, another study showed that increased activation of neurons in the amygdala on exposure to hostile faces was correlated with the C variant at SNP rs2023454 in DOK5. These observations are of considerable interest because DOK5 variants have the potential to connect three major components of the soldier-diplomat hypothesis: 1) obesity/IRS; 2) hostility perception; and 3) regulation of the immune system via expression in T cells, given that the hypothesis postulates that the immune system shifts from an “outward” to an “inward” focus during the transition from a soldier to a diplomat state. One rank speculation is that the positive selection at the DOK5 locus in the sub-continent might have facilitated or accompanied the emergence of agricultural civilization in northern India, currently first attested at the Mehrgarh site. In this scenario it would have reduced hostility perception while at the same time facilitating a more “diplomat” kind of metabolism in the context of the emergence of agricultural civilization. If this speculation holds up under further tests, it might constitute a potential genetic basis for the soldier-diplomat hypothesis of our acquaintance.
Metspalu et al also point to the key circadian regulator gene CLOCK as being under positive selection in India. Disruption of CLOCK and its paralogous partner BMAL1 results in diabetes mellitus and reduced glucose tolerance. Based on this, the authors speculate that it might have some role in the increased susceptibility to type-2 diabetes in the sub-continent. However, there are several issues with this speculation. The diabetes in the CLOCK mutants results from a clear hypoinsulinemia due to reduced insulin secretion and defective islets of Langerhans; in contrast, the IRS in India is characterized by high blood insulin levels accompanied by loss of responsiveness to insulin in several glucose-utilizing tissues. CLOCK is also under positive selection across many human populations and is under comparably strong selection in Western Eurasia as in India. So this is unlikely to be an India-specific phenomenon and might have begun earlier. Given its pleiotropic effects on various other aspects of circadian physiology, it is not at all clear whether variation in this gene was selected for in the context of pancreatic physiology or something else. Hence, we suspect it might not necessarily be a key player in Indian type-2 diabetes susceptibility.
GPHB5/OGH is another gene that emerges in the Metspalu et al survey for positive selection in the sub-continent; it is ignored by the authors but might have a more important role. It encodes a cystine knot hormone paralogous to the beta subunits of thyrotropin, follitropin, lutropin and chorionic gonadotropin, and signals via the thyrotropin receptor. It has been observed to be expressed in the pituitary, hypothalamus and probably the skin. Overexpression of GPHB5 in mice has been shown to result in resistance to diet-induced obesity and reductions in blood glucose, insulin, cholesterol and triglycerides. Thus, the GPHB5 variants among Indians are potential candidates for the connection between obesity and IRS, especially in response to extrinsic behavioral and immunological cues routed via the brain. In this context it is of interest to note that acute inflammation results in increased GPHB5 expression in the brain and pituitary, and GPHB5 appears to be responsible for altered thyroid hormone metabolism during illness (an example of it responding to extrinsic cues). Thus, GPHB5 could be another candidate for a genetic basis of the soldier-diplomat transition.
Appendix 1: Since some people asked me about AIT/OIT, at the risk of repeating what we have said before on these pages, let us go through the details again. In Figure 2 of their paper the authors present an ADMIXTURE analysis with 8 and 12 clusters. In the K8 analysis there is a component, k5, which is shared by Indians, Iranians, many Indo-European-speaking Europeans, Central Asians, West Asians and Caucasians. It is found at similar levels in Northern and Southern brAhmaNa-s and northern kShatriya-s. It is absent in Africans, East Asians, Papuans and, importantly, the Sardinians (representatives of the pre-Indo-European inhabitants of Europe, aka proto-Europeans). In India it is largely absent or low in the Austro-Asiatic Munda, in eastern tribes like the Khasis, Nihalis, Nagas and Garos, and also in southern tribes like the Paniyans, Pulliyars and Malayans. This corresponds in large part to the “ANI” component of Reich et al’s work; we may call it the Arya component. The other major component of interest among Indians is k6@K8, which is found in the majority of populations from the Indian sub-continent. This component is found at the highest levels in Pulliyars, Paniyans, Malayans, Gonds and also the Munda tribesmen, in whom there is additional East Asian admixture. This is in large part the “ASI” of Reich et al’s analysis; we may term it the niShAda component. This component declines both eastwards and westwards from the heart of India. Outside India it is present only at low levels in Iranians, Central Asians, Burmese and Cambodians. So in large part the Indians can be genetically reconstructed as differential admixtures of the k6@K8 and k5@K8 components, comparable to what was done by Reich et al.
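The description of Indians as differential mixtures of two components can be made concrete with a minimal sketch. It assumes each population's allele frequencies are a linear mixture, p_pop = α·p_A + (1 - α)·p_B, of two source frequency vectors standing in for k5@K8 and k6@K8, and estimates α by least squares. This is a simplification swapped in for illustration: ADMIXTURE itself fits a maximum-likelihood model to individual genotypes, and Reich et al relied on f-statistics rather than this regression. The function name and the synthetic frequencies below are hypothetical, not real data.

```python
import numpy as np


def two_way_admixture_proportion(p_mix, p_a, p_b):
    """Least-squares estimate of alpha in p_mix ≈ alpha * p_a + (1 - alpha) * p_b.

    p_mix, p_a, p_b: 1-D arrays of allele frequencies at the same loci.
    Rearranged as (p_mix - p_b) ≈ alpha * (p_a - p_b), i.e. a
    one-parameter regression through the origin.
    """
    p_mix, p_a, p_b = (np.asarray(x, dtype=float) for x in (p_mix, p_a, p_b))
    x = p_a - p_b
    y = p_mix - p_b
    alpha = float(np.dot(x, y) / np.dot(x, x))
    # Clip to the biologically meaningful range [0, 1].
    return min(max(alpha, 0.0), 1.0)


# Synthetic illustration (not real data): 1,000 loci, true alpha = 0.4.
rng = np.random.default_rng(0)
p_k5 = rng.uniform(0.05, 0.95, size=1000)   # hypothetical "k5-like" source
p_k6 = rng.uniform(0.05, 0.95, size=1000)   # hypothetical "k6-like" source
p_pop = 0.4 * p_k5 + 0.6 * p_k6
print(round(two_way_admixture_proportion(p_pop, p_k5, p_k6), 3))  # → 0.4
```

With real data the sources are never observed directly and the fit is noisy, which is one reason published analyses use model-based or f-statistic methods instead of this direct regression.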
Now, given that k5@K8 is shared with Europeans, West Asians and Caucasians, in principle both OIT and AIT are potential explanations. If OIT were the true explanation, then we should see some proportion of k6@K8 distributed to all the recipient populations of the OIT migration, because all sub-continental populations share k6@K8. But this is not the case: only the populations in the immediate vicinity of India have k6@K8, consistent with a local spread of Indian genetic material to the immediate periphery of the sub-continent. So if OIT happened, it was confined to barely the immediate surroundings of the sub-continent, which is inconsistent with the spread of Indo-Aryan and Iranian, let alone Indo-European. On the other hand, k5@K8 is consistent with the spread of people from the general vicinity of the Caucasus to Europe, West Asia and the Indian sub-continent; several of the recipient populations also share languages of the Indo-European family. This is basically AIT. It should be noted that k3@K8 and k4@K8 represent, respectively, early European and early West Asian populations that mixed with the k5@K8 component in Europe and West Asia. In India k5@K8 mainly mixed with the ancient Indian population represented by k6@K8. The authors use a simulation to claim that this admixture happened more than 12,500 years ago. In our opinion this result is flawed, because certain early West Asian variants seeping into India inflate the estimated age of admixture. The k5@K8 signal in large part comes from the “ANI” component of Reich et al. Using that data one might conservatively estimate the age of admixture at around 200 generations, which is consistent with an Indo-European influx into India bringing much of the ANI genetic signature.
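The logic behind dating admixture in generations can be sketched as follows, assuming the idealized single-pulse model in which admixture-induced linkage disequilibrium (LD) between markers d Morgans apart decays roughly as A·exp(-n·d) after n generations. This is the principle behind LD-based dating tools such as ALDER; it is not the simulation used by Metspalu et al, nor necessarily how the ~200-generation figure above was obtained. The function names, the synthetic decay curve and the 25- and 29-year generation times are assumptions for illustration.

```python
import numpy as np


def fit_admixture_generations(d_morgans, weighted_ld):
    """Estimate n in weighted_ld(d) ≈ A * exp(-n * d) via a log-linear fit.

    Assumes a single admixture pulse n generations ago, strictly positive
    LD values and no background-LD offset. Real tools also fit an affine
    constant and report jackknife standard errors; this sketch does not.
    """
    d = np.asarray(d_morgans, dtype=float)
    log_ld = np.log(np.asarray(weighted_ld, dtype=float))
    # log(ld) = log(A) - n * d, so the fitted slope is -n.
    slope, _intercept = np.polyfit(d, log_ld, 1)
    return float(-slope)


def generations_to_years(n_generations, years_per_generation):
    """Convert a generation count to years under an assumed generation time."""
    return n_generations * years_per_generation


# Synthetic, noise-free decay curve built with n = 200 (not real data).
d = np.linspace(0.005, 0.05, 40)      # 0.5 to 5 cM, expressed in Morgans
ld = 0.01 * np.exp(-200 * d)
n_hat = round(fit_admixture_generations(d, ld))
print(n_hat)                          # → 200
for g in (25, 29):                    # assumed years per generation
    print(g, generations_to_years(n_hat, g))
```

Under the assumed generation times of 25 and 29 years, 200 generations correspond to 5,000 and 5,800 years respectively; the generation time is itself an assumption that shifts the date proportionally.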
With respect to the linguistic landscape of India, we might make the following comments based on the genetic data. The Austro-Asiatic Munda languages were not endogenous to ancient India. Instead, Austro-Asiatic speakers intruded from Southeast Asia and penetrated deep into India, mixing with local proto-Indian tribal populations. From this admixture emerged the Munda tribes, who adopted the intrusive AA language while retaining several words from the proto-Indian language. Given the dominance among the Munda of the O2a Y-chromosome haplogroup, shared with Southeast Asian AA speakers, this is another example of the father tongue becoming the mother tongue whenever there is a male-driven intrusion (e.g. Spanish or Portuguese decimating local languages in the colonial period). So there is no question of the Indus civilization being Munda, or even “Para-Munda” for that matter. What we observe is that the common “inexplicable” vocabulary shared by IA, AA and Dravidian in the sub-continent is dominated by the kA, kI, ku class of words. Hence, these words belonged to the language X family proposed by Masica and not to Dr or AA. It is now becoming increasingly clear that this language X family was likely a major language family of the proto-Indians, i.e. the carriers of the original k6@K8 component. The footprints of the language X family indeed mirror the distribution of k6@K8 in cutting across the other linguistic families of the sub-continent, though the family itself is extinct today. This leaves us with the question of Dravidian: was there a Dravidian invasion, or did it emerge in situ? The uncertainty about the ultimate origin of Dravidian stems from how one treats the Brahui problem. If Brahui is considered a late secondary dispersal, then it is likely that Dravidian originated in peninsular India and spread northwards.
Even if Brahui suggests an initial western origin, we suspect that there was a major episode of Dravidian expansion from peninsular India. On the whole we currently lean towards the hypothesis that both the language X and Dravidian families represent the original linguistic diversification in the sub-continent, which later expanded greatly to occupy large territories. But it appears that language X had a more north-western “center of gravity” than Dravidian, whose center of gravity was more south-eastern.
Disclaimer: While we disagree with the authors’ conclusions on this matter, we have nothing personal against them. We describe their attempts as amateurish because they reduce the Indo-European hypothesis to a mere speculation of Max Mueller; on the contrary, it is a rather robust hypothesis that has so far withstood all objective tests. It is also abundantly clear from this hypothesis that Indo-European was intrusive into India rather than moving out of it. The only question is whether the ancient spread of the Indo-European languages to India was accompanied by a corresponding spread of genetic material. Unfortunately, the authors hardly present this issue clearly.
Shared and unique components of human population structure and genome-wide signals of positive selection in South Asia.
Metspalu M, Romero IG, Yunusbayev B, Chaubey G, Mallick CB, Hudjashov G, Nelis M, Mägi R, Metspalu E, Remm M, Pitchappan R, Singh L, Thangaraj K, Villems R, Kivisild T.
Am J Hum Genet. 2011 Dec 9;89(6):731-44.
Evolutionary origins of insulin resistance: a behavioral switch hypothesis.
Watve MG, Yajnik CS.
BMC Evol Biol. 2007 Apr 17;7:61.
Genetics of obesity.
O’Rahilly S, Farooqi IS.
Philos Trans R Soc Lond B Biol Sci. 2006 Jul 29;361(1471):1095-105. Review.
Human adaptive evolution at Myostatin (GDF8), a regulator of muscle growth.
Saunders MA, Good JM, Lawrence EC, Ferrell RE, Li WH, Nachman MW.
Am J Hum Genet. 2006 Dec;79(6):1089-97. Epub 2006 Oct 10.
Exercise-training attenuates the hyper-muscular phenotype and restores skeletal muscle function in the myostatin null mouse.
Matsakas A, Macharia R, Otto A, Elashry M, Mouisel E, Romanello V, Sartori R, Amthor H, Sandri M, Narkar V, Patel K.
Exp Physiol. 2011 Nov 4. [Epub ahead of print]
Suppression of body fat accumulation in myostatin-deficient mice.
McPherron AC, Lee SJ.
J Clin Invest. 2002 Mar;109(5):595-601.
Myostatin mutation associated with gross muscle hypertrophy in a child.
Schuelke M, Wagner KR, Stolz LE, Hübner C, Riebel T, Kömen W, Braun T, Tobin JF, Lee SJ.
N Engl J Med. 2004 Jun 24;350(26):2682-8.
A genome-wide association study of amygdala activation in youths with and without bipolar disorder.
Liu X, Akula N, Skup M, Brotman MA, Leibenluft E, McMahon FJ.
J Am Acad Child Adolesc Psychiatry. 2010 Jan;49(1):33-41.
Evaluation of DOK5 as a susceptibility gene for type 2 diabetes and obesity in North Indian population.
Tabassum R, Mahajan A, Chauhan G, Dwivedi OP, Ghosh S, Tandon N, Bharadwaj D.
BMC Med Genet. 2010 Feb 27;11:35.
Acute inflammation increases pituitary and hypothalamic glycoprotein hormone subunit B5 mRNA expression in association with decreased thyrotrophin receptor mRNA expression in mice.
van Zeijl CJ, Surovtseva OV, Wiersinga WM, Fliers E, Boelen A.
J Neuroendocrinol. 2011 Apr;23(4):310-9. doi: 10.1111/j.1365-2826.2011.02116.x.
Resistance to diet-induced obesity in mice globally overexpressing OGH/GPB5.
Macdonald LE, Wortley KE, Gowen LC, Anderson KD, Murray JD, Poueymirou WT, Simmons MV, Barber D, Valenzuela DM, Economides AN, Wiegand SJ, Yancopoulos GD, Sleeman MW, Murphy AJ.
Proc Natl Acad Sci U S A. 2005 Feb 15;102(7):2496-501. Epub 2005 Feb 7.
Disruption of the clock components CLOCK and BMAL1 leads to hypoinsulinaemia and diabetes.
Marcheva B, Ramsey KM, Buhr ED, Kobayashi Y, Su H, Ko CH, Ivanova G, Omura C, Mo S, Vitaterna MH, Lopez JP, Philipson LH, Bradfield CA, Crosby SD, JeBailey L, Wang X, Takahashi JS, Bass J.
Nature. 2010 Jul 29;466(7306):627-31.