The coronavirus that made its way to humans aided by the Cīna-s at Wuhan has now been with us for nearly a year. Right from the early days of this outbreak, one thing has been notable about this virus: some people got very ill from it, while others breezed through a relatively mild or supposedly “asymptomatic” infection (though we still do not know the long-term consequences of the mild infection). This made the disease far more deadly than its cousin SARS, as potentially infectious individuals with the mild form of the disease could wander about spreading it. As a result, at the time of writing, at least 1,085,000 people have died from it the world over, and anywhere between 40 and 300 million could have been infected by it. Some factors affecting the differential outcome were clear even when the virus was still only with the Cīna-s: it affected older people and men more severely. In the early days of the pandemic, several other factors were also proposed to affect the outcome of the disease, like temperature extremes, humidity, and prior vaccination with BCG. However, these, especially the environmental ones, have not been supported by the data coming from the explosive pandemic that followed. It was also clear that there were going to be genetic factors that influence the outcome. These are becoming clearer only now and are the topic of this note. This note is based on data from several recent studies that have tried to identify genetic risk factors in various populations. Here we briefly look at the genes that have been identified and offer some commentary on them and on what can be inferred from them.
The first set of studies by Bastard (yes, that is the author’s name; not an easy one to bear in the English-speaking world) and Zhang et al took a directed approach to look at 13 genes in the Toll-like receptor-3 (TLR3)-type-I interferon system. Mutations in these genes have previously been implicated in severe influenza with involvement of the lower respiratory tract and in other viral diseases. They found that potential loss-of-function variants in these genes were enriched in patients with a severe outcome of the Wuhan disease. In a related study, they found that an autoimmune condition with antibodies against the type-I interferons also correlated with a similar outcome as the potential loss-of-function mutations. This supported the idea that defects in the type-I interferon (IFN-I) system are a predictor of disease outcome even in the case of the current coronavirus. This is rather interesting, as the bats show distinct alterations to their IFN-I system relative to other mammals. First, black flying foxes have been shown to have a higher and potentially constitutive expression of IFN-I genes. Second, the Egyptian fruit bats show an expansion of the IFN-I genes, especially the subtype IFNW (interferon-ω). These observations, together with the fact that bats have a high level of tolerance to SARS-like CoVs (and other viruses), support the idea that the type-I IFN system is important in surviving not just SARS-CoV-2 but also other viruses.
As a simple caricature, the following pathway describes the role of products of the 13 genes in the IFN-I system in cells infected by a virus (say the respiratory epithelial cells) or specialized blood cells, which are part of the immune system, that sense the virus (plasmacytoid dendritic cells):
1. Recognition of the invading virus by the leucine-rich repeats of the TLR3 protein triggers a signaling response that additionally involves the TRIF, UNC93B1, TRAF3, TBK1 and NEMO proteins, which ultimately results in the activation of the transcription factor IRF3 in the nucleus.
2. Consequently, IRF3 induces the transcription of IFN-Is, which is further amplified by a related transcription factor IRF7 which is induced by IRF3.
3. The secretion of IFN-Is is followed by their binding to receptors on other cells, like epithelial cells in the respiratory tract. The receptors are dimers of the two paralogous proteins IFNAR1 and IFNAR2.
4. The receptors activate the associated transcription factors STAT1 and STAT2, which then associate with another transcription factor, IRF9 (a paralog of IRF3 and IRF7), to activate the interferon-stimulated genes that mediate the immune response to the virus.
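The four steps above can be written down as a toy data structure. This is only a minimal sketch of the caricature: the stage labels and the function name `first_blocked_stage` are made up for illustration, and treating every listed component as strictly required at its stage is a simplifying assumption, not a biological claim (IRF7, for instance, is described above as an amplifier).

```python
# Toy model of the four-step IFN-I caricature.
# Stage labels are illustrative; gene names are the 13 from the text.
PATHWAY = [
    ("sensing and signaling",
     ["TLR3", "UNC93B1", "TRIF", "TRAF3", "TBK1", "NEMO", "IRF3"]),
    ("IFN-I transcription and amplification", ["IRF7"]),
    ("IFN-I receptor binding", ["IFNAR1", "IFNAR2"]),
    ("interferon-stimulated gene activation", ["STAT1", "STAT2", "IRF9"]),
]

# Flat list of every gene in the caricature, in pathway order.
ALL_GENES = [gene for _, genes in PATHWAY for gene in genes]


def first_blocked_stage(lof_genes):
    """Return the earliest stage containing a loss-of-function gene, or None.

    Assumption: any single loss-of-function variant blocks its stage.
    """
    lof = set(lof_genes)
    for stage, genes in PATHWAY:
        if lof & set(genes):
            return stage
    return None


if __name__ == "__main__":
    print(len(ALL_GENES))
    print(first_blocked_stage(["STAT2", "TBK1"]))
```

Ordering the stages in a list rather than a dictionary keeps the upstream-to-downstream direction explicit, so a variant in TBK1 is reported ahead of one in STAT2 when both are present.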
This is the well-known IFN-I immune response. Of these proteins, TLR3 and TRIF/TICAM1 are proteins with TIR domains, which we had earlier shown to have very ancient roots in the immune response of bacteria against the viruses that infect them. UNC93B1 is a membrane protein involved in the trafficking of the TLR3 protein from the endoplasmic reticulum to the endolysosome, where it can intercept the endocytosed virus. TLR3 additionally has the receptor portion in the form of leucine-rich repeats that recognize the invasive virus. TRIF has α-helical tetratricopeptide repeats that keep its TIR domain inactive until TLR3 is activated. At that point, it associates with TLR3’s TIR domain. TRIF also has an RHIM motif, a short sequence that allows the protein to form homotypic oligomers, which are important for the downstream signaling. Thus, it serves as a platform for initiating a signal within the cell in response to the sensing of the virus by TLR3. The signal is set off first by TRAF3, which is an E3 ubiquitin-ligase that is recruited to the platform formed by TRIF. It consequently conjugates Lysine-63 ubiquitins to its targets. This signal is transmitted further via the kinase TBK1, which associates with NEMO to form a signaling-kinase complex similar to the kinase complex that activates the inflammatory transcription factor NF-κB by phosphorylating its inhibitor IκB. TBK1, in addition to its kinase domain, has a ubiquitin-like domain that we had discovered a while back. The presence of a ubiquitin-like domain in TBK1 allows it to associate with the ubiquitins conjugated by TRAF3. As a consequence of this interaction via its ubiquitin-like domain, it becomes functionally active to phosphorylate the DNA-binding transcription factor IRF3. This then dimerizes to activate the transcription of the interferon genes. This response to the virus can be triggered in different ways, but this is the typical mechanism for RNA viruses like influenza or DNA viruses like Herpes simplex virus (HSV). Thus, mutations in this system have previously been shown to impair the response to influenza, resulting in severe pneumonia, or to HSV, resulting in encephalitis.
The second part of this response is the signal transduced by the IFN-I via its receptor. This is via the famous JAK-STAT pathway, which involves the JAK kinases that phosphorylate the STATs. These and their partner IRF9, all DNA-binding transcription factors, induce the IFN-I stimulated genes, many of which are the “sword-arm” of the antiviral defense. Thus, mutations in the two IFNAR genes, IRF9 and the STATs also result in negative outcomes from viral infections and adverse reactions to live measles and Yellow fever vaccines. However, interestingly, a mutation in the IFNAR1 gene that results in an impaired receptor, one that binds the type-I IFN IFNB only weakly, confers greater resistance to tuberculosis. This is rather striking as, unlike with the viral diseases, it selects in the opposite direction for the strength of IFN-I signaling. The complexity of this situation even with SARS-CoV-2 is suggested by reports that the localized hyper-expression of type-I and III IFNs in the lung results in a more severe disease and poor lung repair. However, in contrast, reduced IFN-I production by peripheral blood immunocytes is associated with a severe form of the disease. Thus, overall, IFN-I is important for the defense against SARS-CoV-2, but the location of over-expression seems to matter.
A notable point is that while both the life-threatening and benign forms of the disease are fairly uniformly distributed across populations with diverse ancestries, the IFN-I related loss-of-function variants reported by the authors are primarily found in Europeans, with some presence in diverse Asian populations (Figure 1). While the numbers are small, it is still significant that they did not get any of these variants in Africans. This is striking given that another study found that in the USA infection and death rates are 2 to 3 times higher in people of African ancestry than their proportion of the population. This suggests that in Africa there has possibly been selection against these variants due to pressure from other viruses which are prevalent there. Indeed, the related coronavirus MERS might have had its ultimate origins in Africa, even suggesting direct events of selection by coronaviruses in the past. However, notably, the researchers found that African ancestry people in the US have significantly higher expression in the nasal epithelium of the transmembrane serine protease 2 (TMPRSS2), which, along with the other protease ACE2 (the receptor proper), is used by SARS-CoV-2 to invade target cells.
Also related to the above complex of 13 genes was a small study by van der Made et al, based on exome sequencing, that identified rare loss-of-function mutations in TLR7 in 4 young men with severe disease. This resulted in defective type-I and type-II interferon production. While a small study, it is notable that it recovered these mutations in TLR7. This gene is in a cluster with its paralog TLR8 on the X-chromosome; hence, males have only one copy. Importantly, both of them, like TLR3, are sensors that detect viruses which enter cells via endocytosis. TLR7 specifically senses single-stranded RNA fragments that are enriched in guanine and uracil in the endosome of plasmacytoid dendritic cells and B cells, raising the possibility that impairment of these virus-specific TLRs might be part of the increased susceptibility of males to SARS-CoV-2.
Figure 1. The mapping of different forms of the disease on to the 1000 genomes populations modified from Zhang and Bastard et al. LOF are the loss-of-function variants they identified.
The next study, by Zeberg and Pääbo, discovered a genomic segment of ~50 kb, inherited from Neanderthals, that confers an elevated risk of severe disease. This region on chromosome 3 kept coming up repeatedly in multiple investigations for genetic determinants of disease severity. This core region of 49.4 Kb and the larger surrounding region of ~333.8 Kb show strong linkage disequilibrium and appear to have introgressed from a Neanderthal ~60-40 Kya. This region is rather interesting because it encodes 5 chemokine receptor genes, namely XCR1, CXCR6, CCR9, CCR1 and CCR3. These are all receptors for the signaling proteins known as chemokines, which transmit various immune signals, such as in the recruitment of effector immunocytes to the site of inflammation (e.g. various lineages of cytotoxic cells and antibody-producing B-cells) or in directing T-cells to guard different parts of the lungs. Gene-knockouts of CCR1 suggest that it plays a role in protecting against inflammation, and that its loss increases susceptibility to fatal infection of the central nervous system by the coronavirus MHV1 in mice. Reducing signaling via this receptor has also been shown to increase susceptibility to the herpes simplex virus type 2. Some chemokine receptors are used by viruses and other pathogens to enter vertebrate cells. For example, CCR3 and CXCR6 from this locus code for co-receptors for the AIDS virus HIV-1 and/or SIV. The human herpesvirus 8 encodes its own chemokine, vMIP-II, which targets the protein XCR1 encoded by this locus and blocks signaling via it. Thus, the chemokine receptors are a central part of the immune response of jawed vertebrates and are under strong selection from the host-pathogen arms race.
What is most striking about this region is that it is elevated in frequency in the Indian subcontinent (~50%; it is found in ~16% of Europeans) while absent or rare in East Asia. Indeed, after the mating with Neanderthals, the introgressed regions from them have been routinely purged from the genome of Homo sapiens, suggesting a degree of incompatibility with the sapiens alleles. This is consistent with the association of Neanderthal alleles with certain immune dysfunctions. However, this region has followed the converse pattern. If it has been retained after coming from a Neanderthal ancestor and elevated in frequency, it must be due to selection for it in the subcontinent, likely due to some relatively recent or extant pathogen. The region has been previously noted as being under selection in East Bengal. This raises the possibility that it could have conferred an advantage against diseases such as cholera. However, it is rather notable that, despite gene flow between the two regions and their geographic proximity, it is so rare in East Asia. We and others have long held that several extant CoV diseases (today relatively mild) have originated in East Asia, likely China, potentially as a side effect of their culinary habits. This would imply that there was strong selection from these CoVs against this Neanderthal-derived variant in East Asia when those CoVs were still severe, even as it was selected for in India by other pathogens. Thus, it is a classic evolutionary phenomenon of bidirectional selection in action. Such selection events often leave their mark in immune molecules, driving them in different directions. The Duffy chemokine receptor, by which the Plasmodium vivax and P. knowlesi malarial parasites enter cells, is likely to be another such case. Loss or reduced expression of the Duffy receptor favors resistance to vivax malaria. But the protein is retained widely in humans, suggesting some immune function.
Figure 2. Distribution of the Neanderthal variant across populations, from Zeberg and Pääbo.
Finally, another set of genome-wide association studies by Ellinghaus et al and Roberts et al identified multiple single nucleotide polymorphisms (SNPs) associated with a severe form of the disease. One of these in chromosome 3 corresponds to the same region as identified by the above study as coming from the Neanderthals. Another SNP was identified on chromosome 9, in the vicinity of the ABO gene that determines the ABO blood type. The ABO blood group is determined by the oligosaccharide synthesized by 4 glycosyltransferases: the two closely linked paralogs FUT1 and FUT2 make the base oligosaccharide by adding a fucose. This is modified further by the products of the ABO gene: the A-variant glycosyltransferase, which adds an α-1,3-N-acetylgalactosamine, and the B-variant glycosyltransferase, which adds an α-1,3-galactose. This oligosaccharide is then conjugated to lipid head-groups and proteins (as on the RBC surface) to give rise to the A/B/AB antigen. If this gene is dysfunctional, it results in O, where neither sugar is added. These sugars are believed to play a role in cell-cell adhesion. The polymorphism in ABO across humans suggests that it has been under some kind of immune selection. Indeed, there have been studies claiming an association of this gene with susceptibility to various bacterial and viral infections (noroviruses and rotaviruses). Interestingly, a knockdown of the ABO gene has been reported to inhibit HIV-1 replication in HeLa P4/R5 cells. This could be because of multiple reasons: 1. Pathogens specifically bind cells with glycoproteins decorated by particular versions of the sugar. 2. Viruses themselves possess various glycoproteins against which antibodies develop. These could cross-react with the host glycoproteins, exerting selection via autoimmunity. Alternatively, the absence of a certain modification on the host protein could help the host to develop better neutralizing antibodies against certain viral glycoproteins. It has been suggested that the influenza viral glycoproteins and the ABO locus might be in some such evolutionary interaction. 3. Immunocytes localize to specific parts of the body by recognizing the sugars on surface proteins and lipids. These might play a role in response to pathogens. Indeed, other than the ABO (H included) blood group, other blood group systems are also based on polymorphisms of glycosyltransferases (P1PK, Lewis, I, Globoside, FORS, Sid) or extracellular ADP-ribosyltransferases (Dombrock), suggesting that such evolutionary entanglements between pathogens and cell-surface modifications might be more widespread. However, the role of ABO in susceptibility to SARS-CoV-2, even if plausible, remains unclear.
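The ABO scheme described above (the A variant adds N-acetylgalactosamine, the B variant adds galactose, a dysfunctional gene adds neither) amounts to a small lookup. The following is a minimal sketch with hypothetical names (`SUGAR_ADDED`, `abo_phenotype`); it assumes a diploid genotype of two alleles and that the base oligosaccharide made by FUT1/FUT2 is present, and it says nothing about disease susceptibility.

```python
# Sugar added to the base oligosaccharide by each ABO allele.
# "O" stands for a dysfunctional allele that adds neither sugar.
SUGAR_ADDED = {
    "A": "N-acetylgalactosamine",
    "B": "galactose",
    "O": None,
}


def abo_phenotype(allele1, allele2):
    """Return the blood type (A, B, AB or O) for a pair of ABO alleles.

    Assumes the FUT1/FUT2-made base oligosaccharide is available.
    """
    # Collect the alleles that actually add a sugar, without duplicates.
    functional = {a for a in (allele1, allele2) if SUGAR_ADDED[a] is not None}
    # Sorting gives "AB" regardless of the order the alleles were passed in.
    return "".join(sorted(functional)) or "O"


if __name__ == "__main__":
    for genotype in [("A", "O"), ("B", "A"), ("O", "O")]:
        print(genotype, abo_phenotype(*genotype))
```

An unknown allele label raises a `KeyError`, which seemed preferable here to silently reporting it as O.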
The Roberts et al study also identified a SNP on chromosome 22 possibly associated with the λ-immunoglobulin locus that codes for the antibody light chains. This is again consistent with a defect in antibody production by B-cells. Another SNP identified by them lies on chromosome 1 in the vicinity of the IVNS1ABP gene. The SWT1 gene also lies some distance away from the former gene. Interestingly, IVNS1ABP has been shown to interact with the influenza virus NS1 protein. This NS1-IVNS1ABP complex targets the mRNA of another influenza gene, M1, to nuclear speckles enriched in splicing factors for alternative splicing. The result is an alternatively spliced mRNA, M2, that codes for a proton channel needed for acidification and release of viral ribonucleoproteins in the endosome during invasion. Interestingly, IVNS1ABP belongs to a large class of POZ domain proteins with central HEAT and C-terminal Kelch repeats that also function as cullin-E3 ubiquitin ligases, several of which have antiviral roles. Hence, it showing up in the context of SARS-CoV-2 is rather interesting as it raises multiple possibilities: 1. Is it involved in the trafficking of viral mRNA as in influenza? 2. Is it an intracellular antiviral factor that recruits an E3 ligase complex for tagging viral proteins for destruction?
It is also possible that this SNP affects the nearby SWT1 gene. Some time back we had shown that this protein contains 2 endoRNase domains. It prevents the cytoplasmic leakage of defective unspliced mRNAs by cleaving such RNAs at the nuclear pore. It is hence possible that this protein also interacts with viral RNA in some way. In either case, it is notable that this SNP is associated with disease severity only in males and not females. The cause for this again remains a mystery. Finally, this screen recovered a SNP on chromosome 1 close to SRRM1, whose product is also involved in pre-mRNA splicing. This again raises the possibility of interaction with viral RNA.
In conclusion, the risk factors have pointed in many different directions, some relatively well understood from susceptibility to other viruses and yet others which remain murky. Evidently, there will be more rare ones which remain to be uncovered. However, even with the current examples, there are hints of bidirectional selection at multiple loci suggesting that sweeps of dominant pathogens have optimized our immune systems in different directions. The victory against one could leave one susceptible to another.
Further reading:
https://jamanetwork.com/journals/jama/fullarticle/2770682
https://www.medrxiv.org/content/10.1101/2020.10.06.20205864v1.full.pdf
https://www.nature.com/articles/s41586-020-2818-3
https://science.sciencemag.org/content/early/2020/09/29/science.abd4570
https://science.sciencemag.org/content/early/2020/09/23/science.abd4585