Once again, the flood of rare variants. Deep sequencing studies have revealed an unexpected plethora of rare variants, i.e. genetic variants that are found in only a few individuals or even a single individual. While the genetic architecture of more common genetic variants, so-called Single Nucleotide Polymorphisms (SNPs), is well known through the HapMap project, the role of rare variants identified in recent sequencing studies is difficult to interpret. For an individual variant, it is difficult to establish whether it is disease-causing or disease-related based on its frequency in cases alone. Establishing association at the same level of statistical significance as required for SNPs is difficult, given that much larger samples are needed. Furthermore, protein prediction algorithms have their limitations and might not be able to discriminate an incidental from a causal variant, given that every individual might be homozygous or compound heterozygous for gene-disrupting variants in at least three genes. We are drowning in a flood of rare variants and cannot yet distinguish pathological from benign variants very well.
Genetic variants tell a story about their evolutionary past. Evolutionary considerations can help to assess the role of rare variants on a global scale, i.e. for all variants in a given exome taken together. Genetic variants in the protein-coding region can be silent, i.e. they leave the protein sequence unchanged, for example when they affect the last base of a triplet and result in a “silent” substitution. These variants are called synonymous. In contrast, non-synonymous variants result in an amino acid substitution. In order to assess the overall role of rare variants, the relative frequency of non-synonymous and synonymous variants can be compared. If the frequency of non-synonymous variants is smaller than the frequency of synonymous variants, this might be an indication of purifying selection at work, i.e. many of the variants acquired in these genes over time have been lost since they caused disease and thereby reduced the probability of being transmitted to the next generation. This comparison between synonymous and non-synonymous variants can be used to compare genes that affect different biological systems.
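To make the distinction concrete, here is a minimal Python sketch that classifies a single-codon substitution as synonymous or non-synonymous using the standard genetic code. The function name `classify_substitution` and the example codons are illustrative assumptions and are not taken from any of the studies discussed here.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third base varies fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Hypothetical helper: compare the amino acids encoded by two codons.

    Any change of the encoded residue, including a change to or from a
    stop codon, is reported as non-synonymous in this simplified sketch.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# GCT and GCC both encode alanine: a third-base change that is silent.
print(classify_substitution("GCT", "GCC"))
# GAT encodes aspartate, GAA encodes glutamate: the protein changes.
print(classify_substitution("GAT", "GAA"))
```

Real annotation pipelines also need the transcript, strand and reading frame to decide which codon a variant falls into; the sketch skips all of that and starts from the codons themselves.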
Low frequency of protein-changing variants in CNS genes. Freudenberg and colleagues compare the relative frequency of non-synonymous variants in genes involved in the nervous system, genes involved in the immune system and randomly sampled genes. The authors use a measure called relative density of non-synonymous variants (rdnsv), which assesses the frequency of non-synonymous variants compared to their neutral, synonymous counterparts. Interestingly, the authors find a low rdnsv for genes implicated in the nervous system compared to other gene families or gene clusters implicated in other biological functions. This difference appears to hold true over a broad range of variant frequencies found in published exome datasets. The authors suggest that brain-related processes are more susceptible to perturbation through damaging variants than other systems.
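The idea behind such a comparison can be sketched in a few lines of Python. Note the assumptions: the exact definition of rdnsv is not given here, so the code uses a plain ratio of non-synonymous to synonymous variant counts as a stand-in, and the counts in `toy_counts` are invented placeholders for illustration only, not data from the study.

```python
def nonsyn_to_syn_ratio(n_nonsynonymous, n_synonymous):
    """Simplified stand-in for a relative density measure (an assumption,
    not the published rdnsv formula): non-synonymous variants per
    synonymous variant in a gene set."""
    if n_synonymous == 0:
        raise ValueError("need at least one synonymous variant as reference")
    return n_nonsynonymous / n_synonymous


def compare_gene_sets(counts):
    """Map each gene set name to its ratio.

    `counts` maps a gene set name to a tuple
    (non-synonymous count, synonymous count).
    """
    return {
        name: nonsyn_to_syn_ratio(nonsyn, syn)
        for name, (nonsyn, syn) in counts.items()
    }


# Invented placeholder counts, chosen only to show the mechanics.
toy_counts = {
    "nervous_system": (40, 100),
    "immune_system": (80, 100),
    "random_genes": (60, 100),
}

ratios = compare_gene_sets(toy_counts)
most_constrained = min(ratios, key=ratios.get)
print(ratios)
print(most_constrained)
```

A lower ratio for one gene set would be read as fewer protein-changing variants surviving relative to the neutral background, which is the pattern the authors report for nervous system genes.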
Epilepsy and CNS diseases are more likely to be monogenic. This relative lack of rare variants in CNS genes, in turn, would suggest that CNS diseases may be caused by monogenic variants more frequently than diseases affecting other biological systems. Compared to the immune system or other tissues, the brain might simply be more susceptible to mutations in single genes that affect the overall biological system. The authors find support for their thesis in the fact that OMIM disorders affecting the brain are almost 3 times more frequent than diseases affecting the immune system. In contrast, SNPs in the NHGRI GWAS catalogue for immune disease are more than 3 times more frequent than SNPs for neurological disorders, suggesting that common variants explain less of the heritability for neurological disorders compared to immune disorders, leaving “more room” for rare and monogenic variants. This argument is reminiscent of the almost historical debate between the common-disease-common-variant and common-disease-multiple-rare-variant models for epilepsy. In fact, as of 2012, no single SNP has been shown to be associated with any form of epilepsy on a genome-wide level, suggesting that either larger sample sizes or different types of studies looking at rare rather than common variants are required in epilepsy research.
Implications for EuroEPINOMICS. The EuroEPINOMICS consortium relies mainly on deep sequencing studies using exome sequencing and whole genome sequencing and has virtually given up on pursuing association studies for common variants. The theoretical considerations by Freudenberg and coworkers provide further fuel for this overall concept, suggesting that CNS disease might be more monogenic in nature than diseases in other biological systems. However, the precise proportion of epilepsies that are monogenic still needs to be established. Genetic epidemiological studies clearly suggest a complex genetic inheritance for most forms of epilepsy. This argument, however, does not rule out a significant fraction with monogenic inheritance, particularly for more severe, rare epilepsies. As an educated guess, I would suggest that 25-30% of severe epileptic encephalopathies are monogenic and a similar frequency might hold for less severe epilepsies such as the Genetic Generalized Epilepsies (formerly Idiopathic Generalized Epilepsies, IGE).