Let’s face it: Current exome sequencing technologies will illuminate only a small fraction of the genetic load in seizure disorders. This post might not motivate you to start another large-scale gene hunt using exome sequencing. Also, it won’t cheer you up if you have promised your funding agency that new techniques will discover “a large fraction of the genes implicated in seizure disorders” or “explain a large proportion of the missing heritability”, phrases frequently used in modern grant proposals.
With the current study designs, present exome sequencing technologies will only help us understand a small fraction of the genetic load in seizure disorders. The writing is on the wall with the most recent exome sequencing studies in autism: exome sequencing, this long-sought and widely praised technology, does not explain more of the genetic contribution than copy-number variations. Naively, I would have expected a 10- to 100-fold increase compared to CNV screening, but exome sequencing seems to fall short of its initial promises. Moreover, it becomes increasingly difficult to distinguish normal from pathological variation, given that the de novo rate in patients with autism is only twice as high as in unaffected siblings. In other words, you would expect at least 1 in 3 credible de novo mutations to occur by chance. If you sum up the known and expected results from massively parallel sequencing studies, more than 60% of cases with autism are without a genetic variant that explains the phenotype (Figure 1). Taking into account the winner’s curse usually seen in initial studies, we suddenly find ourselves in a position that we had hoped we could abandon: the reliance on p-values and the constant call for larger sample sizes. Since many genes look like promising epilepsy genes, we can either look for candidate genes we already know (which doesn’t provide us with new concepts) or establish novel statistics, again. The prime time of genome-wide association studies (GWAS) has long passed and we had hoped that we would find clear and unambiguous calls from exome sequencing. But with things as they are, we can predict a revival and further development of the statistical concepts applied to massively parallel sequencing studies.
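The chance argument above can be made explicit with a back-of-the-envelope calculation. The sketch below is only an illustration under a simplifying assumption that is not stated in the studies themselves: that the de novo rate in unaffected siblings equals the background rate in patients. The function name `background_fraction` is made up for this example. Under that assumption, a two-fold rate ratio would put about half of the de novo events in patients down to chance, which is in line with the "at least 1 in 3" figure; a three-fold ratio would give exactly one third.

```python
def background_fraction(rate_ratio: float) -> float:
    """Share of de novo events in patients expected from background alone.

    Assumption (illustrative, not taken from the studies): the sibling
    de novo rate equals the background rate in patients, so the expected
    background share is sibling_rate / patient_rate = 1 / rate_ratio.
    """
    if rate_ratio < 1:
        raise ValueError("rate_ratio must be >= 1 (patients vs. siblings)")
    return 1.0 / rate_ratio


# Compare a two-fold and a three-fold excess of de novo events in patients.
for ratio in (2.0, 3.0):
    share = background_fraction(ratio)
    print(f"{ratio:.0f}-fold excess: {share:.0%} of patient de novo events expected by chance")
```

The point of the calculation is that a modest excess in patients leaves a large share of individual findings uninterpretable without further evidence, such as recurrence in the same gene.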
Why does the gene sequence at the level of individual base pairs not provide us with definite information on the genetic causes of disease? There are at least three answers that explain the apparent paradox.
Go for the genome. We might be looking in the wrong places. Much of the “genetic load” of neurodevelopmental disorders may not be hidden in the coding part of the human genome (the exome), but in non-coding regions such as promoters, regulatory regions and gene deserts. For example, a risk factor for prostate cancer replicated in several studies lies within a gene-empty region of the human genome. In addition, the exome is far from complete and some genes for neuropsychiatric disorders may not be covered by present sequencing platforms. Hence, improved exome coverage and genome sequencing might increase the fraction of cases explained by these studies.
Causative variation is not rare but extremely rare. Causative variation might be very rare and hard to interpret, and larger sample sizes might help solve this problem. Recent studies have identified very rare microdeletions and microduplications using cohorts exceeding 10,000 samples. Even though the overall burden of very rare variants is difficult to estimate, these variants might collectively contribute significantly.
Neglecting complex genetics comes back full circle. The term complex genetic disorder is frequently used, but very few concepts have been developed to actually model how risk factors might interact to cause human disease. To date, only a few studies have tried to apply novel concepts to actual datasets, and the complexity used by these studies has not advanced much beyond a “two-hit model”. Strategies dealing with dozens or hundreds of risk factors still lack a theoretical foundation.
Furthermore, the model studies in autism are misleading to a certain degree, as they assume that a large fraction of cases is caused by de novo variants, a concept that has some theoretical foundation. This might not necessarily apply to the epilepsies, where many milder phenotypes such as IGE/GGE are observed in multiplex families that still defy a clear monogenic inheritance model. Therefore, estimates suggesting up to 400 autism genes might not apply to seizure disorders. Copy number variations have already given us a first taste of a novel kind of genetic variant, so-called rare variants, which occupy the grey zone between strong monogenic mutations and common variants. A recent study estimating the penetrance of these variants in the population arrives at surprisingly low estimates. Established risk factors such as the 15q13.3 microdeletion or the 16p13.11 microdeletion have a penetrance of not more than 10%, i.e. at most 1 in 10 carriers of these variants in the population will be affected. If variants of this magnitude are the main contributors to the genetic architecture, they will be notoriously hard to identify, also because exomic variation is likely to be much rarer than variation at genomic hotspots due to the peculiar architecture of the human genome.
Don’t give up, the truth is out there. Despite all the criticism, there is no reason for pessimism. The field of neurogenetics and epilepsy is constantly advancing, and stumbling upon novel, unexpected conceptual problems through novel technologies is a recurrent experience. The exome studies in EuroEPINOMICS will be able to capture the low-hanging fruit, i.e. strong monogenic variants, and I still stand by our initial estimate that the entire program is capable of finding more novel causative genes than have been discovered during the last decade. However, we should also keep our eyes open for novel concepts and frameworks to approach the complex interaction of genetic risk factors, possibly through second-line meta-analysis and open data sharing with other consortia. These studies, however, would require careful data curation for both genotypes and phenotypes. Accordingly, we are not only producing data for the fast analysis of the first novel epilepsy genes, but also creating an important data resource for subsequent studies that might eventually prove more important than the initial gene findings. Novel concepts, paradigms and sequencing technologies will improve epilepsy genetics this decade, but not as simply as we thought a year ago.