DDD. On January 25, the most recent publication of the Deciphering Developmental Disorders (DDD) study appeared online in Nature. This unprecedented study combined the data of 4,293 patient-parent trios with existing data from 3,287 published trios to identify de novo mutations in neurodevelopmental disorders. A study of this size has many aspects that are difficult to cover fully within the limited space of a journal article. Browsing through the data is interesting, and the dataset will be the foundation for many studies in the near future. In this first comprehensive blog post of 2017, I try to answer the question of what this study means for the field of epilepsy genetics. For example, it provides us with more than 20 epilepsy genes that we did not know about so far.
From bioRxiv to Nature. I had actually been mining the DDD dataset for a while when I received the alert about the study in Nature. As with many studies in the field, the authors had put the manuscript on the bioRxiv preprint server under a CC license that allowed people to access, comment on, and use the data prior to the actual publication. This strategy of making manuscripts available pre-publication is increasingly used in the biomedical field, especially for mega-multiauthor studies. Many larger institutions have already adopted this as an institutional policy, and the practice is actively supported by many scientific journals. To cut a long story short, the DDD study has been out there for a while, which allowed us to look into the data already. Now that the study is published, we can officially refer to it within a blog post.
Disclaimer. Before continuing with my impression of the DDD study, I would like to add a comment on what we consider epilepsy genes. When I asked Katie to proofread this blog post, it generated quite some controversy. In her role as an exome reporting genetic counselor, she has seen pretty much all of these genes before in patients with neurodevelopmental disorders. Therefore, you may question whether you can rightfully refer to them as epilepsy genes and whether they represent “new epilepsy genes”. For example, SLC35A2 was known as a gene associated with epileptic encephalopathy prior to Epi4K, which I was not aware of. For this blog post, I used my own impression of which genes are new rather than an objective measure of which genes have been reported previously. Also, given that some of the genes have a very strong link to known syndromic phenotypes, you may wonder whether the initial index patient in Epi4K may have had an epileptic encephalopathy in the context of a syndromic disorder such as Bohring-Opitz syndrome due to mutations in ASXL1.
DDD in a nutshell. The authors sequenced 4,293 patient-parent trios with neurodevelopmental disorders and assessed the trios for de novo mutations. In total, 93 genes were significant for enrichment of de novo mutations on a genome-wide level, providing an independent, robust confirmation that these genes are implicated in neurodevelopmental disorders. These genes include many of the known genes for epilepsies and add several new candidates. The authors were able to estimate that 42% of the patients in their cohort carried pathogenic de novo mutations and that half of the de novo mutations have haploinsufficiency as a disease mechanism, while the other half are disease-causing through altered function such as gain-of-function or dominant-negative effects. The bottom line of the DDD study is that approximately half of all severe neurodevelopmental disorders are caused by de novo mutations.
The phenotypes. The authors had access to Human Phenotype Ontology (HPO) terms for all patients as well as facial photographs of many patients that underwent an automated analysis for dysmorphic features, an increasingly used technology to objectively quantify facial features. HPO phenotypes work best for dysmorphology syndromes, but also include epilepsy-related HPO terms that we helped develop within the initial stages of the EuroEPINOMICS project, as well as other standardized phenotypic data. The authors combined both photos and HPO terms into clusters they refer to as PhenIcons for the specific genes. However, in contrast to their study on recessive phenotypes, the authors do not include the phenotyping in the primary analysis. Given the vast amount of data present in this study, this will likely be the subject of a future study.
The epilepsy phenotypes. The authors pay particular attention to the epilepsy-related phenotypes and find epilepsy-related HPO terms in 20% of all patients, which makes DDD the largest epilepsy study to date with more than 1,000 patient-parent trios. The brevity of the overall manuscript makes it difficult for the authors to go into detail about the epilepsy-related findings. However, they make an interesting observation. If they select known epilepsy-related genes (it is unfortunately difficult to find their list of epilepsy genes in the publication), more than 50% of patients with de novo mutations in these genes did not have seizures. This is not really surprising given the known overlap of epilepsy with neurodevelopmental disorders, but these observations and the subsequent conclusions need to be taken with a grain of salt. Ever since SCN2A was “rediscovered” as an autism gene, we have tried to caution the large-scale data field that dichotomizing seizures as a phenotype in neurodevelopmental disorders is conceptually fraught if the longitudinal data are not taken into account. For example, many patients with de novo mutations in SCN8A or SCN2A may develop epilepsy later in childhood, but would be coded as “non-epilepsy” if they are seen by a geneticist for developmental delay in the first year of life. As epilepsy may well become the most clinically relevant phenotype later in life, providing categorical phenotypes without explicitly referring to the age of assessment is difficult and may lead to erroneous conclusions.
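To make the point about age of assessment concrete, here is a minimal sketch of how a seizure phenotype could be coded with three states instead of two. It is not how DDD coded its phenotypes; the `Assessment` record, the `seizure_status` function, and the age threshold are all hypothetical and only illustrate the idea that "no seizures yet" in a young child is missing information rather than a negative finding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assessment:
    """One clinical assessment (hypothetical record layout)."""
    patient_id: str
    age_months: float
    seizures_observed: bool


def seizure_status(a: Assessment, min_informative_age_months: float) -> Optional[bool]:
    """Return True (seizures), False (no seizures), or None (too young to tell).

    min_informative_age_months is an assumed, gene-specific threshold:
    the age after which the absence of seizures is considered meaningful.
    """
    if a.seizures_observed:
        return True  # a positive finding is informative at any age
    if a.age_months >= min_informative_age_months:
        return False  # old enough that absence of seizures counts
    return None  # assessed too early; treat as unknown, not as "non-epilepsy"
```

With an illustrative threshold of 60 months, an infant seen at 8 months without seizures would come out as `None` rather than `False`, and would therefore not dilute the estimate of how often a gene presents with epilepsy.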
Mining DDD for epilepsy genes. In order to dig into the DDD data a bit further, I took the de novo mutations from the initial Epi4K study and compared them to DDD. This basically asks whether there are any genes found in patients with epilepsy that are now validated. If we look at the list of genes that are genome-wide significant in DDD, there are 10 new genes that were not previously connected with epilepsy. This list of genes includes KMT2A, TCF4, ITPR1, CHD4, HNRNPU, ASXL1, TRIO, KCNQ3, PTEN, and SLC35A2. These genes have been found previously in patients with epileptic encephalopathies and are now bona fide neurodevelopmental genes. Not all patients in the DDD cohort with de novo mutations have epilepsy, but given the Epi4K data, we know that Infantile Spasms or Lennox-Gastaut syndrome can be a feature of these genes. Requiring genome-wide significance in DDD is a very strict criterion, as we can also think of DDD as a validation cohort for Epi4K. If we arbitrarily select genes that had at least three de novo mutations in the DDD cohort and had an RVIS in the top 25%, we can add 17 additional genes that are likely true neurodevelopmental genes, namely TRRAP, RYR2, ANK3, DIP2C, ZFHX3, FLNC, ZSWIM8, KMT2B, TAF1, CELSR1, FASN, XPO1, ABCA2, PACS2, PLCG2, SLC5A10, and WNK1. Taken together, these 27 genes explain an additional 10% (27/264) of the Epi4K cohort. Finally, the authors point out the X-linked SMC1A gene as a novel epilepsy gene. The SMC1A gene has been previously associated with Cornelia de Lange syndrome. However, patients with truncating mutations rather than missense mutations have epilepsy and not the typical dysmorphic features seen in Cornelia de Lange syndrome. This is not unlike our previous observation in the X-linked KIAA2022 gene.
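For readers who want to repeat this kind of comparison, the two-tier selection described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script I used: the `DddGene` record and both function names are hypothetical, the RVIS cut-off is assumed to be a percentile where lower values mean a more intolerant gene, and the final percentage assumes that each gene accounts for one Epi4K patient.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class DddGene:
    """Per-gene summary from the DDD supplement (hypothetical layout)."""
    symbol: str
    de_novo_count: int
    genome_wide_significant: bool
    rvis_percentile: float  # assumed: lower percentile = more intolerant gene


def validate_candidates(
    epi4k_genes: Set[str],
    ddd_genes: Iterable[DddGene],
    min_de_novo: int = 3,
    max_rvis_percentile: float = 25.0,
) -> Tuple[Set[str], Set[str]]:
    """Split Epi4K genes into a strict tier and a relaxed tier using DDD."""
    strict: Set[str] = set()
    relaxed: Set[str] = set()
    for gene in ddd_genes:
        if gene.symbol not in epi4k_genes:
            continue  # only genes with a de novo mutation in Epi4K are candidates
        if gene.genome_wide_significant:
            strict.add(gene.symbol)  # tier 1: genome-wide significant in DDD
        elif (gene.de_novo_count >= min_de_novo
              and gene.rvis_percentile <= max_rvis_percentile):
            relaxed.add(gene.symbol)  # tier 2: recurrent and intolerant
    return strict, relaxed


def fraction_explained(n_genes: int, cohort_size: int) -> float:
    """Share of the cohort explained, assuming one patient per gene."""
    return n_genes / cohort_size
```

Plugging in the numbers from this post, `fraction_explained(10 + 17, 264)` gives roughly 0.102, which is the 10% quoted above. Both thresholds are exposed as parameters because, as mentioned, they were chosen arbitrarily.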
This is what you need to know. DDD pushes the boundaries of exome sequencing, providing evidence that nearly 50% of severe neurodevelopmental disorders are explained by de novo pathogenic variants. Mining the data of the DDD study will keep us busy for a while. The DDD study is one of the publications where going through the Supplement is actually more informative than reading the actual publication, and it will serve as the backdrop to many of the current genetic studies that look at de novo pathogenic variants. It lifts the entire field up to a new level of data availability and will help us make sense of genetic data in the future.