Introducing the revised Human Phenotype Ontology (HPO) – a new language for Big Data in the epilepsies

Classification. Our classification of the epilepsies periodically undergoes revision to align the way we think about the epilepsies with scientific progress in the field. While it is intuitive that relatively novel frameworks such as the 2017 International League Against Epilepsy (ILAE) Operational Classification of Seizure Types capture the current spirit of the field more accurately than prior classifications, one relatively simple question is not easily answered: how much more accurate? Getting to such an answer requires us to take a step back and think about how the value of clinical information can be measured and compared. In our recent publication, we describe the revision of the Human Phenotype Ontology (HPO) according to the most recent ILAE classifications and other respected definitions in current use. This gives the answer to the prior question: 40% (which is a lot). Continue reading

SCN2A – a neurodevelopmental disorder digitized through 10,860 phenotypic annotations

HPO. SCN2A-related disorders represent one of the most common causes of neurodevelopmental disorders and developmental and epileptic encephalopathies (DEE). However, while a genetic diagnosis is easily made through high-throughput genetic testing, SCN2A-related disorders have such a broad phenotypic range that understanding the full scale of the clinical features has been traditionally difficult. In our recent study, we used a harmonized framework for phenotypes based on the Human Phenotype Ontology (HPO) to systematically curate phenotypic annotations in all individuals reported in the literature and followed at our center, a total of 413 unrelated individuals. Mapping phenotypic data onto 10,860 terms with 562 unique concepts and applying some of the computational tools we have developed over the last three years, we were able to delineate the phenotypic range in unprecedented detail. SCN2A is now the first DEE with all available data systematically curated and harmonized in a computable format, allowing for entirely novel insights. Continue reading

Make data speak in rare childhood epilepsies

Capturing data. While genetic analysis can be performed and investigated on an industrial scale in thousands of individuals in parallel, the analysis of clinical data is frequently still the domain of manual data curation. Clinical data is typically collected in a non-standardized way, which makes it difficult for information generated in a clinical context to be used in a systematic data analysis as can be performed with genomic data. However, the tide is turning, and we are slowly coming around to the idea that clinical data also requires the same degree of standardization in order to be used at scale. For none of the epilepsies is such standardization more important than for the rare epilepsies, which include many of the genetic epilepsies. Our lab has been working on frameworks and methods to allow for this kind of analysis in genetic epilepsies. Here is a brief summary of what it actually means to “make data speak”, which has become the mission statement of our lab. Continue reading

The spectrum of de novo variants in 30,000 individuals with neurodevelopmental disorders

NDD. Trio-exome sequencing is the gold standard to identify the underlying genetic basis in individuals with neurodevelopmental disorders. De novo variants account for the vast majority of causative genetic findings once a diagnosis is made, but the overall genetic landscape is very heterogeneous, with few genes explaining more than 1% of the genetic morbidity. As the largest study of its kind to date, a recent publication in Nature assessed the spectrum of de novo variants in neurodevelopmental disorders in more than 31,000 individuals. The authors identify more than 250 disease-associated genes, report 28 novel genetic etiologies, and highlight signals in their data that hint at more than 1,000 disease-associated genes yet to be discovered. In this blog post, I have summarized the five take-home messages from this large study. Continue reading

OMIM to retire EIEE classification – an important step to overhaul terminology for genetic epilepsies

EIEE. Online Mendelian Inheritance in Man (OMIM) is the undisputed main resource for information regarding genes and disease. It is the resource that the majority of clinicians and researchers in the field turn to in order to get information about established or novel genetic etiologies in genetic epilepsies and neurodevelopmental disorders. However, historically, OMIM had decided to enumerate many of the genes for developmental and epileptic encephalopathies within a phenotypic series called Early Infantile Epileptic Encephalopathies (EIEE). The field has advanced, and we now understand that most genetic etiologies have a broad phenotypic range and can cause a wide range of epilepsy phenotypes. Accordingly, in collaboration and consultation with our ClinGen epilepsy clinical domain working group, OMIM will retire the EIEE classification and refer to these conditions as developmental and epileptic encephalopathies (DEE). Dravet Syndrome, formerly EIEE6, will now become DEE6, which is the secondary annotation to the actual term “Dravet Syndrome”. For some, this might be a small change in semantics. However, as a clinician trying to make sure that the uniqueness and distinctiveness of childhood epilepsies in the era of large-scale data analysis is appreciated, this small step is likely to be highly influential in the future. Here is some background on how the EIEEs finally became DEEs. Continue reading

Understanding patient advocacy – the Rare Epilepsy Landscape Analysis (RELA)

The Rares. The increasing number of genetic diagnoses in rare epilepsies has resulted in the formation of a large number of non-profit organizations and support groups over the last decade.  These support organizations for rare epilepsies (“Rares”) have already had an important impact on the epilepsy genetics field. However, the overall impact, direction, and needs of the Rares have never been assessed systematically.  In a recent editorial, Ilene Penn Miller summarized the findings of the Rare Epilepsy Landscape Analysis (RELA), which surveyed 44 advocacy and support organizations for rare epilepsies. Continue reading

The SCN1A rs6732655 enigma – a reply

rs6732655. I acknowledge that the title of this blog post looks like my keyboard is broken, but please bear with me. Last month, I blogged about a recent genome-wide association study by BioBank Japan (BBJ), discussing the evidence for a Single Nucleotide Polymorphism (SNP) in the vicinity of the SCN1A gene (rs6732655). In a prior study, the SNP in question was initially found to be associated with epilepsy. I discussed the fact that this SNP, albeit not significant by itself, was also seen at a higher frequency in cases than in controls in the epilepsy cohort of the BBJ study. I received some comments regarding this post, and it was pointed out that my reasoning was incorrect given that rs6732655 was not nominally significant in the BBJ study. Therefore, this study was not a replication study in itself. Let me retrace my steps and revisit where my hunch came from to write the initial blog post. Continue reading

Entering the phenotype era – HPO-based similarity, big data, and the genetic epilepsies

Semantic similarity. The phenotype era in the epilepsies has now officially started. While it is possible for us to generate and analyze genetic data in the epilepsies at scale, phenotyping typically remains a manual, non-scalable task. This contrast has resulted in a significant imbalance where it is often easier to obtain genomic data than clinical data. However, it is often not the lack of clinical data that causes this problem, but our ability to handle it. Clinical data is often unstructured, incomplete and multi-dimensional, resulting in difficulties when trying to meaningfully analyze this information. Today, our publication on analyzing more than 31,000 phenotypic terms in 846 patient-parent trios with developmental and epileptic encephalopathies (DEE) appeared online. We developed a range of new concepts and techniques to analyze phenotypic information at scale, identified previously unknown patterns, and were bold enough to challenge the prevailing paradigms on how statistical evidence for disease causation is generated. Continue reading

Copy Number Variations in the epilepsies – a 2020 update

CNV. There are different forms of genetic variation, and our ability to query the entire exome or genome is, historically speaking, a relatively recent development. However, the first type of genetic variation that could be assessed in large epilepsy cohorts was copy number variation (CNV), i.e., small gains or losses of chromosomal material. In a recent study, the entire Epi25 cohort was analyzed for CNVs, giving a long-needed update on the role of structural genomic variations in various forms of epilepsies and highlighting that the overall landscape of CNVs in the epilepsies is well understood and delineated. With up to 3% of individuals with epilepsies carrying some of the recurrent CNVs, this type of genomic variation remains a rare, but important source of genetic morbidity in the epilepsies. Continue reading

The natural history of genetic epilepsies as told by 3,200 years of electronic medical records

EMR. When we consider the natural history of rare diseases like the genetic epilepsies, we typically think about a lack of longitudinal data that contrasts with the abundant genetic information that is available nowadays – the so-called phenotyping gap. We typically suggest that we need to obtain this information in future prospective studies to better understand long-term outcome, response to medications, and potential early warning signs for an adverse disease course. However, a vast amount of clinical data is collected on an ongoing basis through electronic medical records (EMR) as a byproduct of regular patient care. In a recent study, our group built tools to mine the electronic medical records to assess the disease history of 658 individuals with known or presumed epilepsies using clinical information collected at more than 62,000 patient encounters across more than 3,200 patient years. Here is a brief summary of our first study on EMR genomics, an untapped resource that has the potential to improve our understanding of the genetic epilepsies. Continue reading