SCN2A – a neurodevelopmental disorder digitized through 10,860 phenotypic annotations

HPO. SCN2A-related disorders represent one of the most common causes of neurodevelopmental disorders and developmental and epileptic encephalopathies (DEE). However, while a genetic diagnosis is easily made through high-throughput genetic testing, SCN2A-related disorders have such a broad phenotypic range that understanding the full scale of the clinical features has been traditionally difficult. In our recent study, we used a harmonized framework for phenotypes based on the Human Phenotype Ontology (HPO) to systematically curate phenotypic annotations in all individuals reported in the literature and followed at our center, a total of 413 unrelated individuals. Mapping phenotypic data onto 10,860 terms with 562 unique concepts and applying some of the computational tools we have developed over the last three years, we were able to delineate the phenotypic range in unprecedented detail. SCN2A is now the first DEE with all available data systematically curated and harmonized in a computable format, allowing for entirely novel insights.

Figure 1. Overview of SCN2A variants and associated phenotypic features. (a) The NaV1.2 channel (above) and gene (below), highlighting a selection of recurrent variants. (b) The frequency of phenotypic features within categorized phenotypic subgroups: developmental and epileptic encephalopathy (DEE, n = 255), autism (ASD, n = 60), benign familial neonatal–infantile seizures (BFNIS, n = 53), other epilepsy (n = 27), and atypical SCN2A-related phenotypes (n = 18). Boxed frequencies indicate the five most frequent Human Phenotype Ontology (HPO) terms within each respective phenotypic subgroup. CNS, central nervous system; EEG, electroencephalogram; PTV, protein-truncating variant. Adapted from Crawford, Xian et al. under a CC BY 4.0 license.

The genetic shapeshifter. A few years ago, I was trying to capture the range of features in the SCN2A-related disorders, only to realize how broad the phenotype is. At the time, it became clear that SCN2A had actually been discovered three times almost independently: first as a gene for self-limiting infantile seizures, then as an autism gene, and finally as a gene for the developmental and epileptic encephalopathies (DEE). To add to this confusion, the sequence of discoveries is basically in inverse relation to the frequency of SCN2A-related phenotypes. While self-limiting infantile seizures were the first phenotype described, this entity constitutes one of the smallest groups of SCN2A-related disorders. DEE and autism account for the majority of cases, including a proportion of individuals who were initially classified as having autism only and later met the criteria for DEE. In brief, our understanding of SCN2A-related disorders required some major paradigm changes that occurred in two phases. First, gain-of-function variants and loss-of-function variants were recognized to result in different phenotypes and different medication responses. While gain-of-function variants are seen in early-onset epilepsies that may respond to sodium channel blockers, loss-of-function variants predominantly result in autism with later seizure onset and often an adverse reaction to sodium channel blockers. This first paradigm change allowed us to conceptualize the SCN2A-related disorders as distinct groups. However, it left the following questions unanswered: how useful is this distinction, are there other groups that need to be distinguished, and what criteria would we use to make such a distinction, anyway?

Human Phenotype Ontology. When you look through the literature on SCN2A-related disorders and also include the information we have on patients seen at our center, one thing stands out. Clinical information is typically very heterogeneous: different concepts and frameworks have been used and, most importantly, different levels of detail. How can we make sense of all this heterogeneous data jointly? The answer is data mapping to a common language that captures clinical terms and concepts at various levels of complexity, which represents the second paradigm change. Our lab has worked extensively with the Human Phenotype Ontology (HPO), which we used for this purpose and which helped us analyze a total of 10,860 clinical annotations in 413 individuals, including individuals never reported in the scientific literature.

Unprecedented detail. In our study by Crawford, Xian et al., we used HPO-based data mapping to capture the phenotypic complexity of the SCN2A-related disorders in 562 distinct clinical concepts, with the HPO framework helping us to harmonize the clinical information. Figure 1 shows an overview of the most common clinical concepts in the five major clinical subgroups that we captured. The key point in this overview is that the frequencies are accurate representations of phenotype frequencies obtained through computational reasoning over clinical concepts. They include broader phenotypes that were implicit in the phenotypic descriptions of the original publications but not explicitly documented. Using such computational tools, we found that the frequency of neurodevelopmental abnormalities or seizures is higher than a first overview of the literature might suggest.
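To illustrate why reasoning over implicit broader phenotypes changes the frequencies, here is a minimal Python sketch. It is not the code used in the study. The hierarchy, the term labels, and the four individuals are invented for illustration and are far simpler than the real HPO graph. The idea is that an individual annotated with a specific term also counts toward every broader term above it.

```python
from collections import Counter

# Toy child -> parents map. Hypothetical and heavily simplified;
# this is NOT the real HPO structure.
PARENTS = {
    "Infantile spasms": ["Seizure"],
    "Focal-onset seizure": ["Seizure"],
    "Seizure": ["Abnormality of the nervous system"],
    "Autistic behavior": ["Abnormality of the nervous system"],
    "Abnormality of the nervous system": [],
}


def with_ancestors(terms):
    """Return the given terms plus every broader term they imply."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, []))
    return seen


def term_frequencies(cohort):
    """Fraction of individuals carrying each term after propagation."""
    counts = Counter()
    for terms in cohort.values():
        counts.update(with_ancestors(terms))
    n = len(cohort)
    return {term: count / n for term, count in counts.items()}


# Invented annotations as they might appear in four case reports.
cohort = {
    "ind_1": {"Infantile spasms"},
    "ind_2": {"Focal-onset seizure"},
    "ind_3": {"Autistic behavior"},
    "ind_4": {"Seizure"},
}

freqs = term_frequencies(cohort)
for term, freq in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{term}: {freq:.2f}")
```

In this toy cohort only one individual is literally annotated with "Seizure", but three of four count toward it once the more specific seizure terms are propagated upward.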

PTV vs. missense variants. With both genomic and clinical data captured in a computable format, we could assess associations with specific variant classes and locations. Despite a large degree of heterogeneity, individuals with protein-truncating variants (PTV) were less likely than those with missense variants to have seizures. Individuals with missense variants were more likely to have early-onset epilepsy with multiple seizure types and EEG abnormalities. These were the associations that we expected based on previous studies, but the harmonized data provided granularity far beyond this, with 22 HPO terms associated with missense variants and 13 HPO terms associated with PTV when corrected for multiple testing (Table 2 in our publication).
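For readers who want to see what such a term-by-term comparison can look like, here is a minimal sketch. This post does not state which statistical test or correction method was used, so Fisher's exact test and a Bonferroni correction are assumptions on my part. The counts are invented and are not results from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return {term: min(1.0, p * m) for term, p in p_values.items()}


# Invented counts per term:
# (missense with term, missense without, PTV with term, PTV without)
counts = {
    "Seizure": (18, 2, 6, 14),
    "Autistic behavior": (4, 16, 12, 8),
    "EEG abnormality": (10, 10, 10, 10),
}

p_values = {term: fisher_exact_two_sided(*table) for term, table in counts.items()}
adjusted = bonferroni(p_values)
significant = sorted(term for term, p in adjusted.items() if p < 0.05)
print(significant)
```

With several hundred HPO terms instead of three, the correction step matters a great deal, which is why only a subset of terms remains associated with each variant class.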

Signatures in recurrent variants. One important aspect of our study was the analysis of recurrent SCN2A variants, such as p.R853Q. We used semantic similarity analysis, a framework that we had used previously to capture the relatedness between individuals across a variety of terms. We found that eight recurrent variants show prominent phenotypic similarity, suggesting a distinct phenotypic pattern that sets individuals with these variants apart from the larger group of individuals with SCN2A-related disorders. For example, the phenotypic similarities in individuals with the recurrent p.R853Q variant were due to a more than two-fold increase of infantile spasms and hypsarrhythmia, as well as a much higher frequency of chorea and movement disorders. However, the phenotypic relatedness extended far beyond this, emphasizing that the overall “gestalt” of clinical phenotypes in complex neurodevelopmental disorders cannot simply be broken down into individual phenotypic features, but is due to a complex pattern.
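To give a feel for how relatedness between two individuals can be scored, here is a minimal sketch of one common approach: Resnik-style similarity, where two terms are as similar as their most informative shared ancestor, combined with a best-match average across each individual's terms. Whether this matches the exact measure used in the study is an assumption. The hierarchy, labels, and individuals are invented.

```python
import math

# Toy child -> parents map (hypothetical, not the real HPO structure).
PARENTS = {
    "Infantile spasms": ["Seizure"],
    "Focal-onset seizure": ["Seizure"],
    "Seizure": ["Abnormality of the nervous system"],
    "Chorea": ["Abnormality of the nervous system"],
    "Autistic behavior": ["Abnormality of the nervous system"],
    "Abnormality of the nervous system": [],
}


def ancestors(term):
    """The term itself plus all broader terms above it."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def information_content(cohort):
    """IC = -log(frequency): rare terms carry more information."""
    n = len(cohort)
    counts = {}
    for terms in cohort.values():
        propagated = set().union(*(ancestors(t) for t in terms))
        for term in propagated:
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log(count / n) for term, count in counts.items()}


def resnik(term_a, term_b, ic):
    """IC of the most informative ancestor shared by both terms."""
    common = ancestors(term_a) & ancestors(term_b)
    return max((ic.get(term, 0.0) for term in common), default=0.0)


def similarity(terms_a, terms_b, ic):
    """Symmetric best-match average between two individuals."""
    def best_match_average(source, target):
        return sum(max(resnik(s, t, ic) for t in target) for s in source) / len(source)

    return (best_match_average(terms_a, terms_b) + best_match_average(terms_b, terms_a)) / 2


# Invented individuals.
cohort = {
    "ind_1": {"Infantile spasms", "Chorea"},
    "ind_2": {"Infantile spasms"},
    "ind_3": {"Autistic behavior"},
    "ind_4": {"Focal-onset seizure"},
}

ic = information_content(cohort)
sim_12 = similarity(cohort["ind_1"], cohort["ind_2"], ic)
sim_13 = similarity(cohort["ind_1"], cohort["ind_3"], ic)
print(f"ind_1 vs ind_2: {sim_12:.3f}")
print(f"ind_1 vs ind_3: {sim_13:.3f}")
```

Because the score is built from shared ancestors rather than exact term matches, two individuals can be recognized as similar even when their case reports use different levels of detail.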

PCA, variant prediction, negative terms. In this blog post, I was only able to highlight some aspects of our larger study, which also included a specific form of Principal Component Analysis (PCA) to define subgroups that validate our clinical distinction into discrete larger groups such as DEE, autism, and self-limiting familial seizures. We were able to demonstrate that this analysis, in many ways, builds a prediction tool that allows us to reliably assess whether a specific phenotype is due to a gain-of-function or loss-of-function variant. Finally, we assessed the power of “negative” phenotypes, terms that indicate that a specific concept is explicitly absent when describing an individual’s phenotype. This pioneering type of phenotype analysis in our study enabled us to make some relatively strong statements about SCN2A-related phenotypes. For example, across all individuals with SCN2A-related disorders, individuals with a novel missense variant are three times more likely not to have autism and almost 20 times more likely not to have any form of intellectual disability. This result may be surprising at first. However, it is largely driven by the higher frequency of individuals with self-limiting seizures in the overall cohort and may potentially help with counseling families and with prognostication, especially if a new diagnosis is made in a young infant.
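The idea behind negative terms can be sketched in a few lines of Python. This is an illustration under assumptions, not the analysis from the study: the records are invented, and comparing missense against PTV is my choice of comparison groups for the example. The important design point is that "explicitly absent" is kept separate from "not mentioned".

```python
# Invented records. A term listed under "absent" was explicitly ruled out;
# a term in neither set is simply unknown and is not counted as absent.
records = [
    {"variant": "missense", "present": {"Seizure"}, "absent": {"Autistic behavior"}},
    {"variant": "missense", "present": {"Seizure"}, "absent": {"Autistic behavior"}},
    {"variant": "missense", "present": {"Seizure", "Autistic behavior"}, "absent": set()},
    {"variant": "missense", "present": {"Seizure"}, "absent": set()},
    {"variant": "PTV", "present": {"Autistic behavior"}, "absent": {"Seizure"}},
    {"variant": "PTV", "present": {"Autistic behavior"}, "absent": set()},
    {"variant": "PTV", "present": set(), "absent": {"Autistic behavior"}},
    {"variant": "PTV", "present": {"Autistic behavior", "Seizure"}, "absent": set()},
]


def absence_rate(records, variant, term):
    """Fraction of a variant group in which the term is explicitly absent."""
    group = [r for r in records if r["variant"] == variant]
    if not group:
        return 0.0
    return sum(term in r["absent"] for r in group) / len(group)


def absence_ratio(records, term, group_a, group_b):
    """How many times more often the term is explicitly absent in group_a."""
    rate_a = absence_rate(records, group_a, term)
    rate_b = absence_rate(records, group_b, term)
    if rate_b == 0:
        return float("inf") if rate_a > 0 else float("nan")
    return rate_a / rate_b


ratio = absence_ratio(records, "Autistic behavior", "missense", "PTV")
print(f"Explicit absence of autistic behavior, missense vs PTV: {ratio:.1f}x")
```

A real analysis would add a statistical test and would need to account for how unevenly negative findings are reported across publications.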

What you need to know. SCN2A is the first developmental and epileptic encephalopathy and neurodevelopmental disorder to be “digitized” with the full clinical information available in the literature mapped to a computable format. We have used this unique dataset in our new study by Crawford, Xian et al. to outline the overall clinical landscape of the SCN2A-related disorders and explore how SCN2A phenotypes are related to specific variant classes and locations. By doing this, we attempted to provide a blueprint for how complex clinical information can be meaningfully analyzed, which will create the backbone for natural history studies and outcome analyses in SCN2A-related disorders and many other genetic epilepsies.

Ingo Helbig is a child neurologist and epilepsy genetics researcher working at the Children’s Hospital of Philadelphia (CHOP), USA.