Synaptic. Identifying the genetic changes underlying severe childhood epilepsies is one of the key steps for predicting outcomes and developing better treatments. However, while our ability to analyze genetic data at scale allows us to simultaneously query tens of thousands of exomes or genomes, our understanding of large phenotypic data has been limited. This limitation, the “phenotypic bottleneck”, is often frustrating, especially as many developmental and epileptic encephalopathies present with unusual and very complex phenotypic features that we would like to better understand for our clinical decision making. The lack of concepts and methods to handle large amounts of phenotypic data has been one of the main contributing factors to this shortcoming. In a new publication in the American Journal of Human Genetics, we aim to overcome this problem by identifying a measurement for phenotypic similarity, using a computational approach to determine how similar patients are to each other based on Human Phenotype Ontology terms. When combined with exome sequencing data, we identified AP2M1, a gene that caused such a similar phenotype that it stood out from the remainder of the cohort. It is the first epilepsy-associated gene identified not from a genetic association, but from phenotypic similarity.
The Rett example. Rett Syndrome was initially identified when Dr. Andreas Rett saw two patients with unusual clinical features, including the stereotypic hand movements, developmental issues, and regression with postnatal microcephaly. The clinical features in both individuals stood out to Dr. Rett and he concluded that both girls must have the same disease, a condition that we eventually found to be due to disease-causing variants in MECP2. The cognitive process going through Dr. Rett’s mind is something that clinicians do on a daily basis – picking up cues from the clinical features seen in their patients and constantly evaluating whether these features stand out and connect to other individuals that we care for.
Computational phenotyping. Our study on AP2M1 is unrelated to Rett Syndrome, but I wanted to use this example as an introduction to a new direction in epilepsy genetics, a field that I refer to as “computational phenotyping”. Basically, we are taking the same approach as Andreas Rett did, but we try to emulate the cognitive process of identifying unusual clinical features through a computational algorithm. Over the last two years, we have rebuilt our team from a traditional epilepsy genetics group into a computational phenotype and data science lab. Our current study identifying AP2M1 as a new genetic etiology in developmental and epileptic encephalopathies is the first result of this new direction. Accordingly, I would like to dive a bit deeper into what we actually did.
Human Phenotype Ontology. While it seems intuitive that some phenotypes are more similar than others, putting this difference into words is often not straightforward. Different clinicians use different terms, and features that seem important to one researcher may not be relevant to somebody else. Also, using a pre-formatted template always has limitations – if we only ask for a certain number of predefined seizure types, this is all we get back and we would not pick up other features that may be highly relevant. The Human Phenotype Ontology (HPO), founded by Peter Robinson, overcomes some of these issues. Ontologies are interconnected hierarchies of phenotypic terms that allow you to connect phenotypes. For example, if clinician A phenotypes a patient with “focal impaired awareness seizures” and clinician B uses the term “partial motor seizures”, it is immediately clear to us that both features are quite similar. However, a phenotyping algorithm wouldn’t necessarily know this unless you provide a dictionary that connects these terms. This is basically what the HPO does, with the additional advantage that terms on different levels of specificity and certainty can be connected. For example, if the only information we have about an individual is the fact that this individual has “seizures”, this would still add information, even though this is a very generic term. Our technical terminology for this is “mapping” and “information content”. Using a framework such as the HPO allows us to use any type of clinical data and match it to the more than 13,000 terms in the large tree of the HPO. But what is this good for?
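To make “mapping” and “information content” a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tiny ontology, the term names, and the five patients are made up and are not real HPO identifiers or study data, and information content is defined here as the negative log2 of a term's frequency in the cohort, which is a common convention but not necessarily the exact definition used in our study.

```python
import math

# Hypothetical toy ontology: child term -> parent term (not real HPO identifiers).
PARENT = {
    "focal impaired awareness seizure": "focal seizure",
    "focal motor seizure": "focal seizure",
    "focal seizure": "seizure",
    "atonic seizure": "generalized seizure",
    "generalized seizure": "seizure",
    "seizure": None,
    "ataxia": None,
}

def with_ancestors(term):
    """Return the term together with all of its ancestors up to the root."""
    out = set()
    while term is not None:
        out.add(term)
        term = PARENT[term]
    return out

def propagate(terms):
    """Map a set of assigned terms to the full set implied by the hierarchy."""
    out = set()
    for term in terms:
        out |= with_ancestors(term)
    return out

def information_content(cohort):
    """Assumed definition: IC(term) = -log2(fraction of individuals with the term)."""
    n = len(cohort)
    counts = {}
    for terms in cohort.values():
        for term in propagate(terms):
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log2(count / n) for term, count in counts.items()}

# Made-up cohort; two clinicians describe patients A and B with different terms.
cohort = {
    "patient_A": {"focal impaired awareness seizure"},
    "patient_B": {"focal motor seizure"},
    "patient_C": {"atonic seizure"},
    "patient_D": {"seizure"},
    "patient_E": {"ataxia"},
}

ic = information_content(cohort)
shared = propagate(cohort["patient_A"]) & propagate(cohort["patient_B"])
print(sorted(shared))
for term in sorted(ic, key=ic.get):
    print(f"{term}: {ic[term]:.2f}")
```

In this toy setup, patients A and B never share a literal term, but after propagation both carry “focal seizure” and “seizure”. The generic term “seizure” has a low but non-zero information content, while the specific terms score higher because fewer individuals have them.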
Computational phenotyping of 314 individuals. In our study, we used an algorithm based on the HPO to compare terms in 314 individuals with DEE whose data had been collected over space and time, ranging from some of our initial EuroEPINOMICS data that is almost a decade old to our most recent data at the Children’s Hospital of Philadelphia. We then used a newly developed computational algorithm to compare each patient with every other patient and quantify the degree of phenotypic similarity. While quantifying phenotypic similarity sounds somewhat nebulous, it is actually relatively straightforward. It basically means that we determine the probability that the clinical features we see in two patients occur by chance. For example, if only 10/314 individuals had status epilepticus, the probability of status epilepticus occurring in two individuals by chance is relatively low. What we did was to apply this approach across all of the more than 3,000 phenotypic terms in 314 individuals with DEE and determine how the phenotypic features in all individuals compare to each other, building a large “phenotype matrix” that can also be shown as a big dendrogram (Figure 1). Using this matrix, we could then determine whether a group of individuals is more similar than expected when compared to everybody else. This was the first step of our analysis. But what does this have to do with epilepsy genes?
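The status epilepticus example can be worked through in a few lines of Python, followed by a very small “phenotype matrix”. This is only a sketch under stated assumptions: the four patients and their terms are invented, terms are assumed to be already mapped and propagated through the ontology, and the similarity score (the summed information content of shared terms) is a simplification chosen for illustration rather than the exact measure from our study.

```python
import math
from itertools import combinations

def chance_of_sharing(n_with_term, cohort_size):
    """Probability that two individuals drawn without replacement both have the term."""
    return (n_with_term / cohort_size) * ((n_with_term - 1) / (cohort_size - 1))

# The example from the text: 10 of 314 individuals with status epilepticus.
p_status = chance_of_sharing(10, 314)
print(f"chance that two random individuals both have status epilepticus: {p_status:.6f}")

def information_content(cohort):
    """Assumed definition: IC(term) = -log2(fraction of individuals with the term)."""
    n = len(cohort)
    counts = {}
    for terms in cohort.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log2(count / n) for term, count in counts.items()}

def similarity(terms_a, terms_b, ic):
    """Illustrative score: summed information content of the shared terms."""
    return sum(ic[term] for term in terms_a & terms_b)

# Made-up cohort with terms already propagated to their ancestors.
cohort = {
    "P1": {"seizure", "atonic seizure", "ataxia"},
    "P2": {"seizure", "atonic seizure", "ataxia"},
    "P3": {"seizure", "status epilepticus"},
    "P4": {"seizure"},
}

ic = information_content(cohort)
# The "phenotype matrix": one similarity value per pair of individuals.
sim = {
    (a, b): similarity(cohort[a], cohort[b], ic)
    for a, b in combinations(sorted(cohort), 2)
}
for pair, score in sim.items():
    print(pair, score)
```

Sharing a term that everybody has adds nothing to the score, while sharing rarer terms adds more, which is the intuition behind asking how likely a shared feature is to occur by chance.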
Identifying AP2M1. When we combine our phenotype measurement with exome data, we can now ask a relatively simple question: are individuals with a de novo variant in a given gene more similar than we would expect? We asked this question for all 11 genes that were found in two or more individuals. The answer was yes for DNM1, SCN8A and KCNB1, with many additional genes including KCNQ2 and SCN2A also approaching significance. In addition, this was true for a new gene that was previously not on our radar: AP2M1, the gene coding for the mu-subunit of the endocytic clathrin receptor complex AP-2. Both individuals that carried a de novo variant in AP2M1 had an identical de novo variant that we subsequently also identified in two other individuals with related phenotypes. The AP-2 complex is important for synaptic vesicle recycling and initiates the first step for vesicles to be internalized. Synaptic vesicle recycling is an emerging theme in epilepsy genetics with several of the recently discovered genes including DNM1, CLTC, or PPP3CA involved in this process. We were able to show through functional studies and modeling that the recurrent AP2M1 c.508C>T (p.Arg170Trp) variant impairs clathrin-mediated endocytosis. The ability of neurons to recycle synaptic vesicles is likely reduced.
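One way to ask whether carriers of a variant in the same gene are “more similar than we would expect” is a permutation test: compare the similarity within the gene group to the similarity of randomly drawn groups of the same size. The Python sketch below illustrates this idea under loud assumptions. The similarity values, patient labels, and function names are hypothetical, and the permutation scheme is a generic one, not a description of the exact statistics in our paper.

```python
import random
from itertools import combinations

def mean_pairwise(group, sim):
    """Average similarity over all pairs of individuals within a group."""
    pairs = list(combinations(sorted(group), 2))
    return sum(sim[pair] for pair in pairs) / len(pairs)

def permutation_p_value(gene_group, all_ids, sim, n_perm=2000, seed=0):
    """Fraction of random same-sized groups at least as similar as the gene group."""
    rng = random.Random(seed)
    observed = mean_pairwise(gene_group, sim)
    hits = 0
    for _ in range(n_perm):
        random_group = rng.sample(all_ids, len(gene_group))
        if mean_pairwise(random_group, sim) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up similarity matrix for six individuals: every pair scores 0.5,
# except the hypothetical pair P1/P2, which is unusually similar.
ids = ["P1", "P2", "P3", "P4", "P5", "P6"]
sim = {pair: 0.5 for pair in combinations(ids, 2)}
sim[("P1", "P2")] = 5.0

p_similar = permutation_p_value(["P1", "P2"], ids, sim)
p_ordinary = permutation_p_value(["P3", "P4"], ids, sim)
print(f"P1/P2: p = {p_similar:.3f}")
print(f"P3/P4: p = {p_ordinary:.3f}")
```

With only six individuals there are just 15 possible pairs, so the smallest achievable p-value in this toy example is roughly 1/15. In a cohort of 314 individuals there are far more possible groups, which is what gives an unusually similar pair the chance to stand out.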
The AP2M1 phenotype. The reason why both individuals with AP2M1 de novo variants were initially identified through our HPO analysis was an unusually similar phenotype. Both individuals had been initially contributed by two separate centers to the EuroEPINOMICS study, but it was only the phenotypic similarity analysis that eventually connected both patients. Both individuals had a phenotype that has some resemblance to Doose Syndrome (Myoclonic-Astatic Epilepsy). The clinical features that stood out were the presence of atonic seizures and atypical absence seizures with generalized EEG discharges, developmental delay, and ataxia. While any of these features alone would not have been specific for any gene, the combination of all these features in both individuals was so significant that it stood out from the wider cohort – somewhat mirroring what Andreas Rett did when he first identified the syndrome that was eventually named after him.
What you need to know. We identified four individuals with a recurrent variant in AP2M1 using a new phenotype-based approach, defining AP2M1 as a novel gene for developmental and epileptic encephalopathies. AP2M1 is important for synaptic vesicle recycling, and the recurrent de novo variant affects the thermodynamic stability of the AP-2 complex and impairs clathrin-mediated endocytosis. Our strategy used to identify AP2M1 was based on the systematic comparison of phenotypic terms mapped to the Human Phenotype Ontology (HPO), indicating that deep phenotyping in the epilepsies can be used to provide statistical evidence to identify novel genes.