Narrowing the phenotype gap through vector embedding

Sparse data. Matching the growing body of genomic datasets with associated clinical data is difficult for a variety of reasons. Most importantly, while genomic data are standardized and can be generated at scale, clinical data are often unstructured and sparse, making it difficult to represent a phenotype fully in any abbreviated format. In prior blog posts, we have frequently discussed the Human Phenotype Ontology (HPO), a standardized dictionary to which all phenotypic features can be mapped and linked. But these data also quickly become large, and the question of how best to handle them remains. In a recent publication, we translated more than 53M patient notes into HPO terms and explored the utility of vector embedding, a method that currently forms the basis of many AI-based applications. Here is a brief summary of how these technologies can help us better understand phenotypes. Continue reading
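
To give a flavor of what vector embedding means in this context, here is a minimal sketch in Python. It assumes patients are represented as binary vectors over a fixed HPO vocabulary and compared by cosine similarity; the embedding approach in the actual publication may well differ, and the small vocabulary below is purely illustrative.

```python
# Minimal sketch (not the published pipeline): patients annotated with
# HPO terms are embedded as vectors, and phenotypic similarity is the
# cosine of the angle between those vectors.
import numpy as np

def embed(patient_terms, vocabulary):
    """Turn a set of HPO terms into a binary vector over a fixed vocabulary."""
    return np.array([1.0 if term in patient_terms else 0.0 for term in vocabulary])

def cosine_similarity(a, b):
    """Cosine similarity; 1.0 means identical phenotypic profiles."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm else 0.0

vocabulary = ["HP:0001250",  # Seizure
              "HP:0001263",  # Global developmental delay
              "HP:0002353"]  # EEG abnormality

patient_a = embed({"HP:0001250", "HP:0001263"}, vocabulary)
patient_b = embed({"HP:0001250", "HP:0002353"}, vocabulary)
print(cosine_similarity(patient_a, patient_b))  # ~0.5
```

The appeal of this representation is that once phenotypes live in a vector space, the same similarity machinery that powers many AI applications can be applied to millions of patient notes at once.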

Unlocking STXBP1 through Electronic Medical Records

Understanding the EMR. Several weeks ago, I gave a presentation at the STXBP1 Summit conference, the third annual meeting since the first in 2019 – a time when I had just entered the field of neurogenetics. It has been fascinating to follow one of the neurodevelopmental genes with the fastest-growing body of knowledge, with an expanding scope of clinical studies and novel avenues for targeted gene therapies on the horizon. However, one of the many projects our STXBP1 team is currently working on takes a somewhat atypical approach: we aim to map the natural disease history of STXBP1-related disorders based entirely on reconstructed Electronic Medical Records (EMR). Here are some of the challenges we have had to confront and what we learned searching for meaning in the depths of the EMR. Continue reading
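
One way to picture what reconstructing a natural history from the EMR involves is sketched below: HPO annotations extracted from encounter notes are grouped into age bins to show which phenotypes are documented when. The field names and the 3-month bin width are assumptions for illustration, not our actual pipeline.

```python
# Toy sketch of one step in EMR-based natural history reconstruction:
# counting how often each phenotype is documented per age bin.
from collections import defaultdict

def bin_annotations(encounters, bin_months=3):
    """Count phenotype mentions per age bin across all encounters."""
    histogram = defaultdict(lambda: defaultdict(int))
    for enc in encounters:
        bin_index = enc["age_months"] // bin_months
        for term in enc["hpo_terms"]:
            histogram[bin_index][term] += 1
    return histogram

encounters = [
    {"age_months": 4,  "hpo_terms": {"HP:0001250"}},                # Seizure
    {"age_months": 5,  "hpo_terms": {"HP:0001250", "HP:0001263"}},  # + delay
    {"age_months": 14, "hpo_terms": {"HP:0001263"}},
]
for age_bin, counts in sorted(bin_annotations(encounters).items()):
    print(f"months {age_bin * 3}-{age_bin * 3 + 2}: {dict(counts)}")
```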

Make data speak in rare childhood epilepsies

Capturing data. While genetic analysis can be performed on an industrial scale in thousands of individuals in parallel, the analysis of clinical data frequently remains the domain of manual curation. Clinical data is typically collected in a non-standardized way, which makes it difficult to use information generated in a clinical context in the kind of systematic analysis that is possible with genomic data. However, the tide is turning, and we are slowly coming around to the idea that clinical data requires the same degree of standardization in order to be used at scale. For none of the epilepsies is such standardization more important than for the rare epilepsies, which include many of the genetic epilepsies. Our lab has been working on frameworks and methods to enable this kind of analysis in the genetic epilepsies. Here is a brief summary of what it actually means to “make data speak”, which has become the mission statement of our lab. Continue reading
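
As a minimal sketch of what standardization can look like in practice, the snippet below maps non-standardized clinical phrases onto HPO terms via a synonym lookup. A real pipeline would use the full HPO synonym list and proper NLP tooling; the hand-picked dictionary here is an assumption for illustration only.

```python
# Minimal sketch of "making data speak": free-text clinical phrases
# are mapped onto standardized HPO terms via a synonym dictionary.
SYNONYMS = {
    "seizure": "HP:0001250",
    "seizures": "HP:0001250",
    "developmental delay": "HP:0001263",
    "delayed milestones": "HP:0001263",
}

def standardize(note: str) -> set[str]:
    """Return the HPO terms whose synonyms appear in a free-text note."""
    text = note.lower()
    return {hpo for phrase, hpo in SYNONYMS.items() if phrase in text}

print(standardize("Recurrent seizures and delayed milestones at 18 months"))
# -> {'HP:0001250', 'HP:0001263'}
```

Once every note is reduced to the same controlled vocabulary, clinical observations from thousands of individuals can be counted, compared, and analyzed with the same rigor as genomic variants.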

Big data, ontologies, and the phenotypic bottleneck in epilepsy research

Unconnected data. Large datasets are increasingly emerging within the field of biomedicine. These include the genomic, imaging, and EEG datasets that we are somewhat familiar with, but also many large unstructured datasets, including data from biomonitors, wearables, and electronic medical records (EMR). The abundance of these datasets makes the promise of precision medicine appear tangible: individualized treatment that is based on data, synthesizing available information across various domains for medical decision-making. In a recent review in the New England Journal of Medicine, Haendel and collaborators discuss the need for the biomedical field to focus on developing terminologies and ontologies, such as the Human Phenotype Ontology (HPO), that help put data into context. This review is a perfect segue to introduce our group’s increasing focus on computational phenotypes as a way to overcome the phenotypic bottleneck in epilepsy genetics. Continue reading
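
A small sketch of why ontologies put data into context: a specific HPO annotation implies all of its ancestor terms, so patients coded at different levels of detail can still be compared. The is-a edges below are an abbreviated, hand-coded fragment of the HPO, not the full ontology.

```python
# Propagating an HPO annotation up the ontology to the root.
IS_A = {
    "HP:0001250": "HP:0012638",  # Seizure -> Abnormal nervous system physiology
    "HP:0012638": "HP:0000707",  # -> Abnormality of the nervous system
    "HP:0000707": "HP:0000118",  # -> Phenotypic abnormality
}

def with_ancestors(term: str) -> set[str]:
    """Return a term together with all of its ancestors."""
    terms = {term}
    while term in IS_A:
        term = IS_A[term]
        terms.add(term)
    return terms

# A patient coded with the specific term "Seizure" can now be matched
# against a patient coded only with the coarser "Abnormality of the
# nervous system" - their propagated annotations overlap.
print(with_ancestors("HP:0001250") & with_ancestors("HP:0000707"))
```

This ancestor propagation is what turns isolated clinical annotations into connected, computable data, which is exactly the gap that the computational phenotype work in our group aims to close.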