Unlocking STXBP1 through Electronic Medical Records

Understanding the EMR. Several weeks ago, I gave a presentation at the STXBP1 Summit, the third annual meeting since the first in 2019, a time when I had just entered the field of neurogenetics. It has been fascinating to follow one of the neurodevelopmental genes with the “fastest growing knowledge,” as the scope of clinical studies expands and novel avenues for targeted gene therapies emerge on the horizon. However, one of the many projects our STXBP1 team is currently working on takes a somewhat atypical approach: we aim to map the natural disease history of STXBP1-related disorders based entirely on reconstructed Electronic Medical Records (EMR). Here are some of the challenges we have had to confront and what we have learned while searching for meaning in the depths of the EMR.

Figure 1. The changes in seizure frequencies in individuals with STXBP1-related disorders with epilepsy in the first three years of life, stratified by individuals with protein-truncating variants (PTV) versus individuals with missense variants. We reconstructed the seizure histories of 93 individuals with STXBP1-related epilepsy, capturing >25 seizure types and categorical seizure severities in one-month time increments using the Epilepsy Learning Health System (ELHS) and Pediatric Epilepsy Learning Health System (PELHS)-championed framework. The dynamic pattern of seizures between the two STXBP1 subgroups over time (shown in the alluvial graphs) can be compared to determine time windows of statistical difference.

Carl Sagan and a medium for phenotypic atoms. As this post is rooted in concepts established in Ingo’s phenotypic atomism series, it is worth noting that I am an “observer” of the EMR: I approach the medical records from a research standpoint, with no say in changing the events and movements that constitute it. In fact, as a non-clinical researcher, I would even encounter an error message if I tried to edit a patient’s chart (I have “view-only” access). Nevertheless, even though the EMR was initially implemented for billing and workflow purposes, there remains much to be observed. From clinical and genetic diagnoses to medication histories, exam findings, and developmental histories, the clinical footprints of each patient who passes through are captured. The EMR stands as a medium that combines the dimensions of the phenotypic atom and time into its own version of space-time. Building the concepts and the foundational framework with which we can capture and understand disease histories reminded me of a remark from astronomer and astrophysicist Carl Sagan: “science is a way of thinking much more than it is a body of knowledge.” Making sense of EMR-based data is a way of thinking, and here is how we have approached it so far.

Redefining the building blocks of the EMR. Admittedly, reconstructing seizure histories in monthly time intervals took some effort. It took time to conceptualize the framework for how we should capture the data, then to comb through patient charts in search of any and all information on seizures, and finally to develop and fine-tune the approaches to analyze and visualize the data. For example, while we created our initial version of the STXBP1 seizure alluvial graph over a year ago, we only recently decided to flip the entire graph upside-down to better display the relative proportion of individuals with seizures during a given month (Figure 1). Searching through notes in patient charts took even longer and remains an active project for our group, one that we are expanding to other genes such as SCN8A, SCN2A, and SYNGAP1. While time-consuming, this granularity is critical, especially for capturing the wide range of changes that can happen both from month to month and over time.
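To make the idea of monthly building blocks concrete, here is a minimal Python sketch of how chart entries could be expanded into one-month bins and summarized as the proportion of individuals with seizures in a given month. It is an illustration only, not our actual pipeline: the function names, the tuple layout, and the four frequency categories are assumptions made for this example, and the real ELHS/PELHS categories may differ.

```python
# Hypothetical ordinal frequency categories, from least to most severe.
# (Assumption for this sketch; the actual framework's labels may differ.)
FREQUENCY_ORDER = ["none", "monthly", "weekly", "daily"]


def expand_to_months(episodes, n_months=36):
    """Expand chart entries into one-month bins covering months 0..n_months-1.

    Each episode is (start_month, end_month, seizure_type, frequency), with
    both months inclusive. A bin is a dict {seizure_type: frequency}, or None
    when no chart entry covers that month.
    """
    timeline = [None] * n_months
    for start, end, seizure_type, frequency in episodes:
        if frequency not in FREQUENCY_ORDER:
            raise ValueError(f"unknown frequency category: {frequency!r}")
        for month in range(max(start, 0), min(end, n_months - 1) + 1):
            if timeline[month] is None:
                timeline[month] = {}
            timeline[month][seizure_type] = frequency
    return timeline


def worst_frequency(month_bin):
    """Return the most severe frequency in a bin, or None if nothing was captured."""
    if month_bin is None:
        return None
    return max(month_bin.values(), key=FREQUENCY_ORDER.index)


def proportion_with_seizures(timelines, month):
    """Proportion of individuals with seizures among those with data that month."""
    observed = [
        worst_frequency(timeline[month])
        for timeline in timelines.values()
        if timeline[month] is not None
    ]
    if not observed:
        return None  # nobody has documented information for this month
    return sum(freq != "none" for freq in observed) / len(observed)


# Two made-up individuals, for illustration only.
cohort = {
    "ind_1": expand_to_months([
        (2, 5, "infantile spasms", "daily"),
        (6, 11, "infantile spasms", "none"),  # documented seizure freedom
    ]),
    "ind_2": expand_to_months([(4, 8, "focal-onset", "weekly")]),
}

print(proportion_with_seizures(cohort, 4))   # 1.0
print(proportion_with_seizures(cohort, 7))   # 0.5
print(proportion_with_seizures(cohort, 20))  # None
```

Months that no chart entry covers stay `None` rather than being counted as seizure-free, so the denominator for each month only includes individuals with documented information.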

Working in ambiguity. Following my talk at the STXBP1 Summit, I was asked two questions that touched upon the uncertainty of working with EMR-based data. First, how do we capture the nuance in observing and recording developmental milestones? And second, how do we tease out the natural history of a disorder amid the effects of, and responses to, treatment strategies? To elaborate on the second question: could it be that the statistical difference in seizure histories between PTV and missense variants in STXBP1 shown in Figure 1 is the result of our ability to better treat infantile spasms clinically, a seizure type more common in individuals with PTV? We can further stratify our cohort by seizure type (infantile spasms versus other seizure types such as focal-onset seizures) and by treatment strategy (specific individual anti-seizure medications or co-prescriptions). Still, when working with EMR data, we also go by the maxim “perfect is the enemy of the good.”
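One way to look for time windows in which two subgroups differ is to compare, month by month, how many individuals in each group have seizures. The sketch below uses a two-sided Fisher's exact test for this. The choice of test, the function names, and the significance threshold are assumptions for illustration rather than a description of our analysis, the counts are made up and are not cohort data, and no correction for testing many months is applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


def months_with_difference(group_a, group_b, alpha=0.05):
    """Return the months at which the two groups differ at level alpha.

    Each group is a list with one (with_seizures, without_seizures) pair
    per month. Uncorrected for multiple comparisons.
    """
    flagged = []
    for month, ((a, b), (c, d)) in enumerate(zip(group_a, group_b)):
        if fisher_exact_two_sided(a, b, c, d) < alpha:
            flagged.append(month)
    return flagged


# Made-up counts for three months, purely illustrative.
ptv_counts = [(8, 2), (3, 1), (5, 5)]
missense_counts = [(1, 5), (1, 3), (5, 5)]

print(months_with_difference(ptv_counts, missense_counts))  # [0]
```

The same comparison could be repeated within a stratum, for example only for individuals with infantile spasms, at the cost of smaller counts in each month.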

Understanding implicit assumptions. Much of our EMR research is rooted in how we handle missing data, address the assumptions we must make with incomplete information, and derive generalizable knowledge from what we have. Fortunately, in the pediatric sphere, where follow-up visits are typically frequent, we have sufficiently powered datasets in which we can explore these concepts. In doing so, much of what we do comes down to how we define and give meaning to seemingly nebulous concepts: for example, how we measure and characterize the degree of improvement in seizure severity and whether it is associated with a specific treatment. Or even simply how we define the frequency of a phenotypic feature, taking into account whether the clinical feature is explicitly absent or simply not captured. However, grounding our observations in objective data allows us to detect clinical patterns, such as age-related phenotypes or medication efficacy, that are lost amid the variability and heterogeneity of many human disorders and are hard to discern from personal anecdote alone. As at the STXBP1 Summit, it is encouraging when the premises of our approaches are challenged, as this helps guide future efforts and considerations when developing and improving later methods.
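The difference between “explicitly absent” and “not captured” can be made concrete with a small Python sketch. It reports the frequency of a feature among individuals in whom it was assessed, together with the bounds obtained if every uncaptured individual were assumed to lack or to have the feature. The three-state coding and all names here are assumptions for illustration, not our actual data model.

```python
from enum import Enum


class Status(Enum):
    PRESENT = "present"            # feature documented as present
    ABSENT = "absent"              # feature explicitly documented as absent
    NOT_CAPTURED = "not captured"  # chart is silent on the feature


def feature_frequency(statuses):
    """Summarize how often a feature occurs under different assumptions."""
    total = len(statuses)
    present = sum(s is Status.PRESENT for s in statuses)
    absent = sum(s is Status.ABSENT for s in statuses)
    not_captured = total - present - absent
    assessed = present + absent
    return {
        # Only individuals with an explicit yes/no contribute.
        "among_assessed": present / assessed if assessed else None,
        # Assumes every uncaptured individual lacks the feature.
        "lower_bound": present / total if total else None,
        # Assumes every uncaptured individual has the feature.
        "upper_bound": (present + not_captured) / total if total else None,
    }


# Ten made-up individuals: 4 present, 2 absent, 4 not captured.
example = (
    [Status.PRESENT] * 4 + [Status.ABSENT] * 2 + [Status.NOT_CAPTURED] * 4
)
summary = feature_frequency(example)
print(summary["lower_bound"], summary["upper_bound"])  # 0.4 0.8
```

In this toy example, the frequency among assessed individuals is 4 of 6, while treating silence as absence would report 4 of 10. The gap between the two bounds shows how much the answer depends on the assumption made about missing data.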

A learning health system. So, what else can we use an EMR-based approach for? Almost in parallel, we have started to accumulate a wealth of data captured in routine clinical care in the format of Common Data Elements (CDEs), which have allowed our group to, for example, characterize the seizure burden across a pediatric epilepsy cohort of 1,038 individuals and stratify by epilepsy syndrome. This cohort is growing rapidly and has already surpassed 2,500 individuals with CDE data as of early this year. However, seizures are only one component of the neurodevelopmental disorders our lab focuses on, and we have begun to expand our work to investigate developmental histories captured through standardized exams recorded in flowsheets and milestones documented in the EMR. Our efforts are ongoing, and at this stage they require patience as datasets grow: in essence, just giving it time.

What do you need to know? When trying to understand longitudinal disease histories, the EMR provides a wealth of data capturing phenotypic traces and clinical snapshots of each individual. In our ongoing efforts to assess the natural history of STXBP1-related disorders through deep phenotyping of EMR-based data, we continue to explore and build the framework upon which computational approaches and standardized data can be leveraged to map disease- and subgroup-specific trajectories across various clinical domains along the dimension of time. With a better grasp of how to navigate the space encapsulated in the EMR, we can start to bridge the divide between research and clinical care.

Julie Xian is a Data Scientist in the Helbig Lab at Children’s Hospital of Philadelphia (CHOP).