EMR. Genomic data is increasingly available for large patient cohorts. In parallel, healthcare is increasingly digitized, and large amounts of data can be extracted and analyzed at the click of a button. In principle, this should provide tremendous opportunities to understand how epilepsy care can be personalized based on genetic factors. However, we quickly run into challenges. Obtaining information on seizure frequencies, for example, requires manual chart review. Understanding how a person’s genetic makeup affects responses to anti-seizure medications is therefore not currently possible at scale, even in large healthcare systems where related questions in other diseases can increasingly be answered. Here is a brief overview of how we can meaningfully engage with clinical data when outcomes are simply not available.
Outcomes. Improvements in epilepsy care require a reliable and efficient framework to measure outcomes. Clinical documentation, however, is usually heterogeneous, making it difficult to analyze data at scale outside of dedicated research studies. Manual curation is often required when retrospective data is analyzed, and the throughput of manual phenotyping is much lower than that of genotyping, which can be performed at scale. One solution to this dilemma is the introduction of Common Data Elements (CDE) in the Electronic Medical Records (EMR), which provide a means of documenting standardized data and assessing it later.
CDE Standards. There are ongoing efforts to integrate tools for CDE acquisition into the EMR, and we have already seen in our research that the amount of data increases rapidly if a sufficiently large user group engages in generating standardized data from routine clinical care. However, while this sounds promising, it does not help with existing clinical data or with individuals for whom this information has not been recorded by providers. How can these issues be addressed?
Computable phenotypes. In the figure above, I wanted to give an example of what can be done at a time when there is still a significant backlog of phenotypic data. In brief, we can try to use other existing pieces of information to learn how individuals respond to specific medications. One framework that our lab has worked with over the last few years is called “computable phenotypes.” The idea behind computable phenotypes is that we can take a subset of commonly used data fields and make inferences about other phenotypes. While this concept sounds futuristic, it already has several applications. For example, the Phenotype KnowledgeBase (PheKB) lists more than 100 EMR-based computable phenotypes. For ADHD, the combination of a clinical ICD code plus the prescription of at least one ADHD medication is a reliable way to identify affected individuals, with a positive predictive value of 90%.
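To make the ADHD example more concrete, here is a minimal sketch in Python of what such a rule could look like. This is not the validated PheKB algorithm: the `PatientRecord` structure, the short medication list, and the function names are assumptions I made up for illustration, and a real implementation would work from curated code sets and medication vocabularies. The second function shows how a positive predictive value would be calculated once a subset of flagged individuals has been confirmed by chart review.

```python
from dataclasses import dataclass, field

# Illustrative assumptions, not a validated definition:
# ICD-10 codes for ADHD start with "F90"; the medication list is a small sample.
ADHD_ICD10_PREFIX = "F90"
ADHD_MEDICATIONS = {"methylphenidate", "atomoxetine", "lisdexamfetamine"}


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one individual's EMR data."""
    patient_id: str
    icd10_codes: set = field(default_factory=set)
    prescriptions: set = field(default_factory=set)


def has_adhd_phenotype(record):
    """Rule: at least one ADHD diagnosis code AND at least one ADHD medication."""
    has_code = any(code.startswith(ADHD_ICD10_PREFIX) for code in record.icd10_codes)
    has_med = any(drug.lower() in ADHD_MEDICATIONS for drug in record.prescriptions)
    return has_code and has_med


def positive_predictive_value(flagged_ids, confirmed_ids):
    """Fraction of flagged individuals confirmed by manual chart review."""
    flagged = set(flagged_ids)
    if not flagged:
        raise ValueError("no individuals were flagged")
    return len(flagged & set(confirmed_ids)) / len(flagged)


# Made-up records for illustration only.
cohort = [
    PatientRecord("p1", {"F90.0"}, {"Methylphenidate"}),
    PatientRecord("p2", {"F90.2"}, set()),            # code only
    PatientRecord("p3", {"G40.909"}, {"atomoxetine"}),  # medication only
]
flagged = [r.patient_id for r in cohort if has_adhd_phenotype(r)]
```

Only the first record meets both criteria. Requiring two independent data fields to agree is what makes the rule more specific than either field alone.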
Medication prescriptions. The different types of information in the EMR can be more or less reliable. For example, the description of clinical features is so heterogeneous and billing-focused that the validity of this data is sometimes not clear. Medication prescriptions, however, are very reliable measures, as the prescriptions form the basis for having these medications filled at a pharmacy. Therefore, trying to understand how more complex epilepsy phenotypes can be inferred from medication prescription data is worthwhile. For example, if an individual is started on several anti-seizure medications (ASMs) in succession, we can reasonably infer that this individual had seizures refractory to the earlier medications. These inferences will never be 100% accurate, but the amount of clinical data that can be unlocked by applying such algorithms is large. And while the field is slowly transitioning to systematic documentation of Common Data Elements, these tools may provide critically needed information in the interim.
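The inference described above can be sketched in a few lines of Python. Again, this is a simplified illustration rather than the algorithm our lab uses: the input format (a list of medication names with start dates), the function names, and the threshold of two presumed failures are all assumptions. The sketch also treats every switch as a treatment failure, although in practice a medication may be stopped for side effects or other reasons, which is one reason why this kind of inference is never fully accurate.

```python
from datetime import date


def ordered_medications(prescriptions):
    """Return distinct medications ordered by their first start date.

    `prescriptions` is a list of (medication_name, start_date) tuples;
    repeat prescriptions of the same medication are collapsed.
    """
    first_start = {}
    for drug, start in prescriptions:
        name = drug.lower()
        if name not in first_start or start < first_start[name]:
            first_start[name] = start
    return sorted(first_start, key=first_start.get)


def presumed_failed(prescriptions):
    """Assumption: every medication followed by a new one did not control seizures."""
    return ordered_medications(prescriptions)[:-1]


def probably_refractory(prescriptions, min_failed=2):
    """Flag a trajectory once `min_failed` earlier medications are presumed failed.

    The default of 2 is an illustrative threshold, not a clinical rule.
    """
    return len(presumed_failed(prescriptions)) >= min_failed


# Invented dates for illustration only.
history = [
    ("Vigabatrin", date(2015, 3, 1)),
    ("topiramate", date(2015, 1, 10)),
    ("rufinamide", date(2015, 6, 20)),
    ("topiramate", date(2015, 2, 1)),  # refill, not a new medication
]
print(presumed_failed(history))
print(probably_refractory(history))
```

For this invented trajectory, topiramate and vigabatrin are presumed to have failed because rufinamide was started afterwards, so the individual is flagged.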
Conclusion. When I went through some of my old presentations over the weekend, I came across a figure that I had used five years ago for introducing the concept of EMR genomics (Figure 1). In brief, this medication trajectory allows us to conclude that we are looking at the disease history of an individual with refractory infantile spasms not responding to steroids, topiramate, vigabatrin, and rufinamide. Accordingly, we can build computational algorithms based on this kind of pattern recognition and apply them at scale to genotyped datasets in order to identify genetic factors driving particular drug responses. Even in the absence of available outcome measures, there is meaningful clinical information that can be used as a proxy.