The human pangenome and the flavor of epilepsy variant interpretation

Reference. Today, the human pangenome was announced, the first reference of the human genome that systematically includes a cohort of genetically diverse individuals. The human genome, once thought to be a linear reference, is now a graph with nodes and edges. I came across the pangenome publications when I was thinking about a comment that I made earlier this week, when I was asked whether people on our team have their own flavor of variant interpretation. Let me share with you how both topics connect.

Figure 1. Visualizing the human genome as a graph, not as a linear reference. Using a linear reference makes it difficult to describe human variation in a consistent way and separates the sequence from the variation. Capturing DNA sequence as a graph makes it possible to understand how each genome relates to any other genome, preserving information about variation more comprehensively [NHGRI].

Being the flavor. When asked about the flavor of variant interpretation, I gave a seemingly nonsensical answer: “We don’t have individual flavors, we are the flavor”. What I meant by this cryptic remark was the following: variant interpretation is only really functional if it is reproducible and if everybody has internalized the same set of rules. These rules are not individual and don’t take into account your personal interest or expertise. It is the other way around: if you want to contribute to improvements in variant interpretation, you have to think yourself into the existing frameworks, not adding your personal flavor, but changing the system from the inside by being the flavor. Ok, this is mysterious enough. Back to reference genomes and the actual topic of this blog post, namely biases in understanding genomic variation in health and disease.

The reference genome. Every scientist working with genomic data knows the idea of a reference genome, such as GRCh38, the current reference for most genomic studies. The idea of a reference genome is simple: there is a single standard genome that can be thought of as a linear strand, and then there are differences, either benign variations or variations related to disease. In fact, on my slide where I typically introduce phenotypic complexity, I usually point out how simple genomic data is: it has a position and a variation. However, the more variation within humans is analyzed, the more we realize that a single reference does not exist. There are single variants, but also duplications, deletions, inversions, and so forth as part of the typical genomic variation. In brief, you need to let go of the relatively simple idea of a single, linear strand. Our genome does not have a single reference, but many references. And when trying to understand variation, comparing to the various references is important. This is what a graph does.
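To make the graph idea a bit more concrete, here is a minimal sketch in Python. It is a toy illustration and not how the human pangenome is actually stored or queried. The class name `SequenceGraph`, the short sequences, and the genome names are all made up for this example. Each node holds a stretch of sequence, each genome is a path through the nodes, and no single path is privileged as "the" reference.

```python
from collections import defaultdict


class SequenceGraph:
    """Toy sequence graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}                # node id -> sequence segment
        self.edges = defaultdict(set)  # node id -> ids of nodes that may follow
        self.paths = {}                # genome name -> ordered list of node ids

    def add_node(self, node_id, sequence):
        self.nodes[node_id] = sequence

    def add_path(self, name, node_ids):
        """Register one genome as a walk through the graph, adding edges as needed."""
        for src, dst in zip(node_ids, node_ids[1:]):
            self.edges[src].add(dst)
        self.paths[name] = list(node_ids)

    def sequence(self, name):
        """Spell out the linear sequence of one genome by following its path."""
        return "".join(self.nodes[n] for n in self.paths[name])

    def shared_nodes(self, a, b):
        """Nodes that two genomes have in common, without using a reference."""
        return set(self.paths[a]) & set(self.paths[b])


graph = SequenceGraph()
graph.add_node(1, "ACGT")
graph.add_node(2, "A")     # one allele at a variable site
graph.add_node(3, "G")     # the alternative allele at the same site
graph.add_node(4, "TTCA")
graph.add_node(5, "GGG")   # segment that is deleted in one genome
graph.add_node(6, "CC")

# Three made-up genomes, each a different walk through the same graph.
graph.add_path("genome_A", [1, 2, 4, 5, 6])
graph.add_path("genome_B", [1, 3, 4, 5, 6])  # differs from A at the variable site
graph.add_path("genome_C", [1, 2, 4, 6])     # skips node 5 (deletion)

print(graph.sequence("genome_A"))  # ACGTATTCAGGGCC
print(graph.sequence("genome_C"))  # ACGTATTCACC
print(sorted(graph.shared_nodes("genome_B", "genome_C")))  # [1, 4, 6]
```

Note that genome_B and genome_C can be compared directly through the nodes they share. In a linear framework, both would first have to be described as differences from a third sequence.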

Biases. Thinking of the genome as a single strand to which everything can be compared is a long-standing bias that is hard to shake off. We were trained this way and don’t even recognize this mental image as a framing bias anymore. Over the next few years, we will see how the pangenome concept will seep into the representation of human variation in diagnostic reports and genetic studies. But for now, we can ask what other biases we might have. I would like to point out three biases when interpreting variants. Here is where the flavor comes in again.

Case-only bias for functional data. For the last few years, we were involved in making sure that functional data in epilepsy-related ion channels is adequately represented in variant interpretation. For example, what could be more convincing for the pathogenicity of an SCN2A variant than an observed gain-of-function effect? However, when trying to convince the governing bodies for variant curation, we were faced with a seemingly trivial question. We were asked whether we have measured sufficient controls to know that variation is not also found in the unaffected population. In the epilepsy field, we feel strongly about functional data. However, we are often unaware that we lack information about normal variation, making functional results much less powerful for variant classification than we would usually think.

VUS pareidolia. Pareidolia is a psychological phenomenon where the mind perceives a familiar pattern, image, or meaning where it does not actually exist. This happens when we look at ambiguous stimuli, such as clouds, tree bark, or stains on a wall, and see a recognizable image, such as a face or an animal. “Variant of Uncertain Significance Pareidolia” is a term I came up with to refer to a situation where a clinician sees a meaningful pattern in a VUS, even though the actual significance of the variant is unknown. We have a hard time letting true variants of uncertain significance be uncertain. We have a tendency to craft narratives about why uncertain variants are either causative or benign, a process that I refer to as “biopoetry” in our teaching sessions. However, this is also a cognitive bias: for some variants, we simply need to accept that we cannot know anything else about them at this point in time. There is no flavor.

GDR category mistakes. When it comes to understanding causation, genes and phenotypes come as bundles, the so-called gene-disease relationships. For example, SCN1A has an established gene-disease relationship with Dravet Syndrome, but not with Juvenile Myoclonic Epilepsy (JME) or self-limiting neonatal seizures. Therefore, the idea of an epilepsy gene is somewhat misleading, as genetic etiologies only have validity once a specific phenotype is considered. Everything else is uncertain. For example, it might well be that SCN1A can cause JME, but we don’t know this yet. The established gene-disease relationship with Dravet Syndrome adds nothing to a potential link to JME; a relationship between SCN1A and JME would need to stand on its own feet. Category mistakes about gene-disease relationships are probably the most common mistakes in our teaching sessions. For example, bi-allelic protein-truncating variants in SZT2 are known to cause a developmental and epileptic encephalopathy (DEE). When we then encounter a child with a mild infantile epilepsy and bi-allelic missense variants that do not have a predicted deleterious effect, we may be inclined to implicate these variants in the individual’s epilepsy, along the lines of: “SZT2 is established, so milder variants cause a milder disease”. However, while theoretically possible, the evidence for this assertion is lacking. Thinking of SZT2 as a causative etiology for a self-limited infantile epilepsy implies an entirely new gene-disease relationship that requires its own, independent evidence.
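One way to picture the category mistake is to treat a gene-disease relationship as a pair, not as a property of the gene. The short Python sketch below is purely illustrative. The set `ESTABLISHED_GDRS` contains only the two examples mentioned above and is not a curated resource, and the function name `gdr_status` and its return strings are made up for this example.

```python
# Illustrative only: these pairs mirror the examples in the text,
# they are not taken from a curated gene-disease validity database.
ESTABLISHED_GDRS = {
    ("SCN1A", "Dravet syndrome"),
    ("SZT2", "developmental and epileptic encephalopathy"),
}


def gdr_status(gene, phenotype):
    """Look up a gene-phenotype PAIR; the gene alone is never enough."""
    if (gene, phenotype) in ESTABLISHED_GDRS:
        return "established"
    genes_with_any_gdr = {g for g, _ in ESTABLISHED_GDRS}
    if gene in genes_with_any_gdr:
        # The tempting shortcut: the gene is "known", so the new phenotype
        # feels established too. It is not; it is a new relationship.
        return "needs independent evidence"
    return "no known relationship"


print(gdr_status("SCN1A", "Dravet syndrome"))
print(gdr_status("SCN1A", "juvenile myoclonic epilepsy"))
print(gdr_status("SZT2", "self-limited infantile epilepsy"))
```

The lookup key is the gene together with the phenotype. Asking only "is SZT2 an established gene?" drops half of the key, which is exactly the category mistake.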

What you need to know. In the same way that the idea of a pangenome challenges our framing bias of the genome as a linear reference, we have several biases when it comes to variant interpretation. Overestimating the weight of functional data while underestimating normal variation, creating biopoetry narratives for true variants of uncertain significance, and overestimating gene-disease validity in novel phenotypes are amongst the most common biases that we encounter when training up providers in epilepsy genetics. However, there are established frameworks for both gene and variant interpretation that are best understood by being aware of the underlying principles and by becoming the flavor, not having a flavor.

Ingo Helbig is a child neurologist and epilepsy genetics researcher working at the Children’s Hospital of Philadelphia (CHOP), USA.