The Hague, winter of 1997. Last week challenged my most basic beliefs and reminded me of “Axiomatic”, a collection of science fiction short stories by Greg Egan. While on holiday in the Netherlands in 1997, I had bought the book in the bookstore at Den Haag Centraal and later lost it or gave it away. I only remembered the title three weeks ago and ordered it online. The book arrived just as news from Antwerp twisted my brain. In the title story of “Axiomatic”, a man acquires a nanorobot-based implant that allows him to change his innermost convictions (I told you that it’s science fiction, right?). He basically wants the courage to kill the man who murdered his wife. After he has carried out his revenge and the effect of the axiomatic implant has worn off, he starts to crave more, because he misses the certainty that the implant had given him. I had to adjust my own deeply held expectations about how to find de novo mutations after Tania, a PhD student in the Antwerp lab, pulled out a de novo mutation in one of our trios that Denovogear had missed. This mutation turned out to be another hit in a gene that we had seen before.
How to find de novo mutations. Exome sequencing gives you so-called variant files (produced, for example, with GATK) that list all the positions in a given exome that differ from the reference genome. In a previous post, I commented on how futile it is to use variant files to identify de novo mutations, compared with the more formal, model-based approach of Bayesian algorithms as implemented in Denovogear (DNG). In principle, it should be a piece of cake to take the variant files of father, mother and child and look for variants that are exclusive to the child. In practice, however, you may get swamped by artefacts. Or so I thought. I tried this filtering in DRA7, a trio with a known SCN1A mutation. Keeping only the variants that are absent from both parents reduces the ~86,000 variants to about 4,200. If we apply additional filters (rare, exonic/splicing, deleterious, sufficient coverage), we end up with the single SCN1A variant that we were looking for. So there are apparently situations in which simply comparing variant files works as well as DNG. But how could this “crude” filtering beat the supposedly better algorithm?
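The child-only filtering described above can be sketched in a few lines. This is a minimal illustration, not our actual pipeline: the `Variant` record, its annotation fields and the thresholds (`max_freq=0.001`, `min_depth=10`) are hypothetical placeholders for whatever annotation step and cut-offs one actually uses. Variants are matched across the trio by chromosome, position, reference allele and alternate allele.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    # Hypothetical annotations, assumed to come from an upstream annotation step.
    pop_freq: float = 0.0       # population allele frequency
    region: str = "exonic"      # e.g. "exonic", "splicing", "intronic"
    deleterious: bool = False   # verdict of some pathogenicity predictor
    depth: int = 0              # read depth at the site in the child

    def key(self):
        # Identity of a variant across the three variant files.
        return (self.chrom, self.pos, self.ref, self.alt)


def child_only(child, father, mother):
    """Keep child variants whose exact allele appears in neither parent's file."""
    parental = {v.key() for v in father} | {v.key() for v in mother}
    return [v for v in child if v.key() not in parental]


def candidate_de_novos(child, father, mother, max_freq=0.001, min_depth=10):
    """Child-only variants that are rare, exonic/splicing, deleterious and well covered."""
    return [
        v
        for v in child_only(child, father, mother)
        if v.pop_freq < max_freq
        and v.region in {"exonic", "splicing"}
        and v.deleterious
        and v.depth >= min_depth
    ]
```

One caveat the sketch makes visible: a standard variant file only lists sites where a variant was called, so "absent in the parents" can also mean "poorly covered in the parents". Checking parental coverage at each candidate site (not modelled here) guards against that.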
Providing context. In our trio with the new mutation (called NLES8), DNG did not find any de novo variant with a posterior probability high enough to warrant a closer look, and this closer look is all that matters. At the base pair in question, some reads in the parental data showed a different base call. This artefact was enough to throw the Bayesian algorithm off track, and the variant ended up with a low posterior probability. However, keeping in mind that this gene might be promising and had already been hit in another trio, Tania pushed further and included the variant in the PCR validation step. It came back de novo.
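To see how a handful of parental reads can drag the posterior down, here is a deliberately simplified toy model. It is not Denovogear's actual model, which considers the full genotype space of the trio and more realistic error models. The toy assumes the child is heterozygous, lets each parent be either homozygous reference or heterozygous, uses a binomial read model with a fixed sequencing error rate, and plugs in illustrative values for the per-site mutation rate (`mu=1e-8`) and heterozygosity (`theta=1e-3`). The de novo posterior is the probability that both parents are homozygous reference, given their reads.

```python
from math import comb


def read_likelihood(alt_reads, depth, genotype, error=0.01):
    """Binomial probability of alt_reads out of depth (genotype 0 = hom ref, 1 = het)."""
    p_alt = 0.5 if genotype == 1 else error
    return comb(depth, alt_reads) * p_alt**alt_reads * (1 - p_alt) ** (depth - alt_reads)


def transmit_alt(genotype, mu):
    # A het parent passes on the alt allele half the time;
    # a hom-ref parent only through a new mutation.
    return 0.5 if genotype == 1 else mu


def de_novo_posterior(father, mother, mu=1e-8, theta=1e-3, error=0.01):
    """father and mother are (alt_reads, depth) tuples; the child is assumed het."""
    prior = {0: 1 - theta, 1: theta}
    joint = {}
    for gf in (0, 1):
        for gm in (0, 1):
            a, b = transmit_alt(gf, mu), transmit_alt(gm, mu)
            p_child_het = a * (1 - b) + (1 - a) * b
            joint[(gf, gm)] = (
                prior[gf] * prior[gm] * p_child_het
                * read_likelihood(*father, gf, error)
                * read_likelihood(*mother, gm, error)
            )
    # De novo = both parents hom ref, normalised over all parental configurations.
    return joint[(0, 0)] / sum(joint.values())
```

With 40 reads per parent and no alternate reads, the de novo explanation dominates. Add a few alternate reads in one parent, as in `de_novo_posterior((5, 40), (0, 40))`, and inheritance from a heterozygous parent becomes the cheaper explanation, so the posterior falls far below a 0.7 cut-off even if those reads are artefacts.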
Take home message. What take-home message should I engrave on the axiomatic implant that I now need to update my belief system? First, we need a second look for de novo mutations. A strategy for a second look should be contextual, i.e. it should assign a higher probability to genes that we have already seen once. Whether these genes actually have a higher probability, or whether there is simply much more de novo variation out there than DNG picks up, remains to be seen (this is the question of whether Tania was simply lucky or highly systematic). Second, we need to understand why DNG failed. When we looked back, the posterior probability for this variant was 0.2. We usually set the cut-off at 0.7 to weed out artefacts. Since real variants may occasionally have a low posterior probability, we might need additional criteria (genomic context, per-base mutation rate, a blacklist of known false positives, etc.) to keep the flood of false positives in check if we lower the threshold.
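One way to make the second look contextual is to scale up the odds of calls in genes we have already seen and then apply the usual cut-off. The sketch below is hypothetical: the `boost` factor of 10, the `Call` record and the blacklist are placeholders rather than validated values, and it covers only two of the criteria mentioned above (gene context and a blacklist of known false positives). Multiplying the posterior odds by a constant is equivalent to multiplying the prior odds, provided the original posterior was computed with a gene-agnostic prior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    gene: str         # gene symbol (hypothetical names in the usage below)
    site: str         # site identifier, e.g. "chr2:200:C>T" (format is an assumption)
    posterior: float  # de novo posterior probability reported by the caller


def reweight(posterior: float, boost: float) -> float:
    """Multiply the posterior odds by `boost` and convert back to a probability."""
    if posterior >= 1.0:
        return 1.0
    odds = posterior / (1.0 - posterior) * boost
    return odds / (1.0 + odds)


def second_look(calls, seen_genes, blacklist, boost=10.0, cutoff=0.7):
    """Return calls worth validating under a hypothetical contextual policy."""
    picked = []
    for call in calls:
        if call.site in blacklist:
            continue  # known recurrent false positive: skip regardless of score
        factor = boost if call.gene in seen_genes else 1.0
        if reweight(call.posterior, factor) >= cutoff:
            picked.append(call)
    return picked
```

With `boost=10`, a call at posterior 0.2 in a previously hit gene moves to roughly 0.71 and just clears the 0.7 cut-off, while the same call in an unseen gene stays out. Whether such a boost is justified is exactly the open question of whether recurrent genes really carry a higher prior.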
The new gene. Our finding is still preliminary, and we don’t want to name the gene at this point for confidentiality reasons. However, it is an epilepsy candidate gene that makes immediate sense and calls for validation and functional studies. Tania found this gene a week after the last NLES working group teleconference, and we will discuss it during our next teleconference.