Overview. There have been numerous publications on de novo mutations in autism and intellectual disability over the last three years. Many of these studies struggle to distinguish signal from noise, and the plethora of findings leaves the reader wondering which genes are bona fide autism genes and in which cases the evidence is limited. A recent paper in Nature Genetics uses a new metric to compare expected and observed de novo mutations in more than 1000 published autism patient-parent trios, and the answers appear to be straightforward.
Three points. I chose the paper by Samocha and collaborators for three specific reasons. First, it clears up some of the confusion around de novo mutations in autism. Second, it provides a good explanation of why case-control studies for de novo mutations are futile. Third (my favorite point), it is the first publication to specifically mention the titin dilemma. TTN, which codes for titin, is the gene with the most mutations identified in patients with autism. However, it is also one of the largest genes in the human genome. This dilemma highlights the fact that simply counting mutations can be misleading: we need a better measurement to assess the relevance of any particular gene.
Association. Traditionally, when we want to assess whether a certain gene or variant is associated with disease, we compare its frequency in cases and controls. This is the basic principle of an association study, which gives us a p-value. In the discussion of their publication, Samocha and collaborators provide a good example of why this concept is difficult for rare de novo mutations. Even if we had a mutation in a specific gene in 3/1000 case trios and 0/1000 control trios, we would still not reach a level of significance that we would be happy with. And by the way, it would be hard to convince anybody to sequence that many control trios. This is the reason why researchers increasingly use deviation from expected mutation rates. Basically, if we look at 1000 autism trios, we have a baseline of mutations per exome, which can be broken down by gene size into the number of expected mutations for any given gene. A deviation from this indicates that the gene is mutated more frequently than expected. For example, in their cohort of 1078 autism trios, only 0.0072 loss-of-function mutations would be expected in DYRK1A. However, three patients carry such mutations, which is highly significant.
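The two approaches in this paragraph can be compared in a few lines of Python. This sketch is not taken from the paper: it assumes a one-sided Fisher's exact test for the 3/1000 versus 0/1000 comparison and a simple Poisson model for the expected-versus-observed comparison, and the function names are made up for illustration. Only the counts (3/1000, 0/1000, 0.0072 expected, 3 observed) come from the text.

```python
import math


def fisher_one_sided(case_hits, n_cases, control_hits, n_controls):
    """One-sided Fisher's exact test: probability of seeing at least
    `case_hits` carriers among cases, given the total number of carriers."""
    total_hits = case_hits + control_hits
    n_total = n_cases + n_controls
    denom = math.comb(n_total, total_hits)
    p = 0.0
    for x in range(case_hits, min(total_hits, n_cases) + 1):
        p += math.comb(n_cases, x) * math.comb(n_controls, total_hits - x) / denom
    return p


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail
    so that very small probabilities keep their precision."""
    term = math.exp(-lam) * lam ** k / math.factorial(k)
    total = 0.0
    i = k
    while term > 0.0 and total + term != total:
        total += term
        i += 1
        term *= lam / i
    return total


# Case-control comparison from the text: 3/1000 case trios vs 0/1000 control trios.
p_case_control = fisher_one_sided(3, 1000, 0, 1000)

# Expected-vs-observed comparison from the text: 0.0072 expected, 3 observed
# (the Poisson model is an assumption of this sketch).
p_poisson = poisson_upper_tail(3, 0.0072)

print(f"case-control p-value:      {p_case_control:.3f}")
print(f"expected-vs-observed p:    {p_poisson:.2e}")
```

Under these assumptions, the case-control comparison gives a p-value of roughly 0.12, while the Poisson tail probability is about 6 × 10^-8. The same three mutations are unremarkable in one framework and striking in the other.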
Constraint. Samocha and collaborators also assess possible candidate genes in the human genome from a different angle. Based on the distribution of mutations, they identify 1003 genes that seem to be significantly intolerant to variation that changes the coding sequence of the gene. Basically, these genes accumulate synonymous mutations as predicted by the overall mutation rate, but the rate of missense, splice site, and stop mutations is severely constrained. These genes are highly overrepresented amongst the genes that are found to be mutated in patients with autism and intellectual disability, but not amongst the genes mutated in their unaffected siblings. This method of assessing genes with low functional variation is reminiscent of the RVIS that was used in the Epi4K paper. In fact, both scores are correlated, but there are some small differences. Some genes, for example, do not have an RVIS value, but are found to be amongst the 1003 genes with mutational constraint identified by Samocha and collaborators. One of these genes is KIAA0100, a gene found to be mutated in two patients with autism. Given the new metric developed by Samocha and collaborators, KIAA0100 now becomes a prime candidate gene.
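The idea of constraint can be illustrated with a simple deviation score. The sketch below is an assumption for illustration and not the exact score used in the paper: it expresses the gap between expected and observed variant counts in units of the Poisson standard deviation, sqrt(expected). The gene and all counts are hypothetical.

```python
import math


def constraint_score(observed, expected):
    """Signed deviation of observed from expected variant counts.
    Positive values mean fewer variants than expected (constraint).
    Illustrative only; not the paper's exact formula."""
    return (expected - observed) / math.sqrt(expected)


# Hypothetical counts for one gene (made up for illustration).
synonymous_score = constraint_score(observed=98, expected=100)
missense_score = constraint_score(observed=120, expected=250)

print(f"synonymous: {synonymous_score:.2f}")  # close to 0: as predicted
print(f"missense:   {missense_score:.2f}")    # large: depleted
```

A gene that looks like this hypothetical one carries about as many synonymous variants as predicted but far fewer missense variants, which is the pattern described for the constrained genes.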
Global mutations and candidates. When Samocha and collaborators assess the overall mutation rate in patients with autism spectrum disorder (ASD) and intellectual disability (ID) and unaffected siblings, they find a comparable mutation rate for overall mutations and missense mutations. However, the rate of loss-of-function mutations is increased. This result has not always been clear from earlier studies, but clearly stands out in this larger analysis. When they assess the genes that accumulate significantly more mutations than expected, the number of genes with significant results is actually quite low. For ASD, this is the case for DYRK1A, SCN2A, CHD8, KATNAL2, POGZ and ARID1B. If strict statistical criteria are applied, only DYRK1A and SCN2A are significant. For ID, the authors find significant results for SYNGAP1, SCN2A, STXBP1, TCF4, GRIN2A and TRIO. These genes have more mutations than expected from the overall distribution, which is the most reasonable way to assess this. As mentioned earlier, TTN does not come up as a significant gene because of its size. TTN does not accumulate more mutations than expected from the general distribution, which is a good final footnote on a gene that comes up quite frequently.
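The titin dilemma can be sketched in code. The example assumes that the expected number of de novo mutations in a gene is 2 × (number of trios) × (per-gene mutation probability per chromosome) and that counts follow a Poisson distribution. The two genes and their rates are hypothetical placeholders, not values from the paper.

```python
import math


def expected_de_novo(per_gene_rate, n_trios):
    """Expected de novo count: each child carries two copies of the gene."""
    return 2 * n_trios * per_gene_rate


def poisson_p_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via the complement of the lower tail."""
    lower = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - lower


N_TRIOS = 1000

# Hypothetical genes: rates and counts are made up for illustration.
genes = {
    "LARGE_GENE": {"rate": 1.0e-3, "observed": 4},
    "SMALL_GENE": {"rate": 3.0e-6, "observed": 3},
}

results = {}
for name, g in genes.items():
    lam = expected_de_novo(g["rate"], N_TRIOS)
    results[name] = poisson_p_at_least(g["observed"], lam)
    print(f"{name}: observed={g['observed']}, expected={lam:.3f}, p={results[name]:.2e}")
```

In this toy setup the large gene has more mutations in absolute terms, yet its count is close to what its size predicts, whereas the small gene with fewer mutations deviates strongly from expectation.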
What you need to know. Samocha and collaborators demonstrate that the number of established autism/ID genes is slowly growing and that there is sufficient evidence to implicate some of the genes mentioned above in the etiology of these disorders. Their paper demonstrates nicely that we need to consider two components when assessing the possible pathogenicity of a given gene: the mutational spectrum of this gene, which is reflected in the mutational constraint, and the overall expected rate of mutations. With the next order of magnitude in cohort size, several more genes are likely to become significant, which is a phenomenon reminiscent of GWAS.