GABRB3, 15q dups, and CNVs from exomes

GABAergic. Let’s start with a provocative statement. There is a single gene that may explain more cases of Lennox-Gastaut Syndrome (LGS) and Infantile Spasms (IS) than you would expect, rivalling SCN1A as the most common gene found in this group of patients. It’s a gene you are probably aware of but may think of as a very rare finding. In a recent publication in Annals of Neurology, the Epi4K consortium reported their analysis of copy number variations derived from exome data. Combining de novo mutations and copy number variations points to GABRB3 as a major player in LGS and IS, probably explaining more than 2% of patients. Let’s find out about the genetic twilight zone, strategies to obtain structural variants from exomes, and the re-emergence of the 15q duplication syndrome.

Into the twilight zone. With all this technology around, you would naturally assume that our genome is well covered. However, this is only partially true. We are very good at detecting larger structural genomic variants or copy number variations, i.e. duplications or deletions of genomic material, with genome-wide arrays including Single Nucleotide Polymorphism (SNP) arrays or array Comparative Genomic Hybridization (CGH). We’re also very good at detecting sequence variants. This is what gene panels and exomes do. However, between changes of a single base pair and copy number variations large enough for arrays to detect (larger than 50,000-100,000 base pairs) lies a grey zone that is difficult to probe. It’s the genetic twilight zone of small deletions and duplications, to which we are virtually blind. To put it differently, even if you use both exome sequencing and array CGH, you may miss a single-exon deletion in common genes like SCN1A. Some commercial gene panels include exon-level deletion and duplication testing that addresses this shortcoming, but this technology is not available on a genome-wide scale. Nobody knows what is hidden out there or how much small deletions and duplications contribute to disease risk. However, there are some strategies.

UCSC genome browser graphic of the de novo duplications in the 15q region in patients with Infantile Spasms and Lennox-Gastaut Syndrome (blue) and de novo mutations in GABRB3 in the same cohort. GABRB3 is the only gene in the region that carries de novo mutations in the Epi4K cohort. Figure modified from UCSC genome browser.

CNVs from exomes. Detecting structural genomic variants from SNP arrays was a major breakthrough in genetic research. SNP arrays cover hundreds of thousands of pre-defined common variants in the human genome and were the main tool for genome-wide association studies (GWAS). However, as we grew increasingly comfortable with these arrays, we realized that we could use them not only to determine which alleles an individual carried at each SNP, but also to assess the signal intensity. When we combined intensity information from hundreds, if not thousands, of neighboring SNPs, we were able to detect deletions and duplications. This technical trick proved to be immensely valuable. It allowed the large GWAS datasets to be assessed for structural variants without repeating genotyping, and it led to the discovery of the recurrent microdeletions in human epilepsy. Basically, this is how we found the 15q13.3 microdeletion. Taken one step further, why shouldn’t the same strategy work for exomes as well? Why not use read depth, i.e. how often a given base pair has been sequenced within an exome, to look for structural genomic variants? There are various studies and strategies out there to accomplish this task. However, using exomes for CNV discovery has turned out to be much more difficult than we initially thought. The current best strategy is to query exomes with bioinformatics tools such as CoNIFER and to validate the findings with conventional methods. This is what the researchers of the Epi4K consortium did to look at deletions and duplications within their cohort.
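To make the read-depth idea concrete, here is a minimal Python sketch of one simplified way to call exon-level CNVs: normalize read counts to RPKM, standardize each exon across samples so that exon-specific capture differences cancel out, and flag runs of consecutive exons with unusually high or low depth. This is not CoNIFER’s implementation; CoNIFER additionally uses singular value decomposition to remove systematic noise before calling, a step omitted here. The function names, the z-score threshold of 1.5, the minimum of three exons and the toy data are all illustrative assumptions.

```python
from statistics import mean, pstdev

def rpkm(counts, exon_lengths_bp, total_reads):
    """Reads per kilobase of exon per million mapped reads.

    counts[s][e] is the number of reads in exon e for sample s.
    """
    return [
        [c / (exon_lengths_bp[e] / 1e3) / (total_reads[s] / 1e6)
         for e, c in enumerate(row)]
        for s, row in enumerate(counts)
    ]

def zscores_across_samples(values):
    """Standardize each exon across samples (column-wise z-scores)."""
    n_exons = len(values[0])
    z = [[0.0] * n_exons for _ in values]
    for e in range(n_exons):
        column = [row[e] for row in values]
        mu, sd = mean(column), pstdev(column)
        for s, v in enumerate(column):
            # An exon with identical depth in every sample carries no signal.
            z[s][e] = (v - mu) / sd if sd > 0 else 0.0
    return z

def call_runs(z_row, threshold=1.5, min_exons=3):
    """Return (first_exon, last_exon, 'dup' or 'del') for runs of consecutive
    exons whose z-scores share a sign and reach the threshold (hypothetical rule)."""
    calls, i, n = [], 0, len(z_row)
    while i < n:
        if abs(z_row[i]) < threshold:
            i += 1
            continue
        sign = 1 if z_row[i] > 0 else -1
        j = i
        while j + 1 < n and sign * z_row[j + 1] >= threshold:
            j += 1
        if j - i + 1 >= min_exons:
            calls.append((i, j, "dup" if sign > 0 else "del"))
        i = j + 1
    return calls

# Toy data (hypothetical): 10 samples x 12 exons of equal length and depth,
# with sample 3 showing doubled depth over exons 4-7.
counts = [[100.0] * 12 for _ in range(10)]
for e in range(4, 8):
    counts[3][e] = 200.0
z = zscores_across_samples(rpkm(counts, [1000] * 12, [1e6] * 10))
print(call_runs(z[3]))  # [(4, 7, 'dup')]
```

Requiring several adjacent exons to agree is what lets a noisy per-exon signal support a call, much like combining neighboring SNPs on arrays. As the text stresses, calls like this one would still need confirmation with a conventional method.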

CNVs. The numbers from the recent Epi4K publication show why validation of exome-derived CNVs is necessary. The validation rate of CNVs predicted from exome data ranged between 24% and 66%, depending on the type of CNV. Given this variability, the Epi4K researchers subjected their candidate CNVs to a stringent validation process and were able to detect and validate 18 de novo CNVs in 17 patients. None of these patients had another causative de novo mutation on exome sequencing. The only recurrent de novo CNV was a duplication on chromosome 15q, found in three individuals (figure). In addition, the authors found a deletion and a duplication involving SCN1A and SCN2A, a MAGI2 deletion, a 9p terminal deletion and a 14q23 deletion involving GPHN, the gene coding for gephyrin. Several de novo CNVs were considered to be of unknown significance, including an 8p23 deletion involving MCPH1, a gene associated with a recessive microcephaly syndrome, and a 17q12 deletion.
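The kind of filtering described here, keeping only candidate CNVs that an orthogonal method confirms and that are absent in both parents, can be sketched in Python. This is a hedged illustration, not the Epi4K pipeline: the `CNV` record, the 50% reciprocal-overlap rule for deciding whether a parent carries the same event, and every coordinate in the example are assumptions chosen for clarity.

```python
from typing import NamedTuple

class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "del" or "dup"

def reciprocal_overlap(a, b):
    """Smaller of the two fractions of each CNV covered by their intersection."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))

def validated_de_novo(child, mother, father, confirmed, min_overlap=0.5):
    """Child CNVs confirmed by an orthogonal assay and not seen in either parent."""
    parental = mother + father
    return [
        c for c in child
        if c in confirmed  # keep only calls an independent method confirmed
        and not any(p.kind == c.kind and reciprocal_overlap(c, p) >= min_overlap
                    for p in parental)  # a matching parental CNV means inherited
    ]

# Hypothetical trio: one duplication absent in both parents, one deletion
# inherited from the mother, and one unconfirmed (likely false-positive) call.
dup = CNV("chr15", 1_000, 5_000, "dup")
inherited = CNV("chr2", 100, 400, "del")
noise = CNV("chr7", 2_000, 2_600, "del")
child = [dup, inherited, noise]
mother = [CNV("chr2", 120, 420, "del")]
father = []
confirmed = {dup, inherited}
print(validated_de_novo(child, mother, father, confirmed))
# [CNV(chrom='chr15', start=1000, end=5000, kind='dup')]
```

With validation rates as low as 24% for some CNV classes, the confirmation step in this sketch is what removes most spurious calls; the parental comparison then separates de novo events from inherited ones.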

15q duplication syndrome. The current study sheds new light on the 15q duplication syndrome, a long-known cytogenetic duplication syndrome that arises from the particular genomic breakpoint architecture of chromosome 15. With the current study, the main candidate gene in this region shifts from UBE3A (Angelman Syndrome) to GABRB3, which codes for the beta 3 subunit of the GABA-A receptor. This finding also puts new emphasis on the need to combine CNV and exome data, given that GABRB3 is the only gene in the overlapping 15q region with de novo mutations in the same dataset. In total, seven patients in the Epi4K cohort have either de novo mutations in this gene or de novo duplications spanning it, accounting for more than 2% of the overall cohort.
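The logic of combining CNV and exome data can be sketched in Python: take the region shared by all recurrent duplications, keep only the genes inside it that also carry de novo point mutations, and then count the patients with either kind of hit in the remaining candidate. Gene names, coordinates and patient IDs below are hypothetical placeholders, not the real 15q coordinates or Epi4K data.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

def overlaps(a, b):
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def spans(outer, inner):
    """True if `outer` fully contains `inner`."""
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start and inner.end <= outer.end)

def shared_region(intervals):
    """Region covered by every interval (smallest region of overlap), or None."""
    if len({iv.chrom for iv in intervals}) != 1:
        return None
    start = max(iv.start for iv in intervals)
    end = min(iv.end for iv in intervals)
    return Interval(intervals[0].chrom, start, end) if start < end else None

def candidate_genes(duplications, genes, genes_with_de_novo_snvs):
    """Genes in the shared duplicated region that also carry de novo point mutations."""
    region = shared_region(duplications)
    if region is None:
        return []
    return sorted(name for name, iv in genes.items()
                  if overlaps(iv, region) and name in genes_with_de_novo_snvs)

def patients_with_hit(gene_iv, snv_carriers, cnv_calls):
    """Patients with a de novo point mutation in the gene or a de novo CNV spanning it."""
    hits = set(snv_carriers)
    hits |= {pid for pid, iv in cnv_calls if spans(iv, gene_iv)}
    return hits

# Hypothetical example: three overlapping duplications and three genes.
dups = {"p3": Interval("chr15", 100, 900),
        "p4": Interval("chr15", 200, 1000),
        "p5": Interval("chr15", 150, 800)}
genes = {"GENE_A": Interval("chr15", 50, 180),
         "GENE_B": Interval("chr15", 300, 400),
         "GENE_C": Interval("chr15", 500, 600)}
snv_carriers = {"GENE_B": {"p1", "p2"}}

candidates = candidate_genes(list(dups.values()), genes, set(snv_carriers))
print(candidates)  # ['GENE_B']
hits = patients_with_hit(genes["GENE_B"], snv_carriers["GENE_B"], dups.items())
print(sorted(hits))  # ['p1', 'p2', 'p3', 'p4', 'p5']
```

Here GENE_C lies inside the shared duplicated region but has no de novo point mutations, so only GENE_B survives. Counting patients as a set union avoids double-counting anyone who carries both kinds of hit.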

This is what you need to know. 15q is back. The 15q region harbors more genes relevant to the genetic morbidity of IS and LGS than just those in the 15q13.3 microdeletion. In fact, the 15q13.3 microdeletion was not even found in this cohort, supporting the notion that it may be associated with generalized epilepsies and intellectual disability rather than with a broad range of epilepsies. The frequency of pathogenic CNVs in this cohort of classic epileptic encephalopathies approaches 3%, highlighting the fact that even in the era of exome sequencing, structural genomic variants remain an important source of genetic morbidity. Together with previous findings, this means that in more than 15% of patients the epilepsy can be confidently explained by de novo mutations found by exome sequencing or by structural genomic variants.

Ingo Helbig is a child neurologist and epilepsy genetics researcher working at the Children’s Hospital of Philadelphia (CHOP), USA.