What is the genomic blind spot?

Beneath the surface. Even though the comprehensiveness of next generation sequencing technologies may suggest that we can capture all the variation in the human genome, there is an entire gray zone of small rearrangements that current technologies are blind to. In a recent publication in the American Journal of Human Genetics, Brand and collaborators use a novel technology to explore the twilight zone of genomics, the realm of small deletions, duplications, inversions and cryptic complex rearrangements.

The blind spot. Genetic variation in the human genome occurs on a spectrum from small to large rearrangements. Historically, we have approached the analysis of the human genome from both extremes: with the traditional technologies, we were able to reliably assess single base variation and alterations of entire or partial chromosomes. Everything in between, however, was considered dark matter, largely inaccessible to existing technologies up until the early 2000s. Newer methods including array comparative genomic hybridization (aCGH) and next generation sequencing (NGS) made it possible to narrow this gap. We are now able to reliably assess structural genomic variations including duplications and deletions down to a resolution of 10-20 kb, depending on the platform. NGS technologies are also getting better and better at predicting indels, small insertions and deletions spanning several base pairs. However, there is still an enormous gap between the upper limit of what NGS can find and the lower limit of what array CGH can detect. This is the genomic blind spot.

The genomic blind spot. There is a gray zone between the lower level of variation that can be detected with microarrays and the upper level of what conventional NGS technologies such as exome sequencing can detect. Variants of a certain size are difficult to assess using these technologies, and modification of whole-genome sequencing approaches may capture part of the variation hidden in this gray zone.

Inversions, complex rearrangements. In addition to the obvious gap that is not covered by either technology in terms of variant size, there are other variations in the human genome that we cannot reliably detect. Inversions, variations where part of a chromosome is simply turned around, are basically invisible unless there are obvious changes at the breakpoints. More complex rearrangements with multiple breakpoints are not detectable either. Even though normal sequencing and unremarkable exon-level deletion/duplication screening may suggest that a given gene is “intact” in the human genome, it might still be the case that parts of the gene are inverted. Unless the breakpoints are within the coding region, we would have no way of telling this.

The study. By using a modified next-generation whole genome sequencing approach referred to as large-insert jumping libraries, Brand and collaborators manage to explore small deletions and duplications, insertions and more complex rearrangements on a genome-wide basis. They focus on patients with neurodevelopmental disorders, trying to maximize the potential of finding possibly explanatory variation. Genetic variation outside the detection threshold of microarray technologies is referred to as “cryptic” variation. The amount of cryptic genetic variation identified by Brand and collaborators is remarkable. The authors identify ~96 cryptic structural variants per individual at high confidence and an additional 111 small structural variants below 6 kb at lower confidence. In total, they find 3.8 Mb of rearranged sequence per genome. In addition, these alterations often affect the protein-coding part of the human genome. On average, they identify 42 additional loss-of-function mutations per individual. Interestingly, 80% of these variants would be cryptic to conventional microarrays. In summary, the study by Brand and collaborators offers us a sneak peek at the plethora of cryptic genomic variation in the human genome. Their study suggests that there is much more cryptic variation than we would have initially thought, including tandem duplications, balanced inversions, intrachromosomal and interchromosomal insertions, and complex rearrangements with multiple breakpoints.

The IQGAP1 story. The authors manage to identify a balanced inversion of 5.2 Mb on chromosome 15q25-26 in a single patient with autism and intellectual disability. Part of the genomic material that includes the IQGAP1 gene and the AKAP13 gene is turned around, disrupting both genes. While the role of AKAP13 is not known, IQGAP1 is an interesting candidate gene, as it is known to be a regulator of dendritic spine function. Also, with respect to single base pair variation, it is among the top 5% of genes most intolerant to functional variation. This cryptic inversion results in two small duplications at the breakpoints that are visible on conventional array CGH. Accordingly, it was possible for the authors to screen a large cohort of individuals genotyped within a diagnostic setting for a similar duplication signature, which represents the “echo” of this inversion on array CGH. Interestingly, the authors found 7 additional individuals in 30,000 cases with neurodevelopmental disorders compared to no cases in 12,000 controls. Brand and collaborators may have stumbled upon a novel genetic variant for autism and intellectual disability.
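To get a feel for how strong the 7 in 30,000 versus 0 in 12,000 observation is, here is a minimal sketch of a one-sided Fisher's exact test on those counts. This is not the analysis performed by the authors; the function name `one_sided_fisher_p` is made up for illustration, and the calculation assumes that cases and controls are otherwise comparable.

```python
from math import comb


def one_sided_fisher_p(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Returns the probability that at least `case_carriers` of all observed
    carriers fall among the cases if carrier status is unrelated to
    case/control status.
    """
    total_carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    upper = min(total_carriers, n_cases)
    # Count the ways to place the carriers so that x of them are cases,
    # for every x at least as extreme as the observed value.
    favourable = sum(
        comb(n_cases, x) * comb(n_controls, total_carriers - x)
        for x in range(case_carriers, upper + 1)
    )
    return favourable / comb(total, total_carriers)


# Counts quoted in the text: 7 carriers in 30,000 cases, 0 in 12,000 controls.
p_value = one_sided_fisher_p(7, 30_000, 0, 12_000)
print(f"one-sided p = {p_value:.3f}")
```

With these counts the p-value comes out just under 0.1. Because cases outnumber controls by more than two to one, even seven carriers found exclusively in cases are only suggestive, which is why larger control cohorts matter for variants this rare.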

This is what you need to know. There is a gray zone of genomic variation that is invisible to us with conventional genetic and genomic technologies. The study by Brand and collaborators and other previous studies find that this twilight zone harbors an abundance of genomic variation, some of which may be relevant to human disease. As with exome sequencing and CNV analysis, we will require large case and control samples to tell causative variation from random genetic variation. Independent of this, the study by Brand and collaborators reminds us that there are significant parts of the human genome that we don’t understand.

Ingo Helbig

Child Neurology Fellow and epilepsy genetics researcher at the Children’s Hospital of Philadelphia (CHOP), USA and Department of Neuropediatrics, Kiel, Germany
