Improving diagnostic yield in rare diseases through phenotype-driven approaches

NDD. Family-based (trio) exome sequencing has become the standard method for identifying the genetic etiologies of neurodevelopmental disorders. De novo variants account for the majority of pathogenic genetic findings, although the overall landscape of genetic disorders is highly heterogeneous. In a recently published study, the authors assessed variant classification to identify new molecular diagnoses and the factors that influence the likelihood of receiving a diagnosis. The study reported a diagnostic yield of over 41% and highlighted 60 new genes associated with developmental disorders. The authors also emphasized the importance of structured and detailed phenotypic information for improving variant interpretation. This blog post briefly reviews their publication in the context of improving diagnostic yield in rare diseases through a phenotype-driven approach.

Figure 1. Distribution of the number of diagnoses per gene from the DDG2P database, as reported by Wright et al. Of 825 genes, more than 600 (75%) have fewer than 5 reported diagnoses each. The genes with more than 50 reported diagnoses (ANKRD11, KMT2A, ARID1B, and DDX3X) represent the most frequently diagnosed genetic causes of developmental disorders.

Computational approaches for classifying pathogenicity. The authors analyzed exome sequencing data from 13,610 individuals together with standardized phenotypic data from the Deciphering Developmental Disorders (DDD) study. Diagnostic variants were identified using several approaches. First, variants in genes previously reported in the DDG2P database were prioritized. In addition, a Bayesian framework was used to predict variant pathogenicity, and phenotype-based likelihoods were used to combine the predicted variant classification with known gene-disease models. Figure 1 summarizes how many genes fall into each range of reported diagnoses per gene.
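To give a feel for how a Bayesian framework can fold phenotype-based evidence into a pathogenicity estimate, here is a minimal sketch of Bayes' rule in odds form. It is not the model used in the study: the function name, the prior, and the likelihood ratios are all hypothetical, and the evidence sources are assumed to be independent.

```python
def posterior_pathogenicity(prior, likelihood_ratios):
    """Update a prior probability of pathogenicity with likelihood ratios.

    Generic Bayes' rule in odds form; NOT the study's actual model.
    Each likelihood ratio is assumed to come from an independent line of
    evidence (for example, a variant-level score or a phenotype match).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)          # convert probability to odds
    for lr in likelihood_ratios:
        if lr <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        odds *= lr                        # each piece of evidence scales the odds
    return odds / (1.0 + odds)            # convert odds back to probability


# Hypothetical numbers chosen only for illustration.
variant_only = posterior_pathogenicity(0.1, [9.0])
with_phenotype = posterior_pathogenicity(0.1, [9.0, 4.0])
print(f"variant evidence only: {variant_only:.2f}")      # 0.50
print(f"plus phenotype match:  {with_phenotype:.2f}")    # 0.80
```

In this toy example, a phenotype match that is four times more likely under the disease model moves the variant from an even chance to a probability of 0.80, which is the kind of shift that can reclassify an otherwise uncertain variant.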

Diagnostic yield and factors affecting it. The authors identified 19,285 potentially pathogenic variants in the probands, which clinicians then evaluated. A total of 60 new genes associated with developmental disorders were discovered in the study and added to the existing variant database. The study reported that 41% of the probands in the cohort received either a predicted or a clinical diagnosis. Using the computational approach to variant pathogenicity, more than 1,000 variants of uncertain significance (VUS) were classified as pathogenic, increasing the overall yield by approximately 15%. As expected, de novo variants in OMIM genes and variants inherited from mosaic parents accounted for about 80% of the reported variants. The authors also identified several key factors that increased the probability of receiving a diagnosis, such as recruitment as a family trio, the presence of developmental delay in the proband, clinical features indicative of a syndrome, and being the only affected family member. The authors further estimated that numerous genes and diagnoses remain to be identified, both by improving the evaluation of incomplete penetrance and by discovering novel associations between genes and developmental disorders.
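For readers who want to see the arithmetic behind a yield figure, the sketch below computes diagnostic yield and the gain from reclassified variants for a hypothetical cohort. The function names and all counts are illustrative assumptions, not data from the study. Both the absolute and the relative gain are shown, since a percentage increase in yield can be read either way.

```python
def diagnostic_yield(n_diagnosed, n_probands):
    """Fraction of probands who received a diagnosis."""
    if n_probands <= 0:
        raise ValueError("n_probands must be positive")
    if not 0 <= n_diagnosed <= n_probands:
        raise ValueError("n_diagnosed must be between 0 and n_probands")
    return n_diagnosed / n_probands


def yield_gain(base_diagnosed, extra_diagnosed, n_probands):
    """Return (absolute, relative) gain in yield from extra diagnoses.

    absolute: percentage-point change expressed as a fraction of the cohort
    relative: extra diagnoses as a fraction of the baseline diagnoses
    """
    if base_diagnosed <= 0:
        raise ValueError("base_diagnosed must be positive")
    before = diagnostic_yield(base_diagnosed, n_probands)
    after = diagnostic_yield(base_diagnosed + extra_diagnosed, n_probands)
    return after - before, extra_diagnosed / base_diagnosed


# Hypothetical cohort: 1,000 probands, 350 diagnosed at baseline,
# 50 more diagnosed after computational reclassification.
absolute, relative = yield_gain(350, 50, 1000)
print(f"yield after reclassification: {diagnostic_yield(400, 1000):.1%}")
print(f"absolute gain: {absolute:.1%}, relative gain: {relative:.1%}")
```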

This is what you need to know. The study by Wright and collaborators involved large-scale sequencing in developmental disorders. In over 13,000 families, the study reported a diagnostic yield of 41% after employing a computational approach to aid variant classification. Additionally, the authors identified 60 new genes associated with developmental disorders. The study highlights the genetic heterogeneity of developmental disorders and underscores the importance of combining detailed phenotyping with a genome-wide approach, using diverse variant detection and filtering processes, to identify new diagnoses in existing datasets.

Shiva Ganesan

Shiva is a bioinformatics scientist in the Helbig lab.