BACKGROUND - Next-generation sequencing of individuals with genetic diseases often detects candidate rare variants in numerous genes, but determining which are causal remains challenging. We hypothesized that the spatial distribution of missense variants in protein structures contains information about function and pathogenicity that can help prioritize variants of unknown significance (VUS) and elucidate the structural mechanisms leading to disease.
RESULTS - To illustrate this approach in a clinical application, we analyzed 13 candidate missense variants in regulator of telomere elongation helicase 1 (RTEL1) identified in patients with Familial Interstitial Pneumonia (FIP). We curated pathogenic and neutral RTEL1 variants from the literature and public databases. We then used homology modeling to construct a 3D structural model of RTEL1 and mapped known variants into this structure. We next developed a pathogenicity prediction algorithm based on proximity to known disease causing and neutral variants and evaluated its performance with leave-one-out cross-validation. We further validated our predictions with segregation analyses, telomere lengths, and mutagenesis data from the homologous XPD protein. Our algorithm for classifying RTEL1 VUS based on spatial proximity to pathogenic and neutral variation accurately distinguished 7 known pathogenic from 29 neutral variants (ROC AUC = 0.85) in the N-terminal domains of RTEL1. Pathogenic proximity scores were also significantly correlated with effects on ATPase activity (Pearson r = -0.65, p = 0.0004) in XPD, a related helicase. Applying the algorithm to 13 VUS identified from sequencing of RTEL1 from patients predicted five out of six disease-segregating VUS to be pathogenic. We provide structural hypotheses regarding how these mutations may disrupt RTEL1 ATPase and helicase function.
CONCLUSIONS - Spatial analysis of missense variation accurately classified candidate VUS in RTEL1 and suggests how such variants cause disease. Incorporating spatial proximity analyses into other pathogenicity prediction tools may improve accuracy for other genes and genetic diseases.