The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
BACKGROUND - An emerging standard-of-care for long-QT syndrome uses clinical genetic testing to identify genetic variants of the KCNQ1 potassium channel. However, interpreting results from genetic testing is confounded by the presence of variants of unknown significance for which there is inadequate evidence of pathogenicity.
METHODS AND RESULTS - In this study, we curated from the literature a high-quality set of 107 functionally characterized KCNQ1 variants. Based on this data set, we completed a detailed quantitative analysis of the sequence conservation patterns of subdomains of KCNQ1 and the distribution of pathogenic variants therein. We found that conserved subdomains generally are critical for channel function and are enriched with dysfunctional variants. Using this experimentally validated data set, we trained a neural network, designated Q1VarPred, specifically for predicting the functional impact of KCNQ1 variants of unknown significance. Q1VarPred achieved an estimated Matthews correlation coefficient of 0.581 and an area under the receiver operating characteristic curve of 0.884, superior to the performance of 8 previous methods tested in parallel. Q1VarPred is publicly available as a web server at http://meilerlab.org/q1varpred.
CONCLUSIONS - Although a plethora of tools are available for making pathogenicity predictions over a genome-wide scale, previous tools fail to perform in a robust manner when applied to KCNQ1. The contrasting and favorable results for Q1VarPred suggest a promising approach, where a machine-learning algorithm is tailored to a specific protein target and trained with a functionally validated data set to calibrate informatics tools.
© 2017 American Heart Association, Inc.
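For readers unfamiliar with the two metrics reported for Q1VarPred above, the following is a minimal sketch of how a Matthews correlation coefficient and an area under the ROC curve can be computed for any binary classifier. It is not the authors' evaluation code; the function names and the toy labels and scores are assumptions chosen for illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = dysfunctional variant, 0 = benign variant.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(labels, scores))
    print(matthews_corrcoef(tp=1, tn=1, fp=1, fn=1))
```

The pairwise AUC loop is quadratic in the number of samples, which is acceptable for a data set on the order of the 107 variants described above but would be replaced by a rank-based computation for larger sets.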
The impact of inherited genetic variation on gene expression in humans is well-established. The majority of known expression quantitative trait loci (eQTLs) impact expression of local genes (cis-eQTLs). More research is needed to identify effects of genetic variation on distant genes (trans-eQTLs) and understand their biological mechanisms. One common trans-eQTL mechanism is "mediation" by a local (cis) transcript. Thus, mediation analysis can be applied to genome-wide SNP and expression data in order to identify transcripts that are "cis-mediators" of trans-eQTLs, including those "cis-hubs" involved in regulation of many trans-genes. Identifying such mediators helps us understand regulatory networks and suggests biological mechanisms underlying trans-eQTLs, both of which are relevant for understanding susceptibility to complex diseases. The multitissue expression data from the Genotype-Tissue Expression (GTEx) program provides a unique opportunity to study cis-mediation across human tissue types. However, the presence of complex hidden confounding effects in biological systems can make mediation analyses challenging and prone to confounding bias, particularly when conducted among diverse samples. To address this problem, we propose a new method: Genomic Mediation analysis with Adaptive Confounding adjustment (GMAC). It enables the search of a very large pool of variables, and adaptively selects potential confounding variables for each mediation test. Analyses of simulated data and GTEx data demonstrate that the adaptive selection of confounders by GMAC improves the power and precision of mediation analysis. Application of GMAC to GTEx data provides new insights into the observed patterns of cis-hubs and trans-eQTL regulation across tissue types.
© 2017 Yang et al.; Published by Cold Spring Harbor Laboratory Press.
Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.
© 2017 Saha et al.; Published by Cold Spring Harbor Laboratory Press.
Studies of regulatory activity and gene expression have revealed an intriguing dichotomy: There is substantial turnover in the regulatory activity of orthologous sequences between species; however, the expression level of orthologous genes is largely conserved. Understanding how distal regulatory elements, for example, enhancers, evolve and function is critical, as alterations in gene expression levels can drive the development of both complex disease and functional divergence between species. In this study, we investigated determinants of the conservation of regulatory enhancer activity for orthologous sequences across mammalian evolution. Using liver enhancers identified from genome-wide histone modification profiles in ten diverse mammalian species, we compared orthologous sequences that exhibited regulatory activity in all species (conserved-activity enhancers) to shared sequences active only in a single species (species-specific-activity enhancers). Conserved-activity enhancers have greater regulatory potential than species-specific-activity enhancers, as quantified by both the density and diversity of transcription factor binding motifs. Consistent with their greater regulatory potential, conserved-activity enhancers have greater regulatory activity in humans than species-specific-activity enhancers: They are active across more cellular contexts, and they regulate more genes than species-specific-activity enhancers. Furthermore, the genes regulated by conserved-activity enhancers are expressed in more tissues and are less tolerant of loss-of-function mutations than those targeted by species-specific-activity enhancers. These consistent results across various stages of gene regulation demonstrate that conserved-activity enhancers are more pleiotropic than their species-specific-activity counterparts. This suggests that pleiotropy is associated with the conservation of regulatory activity across mammalian evolution.
© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
We hypothesize that the relative mitochondrial copy number (MTCN) can be estimated by comparing the abundance of mitochondrial DNA to nuclear DNA reads using high-throughput sequencing data. To test this hypothesis, we examined relative MTCN across 13 breast cancer cell lines using the RT-PCR-based NovaQUANT Human Mitochondrial to Nuclear DNA Ratio Kit as the gold standard. Six distinct computational approaches were used to estimate the relative MTCN for comparison with the RT-PCR measurements. The results demonstrate that relative MTCN estimated from exome sequencing data, but not from RNA-seq data, correlates well with the RT-PCR measurements. Through analysis of copy number variants (CNVs) in The Cancer Genome Atlas, we show that the two nuclear genes used in the NovaQUANT assay to represent the nuclear genome often experience CNVs in tumor cells, calling into question the accuracy of this gold-standard method when it is applied to tumor cells.
Copyright © 2017 Elsevier Inc. All rights reserved.
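The read-ratio idea in the abstract above can be illustrated with a minimal sketch. It is not one of the six computational approaches evaluated in the study, whose details the abstract does not give; the function name, the per-contig read-count input, and the contig label "chrM" are all assumptions made for illustration.

```python
def relative_mtcn(read_counts, mito_contig="chrM"):
    """Ratio of mitochondrial reads to nuclear reads for one sample.

    read_counts maps contig name -> number of aligned reads (hypothetical
    input, e.g. tallied from an alignment index). The result is a relative
    quantity, meaningful only when compared across samples processed the
    same way.
    """
    mito = read_counts.get(mito_contig, 0)
    nuclear = sum(n for name, n in read_counts.items() if name != mito_contig)
    if nuclear == 0:
        raise ValueError("no nuclear reads; ratio is undefined")
    return mito / nuclear


if __name__ == "__main__":
    # Hypothetical read counts for two samples.
    sample_a = {"chr1": 600, "chr2": 400, "chrM": 200}
    sample_b = {"chr1": 500, "chr2": 500, "chrM": 50}
    ratio_a = relative_mtcn(sample_a)
    ratio_b = relative_mtcn(sample_b)
    # Fold difference in relative MTCN between the two samples.
    print(ratio_a / ratio_b)
```

Using all nuclear reads in the denominator, rather than a small number of reference genes, is one way to reduce sensitivity to the copy number variants in individual nuclear genes that the abstract identifies as a concern.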
Over the past 20 years, high-penetrance pathogenic mutations in genes BRCA1, BRCA2, TP53, PTEN, STK11 and CDH1 and moderate-penetrance mutations in genes CHEK2, ATM, BRIP1, PALB2, RAD51C, RAD50 and NBN have been identified for breast cancer. In this study, we investigated whether there are additional variants in these 13 genes associated with breast cancer among women of Asian ancestry. We analyzed up to 654 single nucleotide polymorphisms (SNPs) from 6269 cases and 6624 controls of Asian descent included in the Breast Cancer Association Consortium (BCAC), and up to 236 SNPs from 5794 cases and 5529 controls included in the Shanghai Breast Cancer Genetics Study (SBCGS). We found three missense variants with minor allele frequency (MAF) <0.05: rs80358978 (Gly2508Ser), rs80359065 (Lys2729Asn) and rs11571653 (Met784Val) in the BRCA2 gene, showing statistically significant associations with breast cancer risk, with P-values of 1.2 × 10⁻⁴, 1.0 × 10⁻³ and 5.0 × 10⁻³, respectively. In addition, we found four low-frequency variants (rs8176085, rs799923, rs8176173 and rs8176258) in the BRCA1 gene, one common variant in the CHEK2 gene (rs9620817), and one common variant in the PALB2 gene (rs13330119) associated with breast cancer risk at P < 0.01. Our study identified several new risk variants in BRCA1, BRCA2, CHEK2, and PALB2 genes in relation to breast cancer risk in Asian women. These results provide further insights that, in addition to the high/moderate penetrance mutations, other low-penetrance variants in these genes may also contribute to breast cancer risk.
© The Author 2017. Published by Oxford University Press. All rights reserved.
The potential impact of using human genetic data linked to longitudinal electronic medical records on drug development is extraordinary; however, the practical application of these data necessitates some organizational innovations. Vanderbilt has created resources such as an easily queried database of >2.6 million de-identified electronic health records linked to BioVU, which is a DNA biobank with more than 230,000 unique samples. To ensure these data are used to maximally benefit and accelerate both de novo drug discovery and drug repurposing efforts, we created the Accelerating Drug Development and Repurposing Incubator, a multidisciplinary think tank of experts in various therapeutic areas within both basic and clinical science as well as experts in legal, business, and other operational domains. The Incubator supports a diverse pipeline of drug indication finding projects, leveraging the natural experiment of human genetics.
To identify novel single nucleotide polymorphisms (SNPs) associated with venous thromboembolism (VTE) in African-Americans (AAs), we performed a genome-wide association study (GWAS) of VTE in AAs using the Electronic Medical Records and Genomics (eMERGE) Network, comprised of seven sites each with DNA biobanks (total ~39,200 unique DNA samples) with genome-wide SNP data (imputed to 1000 Genomes Project cosmopolitan reference panel) and linked to electronic health records (EHRs). Using a validated EHR-driven phenotype extraction algorithm, we identified VTE cases and controls and tested for an association between each SNP and VTE using unconditional logistic regression, adjusted for age, sex, stroke, site-platform combination and sickle cell risk genotype. Among 393 AA VTE cases and 4,941 AA controls, three intragenic SNPs reached genome-wide significance: LEMD3 rs138916004 (OR=3.2; p=1.3E-08), LY86 rs3804476 (OR=1.8; p=2E-08) and LOC100130298 rs142143628 (OR=4.5; p=4.4E-08); all three SNPs were validated using internal cross-validation, parametric bootstrap and meta-analysis methods. LEMD3 rs138916004 and LOC100130298 rs142143628 are only present in Africans (1000G data). LEMD3 showed significant differential expression in both NCBI Gene Expression Omnibus (GEO) and the Mayo Clinic gene expression data, LOC100130298 showed significant differential expression only in the GEO expression data, and LY86 showed significant differential expression only in the Mayo expression data. LEMD3 encodes an antagonist of TGF-β-induced cell proliferation arrest. LY86 encodes MD-1, which down-regulates the pro-inflammatory response to lipopolysaccharide; LY86 variation was previously associated with VTE in white women; LOC100130298 is a non-coding RNA gene with unknown regulatory activity in gene expression and epigenetics.
Emerging scientific endeavors are creating big data repositories of data from millions of individuals. Sharing data in a privacy-respecting manner could lead to important discoveries, but high-profile demonstrations show that links between de-identified genomic data and named persons can sometimes be reestablished. Such re-identification attacks have focused on worst-case scenarios and spurred the adoption of data-sharing practices that unnecessarily impede research. To mitigate concerns, organizations have traditionally relied upon legal deterrents, like data use agreements, and are considering suppressing or adding noise to genomic variants. In this report, we use a game theoretic lens to develop more effective, quantifiable protections for genomic data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources, not adversaries with unlimited means. We demonstrate this approach via a new public resource with genomic summary data from over 8,000 individuals, the Sequence and Phenotype Integration Exchange (SPHINX), and show that risks can be balanced against utility more effectively than with traditional approaches. We further show the generalizability of this framework by applying it to other genomic data collection and sharing endeavors. Recognizing that such models are dependent on a variety of parameters, we perform extensive sensitivity analyses to show that our findings are robust to their fluctuations.
Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Proteins involved in tumor cell migration can potentially serve as markers of invasive disease. Activated Leukocyte Cell Adhesion Molecule (ALCAM) promotes adhesion, while shedding of its extracellular domain is associated with migration. We hypothesized that shed ALCAM in biofluids could be predictive of progressive disease. ALCAM expression in tumor (n = 198) and shedding in biofluids (n = 120) were measured in two separate VUMC bladder cancer cystectomy cohorts by immunofluorescence and enzyme-linked immunosorbent assay, respectively. The primary outcome measure was accuracy of predicting 3-year overall survival (OS) with shed ALCAM compared to standard clinical indicators alone, assessed by multivariable Cox regression and concordance-indices. Validation was performed by internal bootstrap, a cohort from a second institution (n = 64), and treatment of missing data with multiple-imputation. While ALCAM mRNA expression was unchanged, histological detection of ALCAM decreased with increasing stage (P = 0.004). Importantly, urine ALCAM was elevated 17.0-fold (P < 0.0001) above non-cancer controls, correlated positively with tumor stage (P = 0.018), was an independent predictor of OS after adjusting for age, tumor stage, lymph-node status, and hematuria (HR, 1.46; 95% CI, 1.03-2.06; P = 0.002), and improved prediction of OS by 3.3% (concordance-index, 78.5% vs. 75.2%). Urine ALCAM remained an independent predictor of OS after accounting for treatment with Bacillus Calmette-Guerin, carcinoma in situ, lymph-node dissection, lymphovascular invasion, urine creatinine, and adjuvant chemotherapy (HR, 1.10; 95% CI, 1.02-1.19; P = 0.011). In conclusion, shed ALCAM may be a novel prognostic biomarker in bladder cancer, although prospective validation studies are warranted. These findings demonstrate that markers reporting on cell motility can act as prognostic indicators.
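The concordance-index comparison reported above (78.5% vs. 75.2%) can be made concrete with a minimal sketch of a Harrell-style concordance index for right-censored survival data. This is an illustration of the general metric, not the study's analysis code; the function name and the toy follow-up data are assumptions.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style concordance index for right-censored survival data.

    times:       follow-up time per subject
    events:      1 if death was observed, 0 if the subject was censored
    risk_scores: higher score = higher predicted risk (shorter survival)

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter follow-up time than subject j. The pair is concordant
    when the subject who died earlier also has the higher risk score; tied
    scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical data: months of follow-up, event indicator, model risk score.
    times = [6, 14, 30, 36]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.7, 0.2, 0.4]
    print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so an improvement of a few percentage points from adding a marker to standard clinical indicators is read on that scale.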