The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
Although a subset of clear cell renal cell carcinoma (ccRCC) patients respond to immune checkpoint blockade (ICB), predictors of response remain uncertain. We investigated whether abnormal expression of endogenous retroviruses (ERVs) in tumors is associated with local immune checkpoint activation (ICA) and response to ICB. Twenty potentially immunogenic ERVs (πERVs) were identified in ccRCC in The Cancer Genome Atlas data set, and tumors were stratified into 3 groups based on their expression levels. πERV-high ccRCC tumors showed increased immune infiltration, checkpoint pathway upregulation, and higher CD8+ T cell fraction in infiltrating leukocytes compared with πERV-low ccRCC tumors. Similar results were observed in ER+/HER2- breast, colon, and head and neck squamous cell cancers. ERV expression correlated with expression of genes associated with histone methylation and chromatin regulation, and πERV-high ccRCC was enriched in BAP1 mutant tumors. ERV3-2 expression correlated with ICA in 11 solid cancers, including the 4 named above. In a small retrospective cohort of 24 metastatic ccRCC patients treated with single-agent PD-1/PD-L1 blockade, ERV3-2 expression in tumors was significantly higher in responders compared with nonresponders. Thus, abnormal expression of πERVs is associated with ICA in several solid cancers, including ccRCC, and ERV3-2 expression is associated with response to ICB in ccRCC.
Motivation - Phenome-wide association studies (PheWAS) have been used to discover many genotype-phenotype relationships and have the potential to identify therapeutic and adverse drug outcomes using longitudinal data within electronic health records (EHRs). However, the statistical methods for PheWAS applied to longitudinal EHR medication data have not been established.
Results - In this study, we developed methods to address two challenges faced with reuse of EHR for this purpose: confounding by indication, and low exposure and event rates. We used Monte Carlo simulation to assess propensity score (PS) methods, focusing on two of the most commonly used methods, PS matching and PS adjustment, to address confounding by indication. We also compared two logistic regression approaches (the default of Wald versus Firth's penalized maximum likelihood, PML) to address complete separation due to sparse data with low exposure and event rates. PS adjustment resulted in greater power than PS matching, while controlling Type I error at 0.05. The PML method provided reasonable P-values, even in cases with complete separation, with well controlled Type I error rates. Using PS adjustment and the PML method, we identify novel latent drug effects in pediatric patients exposed to two common antibiotic drugs, ampicillin and gentamicin.
Availability and implementation - R packages PheWAS and EHR are available at https://github.com/PheWAS/PheWAS and at CRAN (https://www.r-project.org/), respectively. The R script for data processing and the main analysis is available at https://github.com/choileena/EHR.
Supplementary information - Supplementary data are available at Bioinformatics online.
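The complete-separation problem described in the Results can be illustrated with a minimal Python sketch. This is not the authors' R code. It covers only the simplest case of one binary exposure and one binary outcome, where Firth's penalized estimate of the log odds ratio coincides with adding 0.5 to every cell of the 2×2 table, while the ordinary maximum likelihood estimate has no finite value once a cell is zero. The function names and the counts are hypothetical.

```python
import math

def mle_log_odds_ratio(a, b, c, d):
    """Ordinary maximum likelihood log odds ratio for a 2x2 table.

    a, b: events and non-events among the exposed
    c, d: events and non-events among the unexposed
    Returns None when a zero cell leaves no finite estimate.
    """
    if 0 in (a, b, c, d):
        return None
    return math.log((a * d) / (b * c))

def firth_log_odds_ratio(a, b, c, d):
    """Firth-type estimate for the saturated 2x2 case (0.5 added to each cell)."""
    return math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))

# Hypothetical sparse table: no events at all among the unexposed.
table = (4, 16, 0, 980)
print(mle_log_odds_ratio(*table))              # None
print(round(firth_log_odds_ratio(*table), 2))  # finite value
```

With covariates or a propensity score in the model, the penalized fit needs an iterative solver, so this shortcut should be read only as intuition for why the penalty keeps estimates finite.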
The latent structure of schizotypy and psychosis-spectrum symptoms remains poorly understood. Furthermore, molecular genetic substrates are poorly defined, largely due to the substantial resources required to collect rich phenotypic data across diverse populations. Sample sizes of phenotypic studies are often insufficient for advanced structural equation modeling approaches. In the last 50 years, efforts in both psychiatry and psychological science have moved toward (1) a dimensional model of psychopathology (eg, the current Hierarchical Taxonomy of Psychopathology [HiTOP] initiative), (2) an integration of methods and measures across traits and units of analysis (eg, the RDoC initiative), and (3) powerful, impactful study designs maximizing sample size to detect subtle genomic variation relating to complex traits (the Psychiatric Genomics Consortium [PGC]). These movements are important to the future study of the psychosis spectrum, and to resolving heterogeneity with respect to instrument and population. The International Consortium of Schizotypy Research is composed of over 40 laboratories in 12 countries, and to date, members have compiled a body of schizotypy- and psychosis-related phenotype data from more than 30,000 individuals. It has become apparent that compiling data into a protected, relational database and crowdsourcing analytic and data science expertise will result in significant enhancement of current research on the structure and biological substrates of the psychosis spectrum. The authors present a data-sharing infrastructure similar to that of the PGC, and a resource-sharing infrastructure similar to that of HiTOP. This report details the rationale and benefits of the phenotypic data collective and presents an open invitation for participation.
Tract-based spatial statistics (TBSS) is a popular technique for voxel-wise statistical analysis that aims to improve the sensitivity and interpretability of multi-subject diffusion imaging studies of white matter. With the advent of advanced diffusion MRI models such as neurite orientation dispersion and density imaging (NODDI), it is of interest to analyze microstructural changes within gray matter (GM). A recent study proposed using NODDI in gray matter based spatial statistics (N-GBSS) to perform voxel-wise statistical analysis of GM microstructure. N-GBSS adapts TBSS by skeletonizing the GM and projecting diffusion metrics to a cortical ribbon. In this study, we propose an alternative approach, gray matter surface based spatial statistics (GS-BSS), which performs statistical analysis on gray matter surfaces by incorporating established methods for registration and GM surface segmentation on structural images. Diffusion microstructure features from NODDI and GM surfaces are transferred to standard space. All the surfaces are then projected non-linearly onto a common GM surface using diffeomorphic spectral matching on cortical surfaces. Prior post-mortem studies have shown reduced dendritic length in the prefrontal cortex in schizophrenia and bipolar disorder populations. To validate the results, statistical tests are compared between GS-BSS and N-GBSS to study the differences between healthy and psychosis populations. Significant results confirming the microstructural changes are presented. GS-BSS shows higher sensitivity to group differences between healthy and psychosis populations in previously known regions.
With the proliferation of multi-site neuroimaging studies, there is a greater need for handling non-biological variance introduced by differences in MRI scanners and acquisition protocols. Such unwanted sources of variation, which we refer to as "scanner effects", can hinder the detection of imaging features associated with clinical covariates of interest and cause spurious findings. In this paper, we investigate scanner effects in two large multi-site studies on cortical thickness measurements across a total of 11 scanners. We propose a set of tools for visualizing and identifying scanner effects that are generalizable to other modalities. We then propose to use ComBat, a technique adopted from the genomics literature and recently applied to diffusion tensor imaging data, to combine and harmonize cortical thickness values across scanners. We show that ComBat removes unwanted sources of scan variability while simultaneously increasing the power and reproducibility of subsequent statistical analyses. We also show that ComBat is useful for combining imaging data with the goal of studying life-span trajectories in the brain.
Copyright © 2017 Elsevier Inc. All rights reserved.
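The idea behind removing scanner effects can be sketched as a plain location and scale adjustment. The Python sketch below is a deliberate simplification and is not ComBat itself: it omits the empirical Bayes shrinkage of scanner parameters and the preservation of clinical covariates, so it would also erase genuine population differences between sites. The function name and the thickness values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def harmonize_location_scale(values_by_scanner):
    """Shift and rescale each scanner's values to a common mean and spread.

    values_by_scanner: dict mapping scanner id -> list of measurements
    (for example, cortical thickness in mm at one region).
    """
    all_values = [v for vals in values_by_scanner.values() for v in vals]
    grand_mean = mean(all_values)
    # Pooled within-scanner variance, weighted by scanner sample size.
    pooled_var = sum(
        len(vals) * pstdev(vals) ** 2 for vals in values_by_scanner.values()
    ) / len(all_values)
    pooled_sd = sqrt(pooled_var)

    harmonized = {}
    for scanner, vals in values_by_scanner.items():
        scanner_mean, scanner_sd = mean(vals), pstdev(vals)
        if scanner_sd == 0:
            raise ValueError(f"scanner {scanner!r} has no variation")
        # Standardize within scanner, then map onto the common scale.
        harmonized[scanner] = [
            grand_mean + pooled_sd * (v - scanner_mean) / scanner_sd for v in vals
        ]
    return harmonized

# Hypothetical measurements from two scanners with different offsets.
raw = {"scanner_A": [2.4, 2.5, 2.6], "scanner_B": [2.9, 3.1, 3.3]}
adjusted = harmonize_location_scale(raw)
print({k: round(mean(v), 3) for k, v in adjusted.items()})
```

After the adjustment both scanners share the same mean and standard deviation, which is the property the visualization tools described above are meant to check.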
Objective - Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. There are various initiatives to make these datasets more widely available to recipients who sign Data Use Certificate agreements, whereby penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient's data usage rights. This policy is based on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether this assumption holds true and the data science policy implications.
Methods - This study tests the hypothesis that the value of data for scientific investigators, in terms of the impact of the publications based on the data, decreases over time. The hypothesis is tested formally through a linear mixed-effects model using approximately 1200 publications between 2007 and 2013 that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health.
Results - The analysis shows that the impact factors for publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, we further discover that the depreciation rate is slow, only ∼10% per year, on average.
Conclusion - The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements and that alternative penalty mechanisms may need to be invoked.
© The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
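The policy argument in the data-depreciation abstract above rests on simple arithmetic, sketched here in Python. The sketch assumes that the reported rate of about 10% per year compounds geometrically within a year, which the abstract does not state; the function name is hypothetical.

```python
def retained_value(months, annual_depreciation=0.10):
    """Fraction of a dataset's value remaining after a revocation period.

    Assumes geometric depreciation at a constant annual rate.
    """
    return (1.0 - annual_depreciation) ** (months / 12.0)

# Fraction of value still available after typical revocation lengths.
for months in (3, 6, 12):
    print(months, round(retained_value(months), 3))
```

Under this assumption a six-month revocation leaves roughly 95% of the data's value intact, which is why a short suspension may be a weak deterrent.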
BACKGROUND - Pediatric oncology patients are at an increased risk of invasive bacterial infection due to immunosuppression. The risk of such infection in the absence of severe neutropenia (absolute neutrophil count ≥ 500/μL) is not well established, and a validated prediction model for bloodstream infection (BSI) risk would be clinically useful.
METHODS - A 6-site retrospective external validation was conducted using a previously published risk prediction model for BSI in febrile pediatric oncology patients without severe neutropenia: the Esbenshade/Vanderbilt (EsVan) model. A reduced model (EsVan2) excluding 2 less clinically reliable variables was also created using the initial EsVan model derivation cohort, and was validated using all 5 external validation cohorts. One data set was used only in sensitivity analyses because some variables were missing.
RESULTS - From the 5 primary data sets, there were a total of 1197 febrile episodes and 76 episodes of bacteremia. The overall C statistic for predicting bacteremia was 0.695, with a calibration slope of 0.50 for the original model and a calibration slope of 1.0 when recalibration was applied to the model. The model performed better in predicting high-risk bacteremia (gram-negative or Staphylococcus aureus infection) versus BSI alone, with a C statistic of 0.801 and a calibration slope of 0.65. The EsVan2 model outperformed the EsVan model across data sets with a C statistic of 0.733 for predicting BSI and a C statistic of 0.841 for high-risk BSI.
CONCLUSIONS - The results of this external validation demonstrated that the EsVan and EsVan2 models are able to predict BSI across multiple performance sites and, once validated and implemented prospectively, could assist in decision making in clinical practice. Cancer 2017;123:3781-3790. © 2017 American Cancer Society.
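The C statistic reported in the bacteremia validation above is the probability that a randomly chosen episode with bacteremia receives a higher predicted risk than a randomly chosen episode without it. A minimal Python sketch of that definition follows; it is not the EsVan model, and the predicted risks and outcomes are hypothetical.

```python
def c_statistic(predicted_risks, outcomes):
    """Concordance (C statistic) for a binary outcome.

    predicted_risks: model-predicted probabilities
    outcomes: 1 for an event (for example, bacteremia), 0 otherwise
    Tied predictions count as half concordant.
    """
    events = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Hypothetical predicted risks for four febrile episodes.
risks = [0.10, 0.40, 0.35, 0.80]
bacteremia = [0, 0, 1, 1]
print(c_statistic(risks, bacteremia))  # 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking. The calibration slope is a separate quantity: it measures whether predicted risks are too extreme or too modest, not how well they rank episodes.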
Heart failure (HF) is one of the most common indications for readmission to the hospital among elderly patients. This is due to the progressive nature of the disease, as well as its association with complex comorbidities (e.g., anemia, chronic kidney disease, chronic obstructive pulmonary disease, hyper- and hypothyroidism), which contribute to increased morbidity and mortality, as well as a reduced quality of life. Healthcare organizations (HCOs) have established diverse treatment plans for HF patients, but such routines are not always formalized and may, in fact, arise organically as a patient's management evolves over time. This investigation was motivated by the hypothesis that patients associated with a certain subgroup of HF should follow a similar workflow that, once made explicit, could be leveraged by an HCO to more effectively allocate resources and manage HF patients. Thus, in this paper, we introduce a method to identify subgroups of HF through a similarity analysis of event sequences documented in the clinical setting. Specifically, we 1) structure event sequences for HF patients based on the patterns of electronic medical record (EMR) system utilization, 2) identify subgroups of HF patients by applying a k-means clustering algorithm to utilization patterns, 3) learn clinical workflows for each subgroup, and 4) label each subgroup with the diagnosis and procedure codes that distinguish it among all subgroups. To demonstrate its potential, we applied our method to EMR event logs for 785 HF inpatient stays over a 4-month period at a large academic medical center. Our method identified 8 subgroups of HF, each of which was found to associate with a canonical workflow inferred through an inductive mining algorithm. Each subgroup was further confirmed to be affiliated with specific comorbidities, such as hyperthyroidism and hypothyroidism.
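Step 2 of the method above, clustering inpatient stays by EMR utilization pattern, can be sketched with a small k-means routine in Python. This is an illustration only: the feature vectors (counts of two hypothetical EMR action types per stay), the deterministic initialization from the first k points, and the function name are all assumptions, and the abstract does not describe how the authors encoded utilization or chose initial centroids.

```python
def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Initial centroids are the first k points (an assumption made so the
    sketch is deterministic). Returns (labels, centroids).
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical stays: (medication-order actions, imaging-review actions).
stays = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(stays, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In practice the number of clusters and the initialization both affect the result, so a study like this one would typically compare several values of k before settling on a final grouping.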
Resistant hypertension is defined as high blood pressure that remains above treatment goals in spite of the concurrent use of three antihypertensive agents from different classes. Despite the important health consequences of resistant hypertension, few studies of resistant hypertension have been conducted. To perform a genome-wide association study for resistant hypertension, we defined and identified cases of resistant hypertension and hypertensives with treated, controlled hypertension among >47,500 adults residing in the US linked to electronic health records (EHRs) and genotyped as part of the electronic MEdical Records & GEnomics (eMERGE) Network. Electronic selection logic using billing codes, laboratory values, text queries, and medication records was used to identify resistant hypertension cases and controls at each site, and a total of 3,006 cases of resistant hypertension and 876 controlled hypertensives were identified among eMERGE Phase I and II sites. After imputation and quality control, a total of 2,530,150 SNPs were tested for an association among 2,830 multi-ethnic cases of resistant hypertension and 876 controlled hypertensives. No test of association was genome-wide significant in the full dataset or in the dataset limited to European American cases (n = 1,719) and controls (n = 708). The most significant finding was CLNK rs13144136 at p = 1.00 × 10⁻⁶ (odds ratio = 0.68; 95% CI = 0.58-0.80) in the full dataset with similar results in the European American only dataset. We also examined whether SNPs known to influence blood pressure or hypertension also influenced resistant hypertension. None was significant after correction for multiple testing. These data highlight both the difficulties and the potential utility of EHR-linked genomic data to study clinically relevant traits such as resistant hypertension.
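The statement that no association was genome-wide significant can be checked against the numbers given above. The abstract does not say which significance threshold was applied, so this Python sketch compares the top p-value with two common choices: a Bonferroni threshold over the 2,530,150 tested SNPs and the conventional 5 × 10⁻⁸ cutoff. Both are assumptions, as is the function name.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

n_snps = 2_530_150           # SNPs tested, from the abstract
top_p = 1.00e-6              # CLNK rs13144136, from the abstract
conventional_cutoff = 5e-8   # widely used genome-wide cutoff (assumed here)

threshold = bonferroni_threshold(0.05, n_snps)
print(f"Bonferroni threshold: {threshold:.2e}")
print(top_p < threshold, top_p < conventional_cutoff)  # False False
```

Under either threshold the strongest signal falls short by more than an order of magnitude, consistent with the reported null result.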
The importance of epistasis (statistical interactions between genetic variants) to the development of complex disease in humans has been controversial. Genome-wide association studies of statistical interactions influencing human traits have recently become computationally feasible and have identified many putative interactions. However, statistical models used to detect interactions can be confounded, which makes it difficult to be certain that observed statistical interactions are evidence for true molecular epistasis. In this study, we investigate whether there is evidence for epistatic interactions between genetic variants within the cis-regulatory region that influence gene expression after accounting for technical, statistical, and biological confounding factors. We identified 1,119 (FDR = 5%) interactions that appear to regulate gene expression in human lymphoblastoid cell lines, a tightly controlled, largely genetically determined phenotype. Many of these interactions replicated in an independent dataset (90 of 803 tested, Bonferroni threshold). We then performed an exhaustive analysis of both known and novel confounders, including ceiling/floor effects, missing genotype combinations, haplotype effects, single variants tagged through linkage disequilibrium, and population stratification. Every interaction could be explained by at least one of these confounders, and replication in independent datasets did not protect against some confounders. Assuming that the confounding factors provide a more parsimonious explanation for each interaction, we find it unlikely that cis-regulatory interactions contribute strongly to human gene expression, which calls into question the relevance of cis-regulatory interactions for other human phenotypes. We additionally propose several best practices for epistasis testing to protect future studies from confounding.
Copyright © 2016 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.