The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
Scrubbing identifying information from narrative clinical documents is a critical first step in preparing the data for secondary uses, such as translational research. Evidence suggests that the differential distribution of protected health information (PHI) in clinical documents could be used as additional features to improve the performance of automated de-identification algorithms or toolkits. However, there has been little investigation into the extent to which this phenomenon occurs in practice. To empirically assess this issue, we identified the location of PHI in 140,000 clinical notes from an electronic health record system and characterized the distribution as a function of location in a document. In addition, we calculated the 'word proximity' of nearby PHI elements to determine their co-occurrence rates. The PHI elements were found to have non-random distribution patterns. Location within a document and proximity between PHI elements might therefore be used to help de-identification systems better label PHI.
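The two measurements described above can be illustrated with a minimal Python sketch. The abstract does not give exact definitions, so this assumes that location means the decile of the document in which a PHI token falls, and that 'word proximity' means the number of words separating consecutive PHI elements. The function names, the span format, and the bin count are hypothetical and are not taken from the study.

```python
from collections import Counter


def relative_positions(phi_token_indices, n_tokens, n_bins=10):
    """Histogram PHI token offsets by position in the document.

    Assumption: each PHI element is represented by the index of its first
    token, and the document is split into n_bins equal-width bins.
    """
    counts = [0] * n_bins
    for idx in phi_token_indices:
        # Map the token offset to a bin in [0, n_bins - 1].
        bin_index = min(idx * n_bins // n_tokens, n_bins - 1)
        counts[bin_index] += 1
    return counts


def word_proximity(phi_spans):
    """Count word gaps between consecutive PHI elements.

    phi_spans: list of (start_token, end_token, phi_type), end exclusive.
    Returns a Counter keyed by (first_type, second_type, gap_in_words).
    """
    spans = sorted(phi_spans)
    gaps = Counter()
    for (_, end1, type1), (start2, _, type2) in zip(spans, spans[1:]):
        # Overlapping or adjacent spans are treated as a gap of zero words.
        gap = max(start2 - end1, 0)
        gaps[(type1, type2, gap)] += 1
    return gaps


if __name__ == "__main__":
    # Hypothetical 100-token note with three PHI elements.
    spans = [(0, 2, "NAME"), (3, 4, "DATE"), (95, 96, "PHONE")]
    starts = [s for s, _, _ in spans]
    print(relative_positions(starts, n_tokens=100))
    print(word_proximity(spans))
```

Histograms of this kind, aggregated over many notes, could be supplied to a de-identification model as positional and co-occurrence features.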
BACKGROUND - While randomized controlled trials represent the highest level of evidence we can generate in comparative effectiveness research, there are clinical scenarios where this type of study design is not feasible. The Comparative Effectiveness Analyses of Surgery and Radiation in localized prostate cancer (CEASAR) study is an observational study designed to compare the effectiveness and harms of different treatments for localized prostate cancer, a clinical scenario in which randomized controlled trials have been difficult to execute and, when completed, have been difficult to generalize to the population at large.
METHODS - CEASAR employs a population-based, prospective cohort study design, using tumor registries as cohort inception tools. The primary outcome is quality of life after treatment, measured by validated instruments. Risk adjustment is facilitated by capture of traditional and nontraditional confounders before treatment and by propensity score analysis.
RESULTS - We have accrued a diverse, representative cohort of 3691 men in the USA with clinically localized prostate cancer. Half of the men invited to participate enrolled, and 86% of patients who enrolled have completed the 6-month survey.
CONCLUSION - Challenging comparative effectiveness research questions can be addressed using well-designed observational studies. The CEASAR study provides an opportunity to determine what treatments work best, for which patients, and in whose hands.
OBJECT - Recent legislation and media coverage have heightened awareness of concussion in youth sports. Previous work by the authors' group defined significant variation of care in management of children with concussion. To address this variation, a multidisciplinary concussion program was established based on a uniform management protocol, with emphasis on community outreach via traditional media sources and the Internet. This retrospective study evaluates the impact of standardization of concussion care and resource utilization before and after standardization in a large regional pediatric hospital center.
METHODS - This retrospective study included all patients younger than 18 years of age evaluated for sports-related concussion between January 1, 2007, and December 31, 2011. Emergency department, sports medicine, and neurosurgery records were reviewed. Data collected included demographics, injury details, clinical course, Sports Concussion Assessment Tool-2 (SCAT2) scores, imaging, discharge instructions, and referral for specialty care. The cohort was analyzed comparing patients evaluated before and after standardization of care.
RESULTS - Five hundred eighty-nine patients were identified, including 270 before standardization (2007-2011) and 319 after standardization (2011-2012). Statistically significant differences (p < 0.0001) were observed between the 2 groups for multiple variables: there were more girls, more first-time concussions, fewer initial presentations to the emergency department, more consistent administration of the SCAT2, and more consistent supervision of return to play and return to think after adoption of the protocol.
CONCLUSIONS - A combination of increased public awareness and legislation has led to a 5-fold increase in the number of youth athletes presenting for concussion evaluation at the authors' center. Establishment of a multidisciplinary clinic with a standardized protocol resulted in significantly decreased institutional resource utilization and more consistent concussion care for this growing patient population.
Inferring precise phenotypic patterns from population-scale clinical data is a core computational task in the development of precision, personalized medicine. The traditional approach uses supervised learning, in which an expert designates which patterns to look for (by specifying the learning task and the class labels), and where to look for them (by specifying the input variables). While appropriate for individual tasks, this approach scales poorly and misses the patterns that we don't think to look for. Unsupervised feature learning overcomes these limitations by identifying patterns (or features) that collectively form a compact and expressive representation of the source data, with no need for expert input or labeled examples. Its rising popularity is driven by new deep learning methods, which have produced high-profile successes on difficult standardized problems of object recognition in images. Here we introduce its use for phenotype discovery in clinical data. This use is challenging because the largest source of clinical data - Electronic Medical Records - typically contains noisy, sparse, and irregularly timed observations, rendering them poor substrates for deep learning methods. Our approach couples dirty clinical data to deep learning architecture via longitudinal probability densities inferred using Gaussian process regression. From episodic, longitudinal sequences of serum uric acid measurements in 4368 individuals we produced continuous phenotypic features that suggest multiple population subtypes, and that accurately distinguished (0.97 AUC) the uric-acid signatures of gout vs. acute leukemia despite not being optimized for the task. The unsupervised features were as accurate as gold-standard features engineered by an expert with complete knowledge of the domain, the classification task, and the class labels. Our findings demonstrate the potential for achieving computational phenotype discovery at population scale. 
We expect such data-driven phenotypes to expose unknown disease variants and subtypes and to provide rich targets for genetic association studies.
OBJECT - Postresection hydrocephalus is observed in approximately 30% of pediatric patients with posterior fossa tumors. However, which patients will develop postresection hydrocephalus is not known. The Canadian Preoperative Prediction Rule for Hydrocephalus (CPPRH) was developed in an attempt to identify this subset of patients, allowing for the optimization of their care. The authors sought to validate and critically appraise the CPPRH.
METHODS - The authors conducted a retrospective chart review of 99 consecutive pediatric patients who presented between 2002 and 2010 with posterior fossa tumors and who subsequently underwent resection. The data were then analyzed using bivariate and multivariate analyses, and a modified CPPRH (mCPPRH) was applied.
RESULTS - Seventy-six patients were evaluated. Four variables were found to be significant in predicting postresection hydrocephalus: age younger than 2 years, moderate/severe hydrocephalus, preoperative tumor diagnosis, and transependymal edema. The mCPPRH produced observed likelihood ratios of 0.737 (95% CI 0.526-1.032) and 4.688 (95% CI 1.421-15.463) for low- and high-risk groups, respectively.
CONCLUSIONS - The mCPPRH utilizes readily obtainable and reliable preoperative variables that together stratify children with posterior fossa tumors into high- and low-risk categories for the development of postresection hydrocephalus. This new predictive model will aid patient counseling and tailor the intensity of postoperative clinical and radiographic monitoring for hydrocephalus, as well as provide evidence-based guidance for the use of prophylactic CSF diversion.
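For readers unfamiliar with the likelihood ratios reported for the low- and high-risk groups, the following Python sketch shows one common way to compute a stratum-specific likelihood ratio and an approximate 95% confidence interval using the log method. The abstract does not state how the authors computed their intervals, so this is an assumption, and the counts in the usage example are hypothetical rather than study data.

```python
import math


def stratum_likelihood_ratio(pos_in_stratum, pos_total,
                             neg_in_stratum, neg_total, z=1.96):
    """Likelihood ratio for one risk stratum with a log-method interval.

    pos_*: patients who developed the outcome (here, hydrocephalus).
    neg_*: patients who did not.
    All four counts must be positive for the interval to be defined.
    """
    # LR = P(stratum | outcome) / P(stratum | no outcome)
    lr = (pos_in_stratum / pos_total) / (neg_in_stratum / neg_total)
    # Standard error of ln(LR) for a ratio of two independent proportions.
    se = math.sqrt(1 / pos_in_stratum - 1 / pos_total
                   + 1 / neg_in_stratum - 1 / neg_total)
    lower = math.exp(math.log(lr) - z * se)
    upper = math.exp(math.log(lr) + z * se)
    return lr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 12 of 20 patients with the outcome and 6 of 50
    # without it fall in the high-risk stratum.
    lr, lower, upper = stratum_likelihood_ratio(12, 20, 6, 50)
    print(f"LR = {lr:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

A ratio above 1 means the stratum is more common among patients who develop the outcome; an interval that spans 1 means the stratum does not clearly shift risk in either direction.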
OBJECTIVE - An accurate computable representation of food and drug allergy is essential for safe healthcare. Our goal was to develop a high-performance, easily maintained algorithm to identify medication and food allergies and sensitivities from unstructured allergy entries in electronic health record (EHR) systems.
MATERIALS AND METHODS - An algorithm was developed in Transact-SQL to identify ingredients to which patients had allergies in a perioperative information management system. The algorithm used RxNorm and natural language processing techniques developed on a training set of 24 599 entries from 9445 records. Accuracy, specificity, precision, recall, and F-measure were determined for the training dataset and repeated for the testing dataset (24 857 entries from 9430 records).
RESULTS - Accuracy, precision, recall, and F-measure for medication allergy matches were all above 98% in the training dataset and above 97% in the testing dataset for all allergy entries. Corresponding values for food allergy matches were above 97% and above 93%, respectively. Specificities of the algorithm were 90.3% and 85.0% for drug matches and 100% and 88.9% for food matches in the training and testing datasets, respectively.
DISCUSSION - The algorithm had high performance for identification of medication and food allergies. Maintenance is practical, as updates are managed through upload of new RxNorm versions and additions to companion database tables. However, direct entry of codified allergy information by providers (through autocompleters or drop lists) is still preferred to post-hoc encoding of the data. Data tables used in the algorithm are available for download.
CONCLUSIONS - A high performing, easily maintained algorithm can successfully identify medication and food allergies from free text entries in EHR systems.
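The performance measures named in this abstract (accuracy, specificity, precision, recall, and F-measure) all derive from the four cells of a confusion matrix. The sketch below uses the standard definitions and assumes the balanced F1 form of the F-measure, which the abstract does not specify. The function name and the example counts are hypothetical, and the original algorithm was written in Transact-SQL, not Python.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives,
    false negatives. Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in classification_metrics(90, 10, 80, 20).items():
        print(f"{name}: {value:.3f}")
```

Specificity depends only on the true negatives and false positives, so it can fall well below the other measures when true negative entries are rare.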
BACKGROUND AND OBJECTIVE - We recently reported that kidney function declined faster among initiators of sulfonylureas compared to metformin; however, sulfonylurea use compared to metformin use was also associated with increases in body mass index (BMI) and systolic blood pressure (SBP). We sought to determine if differences between sulfonylureas and metformin on kidney function decline were mediated by differential effects on BMI, SBP, or glucose control.
METHODS - We identified 13,238 veterans who initiated sulfonylurea or metformin treatment (2000–2007) with a baseline estimated glomerular filtration rate (eGFR) >60 mL/minute, and followed them until a study event occurred, non-persistence on treatment, loss of follow-up, or end of the study. The composite outcome was a sustained decline from baseline eGFR of ≥25%, end-stage renal disease, or death. We estimated the association of cumulative measurements of potential mediators including BMI, SBP, and glycated hemoglobin on the study outcome. We determined if controlling for these time-varying covariates accounted for the differences in outcome between sulfonylurea and metformin initiators.
RESULTS - Compared to sulfonylurea use, metformin use was associated with a lower risk for renal function decline or death [adjusted hazard ratio (aHR) 0.82, 95% confidence interval 0.70, 0.97]. This protective association remained significant [aHR 0.83 (0.70–0.98)] when accounting for the cumulative time-varying measurements of the three mediators of interest.
CONCLUSION - Metformin initiation was associated with a lower risk of kidney function decline or death compared to sulfonylureas, which appeared to be independent of changes in BMI, SBP, and glycated hemoglobin over time.
Health information technologies facilitate the collection of massive quantities of patient-level data. A growing body of research demonstrates that such information can support novel, large-scale biomedical investigations at a fraction of the cost of traditional prospective studies. While healthcare organizations are being encouraged to share these data in a de-identified form, there is hesitation over concerns that it will allow corresponding patients to be re-identified. Currently proposed technologies to anonymize clinical data may make unrealistic assumptions with respect to the capabilities of a recipient to ascertain a patient's identity. We show that more pragmatic assumptions enable the design of anonymization algorithms that permit the dissemination of detailed clinical profiles with provable guarantees of protection. We demonstrate this strategy with a dataset of over one million medical records and show that 192 genotype-phenotype associations can be discovered with fidelity equivalent to non-anonymized clinical data.
Clinically oriented interface terminologies support interactions between humans and computer programs that accept structured entry of healthcare information. This manuscript describes efforts over the past decade to introduce an interface terminology called CHISL (Categorical Health Information Structured Lexicon) into clinical practice as part of a computer-based documentation application at Vanderbilt University Medical Center. Vanderbilt supports a spectrum of electronic documentation modalities, ranging from transcribed dictation, to a partial template of free-form notes, to strict, structured data capture. Vanderbilt encourages clinicians to use what they perceive as the most appropriate form of clinical note entry for each given clinical situation. In this setting, CHISL occupies an important niche in clinical documentation. This manuscript reports challenges developers faced in deploying CHISL, and discusses observations about its usage, but does not review other relevant work in the field.