The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
If you have any questions or comments, please contact us.
The latent structure of schizotypy and psychosis-spectrum symptoms remains poorly understood. Furthermore, molecular genetic substrates are poorly defined, largely due to the substantial resources required to collect rich phenotypic data across diverse populations. Sample sizes of phenotypic studies are often insufficient for advanced structural equation modeling approaches. In the last 50 years, efforts in both psychiatry and psychological science have moved toward (1) a dimensional model of psychopathology (eg, the current Hierarchical Taxonomy of Psychopathology [HiTOP] initiative), (2) an integration of methods and measures across traits and units of analysis (eg, the RDoC initiative), and (3) powerful, impactful study designs maximizing sample size to detect subtle genomic variation relating to complex traits (the Psychiatric Genomics Consortium [PGC]). These movements are important to the future study of the psychosis spectrum, and to resolving heterogeneity with respect to instrument and population. The International Consortium of Schizotypy Research is composed of over 40 laboratories in 12 countries, and to date, members have compiled a body of schizotypy- and psychosis-related phenotype data from more than 30000 individuals. It has become apparent that compiling data into a protected, relational database and crowdsourcing analytic and data science expertise will result in significant enhancement of current research on the structure and biological substrates of the psychosis spectrum. The authors present a data-sharing infrastructure similar to that of the PGC, and a resource-sharing infrastructure similar to that of HiTOP. This report details the rationale and benefits of the phenotypic data collective and presents an open invitation for participation.
Pregnant women have been called therapeutic orphans because data supporting common interventions, medications, health teaching, and models of care are meager. The generation of quality evidence benefits from proactive approaches that ensure ethical standards are met to protect participants. The purpose of this article is to differentiate among health care, quality improvement, and research and to discuss ethical involvement of women who are pregnant and potentially childbearing in these initiatives. Health care is provided to protect and improve individual health. Quality improvement aims to enhance delivery of care for all those receiving care in particular settings. Research, whether retrospective or prospective, is designed to contribute to generalizable knowledge. This review includes vignettes to distinguish between research, quality improvement, and case study dissemination and to highlight the value of publication of information with applicability beyond a single site. As a community, perinatal care providers will be able to contribute more evidence to guide care if they err on the side of seeking institutional review board approval for activities that examine the care and outcomes of pregnant women and the fetus. Traditional research activities, including clinical trials, remain crucial. However, to fill gaps in knowledge, we must expedite our ability to report informative cases, examine clinical data, share lessons learned during quality improvement campaigns, and publish and disseminate these findings. Accelerating improvements in care demands expansion of the evidence base.
© 2017 by the American College of Nurse-Midwives.
Objective - Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. There are various initiatives to make these datasets more widely available to recipients who sign Data Use Certificate agreements, whereby penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient's data usage rights. This policy is based on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether this assumption holds true and the data science policy implications.
Methods - This study tests the hypothesis that the value of data for scientific investigators, in terms of the impact of the publications based on the data, decreases over time. The hypothesis is tested formally through a mixed linear effects model using approximately 1200 publications between 2007 and 2013 that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health.
Results - The analysis shows that the impact factors for publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, we further discover that the depreciation rate is slow, only ∼10% per year, on average.
Conclusion - The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements and that alternative penalty mechanisms may need to be invoked.
© The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: email@example.com
BACKGROUND - Genomic data is increasingly collected by a wide array of organizations. As such, there is a growing demand to make summary information about such collections available more widely. However, over the past decade, a series of investigations have shown that attacks, rooted in statistical inference methods, can be applied to discern the presence of a known individual's DNA sequence in the pool of subjects. Recently, it was shown that the Beacon Project of the Global Alliance for Genomics and Health, a web service for querying about the presence (or absence) of a specific allele, was vulnerable. The Integrating Data for Analysis, Anonymization, and Sharing (iDASH) Center modeled a track in their third Privacy Protection Challenge on how to mitigate the Beacon vulnerability. We developed the winning solution for this track.
METHODS - This paper describes our computational method to optimize the tradeoff between the utility and the privacy of the Beacon service. We generalize the genomic data sharing problem beyond that which was introduced in the iDASH Challenge to be more representative of real world scenarios to allow for a more comprehensive evaluation. We then conduct a sensitivity analysis of our method with respect to several state-of-the-art methods using a dataset of 400,000 positions in Chromosome 10 for 500 individuals from Phase 3 of the 1000 Genomes Project. All methods are evaluated for utility, privacy and efficiency.
RESULTS - Our method achieves better performance than all state-of-the-art methods, irrespective of how key factors (e.g., the allele frequency in the population, the size of the pool and utility weights) change from the original parameters of the problem. We further illustrate that it is possible for our method to exhibit subpar performance under special cases of allele query sequences. However, we show our method can be extended to address this issue when the query sequence is fixed and known a priori to the data custodian, so that they may plan stage their responses accordingly.
CONCLUSIONS - This research shows that it is possible to thwart the attack on Beacon services, without substantially altering the utility of the system, using computational methods. The method we initially developed is limited by the design of the scenario and evaluation protocol for the iDASH Challenge; however, it can be improved by allowing the data custodian to act in a staged manner.
Emerging scientific endeavors are creating big data repositories of data from millions of individuals. Sharing data in a privacy-respecting manner could lead to important discoveries, but high-profile demonstrations show that links between de-identified genomic data and named persons can sometimes be reestablished. Such re-identification attacks have focused on worst-case scenarios and spurred the adoption of data-sharing practices that unnecessarily impede research. To mitigate concerns, organizations have traditionally relied upon legal deterrents, like data use agreements, and are considering suppressing or adding noise to genomic variants. In this report, we use a game theoretic lens to develop more effective, quantifiable protections for genomic data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources, not adversaries with unlimited means. We demonstrate this approach via a new public resource with genomic summary data from over 8,000 individuals-the Sequence and Phenotype Integration Exchange (SPHINX)-and show that risks can be balanced against utility more effectively than with traditional approaches. We further show the generalizability of this framework by applying it to other genomic data collection and sharing endeavors. Recognizing that such models are dependent on a variety of parameters, we perform extensive sensitivity analyses to show that our findings are robust to their fluctuations.
Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
BACKGROUND AND OBJECTIVE - There is an increasing desire to share de-identified electronic health records (EHRs) for secondary uses, but there are concerns that clinical terms can be exploited to compromise patient identities. Anonymization algorithms mitigate such threats while enabling novel discoveries, but their evaluation has been limited to single institutions. Here, we study how an existing clinical profile anonymization fares at multiple medical centers.
METHODS - We apply a state-of-the-artk-anonymization algorithm, withkset to the standard value 5, to the International Classification of Disease, ninth edition codes for patients in a hypothyroidism association study at three medical centers: Marshfield Clinic, Northwestern University, and Vanderbilt University. We assess utility when anonymizing at three population levels: all patients in 1) the EHR system; 2) the biorepository; and 3) a hypothyroidism study. We evaluate utility using 1) changes to the number included in the dataset, 2) number of codes included, and 3) regions generalization and suppression were required.
RESULTS - Our findings yield several notable results. First, we show that anonymizing in the context of the entire EHR yields a significantly greater quantity of data by reducing the amount of generalized regions from ∼15% to ∼0.5%. Second, ∼70% of codes that needed generalization only generalized two or three codes in the largest anonymization.
CONCLUSIONS - Sharing large volumes of clinical data in support of phenome-wide association studies is possible while safeguarding privacy to the underlying individuals.
© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: firstname.lastname@example.org.
The Vanderbilt University Institute for Imaging Science (VUIIS) Center for Computational Imaging (CCI) has developed a database built on XNAT housing over a quarter of a million scans. The database provides framework for (1) rapid prototyping, (2) large scale batch processing of images and (3) scalable project management. The system uses the web-based interfaces of XNAT and REDCap to allow for graphical interaction. A python middleware layer, the Distributed Automation for XNAT (DAX) package, distributes computation across the Vanderbilt Advanced Computing Center for Research and Education high performance computing center. All software are made available in open source for use in combining portable batch scripting (PBS) grids and XNAT servers.
Copyright © 2015 Elsevier Inc. All rights reserved.
Given the potential wealth of insights in personal data the big databases can provide, many organizations aim to share data while protecting privacy by sharing de-identified data, but are concerned because various demonstrations show such data can be re-identified. Yet these investigations focus on how attacks can be perpetrated, not the likelihood they will be realized. This paper introduces a game theoretic framework that enables a publisher to balance re-identification risk with the value of sharing data, leveraging a natural assumption that a recipient only attempts re-identification if its potential gains outweigh the costs. We apply the framework to a real case study, where the value of the data to the publisher is the actual grant funding dollar amounts from a national sponsor and the re-identification gain of the recipient is the fine paid to a regulator for violation of federal privacy rules. There are three notable findings: 1) it is possible to achieve zero risk, in that the recipient never gains from re-identification, while sharing almost as much data as the optimal solution that allows for a small amount of risk; 2) the zero-risk solution enables sharing much more data than a commonly invoked de-identification policy of the U.S. Health Insurance Portability and Accountability Act (HIPAA); and 3) a sensitivity analysis demonstrates these findings are robust to order-of-magnitude changes in player losses and gains. In combination, these findings provide support that such a framework can enable pragmatic policy decisions about de-identified data sharing.