The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
The sizes of the data matrices assembled to resolve branches of the tree of life have increased dramatically, motivating the development of programs for fast, yet accurate, inference. For example, several different fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these programs are widely used, a systematic evaluation and comparison of their performance using empirical genome-scale data matrices has so far been lacking. To address this question, we evaluated these four programs on 19 empirical phylogenomic data sets with hundreds to thousands of genes and up to 200 taxa with respect to likelihood maximization, tree topology, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategies (ten searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimations. For concatenation-based species tree inference, IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation-based analyses, whereas FastTree was the fastest but generated lower likelihood values and more dissimilar tree topologies in both types of analyses. Finally, data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the programs' relative performance. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses.
© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
The key idea of statistical hypothesis testing is to fix, and thereby control, the Type I error (false positive) rate across samples of any size. Multiple comparisons inflate the global (family-wise) Type I error rate and the traditional solution to maintaining control of the error rate is to increase the local (comparison-wise) Type II error (false negative) rates. However, in the analysis of human brain imaging data, the number of comparisons is so large that this solution breaks down: the local Type II error rate ends up being so large that scientifically meaningful analysis is precluded. Here we propose a novel solution to this problem: allow the Type I error rate to converge to zero along with the Type II error rate. It works because when the Type I error rate per comparison is very small, the accumulated (or global) Type I error rate is also small. This solution is achieved by employing the likelihood paradigm, which uses likelihood ratios to measure the strength of evidence on a voxel-by-voxel basis. In this paper, we provide theoretical and empirical justification for a likelihood approach to the analysis of human brain imaging data. In addition, we present extensive simulations that show the likelihood approach is viable, leading to "cleaner"-looking brain maps and operational superiority (lower average error rate). Finally, we include a case study on cognitive-control-related activation in the prefrontal cortex of the human brain.
Copyright © 2015 Elsevier Inc. All rights reserved.
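The inflation of the family-wise Type I error rate described in the abstract above can be illustrated with a short calculation. This is a minimal sketch, not the authors' method: it assumes the comparisons are independent (voxels in real imaging data are correlated), and the function names are hypothetical. It also shows the traditional Bonferroni-style remedy of shrinking the per-comparison level, which is what drives the per-comparison Type II error rate up.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n_tests independent
    comparisons, each run at per-comparison level alpha (assumption:
    independence, which real voxel data do not satisfy)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_level(global_alpha: float, n_tests: int) -> float:
    """Per-comparison level that keeps the family-wise rate at or below
    global_alpha under the Bonferroni correction."""
    return global_alpha / n_tests


if __name__ == "__main__":
    # Show how quickly the global rate grows with the number of comparisons,
    # and how small the corrected per-comparison level has to become.
    for m in (1, 10, 100, 100_000):
        print(m, familywise_error_rate(0.05, m), bonferroni_level(0.05, m))
```

With 100,000 comparisons, the corrected per-comparison level is so small that only very large effects can be detected, which is the breakdown the abstract describes.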
The pulse oximeter is a critical monitor in anesthesia practice designed to improve patient safety. Here, we present an approach to improve the ability of anesthesiologists to monitor arterial oxygen saturation via pulse oximetry through an audiovisual training process. Fifteen residents' abilities to detect auditory changes in pulse oximetry were measured before and after perceptual training. Training resulted in a 9% (95% confidence interval, 4%-14%, P = 0.0004, t(166) = 3.60) increase in detection accuracy, and a 72-millisecond (95% confidence interval, 40-103 milliseconds, P < 0.0001, t(166) = -4.52) speeding of response times in attentionally demanding and noisy conditions that were designed to simulate an operating room. This study illustrates the benefits of multisensory training and sets the stage for further work to better define the role of perceptual training in clinical anesthesiology.
SMARCAL1, which encodes a DNA remodeling protein fundamental to genome integrity during replication, is the only gene associated with the developmental disorder Schimke immuno-osseous dysplasia (SIOD). SMARCAL1-deficient cells show collapsed replication forks, S-phase cell cycle arrest, increased chromosomal breaks, hypersensitivity to genotoxic agents, and chromosomal instability. The SMARCAL1 catalytic domain (SMARCAL1(CD)) is composed of an SNF2-type double-stranded DNA motor ATPase fused to a HARP domain of unknown function. The mechanisms by which SMARCAL1 and other DNA translocases repair replication forks are poorly understood, in part because of a lack of structural information on the domains outside of the common ATPase motor. In the present work, we determined the crystal structure of the SMARCAL1 HARP domain and examined its conformation and assembly in solution by small-angle X-ray scattering. We report that this domain is conserved with the DNA mismatch and damage recognition domains of MutS/MSH and NER helicase XPB, respectively, as well as with the putative DNA specificity motif of the T4 phage fork regression protein UvsW. Loss of UvsW fork regression activity by deletion of this domain was rescued by its replacement with HARP, establishing the importance of this domain in UvsW and demonstrating a functional complementarity between these structurally homologous domains. Mutation of predicted DNA-binding residues in HARP dramatically reduced fork binding and regression activities of SMARCAL1(CD). Thus, this work has uncovered a conserved substrate recognition domain in DNA repair enzymes that couples ATP hydrolysis to remodeling of a variety of DNA structures, and provides insight into this domain's role in replication fork stability and genome integrity.
OBJECTIVES - In the emergency department (ED), health care providers miss delirium approximately 75% of the time, because they do not routinely screen for this syndrome. The Confusion Assessment Method for the Intensive Care Unit (CAM-ICU) is a brief (<1 minute) delirium assessment that may be feasible for use in the ED. The study objective was to determine its validity and reliability in older ED patients.
METHODS - In this prospective observational cohort study, patients aged 65 years or older were enrolled at an academic, tertiary care ED from July 2009 to February 2012. An emergency physician (EP) and research assistants (RAs) performed the CAM-ICU. The reference standard for delirium was a comprehensive (~30 minutes) psychiatrist assessment using the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision criteria. All assessments were blinded to each other and were conducted within 3 hours. Sensitivities, specificities, and likelihood ratios were calculated for both the EP and the RAs using the psychiatrist's assessment as the reference standard. Kappa values between the EP and RAs were also calculated to measure reliability.
RESULTS - Of 406 patients enrolled, 50 (12.3%) had delirium. The median age was 73.5 years (interquartile range [IQR] = 69 to 80 years), 202 (49.8%) were female, and 57 (14.0%) were nonwhite. The CAM-ICU's sensitivities were 72.0% (95% confidence interval [CI] = 58.3% to 82.5%) and 68.0% (95% CI = 54.2% to 79.2%) in the EP and RAs, respectively. The CAM-ICU's specificity was 98.6% (95% CI = 96.8% to 99.4%) for both raters. The negative likelihood ratios (LR-) were 0.28 (95% CI = 0.18 to 0.44) and 0.32 (95% CI = 0.22 to 0.49) in the EP and RAs, respectively. The positive likelihood ratios (LR+) were 51.3 (95% CI = 21.1 to 124.5) and 48.4 (95% CI = 19.9 to 118.0), respectively. The kappa between the EP and RAs was 0.92 (95% CI = 0.85 to 0.98), indicating excellent interobserver reliability.
CONCLUSIONS - In older ED patients, the CAM-ICU is highly specific, and a positive test is nearly diagnostic for delirium when used by both the EP and RAs. However, the CAM-ICU's sensitivity was modest, and a negative test decreased the likelihood of delirium by a small amount. The consequences of a false-negative CAM-ICU are unknown and deserve further study.
© 2014 by the Society for Academic Emergency Medicine.
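The accuracy measures reported in the CAM-ICU abstract above follow from a 2x2 table of test result against the reference standard. The sketch below shows the arithmetic. The cell counts are assumptions: they are back-calculated from the reported totals (406 patients, 50 with delirium) and percentages, not taken from the paper, and the function names are hypothetical.

```python
def diagnostic_accuracy(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and likelihood ratios from 2x2 counts.
    Assumes fp > 0 and tn > 0, so neither ratio divides by zero."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": sensitivity / (1.0 - specificity),
        "lr_neg": (1.0 - sensitivity) / specificity,
    }


def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Back-calculated (assumed) counts: 50 delirious, 356 non-delirious patients.
emergency_physician = diagnostic_accuracy(tp=36, fn=14, tn=351, fp=5)
research_assistants = diagnostic_accuracy(tp=34, fn=16, tn=351, fp=5)

if __name__ == "__main__":
    prevalence = 50 / 406
    for label, result in (("EP", emergency_physician), ("RAs", research_assistants)):
        print(label, {name: round(value, 3) for name, value in result.items()})
        print(label, "after positive test:", post_test_probability(prevalence, result["lr_pos"]))
        print(label, "after negative test:", post_test_probability(prevalence, result["lr_neg"]))
```

Under these assumed counts, the likelihood ratios round to the reported 51.3 and 0.28 for the EP and 48.4 and 0.32 for the RAs. The post-test probabilities show why a positive result is nearly diagnostic while a negative result only modestly lowers the probability of delirium.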
BACKGROUND - Sample size calculation is an important issue in the experimental design of biomedical research. For RNA-seq experiments, a sample size calculation method based on the Poisson model has been proposed; however, when there are biological replicates, RNA-seq data can exhibit variance significantly greater than the mean (i.e., over-dispersion). The Poisson model cannot appropriately model the over-dispersion, and in such cases, the negative binomial model has been used as a natural extension of the Poisson model. Because the field currently lacks a sample size calculation method based on the negative binomial model for assessing differential expression analysis of RNA-seq data, we propose a method to calculate the sample size.
RESULTS - We propose a sample size calculation method based on the exact test for assessing differential expression analysis of RNA-seq data.
CONCLUSIONS - The proposed sample size calculation method is straightforward and not computationally intensive. Simulation studies to evaluate the performance of the proposed sample size method are presented; the results indicate our method works well, with achievement of desired power.
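The over-dispersion that motivates the negative binomial model in the abstract above can be seen in a small simulation. This sketch does not reproduce the proposed sample size method. It assumes the common parameterization in which the variance equals mu + dispersion * mu^2, draws negative binomial counts as a gamma-Poisson mixture, and uses hypothetical function names and parameter values.

```python
import math
import random
import statistics


def poisson_draw(rng: random.Random, lam: float) -> int:
    """One Poisson draw by Knuth's multiplication method (fine for modest lam)."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def negative_binomial_draw(rng: random.Random, mu: float, dispersion: float) -> int:
    """Gamma-Poisson mixture: the Poisson rate is gamma distributed with
    mean mu and shape 1 / dispersion, giving variance mu + dispersion * mu**2."""
    rate = rng.gammavariate(1.0 / dispersion, dispersion * mu)
    return poisson_draw(rng, rate)


def simulate_variances(mu: float, dispersion: float, n: int, seed: int = 1) -> tuple:
    """Sample variances of n Poisson and n negative binomial counts."""
    rng = random.Random(seed)
    poisson_counts = [poisson_draw(rng, mu) for _ in range(n)]
    nb_counts = [negative_binomial_draw(rng, mu, dispersion) for _ in range(n)]
    return statistics.variance(poisson_counts), statistics.variance(nb_counts)


if __name__ == "__main__":
    poisson_var, nb_var = simulate_variances(mu=10.0, dispersion=0.5, n=20_000)
    print("Poisson variance (theory 10):", poisson_var)
    print("Negative binomial variance (theory 60):", nb_var)
```

Both samples share the same mean, but the negative binomial variance is several times larger, so a power calculation that assumes Poisson variance would underestimate the number of replicates needed.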
De novo mutations affect risk for many diseases and disorders, especially those with early onset. An example is autism spectrum disorders (ASD). Four recent whole-exome sequencing (WES) studies of ASD families revealed a handful of novel risk genes, based on independent de novo loss-of-function (LoF) mutations falling in the same gene, and found that de novo LoF mutations occurred at a twofold higher rate than expected by chance. However successful these studies were, they used only a small fraction of the data, excluding other types of de novo mutations and inherited rare variants. Moreover, such analyses cannot readily incorporate data from case-control studies. An important research challenge in gene discovery, therefore, is to develop statistical methods that accommodate a broader class of rare variation. We develop methods that can incorporate WES data regarding de novo mutations, inherited variants present, and variants identified within cases and controls. TADA, for Transmission And De novo Association, integrates these data by a gene-based likelihood model involving parameters for allele frequencies and gene-specific penetrances. Inference is based on a Hierarchical Bayes strategy that borrows information across all genes to infer parameters that would be difficult to estimate for individual genes. In addition to theoretical development, we validated TADA using realistic simulations mimicking rare, large-effect mutations affecting risk for ASD and show it has dramatically better power than other common methods of analysis. Thus TADA's integration of various kinds of WES data can be a highly effective means of identifying novel risk genes. Indeed, application of TADA to WES data from subjects with ASD and their families, as well as from a study of ASD subjects and controls, revealed several novel and promising ASD candidate genes with strong statistical support.
PURPOSE - Multi-atlas segmentation has been shown to be highly robust and accurate across an extraordinary range of potential applications. However, it is limited to the segmentation of structures that are anatomically consistent across a large population of potential target subjects (i.e., multi-atlas segmentation is limited to "in-atlas" applications). Herein, the authors propose a technique to determine the likelihood that a multi-atlas segmentation estimate is representative of the problem at hand, and, therefore, identify anomalous regions that are not well represented within the atlases.
METHODS - The authors derive a technique to estimate the out-of-atlas (OOA) likelihood for every voxel in the target image. These estimated likelihoods can be used to determine and localize the probability of an abnormality being present on the target image.
RESULTS - Using a collection of manually labeled whole-brain datasets, the authors demonstrate the efficacy of the proposed framework on two distinct applications. First, the authors demonstrate the ability to accurately and robustly detect malignant gliomas, an aggressive class of central nervous system neoplasms, in the human brain. Second, the authors demonstrate how this OOA likelihood estimation process can be used within a quality control context for diffusion tensor imaging datasets to detect large-scale imaging artifacts (e.g., aliasing and image shading).
CONCLUSIONS - The proposed OOA likelihood estimation framework shows great promise for robust and rapid identification of brain abnormalities and imaging artifacts using only weak dependencies on anomaly morphometry and appearance. The authors envision that this approach would allow for application-specific algorithms to focus directly on regions of high OOA likelihood, which would (1) reduce the need for human intervention, and (2) reduce the propensity for false positives. Using the dual perspective, this technique would allow for algorithms to focus on regions of normal anatomy to ascertain image quality and adapt to image appearance characteristics.
The analysis of longitudinal trajectories usually focuses on evaluation of explanatory factors that are either associated with rates of change, or with overall mean levels of a continuous outcome variable. In this article, we introduce valid design and analysis methods that permit outcome-dependent sampling of longitudinal data for scenarios where all outcome data currently exist, but a targeted substudy is being planned in order to collect additional key exposure information on a limited number of subjects. We propose a stratified sampling based on specific summaries of individual longitudinal trajectories, and we detail an ascertainment-corrected maximum likelihood approach for estimation using the resulting biased sample of subjects. In addition, we demonstrate that the efficiency of an outcome-based sampling design relative to use of a simple random sample depends highly on the choice of outcome summary statistic used to direct sampling, and we show a natural link between the goals of the longitudinal regression model and corresponding desirable designs. Using data from the Childhood Asthma Management Program, where genetic information required retrospective ascertainment, we study a range of designs that examine lung function profiles over 4 years of follow-up for children classified according to their genotype for the IL-13 cytokine.
© 2013, The International Biometric Society.
Accurate and precise displacement estimation has been a hallmark of clinical ultrasound. Displacement estimation accuracy has largely been considered to be limited by the Cramér-Rao lower bound (CRLB). However, the CRLB only describes the minimum variance obtainable from unbiased estimators. Biased estimators are generally implemented using Bayes' theorem, which requires a likelihood function. The classic likelihood function for the displacement estimation problem is not discriminative and is difficult to implement for clinically relevant ultrasound with diffuse scattering. Because the classic likelihood function is not effective, a perturbation is proposed. The proposed likelihood function was evaluated and compared against the classic likelihood function by converting both to posterior probability density functions (PDFs) using a noninformative prior. Example results are reported for bulk motion simulations using a 6λ tracking kernel and 30 dB SNR for 1000 data realizations. The classic likelihood function assigned the true displacement a mean probability of only 0.070 ± 0.020, whereas the new likelihood function assigned the true displacement a much higher probability of 0.22 ± 0.16. The new likelihood function shows improvements at least for bulk motion, acoustic radiation force induced motion, and compressive motion, and at least for SNRs greater than 10 dB and kernel lengths between 1.5 and 12λ.
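The comparison in the abstract above rests on turning a likelihood into a posterior with a noninformative prior and then reading off the probability assigned to the true displacement. The sketch below shows only that conversion on a discrete grid of candidate displacements. The likelihood values are made up for illustration and are not the classic or the proposed likelihood function; the function name is hypothetical.

```python
def posterior_from_likelihood(likelihood: list) -> list:
    """Posterior probabilities on a discrete grid under a flat prior.
    With a flat prior, Bayes' theorem reduces to normalizing the
    likelihood so that it sums to one over the candidates."""
    total = sum(likelihood)
    if total <= 0:
        raise ValueError("likelihood must have positive total mass")
    return [value / total for value in likelihood]


# Hypothetical likelihood values over 10 candidate displacements;
# index 4 plays the role of the true displacement.
true_index = 4
weakly_discriminative = [1.0, 1.0, 1.0, 1.1, 1.2, 1.1, 1.0, 1.0, 1.0, 1.0]
strongly_discriminative = [0.1, 0.1, 0.2, 1.0, 5.0, 1.0, 0.2, 0.1, 0.1, 0.1]

if __name__ == "__main__":
    for name, values in (("weak", weakly_discriminative),
                         ("strong", strongly_discriminative)):
        posterior = posterior_from_likelihood(values)
        print(name, "probability at true displacement:", posterior[true_index])
```

A nearly flat likelihood spreads the posterior over all candidates, so the true displacement receives little probability. A sharply peaked likelihood concentrates it, which is the sense in which the abstract calls a likelihood function discriminative.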