BACKGROUND - Dynamic activation and inactivation of gene regulatory DNA produce the expression changes that drive the differentiation of cellular lineages. Identifying regulatory regions active during developmental transitions is necessary to understand how the genome specifies complex developmental programs and how these processes are disrupted in disease. Gene regulatory dynamics are mediated by many factors, including the binding of transcription factors (TFs), DNA methylation, and the methylation and acetylation of histones. Genome-wide maps of TF binding and DNA and histone modifications have been generated for many cellular contexts; however, given the diversity and complexity of animal development, these data cover only a small fraction of the cellular and developmental contexts of interest. Thus, there is a need for methods that use existing epigenetic and functional genomics data to analyze the thousands of contexts that remain uncharacterized.
RESULTS - To investigate the utility of histone modification data in the analysis of cellular contexts without such data, I evaluated how well genome-wide H3K27ac and H3K4me1 data collected in different developmental stages, tissues, and species predicted experimentally validated heart enhancers active at embryonic day 11.5 (E11.5) in mouse. Using a machine-learning approach to integrate the data from different contexts, I found that E11.5 heart enhancers can often be predicted accurately using data from other contexts, and I quantified the contribution of each data source to the predictions. The utility of each dataset correlated with nearness in developmental time and tissue to the target context: data from late developmental stages and adult heart tissues were most informative for predicting E11.5 enhancers, while marks from stem cells and early developmental stages were less informative. Predictions based on data collected in non-heart tissues and in human hearts were better than random, but worse than predictions based on data from mouse hearts.
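The evaluation described above can be illustrated with a minimal sketch. It uses synthetic data, not the data analyzed here, and logistic regression is a stand-in, since this abstract does not name the learning algorithm. The context names (`heart_late`, `heart_early`, `non_heart`) and their effect sizes are hypothetical. Each candidate region gets one simulated histone-mark signal per source context; the sketch scores each context alone by AUC and then scores a model that integrates all of them.

```python
import math
import random

# Hypothetical source contexts and the strength of their simulated signal
# at true enhancers (assumed values, chosen only for illustration).
CONTEXTS = {"heart_late": 2.0, "heart_early": 1.0, "non_heart": 0.3}

def simulate(n, rng):
    """Return (features, labels): one noisy signal per context for each region."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = enhancer, 0 = background region
        X.append([label * effect + rng.gauss(0, 1) for effect in CONTEXTS.values()])
        y.append(label)
    return X, y

def auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = simulate(400, rng)
X_test, y_test = simulate(400, rng)

# Contribution of each context on its own, measured on held-out regions.
single_auc = {
    name: auc([xi[j] for xi in X_test], y_test) for j, name in enumerate(CONTEXTS)
}

# Integrated model that combines the signals from all contexts.
w, b = train_logistic(X_train, y_train)
combined_scores = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X_test]
combined_auc = auc(combined_scores, y_test)

for name, value in single_auc.items():
    print(f"{name}: AUC = {value:.3f}")
print(f"combined: AUC = {combined_auc:.3f}")
```

Comparing the single-context AUCs with one another and with the combined AUC is one simple way to quantify how much each data source contributes; other attribution schemes, such as leaving one context out at a time, would serve the same purpose.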
CONCLUSIONS - The ability of these algorithms to accurately predict developmental enhancers based on data from related, but distinct, cellular contexts suggests that combining computational models with epigenetic data sampled from relevant contexts may be sufficient to enable functional characterization of many cellular contexts of interest.