BACKGROUND - Non-coding gene regulatory enhancers are essential to transcription in mammalian cells. As a result, a large variety of experimental and computational strategies have been developed to identify cis-regulatory enhancer sequences. Given the differences in the biological signals assayed, some variation in the enhancers identified by different methods is expected; however, the concordance of enhancers identified by different methods has not been comprehensively evaluated. Such an evaluation is critically needed because, in practice, most studies consider enhancers identified by only a single method. Here, we compare enhancer sets from eleven representative strategies in four biological contexts.
RESULTS - All sets we evaluated overlap significantly more than expected by chance; however, there is significant dissimilarity in their genomic, evolutionary, and functional characteristics, at both the element and base-pair levels, within each context. The disagreement is sufficient to influence interpretation of candidate SNPs from GWAS, and to lead to disparate conclusions about enhancer and disease mechanisms. Most regions identified as enhancers are supported by only one method, and we find limited evidence that regions identified by multiple methods are better candidates than those identified by a single method. As a result, we cannot recommend the use of any single enhancer identification strategy in all settings.
CONCLUSIONS - Our results highlight the inherent complexity of enhancer biology and identify an important challenge to mapping the genetic architecture of complex disease. Greater appreciation of how the diverse enhancer identification strategies in use today relate to the dynamic activity of gene regulatory regions is needed to enable robust and reproducible results.