BACKGROUND - Enhancers are DNA regulatory elements that influence gene expression. There is substantial diversity in enhancers' activity patterns: some enhancers drive expression in a single cellular context, while others are active across many. Sequence characteristics, such as transcription factor (TF) binding motifs, influence the activity patterns of regulatory sequences; however, the regulatory logic through which specific sequences drive enhancer activity patterns is poorly understood. Recent analysis of Drosophila enhancers suggested that short dinucleotide repeat motifs (DRMs) are general enhancer sequence features that drive broad regulatory activity. However, it is not known whether the regulatory role of DRMs is conserved across species.
RESULTS - We performed a comprehensive analysis of the relationship between short DNA sequence patterns, including DRMs, and human enhancer activity in 38,538 enhancers across 411 different contexts. In a machine-learning framework, the occurrence patterns of short sequence motifs accurately predicted broadly active human enhancers. However, DRMs alone were only weakly predictive of broad enhancer activity in humans and showed enrichment patterns different from those in Drosophila. In general, GC-rich sequence motifs were significantly associated with broad enhancer activity, and, consistent with this enrichment, broadly active human TFs recognize GC-rich motifs.
CONCLUSIONS - Our results reveal the importance of specific sequence motifs in broadly active human enhancers, demonstrate the lack of evolutionary conservation of the role of DRMs, and provide a computational framework for investigating the logic of enhancer sequences.
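
ILLUSTRATION - The RESULTS describe predicting enhancer activity from the occurrence patterns of short sequence motifs. The sketch below shows one common way such occurrence features can be computed: overlapping k-mer counts plus GC fraction. It is a minimal illustration under stated assumptions, not the authors' pipeline. The function names `kmer_counts` and `gc_fraction`, the choice of k = 2, and the example sequence are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=2):
    """Return counts of overlapping k-mers in a fixed A/C/G/T order.

    Assumption for illustration: k-mers containing other symbols
    (for example N) are simply left out of the feature vector.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # One feature per possible k-mer (4**k in total), zero if absent.
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


def gc_fraction(seq):
    """Fraction of positions that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Hypothetical toy sequence containing a short GA dinucleotide repeat.
features = kmer_counts("GAGAGACC", k=2)
gc = gc_fraction("GAGAGACC")
```

Feature vectors of this kind, computed per enhancer, could then be passed to any standard classifier to separate broadly active from context-specific enhancers; the choice of classifier is left open here because the abstract does not specify it.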