Deep brain stimulation, as a primary surgical treatment for various neurological disorders, involves implanting electrodes to stimulate target nuclei within millimeter accuracy. Accurate pre-operative target selection is challenging due to the poor contrast in its surrounding region in MR images. In this paper, we present a learning-based method to automatically and rapidly localize the target using multi-modal images. A learning-based technique is applied first to spatially normalize the images in a common coordinate space. Given a point in this space, we extract a heterogeneous set of features that capture spatial and intensity contextual patterns at different scales in each image modality. Regression forests are used to learn a displacement vector of this point to the target. The target is predicted as a weighted aggregation of votes from various test samples, leading to a robust and accurate solution. We conduct five-fold cross validation using 100 subjects and compare our method to three indirect targeting methods, a state-of-the-art statistical atlas-based approach, and two variations of our method that use only a single modality image. With an overall error of 2.63±1.37mm, our method improves upon the single modality-based variations and statistically significantly outperforms the indirect targeting ones. Our technique matches state-of-the-art registration methods but operates on completely different principles. Both techniques can be used in tandem in processing pipelines operating on large databases or in the clinical flow for automated error detection.