Sequence context analysis of 8.2 million single nucleotide polymorphisms in the human genome.

Zhao Z, Zhang F
Gene. 2006 366 (2): 316-24

PMID: 16314054 · DOI:10.1016/j.gene.2005.08.024

We analyzed n-mers (n=3-8) in the local environment of 8,249,446 human SNPs and compared their distribution with that in the genome reference sequences. The results revealed that the short sequences, which contained at least one CpG dinucleotide, occurred more frequently in the local SNP sequences than in the genome sequences. To exclude the hypermutability effect of the methylated CpG dinucleotides on the sequence context of SNPs, we examined the distribution patterns for each of the six categories of substitution. We observed the similar pattern (i.e., CpG-containing n-mers vs. non-CpG-containing n-mers) in SNP categories A/G, C/T and C/G but the opposite pattern in category A/T. We next identified 34,928 putative CpG islands in the human genome and located 133,591 SNPs within these islands. In the CpG islands, CpG SNPs were 3.92-fold less prevalent relative to the presence of CpG dinucleotides. Conversely, in the human genome, the frequency of CpG dinucleotides at the polymorphic sites was 6.09 times that in the genome reference sequences. These results support the previous views of mutational suppression at the CpG sites in the CpG islands and hypermutability of the methylated CpG dinucleotides that are prevalent in the non-CpG island sequences in the human genome. Our study represents a comprehensive investigation of the sequence context of SNPs in the human genome and in human CpG islands.

MeSH Terms (8)

Base Composition CpG Islands Dinucleotide Repeats DNA Methylation Genome, Human Humans Polymorphism, Single Nucleotide Sequence Analysis, DNA

Connections (1)

This publication is referenced by other Labnodes entities: