Distribution of Base Pair Repeats in Coding and Noncoding DNA Sequences

Nikolay V. Dokholyan, Sergey V. Buldyrev, Shlomo Havlin, and H. Eugene Stanley

Physical Review Letters 79(25), 5182-5185 (1997)

Abstract

We analyze histograms of the lengths of dimeric tandem repeats, that is, consecutive repeats of an identical dimer, for each of the 16 possible dimers in DNA sequences. We find that these distribution functions have different functional forms for coding and noncoding DNA. For coding regions, the probability of finding a repetitive sequence of l copies of a particular dimer decreases exponentially as l increases, and the distributions can be well fit by a first-order Markov process. For noncoding regions, the distribution functions for most of the 16 dimers have long tails and can be approximated by power-law functions. We propose a model, based on known biophysical processes, that leads to the observed probability distribution functions for noncoding DNA. We argue that the difference in the shape of the distribution functions between coding and noncoding DNA arises because noncoding DNA is more tolerant of evolutionary mutational alterations than coding DNA.
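
As an illustration of the quantity being histogrammed, the following minimal Python sketch counts, for a given dimer, how many maximal runs of l consecutive copies occur in a sequence. It is not the authors' code. The function names dimer_repeat_histogram and all_dimer_histograms are hypothetical, and the conventions used here (a single isolated occurrence counts as l = 1, runs are found by a left-to-right non-overlapping scan, and the sequence contains only A, C, G, T) are assumptions made for the example.

```python
import re
from collections import Counter
from itertools import product


def dimer_repeat_histogram(sequence, dimer):
    """Return a Counter mapping repeat length l to the number of maximal
    runs of l consecutive copies of `dimer` found in `sequence`.

    Assumption: runs are located by a greedy, left-to-right,
    non-overlapping regex scan; an isolated dimer counts as l = 1.
    """
    if len(dimer) != 2:
        raise ValueError("dimer must be exactly two characters")
    # (?:XY)+ matches one or more back-to-back copies of the dimer.
    pattern = re.compile("(?:" + re.escape(dimer.upper()) + ")+")
    return Counter(
        len(match.group(0)) // 2
        for match in pattern.finditer(sequence.upper())
    )


def all_dimer_histograms(sequence):
    """Histogram of repeat lengths for each of the 16 possible dimers."""
    return {
        a + b: dimer_repeat_histogram(sequence, a + b)
        for a, b in product("ACGT", repeat=2)
    }


if __name__ == "__main__":
    # Toy sequence, purely illustrative (not real genomic data).
    toy = "ACACACGGTTACAC"
    histograms = all_dimer_histograms(toy)
    for dimer, hist in sorted(histograms.items()):
        if hist:
            print(dimer, dict(hist))
```

On real data, one would compute such histograms separately for coding and noncoding regions and compare their shapes on semilogarithmic axes (where an exponential decay appears as a straight line) and on double-logarithmic axes (where a power law does).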