Understanding Long-Range Correlations in DNA Sequences
W. Li, Thomas Marr, Kunihiko Kaneko
Physica D, 75:392-416 (1994); erratum: 82:217 (1995)
Abstract
In this paper, we review the literature on statistical
long-range correlations in DNA sequences.
We examine the current evidence for these
correlations, and conclude that a mixture of many
length scales (including some relatively long ones) in DNA sequences
is responsible for the observed 1/f-like spectral component.
We note the complexity of the correlation structure in
DNA sequences. The observed complexity often makes it hard, or
impossible, to decompose the sequence into a few statistically
stationary regions. We suggest that, given the complexity
of DNA sequences, a fruitful approach to understanding
long-range correlations is to model duplication, and other
rearrangement processes, in DNA sequences. One model, called
the "expansion-modification system", contains only point
duplication and point mutation. Though simplistic, this model is
able to generate sequences with 1/f spectra. We emphasize the
contribution of DNA duplication to the observed long-range
correlations in DNA sequences.
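
To make the expansion-modification idea concrete, the following is a minimal sketch rather than the implementation used in the paper. The abstract states only that the model contains point duplication and point mutation; the binary alphabet, the single mutation probability `p_mutate`, the synchronous update of all symbols, and the helper names `expand_modify`, `generate`, and `autocorrelation` are assumptions made here for illustration.

```python
import random


def expand_modify(seq, p_mutate, rng):
    """Apply one synchronous update to a binary sequence.

    Assumed rule: each symbol is flipped with probability p_mutate
    (point mutation), otherwise it is copied twice (point duplication).
    """
    out = []
    for s in seq:
        if rng.random() < p_mutate:
            out.append(1 - s)      # point mutation: 0 -> 1, 1 -> 0
        else:
            out.extend((s, s))     # point duplication: 0 -> 00, 1 -> 11
    return out


def generate(n_min, p_mutate=0.1, seed=0):
    """Iterate the update from a single symbol until the length reaches n_min."""
    if not 0.0 <= p_mutate < 1.0:
        # With p_mutate == 1 the sequence never grows and the loop would not end.
        raise ValueError("p_mutate must satisfy 0 <= p_mutate < 1")
    rng = random.Random(seed)
    seq = [0]
    while len(seq) < n_min:
        seq = expand_modify(seq, p_mutate, rng)
    return seq


def autocorrelation(seq, lag):
    """Normalized autocorrelation of the sequence at a given lag."""
    mean = sum(seq) / len(seq)
    var = sum((s - mean) ** 2 for s in seq) / len(seq)
    n = len(seq) - lag
    cov = sum((seq[i] - mean) * (seq[i + lag] - mean) for i in range(n)) / n
    return cov / var


if __name__ == "__main__":
    seq = generate(4096, p_mutate=0.1, seed=0)
    for lag in (1, 4, 16, 64, 256):
        print(lag, autocorrelation(seq, lag))
```

Because every duplicated symbol is itself duplicated again in later updates, a correlation created early is stretched to longer and longer distances, while mutations keep injecting new short-range structure. Printing the autocorrelation at widely spaced lags, as in the usage lines above, is one simple way to inspect how slowly the correlation decays; a power spectrum of `seq` would be the more direct check of the 1/f claim.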