Base Compositional Structure of Genomes
J.W. Fickett, D.C. Torney, and D.R. Wolf,
Genomics, 13, 1056--1064 (1992)
Abstract
We model the base compositional structure of the human and Escherichia coli genomes. Three particular
properties are first quantified: (1) There is a significant tendency for any region of either genome to have a
strand-symmetric base composition. (2) The variation in base composition from region to region, within
each genome, is very much larger than expected from common homogeneous stochastic models. (3) A
given local base composition tends to persist over a scale of at least kilobases (E. coli) or tens of kilobases
(human). Multidomain stochastic models from the literature are reviewed and sharpened. In particular,
quantitative measurements of the third property lead us to suggest a significant shift in the style of domain
models, in which the variation of A+T content with position is modeled by a random walk with frequent
small steps rather than with large quantum jumps. As an application, we suggest a way to reduce the
amount of computation in the assembly of large sequences from sequences of randomly chosen fragments.
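
The three properties above can be illustrated with a small numerical sketch. This is not the authors' procedure; it is a minimal illustration under stated assumptions. The function names, the window size, the step size of 0.02, and the bounds on A+T content are all hypothetical choices made for the example. It measures per-window A+T content and strand-symmetry deviations (|A-T| and |C-G| on one strand), compares the observed window-to-window variance of A+T content with the binomial variance expected from a homogeneous independent-base model, and generates a toy sequence whose A+T content follows a random walk with frequent small steps.

```python
import random
from statistics import pvariance


def window_compositions(seq, window):
    """Per non-overlapping window: (A+T fraction, |A-T| fraction, |C-G| fraction).

    The last two values measure departure from strand symmetry
    (A close to T and C close to G counted on a single strand).
    """
    out = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        a, c, g, t = (w.count(b) for b in "ACGT")
        out.append(((a + t) / window, abs(a - t) / window, abs(c - g) / window))
    return out


def variance_ratio(seq, window):
    """Observed variance of windowed A+T fraction divided by the binomial
    variance p(1-p)/window expected if bases were drawn independently
    with one genome-wide A+T probability p (a homogeneous model)."""
    at = [x[0] for x in window_compositions(seq, window)]
    p = sum(at) / len(at)
    return pvariance(at) / (p * (1 - p) / window)


def random_walk_sequence(n_windows, window, step=0.02, p0=0.5,
                         lo=0.3, hi=0.7, seed=0):
    """Toy sequence whose local A+T probability takes a small +/- step
    after every window (all parameter values are illustrative assumptions).
    A/T and C/G are chosen with equal probability, so the sequence is
    strand-symmetric on average."""
    rng = random.Random(seed)
    p = p0
    parts = []
    for _ in range(n_windows):
        # Small step, clamped to [lo, hi], rather than a large jump.
        p = min(hi, max(lo, p + rng.choice((-step, step))))
        for _ in range(window):
            parts.append(rng.choice("AT") if rng.random() < p else rng.choice("CG"))
    return "".join(parts)


if __name__ == "__main__":
    rng = random.Random(1)
    homogeneous = "".join(rng.choice("ACGT") for _ in range(100_000))
    walk = random_walk_sequence(n_windows=200, window=500)
    print("homogeneous variance ratio:", variance_ratio(homogeneous, 500))
    print("random-walk variance ratio:", variance_ratio(walk, 500))
```

Under these assumptions the homogeneous sequence should give a variance ratio near 1, while the random-walk sequence should give a ratio well above 1, which mirrors property (2). Because successive windows in the random-walk sequence differ by only one small step, a given local composition also persists across many windows, as in property (3).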