Markov Chain Analysis Finds a Significant Influence of
Neighboring Bases on the Occurrence of a Base in Eucaryotic
Nuclear DNA Sequences Both Protein-coding and Noncoding
B.E. Blaisdell
Journal of Molecular Evolution, 21(3):278-288 (1984-85)
Abstract
Sixty-four eucaryotic nuclear DNA sequences, half of them
coding and half noncoding, have been examined as expressions of
first-, second-, or third-order Markov chains. Standard
statistical tests found that most of the sequences required at least
second-order Markov chains for their representation, and some
required chains of third order. For all 64 sequences the observed
one-step second-order transition count matrices were effective in
predicting the two-step transition count matrices, and 56 of 64
were effective in predicting the three-step transition count
matrices. The departure of the observed first- and second-order
transition count matrices from random expectation showed that the
eucaryotic nuclear DNA sequences in this considerable sample, both
protein-coding and noncoding, have significant local structure
over subsequences of three to five contiguous bases, and that this
structure persists throughout the full length of each sequence.
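The order tests summarized above can be sketched as follows. This is a minimal illustration that assumes the usual likelihood-ratio (G) test for Markov chain order, with order k-1 as the null hypothesis and order k as the alternative. The abstract does not say which "standard statistical tests" were used, so this should not be read as the paper's exact procedure. The function names `kmer_counts` and `markov_order_g_test` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq: str, m: int) -> Counter:
    """Count overlapping m-mers in seq, skipping windows with non-ACGT symbols."""
    counts = Counter()
    for i in range(len(seq) - m + 1):
        word = seq[i:i + m]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def markov_order_g_test(seq: str, k: int) -> tuple[float, int]:
    """G statistic for order k-1 (null) versus order k (alternative), k >= 1.

    Assumption: a standard G test for Markov order, not necessarily the
    test used in the paper. Each (k+1)-mer is split as a + u + b, where a
    is the oldest base, u the k-1 intervening bases, and b the next base.
    Under the null, b depends only on u, so within each stratum u the
    4x4 table of (a, b) counts should show independence.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    n = kmer_counts(seq.upper(), k + 1)
    g = 0.0
    for u in ("".join(p) for p in product(ALPHABET, repeat=k - 1)):
        n_u = sum(n[a + u + b] for a in ALPHABET for b in ALPHABET)
        if n_u == 0:
            continue
        for a in ALPHABET:
            n_au = sum(n[a + u + b] for b in ALPHABET)
            for b in ALPHABET:
                obs = n[a + u + b]
                if obs == 0:
                    continue  # the 0 * ln(0) term is taken as 0
                n_ub = sum(n[x + u + b] for x in ALPHABET)
                expected = n_au * n_ub / n_u  # expected count under order k-1
                g += 2.0 * obs * math.log(obs / expected)
    # (4-1)*(4-1) = 9 degrees of freedom per stratum u; this assumes every
    # stratum is populated, which holds for sequences of realistic length.
    df = 9 * 4 ** (k - 1)
    return g, df
```

For example, `markov_order_g_test("ACGT" * 50, 1)` returns a large G with 9 degrees of freedom, because each base fully determines the next. The same sequence gives G = 0 for k = 2, since no second-order dependence is left once the first-order one is accounted for. G can then be compared against a chi-square distribution with `df` degrees of freedom, for instance with `scipy.stats.chi2.sf(g, df)` if SciPy is available.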
These results suggested that present DNA sequences may have
arisen from the duplication, concatenation, and gradual
modification of very early short sequences.
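The multi-step prediction check described in the abstract can be illustrated by treating a second-order chain on bases as a first-order chain on the 16 dinucleotide states. By the Chapman-Kolmogorov relation, the s-step transition probabilities of that chain are the s-th power of the one-step matrix P. The sketch below assumes that "predicting" means scaling each row of P^s by the observed row total and then measuring the fit with a Pearson-style discrepancy. The abstract does not give the paper's actual comparison criterion, and the function names (`step_counts`, `transition_matrix`, `predict_steps`, `chi_square_discrepancy`) are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
# A second-order chain on bases is a first-order chain on 16 dinucleotide states.
STATES = ["".join(p) for p in product(ALPHABET, repeat=2)]


def step_counts(seq: str, steps: int) -> dict[str, Counter]:
    """Count state seq[i:i+2] followed, `steps` positions later, by seq[i+steps:i+steps+2]."""
    seq = seq.upper()
    counts = {s: Counter() for s in STATES}
    for i in range(len(seq) - steps - 1):
        src, dst = seq[i:i + 2], seq[i + steps:i + steps + 2]
        if src in counts and dst in counts:  # skip windows with non-ACGT symbols
            counts[src][dst] += 1
    return counts


def transition_matrix(one_step: dict[str, Counter]) -> dict[str, dict[str, float]]:
    """Row-normalize one-step counts into estimated transition probabilities."""
    matrix = {}
    for src in STATES:
        total = sum(one_step[src].values())
        matrix[src] = {dst: one_step[src][dst] / total if total else 0.0 for dst in STATES}
    return matrix


def mat_mul(a, b):
    """Multiply two 16x16 matrices stored as nested dicts."""
    return {i: {j: sum(a[i][k] * b[k][j] for k in STATES) for j in STATES} for i in STATES}


def predict_steps(seq: str, steps: int):
    """Return (observed, predicted) s-step counts, predicted from P**steps.

    Assumption: each predicted row is the observed s-step row total times
    the corresponding row of P**steps.
    """
    p = transition_matrix(step_counts(seq, 1))
    p_s = p
    for _ in range(steps - 1):
        p_s = mat_mul(p_s, p)
    observed = step_counts(seq, steps)
    predicted = {
        src: {dst: sum(observed[src].values()) * p_s[src][dst] for dst in STATES}
        for src in STATES
    }
    return observed, predicted


def chi_square_discrepancy(observed, predicted) -> float:
    """Pearson-style sum of (O - E)^2 / E over cells with E > 0.

    Cells with zero prediction are skipped, so observed counts there are not penalized.
    """
    stat = 0.0
    for src in STATES:
        for dst in STATES:
            e = predicted[src][dst]
            if e > 0:
                stat += (observed[src][dst] - e) ** 2 / e
    return stat
```

With `steps=2` the compared states are adjacent, non-overlapping dinucleotides (four contiguous bases), and with `steps=3` they span five bases. For a strictly periodic sequence such as `"ACGT" * 50`, the one-step matrix predicts the two-step counts exactly, so the discrepancy is zero.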