Entropies of Biosequences: The Role of Repeats

Hanspeter Herzel, Werner Ebeling, Armin O. Schmitt

Physical Review E, 50(6), 5061--5071 (1994)

Abstract

DNA sequences of higher organisms contain thousands of nearly identical dispersed repetitive sequences. To understand the effect of such repeats on word entropies, we construct a model that can be treated analytically. The hypothetical model sequences consist of independent equidistributed symbols with randomly interspersed repeats. In conclusion, we predict that the entropy of DNA sequences, which measures their information content, is much lower than suggested by earlier empirical studies.
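
The model described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' construction: it assumes a four-letter DNA alphabet, a single repeat type copied exactly, insertion points drawn uniformly at random, and a naive plug-in estimate of the word entropy H_n. The names `model_sequence` and `word_entropy` and all parameter values are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA alphabet with four equiprobable symbols


def model_sequence(length, repeat, n_repeats, rng):
    """Return a string of the given total length made of independent,
    uniformly drawn symbols with n_repeats exact copies of `repeat`
    inserted at random positions (an illustrative assumption)."""
    background_len = length - n_repeats * len(repeat)
    if background_len < 0:
        raise ValueError("repeats do not fit into the requested length")
    background = "".join(rng.choice(ALPHABET) for _ in range(background_len))
    # Insertion points in the background, sorted so pieces can be stitched in order.
    cuts = sorted(rng.randint(0, background_len) for _ in range(n_repeats))
    pieces, prev = [], 0
    for cut in cuts:
        pieces.append(background[prev:cut])
        pieces.append(repeat)
        prev = cut
    pieces.append(background[prev:])
    return "".join(pieces)


def word_entropy(seq, n):
    """Plug-in estimate (in bits) of the entropy of overlapping words of length n."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)
    repeat = "".join(rng.choice(ALPHABET) for _ in range(200))
    with_repeats = model_sequence(20000, repeat, 50, rng)
    without_repeats = model_sequence(20000, repeat, 0, rng)
    for n in (1, 3, 6):
        print(n, word_entropy(without_repeats, n), word_entropy(with_repeats, n))
```

In this sketch the repeats occupy half of the sequence, so the estimated entropy of longer words falls clearly below that of the purely random sequence. The plug-in estimator is biased downward when the number of possible words is not small compared with the sequence length, so the absolute values should be read with caution.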