Entropies and Lexicographic Analysis of Biosequences

H. Herzel, W. Ebeling, A.O. Schmitt, M.A. Jiménez-Montaño

Chapter 2 in From Simplicity to Complexity in Chemistry and Beyond, eds. A. Müller, A. Dress, F. Vögtle (Vieweg-Verlag, Braunschweig, 1995)

Abstract

This paper is devoted to the statistical and linguistic analysis of biosequences. Information-theoretical tools (Rényi entropies, mutual information) are introduced and applied to selected DNA sequences (yeast chromosome III, Epstein-Barr virus genome). Moreover, several techniques for detecting long-range correlations are reviewed, and possible sources of such correlations are discussed. Finally, we study grammar representations of sequences and illustrate this approach with a fragment of mouse DNA.
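As a rough illustration of the two information-theoretical tools named above, the following minimal Python sketch estimates the Rényi entropy of order q for the distribution of n-blocks (words of length n) in a symbol sequence, and the mutual information between symbols k positions apart. It assumes naive plug-in frequency estimates and base-2 logarithms; the function names are hypothetical and this is not the authors' code. It applies no correction for finite-sample bias, which grows quickly with block length n.

```python
import math
from collections import Counter


def block_probs(seq, n):
    """Relative frequencies of all overlapping n-blocks in seq (plug-in estimate)."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def renyi_block_entropy(seq, n, q):
    """Rényi entropy of order q, in bits, of the n-block distribution.

    q == 1 is treated as the Shannon limit of the Rényi family.
    """
    p = block_probs(seq, n)
    if q == 1:
        return -sum(x * math.log2(x) for x in p)
    return math.log2(sum(x ** q for x in p)) / (1 - q)


def mutual_information(seq, k):
    """Mutual information, in bits, between symbols a distance k apart."""
    pairs = Counter((seq[i], seq[i + k]) for i in range(len(seq) - k))
    total = sum(pairs.values())
    # Marginals are taken from the same set of pairs, so they are consistent.
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c
        right[b] += c
    mi = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / total
        mi += p_ab * math.log2(p_ab / ((left[a] / total) * (right[b] / total)))
    return mi
```

For example, the periodic toy sequence `"ACGT" * 250` has a single-symbol entropy of 2 bits for every order q, and its symbols 4 positions apart share 2 bits of mutual information, while a constant sequence gives zero mutual information at any distance.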