Measuring Correlations in Symbol Sequences

Hanspeter Herzel, Ivo Große

Physica A 216, 518--542 (1995)

Abstract

This paper examines the relations between correlation functions and mutual information. It is shown that, in sequences over an alphabet of lambda symbols, statistical dependences are measured by (lambda - 1)^2 independent parameters. However, not all of them can be determined by autocorrelation functions. Appropriate sets of correlation functions (including crosscorrelations) are introduced, which allow the detection of all dependences. The results are exemplified for binary, ternary, and quaternary symbol sequences. As an application, it is discussed how a nonuniform codon usage in protein-coding DNA sequences introduces periodic correlations even at distances on the order of 1000 base pairs.
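
To make the two quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It assumes the standard definition of mutual information between symbols a distance k apart, estimated from empirical pair frequencies, and it counts the independent parameters as the lambda^2 joint probabilities minus the 2*lambda - 1 constraints fixed by the marginals, which gives (lambda - 1)^2. The function names `mutual_information` and `independent_parameters` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Estimate the mutual information (in bits) between symbols k positions apart.

    Assumption: probabilities are plain relative frequencies taken over the
    len(seq) - k symbol pairs; no finite-sample correction is applied.
    """
    n = len(seq) - k
    if k < 1 or n < 1:
        raise ValueError("k must satisfy 1 <= k < len(seq)")
    first = seq[:n]   # symbols at positions i
    second = seq[k:]  # symbols at positions i + k
    pair_counts = Counter(zip(first, second))
    first_counts = Counter(first)
    second_counts = Counter(second)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to avoid extra divisions
        ratio = count * n / (first_counts[a] * second_counts[b])
        mi += p_ab * math.log2(ratio)
    return mi


def independent_parameters(alphabet_size):
    """Number of independent dependence parameters at one distance: (lambda - 1)^2."""
    return (alphabet_size - 1) ** 2


if __name__ == "__main__":
    for size, name in [(2, "binary"), (3, "ternary"), (4, "quaternary")]:
        print(name, independent_parameters(size))
    print(mutual_information("ABABABABABAB", 1))
```

For a quaternary alphabet such as DNA, the count is 9 parameters per distance, which illustrates why a single autocorrelation function cannot capture all dependences.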