Principal Component Analysis and Large-Scale Correlations in Non-Coding Sequences of Human DNA

Michael Teitelman and Frank H. Eeckman

Journal of Computational Biology, 3(4), 573-576 (1996).

Abstract

We have calculated a full set of second-order correlation functions of nucleotides in non-coding DNA. They are found to be invariant under the exchange of A and T and, independently, under the exchange of C and G. Considering the correlation functions as a 4x4 matrix with a symmetrical basis, we have found the principal components, that is, objects with zero cross-correlations. The three principal components represent the base compositions (A+T-C-G), (A-T), and (C-G). The long-range behavior of these principal components yields power-law dependencies with different critical exponents.
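
As an illustration only, the construction described in the abstract can be sketched in Python with NumPy. This is not the authors' code: the estimator C_ab(d) = P_ab(d) - P_a P_b, the base ordering A, T, C, G, and the names `correlation_matrix` and `component_correlations` are assumptions made for this sketch. It estimates the 4x4 correlation matrix at one separation d and then expresses it in the three components named above. If the matrix has the stated A/T and C/G exchange symmetry, the off-diagonal entries of the resulting 3x3 matrix vanish.

```python
import numpy as np

BASES = "ATCG"  # index order assumed for this sketch

# Rows are the three components named in the abstract, in ATCG order.
COMPONENTS = np.array([
    [1.0,  1.0, -1.0, -1.0],   # A+T-C-G
    [1.0, -1.0,  0.0,  0.0],   # A-T
    [0.0,  0.0,  1.0, -1.0],   # C-G
])

def correlation_matrix(seq, d):
    """Estimate C[a, b] = P_ab(d) - P_a * P_b for bases d positions apart.

    Assumes 1 <= d < len(seq) and that seq contains only A, T, C, G.
    """
    idx = np.array([BASES.index(ch) for ch in seq])
    first, second = idx[:-d], idx[d:]
    pair = np.zeros((4, 4))
    np.add.at(pair, (first, second), 1.0)   # count (a, b) pairs at distance d
    pair /= pair.sum()                      # joint probabilities P_ab(d)
    p = pair.sum(axis=1)                    # frequency of a at the first position
    q = pair.sum(axis=0)                    # frequency of b at the second position
    return pair - np.outer(p, q)

def component_correlations(C):
    """Express a 4x4 correlation matrix in the three components (3x3 result)."""
    return COMPONENTS @ C @ COMPONENTS.T

# Usage on a synthetic random sequence (a stand-in, not real genomic data).
rng = np.random.default_rng(0)
seq = "".join(rng.choice(list(BASES), size=2000))
K = component_correlations(correlation_matrix(seq, 5))
print(K)
```

Repeating the calculation over a range of d and plotting the diagonal entries of the 3x3 matrix on log-log axes would be one way to look for the power-law behavior mentioned above. For a random sequence such as this one, no such scaling is expected.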