Heuristic Informational Analysis of Sequences
J. M. Claverie and L. Bougueleret
Nucleic Acids Research, 14(1):179-196 (Jan 10, 1986)
Abstract
Nucleotide or amino-acid sequences are interpreted as successions of
words of length k (k-tuples), the frequencies of which are highly
variable in different statistical populations of genes or proteins.
After building k-tuple reference tables from coherent subsets or
entire data banks, the local information content profile of
individual sequences is drawn. Anomalous regions (peaks or
depressions) of such a profile can lead to the discovery and
identification of specific sequence patterns. Following the same
principle, the simultaneous use of two reference statistical
populations and the computation of an index combining the two
information profiles lead to a general and powerful method of
discriminant analysis. The identification of a "signal" associated with
gene conversion, intron/exon discrimination, and the location
of function-specific patterns in proteins are given as examples of
successful applications of this heuristic informational approach.
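
The procedure outlined above can be illustrated with a minimal sketch. The
abstract does not state the formulas, so the following are assumptions made
for illustration only: the information content of a k-tuple is taken as
-log2 of its add-one-smoothed relative frequency in the reference table, the
local profile is the mean of that value over a sliding window, and the
two-population index is the simple difference of the two profiles. The
function names (ktuple_table, information_profile, discriminant_index) and
the window and alphabet_size parameters are hypothetical.

```python
import math
from collections import Counter


def ktuple_table(sequences, k):
    """Count overlapping k-tuples across a reference set of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def information_profile(seq, table, k, window, alphabet_size=4):
    """Mean information content (bits) of the k-tuples in each window.

    Assumption: information content is -log2 of the add-one-smoothed
    relative frequency of the k-tuple in the reference table.
    """
    total = sum(table.values())
    n_words = alphabet_size ** k  # number of possible k-tuples
    surprisal = [
        -math.log2((table[seq[i:i + k]] + 1) / (total + n_words))
        for i in range(len(seq) - k + 1)
    ]
    return [
        sum(surprisal[i:i + window]) / window
        for i in range(len(surprisal) - window + 1)
    ]


def discriminant_index(seq, table_a, table_b, k, window, alphabet_size=4):
    """Assumed index: profile against B minus profile against A.

    Positive values mean the window resembles population A more than B.
    """
    profile_a = information_profile(seq, table_a, k, window, alphabet_size)
    profile_b = information_profile(seq, table_b, k, window, alphabet_size)
    return [b - a for a, b in zip(profile_a, profile_b)]
```

With a table built from a reference set, peaks in the profile mark windows
whose k-tuples are rare in that set, and depressions mark windows whose
k-tuples are common. For amino-acid sequences, alphabet_size would be set to
20 rather than 4.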