Detecting Periodic Patterns in Biological Sequences

Eivind Coward1,3 and Finn Drabløs2

1Department of Mathematical Sciences, Norwegian University of Science and Technology and
2SINTEF UNIMED MR-Centre, N-7034 Trondheim, Norway

Bioinformatics, 14(6):498-507 (July 1998)

Abstract

Motivation: The search for repeated patterns in DNA and protein sequences is important in sequence analysis. The rapid increase in available sequences, in particular from large-scale genome sequencing projects, makes it relevant to develop sensitive automatic methods for the identification of repeats.

Results: A new method for finding periodic patterns in biological sequences is presented. The method is based on evolutionary distance and 'phase shifts' corresponding to insertions and deletions. A given sequence is aligned to itself in a certain sense, so as to minimize a distance to periodicity. Relationships between different such periodicity measures are discussed. An iterative algorithm is used, and the running time is nearly proportional to the sequence length. The alignment produces a periodic consensus pattern. A 'phase score' is used to indicate the statistical significance of the periodicity. Three examples using both DNA and protein sequences illustrate how the method can be used to find patterns.
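
To make the notions of "distance to periodicity" and "periodic consensus pattern" concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes a fixed, known period, uses a plain mismatch count (Hamming distance) in place of an evolutionary distance, and ignores the phase shifts (insertions and deletions) that the method itself handles. The function name `periodic_consensus` is hypothetical.

```python
from collections import Counter


def periodic_consensus(seq, period):
    """Return (consensus, distance) for a fixed period.

    Simplifying assumptions (not from the paper): no insertions or
    deletions, and distance is the number of mismatches between the
    sequence and the consensus pattern repeated along it.
    """
    if period <= 0 or period > len(seq):
        raise ValueError("period must be between 1 and len(seq)")

    consensus = []
    distance = 0
    for phase in range(period):
        # All symbols that fall on the same position within the period.
        column = seq[phase::period]
        # The most common symbol in the column minimizes mismatches.
        symbol, count = Counter(column).most_common(1)[0]
        consensus.append(symbol)
        distance += len(column) - count
    return "".join(consensus), distance


if __name__ == "__main__":
    pattern, mismatches = periodic_consensus("ACGACGACTACG", 3)
    print(pattern, mismatches)
```

A small distance relative to the sequence length suggests that the sequence is close to periodic with the tested period. Allowing phase shifts, as the method does, would require an alignment step rather than the simple column-wise count used here.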

Availability: On request from the authors.

Contact: eivindc@math.ntnu.no; finn.drablos@unimed.sintef.no