Finding Borders between Coding and Noncoding DNA Regions by an Entropic Segmentation Method
Pedro Bernaola-Galván (1,2),
Ivo Grosse (1),
Pedro Carpena (2,3),
José L. Oliver (4),
Ramón Román-Roldán (5),
and H. Eugene Stanley (1)
(1) Center for Polymer Studies and Department of Physics, Boston University, Boston, Massachusetts 02215
(2) Departamento de Física Aplicada II, Universidad de Málaga, E-29071, Spain
(3) Theoretical Physics, Oxford University, 1 Keble Road, OX1 3NP Oxford, England
(4) Departamento de Genética e Instituto de Biotecnología, Universidad de Granada, E-18071, Spain
(5) Departamento de Física Aplicada, Universidad de Granada, E-18071, Spain
Physical Review Letters 85 (6): 1342-1345 (August 7, 2000)
Abstract
We present a new computational approach to finding borders between coding
and noncoding DNA. This approach has two features: (i) DNA sequences are
described by a 12-letter alphabet that captures the differential base
composition at each codon position, and (ii) the search for the borders is
carried out by means of an entropic segmentation method that uses only the
general statistical properties of coding DNA. We find that this method is
highly accurate in finding borders between coding and noncoding regions and
requires no "prior training" on known data sets. In discriminating coding from
noncoding DNA, our results appear to be more accurate than those obtained with
moving-window methods.
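The two features named in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and several details are assumptions. Each base is tagged with its absolute position modulo 3, giving a 12-letter alphabet (A0, A1, A2, C0, ...). A sequence S of length n is split at the point k that maximizes the length-weighted Jensen-Shannon divergence D(k) = H(S) - (k/n) H(S_left) - ((n-k)/n) H(S_right), where H is the Shannon entropy of the 12-letter composition. Splitting then recurses on both halves. The function names (to_12_letter, best_split, segment), the minimum segment length, and the fixed threshold on D used as a stopping rule are hypothetical simplifications. The abstract does not state the paper's own stopping criterion.

```python
import math
from collections import Counter

def to_12_letter(seq):
    # Hypothetical encoding: tag each base with (absolute index mod 3), so an
    # A at codon position 0 and an A at codon position 1 are different letters.
    # Non-ACGT characters (e.g. N) are dropped, so returned indices refer to
    # the filtered symbol list, not to the raw string.
    return [(b, i % 3) for i, b in enumerate(seq.upper()) if b in "ACGT"]

def entropy(counts, n):
    # Shannon entropy (bits) of the symbol composition described by `counts`.
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)

def best_split(symbols, min_len=1):
    # Scan every cut point k and return (k, D(k)) with maximal Jensen-Shannon
    # divergence between symbols[:k] and symbols[k:].
    n = len(symbols)
    total = Counter(symbols)
    h_total = entropy(total, n)
    left, right = Counter(), Counter(total)
    best_k, best_d = None, -1.0
    for k in range(1, n):
        s = symbols[k - 1]          # move one symbol from right to left
        left[s] += 1
        right[s] -= 1
        if k < min_len or n - k < min_len:
            continue
        d = (h_total
             - (k / n) * entropy(left, k)
             - ((n - k) / n) * entropy(right, n - k))
        if d > best_d:
            best_k, best_d = k, d
    return best_k, best_d

def segment(symbols, start=0, threshold=0.1, min_len=30):
    # Recursive segmentation; the fixed threshold on D is a hypothetical
    # stand-in for a proper statistical-significance test.
    k, d = best_split(symbols, min_len)
    if k is None or d < threshold:
        return []
    return (segment(symbols[:k], start, threshold, min_len)
            + [start + k]
            + segment(symbols[k:], start + k, threshold, min_len))

if __name__ == "__main__":
    # Synthetic toy input: a composition-homogeneous stretch followed by a
    # strongly codon-periodic stretch (not real DNA).
    toy = to_12_letter("ACGT" * 30 + "GAT" * 40)
    print(segment(toy))
```

Because the codon tag is computed once from absolute positions, sub-segments keep a consistent frame during recursion. On the toy input, the single reported border should fall close to position 120, where the periodic stretch begins. Per-position composition in the 12-letter alphabet, not overall base composition, is what separates the two stretches here.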