Assessment of protein coding measures [Review]
Fickett JW, Tung CS
Nucleic Acids Research 20(24):6441-50 (Dec 25 1992)
Abstract
A number of methods for recognizing protein coding genes in DNA
sequence have been published over the last 13 years, and new, more
comprehensive algorithms, drawing on the repertoire of existing
techniques, continue to be developed. To optimize continued
development, it is valuable to systematically review and evaluate
published techniques. At the core of most gene recognition
algorithms is one or more coding measures--functions which produce,
given any sample window of sequence, a number or vector intended to
measure the degree to which a sample sequence resembles a window of
'typical' exonic DNA. In this paper we review and synthesize the
underlying coding measures from published algorithms. A standardized
benchmark is described, and each of the measures is evaluated
according to this benchmark. Our main conclusion is that a very
simple and obvious measure--counting oligomers--is more effective
than any of the more sophisticated measures. Different measures
contain different information. However, there is a great deal of
redundancy in the current suite of measures. We show that in future
development of gene recognition algorithms, attention can probably
be limited to six of the twenty or so measures proposed to date.
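
As an illustration of what a window-based coding measure built on oligomer counts might look like, here is a minimal Python sketch. The abstract does not say how the counts are turned into a score, so everything specific here is an assumption made for illustration: the function names (count_oligomers, train, coding_score), the choice of in-frame hexamers (k=6, step=3), the add-one pseudocount, and the use of a mean log-likelihood ratio between coding and noncoding training counts. It should not be read as the procedure or benchmark used in the paper.

```python
import math
from collections import Counter


def count_oligomers(seq, k=6, step=3):
    """Count k-mers starting at every `step`-th position of seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))


def train(seqs, k=6, step=3):
    """Pool oligomer counts over a list of training sequences."""
    total = Counter()
    for s in seqs:
        total.update(count_oligomers(s, k, step))
    return total


def coding_score(window, coding_counts, noncoding_counts,
                 k=6, step=3, pseudo=1.0):
    """Mean log-likelihood ratio (coding vs. noncoding) over the window.

    Assumption: frequencies are smoothed with a pseudocount spread over
    all 4**k possible oligomers. Positive scores mean the window looks
    more like the coding training set than the noncoding one.
    """
    n_coding = sum(coding_counts.values())
    n_noncoding = sum(noncoding_counts.values())
    n_types = 4 ** k
    score, n = 0.0, 0
    for kmer, c in count_oligomers(window, k, step).items():
        p_c = (coding_counts[kmer] + pseudo) / (n_coding + pseudo * n_types)
        p_n = (noncoding_counts[kmer] + pseudo) / (n_noncoding + pseudo * n_types)
        score += c * math.log(p_c / p_n)
        n += c
    return score / n if n else 0.0


# Toy usage with trinucleotides (k=3) and made-up training sequences.
coding = train(["GCCGCCGCCGCC"], k=3)
noncoding = train(["ATAATAATAATA"], k=3)
print(coding_score("GCCGCC", coding, noncoding, k=3))
```

In this sketch the measure returns a single number per window, matching the abstract's description of a coding measure as a function from a sample window to a number or vector; a vector-valued variant could instead return the raw oligomer frequencies.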