GeneMark.hmm: New Solutions for Gene Finding
Alexander V. Lukashin and Mark Borodovsky¹
School of Biology and ¹Schools of Biology and Mathematics, Georgia Institute of Technology,
Atlanta, GA 30332-0230, USA
Nucleic Acids Research, 26(4):1107-1115 (Feb 15, 1998)
Abstract
The number of completely sequenced bacterial genomes
has been growing fast. Computer methods for finding genes
are available, yet more accurate algorithms are still
needed. The GeneMark.hmm algorithm presented here was
designed to improve gene prediction quality in terms of
finding exact gene boundaries.
The idea was to embed the GeneMark models into a naturally
derived hidden Markov model (HMM) framework, with gene
boundaries modeled as transitions between hidden states.
We also used a specially derived ribosome binding site (RBS)
pattern to refine predictions of translation initiation codons.
The algorithm was evaluated on several test sets, including
10 complete bacterial genomes. The new algorithm was shown
to be significantly more accurate than GeneMark in exact
gene prediction. Interestingly, high gene-finding accuracy
was observed even when Markov models of order zero, one
or two were used. We present an analysis of false positive
and false negative predictions, with the caution that these
categories are not precisely defined when the public
database annotation is used as a control.
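To make the idea of gene boundaries as transitions between hidden states more concrete, here is a minimal sketch in Python. It is a toy two-state HMM (noncoding and coding) with zero-order emission models, decoded with the standard Viterbi algorithm; predicted boundaries are simply the positions where the decoded state changes. The state names, all probabilities, and the functions viterbi and boundaries are hypothetical illustrations, not GeneMark.hmm's parameters or code. The actual model is considerably richer, using the GeneMark coding models and an RBS pattern as described above.

```python
import math

# Hypothetical toy parameters, NOT those of GeneMark.hmm: two hidden states,
# each emitting nucleotides from a zero-order (position-independent) model.
STATES = ("noncoding", "coding")
START = {"noncoding": 0.9, "coding": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.95, "coding": 0.05},
    "coding": {"noncoding": 0.05, "coding": 0.95},
}
EMIT = {
    "noncoding": {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},
    "coding": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},
}


def viterbi(seq):
    """Return the most probable hidden-state path for seq (log-space Viterbi)."""
    # score[s] = best log-probability of any path ending in state s so far
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []
    for ch in seq[1:]:
        prev_best, new_score = {}, {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s.
            best = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            prev_best[s] = best
            new_score[s] = (score[best] + math.log(TRANS[best][s])
                            + math.log(EMIT[s][ch]))
        backptrs.append(prev_best)
        score = new_score
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for prev_best in reversed(backptrs):
        state = prev_best[state]
        path.append(state)
    path.reverse()
    return path


def boundaries(path):
    """Positions i where the decoded state differs from the state at i - 1."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


if __name__ == "__main__":
    seq = "ATATATAT" + "GCGCGCGCGCGC" + "ATATATAT"
    path = viterbi(seq)
    print(boundaries(path))
```

For the example sequence, the toy model labels the GC-rich middle stretch as coding and reports boundaries at positions 8 and 20. Replacing the EMIT tables with probabilities conditioned on the preceding one or two nucleotides would turn these zero-order emissions into first- or second-order Markov models.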