Computational Gene Identification:
an Open Problem
Roderic Guigo
Computers & Chemistry, 21(4),
215-222 (1997).
Abstract
As the Human Genome Project enters the large-scale
sequencing phase, computational gene identification
methods are becoming essential for the automatic
analysis and annotation of large uncharacterized
genomic sequences. Currently available computer
programs relying mainly on sequence coding statistics
are of great use in pinpointing regions in genomic
sequences containing exons. Such programs perform
rather poorly, however, when the problem is to
fully elucidate gene structure. For this problem,
the DNA sequence signals involved in the specification
of the genes - start sites and splice sites -
carry a lot of information, and simple methods relying on
such information can predict gene structure with
an accuracy to some extent comparable to that of
other more sophisticated computational methods.