Computational methods for the identification of genes in vertebrate genomic sequences
Jean-Michel Claverie
Structural and Genetic Information Laboratory, CNRS-EP.91, 31 chemin
Joseph-Aiguier, 13402 Marseille cedex 20, France
Human Molecular Genetics
6(10), 1735-1744 (1997) (review issue)
Abstract
Research into new methods to identify genes in anonymous genomic sequences has
been going on for more than 15 years. Over this period, the field has evolved
from designing programs that identify protein-coding regions in compact
mitochondrial or bacterial genomes, to the challenge of predicting the detailed
organization of multi-exon vertebrate genes. The best program currently available
exactly locates (with both boundaries correct) more than 80% of internal coding exons,
and only 5% of its predicted exons do not overlap a real exon. Given such accuracy,
computational methods are very useful; however, they do not remove the need for
experimental validation. While performance is satisfactory for identifying the coding
moiety of genes (internal coding exons), predictions of the full extent of the
transcript (the 5′ and 3′ extremities of the gene) and of the location of promoter
regions are still unreliable. As the human and mouse genome sequencing projects enter
production mode, the fully automated annotation of megabase-long anonymous genomic
sequences is the next big challenge in bioinformatics.
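The two accuracy figures quoted above are exon-level measures. The following minimal Python sketch makes them concrete. It assumes that exons are given as (start, end) coordinate pairs and uses the measures commonly applied in gene-prediction benchmarks: exact-match sensitivity and specificity, plus the fractions of "wrong" exons (predictions overlapping no real exon) and "missing" exons (real exons overlapping no prediction). The function name `exon_accuracy` and the toy coordinates are hypothetical. Reading the 80% figure as exon sensitivity and the 5% figure as the wrong-exon fraction is our interpretation.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an exon is a (start, end) pair of
# inclusive genomic coordinates on one strand.
Exon = Tuple[int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """Return True if the two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def exon_accuracy(real: List[Exon], predicted: List[Exon]) -> Dict[str, float]:
    """Exon-level accuracy measures (a sketch, not a reference implementation)."""
    real_set = set(real)
    pred_set = set(predicted)
    # An exon counts as "exactly located" only if both boundaries match.
    exact = real_set & pred_set
    # Predicted exons that overlap no real exon at all ("wrong" exons).
    wrong = [p for p in pred_set if not any(overlaps(p, r) for r in real_set)]
    # Real exons that no prediction overlaps at all ("missing" exons).
    missing = [r for r in real_set if not any(overlaps(r, p) for p in pred_set)]
    return {
        "sensitivity": len(exact) / len(real_set) if real_set else 0.0,
        "specificity": len(exact) / len(pred_set) if pred_set else 0.0,
        "wrong_exons": len(wrong) / len(pred_set) if pred_set else 0.0,
        "missing_exons": len(missing) / len(real_set) if real_set else 0.0,
    }


if __name__ == "__main__":
    # Toy annotation and prediction (illustrative values only).
    real = [(100, 200), (300, 420), (500, 610), (700, 780), (900, 990)]
    pred = [(100, 200), (300, 420), (500, 600), (700, 780), (1200, 1300)]
    print(exon_accuracy(real, pred))
```

Exact matching is strict by design. The prediction (500, 600) overlaps the real exon (500, 610), so it is neither "wrong" nor "missing", yet it still counts against both sensitivity and specificity.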