Gene Structure Prediction Using Information on Homologous Protein Sequence
I.B. Rogozin, L. Milanesi and N.A. Kolchanov
Computer Applications in the Biosciences 12(3), 161-170 (June 1996)
Abstract
This paper describes a new approach to predicting the structure of protein-coding
genes. The prediction scheme has three steps. First, the exons with the best potential
are predicted in a sequence of unknown function, and a list of the potential amino acid
fragments encoded by these exons is formed. Second, each amino acid fragment in the list
is tested for homology with proteins from the SWISS-PROT database of amino acid
sequences, and the one protein with the best homology is chosen from all the homologous
sequences. Third, the exon-intron structure is reconstructed on the basis of the
homology with the chosen protein sequence. The method was tested
on an independent control set of 20 genes, with the following results: 21% of real
exons were missed and 3% of non-real exons were predicted. The system can be used to
refine the results of gene prediction systems, especially when highly homologous
proteins are found in the amino acid database.
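
The three-step scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Exon record, the function names, the min_len threshold and the toy sequences are all hypothetical, step 1 (exon prediction) is assumed to have already produced the candidate list, and the longest exact common substring is used as a crude stand-in for a real homology score against SWISS-PROT.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Exon:
    start: int      # genomic start of the candidate exon (hypothetical field)
    end: int        # genomic end of the candidate exon (hypothetical field)
    peptide: str    # amino acid fragment encoded by the candidate


def best_match(peptide, protein):
    """Return (length, protein_pos) of the longest exact common substring.

    This is only a stand-in for a real homology score.
    """
    matcher = SequenceMatcher(None, peptide, protein, autojunk=False)
    block = matcher.find_longest_match(0, len(peptide), 0, len(protein))
    return block.size, block.b


def choose_protein(exons, database):
    """Step 2: pick the database protein with the highest total match length."""
    def total(name):
        return sum(best_match(e.peptide, database[name])[0] for e in exons)
    return max(database, key=total)


def reconstruct(exons, protein, min_len=4):
    """Step 3: keep candidates that match the protein in collinear order.

    Candidates are visited in genomic order; one is kept only if its match
    is long enough and lies further along the protein than the previous one.
    """
    kept, last_pos = [], -1
    for exon in sorted(exons, key=lambda e: e.start):
        size, pos = best_match(exon.peptide, protein)
        if size >= min_len and pos > last_pos:
            kept.append(exon)
            last_pos = pos
    return kept


# Toy data (invented for illustration): two real-looking exons and one spurious one.
database = {
    "protA": "MKTLLVAGGSWQERTYHPLKDNC",
    "protB": "MSSPPQQRRWWYYFFHHIIKKLL",
}
candidates = [
    Exon(100, 130, "MKTLLVAG"),
    Exon(300, 330, "PPPPPPPP"),
    Exon(500, 540, "ERTYHPLK"),
]

chosen = choose_protein(candidates, database)
gene = reconstruct(candidates, database[chosen])
print(chosen, [(e.start, e.end) for e in gene])
```

In this toy run the spurious middle candidate is discarded because it has no sufficiently long match in the chosen protein, which mirrors how homology can be used to refine an initial exon list. A realistic version would replace best_match with an alignment-based score and would also check splice sites and reading-frame compatibility.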