1
Baskin Center for Computer Engineering and Information Sciences,
University of California,
Santa Cruz, CA 95064,
{dkulp,haussler}@cse.ucsc.edu
2
Lawrence Berkeley Laboratory,
Genome Informatics Group,
1 Cyclotron Road,
Berkeley, CA, 94720,
{martinr,eeckman}@genome.lbl.gov
Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, edited by David J. States, Pankaj Agarwal, Terry Gaasterland, Lawrence Hunter, & Randall F. Smith (AAAI Press, 1996), pages 134-142.
The description and results of an implementation of such a gene-finding model, called Genie, are presented. The exon sensor is a codon frequency model conditioned on windowed nucleotide frequency and the preceding codon. As in Brunak et al. (1991), two neural networks are used for splice site prediction.
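To illustrate what conditioning a codon frequency model on the preceding codon can look like, the following is a minimal sketch, not Genie's actual exon sensor. It assumes a plain first-order Markov chain over codons with add-one smoothing, and it omits the conditioning on windowed nucleotide frequency entirely. The function names, the pseudocount, and the toy training sequence are all hypothetical.

```python
import math
from collections import defaultdict


def split_codons(seq):
    """Split an in-frame sequence into codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def train_codon_model(coding_seqs):
    """Count codon-to-codon transitions in in-frame coding sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in coding_seqs:
        codons = split_codons(seq)
        for prev, cur in zip(codons, codons[1:]):
            counts[prev][cur] += 1.0
    return counts


def log_prob(counts, prev, cur, pseudocount=1.0):
    """Smoothed log P(cur | prev); the pseudocount is an assumption of this sketch."""
    row = counts.get(prev, {})
    total = sum(row.values()) + 64 * pseudocount  # 64 possible codons
    return math.log((row.get(cur, 0.0) + pseudocount) / total)


def score_exon(counts, seq):
    """Sum of conditional log-probabilities over consecutive codon pairs."""
    codons = split_codons(seq)
    return sum(log_prob(counts, p, c) for p, c in zip(codons, codons[1:]))


# Toy data, purely illustrative.
model = train_codon_model(["ATGGCCATGGCC"])
seen = score_exon(model, "ATGGCC")    # transition observed in training
unseen = score_exon(model, "ATGTTT")  # transition never observed
```

In this sketch a candidate exon whose codon transitions were common in the training data receives a higher log-probability than one whose transitions were rare, which is the basic idea behind a content sensor of this kind.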
We show that this simple model performs quite well. For a cross-validated standard test set of 304 genes [ftp://www-hgc.lbl.gov/pub/genesets] in human DNA, our gene-finding system correctly identified up to 85% of protein-coding bases with a specificity of 80%. Of the exons, 58% were identified exactly, with a specificity of 51%. Genie is shown to perform favorably compared with several other gene-finding systems.
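The accuracy figures above can be made concrete with a short sketch. It assumes the convention common in gene-finding evaluations, where sensitivity is the fraction of true items that are predicted and specificity is the fraction of predicted items that are true (what other fields call precision). Whether this matches the exact definitions used for the reported numbers is an assumption, and the function names and toy data are hypothetical.

```python
def base_level_accuracy(true_coding, predicted_coding):
    """Per-base sensitivity and specificity from two equal-length 0/1 label lists."""
    pairs = list(zip(true_coding, predicted_coding))
    tp = sum(1 for t, p in pairs if t and p)        # coding, predicted coding
    fn = sum(1 for t, p in pairs if t and not p)    # coding, missed
    fp = sum(1 for t, p in pairs if p and not t)    # non-coding, predicted coding
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


def exon_level_accuracy(true_exons, predicted_exons):
    """Exact-match exon accuracy; exons are (start, end) tuples."""
    true_set, pred_set = set(true_exons), set(predicted_exons)
    exact = len(true_set & pred_set)  # both boundaries must agree
    sensitivity = exact / len(true_set) if true_set else 0.0
    specificity = exact / len(pred_set) if pred_set else 0.0
    return sensitivity, specificity


# Toy annotation: 4 true coding bases, prediction shifted by one base.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
guess = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
base_sn, base_sp = base_level_accuracy(truth, guess)

# Toy exons: one of two true exons is predicted exactly, plus two wrong ones.
exon_sn, exon_sp = exon_level_accuracy(
    [(10, 50), (100, 180)],
    [(10, 50), (100, 175), (300, 340)],
)
```

The exon-level measure is much stricter than the base-level one, because a prediction that misses a splice site by a single base contributes nothing to exact exon counts while still contributing most of its bases to the base-level counts.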