1 Freie Universität Berlin, Abteilung Molekularbiologie und Bioinformatik, Institut für Molekularbiologie und Biochemie, Arnimallee 22, 14195 Berlin, Germany and
2 Department of Mathematics, Stanford University, Stanford, CA 94305, USA
Bioinformatics, 14(3), 232-243 (April 1998)
Results: A database of 46 non-redundant genomic sequences from maize is used for illustration. It is shown that the correct gene structures do not always maximize the considered target function. However, in most cases, the correct or nearly correct structures are found in a small set of high-scoring structures. A critical review of the generated structures sometimes allows the choices to be narrowed by considering additional variables such as predicted splice site strength or local optimality of splice site scores. Summary statistics for prediction accuracy over all 46 maize genes are derived under cross-validation and non-cross-validation training conditions for the Markov sequence models. The algorithm achieved exon sensitivity of 0.81 and specificity of 0.75 on an independent set of 14 novel maize genomic segments.
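The abstract reports exon sensitivity and specificity without defining them. As a minimal sketch, assuming the convention common in gene-prediction evaluation (sensitivity is the fraction of true exons predicted exactly, specificity is the fraction of predicted exons that are exactly correct), the two measures could be computed as follows. The function name `exon_accuracy` and all coordinates are hypothetical and are not taken from GeneGenerator or from the maize data set.

```python
def exon_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) at the exon level.

    Exons are (start, end) tuples. A predicted exon counts as correct
    only if both boundaries match a true exon exactly (an assumption;
    the abstract does not state the matching rule used).
    """
    true_set = set(true_exons)
    pred_set = set(predicted_exons)
    correct = len(true_set & pred_set)
    # Sensitivity: correctly predicted exons / all true exons.
    sensitivity = correct / len(true_set) if true_set else 0.0
    # Specificity (gene-prediction usage): correct / all predicted exons.
    specificity = correct / len(pred_set) if pred_set else 0.0
    return sensitivity, specificity


# Hypothetical coordinates, for illustration only.
true_exons = [(100, 250), (400, 520), (700, 810), (950, 1100)]
predicted_exons = [(100, 250), (400, 520), (705, 810), (950, 1100), (1300, 1350)]

sn, sp = exon_accuracy(true_exons, predicted_exons)
print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

In this toy case three of four true exons are recovered exactly and three of five predictions are correct. Note that "specificity" here follows gene-prediction usage, which corresponds to precision in other fields.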
Availability: GeneGenerator is available as a Borland Pascal 7.0 program for MS-DOS and as a C program for UNIX workstations. The source code is available upon request.
Contact: jkleffe@euler.grumed.fu-berlin.de