Gene Structure Prediction by Linguistic Methods
S. Dong and D.B. Searls
Genomics, 23, 540-551 (1994)
Abstract
The higher-order structure of genes and other features of biological
sequences can be described by means of formal grammars. These
grammars can then be used by general-purpose parsers to detect and
assemble such structures by means of syntactic pattern
recognition. We describe a grammar and parser for eukaryotic
protein-encoding genes, which by some measures is as effective as
current connectionist and combinatorial algorithms in predicting gene
structures for sequence database entries. Parameters on the grammar
rules are optimized for several different species, and mixing
experiments are performed to determine the degree of species specificity
and the relative importance of compositional, signal-based, and
syntactic components in gene prediction.
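
As a rough illustration of syntactic pattern recognition applied to gene structure, the sketch below enumerates parses of a toy grammar, gene -> exon (intron exon)*, in which an intron begins with GT and ends with AG and the spliced exons must form an open reading frame. It is not the grammar or parser described in the paper: the function names (is_orf, parse_genes), the min_intron parameter, and the example sequence are hypothetical, and the parameters on grammar rules and the compositional measures mentioned above are omitted entirely.

```python
STOPS = {"TAA", "TAG", "TGA"}


def is_orf(cds):
    """True if cds is ATG ... stop, in frame, with no earlier in-frame stop."""
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return (codons[0] == "ATG"
            and codons[-1] in STOPS
            and not any(c in STOPS for c in codons[:-1]))


def parse_genes(seq, min_intron=4):
    """Enumerate parses of the toy grammar

        gene   -> exon (intron exon)*
        intron -> "GT" ... "AG"

    Each parse is a list of (start, end) exon spans, end exclusive.
    The search is exhaustive, so it is only suitable for short sequences.
    """
    results = []

    def extend(exons, exon_start):
        # Try every possible end for the exon that starts at exon_start.
        for end in range(exon_start + 1, len(seq) + 1):
            # Option 1: the gene ends here if the spliced exons form an ORF.
            cds = "".join(seq[a:b] for a, b in exons) + seq[exon_start:end]
            if is_orf(cds):
                results.append(exons + [(exon_start, end)])
            # Option 2: an intron starts here (donor signal GT) and ends
            # at some later acceptor signal AG, followed by another exon.
            if seq[end:end + 2] == "GT":
                for acc in range(end + min_intron, len(seq) + 1):
                    if seq[acc - 2:acc] == "AG":
                        extend(exons + [(exon_start, end)], acc)

    # A gene may begin at any start codon.
    for start in range(len(seq) - 2):
        if seq[start:start + 3] == "ATG":
            extend([], start)
    return results


# Made-up example: exon ATGGCC, intron GTAAAG, exon TTTTAA.
example = "ATGGCC" + "GTAAAG" + "TTTTAA"
parses = parse_genes(example)
```

On this example the signals alone admit both a spliced and an unspliced reading, which is the kind of ambiguity that a practical system would need scoring to resolve.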