Inferring Genes from Open Reading Frames
J.W. Fickett
Computers & Chemistry, 18(3), 203-205 (Sept 1994)
Abstract
One expects that in DNA without protein coding function, stop codons
(which constitute three of the 64 possible codons) should occur
frequently in all reading frames, and that a long open reading frame
(ORF) can be interpreted as a sign of the existence of a gene. We
take a first step toward introducing quantitative measures of
confidence into this inference, using Saccharomyces cerevisiae as a
sample case, and show that some common assumptions can reasonably be
questioned. In particular, we show that statistical support for the
biological function of shorter ORFs listed as putative genes in
recent papers is in fact very weak. This is an issue of practical as
well as theoretical interest, since researching the function of a
putative gene is difficult and expensive.
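
The expectation stated at the start of the abstract can be illustrated
with a small calculation. The sketch below is not the method of the
paper; it assumes, purely for illustration, that codons in noncoding
DNA are independent and uniformly distributed, so that each codon is a
stop with probability 3/64. The function name prob_open and the
lengths tried are hypothetical choices. Real genomes such as yeast
have biased base composition, so a serious estimate would need a
different stop-codon frequency.

```python
def prob_open(n_codons: int, p_stop: float = 3 / 64) -> float:
    """Chance that n_codons consecutive random codons contain no stop.

    Assumption (not from the paper): codons are independent and each is
    a stop codon with probability p_stop (3/64 if all codons are equally
    likely).
    """
    if n_codons < 0:
        raise ValueError("n_codons must be non-negative")
    return (1.0 - p_stop) ** n_codons


if __name__ == "__main__":
    # Illustrative lengths only; these are not thresholds from the paper.
    for n in (50, 100, 300):
        print(f"{n:4d} codons: P(no stop) = {prob_open(n):.3e}")
```

Under this simplified model the chance of an open stretch falls
geometrically with length, which is why a long ORF looks significant
and a short one does not. A genome contains a very large number of
candidate start positions in six reading frames, so even a small
per-position probability can produce many short ORFs by chance.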