Method to Determine the Reading Frame of a Protein from the
Purine/Pyrimidine Genome Sequence and Its Possible Evolutionary
Justification
J.C. Shepherd
Proceedings of the National Academy of Sciences,
78(3):1596-1600 (Mar 1981)
Abstract
The periodic variations obtained by correlating the relative
positions of purines and pyrimidines (and of the four bases
thymine, cytosine, adenine, and guanine) in a wide variety of
genomes of wholly or partly known sequence suggest that there
may be enough of an earlier comma-free coding system (i.e., only
readable in one frame) still present to permit determination of the
reading frame and approximate extent of the present protein
coding stretches. The characteristics of these variations support
the hypothesis that these primitive messages were formed of
coding triplets having the form RNY (R = purine; Y =
pyrimidine; and N = purine or pyrimidine). The base sequences
and reading frames that have a minimal deviation from such a
message are still good predictors of actual coding regions and
reading frames in spite of the many mutations that have occurred
since such a genetic code was last in use. In fact, the right frame
for almost all the proteins in a number of viruses and various
prokaryotes and eukaryotes is deduced purely from
purine/pyrimidine information and not by using the normal start
and stop signals.
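
The frame-selection idea in the abstract can be illustrated with a minimal sketch. It is not the correlation procedure used in the paper. It is a simplified stand-in that assumes "minimal deviation from an RNY message" can be approximated by the fraction of triplets in each frame whose first base is a purine and whose third base is a pyrimidine. The function names `rny_score` and `best_rny_frame` and the example sequences are hypothetical.

```python
# Simplified illustration only: score each of the three reading frames by
# how many of its triplets fit the RNY pattern (R = A/G, N = any, Y = C/T/U).
PURINES = set("AG")
PYRIMIDINES = set("CTU")


def rny_score(seq, frame):
    """Return the fraction of complete triplets in `frame` (0, 1 or 2) matching RNY."""
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not triplets:
        return 0.0
    hits = sum(1 for t in triplets if t[0] in PURINES and t[2] in PYRIMIDINES)
    return hits / len(triplets)


def best_rny_frame(seq):
    """Return (frame, scores): the frame with the highest RNY fraction and all three scores."""
    scores = [rny_score(seq, f) for f in range(3)]
    best = max(range(3), key=lambda f: scores[f])
    return best, scores


if __name__ == "__main__":
    # A toy sequence built from RNY triplets (GCT AAT GGC), then shifted by one base.
    print(best_rny_frame("GCTAATGGC"))
    print(best_rny_frame("CGCTAATGGC"))
```

On real genomic sequence the three scores would be much closer together than in this toy case, so a practical version would presumably score a sliding window and compare frames statistically rather than take a single maximum.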