Origin of Eukaryotic Introns: a Hypothesis, Based on Codon
Distribution Statistics in Genes, and Its Implications
P. Senapathy
Proceedings of the National Academy of Sciences,
83(7):2133-2137 (Apr, 1986)
Abstract
A hypothesis for the origin of introns in eukaryotic genes is
developed. By computer simulation it was found that the
reading-frame lengths in a random nucleotide sequence are
distributed in a negative exponential manner and that there exists
an upper limit of about 200 codons in the length of the reading
frames (RFs). These characteristics suggest that, if primordial DNA
contained a random nucleotide sequence, the most primitive cells
would have been under selective pressure to eliminate interfering
stop codons in order to increase the length of RFs. Further, they
indicate that the only possible way that a coding sequence that is
considerably longer than 600 nucleotides could be derived from the
short coding sequences occurring in a random sequence would be to
splice the short coding sequences and to eliminate the stretches of
sequences containing clusters of in-frame stop codons. Thus, introns
are suggested to be those stretches of sequences containing
interfering stop codons that were originally earmarked in the first
primitive cells to be eliminated in order to enable the coding for
long polypeptides. Because the statistical characteristics of codon
distributions in today's eukaryotic DNA sequences closely resemble
those of a random sequence and because the upper limit in the length
of RFs (200 codons) in a random sequence corresponds precisely to
the observed maximum length of exons in today's eukaryotic genes
(600 nucleotides), it is suggested that introns originated in the
most primitive unicellular eukaryotes when they evolved from
primordial sequences. The data from the prokaryotic gene sequences
indicate that prokaryotic genes may have been derived originally
from primitive unicellular eukaryotic genes through the loss of
introns.
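
The reading-frame statistics described in the abstract can be illustrated with a minimal sketch. It is not the author's original simulation. It assumes a uniform, independent base composition (each of A, C, G, T with probability 1/4) and the three standard stop codons, and it measures stop-free runs in a single frame. The function names (random_sequence, reading_frame_lengths, survival) are hypothetical and chosen only for this illustration. Under these assumptions a codon is a stop with probability 3/64, so run lengths follow a geometric (discrete negative exponential) distribution, and the chance that a run reaches n codons is (61/64)^n.

```python
import random

# Stop codons of the standard genetic code, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def random_sequence(n_codons, seed=0):
    """Return a random DNA string of n_codons codons.

    Assumption: bases are independent and equally likely.
    """
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=3 * n_codons))


def reading_frame_lengths(seq):
    """Return the lengths, in codons, of stop-free runs in frame 0.

    Each run is closed by an in-frame stop codon; a trailing run that is
    not closed by a stop is discarded.
    """
    lengths, run = [], 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            lengths.append(run)
            run = 0
        else:
            run += 1
    return lengths


def survival(n):
    """Probability that a stop-free run reaches at least n codons.

    Follows from the uniform-base assumption: 61 of 64 codons are not stops.
    """
    return (61 / 64) ** n


if __name__ == "__main__":
    lengths = reading_frame_lengths(random_sequence(300_000))
    mean = sum(lengths) / len(lengths)
    print("runs observed:", len(lengths))
    print("mean run length (codons):", round(mean, 2))
    print("longest run (codons):", max(lengths))
    print("P(run >= 200 codons):", survival(200))
```

In this idealized model the probability of a run of 200 or more codons is on the order of 7 in 100,000, so the limit of about 200 codons behaves as a practical ceiling set by the exponential tail, not as a hard cutoff. Real sequences with biased base composition would shift these numbers.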