A Relationship Between GC Content
and Coding-Sequence Length
Jose L. Oliver, Antonio Marin
Journal of Molecular Evolution 43:216-223 (1996)
Abstract
Since the base composition of translational stop codons
(TAG, TAA, and TGA) is biased toward low G+C
content, a differential density for these termination
signals is expected in random DNA sequences of
different base compositions. The expected length of
reading frames (DNA segments of sense codons flanked
by in-phase stop codons) in random sequences is thus
a function of GC content. The analysis of DNA
sequences from several genome databases stratified
according to GC content reveals that the longest
coding sequences (exons in vertebrates and genes in
prokaryotes) are GC-rich, while the shortest ones
are GC-poor. Exon lengthening in GC-rich vertebrate
regions does not result, however, in longer
vertebrate proteins, perhaps because of the lower
number of exons in the genes located in these
regions. The effects on coding-sequence lengths
constitute a new evolutionary meaning for
compositional variations in DNA GC content.
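
The dependence of expected reading-frame length on GC content can be
illustrated with a minimal sketch. It rests on assumptions that the
abstract does not state: bases are drawn independently, with
P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2, and reading-frame
length is counted as the number of sense codons read before the first
in-phase stop. The function names are hypothetical and chosen only for
this illustration; this is not presented as the authors' own
calculation.

```python
def stop_codon_probability(gc: float) -> float:
    """Probability that a random codon is TAA, TAG, or TGA.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2 (an illustrative model only).
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must lie in [0, 1]")
    p_at = (1.0 - gc) / 2.0  # probability of A, and likewise of T
    p_gc = gc / 2.0          # probability of G, and likewise of C
    p_taa = p_at ** 3        # T, A, A: three A/T bases
    p_tag = p_at ** 2 * p_gc  # T, A, G: two A/T bases and one G
    p_tga = p_at ** 2 * p_gc  # T, G, A: two A/T bases and one G
    return p_taa + p_tag + p_tga


def expected_sense_codons(gc: float) -> float:
    """Mean number of sense codons before the first in-phase stop.

    Under the model above, the count of sense codons preceding a stop
    is geometric with success probability p, so its mean is (1 - p) / p.
    """
    p = stop_codon_probability(gc)
    if p == 0.0:
        return float("inf")  # no stop codon can occur when gc == 1
    return (1.0 - p) / p
```

Under this model the stop-codon probability simplifies to
(1 - gc)^2 (1 + gc) / 8, which decreases steadily as gc rises from 0 to
1, so calling expected_sense_codons with a higher gc value returns a
longer expected reading frame, in line with the trend described above.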