Statistical Analysis and Prediction of the Exonic Structure of Human Genes
M.S. Gelfand
Journal of Molecular Evolution, 35(3):239-252 (Sept 1992)
Abstract
Nonhomologous fully sequenced human protein-coding genes were
studied. Three sets of exon-exon junctions were formed, defined by
the position of the intron (shadow) relative to the reading frame. For the
analysis of intron shadow signals in exons, the information content and
discrimination energy approaches were used, with a correction that
allows one to ignore the influence of the protein-coding message.
The corrected formulas allow one to define the consensuses for the
three types of intron shadow signals as a AG/guwn, cAG/GUnn, and
cAG/gunU, and provide better recognition than the original formulas.
Analysis of codon usage at the signal positions leads to the
conclusion that the prevalence of some amino acids at the corresponding
protein sites is caused by the signal requirements, and not vice
versa. The distribution of potential intron shadow signals in exons
contradicts the hypothesis of intron insertion into suitable
preexisting sites. There exists a correlation between the intron
types and/or the exon length modulo 3.
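
Two quantities named in the abstract can be illustrated with a short sketch. It is not the paper's method. The function names, the phase numbering (0, 1, 2 as the number of coding bases upstream of the intron, modulo 3), and the uniform background are assumptions made here for illustration. The information content shown is the standard uncorrected per-position relative entropy. The correction for the protein-coding message is not given in the abstract and is not reproduced.

```python
import math
from collections import Counter


def intron_phases(coding_exon_lengths):
    """Return the phase of each intron in a gene.

    Assumed convention: phase = number of coding bases upstream of the
    intron, modulo 3. Lengths must count coding bases only.
    """
    phases = []
    upstream = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        upstream += length
        phases.append(upstream % 3)
    return phases


def information_content(sites, background=None):
    """Per-position relative entropy (bits) of aligned, equal-length sites.

    This is the uncorrected form; `background` defaults to a uniform
    base composition, which is an illustrative assumption.
    """
    alphabet = "ACGU"
    if background is None:
        background = {base: 0.25 for base in alphabet}
    n = len(sites)
    width = len(sites[0])
    result = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        info = 0.0
        for base in alphabet:
            freq = counts[base] / n
            if freq > 0:  # a zero frequency contributes nothing to the sum
                info += freq * math.log2(freq / background[base])
        result.append(info)
    return result


# Hypothetical gene with three coding exons of 100, 50 and 31 bases.
lengths = [100, 50, 31]
phases = intron_phases(lengths)

# For an internal exon, the downstream intron phase equals
# (upstream intron phase + exon length) modulo 3.
assert phases[1] == (phases[0] + lengths[1]) % 3
```

The closing assertion shows why intron types and exon length modulo 3 are linked: under the assumed convention, the length of an internal exon modulo 3 fixes the phase of the downstream intron once the phase of the upstream intron is known.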