A Statistical Model for Locating Regulatory Regions in Genomic DNA.
E.M. Crowley, K. Roeder, M. Bina
Department of Statistics, Carnegie Mellon University,
Pittsburgh, PA 15213-3890, USA.
Journal of Molecular Biology
268(1): 8-14 (April 25, 1997)
Abstract
In addition to genes, chromosomal DNA contains sequences
that serve as signals for turning gene expression on and
off. These signals are thought to be distributed
as clusters in the regulatory regions of genes. We develop
a Bayesian model that views locating regulatory regions in
genomic DNA as a change-point problem, with the beginnings of
regulatory and non-regulatory regions corresponding to the
change points. The model is based on a hidden Markov chain.
The data consist of nucleotide positions of protein-binding
elements in a genomic DNA sequence. These positions are identified using
a reference catalogue containing elements that interact with
transcription factors implicated in controlling the expression
of protein-encoding genes. Among the protein-binding elements in a
genomic DNA sequence, the statistical model automatically selects
those that tend to predict regulatory regions. We test the
model using viral sequences that include known regulatory
regions and provide the results obtained for human genomic
DNA corresponding to the beta globin locus on chromosome 11.
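
The change-point view described in the abstract can be illustrated with a
minimal sketch. This is not the authors' Bayesian model: it assumes a
two-state hidden Markov chain (0 = non-regulatory, 1 = regulatory), a binary
observation per sequence window (1 if the window contains a catalogued
protein-binding element), and fixed, made-up emission and transition
probabilities. It decodes the most probable state path with the Viterbi
algorithm and reports where the state changes. All function names and
parameter values are hypothetical.

```python
import math

def viterbi_two_state(obs, p_emit=(0.1, 0.6), p_stay=(0.95, 0.9),
                      p_start=(0.5, 0.5)):
    """Most probable path of hidden states for a 0/1 observation list.

    p_emit[s]  : assumed probability that a window in state s holds an element
    p_stay[s]  : assumed probability of remaining in state s at the next window
    p_start[s] : assumed probability of starting in state s
    """
    if not obs:
        return []

    def log_emit(s, x):
        p = p_emit[s]
        return math.log(p if x else 1.0 - p)

    def log_trans(a, b):
        return math.log(p_stay[a] if a == b else 1.0 - p_stay[a])

    # Log-probability of the best path ending in each state at the first window.
    score = [math.log(p_start[s]) + log_emit(s, obs[0]) for s in range(2)]
    back = []  # back[t][b] = best predecessor of state b at window t + 1

    for x in obs[1:]:
        new_score, ptr = [], []
        for b in range(2):
            cands = [score[a] + log_trans(a, b) for a in range(2)]
            best = max(range(2), key=lambda a: cands[a])
            ptr.append(best)
            new_score.append(cands[best] + log_emit(b, x))
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(2), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def change_points(path):
    """Indices at which the decoded state differs from the previous window."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]

# Toy input: a cluster of element-containing windows flanked by empty ones.
windows = [0] * 20 + [1] * 10 + [0] * 20
decoded = viterbi_two_state(windows)
print(change_points(decoded))
```

In the paper's setting the parameters are not fixed in advance but are given
a Bayesian treatment, and the model also selects which catalogued elements
are informative; the sketch omits both steps.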