Hamming-Clustering method for signals prediction in 5' and 3'
regions of eukaryotic genes
L. Milanesi, M. Muselli1 and P. Arrigo1
CNR, Istituto di Tecnologie Biomediche Avanzate,
Via Ampere 56, 20131 Milan, Italy and
1Istituto per i Circuiti Elettronici, via
De Marini 6, 16149 Genoa, Italy
Computer Applications in the Biosciences, 12(5), 399-404 (Oct 1996)
Abstract
Motivation.
Gene expression is regulated by several kinds of short nucleotide
domains. These signals can either activate or terminate the
transcription process. To predict signal sites in the 5' and 3'
gene regions, we applied the Hamming-Clustering (HC) network to the
detection of the TATA box, the transcription initiation site and the
poly(A) signal in DNA sequences. This approach employs a technique
derived from the synthesis of digital networks to generate
prototypes, or rules, which can be analysed directly or used to
construct a final neural network.
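To make the idea of a binary prototype concrete, the following is a minimal sketch of how a rule, once generated, might be matched against a DNA sequence by Hamming distance. It is not the authors' HC implementation and does not generate rules. The one-hot coding (4 bits per nucleotide), the use of 'N' for don't-care positions, and the function names encode, make_prototype, hamming and scan are all assumptions made for illustration. The pattern AATAAA is the canonical poly(A) signal, used here as a hand-written prototype rather than one learned from the training set.

```python
# Illustrative sketch only: matches a hand-written prototype, it does not
# reproduce the HC rule-generation procedure described in the paper.

# Assumed coding: each nucleotide becomes 4 bits (one-hot).
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
           "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}


def encode(seq):
    """Map a DNA string to a flat tuple of bits, 4 bits per nucleotide."""
    bits = []
    for base in seq.upper():
        bits.extend(ONE_HOT[base])
    return tuple(bits)


def make_prototype(pattern):
    """Build a prototype; 'N' yields four don't-care bits (None)."""
    bits = []
    for base in pattern.upper():
        bits.extend((None,) * 4 if base == "N" else ONE_HOT[base])
    return tuple(bits)


def hamming(prototype, bits):
    """Hamming distance over the bits that the prototype specifies."""
    return sum(1 for p, b in zip(prototype, bits)
               if p is not None and p != b)


def scan(seq, pattern, max_distance=0):
    """Return start positions whose window lies within max_distance."""
    proto = make_prototype(pattern)
    width = len(pattern)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width].upper()
        if any(base not in ONE_HOT for base in window):
            continue  # skip windows containing ambiguous symbols
        if hamming(proto, encode(window)) <= max_distance:
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(scan("GGCAATAAAGTC", "AATAAA"))
```

With this coding a single nucleotide substitution changes two bits, so max_distance=2 tolerates one mismatched base.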
Results. More than 1000 poly(A) signals were extracted from the
EMBL database (rel. 42) and used to build the training and test sets.
The full set of eukaryotic genes (1252 entries) from the Eukaryotic
Promoter Database (EPD rel. 42) was used for training on the TATA-box
signal and the transcription initiation site. A set of eukaryotic
plant genes was used to test the validity of the Hamming-Clustering
network approach. The results show the applicability of the
Hamming-Clustering method to functional signal prediction.
Availability. The program implementing the algorithm is written in C
and is available on the Web server
(http://www.itba.mi.cnr.it/webgene).
Contact. Email address: milanesi@itba.mi.cnr.it