Knowledge-Based Analysis of Microarray Gene Expression Data by Using Support Vector Machines
Michael P. S. Brown$^a$, William Noble Grundy$^{c,*}$, David Lin$^a$,
Nello Cristianini$^d$, Charles Walsh Sugnet$^b$, Terrence S. Furey$^a$,
Manuel Ares Jr.$^b$, and David Haussler$^a$
a. Department of Computer Science and
b. Center for Molecular Biology
of RNA, Department of Biology, University of California, Santa Cruz,
Santa Cruz, CA 95064
c. Department of Computer Science, Columbia
University, New York, NY 10025
d. Department of Engineering
Mathematics, University of Bristol, Bristol BS8 1TR, United Kingdom
*. To whom reprint requests should be addressed at: Department of
Computer Science, Columbia University, 450 Computer Science
Building, Mail Code 0401, 1214 Amsterdam Avenue, New York, NY
10027. E-mail: bgrundy@cs.columbia.edu.
Proceedings of the National Academy of Sciences,
97(1):262-267 (2000)
Abstract
We introduce a method of functionally classifying genes by using gene
expression data from DNA microarray hybridization experiments. The
method is based on the theory of support vector machines (SVMs).
SVMs are considered a supervised computer learning method because
they exploit prior knowledge of gene function to identify unknown genes
of similar function from expression data. SVMs avoid several problems
associated with unsupervised clustering methods, such as hierarchical
clustering and self-organizing maps. SVMs have many mathematical
features that make them attractive for gene expression analysis,
including their flexibility in choosing a similarity function, sparseness of
solution when dealing with large data sets, the ability to handle large
feature spaces, and the ability to identify outliers. We test several SVMs
that use different similarity metrics, as well as some other supervised
learning methods, and find that the SVMs best identify sets of genes with
a common function using expression data. Finally, we use SVMs to
predict functional roles for uncharacterized yeast ORFs based on their
expression data.
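
As a rough illustration of what "different similarity metrics" can mean for an SVM, the sketch below computes three common kernel functions on toy expression profiles. The kernel forms (dot product, polynomial, radial basis), the parameter names `degree` and `width`, and the three-value gene profiles are assumptions chosen for illustration; they are not taken from the abstract and are not real measurements.

```python
import math

def dot(x, y):
    """Plain dot-product similarity between two expression vectors."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2):
    """Dot product shifted by 1 and raised to a power (assumed form)."""
    return (dot(x, y) + 1.0) ** degree

def rbf_kernel(x, y, width=1.0):
    """Radial basis similarity: 1.0 for identical vectors, decaying with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))

# Hypothetical log-ratio expression profiles over three experiments.
gene_a = [0.5, -1.0, 2.0]
gene_b = [0.4, -0.9, 2.1]   # nearly co-expressed with gene_a
gene_c = [-2.0, 1.5, -0.5]  # roughly anti-correlated with gene_a

for name, kernel in [("dot", dot),
                     ("poly", polynomial_kernel),
                     ("rbf", rbf_kernel)]:
    print(name, kernel(gene_a, gene_b), kernel(gene_a, gene_c))
```

Under the dot-product and radial basis kernels, `gene_a` scores as more similar to `gene_b` than to `gene_c`, which is the kind of signal a kernel-based classifier relies on when grouping genes of common function.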