BIBLIOGRAPHY Features, Patterns, Correlations in DNA and Protein Texts
© Copyright, 2002-2006,
Wentian Li
of Feinstein Institute for Medical Research.
(1995-1996 Columbia University, 1996-2001 Rockefeller University).
Last updated: June 11, 2008 (~1690 papers, excluding preprints)
DNA sequences carry biological information,
DNA molecules (double-stranded helices) obey
physical laws, DNA genomes are products
of generations of evolution, and DNA texts
can be read as strings of symbols. All these
different aspects of DNA sequences (and, to some
extent, protein sequences) make their
analysis a multidisciplinary topic that
touches biochemistry, biophysics, evolutionary
biology, as well as mathematics, statistics,
computer science, and statistical physics.
This bibliography is an attempt to collect papers on DNA sequences: statistical patterns, statistical features, statistical correlations, etc. No claim is made about the completeness of this bibliography, and it is updated on an irregular basis.

... But I am very much excited by your article in May 30th [1953] Nature,
and think that bring Biology over into the group of "exact" sciences.
I plan to be in England through most of September, and hope to have
a chance to talk to you about all that, but I would like to ask a few
questions now. If your point of view is correct [,] each organism will be
characteristized by a long number written in quadrucal (?) system with figures
1, 2, 3, 4 standing for different bases ... This would open a very
exciting possibility of theoretical research based on combinatorix [sic]
and the theory of numbers! ... I have a feeling this can be done. What
do you think?
George Gamow (1904-1968), in a letter to Watson and Crick (1953).

From the point of view of the theory of information, the works of
Shakespeare, with the same number of letters and signs aligned
at random by a monkey, would have the same value. It is this lack
of definition of the value of information that makes it
difficult to use in biology. What could be considered as
"objective" in the Shakespearean information that
would distinguish it from the monkey's information?
Essentially the transmissibility. The value of influence,
therefore of evolution. ...

"Whereas ordinary mortals are content to mimic others, creative
geniuses are condemned to plagiarize themselves" is my shorter,
albeit inarticulate, version of what Van Veen said in Ada by
Vladimir Nabokov. Indeed, it seems that vaunted geniuses seldom
invented more than one modus operandi during their lifetimes, and
even civilization has largely been dependent upon plagiarizing
a small number of creative works; e.g., the multitudes of
Gothic churches can be viewed as pan European plagiarism of
the abbey church of St. Denis and/or the cathedral at Sens. This
is not surprising for new genes sensu stricto has seldom been
invented. Evolution rather relies on plagiarizing an old and
tested theme; the mechanism of evolution by gene duplication.
... this principle of repetitious recurrence pervades both
the construction of coding sequences in the genome, which can
be regarded as being representative of nature, and musical
composition which can be regarded as the most abstract and
therefore the most intellectual expression of nature."

Searching for an objective reconstruction of the vanished past must surely
be the most challenging task in biology. I need to say this because, today,
given the powerful tools of molecular biology, we can answer many
questions simply by looking up the answer in Nature - and I do not mean
the journal of the same name. ... In one sense, everything in biology has
already been 'published' in the form of DNA sequences of genomes; but,
of course, this is written in a language we do not yet understand.
Indeed, I would assert that the prime task of biology is to learn and
understand this language so that we could then compute organisms from
their DNA sequences. ... We are at the dawn of proper theoretical biology.

While ... human genome projects ... were launched only in the past decade,
the technoscientific imaginary and the discursive practices that have
animated them, specifically the textual and linguistic
representations of the genome, are quite old. In their (post?) modern
form they first emerged in the late 1940s and were then fully
elaborated within the work on the genetic code in the 1950s and 1960s.