About this bibliography
This bibliography started out with a narrow focus: non-trivial
long-range statistical correlations in DNA sequences. Over time,
I started collecting papers on other topics as well. The collection
now covers the most basic features of DNA and protein sequences:
those that concern the sequences as symbolic strings.
There are roughly two large categories: static and dynamic.
The static category treats DNA and protein sequences as fixed
entities and uses various statistical methods (counting is the
simplest one!) to characterize features in the text of these sequences.
Sequence statistics used for predicting genes are treated
in a separate bibliography:
http://www.nslij-genetics.org/gene/
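To illustrate the "counting" mentioned above for the static category, here is a minimal Python sketch. It is only an illustration and is not taken from any paper listed here; the function names kmer_counts and gc_content and the example sequence are made up for this purpose.

```python
from collections import Counter


def kmer_counts(seq, k=1):
    """Count overlapping words of length k in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gc_content(seq):
    """Fraction of G and C among the A, C, G, T symbols of the sequence."""
    counts = kmer_counts(seq, 1)
    acgt = sum(counts[b] for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


if __name__ == "__main__":
    example = "ATGCGCGATATTAGC"  # hypothetical toy sequence
    print(kmer_counts(example))                    # single-base counts
    print(kmer_counts(example, 2).most_common(3))  # most frequent dinucleotides
    print(gc_content(example))
```

Symbols other than A, C, G and T (for example N) are counted by kmer_counts but left out of the GC fraction, which is one possible convention among several.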
The dynamic category treats DNA sequences as a product of evolution,
and investigates which particular process led to which particular
feature. Although point mutation is the most studied dynamical
process, I'm more interested in large-scale duplication processes. To
keep this bibliography focused, I created a separate page on gene duplication:
http://www.nslij-genetics.org/duplication/
The "usual suspects", the topics that motivated this bibliography in the
first place: Large-scale base composition variations
(heterogeneity, isochores...); Basic properties of base composition
(entropy...); Characterizing whole-sequence correlation structure
(correlation function, traditional spectral analysis,
wavelet analysis...); Multi-scale features (fractal, 1/f noise,
self-similarity, scale-invariance, domains-within-domains...)
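Two of the quantities named above, entropy and the correlation function, can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions: the correlation is taken here as C(d) = P(base at i and at i+d) - P(base)^2 for a single chosen base, which is only one of several definitions in use, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol composition of a sequence."""
    counts = Counter(seq.upper())
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def base_correlation(seq, base, d):
    """Assumed definition: C(d) = P(base at i and i+d) - P(base)**2."""
    seq = seq.upper()
    base = base.upper()
    n = len(seq)
    if d <= 0 or d >= n:
        raise ValueError("distance d must satisfy 0 < d < len(seq)")
    p = seq.count(base) / n
    pairs = sum(1 for i in range(n - d)
                if seq[i] == base and seq[i + d] == base)
    return pairs / (n - d) - p * p


if __name__ == "__main__":
    toy = "ATATATAT"  # hypothetical toy sequence with period 2
    print(shannon_entropy(toy))
    print([base_correlation(toy, "A", d) for d in range(1, 5)])
```

For a long-range correlation study one would look at how C(d) decays as d grows over several orders of magnitude, which needs far longer sequences than this toy example.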
Other topics that this bibliography is expanding to:
strand symmetry or strand asymmetry
isochore, large-scale variation of GC-content
periodicity of three bases
periodicity of ten or eleven bases
tandem repeats (not many papers listed)
DNA/protein music
(more information on this topic can be found at
http://www.nslij-genetics.org/dnamusic/ )
sequencing papers (since they usually also include analysis results)
(more links to whole-genome sequencing can be found on another
page:
http://www.nslij-genetics.org/seq/ )
protein sequence analysis
DNA bending (not many papers listed)
double helix binding energy (not many papers listed)
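Two of the topics in this list, large-scale variation of GC-content and the periodicity of three bases, lend themselves to a short Python sketch. It is an illustration under assumptions, not a method from any listed paper: the GC profile uses non-overlapping windows by default, and the periodicity measure is the summed squared magnitude of the Fourier component at frequency 1/period of the four base indicator sequences, divided by the sequence length. The function names and the normalization are choices made for this example.

```python
import cmath


def windowed_gc(seq, window, step=None):
    """GC fraction in successive windows (non-overlapping unless step is set)."""
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((chunk.count("G") + chunk.count("C")) / window)
    return profile


def power_at_period(seq, period):
    """Fourier power at frequency 1/period, summed over the four bases."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Fourier component of the 0/1 indicator sequence of this base
        component = sum(cmath.exp(-2j * cmath.pi * i / period)
                        for i, ch in enumerate(seq) if ch == base)
        total += abs(component) ** 2
    return total / n


if __name__ == "__main__":
    toy = "ATG" * 20  # hypothetical toy sequence with exact period 3
    print(windowed_gc("GGGGAAAA", 4))
    print(power_at_period(toy, 3), power_at_period(toy, 4))
```

The same power_at_period call with period 10 or 11 gives a rough probe of the ten-or-eleven-base periodicity, although real sequences need averaging over many windows before any peak stands out.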