Recursive Segmentation of DNA Sequences
This page provides supplementary material for the paper:
Wentian Li, Pedro Bernaola-Galvan, Fatameh Haghighi, Ivo Grosse (2002),
"Applications of recursive segmentation to the analysis of DNA sequences",
Computers & Chemistry, 26(5):491-510.
- PDF file of the paper
- Abstract from PubMed
- iso.pl Perl script [This script segments by GC% only.
Towards the end of the script, in the subroutine sub reading_seq{ },
you can see that the symbol "t" is combined with the symbol "a", and "g" with
"c". You can change this rule to segment any other
type of converted sequence.] [last updated: March 2001]
- a test sequence [I just made this sequence up
to test the script. Save the Perl script to a file iso.pl
(making sure the file is executable) and the test sequence to a file
test-seq.txt, then type: "iso.pl test-seq.txt 0". You
will see four domains: 1-9, 10-52, 53-79, 80-90. You can change the value
"0" to a positive threshold, as described in the paper, to obtain
fewer domains.]
Note that your DNA sequence can
- contain a comment line (or lines) starting with the ">" symbol
- contain uppercase (ACGT) and/or lowercase (acgt) letters
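The input handling and the two-symbol conversion described above can be sketched in a few lines. This is only an illustration in Python, not the code in iso.pl: the function names read_sequence and to_two_symbols, the symbols "w" and "s", and the choice to drop any base outside a, c, g, t are all assumptions made for this example.

```python
def read_sequence(lines):
    """Join sequence lines into one lowercase string, skipping '>' comment lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # blank line or comment line
        parts.append(line.lower())  # accept both ACGT and acgt
    return "".join(parts)


# Assumed rule for GC% segmentation: a and t share one symbol, g and c the other.
GC_RULE = {"a": "w", "t": "w", "g": "s", "c": "s"}


def to_two_symbols(seq, rule=GC_RULE):
    """Convert a base sequence to a two-symbol sequence using the given rule."""
    # Bases not covered by the rule (e.g. "n") are dropped; this is an assumption.
    return "".join(rule[base] for base in seq if base in rule)


example = [">made-up example", "ACGTacgt", "ggcc"]
seq = read_sequence(example)
print(seq)                  # acgtacgtggcc
print(to_two_symbols(seq))  # wsswwsswssss
```

Passing a different rule dictionary (for example, one that groups purines and pyrimidines) gives a different converted sequence, which is the kind of change the note on iso.pl refers to.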
- cpg.pl Perl script [Almost the same as iso.pl,
except that "cg" is converted to the symbol "1" and everything else to "0".
This version also takes one more command-line argument,
the minimum size of a domain. For example, type
"cpg.pl test-seq.txt 0 3" to segment CpG domains that
are at least 3 bases long.]
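To make the "cg" conversion concrete, here is a minimal Python sketch. It is not taken from cpg.pl. It assumes one output symbol per base, with "1" placed at the position where a "cg" dinucleotide starts; the script itself may encode the dinucleotide differently. The function name cpg_indicator is hypothetical.

```python
def cpg_indicator(seq):
    """Return a 0/1 string with '1' wherever a 'cg' dinucleotide starts."""
    seq = seq.lower()
    # Assumption: one symbol per base; the final base can never start a "cg".
    return "".join(
        "1" if seq[i:i + 2] == "cg" else "0"
        for i in range(len(seq))
    )


print(cpg_indicator("ACGTCGGA"))  # 01001000
```

The resulting 0/1 sequence plays the same role as the two-symbol GC sequence in iso.pl: it is the converted sequence that gets segmented.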
If you find any bugs in the scripts or have any problems, please send me a message
(email: wli@nslij-genetics.org).
Wentian Li, Dec 30, 2003.