A streamlined random sequencing strategy for finding coding exons
Claverie JM
Genomics 23(3):575-581 (1994 Oct)
Abstract
The random (shotgun) DNA sequencing strategy is used for most
large-scale sequencing projects, including the identification of
human disease genes after positional cloning. The principle of the
method--sequence assembly from overlap--requires the candidate gene
region to be partitioned into 15- to 20-kb pieces (usually lambda
inserts), themselves randomly subcloned into M13 prior to sequencing
with a 6- to 8-fold redundancy. Most often, a time-consuming
directed strategy must be invoked to close the remaining gaps.
Finally, computer-based methods are applied to locate putative
coding exons within the finished genomic sequence. Given the small
average size of vertebrate exons, I show here that they can be
detected by computer analysis of the individual sequencing runs, well
before contiguity is complete. However, the successful assessment
of coding potential from the raw data depends on a combination of
new sequence masking techniques. When the identification of coding
exons is the primary goal, the usual random sequencing strategy can
thus be greatly optimized. The streamlined approach requires only a
2- to 2.5-fold sequencing redundancy, can dispense with the
subcloning in lambda and the closure of gaps, and can be fully
automated. The feasibility of this strategy is demonstrated using
data from the X-linked Kallmann syndrome gene region.
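
To give a rough sense of why a 2- to 2.5-fold redundancy may suffice when
full contiguity is not required, the sketch below estimates the expected
fraction of bases covered by at least one read. It assumes reads start
uniformly at random along the target, so that per-base coverage is
approximately Poisson distributed (the standard Lander-Waterman
approximation). That model is an assumption introduced here for
illustration and is not taken from the abstract; the function name
covered_fraction is likewise hypothetical. The estimate ignores edge
effects and does not say whether a given exon lies entirely within a
single run.

```python
import math


def covered_fraction(redundancy: float) -> float:
    """Expected fraction of bases covered by at least one read.

    Assumes uniformly random read starts (Poisson coverage), so the
    probability that a base is missed by every read is exp(-redundancy).
    """
    if redundancy < 0:
        raise ValueError("redundancy must be non-negative")
    return 1.0 - math.exp(-redundancy)


# Compare the streamlined redundancies with the conventional 6- to 8-fold range.
for c in (2.0, 2.5, 6.0, 8.0):
    print(f"{c:.1f}-fold redundancy: {covered_fraction(c):.1%} of bases covered")
```

Under this simplified model, roughly 86% to 92% of bases are expected to
be covered at 2- to 2.5-fold redundancy, compared with more than 99% at
6- to 8-fold.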