Identifying Coding Exons by Similarity Search: Alu-Derived and Other
Potentially Misleading Protein Sequences
J.M. Claverie
Genomics, 12(4), 838-841 (April 1992)
Abstract
The search for significant local similarities with known protein
sequences is a powerful method for interpreting anonymous cDNA
sequences or locating coding exons within genomic DNA sequences at a
stage where the average contig size is still very small. The BLASTx
program, implemented on the National Center for Biotechnology
Information server, allows a sensitive search of all putative
translations of a nucleotide query sequence against all known
proteins in a matter of seconds. From an analysis of the current
databases, I report a set of protein sequences exhibiting high local
similarity to Alu repeat or vector sequences. These entries can lead
to misleading interpretations of similarity searches. During the
course of this study, the protease of a human spumaretrovirus was
found to have integrated the 3'-terminal half of the U2 snRNA.
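
The phrase "all putative translations of a nucleotide query sequence" refers to conceptual translation in the six possible reading frames (three on each strand). The following is a minimal sketch of that step only, assuming the standard genetic code and an unambiguous A/C/G/T query; it is not the BLASTx implementation, and the function names are hypothetical, chosen for illustration.

```python
from itertools import product

# Standard genetic code in the conventional compact layout (base order T, C, A, G).
# Assumption: the standard code applies to the query.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six translations would then be compared against the protein database. A query that contains an Alu repeat or residual vector sequence yields translated frames that can match the misleading database entries described above, regardless of whether the query is actually coding.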