A Consensus Algorithm for Approximate Pattern Matching in Protein Sequences

A Alba; M Rubio-Rincon; M Rodrguez-Kessler; E R Arce-Santana; M O Mendez

2012, Number 2

Rev Mex Ing Biomed 2012; 33 (2)

Alba A, Rubio-Rincon M, Rodríguez-Kessler M, Arce-Santana ER, Mendez MO

Language: Spanish
References: 21
Page: 87-99
PDF size: 783.78 Kb.

ABSTRACT

In bioinformatics, one of the main tools that allow scientists to find common characteristics in protein or DNA sequences of different species is approximate string matching. From a computational point of view, the difficulty of approximate string matching lies in finding adequate measures to compare two strings efficiently, since in many cases one is interested in performing real-time searches within large databases. In this paper we propose a novel method for approximate string matching based on a generalization of the algorithm proposed by Baeza-Yates and Perleberg in 1996 for computing the Hamming distance between two sequences. In addition, we present a post-processing stage that significantly reduces the number of false positives. The proposed method has been evaluated on synthetic cases of random sequences and on real cases of plant protein sequences. Results show that the proposed algorithm is highly efficient both in computational terms and in specificity, especially when compared against a previously published method based on the phase correlation function.
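As an illustration of the starting point named in the abstract, the following Python sketch computes the Hamming distance between a pattern and every window of a longer sequence using a match-counting scheme in the spirit of Baeza-Yates and Perleberg. It is a minimal sketch under stated assumptions, not the generalized consensus method or the post-processing stage proposed in the paper: the function names `hamming_profile` and `approximate_matches`, the `max_mismatches` parameter, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def hamming_profile(text: str, pattern: str) -> list[int]:
    """Hamming distance between `pattern` and each length-m window of `text`."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # For each symbol, record the offsets at which it occurs in the pattern.
    positions = defaultdict(list)
    for j, symbol in enumerate(pattern):
        positions[symbol].append(j)

    # matches[s] counts the offsets where pattern agrees with text[s:s + m].
    matches = [0] * (n - m + 1)
    for i, symbol in enumerate(text):
        # Each pattern offset j holding this symbol votes for alignment i - j.
        for j in positions.get(symbol, ()):
            start = i - j
            if 0 <= start <= n - m:
                matches[start] += 1

    # Mismatches are the pattern positions that received no vote.
    return [m - count for count in matches]


def approximate_matches(text: str, pattern: str, max_mismatches: int) -> list[int]:
    """Start positions whose window differs from `pattern` in at most `max_mismatches` symbols."""
    profile = hamming_profile(text, pattern)
    return [start for start, dist in enumerate(profile) if dist <= max_mismatches]


# Toy example (made-up sequence): an exact hit at 0 and a one-substitution hit at 6.
print(approximate_matches("ACDEFGACDQFG", "ACDEFG", 1))  # [0, 6]
```

With a large alphabet such as the 20 amino acids, each text symbol typically matches only a few pattern offsets, so the inner loop stays short; this is one plausible reason a counting scheme of this kind suits protein sequences.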

REFERENCES

Smith TF, Waterman MS. "Identification of common molecular subsequences". Journal of Molecular Biology, 1981; 147: 195-197.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. "Basic local alignment search tool". Journal of Molecular Biology, 1990; 215: 403-410.
Hamming RW. "Error detecting and error correcting codes". Bell System Technical Journal, 1950; 29(2): 147-160.
Levenshtein VI. "Binary codes capable of correcting deletions, insertions, and reversals". Soviet Physics Doklady, 1966; 10: 707-710.
Howie JM. Automata and Languages. Oxford University Press, 1991.
Ukkonen E. "Algorithms for approximate string matching". Inf. Control, 1985; 64(1-3): 100-118.
Jokinen P, Tarhio J, Ukkonen E. "A comparison of approximate string matching algorithms". Softw. Pract. Exper., 1996; 26(12): 1439-1458.
Navarro G. "A guided tour to approximate string matching". ACM Computing Surveys, 2001; 33(1): 31-88.
Navarro G, Baeza-Yates RA, Sutinen E, Tarhio J. "Indexing methods for approximate string matching". IEEE Data Engineering Bulletin, 2001; 24(4): 19-27.
Boytsov L. "Indexing methods for approximate dictionary searching: Comparative analysis". J. Exp. Algorithmics, 2011; 16: 1.1:1.1-1.1:1.91.
Buhler J. "Efficient large-scale sequence comparison by locality sensitive hashing". Bioinformatics, 2001; 17(5): 419-428.
Baeza-Yates RA, Perleberg CH. "Fast and practical approximate string matching". Inf. Process. Lett., 1996; 59(1): 21-27.
Alba A, Rodriguez-Kessler M, Arce-Santana ER, Mendez MO. "Approximate string matching using phase correlation". Proceedings of the 34th Annual International Conference of the IEEE EMBS, 2012; pp. 6309-6312.
Frigo M, Johnson SG. "The Design and Implementation of FFTW3". Proc. IEEE, 2005; 93(2): 216-231.
Baeza-Yates RA, Gonnet GH. "A new approach to text searching". Proceedings of the 12th Annual ACM-SIGIR Conference on Information Retrieval, 1989; pp. 168-175.
Wu S, Manber U. "Fast Text Searching With Errors". Technical Report TR 91-11, Department of Computer Science, University of Arizona, 1991.
Ochoa-Alfaro A, Rodríguez-Kessler M, Perez-Morales M, Delgado-Sanchez P, Cuevas-Velazquez C, Gomez-Anduro G, Jimenez-Bremont J. "Functional characterization of an acidic SK3 dehydrin isolated from an Opuntia streptacantha cDNA library". Planta, 2012; 235: 565-578.
Hundertmark M, Hincha DK. "LEA (late embryogenesis abundant) proteins and their encoding genes in Arabidopsis thaliana". BMC Genomics, 2008; 9: 118.
Jimenez-Bremont JF, Maruri-Lopez I, Ochoa-Alfaro A, Delgado-Sanchez P, Bravo J, Rodríguez-Kessler M. "LEA gene introns: is the intron of dehydrin genes a characteristic of the serine-segment?". Plant Mol Biol Rep. (DOI: 10.1007/s11105-012-0483-x). In press.
Allagulova CR, Gimalov FR, Shakirova FM, Vakhitov VA. "The plant dehydrins: structure and putative functions". Biochemistry (Moscow), 2003; 68: 945-951.
Kosova K, Prasil IT, Vitamvas P. "Role of dehydrins in plant stress response". In: Pessarakli M, editor, Handbook of Plant and Crop Stress, 3rd ed., CRC Press (Florida), 2010: 239-285.