An Efficient Algorithm for Finding Approximate Complex Repetitive Patterns

D. He and X. Wu (USA)



The discovery of repetitive patterns is a fundamental prob lem in computational biology. There are many algorithms for finding both exact and approximate repetitive patterns. However, only a few of them have considered both length and frequency factors of those patterns. Since approximate repetitive patterns are more realistic, in this paper we pro pose an efficient algorithm for finding approximate repet itive patterns whose lengths and frequencies are greater than some pre-defined thresholds. Our definition of repet itive patterns is based on “elementary” repeats which are the basic building blocks of complex repetitive patterns. We show that the time complexity of our algorithm is O(n2 /f2 + Nf ( f k − f)) for the Hamming distance and O(n2 /f2 + Nf ( f k3 − f)) for the Edit distance, where n is the length of the given sequence, f is the frequency threshold, N is the expected number of seeds, f i s the expected frequency of seeds and, k is for k-(mismatch or differences). Approximate repetitive patterns, suffix tree, elementary re peats, complex structure.

Important Links:

Go Back