An Efficient Algorithm for Finding Approximate Complex Repetitive Patterns

D. He; X. Wu

An Efficient Algorithm for Finding Approximate Complex Repetitive Patterns

D. He and X. Wu (USA)

Keywords

Abstract

The discovery of repetitive patterns is a fundamental prob lem in computational biology. There are many algorithms for ﬁnding both exact and approximate repetitive patterns. However, only a few of them have considered both length and frequency factors of those patterns. Since approximate repetitive patterns are more realistic, in this paper we pro pose an efﬁcient algorithm for ﬁnding approximate repet itive patterns whose lengths and frequencies are greater than some pre-deﬁned thresholds. Our deﬁnition of repet itive patterns is based on “elementary” repeats which are the basic building blocks of complex repetitive patterns. We show that the time complexity of our algorithm is O(n2 /f2 + Nf ( f k − f)) for the Hamming distance and O(n2 /f2 + Nf ( f k3 − f)) for the Edit distance, where n is the length of the given sequence, f is the frequency threshold, N is the expected number of seeds, f i s the expected frequency of seeds and, k is for k-(mismatch or differences). Approximate repetitive patterns, sufﬁx tree, elementary re peats, complex structure.

Important Links:

DOI:
From Proceeding (540) Computational and Systems Biology - 2006

Go Back