Publication | Closed Access
A sublinear algorithm for weakly approximating edit distance
78
Citations
9
References
2003
Year
Unknown Venue
Mathematical ProgrammingEngineeringTime OComputational ComplexityNatural Language ProcessingString-searching AlgorithmData MiningString ProcessingComputational GeometryApproximation TheoryStatisticsSublinear AlgorithmKnowledge DiscoveryComputer ScienceSublinear TimeCombinatorial Pattern MatchingTime õApproximation MethodSimilarity Search
We show how to determine whether the edit distance between two given strings is small in sublinear time. Specifically, we present a test which, given two n-character strings A and B, runs in time o(n) and with high probability returns "CLOSE" if their edit distance is O(nΑ), and "FAR" if their edit distance is Ω(n), where Α is a fixed parameter less than 1. Our algorithm for testing the edit distance works by recursively subdividing the strings A and B into smaller substrings and looking for pairs of substrings in A, B with small edit distance. To do this, we query both strings at random places using a special technique for economizing on the samples which does not pick the samples independently and provides better query and overall complexity. As a result, our test runs in time Õ(nmax(Α/2, 2Α - 1\)) for any fixed Α < 1. Our algorithm thus provides a trade-off between accuracy and efficiency that is particularly useful when the input data is very large.We also show a lower bound of Ω(nΑ/2) on the query complexity of every algorithm that distinguishes pairs of strings with edit distance at most nΑ from those with edit distance at least n/6.
| Year | Citations | |
|---|---|---|
Page 1
Page 1