Publication | Closed Access
Incremental String Comparison
182
Citations
23
References
1998
Year
Incremental String ComparisonInformation RetrievalEngineeringString-searching AlgorithmProgram AnalysisString ProcessingComputational LinguisticsCombinatorial Pattern MatchingFormal MethodsComputational ComplexityApproximate Overlap ProblemOrder-sorted LogicComputer SciencePattern MatchingCombinatorial OptimizationNew SolutionSimilarity SearchCyclic String Comparison
Comparing sequences to find their longest common subsequence or edit distance has been extensively studied. This work investigates whether LCS or edit distance can be updated efficiently when a single symbol is appended to either sequence. A theorem relating dynamic programming solutions yields an O(k) incremental algorithm for updating alignments under a difference threshold, improving over the O(k²) recomputation cost. The algorithm outperforms nonincremental approaches, enabling O(nk) solutions for longest prefix approximate match, approximate overlap, and cyclic string comparison.
The problem of comparing two sequences A and B to determine their longest common subsequence (LCS) or the edit distance between them has been much studied. In this paper we consider the following incremental version of these problems: given an appropriate encoding of a comparison between A and B, can one incrementally compute the answer for A and bB, and the answer for A and Bb with equal efficiency, where b is an additional symbol? Our main result is a theorem exposing a surprising relationship between the dynamic programming solutions for two such "adjacent" problems. Given a threshold k on the number of differences to be permitted in an alignment, the theorem leads directly to an O(k) algorithm for incrementally computing a new solution from an old one, as contrasts the O(k2 ) time required to compute a solution from scratch. We further show, with a series of applications, that this algorithm is indeed more powerful than its nonincremental counterpart. We show this by solving the applications with greater asymptotic efficiency than heretofore possible. For example, we obtain O(nk) algorithms for the longest prefix approximate match problem, the approximate overlap problem, and cyclic string comparison.
| Year | Citations | |
|---|---|---|
Page 1
Page 1