Bounds on the Complexity of the Longest Common Subsequence Problem

TLDR

The longest common subsequence problem, used for file comparison and genetic sequence analysis, seeks the longest subsequence common to two strings. The authors analyze its computational difficulty via a decision‑tree model where each node tests symbol equality or inequality. They prove that without an alphabet size bound, any algorithm requires time proportional to the product of string lengths, and provide tighter lower bounds depending on the alphabet‑to‑length ratio, showing that forbidding intra‑string comparisons yields linear time for two symbols and quadratic for three or more.

Abstract

The problem of finding a longest common subsequence of two strings is discussed. This problem arises in data processing applications such as comparing two files and in genetic applications such as studying molecular evolution. The difficulty of computing a longest common subsequence of two strings is examined using the decision tree model of computation, in which vertices represent “equal - unequal” comparisons. It is shown that unless a bound on the total number of distinct symbols is assumed, every solution to the problem can consume an amount of time that is proportional to the product of the lengths of the two strings. A general lower bound as a function of the ratio of alphabet size to string length is derived. The case where comparisons between symbols of the same string are forbidden is also considered and it is shown that this problem is of linear complexity for a two-symbol alphabet and quadratic for an alphabet of three or more symbols.

References

Page 1

	Year	Citations

Page 1