Concepedia

Abstract

††,☆ † † † †† †† † This paper describes our recent studies on string pattern matching in compressed texts mainly from practical viewpoints. The aim is to speed up the string pattern matching task, in comparison with an ordinary search over the original texts. We have successfully developed (1) an AC type algorithm for searching in Huffman encoded files, and (2) a KMP type algorithm and (3) a BM type algorithm for searching in files compressed by the so-called byte pair encoding (BPE). Each of the algorithms reduces the search time at nearly the same rate as the compression ratio. Surprisingly, the BM type algorithm runs over BPE compressed files about 1.2–3.0 times faster than the exact match routines of the software package agrep, which is known as the fastest pattern matching tool.

References

YearCitations

Page 1