Publication | Closed Access
Speeding Up String Pattern Matching by Text Compression: The Dawn of a New Era
23
Citations
24
References
2001
Year
EngineeringCorpus LinguisticsText MiningNew EraNatural Language ProcessingString-searching AlgorithmInformation RetrievalData ScienceData MiningPattern RecognitionText CompressionComputational LinguisticsString ProcessingCompression RatioPractical ViewpointsLanguage StudiesLossless CompressionAc Type AlgorithmComputer SciencePattern MatchingCombinatorial Pattern MatchingLinguistics
††,☆ † † † †† †† † This paper describes our recent studies on string pattern matching in compressed texts mainly from practical viewpoints. The aim is to speed up the string pattern matching task, in comparison with an ordinary search over the original texts. We have successfully developed (1) an AC type algorithm for searching in Huffman encoded files, and (2) a KMP type algorithm and (3) a BM type algorithm for searching in files compressed by the so-called byte pair encoding (BPE). Each of the algorithms reduces the search time at nearly the same rate as the compression ratio. Surprisingly, the BM type algorithm runs over BPE compressed files about 1.2–3.0 times faster than the exact match routines of the software package agrep, which is known as the fastest pattern matching tool.
| Year | Citations | |
|---|---|---|
Page 1
Page 1