Publication | Open Access
Efficient string matching
2.9K
Citations
9
References
1975
Year
EngineeringCorpus LinguisticsText StringText MiningNatural Language ProcessingString-searching AlgorithmInformation RetrievalData ScienceData MiningPattern RecognitionPattern Matching MachineComputational LinguisticsString ProcessingKnowledge DiscoveryFinite State PatternComputer ScienceKeyword SearchCombinatorial Pattern MatchingKeyword Extraction
The paper presents a simple, efficient algorithm for locating all occurrences of a finite set of keywords within a text string. The algorithm builds a finite‑state pattern‑matching machine from the keywords and processes the text in a single pass, with construction time proportional to the total keyword length. The machine’s state transitions are independent of keyword count, and the algorithm accelerated a bibliographic search program by 5–10×.
This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
| Year | Citations | |
|---|---|---|
Page 1
Page 1