Publication | Closed Access
MR-RePair: Grammar Compression Based on Maximal Repeats
19
Citations
8
References
2019
Year
Unknown Venue
EngineeringGrammar CompressionCorpus LinguisticsSpeech RecognitionNatural Language ProcessingSyntaxString-searching AlgorithmString ProcessingComputational LinguisticsRepair Compression AlgorithmGrammarLanguage StudiesLossless CompressionGrammar Generation AlgorithmMachine TranslationComputer ScienceGrammar InductionData CompressionMore Compact GrammarsCombinatorial Pattern MatchingSpeech ProcessingText ProcessingLinguistics
We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.
| Year | Citations | |
|---|---|---|
Page 1
Page 1