Publication | Closed Access
The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression
Year: 1991 | Citations: 727 | References: 16
Keywords: Adaptive Text Compression, Lossy Compression, Engineering, Novel Events, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Text Compression, Computational Linguistics, Language Studies, Coding Theory, Content Analysis, Statistics, Lossless Compression, Machine Translation, Variable-length Code, Computer Science, Information Extraction, Data Compression, Zero-frequency Problem, Text Processing, Linguistics
Approaches to the zero-frequency problem in adaptive text compression are discussed. This problem concerns estimating the likelihood that a novel event will occur. Although several methods have been used, their suitability has been based on empirical evaluation rather than on a well-founded model. The authors propose applying a Poisson process model of novelty. Its ability to predict novel tokens is evaluated, and it consistently outperforms existing methods. It is then applied to a practical statistical coding scheme, where a slight modification is required to avoid divergence. The result is a well-founded zero-frequency model that explains observed differences in the performance of existing methods and offers a small improvement in the coding efficiency of text compression over the best method previously known.
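To make the estimation problem concrete, the sketch below computes a few novelty ("escape") probability estimates from a token sample. The abstract itself gives no formulas, so the estimators here are assumptions drawn from how they are commonly described in the PPM compression literature: method A as 1/(n+1), method C as q/(n+q), and the first-order Poisson-based estimate (often called method X) as t1/n, where n is the number of tokens seen, q the number of distinct tokens, and t1 the number of tokens seen exactly once. The fallback rule in `x_with_fallback` is one plausible reading of the "slight modification" that avoids degenerate estimates, not a confirmed reproduction of the authors' scheme. All function names are hypothetical.

```python
from collections import Counter


def escape_estimates(tokens):
    """Return illustrative novelty-probability estimates for a token sample.

    Assumed formulas (not stated in the abstract):
      A: 1 / (n + 1)
      C: q / (n + q)
      X: t1 / n   (first-order Poisson-based estimate)
    """
    if not tokens:
        raise ValueError("need at least one observed token")
    counts = Counter(tokens)
    n = len(tokens)                                   # total tokens seen
    q = len(counts)                                   # distinct tokens seen
    t1 = sum(1 for c in counts.values() if c == 1)    # tokens seen exactly once
    return {"A": 1 / (n + 1), "C": q / (n + q), "X": t1 / n}


def x_with_fallback(tokens):
    """Use t1/n when it lies strictly between 0 and 1, else fall back to C.

    Assumption: this guards against the degenerate cases t1 == 0
    (novelty judged impossible) and t1 == n (novelty judged certain).
    """
    est = escape_estimates(tokens)
    return est["X"] if 0 < est["X"] < 1 else est["C"]


if __name__ == "__main__":
    sample = list("abracadabra")
    for name, p in escape_estimates(sample).items():
        print(f"method {name}: {p:.4f}")
    print(f"X with fallback: {x_with_fallback(sample):.4f}")
```

For the sample "abracadabra" (n = 11, q = 5, t1 = 2), the three estimates differ noticeably, which is the kind of disagreement between methods that a principled model of novelty is meant to explain.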