A universal predictor based on pattern matching

Abstract

We consider a universal predictor based on pattern matching. Given a sequence X/sub 1/, ..., X/sub n/ drawn from a stationary mixing source, it predicts the next symbol X/sub n+1/ based on selecting a context of X/sub n+1/. The predictor, called the sampled pattern matching (SPM), is a modification of the Ehrenfeucht-Mycielski (1992) pseudorandom generator algorithm. It predicts the value of the most frequent symbol appearing at the so-called sampled positions. These positions follow the occurrences of a fraction of the longest suffix of the original sequence that has another copy inside X/sub 1/X/sub 2//spl middot//spl middot//spl middot/X/sub n/; that is, in SPM, the context selection consists of taking certain fraction of the longest match. The study of the longest match for lossless data compression was initiated by Wyner and Ziv in their 1989 seminal paper. Here, we estimate the redundancy of the SPM universal predictor, that is, we prove that the probability the SPM predictor makes worse decisions than the optimal predictor is O(n/sup -/spl nu//) for some 0</spl nu/< 1/2 as n/spl rarr//spl infin/. As a matter of fact, we show that we can predict K=O(1) symbols with the same probability of error.

References

Page 1

	Year	Citations

Page 1