Publication | Closed Access
Maintaining Stream Statistics over Sliding Windows
836
Citations
11
References
2002
Year
EngineeringStreaming AlgorithmComputational ComplexityStreaming DataLp NormsData ScienceDiscrete MathematicsParallel ComputingCombinatorial OptimizationStream StatisticsStream ProcessingSliding Window ModelLower BoundComputer ScienceAlgorithmic Information TheorySignal ProcessingExternal-memory AlgorithmRandomized AlgorithmData Streams
The paper studies maintaining aggregates and statistics over data streams within the sliding‑window model, which considers the last N elements seen. The authors aim to maintain counts of 1s and sums of the last N elements in a data stream, providing upper and lower bounds for these problems. They devise memory‑efficient algorithms that use \(O(\tfrac{1}{\varepsilon}\log^{2}N)\) bits to estimate counts and sums, extend to \(L_{p}\) norms, and achieve matching upper and lower bounds. They prove that \(O(\tfrac{1}{\varepsilon}\log^{2}N)\) bits suffice to estimate counts within a \(1+\varepsilon\) factor, establish matching lower bounds, and show that their techniques enable other sliding‑window algorithms with \(O(\tfrac{1}{\varepsilon}\log N)\) overhead and \(1+\varepsilon\) accuracy.
We consider the problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far. We refer to this model as the sliding window model. We consider the following basic problem: Given a stream of bits, maintain a count of the number of 1's in the last N elements seen from the stream. We show that, using $O(\frac{1}{\epsilon} \log^2 N)$ bits of memory, we can estimate the number of 1's to within a factor of $1 + \epsilon$. We also give a matching lower bound of $\Omega(\frac{1}{\epsilon}\log^2 N)$ memory bits for any deterministic or randomized algorithms. We extend our scheme to maintain the sum of the last N positive integers and provide matching upper and lower bounds for this more general problem as well. We also show how to efficiently compute the Lp norms ($p \in [1,2]$) of vectors in the sliding window model using our techniques. Using our algorithm, one can adapt many other techniques to work for the sliding window model with a multiplicative overhead of $O(\frac{1}{\epsilon}\log N)$ in memory and a $1 +\epsilon$ factor loss in accuracy. These include maintaining approximate histograms, hash tables, and statistics or aggregates such as sum and averages.
| Year | Citations | |
|---|---|---|
Page 1
Page 1