Concepedia

TLDR

The paper introduces fast algorithms for selecting a random sample of n records without replacement from an unknown-size pool of N records. The authors present Algorithm Z, a one‑pass, constant‑space method with optimizations that improve speed by an order of magnitude, and provide an efficient Pascal‑like implementation. Algorithm Z achieves optimal expected time O(n(1+log(N/n))) with constant space and outperforms existing methods by a significant margin, as shown by theoretical analysis and experiments.

Abstract

We introduce fast algorithms for selecting a random sample of n records without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in O ( n (1 + log( N/n ))) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.

References

YearCitations

Page 1