Concepedia

Publication | Closed Access

A One-Pass Space-Efficient Algorithm for Finding Quantiles.

48

Citations

0

References

1995

Year

Abstract

We present an algorithm for finding the quantile values of a large unordered dataset with unknown distribution. The algorithm has the following features: i) it requires only one pass over the data; ii) it is space efficient --- it uses a small bounded amount of memory independent of the number of values in the dataset; and iii) the true quantile is guaranteed to lie within the lower and upper bounds produced by the algorithm. Empirical evaluation using synthetic data with various distributions as well as real data show that the bounds obtained are quite tight. The algorithm has several applications in database systems, for example in database governors, query optimization, load balancing in multiprocessor database systems, and data mining. 1 Introduction The p-quantile of an ordered sequence of data values is the smallest value below which p fraction of the total values in the sequence lie. Accurate estimation of the number of tuples satisfying a predicate is a prerequisite for a goo...