Approximate nearest neighbor queries in fixed dimensions

TLDR

Given a set of n points in d‑dimensional Euclidean space, the nearest neighbor of a query point is the point in S with minimal Euclidean distance; while efficient solutions exist for small d, performance degrades rapidly as d increases. The goal is to preprocess the point set S so that queries can be answered as efficiently as possible. We present a randomized algorithm that builds a data structure in O(n) expected time, uses O(n log n) space, and answers queries in O(log n) expected time, reporting a point within a factor (1 + ε) of the true nearest neighbor. The constant factors depend on d and ε, and empirical tests show that a practical variant of the algorithm finds the nearest neighbor in moderately large dimensions significantly faster than existing practical approaches.

Abstract

Given a set of n points in d-dimensional Euclidean space, S ⊂ E, and a query point q ∈ E, we wish to determine the nearest neighbor of q, that is, the point of S whose Euclidean distance to q is minimum. The goal is to preprocess the point set S, such that queries can be answered as efficiently as possible. We assume that the dimension d is a constant independent of n. Although reasonably good solutions to this problem exist when d is small, as d increases the performance of these algorithms degrades rapidly. We present a randomized algorithm for approximate nearest neighbor searching. Given any set of n points S ⊂ E, and a constant ǫ > 0, we produce a data structure, such that given any query point, a point of S will be reported whose distance from the query point is at most a factor of (1 + ǫ) from that of the true nearest neighbor. Our algorithm runs in O(log n) expected time and requires O(n log n) space. The data structure can be built in O(n) expected time. The constant factors depend on d and ǫ. Because of the practical importance of nearest neighbor searching in higher dimensions, we have implemented a practical variant of this algorithm, and show empirically that for many point distributions this variant of the algorithm finds the nearest neighbor in moderately large dimension significantly faster than existing practical approaches.

References

Page 1

	Year	Citations

Page 1