Concepedia

Publication | Closed Access

Similarity estimation techniques from rounding algorithms

2.2K

Citations

36

References

2002

Year

Moses Charikar

Unknown Venue

TLDR

Locality‑sensitive hashing schemes map objects to compact sketches so that the probability of a hash collision equals a similarity measure, enabling efficient similarity estimation, nearest‑neighbor search, and clustering, with min‑wise independent permutations providing a classic construction for Jaccard similarity. The authors demonstrate that rounding algorithms for linear and semidefinite programs can be interpreted as locality‑sensitive hashing schemes for various collections of objects. Using this insight, they construct new locality‑sensitive hashing schemes tailored to these collections.

Abstract

(MATH) A locality sensitive hashing scheme is a distribution on a family $\F$ of hash functions operating on a collection of objects, such that for two objects x,y, PrhεF[h(x) = h(y)] = sim(x,y), where sim(x,y) ε [0,1] is some similarity function defined on the collection of objects. Such a scheme leads to a compact representation of objects so that similarity of objects can be estimated from their compact sketches, and also leads to efficient algorithms for approximate nearest neighbor search and clustering. Min-wise independent permutations provide an elegant construction of such a locality sensitive hashing scheme for a collection of subsets with the set similarity measure sim(A,B) = \frac{|A ∩ B|}{|A ∪ B|}. (MATH) We show that rounding algorithms for LPs and SDPs used in the context of approximation algorithms can be viewed as locality sensitive hashing schemes for several interesting collections of objects. Based on this insight, we construct new locality sensitive hashing schemes for:

References

YearCitations

Page 1