Publication | Closed Access
Spectral Hashing
2008 · 2.1K citations · 15 references
Keywords: Graph Sparsity, Engineering, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Graph Theory, Perceptual Hashing, Similarity Search, Knowledge Discovery, Semantic Hashing, Business, Hash Function, Computer Science, Graph Partitioning, Graph Processing, Semantic Similarity
Semantic hashing[1] seeks compact binary codes for data points so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding the best code for a given dataset is closely related to graph partitioning and can be shown to be NP-hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on the convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel data point. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state-of-the-art.
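The core construction described in the abstract — binary codes obtained by thresholding eigenvectors of the graph Laplacian — can be sketched in a few lines. The following is a minimal illustration on the training set only, assuming a Gaussian affinity and an unnormalized Laplacian; the paper's full method additionally uses Laplace-Beltrami eigenfunctions to encode out-of-sample points, which this toy sketch omits.

```python
import numpy as np

def spectral_codes(X, n_bits, sigma=1.0):
    """Toy sketch: binary codes from thresholded eigenvectors of the
    graph Laplacian built on the training set X (illustrative only)."""
    # Gaussian affinity matrix W over all pairs of points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    # Unnormalized graph Laplacian L = D - W
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Eigenvectors with the smallest eigenvalues; skip the trivial
    # constant eigenvector (eigenvalue ~ 0)
    vals, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:n_bits + 1]
    # Threshold each eigenvector at zero -> one bit per eigenvector
    return (Y > 0).astype(np.uint8)

# Usage: two well-separated clusters; nearby points should tend to
# receive the same code bits.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(3.0, 0.1, (20, 2))])
codes = spectral_codes(X, n_bits=4)
```

The first bit here comes from the Fiedler vector, whose sign pattern partitions the graph — the graph-partitioning connection the abstract refers to.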