Publication | Closed Access
DB-LSH 2.0: Locality-Sensitive Hashing With Query-Based Dynamic Bucketing
25
Citations
36
References
2023
Year
Cluster ComputingLinear CostEngineeringInformation RetrievalData ScienceData MiningManagementData IntegrationParallel ComputingLocality-sensitive HashingData ManagementVery Large DatabaseKnowledge DiscoveryComputer EngineeringHash FunctionComputer ScienceBig Data SearchLsh MethodsQuery OptimizationData IndexingDb-lsh 2.0Similarity SearchBig Data
Locality-sensitive hashing (LSH) is a promising family of methods for the high-dimensional approximate nearest neighbor (ANN) search problem due to its sub-linear query time and strong theoretical guarantee. Existing LSH methods either suffer from large index sizes and hash boundary problems, or incur a linear cost for high-quality candidate identification. This dilemma is addressed in a novel method called DB-LSH proposed in this paper. It organizes the projected spaces with multi-dimensional indexes instead of fixed-width hash buckets, which significantly reduces space costs. High-quality candidates can be generated efficiently by dynamically constructing query-based hypercubic buckets with the required widths through index-based window queries. A novel incremental search strategy called DBI-LSH is also developed to further boost the query performance, which incrementally accesses the next best point for higher accuracy and efficiency. Considering the intermediate query information of each query, DBA-LSH is designed to adaptively tune termination conditions without scarifying the success probability. Our theoretical analysis proves that DB-LSH has a smaller query cost than the existing work while DBA-LSH and DBI-LSH have lower expected query costs than DB-LSH. An extensive range of experiments on real-world data show the superiority of our approaches over the state-of-the-art methods in both efficiency and accuracy.
| Year | Citations | |
|---|---|---|
Page 1
Page 1