Publication | Closed Access
Metric Index: An Efficient and Scalable Solution for Similarity Search
42
Citations
14
References
2009
Year
Unknown Venue
EngineeringNovel IndexingSimilarity MeasureBig Data IndexingMetric SpaceText MiningInformation RetrievalData ScienceData MiningPattern RecognitionData ManagementResponse TimesKnowledge DiscoveryText IndexingComputer ScienceMetric IndexData IndexingRelational QueriesSearch Engine IndexingIndexing TechniqueSimilarity Search
Metric space as a universal and versatile model of similarity can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management is still a resisting research challenge. We introduce a novel indexing and searching mechanism called metric index (M-Index), that employs practically all known principles of metric space partitioning, pruning and filtering. The heart of the M-Index is a general mapping mechanism that enables to actually store the data in well-established structures such as the B <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">+</sup> -tree or even in a distributed storage. We have implemented the M-Index with B <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">+</sup> -tree and performed experiments on a combination of five MPEG-7 descriptors in a database of hundreds of thousands digital images. The experiments put under test several M-Index variants and compare them with two orthogonal approaches - the PM-Tree and the iDistance. The trials show that the M-Index outperforms the others in terms of efficiency of search-space pruning, I/O costs, and response times for precise similarity queries. Furthermore, the M-Index demonstrates an excellent ability to keep similar data close in the index which makes its approximation algorithm very efficient-maintaining practically constant response times while preserving a very high recall as the dataset grows.
| Year | Citations | |
|---|---|---|
Page 1
Page 1