Indexing high-dimensional data for content-based retrieval in large databases

Abstract

Many indexing approaches for high-dimensional data points have evolved into very complex and hard to code algorithms. Sometimes this complexity is not matched by increase in performance. Motivated by these ideas, we take a step back and look at simpler approaches to indexing multimedia data. In this paper we propose a simple, (not simplistic) yet efficient indexing structure for high-dimensional data Points of variable dimension, using dimension reduction. Our approach maps multidimensional points to a 1D line by computing their Euclidean Norm and use a B/sup +/-Tree to store data points. We exploit B/sup +/-Tree efficient sequential search to develop simple, yet performant methods to implement point, range and nearest-neighbor queries. To evaluate our technique we conducted a set of experiments, using both synthetic and real data. We analyze creation, insertion and query times as a function of data set size and dimension. Results so far show that our simple scheme outperforms current approaches, such as the Pyramid Technique, the A-Tree and the SR-Tree, for many data distributions. Moreover, our approach seems to scale better both with growing dimensionality and data set size, while exhibiting low insertion and search times.

References

Page 1

	Year	Citations

Page 1