Concepedia

TLDR

Selectivity estimation of queries is an important and well‑studied problem in relational database systems. The study examines selectivity estimation for GIS spatial data and proposes techniques using spatial indices, histograms, BSPs, and spatial skew. The authors develop partition‑based methods for point and range queries on 2D rectangles, evaluate them against sampling and parametric baselines, and test them on synthetic and TIGER datasets. Experiments show that the Min‑Skew BSP partitioning yields the most accurate selectivity estimates, is efficient to construct, space‑light, and works across a wide range of spatial queries.

Abstract

Selectivity estimation of queries is an important and well-studied problem in relational database systems. In this paper, we examine selectivity estimation in the context of Geographic Information Systems, which manage spatial data such as points, lines, poly-lines and polygons. In particular, we focus on point and range queries over two-dimensional rectangular data. We propose several techniques based on using spatial indices, histograms, binary space partitionings (BSPs), and the novel notion of spatial skew. Our techniques carefully partition the input rectangles into subsets and approximate each partition accurately. We present a detailed experimental study comparing the proposed techniques and the best known sampling and parametric techniques. We evaluate them using synthetic as well as real-life TIGER datasets. Based on our experiments, we identify a BSP based partitioning that we call Min-Skew which consistently provides the most accurate selectivity estimates for spatial queries. The Min-Skew partitioning can be constructed efficiently, occupies very little space, and provides accurate selectivity estimates over a broad range of spatial queries.

References

YearCitations

Page 1