Publication | Closed Access
Self-tuning histograms
248
Citations
13
References
1999
Year
EngineeringInformation RetrievalData ScienceData MiningKnowledge DiscoveryManagementData ExplorationData IntegrationComputer ScienceData RetrievalTraditional HistogramsData DistributionsApproximate Query AnsweringData ManagementStatisticsSimilarity SearchSelf-tuning HistogramsQuery Optimization
Self‑tuning histograms resemble traditional histograms but refine their estimates using query‑execution feedback rather than data samples, offering a cost‑effective alternative to expensive multi‑dimensional histograms that capture attribute dependencies. The authors introduce self‑tuning histograms and describe techniques for initializing and refining them. They build and update the histograms by iteratively adjusting bucket boundaries based on observed selectivity from range queries, without examining the data directly. Experiments show that self‑tuning histograms incur negligible construction and maintenance costs, provide a low‑cost alternative to multi‑dimensional histograms, and maintain accuracy for distributions with low to moderate skew.
In this paper, we introduce self-tuning histograms. Although similar in structure to traditional histograms, these histograms infer data distributions not by examining the data or a sample thereof, but by using feedback from the query execution engine about the actual selectivity of range selection operators to progressively refine the histogram. Since the cost of building and maintaining self-tuning histograms is independent of the data size, self-tuning histograms provide a remarkably inexpensive way to construct histograms for large data sets with little up-front costs. Self-tuning histograms are particularly attractive as an alternative to multi-dimensional traditional histograms that capture dependencies between attributes but are prohibitively expensive to build and maintain. In this paper, we describe the techniques for initializing and refining self-tuning histograms. Our experimental results show that self-tuning histograms provide a low-cost alternative to traditional multi-dimensional histograms with little loss of accuracy for data distributions with low to moderate skew.
| Year | Citations | |
|---|---|---|
Page 1
Page 1