Publication | Closed Access
Selectivity estimation using probabilistic models
262
Citations
19
References
2001
Year
Unknown Venue
EngineeringQuery ModelBayesian InferenceStatistical Relational LearningText MiningInformation RetrievalData ScienceData MiningManagementData IntegrationStatisticsSelectivity EstimationKnowledge DiscoveryComputer ScienceDatabase TheorySelectivity EstimatesQuery OptimizationStatistical InferenceApproximate Query AnsweringDatabase Query Processing
Estimating result sizes for complex multi‑attribute, multi‑relation queries is a difficult but essential task for query optimization, profiling, and approximate answering, and recent work extends Bayesian networks to relational domains via Probabilistic Relational Models. The paper proposes using probabilistic graphical models, specifically Probabilistic Relational Models, to provide a unified framework for accurately estimating selectivity of queries involving multiple attributes and foreign‑key joins. The authors construct a Probabilistic Relational Model from the database, capturing intra‑table and inter‑table attribute dependencies, and use it to compute selectivity estimates for a broad class of queries, demonstrating the approach on several real‑world databases. The approach yields more accurate selectivity estimates for single‑table multi‑attribute and select‑join queries than standard methods, while using comparable space and time.
Estimating the result size of complex queries that involve selection on multiple attributes and the join of several relations is a difficult but fundamental task in database query processing. It arises in cost-based query optimization, query profiling, and approximate query answering. In this paper, we show how probabilistic graphical models can be effectively used for this task as an accurate and compact approximation of the joint frequency distribution of multiple attributes across multiple relations. Probabilistic Relational Models (PRMs) are a recent development that extends graphical statistical models such as Bayesian Networks to relational domains. They represent the statistical dependencies between attributes within a table, and between attributes across foreign-key joins. We provide an efficient algorithm for constructing a PRM front a database, and show how a PRM can be used to compute selectivity estimates for a broad class of queries. One of the major contributions of this work is a unified framework for the estimation of queries involving both select and foreign-key join operations. Furthermore, our approach is not limited to answering a small set of predetermined queries; a single model can be used to effectively estimate the sizes of a wide collection of potential queries across multiple tables. We present results for our approach on several real-world databases. For both single-table multi-attribute queries and a general class of select-join queries, our approach produces more accurate estimates than standard approaches to selectivity estimation, using comparable space and time.
| Year | Citations | |
|---|---|---|
Page 1
Page 1