Publication | Closed Access
A novel probabilistic pruning approach to speed up similarity queries in uncertain databases
39
Citations
23
References
2011
Year
Unknown Venue
EngineeringSimilarity QueriesSimilarity MeasureUncertain DatabaseUncertain DataProbabilistic Similarity QueriesStatistical Relational LearningProbabilistic OntologyInformation RetrievalData ScienceData MiningUncertainty QuantificationManagementData IntegrationDomination CountKnowledge DiscoveryComputer ScienceQuery OptimizationUncertain DatabasesSimilarity Search
Computing the probabilistic domination count—how many uncertain objects are closer to a reference than a target—is essential for answering a broad class of probabilistic similarity queries on uncertain databases. The study introduces a novel probabilistic pruning criterion and an iterative filter‑refinement strategy to efficiently estimate the probabilistic domination count while preserving correctness. The method models uncertain attributes with continuous probability density functions, applies a geometric pruning filter, and iteratively refines estimates to compute the domination count. Experiments demonstrate that the technique yields tight probability bounds for the domination count rapidly, even on large uncertain databases.
In this paper, we propose a novel, effective and efficient probabilistic pruning criterion for probabilistic similarity queries on uncertain data. Our approach supports a general uncertainty model using continuous probabilistic density functions to describe the (possibly correlated) uncertain attributes of objects. In a nutshell, the problem to be solved is to compute the PDF of the random variable denoted by the probabilistic domination count: Given an uncertain database object B, an uncertain reference object R and a set D of uncertain database objects in a multi-dimensional space, the probabilistic domination count denotes the number of uncertain objects in D that are closer to R than B. This domination count can be used to answer a wide range of probabilistic similarity queries. Specifically, we propose a novel geometric pruning filter and introduce an iterative filter-refinement strategy for conservatively and progressively estimating the probabilistic domination count in an efficient way while keeping correctness according to the possible world semantics. In an experimental evaluation, we show that our proposed technique allows to acquire tight probability bounds for the probabilistic domination count quickly, even for large uncertain databases.
| Year | Citations | |
|---|---|---|
Page 1
Page 1