Publication | Closed Access
Fast incremental discovery of pointwise order dependencies
19
Citations
35
References
2020
Year
Invalid PodsEngineeringPattern DiscoveryComputational ComplexityInformation RetrievalData ScienceData MiningCombinatorial OptimizationPointwise Order DependenciesMinimal PodsVery Large DatabaseKnowledge DiscoveryComputer ScienceDatabase TheoryFast Incremental DiscoveryQuery OptimizationData IndexingRelational QueriesStructure DiscoveryOrder-sorted LogicStructure MiningIndexing Technique
Pointwise order dependencies (PODs) are dependencies that specify ordering semantics on attributes of tuples. POD discovery refers to the process of identifying the set Σ of valid and minimal PODs on a given data set D. In practice D is typically large and keeps changing, and it is prohibitively expensive to compute Σ from scratch every time. In this paper, we make a first effort to study the incremental POD discovery problem, aiming at computing changes ΔΣ to Σ such that Σ ⊕ ΔΣ is the set of valid and minimal PODs on D with a set Δ D of tuple insertion updates. (1) We first propose a novel indexing technique for inputs Σ and D. We give algorithms to build and choose indexes for Σ and D , and to update indexes in response to Δ D. We show that POD violations w.r.t. Σ incurred by Δ D can be efficiently identified by leveraging the proposed indexes, with a cost dependent on log (| D |). (2) We then present an effective algorithm for computing ΔΣ, based on Σ and identified violations caused by Δ D. The PODs in Σ that become invalid on D + Δ D are efficiently detected with the proposed indexes, and further new valid PODs on D + Δ D are identified by refining those invalid PODs in Σ on D + Δ D. (3) Finally, using both real-life and synthetic datasets, we experimentally show that our approach outperforms the batch approach that computes from scratch, up to orders of magnitude.
| Year | Citations | |
|---|---|---|
Page 1
Page 1