Publication | Closed Access
Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration
19
Citations
22
References
2020
Year
Unknown Venue
Computational Complexity TheoryEngineeringAnswer EnumerationComputational ComplexityData StructureInformation RetrievalData ScienceDiscrete MathematicsRandom-order EnumerationComputer ScienceConjunctive QueriesDatabase TheoryAlgorithmic Information TheoryQuery OptimizationAutomated ReasoningFormal MethodsRandom AccessApproximate Query AnsweringKnowledge CompilationRandomized Algorithm
As data analytics becomes more crucial to digital systems, so grows the importance of characterizing the database queries that admit a more efficient evaluation. We consider the tractability yardstick of answer enumeration with a polylogarithmic delay after a linear-time preprocessing phase. Such an evaluation is obtained by constructing, in the preprocessing phase, a data structure that supports polylogarithmic-delay enumeration. In this paper, we seek a structure that supports the more demanding task of a "random permutation": polylogarithmic-delay enumeration in truly random order. Enumeration of this kind is required if downstream applications assume that the intermediate results are representative of the whole result set in a statistically valuable manner. An even more demanding task is that of a "random access": polylogarithmic-time retrieval of an answer whose position is given. We establish that the free-connex acyclic CQs are tractable in all three senses: enumeration, random-order enumeration, and random access; and in the absence of self-joins, it follows from past results that every other CQ is intractable by each of the three (under some fine-grained complexity assumptions). However, the three yardsticks are separated in the case of a union of CQs (UCQ): while a union of free-connex acyclic CQs has a tractable enumeration, it may (provably) admit no random access. For such UCQs we devise a random-order enumeration whose delay is logarithmic in expectation. We also identify a subclass of UCQs for which we can provide random access with polylogarithmic access time. Finally, we present an implementation and an empirical study that show a considerable practical superiority of our random-order enumeration approach over state-of-the-art alternatives.
| Year | Citations | |
|---|---|---|
Page 1
Page 1