Making Aggregation Work in Uncertain and Probabilistic Databases

TLDR

The paper describes how the Trio system handles aggregation over uncertain and probabilistic data. Because exact aggregation in uncertain databases can produce exponentially sized results, the authors offer three alternatives: a low bound on the aggregate value, a high bound, and the expected value. Each variant returns a single result instead of a set of possible results and is generally efficient to compute for both full-table and grouped aggregation queries. The paper gives formal definitions and semantics, describes an open-source implementation for single-table aggregation queries, evaluates performance and scalability on a large synthetic data set, and reports preliminary results on aggregation over joins.

Abstract

We describe how aggregation is handled in the Trio system for uncertain and probabilistic data. Because "exact" aggregation in uncertain databases can produce exponentially sized results, we provide three alternatives: a low bound on the aggregate value, a high bound on the value, and the expected value. These variants return a single result instead of a set of possible results, and they are generally efficient to compute for both full-table and grouped aggregation queries. We provide formal definitions and semantics and a description of our open source implementation for single-table aggregation queries. We study the performance and scalability of our algorithms through experiments over a large synthetic data set. We also provide some preliminary results on aggregations over joins.
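
To make the contrast between "exact" aggregation and the three single-value variants concrete, here is a minimal sketch for SUM. It is not Trio's implementation or data model. It assumes a toy representation in which each uncertain tuple is a list of mutually exclusive (value, probability) alternatives, tuples are independent, and a tuple whose probabilities sum to less than 1 may be absent. The function names are hypothetical, and an absent tuple is treated as contributing 0 to the sum.

```python
from itertools import product

# Toy model (an assumption, not Trio's actual representation): a table is a list
# of uncertain tuples; each tuple is a list of (value, probability) alternatives.

def with_absence(alts):
    """Add an explicit 'absent' alternative (value None) for leftover probability."""
    leftover = 1.0 - sum(p for _, p in alts)
    return alts + [(None, leftover)] if leftover > 1e-12 else list(alts)

def sum_low_high_expected(table):
    """One pass over the table: low bound, high bound, and expected value of SUM."""
    low = high = expected = 0.0
    for alts in table:
        choices = with_absence(alts)
        # An absent tuple contributes 0 to the sum in this sketch.
        contribs = [0.0 if v is None else v for v, _ in choices]
        low += min(contribs)
        high += max(contribs)
        # Linearity of expectation lets each tuple be handled on its own.
        expected += sum(c * p for c, (_, p) in zip(contribs, choices))
    return low, high, expected

def possible_sums(table):
    """Exhaustive 'exact' answer: every possible SUM with its probability.

    The number of worlds enumerated is the product of the alternative counts,
    so it grows exponentially with the number of uncertain tuples.
    """
    dist = {}
    for world in product(*(with_absence(a) for a in table)):
        total = sum(v for v, _ in world if v is not None)
        prob = 1.0
        for _, p in world:
            prob *= p
        dist[total] = dist.get(total, 0.0) + prob
    return dist

# Illustrative data only: two alternatives, a maybe-absent tuple, a certain tuple.
table = [[(10, 0.5), (20, 0.5)], [(5, 0.4)], [(7, 1.0)]]
low, high, expected = sum_low_high_expected(table)
exact = possible_sums(table)
```

For this three-tuple table the exhaustive answer already contains four possible sums (17, 22, 27, 32), while the single-value summaries are a low bound of 17, a high bound of 32, and an expected value of 24. The enumeration doubles with each additional two-way uncertain tuple, whereas the summaries need only one pass over the table.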
