Publication | Closed Access
ULDBs: databases with uncertainty and lineage
Citations: 459
References: 33
Year: 2006
Relational Database, Engineering, Databases, ULDB Representation, Uncertain Database, Uncertain Data, Semantic Web, Probabilistic Databases, Information Retrieval, Data Science, Data Mining, Database System, Management, Data Integration, Semi-structured Data, Database Construction, Data Management, ULDB Representations, Knowledge Discovery, Computer Science, Database Theory, Data Modeling
Uncertain data and data lineage are both important areas of data management that are usually studied separately, although many applications need them together. Lineage gives a simple, consistent representation of uncertain data, correlates uncertainty in query results with uncertainty in the input data, and offers computational advantages when the two are processed together. This paper introduces ULDBs, an extension of relational databases that provides simple yet expressive constructs for representing and manipulating both lineage and uncertainty. It shows that the ULDB representation is complete and permits straightforward implementation of many relational operations, defines data-minimal and lineage-minimal representations, gives an algorithm for extracting a database subset in the presence of interconnected uncertainty, and describes a new approach to query processing in probabilistic databases. ULDBs form the basis of the Trio system.
This paper introduces ULDBs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation; however, many applications require the features in tandem. Fundamentally, lineage enables simple and consistent representation of uncertain data, it correlates uncertainty in query results with uncertainty in the input data, and query processing with lineage and uncertainty together presents computational benefits over treating them separately. We show that the ULDB representation is complete, and that it permits straightforward implementation of many relational operations. We define two notions of ULDB minimality, data-minimal and lineage-minimal, and study minimization of ULDB representations under both notions. With lineage, derived relations are no longer self-contained: their uncertainty depends on uncertainty in the base data. We provide an algorithm for the new operation of extracting a database subset in the presence of interconnected uncertainty. Finally, we show how ULDBs enable a new approach to query processing in probabilistic databases. ULDBs form the basis of the Trio system under development at Stanford.
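The claim that lineage correlates uncertainty in query results with uncertainty in the input data can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the paper's formalism or the Trio implementation: the relations `saw` and `drives`, their contents, and the helpers `join_with_lineage`, `possible_worlds`, and `derived_in_world` are all hypothetical. It models each uncertain base tuple as a list of mutually exclusive alternatives, and it omits optional tuples and confidence values.

```python
from itertools import product

# Hypothetical base data: each entry maps a tuple id to its mutually
# exclusive alternatives. Exactly one alternative holds per tuple
# (a simplifying assumption of this sketch).
saw = {"s1": [("Amy", "Mazda"), ("Amy", "Toyota")]}  # Amy saw one of two cars
drives = {"d1": [("Jimmy", "Toyota")], "d2": [("Billy", "Mazda")]}
base = {**saw, **drives}


def join_with_lineage(saw, drives):
    """Join on car; each result records the base alternatives it came from."""
    derived = []
    for sid, s_alts in saw.items():
        for si, (witness, car) in enumerate(s_alts):
            for did, d_alts in drives.items():
                for di, (person, d_car) in enumerate(d_alts):
                    if car == d_car:
                        lineage = {(sid, si), (did, di)}
                        derived.append(((witness, person), lineage))
    return derived


def possible_worlds(base):
    """Yield every way of picking one alternative index per base tuple."""
    ids = sorted(base)
    for choice in product(*(range(len(base[i])) for i in ids)):
        yield dict(zip(ids, choice))


def derived_in_world(derived, world):
    """A derived tuple exists only if all of its lineage was chosen."""
    chosen = set(world.items())
    return [row for row, lineage in derived if lineage <= chosen]


derived = join_with_lineage(saw, drives)
results = [derived_in_world(derived, w) for w in possible_worlds(base)]
for rows in results:
    print(rows)
```

In this sketch the two derived accusations never appear in the same possible world, because their lineage points at different alternatives of the same base tuple `s1`. The derived relation is therefore not self-contained: which of its tuples exist can only be determined by following lineage back to the base data.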