Publication | Closed Access
A graph model of data and workflow provenance
Citations: 45 | References: 16 | Year: 2010
Unknown Venue
Engineering, Semantic Web, Software Analysis, Data Provenance, Data Science, Management, Data Integration, Provenance Records, Data Management, Workflow Provenance, Knowledge Discovery, Workflow Provenance Aims, Computer Science, Provenance Analysis, Workflow-style Provenance Graph, Workflow Execution, Scientific Workflow System, Program Analysis, Provenance Management, Data Modeling
Provenance has been studied extensively in both database and workflow management systems, so far with little convergence of definitions or models. Provenance in databases has generally been defined for relational or complex object data, by propagating fine-grained annotations or algebraic expressions from the input to the output. This kind of provenance has been found useful in other areas of computer science: annotation databases, probabilistic databases, schema and data integration, etc. In contrast, workflow provenance aims to capture a complete description of the evaluation (or enactment) of a workflow, and this is crucial to verification in scientific computation. Workflows and their provenance are often presented using graphical notation, making them easy to visualize but complicating the formal semantics that relates their run-time behavior to their provenance records. We bridge this gap by extending a previously developed dataflow language, which supports both database-style querying and workflow-style batch processing steps, to produce a workflow-style provenance graph that can be explicitly queried. We define and describe the model through examples, present queries that extract other forms of provenance, and give an executable definition of the graph semantics of dataflow expressions.
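As a rough illustration of what an explicitly queryable, workflow-style provenance graph might look like, here is a minimal Python sketch. It is not the model or the dataflow language from the paper: the `ProvenanceGraph` class, its `record` and `lineage` methods, and the step and data names are all hypothetical, chosen only to show derivation edges and a simple lineage query.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance graph (hypothetical, not the paper's model).

    An entry parents[x] = {y, ...} means x was directly derived from each y.
    """

    def __init__(self):
        self.parents = defaultdict(set)

    def record(self, output, step, inputs):
        """Record that `step` consumed `inputs` and produced `output`."""
        self.parents[step].update(inputs)
        self.parents[output].add(step)

    def lineage(self, node):
        """Return every node that `node` transitively depends on."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# A three-step workflow run with made-up step and data names.
graph = ProvenanceGraph()
graph.record("cleaned", "step:clean", ["raw_a"])
graph.record("joined", "step:join", ["cleaned", "raw_b"])
graph.record("report", "step:summarize", ["joined"])

# Query: which original inputs does the report depend on?
sources = {n for n in graph.lineage("report") if n.startswith("raw_")}
print(sorted(sources))  # prints ['raw_a', 'raw_b']
```

The sketch stores both data items and processing steps as nodes, so the same traversal can answer data-oriented questions (which inputs fed an output) and workflow-oriented ones (which steps ran on the way).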