Publication | Closed Access
Chimera: a virtual data system for representing, querying, and automating data derivation
Citations: 608
References: 26
Year: 2002
Unknown Venue
Cluster Computing, Engineering, Semantic Web, Data Ecosystem, Software Analysis, Data Provenance, Virtual Data Catalog, Virtual Data System, Information Retrieval, Data Science, Database System, Database Support, Data-intensive Platform, Management, Data Integration, Virtual Data, Data Management, Data Modeling, Knowledge Discovery, Computer Science, Database Technology, Database Theory, Data-intensive Computing, Data Engineering, Automated Reasoning, Data Derivation, Cloud Computing, Data Virtualization, Big Data
Scientific data often originates from computational derivations rather than direct measurements. The authors propose that explicitly representing derivation procedures can document provenance, reveal available methods, and enable on-demand data generation. They built Chimera, a virtual data system featuring a catalog of derivation procedures and a language interpreter that translates user requests into database operations, and integrated it with distributed data‑grid services for on-demand execution of computation schedules. Applying Chimera to reconstruct simulated collision event data and to search sky survey data for galactic clusters yielded promising results.
Much scientific data is not obtained from measurements but rather derived from other data by the application of computational procedures. We hypothesize that explicit representation of these procedures can enable documentation of data provenance, discovery of available methods, and on-demand data generation (so-called "virtual data"). To explore this idea, we have developed the Chimera virtual data system, which combines a virtual data catalog for representing data derivation procedures and derived data with a virtual data language interpreter that translates user requests into data definition and query operations on the database. We couple the Chimera system with distributed "data grid" services to enable on-demand execution of computation schedules constructed from database queries. We have applied this system to two challenge problems, with promising results: reconstructing simulated collision event data from a high-energy physics experiment, and searching digital sky survey data for galactic clusters.
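The idea of a catalog that records derivations and answers provenance and on-demand generation requests can be illustrated with a small Python sketch. This is not Chimera's schema or its virtual data language; the class names (`Derivation`, `Catalog`), the method names, and the dataset names are all hypothetical, chosen only to show how recorded derivations could support a provenance query and the construction of a computation schedule.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Derivation:
    """One recorded application of a procedure (hypothetical record layout)."""
    procedure: str   # name of the computational procedure
    inputs: tuple    # logical names of the datasets consumed
    output: str      # logical name of the dataset produced


@dataclass
class Catalog:
    """Toy catalog: derivation records plus the set of datasets that exist."""
    derivations: dict = field(default_factory=dict)  # output name -> Derivation
    materialized: set = field(default_factory=set)   # datasets already on disk

    def register(self, derivation):
        self.derivations[derivation.output] = derivation

    def provenance(self, name):
        """Return every derivation that contributed to `name`, upstream first."""
        chain, seen = [], set()

        def visit(dataset):
            d = self.derivations.get(dataset)
            if d is None or dataset in seen:
                return  # measured (non-derived) data, or already visited
            seen.add(dataset)
            for parent in d.inputs:
                visit(parent)
            chain.append(d)

        visit(name)
        return chain

    def plan(self, name):
        """Return the derivations that must run to produce `name` on demand."""
        steps, seen = [], set()

        def visit(dataset):
            if dataset in self.materialized or dataset in seen:
                return  # nothing to compute for data that already exists
            seen.add(dataset)
            d = self.derivations.get(dataset)
            if d is None:
                raise KeyError(f"no derivation recorded for {dataset!r}")
            for parent in d.inputs:
                visit(parent)
            steps.append(d)

        visit(name)
        return steps


# Hypothetical two-step pipeline loosely modeled on the physics example.
catalog = Catalog()
catalog.materialized.add("raw_events")
catalog.register(Derivation("simulate", ("raw_events",), "sim_hits"))
catalog.register(Derivation("reconstruct", ("sim_hits",), "tracks"))

print([d.procedure for d in catalog.provenance("tracks")])
print([d.procedure for d in catalog.plan("tracks")])
```

In this sketch, `provenance` walks the full derivation history regardless of what already exists, while `plan` stops at materialized datasets, so only the missing steps are scheduled. A real system would hand the resulting schedule to grid services for execution; that part is omitted here.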