Publication | Closed Access
Scalable lineage capture for debugging DISC analytics
Citations: 54
References: 21
Year: 2013
Unknown Venue
Software Maintenance, Cluster Computing, Engineering, Big Data Analytics, Computer Architecture, Software Engineering, Software Analysis, Data Science, Record-level Data Lineage, Data-intensive Platform, Data Integration, Query Engine, Data Management, High-performance Data Analytics, DISC Analytics, Computer Science, Data Stream Management, Debugger, Data-intensive Computing, Flexible Instrumentation, Program Analysis, Software Testing, Parallel Programming, System Software, Massive Data Processing, Big Data
A fundamental challenge for big-data analytics is how to efficiently tune and debug multi-step dataflows. This paper presents Newt, a scalable architecture for capturing and using record-level data lineage to discover and resolve errors in analytics. Newt's flexible instrumentation API allows system developers to collect this fine-grained lineage from a range of data-intensive scalable computing (DISC) architectures, actively recording the flow of data through multi-step, user-defined transformations. Newt pairs this API with a scale-out, fault-tolerant lineage store and query engine.
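
To make the idea of record-level lineage concrete, here is a minimal sketch in Python. It is not Newt's API: the `LineageStore` class, its `record` and `backward_trace` methods, the record-id scheme, and the word-count dataflow are all hypothetical names chosen for illustration, and the store is a plain in-memory dictionary rather than a scale-out, fault-tolerant service. The sketch only shows the general pattern the abstract describes: each transformation step reports which input record produced which output record, and a query later walks those associations backward from a suspicious output to the source records that contributed to it.

```python
from collections import defaultdict


class LineageStore:
    """Toy in-memory lineage store (hypothetical; not Newt's actual interface)."""

    def __init__(self):
        # Maps an output record id to the set of (step, input record id) pairs
        # that contributed to it.
        self._parents = defaultdict(set)

    def record(self, step, input_id, output_id):
        """Capture one input -> output association for a named step."""
        self._parents[output_id].add((step, input_id))

    def backward_trace(self, output_id):
        """Return the ids of all source records that contributed to output_id.

        A source record is one with no recorded parents.
        """
        sources, seen, stack = set(), set(), [output_id]
        while stack:
            rid = stack.pop()
            if rid in seen:
                continue
            seen.add(rid)
            parents = self._parents.get(rid)
            if not parents:
                sources.add(rid)
            else:
                stack.extend(parent_id for _, parent_id in parents)
        return sources


def run_wordcount(lines, store):
    """Two-step dataflow (split, then count) instrumented to record lineage."""
    words = []
    for i, line in enumerate(lines):
        for j, word in enumerate(line.split()):
            word_id = f"word:{i}:{j}"
            store.record("split", f"line:{i}", word_id)
            words.append((word_id, word))

    counts = defaultdict(int)
    for word_id, word in words:
        store.record("count", word_id, f"count:{word}")
        counts[word] += 1
    return dict(counts)


store = LineageStore()
lines = ["a b", "b c", "c c"]
counts = run_wordcount(lines, store)

# Which input lines contributed to the count for "b"?
trace_b = sorted(store.backward_trace("count:b"))
print(counts, trace_b)
```

In this sketch the instrumentation lives inside the dataflow code itself. A real capture layer would more plausibly hook into the framework's operators so that user-defined transformations need no changes, and would persist associations to external storage instead of holding them in memory.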