Publication | Open Access
BigSift: automated debugging of big data analytics in data-intensive scalable computing
Citations: 18
References: 11
Year: 2018
Unknown Venue
Cluster Computing, Engineering, Big Data Analytics, Automated Debugging, Software Analysis, Data Provenance, Big Data Infrastructure, Data Science, Data-intensive Platform, Data Integration, Parallel Computing, Data Management, High-performance Data Analytics, Error Debugging, Data-intensive Scalable Computing, Computer Science, Data-intensive Computing, Program Analysis, Software Testing, Parallel Programming, Massive Data Processing, Big Data
Developing big data analytics often involves trial-and-error debugging, owing to unclean datasets or wrong assumptions about the data. When errors arise (e.g., a program crash or outlier results), developers often want to pinpoint their root cause. To address this problem, BigSift takes an Apache Spark program, a user-defined test oracle function, and a dataset as input, and outputs a minimum set of input records that reproduces the same test failure. It does so by combining insights from delta debugging with data provenance. The technical contribution of BigSift is the design of systems optimizations that bring automated debugging closer to reality for data-intensive scalable computing.
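To make the delta debugging idea concrete, here is a minimal sketch of the classic ddmin procedure applied to a list of input records. It is not BigSift's implementation: it runs in plain Python rather than on Apache Spark, it omits data provenance and the systems optimizations the abstract mentions, and the names `ddmin`, `program`, `oracle_fails`, and `records` are hypothetical, chosen only for illustration. The oracle here treats a crash of the toy program as the test failure.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(records: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `records` to a smaller subset that still makes `fails` return True."""
    current = list(records)
    if not fails(current):
        raise ValueError("the full input must fail the oracle")
    n = 2  # number of chunks the current input is split into
    while len(current) >= 2:
        size = -(-len(current) // n)  # ceiling division
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        reduced = False
        # Step 1: does any single chunk reproduce the failure on its own?
        for chunk in chunks:
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break
        # Step 2: otherwise, does removing one chunk keep the failure?
        if not reduced:
            for i in range(len(chunks)):
                rest = [r for j, c in enumerate(chunks) if j != i for r in c]
                if fails(rest):
                    current = rest
                    n = max(min(n - 1, len(current)), 2)
                    reduced = True
                    break
        # Step 3: otherwise, split more finely, or stop at single records.
        if not reduced:
            if n >= len(current):
                break
            n = min(len(current), n * 2)
    return current


def program(lines: List[str]) -> int:
    """Toy stand-in for an analytics job: sum the second CSV field."""
    return sum(int(line.split(",")[1]) for line in lines)


def oracle_fails(lines: List[str]) -> bool:
    """Hypothetical test oracle: the test fails when the program crashes."""
    try:
        program(lines)
    except (ValueError, IndexError):
        return True
    return False


records = ["a,1", "b,2", "c,x", "d,4", "e,5", "f,6", "g,7", "h,8"]
minimal = ddmin(records, oracle_fails)
print(minimal)  # ['c,x']
```

Each call to the oracle reruns the program on a candidate subset, which is the expensive step on large datasets. The abstract's description suggests that data provenance is what lets BigSift restrict the search to relevant input records, but this sketch does not model that part.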