Publication | Closed Access
Towards reliable interactive data cleaning
Citations: 55
References: 28
Year: 2016
Unknown Venue
Engineering, Business Intelligence, Data Preparation, Semantic Web, Data Cleaning, Data Science, Management, Data Integration, Data Pre-processing, Data Management, Cleaning Pipelines, Data Modeling, Knowledge Discovery, Computer Science, Data Cleansing, User Survey, Data Manipulation, Data Engineering, Data Treatment, Big Data
Data cleaning is frequently an iterative process tailored to the requirements of a specific analysis task. The design and implementation of iterative data cleaning tools present novel challenges, both technical and organizational, to the community. In this paper, we present results from a user survey (N = 29) of data analysts and infrastructure engineers from industry and academia. We highlight three important themes: (1) the iterative nature of data cleaning, (2) the lack of rigor in evaluating the correctness of data cleaning, and (3) the disconnect between the analysts who query the data and the infrastructure engineers who design the cleaning pipelines. We conclude by presenting a number of recommendations for future work in which we envision an interactive data cleaning system that accounts for the observed challenges.