Publication | Closed Access
SCODED: Statistical Constraint Oriented Data Error Detection
Citations: 30
References: 55
Year: 2020
Unknown Venue
Engineering, Constraints, Verification, Data Preparation, Information Forensics, Software Analysis, Formal Verification, Error Detection, Constraint Solving, Data Integrity, Data Science, Data Mining, Data Integration, Data Management, Statistics, Runtime Verification, Knowledge Discovery, Computer Science, Data Cleansing, Data Validation, Statistical Constraints, Program Analysis, Automated Reasoning, Software Testing, Formal Methods, Business, Data Treatment, SC Violation Detection, Data Modeling
Statistical Constraints (SCs) play an important role in statistical modeling and analysis. This paper brings the concept to data cleaning and studies how to leverage SCs for error detection. SCs provide a novel approach that has various application scenarios and works harmoniously with downstream statistical modeling. Entailment relationships between SCs and integrity constraints provide analytical insight into SCs. We develop SCODED, an SC-Oriented Data Error Detection system, comprising two key components: (1) SC Violation Detection: checks whether an SC is violated on a given dataset, and (2) Error Drill Down: identifies the top-k records that contribute most to the violation of an SC. Experiments on synthetic and real-world data show that SCs detect the data errors that violate them more effectively than state-of-the-art approaches.
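The abstract does not say which SCs or statistical tests SCODED uses, so the two components can only be illustrated under assumptions. The sketch below assumes an independence constraint between two categorical columns, checks it with Pearson's chi-squared statistic, and uses a naive leave-one-out ranking as a stand-in for drill-down. The function names and the `critical_value` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter


def chi2_statistic(rows, a, b):
    """Pearson chi-squared statistic for independence of columns a and b."""
    n = len(rows)
    if n == 0:
        return 0.0
    joint = Counter((r[a], r[b]) for r in rows)
    count_a = Counter(r[a] for r in rows)
    count_b = Counter(r[b] for r in rows)
    stat = 0.0
    for va, na in count_a.items():
        for vb, nb in count_b.items():
            expected = na * nb / n  # cell count expected under independence
            observed = joint.get((va, vb), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def violates_independence(rows, a, b, critical_value):
    """Assumed violation check: the statistic exceeds a caller-supplied cutoff."""
    return chi2_statistic(rows, a, b) > critical_value


def drill_down(rows, a, b, k):
    """Naive stand-in for drill-down: rank each record by how much the
    statistic drops when that record alone is removed, and return the
    indices of the k largest drops (ties broken by lower index)."""
    base = chi2_statistic(rows, a, b)
    drops = []
    for i in range(len(rows)):
        rest = rows[:i] + rows[i + 1:]
        drops.append((base - chi2_statistic(rest, a, b), i))
    drops.sort(key=lambda d: (-d[0], d[1]))
    return [i for _, i in drops[:k]]
```

For a 2x2 table at the 0.05 significance level, a caller could pass `critical_value=3.841` (one degree of freedom). The leave-one-out loop recomputes the statistic once per record, so it is quadratic in the number of rows and is meant only to make the idea of "records that contribute most to the violation" concrete.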