Publication | Closed Access
Enriching data imputation with extensive similarity neighbors
Citations: 46
References: 15
Year: 2015
Engineering, Information Retrieval, Data Science, Data Mining, Machine Learning, Data Imputation, Large-scale Datasets, Very Large Database, Knowledge Discovery, Data Treatment, Data Integration, Computer Science, Data Cleansing, Extensive Similarity Neighbors, Data Management, Statistics, Similarity Search, Incomplete Information
Incomplete information often occurs in database applications, e.g., in data integration, data cleaning, or data exchange. The idea of data imputation is to fill in missing data with the values of neighbors that share the same information. Such neighbors can be identified either with certainty, by editing rules, or statistically, by relational dependency networks. Unfortunately, owing to data sparsity, the number of neighbors (identified w.r.t. value equality) is rather limited, especially when data values exhibit variance. In this paper, we propose to enrich the set of neighbors extensively by using similarity rules that tolerate small variations. This yields additional fillings that the aforesaid equality neighbors fail to reveal. To fill more missing values, we study the problem of maximizing the missing data imputation. Our major contributions include (1) an NP-hardness analysis of solving and approximating the problem, (2) exact algorithms for tackling the problem, and (3) efficient approximation with performance guarantees. Experiments on real and synthetic data sets demonstrate that the filling accuracy can be improved.
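The contrast between equality neighbors and similarity neighbors can be illustrated with a minimal sketch. This is not the paper's algorithm: the record layout, the function names `neighbors` and `impute`, the single per-attribute tolerance `tol`, and the choice to fill with the mean of the neighbors' values are all assumptions made for illustration, and the maximization problem studied in the paper is not modeled here.

```python
from statistics import mean

def neighbors(target, records, attrs, tol):
    """Return records whose values on `attrs` are each within `tol` of the target's.

    With tol=0.0 this degenerates to equality neighbors.
    """
    return [
        r for r in records
        if r is not target
        and all(r[a] is not None and abs(r[a] - target[a]) <= tol for a in attrs)
    ]

def impute(target, records, missing_attr, tol=0.0):
    """Fill `missing_attr` of `target` from its neighbors (hypothetical rule: mean)."""
    # Compare only on the attributes the target actually has.
    attrs = [a for a, v in target.items() if a != missing_attr and v is not None]
    candidates = [
        r[missing_attr]
        for r in neighbors(target, records, attrs, tol)
        if r[missing_attr] is not None
    ]
    return mean(candidates) if candidates else None

# Illustrative data only; not taken from the paper's experiments.
records = [
    {"lat": 40.0, "lon": 10.0, "temp": 21.0},
    {"lat": 40.1, "lon": 10.1, "temp": 23.0},
    {"lat": 55.0, "lon": 30.0, "temp": 5.0},
]
target = {"lat": 40.05, "lon": 10.05, "temp": None}

no_fill = impute(target, records, "temp", tol=0.0)   # no exact matches exist
filled = impute(target, records, "temp", tol=0.1)    # two nearby records qualify
```

With exact equality no record matches the target, so nothing can be filled; allowing a small tolerance admits the two nearby records and produces a value. This mirrors, in a simplified way, why tolerating small variations can reveal fillings that equality neighbors miss.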