Publication | Closed Access
Parallel Progressive Approach to Entity Resolution Using MapReduce
Citations: 24
References: 21
Year: 2017
Unknown Venue
Cluster Computing, Engineering, Resolution Cost, Map-reduce, Semantic Web, Progressive Approach, Information Retrieval, Data Science, Database Support, Management, Data Integration, Parallel Computing, Big Data, Data Management, Entity Resolution, Parallel Database, Parallel Progressive Approach, Knowledge Discovery, Computer Science, Big Data Search, Distributed Query Processing, Parallel Programming, Massive Data Processing, Data Modeling
Entity resolution (ER) is the process of identifying which entities in a dataset represent the same real-world object. This paper proposes a progressive approach to ER using MapReduce. In contrast to traditional ER, progressive ER aims to resolve the dataset so that the rate at which data quality improves is maximized. Such an approach is useful for many emerging analytical applications that require low-latency responses, and in situations where the underlying resources are constrained or costly to use. Experiments with real-world datasets demonstrate that our approach generates high-quality results at a limited resolution cost.
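To make the idea of progressive resolution concrete, the following is a minimal single-machine sketch, not the algorithm from the paper. It assumes a first-token blocking key, a cheap length-difference heuristic for ordering candidate pairs, token-set Jaccard similarity as the resolution function, and a budget counted in pairwise comparisons. All function names and the sample records are hypothetical, and the map and reduce steps are plain Python generators standing in for a real MapReduce job.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (stands in for the costly resolve step)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def map_phase(records):
    """Map: emit (blocking key, record id). Assumed key: the first token."""
    for rid, text in records.items():
        tokens = text.lower().split()
        if tokens:  # skip empty records, which have no blocking key
            yield tokens[0], rid


def reduce_phase(keyed):
    """Reduce: group ids by blocking key and emit candidate pairs per block."""
    blocks = defaultdict(list)
    for key, rid in keyed:
        blocks[key].append(rid)
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)


def progressive_resolve(records, budget, threshold=0.5):
    """Resolve the most promising pairs first; stop after `budget` comparisons."""
    candidates = list(reduce_phase(map_phase(records)))

    # Assumed cheap priority: pairs of similar length are compared first.
    def priority(pair):
        a, b = pair
        return abs(len(records[a]) - len(records[b])), pair

    matches = []
    for a, b in sorted(candidates, key=priority)[:budget]:
        if jaccard(records[a], records[b]) >= threshold:
            matches.append((a, b))
    return matches


# Hypothetical sample data for illustration only.
records = {
    1: "john smith new york",
    2: "john smith ny",
    3: "john a smith new york",
    4: "jane doe boston",
    5: "jane doe boston ma",
    6: "john williams composer of film scores and concert works",
}

early = progressive_resolve(records, budget=2)
full = progressive_resolve(records, budget=100)
print(early, full)
```

On this toy input, a budget of two comparisons already yields every match that the exhaustive run finds, which is the behavior a progressive strategy is after: most of the quality gain arrives early. How well that holds in practice depends on how closely the cheap priority tracks true match likelihood.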