Publication | Open Access
Beyond 100 million entities
Citations: 50 | References: 24 | Year: 2012 | Venue: Unknown
Topics: Engineering, Business Intelligence, Semantic Web, Large-scale Datasets, Information Retrieval, Data Science, Data Mining, Management, Data Integration, Semi-structured Data, Schema Information, Linked Data, Data Management, Entity Resolution, Very Large Database, Knowledge Discovery, Computer Science, Information Management, Distributed Query Processing, Query Optimization, Foundation Model, Quadratic Process, Big Data
A prerequisite for leveraging the vast amount of data available on the Web is Entity Resolution, i.e., the process of identifying and linking data that describe the same real-world objects. To make this inherently quadratic process applicable to large data sets, blocking is typically employed: entities (records) are grouped into clusters of matching candidates, called blocks, and only entities in the same block are compared. However, novel blocking techniques are required to deal with the noisy, heterogeneous, semi-structured, user-generated data on the Web, because traditional blocking techniques rely on schema information and are therefore inapplicable. Introducing redundancy (placing each entity in multiple blocks) improves the robustness of blocking methods, but at the price of additional computational cost.
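The abstract describes blocking only at a high level. For scale, exhaustive resolution of n entities needs n(n - 1)/2 comparisons, which for n = 10^8 is roughly 5 × 10^15. The sketch below illustrates one common schema-agnostic approach, token blocking, in which every token of every attribute value becomes a block key regardless of the attribute name, so entities can land in several blocks (redundancy). It is an illustrative assumption, not necessarily the method proposed in the paper; the function names `tokenize`, `token_blocking`, and `candidate_pairs` and the toy `ENTITIES` data are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: entity id -> list of (attribute, value) pairs.
# Matching entities use different attribute names (no shared schema).
ENTITIES = {
    "e1": [("name", "Jack Miller"), ("job", "Car Seller")],
    "e2": [("fullName", "Jack Lloyd Miller"), ("occupation", "car vendor")],
    "e3": [("title", "Erick Green"), ("profession", "vehicle vendor")],
    "e4": [("name", "Erick Lloyd Green"), ("job", "car trader")],
}


def tokenize(value):
    """Lowercase a value and split it on non-alphanumeric characters."""
    return {t for t in re.split(r"\W+", value.lower()) if t}


def token_blocking(entities):
    """Schema-agnostic token blocking: one block per token, attribute names ignored."""
    blocks = defaultdict(set)
    for eid, pairs in entities.items():
        for _attr, value in pairs:  # the attribute name is deliberately unused
            for token in tokenize(value):
                blocks[token].add(eid)
    # A block holding a single entity yields no comparisons, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Compare only entities sharing a block.

    Returns the set of distinct pairs and the total number of comparisons
    executed, which counts a pair once for every block it appears in.
    """
    pairs = set()
    total = 0
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            total += 1
            pairs.add((a, b))
    return pairs, total


if __name__ == "__main__":
    blocks = token_blocking(ENTITIES)
    pairs, total = candidate_pairs(blocks)
    print(len(blocks), total, len(pairs))  # 7 9 5
```

On the toy data, both matching pairs, (e1, e2) and (e3, e4), share blocks even though their attribute names differ, which is the robustness that redundancy buys. The cost is visible as well: 9 comparisons are executed for only 5 distinct pairs, because (e1, e2), for instance, co-occurs in the `jack`, `miller`, and `car` blocks. With just four entities this even exceeds the 6 exhaustive comparisons; blocking is meant to pay off on large collections, where most entity pairs share no token.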