Publication | Closed Access
On-the-fly entity-aware query processing in the presence of linkage
Citations: 72
References: 26
Year: 2010
Engineering, Semantic Web, Probabilistic Databases, Data Structure, Entity Linkage, Information Retrieval, Data Science, Management, Data Integration, Semi-structured Data, Linkage Information, Linked Data, Data Management, Entity Disambiguation, Knowledge Discovery, Computer Science, Distributed Query Processing, Database Theory, Query Optimization, Record Linkage, Automated Reasoning, Data Modeling
Entity linkage is central to data integration and cleaning, and traditional methods merge data structures using computed similarity before answering queries. The authors propose a novel framework for entity linkage that incorporates uncertainty. The framework stores possible linkages with belief values and applies a probabilistic query answering technique that performs merges at run time based on the query, allowing results to include structures generated through reasoning over linkages. According to the authors, the approach enables run-time merges, generates implicit structures, and supports query evaluation across linked structures, capabilities not offered by traditional probabilistic databases. They formally define the semantics, describe an efficient implementation, and report an experimental evaluation.
Entity linkage is central to almost every data integration and data cleaning scenario. Traditional techniques use some computed similarity among data structures to perform merges and then answer queries on the merged data. We describe a novel framework for entity linkage with uncertainty. Instead of using the linkage information to merge structures a priori, possible linkages are stored alongside the data with their belief values. A new probabilistic query answering technique is used to take the probabilistic linkages into consideration. The framework introduces a series of novelties: (i) it performs merges at run time based not only on existing linkages but also on the given query; (ii) it allows results that may contain structures not explicitly represented in the data, but generated as a result of reasoning over the linkages; and (iii) it enables an evaluation of the query conditions that spans linked structures, a functionality not currently supported by any traditional probabilistic database. We formally define the semantics, describe an efficient implementation and report on the findings of our experimental evaluation.
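The idea of keeping linkages alongside the data and merging only at query time can be illustrated with a small sketch. This is not the authors' algorithm or implementation: it is a naive possible-worlds enumeration, exponential in the number of linkages, written only to make the semantics concrete. The records, the belief values, the assumption that linkages are independent, and the names `RECORDS`, `LINKAGES`, `merged_entities` and `answer` are all hypothetical.

```python
from itertools import product

# Hypothetical toy data: each record is a set of (attribute, value) pairs.
RECORDS = {
    "r1": {("name", "J. Smith"), ("city", "Trento")},
    "r2": {("name", "John Smith"), ("email", "js@example.org")},
    "r3": {("name", "Jon Smyth"), ("city", "Rome")},
}

# Possible linkages stored with a belief value (assumed independent here).
LINKAGES = [("r1", "r2", 0.8), ("r2", "r3", 0.3)]


def merged_entities(active_links, record_ids):
    """Group records into entities: connected components of the active links."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in active_links:
        parent[find(a)] = find(b)

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), set()).add(r)
    return [frozenset(g) for g in groups.values()]


def answer(query, records=RECORDS, linkages=LINKAGES):
    """Map each merged entity (a frozenset of record ids) to the total
    probability of the worlds in which it exists and satisfies the query.

    `query` is a set of (attribute, value) conditions, all of which must hold.
    """
    result = {}
    # Each world fixes, for every linkage, whether it holds or not.
    for choice in product([True, False], repeat=len(linkages)):
        prob = 1.0
        active = []
        for keep, (a, b, belief) in zip(choice, linkages):
            prob *= belief if keep else 1.0 - belief
            if keep:
                active.append((a, b))
        # Merge at "run time": only inside this world, never in the stored data.
        for entity in merged_entities(active, records):
            attrs = set().union(*(records[r] for r in entity))
            if query <= attrs:
                result[entity] = result.get(entity, 0.0) + prob
    return result


if __name__ == "__main__":
    q = {("city", "Trento"), ("email", "js@example.org")}
    for entity, p in answer(q).items():
        print(sorted(entity), round(p, 2))
```

In this toy setting no single stored record satisfies both conditions of `q`; only merged entities such as `r1`+`r2` do, which mirrors points (ii) and (iii) of the abstract: the answer contains a structure that is not explicitly stored, and the conditions are evaluated across linked records.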