Annotating and searching web tables using entities, types and relationships

TLDR

Web tables express entity references, attributes, and relationships, and they usually represent relational world knowledge better than free text. Because no formal, uniform schema is imposed on them, Web search cannot exploit this high-quality information. The paper proposes a graphical model that jointly annotates table cells with entities, columns with types, and column pairs with relations, rather than making separate local decisions, and it evaluates the effect of these annotations on a prototype relational Web search tool. Experiments using the YAGO catalog, DB-Pedia, Wikipedia tables, and over 25 million HTML tables from a 500 million page Web crawl show the joint approach to be superior, and the annotations improve search beyond purely textual indexing of tables.

Abstract

Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational world knowledge is usually considerably better than completely unstructured, free-format text. At the same time, unlike manually-created knowledge bases, relational information mined from "organic" Web tables need not be constrained by availability of precious editorial time. Unfortunately, in the absence of any formal, uniform schema imposed on Web tables, Web search cannot take advantage of these high-quality sources of relational information. In this paper we propose new machine learning techniques to annotate table cells with entities that they likely mention, table columns with types from which entities are drawn for cells in the column, and relations that pairs of table columns seek to express. We propose a new graphical model for making all these labeling decisions for each table simultaneously, rather than make separate local decisions for entities, types and relations. Experiments using the YAGO catalog, DB-Pedia, tables from Wikipedia, and over 25 million HTML tables from a 500 million page Web crawl uniformly show the superiority of our approach. We also evaluate the impact of better annotations on a prototype relational Web search tool. We demonstrate clear benefits of our annotations beyond indexing tables in a purely textual manner.
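The contrast between separate local decisions and a simultaneous labeling can be illustrated with a small sketch. This is not the paper's model: it covers only one column (entities and a column type, no relations), it enumerates column types exhaustively instead of running inference in a learned graphical model, and the catalog, scores, and the `TYPE_BONUS` weight are all made-up assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy catalog: entity -> types it belongs to.
ENTITY_TYPES: Dict[str, Set[str]] = {
    "Paris_(city)": {"city"},
    "Paris_Hilton": {"person"},
    "London_(city)": {"city"},
    "Jack_London": {"person"},
    "Berlin_(city)": {"city"},
}

# Hypothetical text-match scores: one dict of candidate entities per cell.
CELL_CANDIDATES: List[Dict[str, float]] = [
    {"Paris_(city)": 0.5, "Paris_Hilton": 0.6},
    {"London_(city)": 0.7, "Jack_London": 0.4},
    {"Berlin_(city)": 0.9},
]

# Assumed reward when a cell's entity agrees with the column type.
TYPE_BONUS = 0.5


def annotate_locally(cells: List[Dict[str, float]]) -> List[str]:
    """Pick the best-matching entity for each cell independently."""
    return [max(cands, key=cands.get) for cands in cells]


def annotate_jointly(
    cells: List[Dict[str, float]],
    entity_types: Dict[str, Set[str]],
    type_bonus: float = TYPE_BONUS,
) -> Tuple[str, List[str]]:
    """Choose the column type and the cell entities together.

    For every candidate column type, each cell takes the entity that
    maximizes text score plus type agreement; the type with the highest
    total score wins.
    """
    all_types = sorted({t for cands in cells for e in cands for t in entity_types[e]})
    best_total = float("-inf")
    best_type = ""
    best_entities: List[str] = []
    for col_type in all_types:
        total = 0.0
        chosen: List[str] = []
        for cands in cells:
            scored = {
                e: s + (type_bonus if col_type in entity_types[e] else 0.0)
                for e, s in cands.items()
            }
            entity = max(scored, key=scored.get)
            chosen.append(entity)
            total += scored[entity]
        if total > best_total:
            best_total, best_type, best_entities = total, col_type, chosen
    return best_type, best_entities


if __name__ == "__main__":
    print(annotate_locally(CELL_CANDIDATES))
    print(annotate_jointly(CELL_CANDIDATES, ENTITY_TYPES))
```

With these toy numbers the local annotator labels the first cell as the person, because that candidate has the higher text score in isolation, while the joint version settles on a "city" column and resolves all three cells to cities.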
