Concepedia

Publication | Closed Access

Knowledge vault

Citations: 1.5K

References: 40

Year: 2014

TLDR

Large-scale knowledge bases such as Wikipedia, Freebase, YAGO, Microsoft’s Satori, and Google’s Knowledge Graph have proliferated, but scaling them further requires automatic construction, and prior text‑based extraction is often noisy. This work introduces Knowledge Vault, a Web‑scale probabilistic knowledge base that combines extractions from Web content (text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories. Supervised machine learning fuses these sources, and a probabilistic inference system computes calibrated probabilities of fact correctness. Knowledge Vault is substantially larger than any previously published structured knowledge repository, and multiple studies examine the relative utility of the different information sources and extraction methods.

Abstract

Recent years have witnessed a proliferation of large-scale knowledge bases, including Wikipedia, Freebase, YAGO, Microsoft's Satori, and Google's Knowledge Graph. To increase the scale even further, we need to explore automatic methods for constructing knowledge bases. Previous approaches have primarily focused on text-based extraction, which can be very noisy. Here we introduce Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories. We employ supervised machine learning methods for fusing these distinct information sources. The Knowledge Vault is substantially bigger than any previously published structured knowledge repository, and features a probabilistic inference system that computes calibrated probabilities of fact correctness. We report the results of multiple studies that explore the relative utility of the different information sources and extraction methods.
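As a rough illustration of the fusion idea described in the abstract, the sketch below trains a small logistic-regression model that maps per-source confidence scores for a candidate fact, plus a prior score from an existing repository, to a single probability. It is not taken from the paper: the feature layout, the toy training data, and the names `train_fusion` and `fact_probability` are all hypothetical, and logistic regression stands in for whichever supervised learner the authors actually used.

```python
import math


def sigmoid(z):
    """Map a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_fusion(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent.

    rows   -- list of feature vectors, one per candidate fact
    labels -- 1 if the fact is known to be correct, 0 otherwise
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this example under the current model.
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        # Average the gradients and take one descent step.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def fact_probability(x, weights, bias):
    """Fused probability that a candidate fact is correct."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical feature layout (an assumption, not from the paper):
# [text_score, table_score, structure_score, annotation_score, prior_score]
TRAIN_X = [
    [0.9, 0.8, 0.7, 0.9, 0.8],
    [0.8, 0.0, 0.6, 0.0, 0.9],
    [0.7, 0.9, 0.0, 0.8, 0.7],
    [0.6, 0.0, 0.0, 0.0, 0.1],
    [0.2, 0.1, 0.0, 0.0, 0.2],
    [0.0, 0.3, 0.1, 0.0, 0.1],
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]

WEIGHTS, BIAS = train_fusion(TRAIN_X, TRAIN_Y)

# A fact supported by every source and by the prior, versus a weakly supported one.
P_STRONG = fact_probability([0.9, 0.9, 0.8, 0.9, 0.9], WEIGHTS, BIAS)
P_WEAK = fact_probability([0.1, 0.0, 0.0, 0.0, 0.1], WEIGHTS, BIAS)
```

On this toy data the strongly supported candidate should receive a higher probability than the weakly supported one. Whether such outputs are calibrated in the sense the abstract describes would have to be checked against held-out labeled facts, which this sketch does not do.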
