Identifying Relations for Open Information Extraction

TLDR

Open Information Extraction (IE) extracts assertions from large corpora without a pre-specified vocabulary. The paper proposes two simple syntactic and lexical constraints on binary relations expressed by verbs to remedy the uninformative and incoherent extractions produced by state-of-the-art Open IE systems. These constraints are implemented in ReVerb, which more than doubles the area under the precision-recall curve relative to prior extractors such as TextRunner and WOE^pos. More than 30% of ReVerb's extractions are at precision 0.8 or higher, compared with virtually none for earlier systems, and the paper analyzes ReVerb's errors to suggest directions for future work.

Abstract

Open Information Extraction (IE) is the task of extracting assertions from massive corpora without requiring a pre-specified vocabulary. This paper shows that the output of state-of-the-art Open IE systems is rife with uninformative and incoherent extractions. To overcome these problems, we introduce two simple syntactic and lexical constraints on binary relations expressed by verbs. We implemented the constraints in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE^pos. More than 30% of ReVerb's extractions are at precision 0.8 or higher, compared to virtually none for earlier systems. The paper concludes with a detailed analysis of ReVerb's errors, suggesting directions for future work.
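
The abstract does not spell out the syntactic constraint. As a rough illustration only, the sketch below assumes a simplified part-of-speech pattern of the form "verb, optionally followed by nouns, adjectives, adverbs, pronouns, or determiners, and ending in a preposition or particle". The tag mapping, the regular expression, and the names `tag_class` and `relation_phrases` are hypothetical and are not taken from ReVerb itself; the lexical constraint is not modeled here.

```python
import re

def tag_class(tag):
    """Map a Penn Treebank POS tag to one coarse class letter (an assumption)."""
    if tag.startswith("VB") or tag == "MD":
        return "V"  # verbs and modals
    if tag in ("IN", "TO", "RP"):
        return "P"  # prepositions, infinitive marker, particles
    if tag.startswith(("NN", "JJ", "RB")) or tag in ("PRP", "PRP$", "DT"):
        return "W"  # nouns, adjectives, adverbs, pronouns, determiners
    return "O"      # everything else

# One or more verbs, optionally followed by W* and a closing preposition.
RELATION = re.compile(r"V+(?:W*P)?")

def relation_phrases(tagged):
    """Return candidate relation phrases from a list of (token, tag) pairs."""
    # Each token contributes exactly one letter, so match offsets are token indices.
    classes = "".join(tag_class(tag) for _, tag in tagged)
    return [
        " ".join(tok for tok, _ in tagged[m.start():m.end()])
        for m in RELATION.finditer(classes)
    ]

sentence = [
    ("Faust", "NNP"), ("made", "VBD"), ("a", "DT"), ("deal", "NN"),
    ("with", "IN"), ("the", "DT"), ("devil", "NN"), (".", "."),
]
print(relation_phrases(sentence))
```

Under these assumptions the matcher keeps the whole phrase "made a deal with" rather than the bare verb "made", which is the kind of more informative relation the constraints are meant to favor.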
