SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

Abstract

In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection, finding that past work suffered from a mismatch in input granularity between NLI datasets (sentence-level) and inconsistency detection (document-level). We provide a highly effective and lightweight method called SummaCConv that enables NLI models to be successfully used for this task by segmenting documents into sentence units and aggregating scores between pairs of sentences. On our newly introduced benchmark called SummaC (Summary Consistency), consisting of six large inconsistency detection datasets, SummaCConv obtains state-of-the-art results with a balanced accuracy of 74.4%, a 5 percentage point improvement compared to prior work. We make the models and datasets available: https://github.com/tingofurro/summac
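To make the sentence-level idea concrete, the following is a minimal sketch of segmenting a document and a summary into sentences, scoring every (document sentence, summary sentence) pair, and aggregating the pair scores into one consistency score. It is not the released SummaCConv implementation. The function names are hypothetical, `toy_entailment_score` is a token-overlap stand-in for a real NLI model, and the max-then-mean aggregation is an assumed simple rule chosen only for illustration.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model (assumption, not a real model):
    the fraction of hypothesis tokens that also occur in the premise."""
    premise_tokens = set(re.findall(r"\w+", premise.lower()))
    hypothesis_tokens = re.findall(r"\w+", hypothesis.lower())
    if not hypothesis_tokens:
        return 0.0
    hits = sum(tok in premise_tokens for tok in hypothesis_tokens)
    return hits / len(hypothesis_tokens)


def pair_matrix(
    document: str,
    summary: str,
    score_fn: Callable[[str, str], float] = toy_entailment_score,
) -> List[List[float]]:
    """Score every pair; rows are document sentences, columns are summary sentences."""
    doc_sents = split_sentences(document)
    sum_sents = split_sentences(summary)
    return [[score_fn(d, s) for s in sum_sents] for d in doc_sents]


def consistency_score(
    document: str,
    summary: str,
    score_fn: Callable[[str, str], float] = toy_entailment_score,
) -> float:
    """Assumed aggregation: take the best-supporting document sentence for each
    summary sentence (max over rows), then average over summary sentences."""
    matrix = pair_matrix(document, summary, score_fn)
    if not matrix or not matrix[0]:
        return 0.0
    n_summary = len(matrix[0])
    best_support = [max(row[j] for row in matrix) for j in range(n_summary)]
    return sum(best_support) / n_summary


if __name__ == "__main__":
    doc = "The cat sat on the mat. The dog barked loudly."
    print(consistency_score(doc, "The cat sat on the mat."))
    print(consistency_score(doc, "The dog slept."))
```

In practice, `score_fn` would be replaced by a function that returns the entailment probability from a trained NLI model. The pair matrix is the part that addresses the granularity mismatch: the model only ever sees sentence-sized inputs, and the document-level decision comes from the aggregation step.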