BanditSum: Extractive Summarization as a Contextual Bandit

Abstract

In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels. We call our approach BANDITSUM as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and chooses a sequence of sentences to include in the summary (the action). A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. We perform a series of experiments demonstrating that BANDITSUM is able to achieve ROUGE scores that are better than or comparable to the state-of-the-art for extractive summarization, and converges using significantly fewer update steps than competing approaches. In addition, we show empirically that BANDITSUM performs significantly better than competing approaches when good summary sentences appear late in the source document.
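To make the bandit framing concrete, the following is a minimal sketch of a policy-gradient (REINFORCE-style) update for sentence selection. It is not the paper's model: it assumes one free logit per sentence of a single document instead of a neural network conditioned on the document, it uses a toy unigram-overlap F1 as a stand-in for ROUGE, and it uses a moving-average baseline. The names `unigram_f1`, `sample_summary`, and `train` are hypothetical and chosen only for illustration.

```python
import math
import random


def unigram_f1(candidate, reference):
    """Toy stand-in for ROUGE (assumption): unigram-overlap F1 of two token lists."""
    cand, ref = set(candidate), set(reference)
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def sample_summary(logits, k, rng):
    """Sample k distinct sentence indices, one at a time, from a softmax over
    the sentences not yet chosen. Also returns the gradient of the log
    probability of the sampled sequence with respect to the logits."""
    remaining = list(range(len(logits)))
    chosen = []
    grad = [0.0] * len(logits)
    for _ in range(k):
        top = max(logits[i] for i in remaining)
        weights = [math.exp(logits[i] - top) for i in remaining]
        total = sum(weights)
        probs = [w / total for w in weights]
        pick = rng.choices(remaining, weights=probs, k=1)[0]
        # d log softmax / d logit_j = 1[j == pick] - p_j over the remaining set
        for i, p in zip(remaining, probs):
            grad[i] -= p
        grad[pick] += 1.0
        chosen.append(pick)
        remaining.remove(pick)
    return chosen, grad


def train(sentences, reference, k=2, steps=200, lr=0.5, seed=0):
    """Policy-gradient loop: sample a summary (the action), score it (the
    reward), and move the logits along advantage * grad log-probability."""
    rng = random.Random(seed)
    logits = [0.0] * len(sentences)
    baseline = 0.0  # moving-average baseline (assumption) to reduce variance
    for _ in range(steps):
        chosen, grad = sample_summary(logits, k, rng)
        summary = [tok for i in chosen for tok in sentences[i]]
        reward = unigram_f1(summary, reference)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        logits = [l + lr * advantage * g for l, g in zip(logits, grad)]
    return logits
```

Calling `train(sentences, reference)` with `sentences` as a list of token lists and `reference` as the token list of a gold summary returns one logit per sentence. Because each action is a whole summary scored once, there is no state transition between decisions, which is what separates the bandit view from a sequential reinforcement learning formulation.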
