Abstractive and Extractive Text Summarization using Document Context Vector and Recurrent Neural Networks

Abstract

Sequence to sequence (Seq2Seq) learning has recently been used for abstractive and extractive summarization. In the current study, Seq2Seq models are used for eBay product description summarization. We propose novel document-context based Seq2Seq models using RNNs for abstractive and extractive summarization. Intuitively, this is similar to humans reading the title, abstract, or any other contextual information before reading the document, which gives them a high-level idea of what the document is about. We use this idea and propose that Seq2Seq models should be started with contextual information at the first time-step of the input to obtain better summaries. In this manner, the output summaries are more document-centric rather than generic, overcoming one of the major hurdles of using generative models. We generate the document context from user behavior and seller-provided information. We train and evaluate our models on human-extracted golden summaries. The document-contextual Seq2Seq models outperform standard Seq2Seq models. Moreover, because generating human-extracted summaries is prohibitively expensive to scale, we propose a semi-supervised technique for extracting approximate summaries and using them to train Seq2Seq models at scale. Semi-supervised models are evaluated against human-extracted summaries and are found to be of similar efficacy. We provide a side-by-side comparison of abstractive and extractive summarizers (contextual and non-contextual) on the same evaluation dataset. Overall, we provide methodologies to use and evaluate the proposed techniques for large document summarization. Furthermore, we found these techniques to be highly effective on this task, which is not the case with existing techniques.
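The idea of starting the encoder with contextual information at the first time-step can be illustrated with a minimal sketch. It is not the models evaluated in this work: the plain tanh RNN, the random weights, the function names `rnn_encode` and `with_document_context`, and the assumption that the context vector has the same dimension as the token embeddings are all hypothetical choices made for illustration.

```python
import math
import random


def rnn_encode(inputs, w_in, w_rec):
    """Run a plain tanh RNN over a list of input vectors and return the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in inputs:
        # The comprehension reads the previous hidden state in full before rebinding it.
        hidden = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * hidden[k] for k in range(len(hidden)))
            )
            for i in range(len(hidden))
        ]
    return hidden


def with_document_context(context_vector, token_embeddings):
    """Prepend the context vector so the encoder reads it at time-step 0.

    Assumption: the context vector shares the embedding dimension.
    """
    if any(len(e) != len(context_vector) for e in token_embeddings):
        raise ValueError("context vector and embeddings must share a dimension")
    return [list(context_vector)] + [list(e) for e in token_embeddings]


# Toy sizes and random values, chosen only to make the sketch runnable.
random.seed(0)
DIM, HIDDEN = 4, 3
w_in = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(HIDDEN)]
w_rec = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
tokens = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
context = [random.uniform(-1, 1) for _ in range(DIM)]

# Non-contextual encoding versus encoding that starts from the document context.
plain_state = rnn_encode(tokens, w_in, w_rec)
contextual_state = rnn_encode(with_document_context(context, tokens), w_in, w_rec)
```

In this sketch the only difference between the two encodings is the extra first input, so any change in the final encoder state comes from the document context. How the context vector is actually built from user behavior and seller-provided information is not specified here and is left out of the example.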