Automatic Title Generation for Text with Pre-trained Transformer Language Model

Abstract

In this paper, we propose a novel approach to automatic title generation for a given text using the pre-trained Transformer language model GPT-2. The approach generates a pool of candidate titles, selects an appropriate title from the pool, and then refines (de-noises) it to obtain the final title. It is organized as a pipeline of three modules, namely Generation, Selection and Refinement, followed by a Scoring function. The Generation and Refinement modules are based on GPT-2, while the Selection module is heuristic-based. The model generates accurate titles despite a relatively small corpus of relevant training data, because its natural language generation capabilities come from pre-training and fine-tuning only has to teach it task- and corpus-specific nuances. Additionally, the Selection and Refinement modules ensure that the titles are representative of the given text and are semantically and syntactically accurate. We train our model on research paper abstracts from arXiv and evaluate it on three different test sets. Our pipeline shows promising results on these test sets when evaluated with the ROUGE and BLEU metrics. We also perform a human evaluation to validate the titles generated by our approach.
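
The control flow of the three-module pipeline can be illustrated with a minimal Python sketch. The abstract does not specify the Selection heuristic or the Scoring function, so the unigram-overlap F1 used below (a ROUGE-1-style measure) is an assumption made purely for illustration, not the method of the paper. The `generate` and `refine` arguments are hypothetical callables standing in for the GPT-2-based Generation and Refinement modules, and all function names are invented for this sketch.

```python
import re
from collections import Counter
from typing import Callable, List


def tokens(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two strings (ROUGE-1-style; assumed metric)."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    common = sum((cand & ref).values())  # clipped count of shared unigrams
    if common == 0:
        return 0.0
    precision = common / sum(cand.values())
    recall = common / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def select(candidates: List[str], text: str) -> str:
    """Assumed heuristic: keep the candidate that overlaps most with the text."""
    return max(candidates, key=lambda title: overlap_f1(title, text))


def generate_title(
    text: str,
    generate: Callable[[str, int], List[str]],  # hypothetical Generation module
    refine: Callable[[str], str],               # hypothetical Refinement module
    n_candidates: int = 5,
) -> str:
    """Run Generation, then Selection, then Refinement, and return the title."""
    pool = generate(text, n_candidates)
    best = select(pool, text)
    return refine(best)
```

In this sketch `overlap_f1` serves two roles: as the selection heuristic against the input text, and as a stand-in score when a generated title is compared with a reference title. A real system would replace both roles with the paper's own heuristic and with standard ROUGE and BLEU implementations.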
