Automatic Poetry Generation with Mutual Reinforcement Learning

TLDR

Poetry is a celebrated art form, and automatic poetry generation has attracted researchers for decades. Existing neural models, however, are trained by maximum likelihood estimation, which does not align with the criteria human experts use to evaluate poems. This work models those criteria as explicit rewards and uses reinforcement learning to steer generation toward higher scores on them. It also introduces a mutual reinforcement learning framework that trains two generators simultaneously, each learning from a rewarder and from the other generator. Evaluated on Chinese poetry and built on a strong base model, the method outperforms the current state-of-the-art method.

Abstract

Poetry is one of the most beautiful forms of human language art. As a crucial step towards computer creativity, automatic poetry generation has drawn researchers' attention for decades. In recent years, some neural models have made remarkable progress on this task. However, they are all based on maximum likelihood estimation, which learns only the common patterns of the corpus and leads to a mismatch between the training loss and the evaluation criteria. Human experts evaluate poetry against specific criteria rather than word-level likelihood. To handle this problem, we directly model the criteria and use them as explicit rewards to guide gradient updates by reinforcement learning, so as to motivate the model to pursue higher scores. In addition, inspired by writing theories, we propose a novel mutual reinforcement learning schema: we simultaneously train two learners (generators), which learn not only from the teacher (rewarder) but also from each other, to further improve performance. We experiment on Chinese poetry. Based on a strong base model, our method achieves better results and outperforms the current state-of-the-art method.
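
The idea of using a criterion as an explicit reward, with two learners that also learn from each other, can be illustrated with a small toy sketch. This is not the authors' implementation: the vocabulary, the diversity-based reward, the context-free softmax "generator", and the rule that the lower-scoring learner imitates the higher-scoring sample are all assumptions chosen for illustration. The update itself is a standard REINFORCE-style policy gradient step.

```python
import math
import random

VOCAB = ["moon", "river", "wind", "snow"]  # hypothetical toy vocabulary
LENGTH = 4                                  # tokens per toy "line"


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, rng):
    """Sample LENGTH token ids independently from the softmax policy."""
    probs = softmax(logits)
    return [rng.choices(range(len(VOCAB)), weights=probs)[0] for _ in range(LENGTH)]


def reward(tokens):
    """Hypothetical rewarder: fraction of distinct tokens (stand-in for a real criterion)."""
    return len(set(tokens)) / len(tokens)


def reinforce_step(logits, tokens, advantage, lr=0.1):
    """In-place REINFORCE update: logits += lr * advantage * grad(log p(tokens))."""
    probs = softmax(logits)  # evaluated once, at the pre-update parameters
    for t in tokens:
        for i in range(len(logits)):
            grad = (1.0 if i == t else 0.0) - probs[i]
            logits[i] += lr * advantage * grad


def train(steps=200, seed=0):
    """Train two toy learners from the rewarder and from each other."""
    rng = random.Random(seed)
    learners = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]]
    baseline = 0.0  # moving-average reward, used to reduce variance
    for _ in range(steps):
        samples = [sample(l, rng) for l in learners]
        rewards = [reward(s) for s in samples]
        # Learn from the teacher (rewarder).
        for l, s, r in zip(learners, samples, rewards):
            reinforce_step(l, s, r - baseline)
        # Learn from each other (assumed rule): the lower-scoring learner
        # is also pushed toward the higher-scoring learner's sample.
        if rewards[0] != rewards[1]:
            better = 0 if rewards[0] > rewards[1] else 1
            gap = rewards[better] - rewards[1 - better]
            reinforce_step(learners[1 - better], samples[better], gap)
        baseline = 0.9 * baseline + 0.1 * (sum(rewards) / 2)
    return learners
```

Calling train() returns the two learners' final logits. In a real system the rewarder would score criteria that human experts care about, and the generators would be sequence models rather than a single softmax over four words.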
