Publication | Open Access
Challenges in Data-to-Document Generation
Citations: 53
References: 29
Year: 2017
Structured Prediction, Engineering, Machine Learning, Semantic Web, Recent Neural Models, Corpus Linguistics, Text Mining, Natural Language Processing, Data Generation, Information Retrieval, Data Science, Computational Linguistics, Data-to-document Generation, Data Integration, Short Descriptive Texts, Language Studies, Machine Translation, Sequence Modelling, Computer Science, Neural Machine Translation, Retrieval Augmented Generation, Descriptive Documents, Structured Document, Linguistics, Language Generation
Recent neural models have made significant progress in generating short descriptive texts conditioned on a small number of database records. This work proposes a slightly more difficult data‑to‑text generation task and investigates how effective current approaches are on it. The authors introduce a new large‑scale corpus of data records paired with descriptive documents, propose extractive evaluation methods, and establish baseline results using current neural generation methods. Experiments show that while the models produce fluent text, they fail to convincingly approximate human documents, and templated baselines outperform them on some metrics, though copy‑ and reconstruction‑based extensions yield noticeable improvements.
Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.