Publication | Closed Access

Joint Word- and Character-level Embedding CNN-RNN Models for Punctuation Restoration

Year: 2018

Citations: 16

References: 22

Abstract

The sequence-to-sequence modelling paradigm has been used successfully for automatic punctuation of text produced by Automatic Speech Recognition (ASR) systems, using bidirectional Recurrent Neural Networks (RNNs) that map the word and/or acoustic event sequence to the punctuation sequence. This paper proposes to enhance the word sequence-based system with a character-level model built on a Convolutional Neural Network (CNN). CNNs are known to be useful as data-driven front-ends in machine perception, hence we believe that the proposed approach is relevant for cognitive infocommunications. We also evaluate a hybrid solution in which the punctuation marks are determined jointly from character-level and word-level features, and demonstrate a significant improvement in punctuation performance. Performance is evaluated on a Hungarian Broadcast Dataset and on the IWSLT English dataset.
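
To make the mapping from word sequence to punctuation sequence concrete, the following is a minimal sketch of how punctuated text could be turned into per-word training labels, together with the character view of each word that a character-level model would consume. It is not the authors' pipeline: the label set `PUNCT_TO_LABEL`, the `"O"` tag for "no punctuation", and the helper names `to_training_pairs` and `char_view` are all assumptions made for illustration, since the abstract does not specify them.

```python
# Hypothetical label inventory (an assumption; the abstract does not list the
# punctuation marks the models actually predict).
PUNCT_TO_LABEL = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
NO_PUNCT = "O"  # assumed tag meaning "no punctuation follows this word"


def to_training_pairs(text):
    """Turn punctuated text into (word, label) pairs.

    Each label names the punctuation mark that directly follows the word,
    or NO_PUNCT when nothing follows it.
    """
    pairs = []
    for token in text.split():
        label = NO_PUNCT
        # Peel off a single trailing punctuation mark, if there is one.
        if len(token) > 1 and token[-1] in PUNCT_TO_LABEL:
            label = PUNCT_TO_LABEL[token[-1]]
            token = token[:-1]
        # Lowercase the word to mimic unpunctuated, uncased ASR output.
        pairs.append((token.lower(), label))
    return pairs


def char_view(pairs):
    """Return each word as a list of characters (input for a char-level model)."""
    return [list(word) for word, _ in pairs]


pairs = to_training_pairs("Hello, how are you? Fine.")
words = [word for word, _ in pairs]     # word-level input sequence
labels = [label for _, label in pairs]  # target punctuation sequence
chars = char_view(pairs)                # character-level input per word
```

In this framing, a word-level model sees `words`, a character-level model sees `chars`, and both are trained to predict `labels`; a joint model would combine the two feature streams for each word before predicting its label.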
