Publication | Closed Access
Joint Word- and Character-level Embedding CNN-RNN Models for Punctuation Restoration
Citations: 16
References: 22
Year: 2018
Unknown Venue
Engineering, Spoken Language Processing, Large Language Model, Recurrent Neural Network, Corpus Linguistics, Word Embeddings, Natural Language Processing, Speech Recognition, Computational Linguistics, Joint Word-Language Studies, Real-time Language, Automatic Punctuation, Machine Translation, Sequence Modelling, Punctuation Sequence, Deep Learning, Neural Machine Translation, Punctuation Marks, Speech Processing, Speech Input, Linguistics
The sequence-to-sequence modelling paradigm has been used successfully for automatic punctuation of text generated by Automatic Speech Recognition (ASR) systems, using bidirectional Recurrent Neural Networks (RNNs) that map the word and/or acoustic event sequence to the punctuation sequence. This paper proposes to enhance the word sequence-based system with a character-level model built on a Convolutional Neural Network (CNN). CNNs are known to be useful as data-driven front-ends in machine perception, hence we believe that the proposed approach is relevant for cognitive infocommunications. We also evaluate a hybrid solution in which the punctuation marks are determined jointly from character-level and word-level features, and demonstrate a significant improvement in punctuation performance. Performance is evaluated on a Hungarian Broadcast Dataset and on the IWSLT English dataset.
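
The mapping from a word sequence to a punctuation sequence can be illustrated with a minimal sketch of the data framing only, not of the paper's CNN-RNN models. It assumes a three-mark label set (comma, period, question mark) plus a "no punctuation" label, lowercased input as in ASR output, and that each label marks the punctuation following its word; the names `PUNCT_LABELS`, `to_labelled_sequence` and `restore` are hypothetical and do not come from the paper.

```python
# Assumed label set: one label per word, naming the mark that follows it.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
NO_PUNCT = "O"


def to_labelled_sequence(text):
    """Split punctuated text into parallel word and label sequences."""
    words, labels = [], []
    for token in text.split():
        # The last character decides the label if it is a known mark.
        mark = token[-1] if token[-1] in PUNCT_LABELS else None
        # Strip trailing marks and lowercase to mimic unpunctuated ASR output.
        word = token.rstrip("".join(PUNCT_LABELS)).lower()
        if not word:
            continue  # token consisted only of punctuation
        words.append(word)
        labels.append(PUNCT_LABELS[mark] if mark else NO_PUNCT)
    return words, labels


def restore(words, labels):
    """Re-insert punctuation from a predicted label sequence."""
    inverse = {label: mark for mark, label in PUNCT_LABELS.items()}
    return " ".join(w + inverse.get(l, "") for w, l in zip(words, labels))


words, labels = to_labelled_sequence("Well, is it raining? I think so.")
restored = restore(words, labels)
```

Under this framing, a model (word-level, character-level, or a hybrid of the two) receives `words` as input and is trained to predict `labels`; `restore` then turns its predictions back into punctuated text.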