Publication | Closed Access
Deep Temporal Models using Identity Skip-Connections for Speech Emotion Recognition
Citations: 37
References: 33
Year: 2017
Unknown Venue
Convolutional Neural Network, Engineering, Machine Learning, Spoken Language Processing, Multimodal Sentiment Analysis, Identity Skip-connections, Speech Recognition, Natural Language Processing, Data Science, Sparse Neural Network, Affective Computing, Deep Architectures, Health Sciences, Feature Learning, Deep Temporal Models, Computer Science, Deep Learning, Speech Analysis, Speech Communication, Deep Temporal Architectures, Multi-speaker Speech Recognition, Speech Processing, Speech Perception, Emotion Recognition
Deep architectures using identity skip-connections have demonstrated groundbreaking performance in the field of image classification. Recent empirical studies suggest that identity skip-connections enable ensemble-like behaviour of shallow networks, and that depth is not the sole ingredient of their success. We therefore examine the potential of identity skip-connections for the task of Speech Emotion Recognition (SER), where moderately deep temporal architectures are often employed. To this end, we propose a novel architecture that regulates unimpeded feature flows and captures long-term dependencies via gate-based skip-connections and a memory mechanism. The proposed architecture is compared with other state-of-the-art SER methods and is evaluated on large aggregated corpora recorded in different contexts. It outperforms the state-of-the-art methods by 9 to 15% and achieves an Unweighted Accuracy of 80.5% under an imbalanced class distribution. In addition, we examine a variant that adopts the simplified skip-connections of Residual Networks (ResNet) and show that gate-based skip-connections are more effective than simplified skip-connections.
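The abstract contrasts gate-based skip-connections with the simplified skip-connections of ResNet and reports Unweighted Accuracy, so a minimal sketch of those three ideas may help. It is an illustration under stated assumptions, not the authors' implementation: the gate is assumed to be highway-style (a sigmoid transform gate), which the abstract does not specify, and the dense `tanh` layer stands in for whatever transformation the real model uses. All function and parameter names (`residual_skip`, `gated_skip`, `unweighted_accuracy`, `w_h`, `w_t`, and so on) are hypothetical. Unweighted Accuracy is taken here in its usual sense, the mean of per-class recalls.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used for the (assumed) transform gate."""
    return 1.0 / (1.0 + np.exp(-z))


def residual_skip(x, w_h, b_h):
    """ResNet-style simplified skip: y = H(x) + x."""
    h = np.tanh(x @ w_h + b_h)  # H(x): stand-in transformation
    return h + x


def gated_skip(x, w_h, b_h, w_t, b_t):
    """Highway-style gated skip (assumed form): y = T(x) * H(x) + (1 - T(x)) * x."""
    h = np.tanh(x @ w_h + b_h)   # H(x): stand-in transformation
    t = sigmoid(x @ w_t + b_t)   # T(x): how much of H(x) to let through
    return t * h + (1.0 - t) * x


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

In this sketch, a strongly negative gate bias `b_t` closes the gate and the layer passes its input through almost unchanged, whereas the residual form always adds the input to the transformed features. On an imbalanced label set, `unweighted_accuracy` can differ noticeably from plain accuracy because a rarely seen class weighs as much as a frequent one.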