Publication | Closed Access
Towards Temporal Modelling of Categorical Speech Emotion Recognition
Citations: 53
References: 31
Year: 2018
Unknown Venue
Engineering, Categorical Label, Towards Temporal Modelling, Spoken Language Processing, Recurrent Neural Network, Speech Recognition, Natural Language Processing, Data Science, Computational Linguistics, Affective Computing, Language Studies, IEMOCAP Corpus, Real-time Language, Deep Learning, Speech Analysis, Speech Communication, Speech Processing, Speech Perception, Emotion, Linguistics, Emotion Recognition
To model the categorical speech emotion recognition task in a temporal manner, the first challenge is how to convert the categorical label of each utterance into a label sequence. To address this, we hypothesize that an utterance consists of emotional and non-emotional segments, where the non-emotional segments correspond to silent regions, short pauses, transitions between phonemes, unvoiced phonemes, etc. Under this hypothesis, we propose to treat an utterance's label sequence as a chain of two states: the emotional state, denoting an emotional frame, and Null, denoting a non-emotional frame. We then use a recurrent neural network based connectionist temporal classification (CTC) model to automatically label and align an utterance's emotional segments with emotional labels and its non-emotional segments with Nulls. Experimental results on the IEMOCAP corpus validate our hypothesis and demonstrate the effectiveness of the proposed method compared with state-of-the-art algorithms.
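The two-state label sequence described above can be illustrated with a minimal sketch. It assumes that Null plays the role of the CTC blank symbol and applies the standard CTC collapse rule (merge consecutive repeats, then drop blanks) to a frame-level path. The function names, the emotion label "angry", and the six-frame path are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from itertools import groupby

NULL = "Null"  # assumed symbol for a non-emotional frame (treated here as the CTC blank)


def collapse_path(frame_labels, null=NULL):
    """Standard CTC collapse: merge consecutive repeats, then drop Null frames."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != null]


def emotional_segments(frame_labels, null=NULL):
    """Return (label, start, end) for each run of emotional frames; end is exclusive."""
    segments = []
    index = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label != null:
            segments.append((label, index, index + length))
        index += length
    return segments


# Hypothetical frame-level path for an utterance whose categorical label is "angry".
path = [NULL, "angry", "angry", NULL, "angry", NULL]

print(collapse_path(path))       # label sequence after collapsing the path
print(emotional_segments(path))  # frame spans aligned with the emotional label
```

In this sketch, many different frame-level paths collapse to the same label sequence, which is why a CTC model can be trained from the utterance-level label without frame-level annotations.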