Publication | Closed Access
Using regional saliency for speech emotion recognition
Citations: 124
References: 27
Year: 2017
Venue: Unknown
Keywords: Convolutional Neural Network, Engineering, Machine Learning, Affective Neuroscience, Multimodal Sentiment Analysis, Social Sciences, Speech Recognition, Natural Language Processing, Data Science, Affective Computing, Voice Recognition, Deep Learning, Emotion Recognition, Speech Communication, Speech Analysis, Acoustic Features, Convolutional Neural Networks, Speech Processing, Speech Perception, Emotion, Linguistics, Regional Saliency
In this paper, we show that convolutional neural networks can be directly applied to temporal low-level acoustic features to identify emotionally salient regions without the need for defining or applying utterance-level statistics. We show how a convolutional neural network can be applied to minimally hand-engineered features to obtain competitive results on the IEMOCAP and MSP-IMPROV datasets. In addition, we demonstrate that, despite their common use across most categories of acoustic features, utterance-level statistics may obfuscate emotional information. Our results suggest that convolutional neural networks with Mel Filterbanks (MFBs) can be used as a replacement for classifiers that rely on features obtained from applying utterance-level statistics.
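
The idea in the abstract, convolving over frame-level features and letting the network pick out salient regions instead of averaging over the whole utterance, can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract does not give layer sizes, so the 40 filterbank bands, the kernel width, the four emotion classes, the use of global max pooling over time, and all function names below are assumptions, and the weights are random rather than trained.

```python
import math
import random

def conv1d_over_time(frames, kernel, bias=0.0):
    """Slide one kernel (width x n_bands) along the time axis, with ReLU."""
    width = len(kernel)
    out = []
    for t in range(len(frames) - width + 1):
        s = bias
        for k in range(width):
            for b, w in enumerate(kernel[k]):
                s += w * frames[t + k][b]
        out.append(max(0.0, s))
    return out

def global_max_pool(activations):
    """Return the strongest activation and the frame index where it starts."""
    best = max(range(len(activations)), key=activations.__getitem__)
    return activations[best], best

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def classify(frames, kernels, weights):
    """Pool each kernel's response over time, then apply a linear layer."""
    pooled, regions = [], []
    for kernel in kernels:
        value, start = global_max_pool(conv1d_over_time(frames, kernel))
        pooled.append(value)
        # The window that produced the maximum is this kernel's salient region.
        regions.append((start, start + len(kernel)))
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    return softmax(logits), regions

# Hypothetical sizes: 50 frames of 40 filterbank energies, 4 kernels, 4 classes.
random.seed(0)
n_frames, n_bands, n_kernels, width, n_classes = 50, 40, 4, 8, 4
frames = [[random.gauss(0, 1) for _ in range(n_bands)] for _ in range(n_frames)]
kernels = [[[random.gauss(0, 0.1) for _ in range(n_bands)]
            for _ in range(width)] for _ in range(n_kernels)]
weights = [[random.gauss(0, 0.1) for _ in range(n_kernels)]
           for _ in range(n_classes)]

probs, regions = classify(frames, kernels, weights)
```

Because the pooled value for each kernel comes from a single window of frames, the returned `regions` show which stretch of the utterance drove each feature, whereas an utterance-level statistic such as a mean would blend that window with every other frame.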