Publication | Open Access
An Image-based Deep Spectrum Feature Representation for the Recognition of Emotional Speech
2017 | Unknown Venue | 149 Citations | 27 References
Engineering, Machine Learning, Biometrics, Speech Recognition, Image Analysis, Pattern Recognition, Affective Computing, Robust Speech Recognition, Voice Recognition, Health Sciences, Deep Spectrum Features, BoAW Paradigm, Deep Learning, Higher Layers, Distant Speech Recognition, Speech Communication, Speech Analysis, Emotional Speech, Multi-speaker Speech Recognition, Speech Processing, Speech Perception, Emotion Recognition, Speaker Recognition
The outputs of the higher layers of deep pre-trained convolutional neural networks (CNNs) have consistently been shown to provide a rich representation of an image for use in recognition tasks. This study explores the suitability of such an approach for speech-based emotion recognition tasks. First, we detail a new acoustic feature representation, denoted deep spectrum features, which is derived by feeding spectrograms through a very deep image classification CNN and forming a feature vector from the activations of its last fully connected layer. We then compare the performance of our novel features with standardised brute-force and bag-of-audio-words (BoAW) acoustic feature representations for 2- and 5-class speech-based emotion recognition in clean, noisy and denoised conditions. The presented results show that image-based approaches are a promising avenue of research for speech-based recognition tasks. Key results indicate that deep spectrum features perform comparably to the other tested acoustic feature representations when the training and test conditions are matched for noise type; however, the BoAW paradigm is better suited to conditions in which the noise type differs between training and test.
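The deep spectrum pipeline described in the abstract (audio, then spectrogram, then image CNN, then activations of the last fully connected layer used as the feature vector) can be sketched in dependency-free Python as follows. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: the pre-trained image CNN is replaced by a hypothetical `ToyStandInNetwork` with a single randomly initialised fully connected layer and ReLU, so the sketch runs without deep learning libraries. The frame length, hop size, min-max grey-scale normalisation and 16x16 "image" size are likewise illustrative choices, and all function and class names are hypothetical.

```python
import cmath
import math
import random


def log_spectrogram(signal, frame_len=64, hop=32):
    """Log-magnitude spectrogram: one row of frequency bins per frame."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT over the non-negative frequency bins (clear, but slow).
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(math.log(abs(acc) + 1e-10))
        frames.append(row)
    return frames


def to_image(spec, height=16, width=16):
    """Min-max normalise to [0, 1] and nearest-neighbour resize to a fixed grid.

    Rows index frequency and columns index time, like a spectrogram plot.
    """
    lo = min(min(r) for r in spec)
    hi = max(max(r) for r in spec)
    scale = (hi - lo) or 1.0
    n_frames, n_bins = len(spec), len(spec[0])
    image = []
    for y in range(height):
        b = y * n_bins // height
        image.append([(spec[x * n_frames // width][b] - lo) / scale
                      for x in range(width)])
    return image


class ToyStandInNetwork:
    """HYPOTHETICAL stand-in for a pre-trained image CNN.

    A single fully connected layer with fixed random weights and ReLU; a real
    pipeline would use the last fully connected layer of a pre-trained CNN.
    """

    def __init__(self, n_inputs, n_units=32, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_inputs))
                         for _ in range(n_inputs)] for _ in range(n_units)]
        self.bias = [0.0] * n_units

    def last_fc_activations(self, image):
        x = [p for row in image for p in row]  # flatten the image
        return [max(0.0, sum(w * v for w, v in zip(ws, x)) + b)
                for ws, b in zip(self.weights, self.bias)]


def deep_spectrum_features(signal, network, height=16, width=16):
    """Spectrogram -> image -> activations of the network's last FC layer."""
    image = to_image(log_spectrogram(signal), height, width)
    return network.last_fc_activations(image)


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(2048)]
    net = ToyStandInNetwork(n_inputs=16 * 16, n_units=32)
    features = deep_spectrum_features(tone, net)
    print(len(features))  # 32
```

The key design point the sketch mirrors is that the classifier never sees the raw audio or hand-crafted descriptors: the utterance is rendered as a fixed-size image, and whatever the network's last fully connected layer outputs becomes a fixed-length feature vector for a downstream classifier.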