Publication | Closed Access
Learning emotion-based acoustic features with deep belief networks
Citations: 87
References: 11
Year: 2011
Venue: Unknown
Music, Engineering, Machine Learning, Human Response Labels, Deep Belief Networks, Multimodal Sentiment Analysis, Social Sciences, Speech Recognition, Dominant Feature Representation, Data Science, Data Mining, Pattern Recognition, Affective Computing, Voice Recognition, Music Emotion Recognition, Feature Learning, Audio Retrieval, Computer Science, Deep Learning, Speech Communication, Speech Analysis, Music Classification, Speech Processing, Emotion Recognition
The medium of music has evolved specifically for the expression of emotions, and it is natural for us to organize music in terms of its emotional associations. But while such organization is a natural process for humans, quantifying it empirically proves to be a very difficult task, and as such no dominant feature representation for music emotion recognition has yet emerged. Much of the difficulty in developing emotion-based features stems from the ambiguity of the ground truth. Even over the smallest time window, listeners' opinions of the emotion are bound to vary. In previous work, we have modeled human response labels to music in the arousal-valence (A-V) representation of affect as a time-varying, stochastic distribution. Current methods for automatic detection of emotion in music seek performance increases by combining several feature domains (e.g., loudness, timbre, harmony, rhythm). Such work has focused largely on dimensionality reduction for minor gains in classification performance, but has provided little insight into the relationship between audio and emotional associations. In this new work we seek to employ regression-based deep belief networks to learn features directly from magnitude spectra. While the system is applied to the specific problem of music emotion recognition, it could be easily applied to any regression-based audio feature learning problem.