Publication | Closed Access
DepAudioNet
Citations: 263
References: 32
Year: 2016
Unknown Venue
Effective Audio, Sample Imbalance, Machine Learning, Engineering, Multi-speaker Speech Recognition, Affective Computing, Robust Speech Recognition, Speech Processing, Depression Classification, Social Sciences, Voice Recognition, Deep Learning, Emotion Recognition, Speech Analysis, Speech Recognition
This paper presents a novel and effective audio-based method for depression classification. It focuses on two important issues, i.e., data representation and sample imbalance, which are not well addressed in the literature. For the former, in contrast to traditional shallow hand-crafted features, we propose a deep model, namely DepAudioNet, to encode the depression-related characteristics in the vocal channel, combining a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to deliver a more comprehensive audio representation. For the latter, we introduce a random sampling strategy in the model training phase to balance the positive and negative samples, which largely alleviates the bias caused by uneven sample distribution. Evaluations are carried out on the DAIC-WOZ dataset for the Depression Classification Sub-challenge (DCC) at the 2016 Audio-Visual Emotion Challenge (AVEC), and the experimental results clearly demonstrate the effectiveness of the proposed approach.
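The abstract does not spell out the exact sampling procedure, so the following is only a minimal sketch of one common reading of "random sampling to balance positive and negative samples": random undersampling of the larger class, redrawn with a new seed for each training epoch. The function name `balanced_resample`, the `(segment_id, label)` tuple layout, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sample layout: (segment_id, label), where label 1 = depressed
# and label 0 = non-depressed. This layout is assumed, not taken from the paper.
Sample = Tuple[str, int]


def balanced_resample(samples: Sequence[Sample], seed: int = 0) -> List[Sample]:
    """Randomly undersample the larger class so both classes have equal size."""
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    n = min(len(positives), len(negatives))
    # Draw n samples without replacement from each class, then shuffle
    # so the two classes are interleaved in the training order.
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    return balanced


# Toy data (illustrative only): 3 positive and 9 negative segments.
toy = [(f"seg_{i}", 1 if i < 3 else 0) for i in range(12)]

# One balanced draw; in training, a different seed per epoch would expose
# the model to different members of the majority class over time.
epoch_batch = balanced_resample(toy, seed=42)
```

Redrawing the subset each epoch, rather than fixing it once, is a design choice of this sketch: it keeps every epoch balanced while still making use of most majority-class data across the full run.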