Multi task sequence learning for depression scale prediction from video

Abstract

Depression is a typical mood disorder, which affects people in mental and even physical problems. People who suffer depression always behave abnormal in visual behavior and the voice. In this paper, an audio visual based multimodal depression scale prediction system is proposed. Firstly, features are extracted from video and audio are fused in feature level to represent the audio visual behavior. Secondly, long short memory recurrent neural network (LSTM-RNN) is utilized to encode the dynamic temporal information of the abnormal audio visual behavior. Thirdly, emotion information is utilized by multi-task learning to boost the performance further. The proposed approach is evaluated on the Audio-Visual Emotion Challenge (AVEC2014) dataset. Experiments results show the dimensional emotion recognition helps to depression scale prediction.

References

Page 1

	Year	Citations

Page 1