Long-short term memory for emotional recognition with variable length speech

Abstract

Although many kinds of features have been used for speech emotion recognition, they are severely restricted because features extracted from utterances of different lengths are forced to have the same dimension. Therefore, frame-level features that preserve the temporal information in the speech waveform are extracted instead; their dimension changes dynamically with the length of the original speech. From the perspective of information theory, frame-level features lose less information than fixed-length features, and they are better suited as input to deep learning models, which can learn representations on their own. A bidirectional long short-term memory (BiLSTM) network is applied as the classifier to process these variable-length features. Experimental results demonstrate that the proposed method significantly outperforms the INTERSPEECH 2010 feature set on the CASIA database.
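To make the variable-length pipeline concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The following are all illustrative assumptions:

- the two per-frame features (log energy and zero-crossing rate)
- the 25 ms / 10 ms framing at an assumed 16 kHz sample rate
- the hidden size and the six output classes
- classifying from the concatenated final forward and backward hidden states

The weights are random rather than trained. The sketch shows that utterances of different lengths yield feature matrices with different numbers of rows, and that a BiLSTM can consume either one and still produce a fixed-size class distribution.

```python
import numpy as np


def frame_features(signal, frame_len=400, hop=160):
    """Hypothetical frame-level features: log energy and zero-crossing rate.

    Returns an array of shape (num_frames, 2); num_frames grows with len(signal).
    """
    num_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    feats = np.empty((num_frames, 2))
    for t in range(num_frames):
        frame = signal[t * hop : t * hop + frame_len]
        feats[t, 0] = np.log(np.sum(frame ** 2) + 1e-10)            # log energy
        feats[t, 1] = np.mean(np.abs(np.diff(np.sign(frame)))) / 2  # zero-crossing rate
    return feats


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTM:
    """Single-direction LSTM with random (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.h = hidden_dim
        # One stacked matrix for the input, forget, candidate and output gates.
        self.W = rng.normal(0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def run(self, x):
        """x: (T, input_dim) for any T. Returns the final hidden state."""
        h = np.zeros(self.h)
        c = np.zeros(self.h)
        for x_t in x:  # one step per frame, so T may differ per utterance
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i = sigmoid(z[: self.h])
            f = sigmoid(z[self.h : 2 * self.h])
            g = np.tanh(z[2 * self.h : 3 * self.h])
            o = sigmoid(z[3 * self.h :])
            c = f * c + i * g
            h = o * np.tanh(c)
        return h


class BiLSTMClassifier:
    """Hypothetical BiLSTM classifier over variable-length frame sequences."""

    def __init__(self, input_dim, hidden_dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.fwd = LSTM(input_dim, hidden_dim, rng)
        self.bwd = LSTM(input_dim, hidden_dim, rng)
        self.V = rng.normal(0, 0.1, (num_classes, 2 * hidden_dim))
        self.c = np.zeros(num_classes)

    def predict_proba(self, x):
        # Forward LSTM reads frames 1..T; backward LSTM reads frames T..1.
        h = np.concatenate([self.fwd.run(x), self.bwd.run(x[::-1])])
        logits = self.V @ h + self.c
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    model = BiLSTMClassifier(input_dim=2, hidden_dim=8, num_classes=6)
    for seconds in (1.0, 2.5):
        wave = rng.normal(size=int(16000 * seconds))  # stand-in for a 16 kHz utterance
        feats = frame_features(wave)
        probs = model.predict_proba(feats)
        print(feats.shape, probs.shape)
```

The two utterances produce feature matrices with different row counts, yet the classifier output always has one entry per class. That property lets a recurrent model avoid the fixed-dimension restriction the abstract describes.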
