Concepedia

Publication | Closed Access

Facial Action Unit-based Deep Learning Framework for Spotting Macro- and Micro-expressions in Long Video Sequences

Citations: 38
References: 15
Year: 2021

Abstract

In this paper, we utilize facial action unit (AU) detection to construct an end-to-end deep learning framework for the macro- and micro-expression spotting task in long video sequences. The proposed framework focuses on individual components of facial muscle movement rather than processing the whole image, which eliminates the influence of image changes caused by noise such as body or head movement. Unlike existing deep learning approaches built on classical Convolutional Neural Network (CNN) models, the proposed framework uses a Gated Recurrent Unit (GRU), a Long Short-Term Memory (LSTM) network, or our proposed Concat-CNN model to learn the correlation between the AUs of distinct frames. The Concat-CNN applies three convolutional kernels of different sizes to capture features of different durations, and emphasizes both local and global mutation features by varying the dimensionality (max-pooling size) of the output space. Our proposal achieves state-of-the-art overall F1-scores: 0.2019 on CAS(ME)2-cropped, 0.2736 on SAMM Long Video, and 0.2118 on CAS(ME)2. It not only outperforms the baseline but also ranked 3rd in the FME 2021 challenge on the combined CAS(ME)2-cropped and SAMM-LV datasets.
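The multi-scale idea behind the Concat-CNN can be illustrated with a minimal sketch: several 1D convolutions with different kernel sizes are run in parallel over a per-frame AU intensity series, each branch is max-pooled, and the results are concatenated. The kernel sizes, pooling size, and the averaging kernels below are illustrative assumptions, not the paper's learned parameters.

```python
import numpy as np

def conv1d_valid(x, k):
    """Valid-mode 1D cross-correlation of a signal x with kernel k."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def concat_cnn_features(au_series, kernel_sizes=(3, 5, 7), pool_size=2):
    """Sketch of a Concat-CNN-style feature extractor over one AU channel.

    Each branch uses a different kernel size (short vs. long temporal
    context); non-overlapping max-pooling keeps local peaks such as
    candidate expression onsets. Branch outputs are concatenated.
    """
    branches = []
    for ks in kernel_sizes:
        k = np.ones(ks) / ks  # placeholder averaging kernel; learned in practice
        feat = conv1d_valid(au_series, k)
        # trim so the length divides evenly, then pool within windows
        trimmed = feat[: len(feat) // pool_size * pool_size]
        pooled = trimmed.reshape(-1, pool_size).max(axis=1)
        branches.append(pooled)
    return np.concatenate(branches)

# a 20-frame AU series with a single brief activation (a "mutation")
series = np.zeros(20)
series[10] = 1.0
features = concat_cnn_features(series)
```

With kernel sizes (3, 5, 7) and pool size 2, the branches yield 9, 8, and 7 pooled values, giving a 24-dimensional concatenated feature vector in which the brief activation survives pooling in every branch.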
