Cross-Modality Compensation Convolutional Neural Networks for RGB-D Action Recognition

Abstract

RGB-D-based human action recognition has attracted much attention recently because the two modalities together provide more complementary information than either one alone. However, it is difficult for the two modalities to learn spatio-temporal information from each other effectively. To facilitate information interaction between the modalities, we propose a cross-modality compensation convolutional neural network (ConvNet) for human action recognition, which enhances discriminative ability by jointly learning compensation features from the RGB and depth modalities. These compensation features are extracted by a cross-modality compensation block (CMCB) that we design for this purpose. To verify that the block improves performance, CMCB is incorporated into two typical network architectures, ResNet and VGG. The proposed architecture is evaluated on three challenging datasets: NTU RGB+D 120, THU-READ and PKU-MMD. Experiments show that the model with CMCB is effective for different input types, such as pairs of raw images and dynamic images constructed from the entire RGB-D sequence, and that the proposed framework achieves state-of-the-art performance on all three datasets.
