A Slow-I-Fast-P Architecture for Compressed Video Action Recognition

Abstract

Compressed video action recognition has drawn growing attention because compressed videos are cheaper to store and process than raw videos. Although the past few years have seen remarkable progress on this problem, most existing approaches rely on RGB frames from raw videos and require multi-step training. In this paper, we propose a novel Slow-I-Fast-P (SIFP) neural network model for compressed video action recognition. It consists of a slow I pathway, which receives a sparsely sampled I-frame clip, and a fast P pathway, which receives a densely sampled pseudo optical flow clip. An unsupervised estimation method and a new loss function are designed to generate pseudo optical flows from compressed videos, which eliminates the dependence on traditional optical flows calculated from raw videos. The model is trained end to end. The proposed method is evaluated on the challenging HMDB51 and UCF101 datasets. Extensive comparisons and ablation studies demonstrate the effectiveness of the proposed method.
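
The difference between the sparse input of the slow I pathway and the dense input of the fast P pathway can be illustrated with a minimal sketch. It is not the sampling scheme of the paper: the fixed group-of-pictures size `gop_size`, the clip lengths `slow_len` and `fast_len`, the uniform sampling rule, and all function names are assumptions made only for illustration.

```python
def uniform_pick(items, clip_len):
    """Pick clip_len roughly evenly spaced elements from items.

    If clip_len exceeds len(items), some elements are repeated.
    """
    if not items or clip_len <= 0:
        raise ValueError("items must be non-empty and clip_len positive")
    step = len(items) / clip_len
    # Take the element nearest the centre of each of the clip_len segments.
    return [items[min(len(items) - 1, int(step * i + step / 2))]
            for i in range(clip_len)]


def two_pathway_indices(num_frames, gop_size=12, slow_len=4, fast_len=32):
    """Return (slow, fast) frame indices for a hypothetical compressed video.

    Assumption: every gop_size-th frame is an I-frame and all other frames
    are P-frames. Real encoders may place I-frames irregularly.
    """
    i_frames = [t for t in range(num_frames) if t % gop_size == 0]
    p_frames = [t for t in range(num_frames) if t % gop_size != 0]
    slow = uniform_pick(i_frames, slow_len)  # sparse I-frame clip
    fast = uniform_pick(p_frames, fast_len)  # dense clip of P-frame positions
    return slow, fast


if __name__ == "__main__":
    slow, fast = two_pathway_indices(num_frames=120)
    print("slow pathway (I-frames):", slow)
    print("fast pathway (P-frames):", fast)
```

In this sketch the fast pathway only selects P-frame positions. Turning the data at those positions into pseudo optical flow is the job of the unsupervised estimation method, which is not reproduced here.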
