A Highly Compressed Accelerator With Temporal Optical Flow Feature Fusion and Tensorized LSTM for Video Action Recognition on Terminal Device

Abstract

Deep learning-based action recognition has become ubiquitous in video analysis; however, large neural networks require enormous computation to achieve high performance, which hinders their deployment in mobile applications that are tightly constrained by hardware resources. In this work, we introduce ARA, a highly compact and fast neural network-based action recognition accelerator for terminal devices. We build an LSTM-based spatio-temporal action recognition model that fuses time-series features extracted from RGB frames with flow features extracted from optical flow fields. The LSTM-based spatio-temporal model is then deeply compressed with tensor decomposition to further reduce redundant parameters and lessen computation overhead. On the UCF-11, UCF-101, and HMDB51 datasets, our proposed method achieves 95.87%, 94.08%, and 75.71% classification accuracy, respectively, which is comparable to other state-of-the-art methods. In particular, our proposed method compresses the parameters of the LSTM model by 215× on the UCF-101 dataset. The proposed system also achieves a fast running speed of 157.7 FPS on a GPU. Furthermore, we validate the performance of the proposed system on an ARM-based terminal device; the results show a latency of only 0.017 s and a power consumption of 4.73 W.
