Skeleton-Based Action Recognition With Shift Graph Convolutional Network

TLDR

Graph convolutional networks (GCNs) for skeleton-based action recognition suffer from high computational cost and inflexible receptive fields. Shift-GCN addresses both problems by replacing regular graph convolutions with shift graph operations and lightweight point-wise convolutions, which provide flexible spatial and temporal receptive fields. On three benchmark datasets, Shift-GCN surpasses state-of-the-art methods while requiring more than 10 times less computation.

Abstract

Action recognition with skeleton data is attracting increasing attention in computer vision. Recently, graph convolutional networks (GCNs), which model human body skeletons as spatiotemporal graphs, have achieved remarkable performance. However, the computational complexity of GCN-based methods is high, typically over 15 GFLOPs for one action sample, and recent works reach about 100 GFLOPs. Another shortcoming is that the receptive fields of both the spatial graph and the temporal graph are inflexible. Although some works enhance the expressiveness of the spatial graph by introducing incremental adaptive modules, their performance is still limited by regular GCN structures. In this paper, we propose a novel shift graph convolutional network (Shift-GCN) to overcome both shortcomings. Instead of using heavy regular graph convolutions, Shift-GCN is composed of novel shift graph operations and lightweight point-wise convolutions, where the shift graph operations provide flexible receptive fields for both the spatial graph and the temporal graph. On three datasets for skeleton-based action recognition, the proposed Shift-GCN notably exceeds the state-of-the-art methods while requiring more than 10 times less computation.
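
The pairing of a shift operation with a point-wise convolution can be illustrated with a minimal sketch. The abstract does not specify the shift pattern, so the rule used here (channel c of node v reads from node (v + c) mod N) is an assumption chosen for illustration, and the function names `shift_graph`, `pointwise_conv`, and `shift_block` are hypothetical. The sketch is in plain Python and is not the authors' implementation.

```python
# Illustrative sketch only. The shift rule (channel c of node v reads from
# node (v + c) mod N) is an assumption; the abstract does not define it.

def shift_graph(features):
    """Move each channel across nodes by a channel-dependent offset.

    features: list of N node vectors, each a list of C floats.
    Returns an N x C list where out[v][c] = features[(v + c) % N][c].
    The operation has no parameters and performs no multiplications.
    """
    num_nodes = len(features)
    num_channels = len(features[0])
    return [
        [features[(v + c) % num_nodes][c] for c in range(num_channels)]
        for v in range(num_nodes)
    ]


def pointwise_conv(features, weight):
    """Mix channels independently at every node (a 1x1 convolution).

    weight: C_in x C_out matrix given as a list of lists.
    """
    num_out = len(weight[0])
    return [
        [sum(x[c] * weight[c][o] for c in range(len(x))) for o in range(num_out)]
        for x in features
    ]


def shift_block(features, weight):
    """Shift across nodes first, then apply the point-wise convolution."""
    return pointwise_conv(shift_graph(features), weight)


# Three nodes with two channels each.
x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

shifted = shift_graph(x)          # channel 1 now holds a neighbour's value
mixed = shift_block(x, identity)  # identity weights leave the shift visible
```

After the shift, each node's feature vector contains values gathered from other nodes, so the point-wise convolution that follows mixes information across the graph even though it only operates per node. All learnable parameters and multiplications sit in the point-wise step, which is where the reduction in computation would come from under this sketch.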
