Publication | Open Access
ACGNet: Action Complement Graph Network for Weakly-Supervised Temporal Action Localization
Citations: 65
References: 41
Year: 2022
Engineering, Machine Learning, Action Recognition (Movement Science), Action Recognition (Computer Vision), Video Retrieval, Video Interpretation, Image Analysis, Pattern Recognition, Video Transformer, Health Sciences, Machine Vision, Feature Learning, Action Pattern, Temporal Incoherence, Untrimmed Videos, Action Model Learning, Video Understanding, Deep Learning, Computer Vision, Different WTAL Frameworks, Activity Recognition
Weakly-supervised temporal action localization (WTAL) in untrimmed videos has emerged as a practical but challenging task, since only video-level labels are available. Existing approaches typically rely on off-the-shelf segment-level features, which suffer from spatial incompleteness and temporal incoherence and therefore limit performance. In this paper, we tackle the problem from a new perspective by enhancing segment-level representations with a simple yet effective graph convolutional network, the action complement graph network (ACGNet). It enables each video segment to perceive spatial-temporal dependencies from other segments that potentially convey complementary clues, implicitly mitigating the negative effects of the two issues above. As a result, the segment-level features become more discriminative and more robust to spatial-temporal variations, which leads to higher localization accuracy. More importantly, the proposed ACGNet works as a universal module that can be flexibly plugged into different WTAL frameworks while preserving end-to-end training. Extensive experiments on the THUMOS'14 and ActivityNet1.2 benchmarks show state-of-the-art results, demonstrating the effectiveness of the proposed approach.
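The general idea of letting each segment borrow complementary cues from related segments through a graph convolution can be illustrated with a minimal NumPy sketch. This is not the ACGNet implementation: the abstract does not specify how the graph is built, so the cosine-similarity adjacency, the top-k sparsification, the single ReLU layer, the residual connection, and the function name `complement_segment_features` are all assumptions chosen for illustration.

```python
import numpy as np


def complement_segment_features(features, top_k=3, weight=None):
    """Enhance (T, D) segment features with one graph-convolution step.

    Hypothetical sketch: the adjacency rule and layer design are
    assumptions, not the construction used by ACGNet.
    """
    num_segments, dim = features.shape

    # Cosine similarity between every pair of segments.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-8)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a segment is not its own neighbour

    # Keep only the top-k most similar segments per row (assumed rule).
    k = min(top_k, num_segments - 1)
    adj = np.zeros((num_segments, num_segments))
    if k > 0:
        idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
        rows = np.arange(num_segments)[:, None]
        adj[rows, idx] = np.maximum(sim[rows, idx], 0.0)

    # Row-normalise so each segment averages over its neighbours.
    adj = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-8)

    # One graph-convolution layer: aggregate, project, apply ReLU.
    if weight is None:
        weight = np.eye(dim)  # placeholder for a learned projection
    aggregated = np.maximum(adj @ features @ weight, 0.0)

    # Residual connection keeps the original feature alongside the
    # complementary information gathered from neighbours.
    return features + aggregated
```

Because the output has the same shape as the input, a module of this kind could in principle sit between a feature extractor and any downstream WTAL head, which matches the plug-in usage the abstract describes. In a trained model, `weight` would be a learned parameter rather than the identity placeholder used here.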