Publication | Closed Access
GLNet: Global Local Network for Weakly Supervised Action Localization
Citations: 37 | References: 32 | Year: 2019
Topics: Engineering, Machine Learning, Video Summarization, Localization, Video Interpretation, Image Analysis, Data Science, Sparse Loss Function, Pattern Recognition, Robot Learning, Video Transformer, Action Localization, Machine Vision, Action Model Learning, Video Understanding, Deep Learning, Computer Vision, Spatio-temporal Action Localization, Video Hallucination, GLNet Model
In this paper, we address the challenging problem of weakly supervised spatio-temporal action localization, for which only video-level action labels are available during training. To solve this problem, we propose an end-to-end Global Local Network (GLNet) that predicts probability distributions over space and time simultaneously. The proposed GLNet model includes two key components: a local spatial module and a global temporal module. The local spatial module predicts the frame-level spatial distribution by encoding short-term temporal information. In particular, we propose a Region Actionness Network (RAN) to select the target region boxes from precomputed exhaustive proposals. The global temporal module predicts the temporal distribution by modelling long-term temporal structure. Specifically, we design a temporal fusion-and-excitation architecture on top of several clips and train it with a sparse loss function. As a result, the proposed GLNet model can perform spatio-temporal action localization in an end-to-end manner. We evaluate GLNet on the J-HMDB and UCF101-24 datasets. The experimental results demonstrate that GLNet outperforms other state-of-the-art weakly supervised methods, and even some fully supervised methods, by a significant margin in terms of frame-level mean Average Precision (frame-mAP) and video-level mean Average Precision (video-mAP).
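The abstract does not give the form of the sparse loss or of the temporal module, so the following is only an illustrative sketch of how video-level labels can supervise clip-level temporal weights. It assumes a sigmoid attention weight per clip, an attention-weighted average of clip logits, and an L1 penalty on the weights as the sparsity term. The function name `video_level_loss` and the parameter `sparsity_weight` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def video_level_loss(clip_logits, attention_logits, label, sparsity_weight=0.1):
    """Hypothetical weakly supervised loss (an assumption, not the paper's).

    clip_logits:      one list of class logits per clip
    attention_logits: one scalar per clip, turned into a temporal weight
    label:            index of the video-level action class
    """
    # Temporal weight in (0, 1) for each clip.
    attn = [sigmoid(a) for a in attention_logits]
    norm = sum(attn) + 1e-8
    num_classes = len(clip_logits[0])

    # Attention-weighted average of clip logits gives video-level logits.
    video_logits = [
        sum(w * clip[c] for w, clip in zip(attn, clip_logits)) / norm
        for c in range(num_classes)
    ]

    # Cross-entropy against the only supervision available: the video label.
    probs = softmax(video_logits)
    classification = -math.log(probs[label] + 1e-12)

    # Assumed sparsity term: mean L1 norm of the temporal weights, which
    # pushes the model to attend to few clips.
    sparsity = sum(abs(w) for w in attn) / len(attn)

    return classification + sparsity_weight * sparsity, attn


if __name__ == "__main__":
    logits = [[2.0, 0.0], [0.0, 0.0]]  # two clips, two classes
    loss, weights = video_level_loss(logits, [0.0, 0.0], label=0)
    print(loss, weights)
```

Under these assumptions, the learned per-clip weights act as a temporal distribution: clips with high weight are the ones the video-level classifier relies on, which is how a temporal localization can emerge without frame-level annotations.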