Publication | Closed Access
Learning to Refactor Action and Co-occurrence Features for Temporal Action Localization
Citations: 43 | References: 45 | Year: 2022
Artificial Intelligence, Engineering, Machine Learning, Video Summarization, Refactor Action, Video Retrieval, Video Interpretation, Image Analysis, Pattern Recognition, Robot Learning, Video Transformer, Cognitive Science, Machine Vision, Action Content, Action Pattern, Subtle Human Actions, Action Model Learning, Computer Science, Video Understanding, Temporal Action Localization, Deep Learning, Computer Vision, Co-occurrence Features, Eye Tracking, Activity Recognition
The main challenge of Temporal Action Localization is to retrieve subtle human actions from the various co-occurring ingredients, e.g., context and background, in an untrimmed video. While prior approaches have achieved substantial progress by devising advanced action detectors, they still suffer from these co-occurring ingredients, which often dominate the actual action content in videos. In this paper, we explore two orthogonal but complementary aspects of a video snippet, i.e., the action features and the co-occurrence features. Specifically, we develop a novel auxiliary task that decouples these two types of features within a video snippet and recombines them to generate a new feature representation with more salient action information for accurate action localization. We term our method RefactorNet, which first explicitly factorizes the action content and regularizes its co-occurrence features, and then synthesizes a new action-dominated video representation. Extensive experimental results and ablation studies on THUMOS14 and ActivityNet v1.3 demonstrate that our new representation, combined with a simple action detector, can significantly improve action localization performance.
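The decouple-then-recombine idea in the abstract can be illustrated with a toy sketch. This is not RefactorNet: the abstract does not specify the architecture, so the linear projections (`w_action`, `w_cooc`), the helper names `matvec`, `factorize` and `recombine`, and the scalar weight `alpha` used as a stand-in for "regularizing" the co-occurrence part are all hypothetical assumptions chosen only to show the data flow on a single snippet feature.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector x (length cols)."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def factorize(x: Vector, w_action: Matrix, w_cooc: Matrix) -> Tuple[Vector, Vector]:
    """Hypothetical decoupling: project one snippet feature into an
    'action' component and a 'co-occurrence' component via two
    assumed linear maps (the paper's actual factorization is not given)."""
    return matvec(w_action, x), matvec(w_cooc, x)


def recombine(action: Vector, cooc: Vector, alpha: float = 0.5) -> Vector:
    """Synthesize an action-dominated feature: action + alpha * cooc.

    alpha in [0, 1] down-weights the co-occurrence component; this
    scalar weighting is an illustrative assumption, not the paper's
    regularizer.
    """
    if len(action) != len(cooc):
        raise ValueError("action and co-occurrence parts must have equal length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [a + alpha * c for a, c in zip(action, cooc)]


if __name__ == "__main__":
    snippet = [1.0, 2.0, 3.0]                       # toy snippet feature
    w_action = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # assumed action projection
    w_cooc = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]     # assumed co-occurrence projection
    action, cooc = factorize(snippet, w_action, w_cooc)
    print(recombine(action, cooc, alpha=0.5))
```

In this toy setting, `alpha = 0` keeps only the action component and `alpha = 1` restores the full sum, so the weight controls how strongly context and background can dominate the synthesized feature.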