Publication | Closed Access
From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
Citations: 370 | References: 26 | Year: 2013
Unknown Venue
Artificial Intelligence, Engineering, Machine Learning, Video Processing, Detailed Action Understanding, Action Language, Social Sciences, Video Interpretation, Natural Language Processing, New Human Action, Image Analysis, Data Science, Pattern Recognition, Computational Linguistics, Strongly-supervised Representation, Affective Computing, Video Content Analysis, Robot Learning, Human Actions, Cognitive Science, Machine Vision, Action Pattern, Action Model Learning, Computer Science, Video Understanding, Deep Learning, Patch Classifiers, Computer Vision, Eye Tracking, Activity Recognition, Linguistics
This paper presents a novel approach for analyzing human actions in non-scripted, unconstrained video settings based on volumetric, x-y-t, patch classifiers, termed actemes. Unlike previous action-related work, the discovery of patch classifiers is posed as a strongly-supervised process. Specifically, key point labels (e.g., position) across space-time are used in a data-driven training process to discover patches that are highly clustered in the space-time key point configuration space. To support this process, a new human action dataset consisting of challenging consumer videos is introduced, where notably the action label, the 2D position of a set of key points and their visibilities are provided for each video frame. On a novel input video, each acteme is used in a sliding volume scheme to yield a set of sparse, non-overlapping detections. These detections provide the intermediate substrate for segmenting out the action. For action classification, the proposed representation shows significant improvement over state-of-the-art low-level features, while providing spatiotemporal localization as additional output, which sheds further light on detailed action understanding.
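The sliding volume step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how volumes are scored or how overlapping detections are suppressed, so everything here is an assumption made for illustration: `score_fn` is a hypothetical stand-in for a trained acteme classifier, the greedy highest-score-first suppression is one common way to obtain non-overlapping detections, and all function names are invented.

```python
from itertools import product


def sliding_volumes(video_shape, size, stride):
    """Return the (x, y, t) origins of every volume of `size` that fits in the video."""
    ranges = [range(0, dim - s + 1, stride) for dim, s in zip(video_shape, size)]
    return product(*ranges)


def overlaps(a, b, size):
    """True if two equally sized, axis-aligned x-y-t volumes share any voxel."""
    return all(abs(pa - pb) < s for pa, pb, s in zip(a, b, size))


def detect(video_shape, size, stride, score_fn, threshold):
    """Score each volume, then greedily keep the best non-overlapping ones.

    Assumption: greedy suppression stands in for whatever selection
    scheme the paper actually uses.
    """
    scored = [(score_fn(o), o) for o in sliding_volumes(video_shape, size, stride)]
    candidates = sorted((c for c in scored if c[0] >= threshold), reverse=True)
    kept = []
    for score, origin in candidates:
        if not any(overlaps(origin, k, size) for _, k in kept):
            kept.append((score, origin))
    return kept


# Toy usage: a hypothetical classifier that responds at three volume origins.
def toy_score(origin):
    return {(0, 0, 0): 1.0, (4, 4, 4): 0.9, (2, 2, 2): 0.8}.get(origin, 0.0)


dets = detect(video_shape=(8, 8, 8), size=(4, 4, 4), stride=2,
              score_fn=toy_score, threshold=0.5)
```

In the toy usage, the candidate at `(2, 2, 2)` overlaps the stronger detection at `(0, 0, 0)` and is discarded, while the volume at `(4, 4, 4)` only touches it at a boundary and is kept. The result is sparse and non-overlapping in the sense the abstract describes.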