Publication | Closed Access
VLAD3: Encoding Dynamics of Deep Features for Action Recognition
Citations: 84
References: 32
Year: 2016
Unknown Venue
Encoding Dynamics, Engineering, Machine Learning, Deep Dynamics, Action Recognition (Movement Science), Action Recognition (Computer Vision), Video Interpretation, Deep Features, Image Analysis, Data Science, Pattern Recognition, Video Transformer, Health Sciences, Machine Vision, Spatiotemporal Diagnostics, Computer Science, Video Understanding, Deep Learning, Computer Vision, Video Analysis, Video Dynamics, Activity Recognition
Previous approaches to action recognition with deep features tend to process video frames only within a small temporal region, and do not model long-range dynamic information explicitly. However, such information is important for the accurate recognition of actions, especially for the discrimination of complex activities that share sub-actions, and when dealing with untrimmed videos. Here, we propose a representation, VLAD for Deep Dynamics (VLAD³), that accounts for different levels of video dynamics. It captures short-term dynamics with deep convolutional neural network features, relying on linear dynamic systems (LDS) to model medium-range dynamics. To account for long-range inhomogeneous dynamics, a VLAD descriptor is derived for the LDS and pooled over the whole video, to arrive at the final VLAD³ representation. An extensive evaluation was performed on Olympic Sports, UCF101 and THUMOS15, where the use of the VLAD³ representation leads to state-of-the-art results.
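The three levels described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify how the LDS is fitted or how a VLAD descriptor is derived from it, so the code below assumes an SVD-based LDS fit per clip, uses the flattened transition matrix as the per-clip descriptor, and applies the common power and L2 normalization to the VLAD vector. The function names (`fit_lds`, `vlad_encode`, `encode_video`) and the parameters `clip_len`, `n_states`, and `centers` are hypothetical. The per-frame features are assumed to come from some CNN, and the codebook `centers` is assumed to have been learned beforehand, for example with k-means.

```python
import numpy as np


def fit_lds(features, n_states):
    """Fit a linear dynamic system to one clip (assumed SVD-based estimate).

    features: (T, D) array of per-frame features.
    Returns A (n_states, n_states) and C (D, n_states).
    """
    Y = features.T                                  # (D, T), one column per frame
    Y = Y - Y.mean(axis=1, keepdims=True)           # remove the temporal mean
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                             # observation matrix
    X = s[:n_states, None] * Vt[:n_states]          # hidden states, (n_states, T)
    # Least-squares transition matrix mapping X[:, t] to X[:, t + 1].
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C


def vlad_encode(descriptors, centers):
    """Standard VLAD: sum residuals to the nearest center, then normalize.

    descriptors: (N, d), centers: (K, d). Returns a (K * d,) vector.
    """
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    vlad = np.zeros_like(centers, dtype=float)
    for k in range(centers.shape[0]):
        members = descriptors[nearest == k]
        if len(members):
            vlad[k] = (members - centers[k]).sum(axis=0)
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))    # power normalization (assumed)
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


def encode_video(frame_features, clip_len, n_states, centers):
    """Split a video into non-overlapping clips, fit an LDS per clip,
    and pool the per-clip descriptors over the whole video with VLAD."""
    starts = range(0, len(frame_features) - clip_len + 1, clip_len)
    clips = [frame_features[i:i + clip_len] for i in starts]
    # Assumption: the flattened transition matrix stands in for the LDS descriptor.
    descs = np.stack([fit_lds(c, n_states)[0].ravel() for c in clips])
    return vlad_encode(descs, centers)
```

With `n_states = 3`, each clip descriptor has 9 entries, so a codebook of 4 centers gives a 36-dimensional video vector. Note that the entries of the transition matrix depend on the basis and sign conventions of the SVD, so a practical system would need a more careful descriptor than this flattening.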