Concepedia

TLDR

The paper proposes a method to capture video‑wide temporal information for action recognition. The method learns per‑video ranking functions via a ranking machine, uses their parameters as a video representation, and is evaluated on Hollywood2, HMDB51, MPII‑Cooking, and Chalearn datasets. The approach effectively captures temporal evolution, is interpretable, fast, and yields a 7–10% absolute improvement in action recognition, complementing existing appearance and motion methods.

Abstract

In this paper we present a method to capture video-wide temporal information for action recognition. We postulate that a function capable of ordering the frames of a video temporally (based on the appearance) captures well the evolution of the appearance within the video. We learn such ranking functions per video via a ranking machine and use the parameters of these as a new video representation. The proposed method is easy to interpret and implement, fast to compute and effective in recognizing a wide variety of actions. We perform a large number of evaluations on datasets for generic action recognition (Hollywood2 and HMDB51), fine-grained actions (MPII-Cooking activities) and gestures (Chalearn). Results show that the proposed method brings an absolute improvement of 7–10%, while being compatible with and complementary to further improvements in appearance and local motion based methods.
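A minimal sketch of the core idea follows: fit, for a single video, a linear function that orders its frames by time, and keep that function's weight vector as the video descriptor. The `rank_pool` helper, the cumulative-mean smoothing step, and the use of scikit-learn's `LinearSVR` as the ranking machine are illustrative assumptions here, not necessarily the authors' exact pipeline.

```python
# Hedged sketch: learn a linear function that orders one video's frames by
# time, and use its parameters as a video-wide representation.
import numpy as np
from sklearn.svm import LinearSVR
from sklearn.preprocessing import normalize

def rank_pool(frame_features, C=1.0):
    """Return a video-wide descriptor from per-frame appearance features.

    frame_features: array of shape (T, D), one feature vector per frame.
    The descriptor is the weight vector of a linear model trained to
    regress the temporal order of the frames (hypothetical helper).
    """
    T = frame_features.shape[0]
    # Cumulative mean: row t averages frames 1..t, smoothing frame-level
    # noise while preserving how the appearance evolves over time
    # (an assumption about the preprocessing, used here for illustration).
    cumulative = np.cumsum(frame_features, axis=0)
    counts = np.arange(1, T + 1)[:, None]
    smoothed = normalize(cumulative / counts)
    # Targets are the frame indices; a linear SVR stands in for the ranking
    # machine, since ordering frames by a linear score amounts to regressing
    # a monotone function of time.
    targets = np.arange(1, T + 1, dtype=float)
    model = LinearSVR(C=C, fit_intercept=False, max_iter=10000)
    model.fit(smoothed, targets)
    return model.coef_  # shape (D,): the new video representation

# Usage: pool a toy video of 50 frames with 128-dimensional features.
video = np.random.rand(50, 128)
descriptor = rank_pool(video)
print(descriptor.shape)  # (128,)
```

The resulting fixed-length descriptor can then be fed to any standard classifier (e.g., a linear SVM) alongside appearance and local motion features, which is how the complementarity reported in the abstract would be exploited.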
