Representation and retrieval of video scene by using object actions and their spatio-temporal relationships

Abstract

In this paper we present a method for representing and retrieving video sequences based on the domain-specific behavior of the objects present in the scenes. The representation has three parts: (a) an Action description, which represents the action performed by a single object; (b) an Interaction description, which describes interactions between multiple objects and maps directly to the event semantics of the content domain; and (c) an Event Structure, which provides a set of spatial and temporal relationship functions, along with a syntax for defining the conditions that a particular interaction must meet. Retrieval proceeds by processing Event Structures, interpreting object relationships, and selecting the combinations of Action descriptions that match the conditions defined in the Event Structures. We describe an implementation of this system that retrieves scenes of soccer plays from several soccer video sequences.
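The retrieval idea can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Action`, `EventStructure`, `before`, `near`, and `retrieve`, the field layout, the 30-unit distance threshold, and the sample data are all assumptions made for illustration. The sketch treats an Event Structure as a sequence of required action kinds plus a list of spatial and temporal predicates, and retrieval as a search over combinations of Action descriptions that satisfy every predicate.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Action:
    """Hypothetical Action description: one action by one object."""
    obj: str      # object identifier, e.g. a player
    kind: str     # action label, e.g. "kick"
    start: float  # start time in seconds
    end: float    # end time in seconds
    x: float      # position on the field (arbitrary units)
    y: float


def before(a: Action, b: Action) -> bool:
    """Temporal relationship: a finishes no later than b starts."""
    return a.end <= b.start


def near(a: Action, b: Action, radius: float) -> bool:
    """Spatial relationship: the two actions occur within `radius`."""
    return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5 <= radius


@dataclass
class EventStructure:
    """Hypothetical Event Structure: required action kinds plus conditions."""
    name: str
    kinds: Tuple[str, ...]
    conditions: List[Callable[..., bool]]


def retrieve(event: EventStructure, actions: List[Action]) -> List[Tuple[Action, ...]]:
    """Return ordered combinations of actions that satisfy the event."""
    matches = []
    for combo in permutations(actions, len(event.kinds)):
        # The action kinds must appear in the order the event requires.
        if tuple(a.kind for a in combo) != event.kinds:
            continue
        # An interaction involves distinct objects.
        if len({a.obj for a in combo}) != len(combo):
            continue
        if all(cond(*combo) for cond in event.conditions):
            matches.append(combo)
    return matches


# Assumed definition of a "pass": a kick followed by a nearby receive.
PASS_EVENT = EventStructure(
    name="pass",
    kinds=("kick", "receive"),
    conditions=[before, lambda a, b: near(a, b, radius=30.0)],
)

# Made-up Action descriptions for a short clip.
actions = [
    Action("player7", "kick", 1.0, 1.5, 10.0, 10.0),
    Action("player9", "receive", 2.0, 2.4, 25.0, 18.0),
    Action("player4", "receive", 0.2, 0.6, 12.0, 11.0),  # happens before the kick
    Action("player3", "receive", 3.0, 3.3, 90.0, 40.0),  # too far from the kick
]

matches = retrieve(PASS_EVENT, actions)
```

With this sample data, only the kick by `player7` followed by the receive by `player9` meets both conditions. The exhaustive search over permutations is chosen for clarity; a practical system would presumably index actions by time and position before testing conditions.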
