Concepedia

Publication | Closed Access

Video Google: a text retrieval approach to object matching in videos

Citations: 6.4K

References: 12

Year: 2003

Authors: Sivic, Zisserman

Venue: Unknown

TLDR

The study proposes an object and scene retrieval system that locates all instances of a user-outlined object within a video. The system represents objects with viewpoint-invariant region descriptors, tracks the regions within a shot to reject unstable ones and reduce descriptor noise, and uses vector quantization with a text-retrieval-style inverted file and ranking to return a ranked list of key frames. Because descriptor matches are pre-computed, retrieval is immediate, in the manner of Google. The method is demonstrated on two full-length feature films.

Abstract

We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors. The analogy with text retrieval is in the implementation where matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. The result is that retrieval is immediate, returning a ranked list of key frames/shots in the manner of Google. The method is illustrated for matching in two full length feature films.
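The text-retrieval analogy in the abstract can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the vocabulary of cluster centres is assumed to be given, descriptors are toy 2-D points, tf-idf weighting with cosine similarity is used as one standard choice of document ranking, and all function names (`nearest_word`, `build_index`, `query`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def nearest_word(descriptor, vocabulary):
    """Vector quantization: index of the closest cluster centre (squared Euclidean)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[i])),
    )


def build_index(frames, vocabulary):
    """Pre-compute matches offline.

    frames maps a key-frame id to its list of region descriptors.
    Returns (inverted_file, idf, norms), where inverted_file maps a
    visual word to {frame_id: tf-idf weight}.
    """
    counts = {
        fid: Counter(nearest_word(d, vocabulary) for d in descs)
        for fid, descs in frames.items()
    }
    doc_freq = Counter(word for c in counts.values() for word in c)
    idf = {word: math.log(len(frames) / df) for word, df in doc_freq.items()}

    inverted_file = defaultdict(dict)
    norms = {}
    for fid, c in counts.items():
        total = sum(c.values())
        weights = {word: (n / total) * idf[word] for word, n in c.items()}
        # Guard against an all-zero vector (every word occurs in every frame).
        norms[fid] = math.sqrt(sum(v * v for v in weights.values())) or 1.0
        for word, v in weights.items():
            inverted_file[word][fid] = v
    return inverted_file, idf, norms


def query(descriptors, vocabulary, inverted_file, idf, norms):
    """Rank key frames by cosine similarity to the outlined query region."""
    c = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = sum(c.values())
    q = {word: (n / total) * idf.get(word, 0.0) for word, n in c.items()}
    q_norm = math.sqrt(sum(v * v for v in q.values())) or 1.0

    scores = defaultdict(float)
    for word, q_weight in q.items():
        # Only frames that contain this visual word are ever touched.
        for fid, f_weight in inverted_file.get(word, {}).items():
            scores[fid] += q_weight * f_weight
    return sorted(
        ((s / (q_norm * norms[fid]), fid) for fid, s in scores.items()),
        reverse=True,
    )


# Toy data: three visual words and three key frames (assumed values).
vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
frames = {
    "f1": [(0.1, 0.2), (9.8, 0.1)],
    "f2": [(0.2, 9.9), (0.1, 10.2)],
    "f3": [(10.1, 0.3), (9.9, -0.2), (0.3, 9.7)],
}
inverted_file, idf, norms = build_index(frames, vocabulary)
ranked = query([(9.9, 0.0)], vocabulary, inverted_file, idf, norms)
print([fid for _, fid in ranked])
```

Because quantization and index construction happen offline, a query only visits the inverted-file entries for the visual words it contains, which is why the results can be returned without scanning every frame.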

