Consumer video understanding

TLDR

Recognizing visual content in unconstrained videos is crucial, yet existing corpora lack scale and content diversity, which has limited progress. The authors describe and release CCV, a new database of 9,317 web videos spanning 20 semantic categories that cover events, scenes, and objects. The videos were selected with care to ensure relevance to consumer interest and original content without post-editing. Because such videos typically carry very little textual annotation, they stand to benefit from automatic content analysis.

Abstract

Recognizing visual content in unconstrained videos has become a very important problem for many applications. Existing corpora for video analysis lack scale and/or content diversity, and have thus limited the needed progress in this critical area. In this paper, we describe and release a new database called CCV, containing 9,317 web videos over 20 semantic categories, including events like "baseball" and "parade", scenes like "beach", and objects like "cat". The database was collected with extra care to ensure relevance to consumer interest and originality of video content without post-editing. Such videos typically have very little textual annotation and thus can benefit from the development of automatic content analysis techniques.
