Publication | Closed Access

Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books

Citations: 2K | References: 35 | Year: 2015

TLDR

Books provide fine‑grained visual details and high‑level semantic states that evolve through a narrative. This work seeks to align books with their movie releases to supply rich, semantically detailed visual explanations beyond existing captions. The authors train an unsupervised neural sentence embedding from a large book corpus, pair it with a video‑text neural embedding for clip‑sentence similarity, and fuse the signals with a context‑aware CNN. The resulting system attains strong quantitative alignment performance and yields diverse qualitative examples demonstrating its utility across multiple tasks.
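As a rough illustration of the first ingredient, the sketch below embeds a sentence by averaging per-word vectors. This is only a stand-in: the paper trains skip-thought sentence vectors on its book corpus in an unsupervised way, and the `vocab`, `word_vectors`, and `embed_sentence` names here are hypothetical.

```python
import numpy as np

# Illustrative stand-in only: the paper learns skip-thought sentence
# vectors from a book corpus; here we just average random word vectors.
rng = np.random.default_rng(0)
D = 300  # hypothetical embedding dimension
vocab = {"the": 0, "door": 1, "creaked": 2, "open": 3}
word_vectors = rng.normal(size=(len(vocab), D))  # untrained, random

def embed_sentence(sentence: str) -> np.ndarray:
    """Mean of word vectors; out-of-vocabulary words are skipped."""
    ids = [vocab[w] for w in sentence.lower().split() if w in vocab]
    return word_vectors[ids].mean(axis=0) if ids else np.zeros(D)

print(embed_sentence("The door creaked open").shape)  # (300,)
```

A learned encoder would replace the random table with vectors trained so that a sentence's embedding predicts its neighboring sentences in the book.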

Abstract

Books are a rich source of both fine-grained information, such as what a character, an object, or a scene looks like, and high-level semantics, such as what someone is thinking or feeling and how these states evolve through a story. This paper aims to align books to their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets. To align movies and books we propose a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine information from multiple sources. We demonstrate good quantitative performance for movie/book alignment and show several qualitative examples that showcase the diversity of tasks our model can be used for.
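To make the alignment step concrete, here is a minimal sketch, assuming clips and sentences have already been embedded into a shared space. Every name in it (`clip_emb`, `sent_emb`, the fixed smoothing kernel) is a hypothetical placeholder; in particular, the fixed kernel only gestures at the paper's context-aware CNN, which is learned rather than hand-set.

```python
import numpy as np

# Hypothetical inputs: M movie clips and N book sentences, each already
# embedded into a shared D-dimensional video-text space.
rng = np.random.default_rng(0)
M, N, D = 4, 6, 300
clip_emb = rng.normal(size=(M, D))  # stand-in for learned clip embeddings
sent_emb = rng.normal(size=(N, D))  # stand-in for learned sentence embeddings

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarities between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# sim[i, j] scores clip i against sentence j.
sim = cosine_similarity_matrix(clip_emb, sent_emb)

# Crude context step: average each score with its neighbors along the
# book axis, so an alignment is supported by surrounding sentences too.
kernel = np.array([0.25, 0.5, 0.25])
context_sim = np.apply_along_axis(
    lambda row: np.convolve(row, kernel, mode="same"), 1, sim
)

# Greedy decoding: each clip picks its best-scoring sentence.
print(context_sim.argmax(axis=1))
```

The actual model learns both embedding spaces, combines information from multiple sources, and replaces the fixed kernel with a trained CNN; the snippet only mirrors the overall data flow from similarities to an alignment.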
