Publication | Open Access
Cycle-SUM: Cycle-Consistent Adversarial LSTM Networks for Unsupervised Video Summarization
Citations: 116
References: 28
Year: 2019
Natural Language Processing · Engineering · Machine Learning · Summary Video · Generative Adversarial Network · Video Generation · Video Summarization · Generative Models · Unsupervised Video Summarization · Video Hallucination · Video Understanding · Manual Annotation · Deep Learning · Computer Vision · Multi-modal Summarization
The paper proposes an unsupervised video summarization model that does not rely on manual annotations. Cycle‑SUM employs a bi‑directional LSTM selector and a cycle‑consistent adversarial evaluator composed of forward and backward GANs to learn an information‑preserving metric that guides the selector to pick the most informative frames. Experiments on two benchmark datasets confirm that Cycle‑SUM achieves state‑of‑the‑art performance, outperforming prior baselines, and reveal a close link between mutual‑information maximization and the cycle‑learning approach.
In this paper, we present a novel unsupervised video summarization model that requires no manual annotation. The proposed model, termed Cycle-SUM, adopts a new cycle-consistent adversarial LSTM architecture that can effectively maximize the information preservation and compactness of the summary video. It consists of a frame selector and a cycle-consistent-learning-based evaluator. The selector is a bi-directional LSTM network that learns video representations embedding the long-range relationships among video frames. The evaluator defines a learnable information-preserving metric between the original video and the summary video and "supervises" the selector to identify the most informative frames to form the summary video. In particular, the evaluator is composed of two generative adversarial networks (GANs): the forward GAN learns to reconstruct the original video from the summary video, while the backward GAN learns to invert this process. The consistency between the outputs of this cycle learning is adopted as the information-preserving metric for video summarization. We demonstrate the close relation between mutual information maximization and this cycle-learning procedure. Experiments on two video summarization benchmark datasets validate the state-of-the-art performance and superiority of the Cycle-SUM model over previous baselines.
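The cycle-consistency idea in the abstract can be illustrated with a toy sketch. The code below is a minimal, hypothetical NumPy stand-in: random scores play the role of the bi-directional LSTM selector, and two linear maps stand in for the forward and backward GAN generators; the paper's actual model uses trained deep networks throughout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy video: T frames, each a d-dim feature vector (illustrative only;
# Cycle-SUM operates on deep frame features).
T, d = 8, 4
video = rng.standard_normal((T, d))

# "Selector": per-frame importance scores in [0, 1] (random here;
# in Cycle-SUM a bi-directional LSTM predicts them).
scores = rng.uniform(size=(T, 1))
summary = scores * video  # soft frame selection

# Stand-ins for the two generators (linear maps here, GANs in the paper).
# Forward: summary -> reconstructed original video; backward inverts it.
W_f = rng.standard_normal((d, d)) * 0.1
W_b = rng.standard_normal((d, d)) * 0.1
recon_video = summary @ W_f
recon_summary = recon_video @ W_b

# Cycle-consistency "information-preserving" metric: both round trips
# should incur low reconstruction error.
cycle_loss = (np.mean((video - recon_video) ** 2)
              + np.mean((summary - recon_summary) ** 2))
print(f"cycle-consistency loss: {cycle_loss:.4f}")
```

During training, the selector would be updated to pick frames that keep this loss low, i.e. frames from which the original video is most recoverable.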