Learning to Generate Time-Lapse Videos Using Multi-stage Dynamic Generative Adversarial Networks

TLDR

Predicting the immediate future from a single photo, such as how clouds will move, motivates realistic time-lapse video generation. The authors propose a two-stage generative adversarial network that generates high-resolution time-lapse videos from a single initial frame. The first stage produces realistic content for each frame. The second stage refines the sequence so that its motion dynamics are closer to those of real videos, using a Gram matrix to model motion more precisely. The model is trained and tested on a newly built large-scale time-lapse dataset. It generates 32-frame videos at up to 128×128 resolution and outperforms existing state-of-the-art models in quantitative and qualitative experiments.

Abstract

Given a photo taken outdoors, can we predict the immediate future, e.g., how the clouds will move across the sky? We address this problem by presenting a two-stage approach, based on generative adversarial networks (GANs), for generating realistic high-resolution time-lapse videos. Given the first frame, our model learns to generate long-term future frames. The first stage generates videos with realistic content in each frame. The second stage refines the video from the first stage by pushing it closer to real videos in terms of motion dynamics. To further encourage vivid motion in the final generated video, a Gram matrix is employed to model the motion more precisely. We build a large-scale time-lapse dataset and test our approach on it. Using our model, we are able to generate realistic videos of up to 128 × 128 resolution for 32 frames. Quantitative and qualitative experimental results demonstrate the superiority of our model over state-of-the-art models.
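The abstract says a Gram matrix is used to model motion, but does not spell out the computation. The sketch below shows what a Gram matrix of a feature map is and one way two videos could be compared through it. It assumes each row of the input is one feature channel flattened over time and space (for example, activations from a discriminator layer). The 1/N normalization and the mean absolute difference in the hypothetical helper `gram_distance` are illustrative choices, not details taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Gram matrix of a C x N feature map.

    Each row is one channel flattened over all N spatio-temporal positions
    (an assumption for this sketch). Entry (i, j) is the normalized inner
    product of channels i and j: G[i][j] = (1/N) * sum_k F[i][k] * F[j][k].
    It captures channel co-activation statistics, independent of position.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n for row_j in features]
        for row_i in features
    ]


def gram_distance(feat_a: Matrix, feat_b: Matrix) -> float:
    """Hypothetical helper: mean absolute difference between two Gram matrices.

    A small value means the two feature maps (e.g. from a real and a
    generated video) share similar second-order statistics.
    """
    ga, gb = gram_matrix(feat_a), gram_matrix(feat_b)
    c = len(ga)
    total = sum(abs(x - y) for ra, rb in zip(ga, gb) for x, y in zip(ra, rb))
    return total / (c * c)


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels, 2 positions
    print(gram_matrix(feats))  # [[2.5, 5.5], [5.5, 12.5]]
    print(gram_distance(feats, feats))  # 0.0
```

Because the Gram matrix discards where each activation occurs and keeps only how channels co-vary, comparing Gram matrices measures differences in overall motion statistics rather than pixel-level alignment.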
