Publication | Open Access
VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation
2019 · 34 Citations · 35 References
Keywords: Engineering, Machine Learning, Stochastic Video Generation, Video Interpretation, Video Adaptation, Image Analysis, Future Events, Data Science, Generative Model, Modeling and Simulation, Video Transformer, Video Synthesis, Video Synthesizer, Machine Vision, Predictive Analytics, Video Generation, Generative Models, Computer Science, Video Understanding, Video Prediction, Deep Learning, Computer Vision, Video Hallucination
Generative models that predict sequences can capture complex real-world phenomena, but video prediction faces high uncertainty and existing probabilistic models are either computationally expensive or do not directly optimize likelihood. This work introduces multi‑frame video prediction with normalizing flows, enabling direct likelihood optimization and high‑quality stochastic predictions. The method models latent‑space dynamics using flow‑based generative models, providing an efficient and competitive framework for video generation. Flow‑based generative models are shown to be a viable and competitive approach to generative modeling of video.
Generative models that can model and predict sequences of future events can, in principle, learn to capture complex real-world phenomena, such as physical interactions. However, a central challenge in video prediction is that the future is highly uncertain: a sequence of past observations of events can imply many possible futures. Although a number of recent works have studied probabilistic models that can represent uncertain futures, such models are either extremely expensive computationally as in the case of pixel-level autoregressive models, or do not directly optimize the likelihood of the data. To our knowledge, our work is the first to propose multi-frame video prediction with normalizing flows, which allows for direct optimization of the data likelihood, and produces high-quality stochastic predictions. We describe an approach for modeling the latent space dynamics, and demonstrate that flow-based generative models offer a viable and competitive approach to generative modeling of video.
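The abstract's central claim is that normalizing flows allow direct optimization of the data likelihood, via the change-of-variables formula. The toy sketch below illustrates that formula with a single elementwise affine transform; VideoFlow itself uses deep multi-scale flows conditioned on past frames, so the transform, variable names, and shapes here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def forward(x, scale, shift):
    """Invertible elementwise affine transform z = scale * x + shift.

    For an elementwise affine map, the log-determinant of the Jacobian
    is simply the sum of log|scale| over dimensions.
    """
    z = scale * x + shift
    log_det = np.sum(np.log(np.abs(scale)))
    return z, log_det

def log_likelihood(x, scale, shift):
    """Exact log p(x) by change of variables with a standard-normal base:
    log p(x) = log p_z(f(x)) + log|det df/dx|.
    """
    z, log_det = forward(x, scale, shift)
    log_pz = np.sum(-0.5 * (z ** 2 + np.log(2 * np.pi)))  # N(0, I) base density
    return log_pz + log_det

# Toy "data" standing in for a flattened frame (hypothetical values).
x = np.array([0.5, -1.0, 2.0])
scale = np.array([2.0, 0.5, 1.5])
shift = np.array([0.1, -0.2, 0.0])

ll = log_likelihood(x, scale, shift)
```

Because the transform is invertible and its Jacobian determinant is tractable, `ll` is an exact log-likelihood that can be maximized by gradient ascent; this is the property that distinguishes flow-based models from, e.g., GANs, which lack an explicit likelihood.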