A Survey on Future Frame Synthesis: Bridging Deterministic and Generative Approaches

Ruibo Ming, Zhewei Huang, Jingwei Wu, Zhuoxuan Ju, Daxin Jiang, Jianming Hu, Lihui Peng, Shuchang Zhou

TL;DR

This survey maps the evolution of future frame synthesis (FFS) from deterministic, pixel-level prediction to modern generative paradigms that emphasize semantic coherence and long-term plausibility. It introduces a taxonomy organized by modeling stochasticity and analyzes driving factors such as architectural advances, data scale, and compute, while outlining bifurcating research frontiers: real-time pragmatic synthesis and expansive generative world simulation. Key contributions include a synthesis of deterministic, stochastic, and generative methods across pixel and feature spaces, a critical review of evaluation metrics, and a discussion of datasets, challenges, and application domains. The work provides a practical roadmap for researchers aiming to build robust, scalable, and controllable video synthesis systems that bridge short-term prediction with long-horizon world modeling.

Abstract

Future Frame Synthesis (FFS), the task of generating subsequent video frames from context, represents a core challenge in machine intelligence and a cornerstone for developing predictive world models. This survey provides a comprehensive analysis of the FFS landscape, charting its critical evolution from deterministic algorithms focused on pixel-level accuracy to modern generative paradigms that prioritize semantic coherence and dynamic plausibility. We introduce a novel taxonomy organized by algorithmic stochasticity, which not only categorizes existing methods but also reveals the fundamental drivers (advances in architectures, datasets, and computational scale) behind this paradigm shift. Critically, our analysis identifies a bifurcation in the field's trajectory: one path toward efficient, real-time prediction, and another toward large-scale, generative world simulation. By pinpointing key challenges and proposing concrete research questions for both frontiers, this survey serves as an essential guide for researchers aiming to advance visual dynamic modeling.

Paper Structure

This paper contains 68 sections, 3 equations, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Structure of our taxonomy. We categorize future frame synthesis (FFS) approaches based on the degree of stochasticity in the modeling paradigm.
  • Figure 2: Dataset samples. We adjust the absolute resolution of the image for visualization while maintaining the relative size relationships.