From Virtual Games to Real-World Play
Wenqiang Sun, Fangyun Wei, Jinjing Zhao, Xi Chen, Zilong Chen, Hongyang Zhang, Jun Zhang, Yan Lu
TL;DR
RealPlay tackles interactive real-world video generation by reframing it as a chunk-wise diffusion problem conditioned on user control signals. It uses a two-stage approach: first, a pre-trained image-to-video generator is adapted to produce short chunks iteratively; second, the model is fine-tuned on a mixed dataset of labeled game data and unlabeled real-world videos, with adaptive modulation of the action signals. The method transfers control from game to real-world entities (vehicles, bicycles, pedestrians) and outperforms both single-shot and prior chunk-wise baselines, with robust temporal coherence and realism. This work is a step toward neural real-world game engines that learn realistic dynamics from data, reducing reliance on annotated real-world action data.
Abstract
We introduce RealPlay, a neural network-based real-world game engine that enables interactive video generation from user control signals. Unlike prior works focused on game-style visuals, RealPlay aims to produce photorealistic, temporally consistent video sequences that resemble real-world footage. It operates in an interactive loop: users observe a generated scene, issue a control command, and receive a short video chunk in response. To enable such realistic and responsive generation, we address key challenges including iterative chunk-wise prediction for low-latency feedback, temporal consistency across iterations, and accurate control response. RealPlay is trained on a combination of labeled game data and unlabeled real-world videos, without requiring real-world action annotations. Notably, we observe two forms of generalization: (1) control transfer: RealPlay effectively maps control signals from virtual to real-world scenarios; and (2) entity transfer: although training labels originate solely from a car racing game, RealPlay generalizes to control diverse real-world entities beyond vehicles, including bicycles and pedestrians. Project page: https://wenqsun.github.io/RealPlay/
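
The interactive loop described above can be sketched in a few lines of Python. This is only an illustration of the observe, command, receive-chunk cycle, not RealPlay's implementation: the names `play` and `stub_generate_chunk`, the action vocabulary, the chunk length, and the context length are all assumptions made for the example, and the stub stands in for the actual video diffusion model.

```python
from typing import Callable, List, Sequence

# Stand-in for an image; a real system would use image tensors or latents.
Frame = List[float]
ChunkFn = Callable[[Sequence[Frame], str, int], List[Frame]]


def stub_generate_chunk(context: Sequence[Frame], action: str, chunk_len: int) -> List[Frame]:
    """Hypothetical placeholder for the video model.

    It shifts the last context frame by a fixed per-action offset so the
    loop below can run without any learned model.
    """
    offsets = {"forward": 1.0, "left": -0.5, "right": 0.5}  # assumed action set
    last = context[-1]
    step = offsets[action]
    return [[v + step * (i + 1) for v in last] for i in range(chunk_len)]


def play(
    first_frame: Frame,
    actions: Sequence[str],
    generate_chunk: ChunkFn = stub_generate_chunk,
    chunk_len: int = 4,
    context_len: int = 2,
) -> List[Frame]:
    """Run the chunk-wise loop: one control command in, one short chunk out."""
    history: List[Frame] = [first_frame]
    for action in actions:
        # Condition on the most recent frames so consecutive chunks stay consistent.
        context = history[-context_len:]
        chunk = generate_chunk(context, action, chunk_len)
        history.extend(chunk)  # the user would observe this chunk before the next command
    return history


frames = play([0.0], ["forward", "left"], chunk_len=2)
print(len(frames))
```

Each iteration conditions only on a short window of recent frames plus the current command, which is what keeps per-step latency bounded in a chunk-wise design; swapping `stub_generate_chunk` for a real generator would leave the loop unchanged.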
