GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis
Boming Zhao, Yuan Li, Ziyu Sun, Lin Zeng, Yujun Shen, Rui Ma, Yinda Zhang, Hujun Bao, Zhaopeng Cui
TL;DR
GaussianPrediction introduces a dynamic 3D Gaussian framework that combines a canonical space, deformation modeling, and a per-Gaussian lifecycle property to forecast future scenes from monocular video and render them from novel views. It pairs a hyper-canonical space with a concentric motion distillation strategy and key-point-based deformations, then uses a Graph Convolutional Network to predict future key-point motions, enabling coherent short-term future synthesis. The approach yields state-of-the-art results on synthetic and real data in both dynamic scene rendering and future scenario synthesis, and stays efficient by distilling scene motion into hundreds of key points. The work targets photorealistic, view-consistent forecasting for dynamic environments, with potential impact on planning and navigation in robotics and AR/VR, and identifies the integration of motion priors for longer-horizon prediction as future work.
Abstract
Forecasting future scenarios in dynamic environments is essential for intelligent decision-making and navigation, yet it remains an open challenge in computer vision and robotics. Traditional approaches fall short on one side or the other: video prediction cannot forecast from arbitrary viewpoints, and novel-view synthesis cannot predict temporal dynamics. In this paper, we introduce GaussianPrediction, a novel framework that equips 3D Gaussian representations with dynamic scene modeling and future scenario synthesis. GaussianPrediction can forecast future states from any viewpoint, using video observations of dynamic scenes. To this end, we first propose a 3D Gaussian canonical space with deformation modeling to capture the appearance and geometry of dynamic scenes, and integrate a lifecycle property into the Gaussians to handle irreversible deformations. To make prediction feasible and efficient, we develop a concentric motion distillation approach that distills the scene motion into key points. Finally, a Graph Convolutional Network is employed to predict the motions of the key points, enabling the rendering of photorealistic images of future scenarios. Our framework shows outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.
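To make the last step more concrete, the following is a minimal sketch of how a graph convolution over key points could map observed motion to a predicted next position. It is not the authors' architecture: the k-nearest-neighbour graph, the two-layer network, the velocity features, and every name here (`knn_adjacency`, `normalize`, `gcn_layer`, `predict_next_positions`, `w1`, `w2`) are assumptions made for illustration, and the weights are untrained.

```python
import numpy as np

def knn_adjacency(points, k):
    """Symmetric k-nearest-neighbour adjacency with self-loops.
    Assumption: the summary does not say how the key-point graph is built."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # never pick a point as its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest key points
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)            # symmetrise
    return adj + np.eye(n)                  # add self-loops

def normalize(adj):
    """Symmetric normalisation D^{-1/2} A D^{-1/2} used by standard GCN layers."""
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

def gcn_layer(a_hat, x, w, activate=True):
    """One graph convolution: aggregate neighbour features, then project."""
    h = a_hat @ x @ w
    return np.maximum(h, 0.0) if activate else h  # ReLU on hidden layers only

def predict_next_positions(history, k, w1, w2):
    """history: (T, N, 3) past key-point positions. Returns (N, 3) next positions."""
    _, n, _ = history.shape
    a_hat = normalize(knn_adjacency(history[-1], k))
    velocities = np.diff(history, axis=0)                  # (T-1, N, 3)
    feats = velocities.transpose(1, 0, 2).reshape(n, -1)   # (N, 3*(T-1))
    hidden = gcn_layer(a_hat, feats, w1)
    delta = gcn_layer(a_hat, hidden, w2, activate=False)   # predicted displacement
    return history[-1] + delta

# Usage with random (untrained) weights: 4 observed frames, 10 key points.
rng = np.random.default_rng(0)
history = rng.normal(size=(4, 10, 3))
w1 = rng.normal(size=(9, 16)) * 0.1
w2 = rng.normal(size=(16, 3)) * 0.1
future = predict_next_positions(history, k=3, w1=w1, w2=w2)
```

In a sketch like this, predicting a displacement rather than an absolute position means that zero weights reproduce the last observed frame. The predicted key-point positions would then drive the deformation of the full set of Gaussians before rendering.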
