DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes
Qirui Hou, Wenzhang Sun, Chang Zeng, Chunfeng Wang, Hao Li, Jianxun Cui
TL;DR
DrivingScene tackles real-time 4D reconstruction of dynamic driving scenes from two consecutive surround-view frames by decoupling static geometry from dynamic motion. A static backbone based on 3D Gaussian Splatting models geometry and appearance, and a lightweight residual flow network accounts for non-rigid motion; together they yield a total motion field for dynamic rendering. A two-stage coarse-to-fine training strategy stabilizes learning by first learning a robust static prior and then refining the dynamic component with self-supervised losses. On nuScenes, the method achieves state-of-the-art results for novel-view synthesis and depth estimation while maintaining real-time efficiency and exposing intermediate representations such as depth and scene flow. The result is a practical, multi-task perception approach for autonomous driving with explicit static-dynamic decoupling and efficient online inference.
Abstract
Real-time, high-fidelity reconstruction of dynamic driving scenes is challenged by complex dynamics and sparse views, with prior methods struggling to balance quality and efficiency. We propose DrivingScene, an online, feed-forward framework that reconstructs 4D dynamic scenes from only two consecutive surround-view images. Our key innovation is a lightweight residual flow network that predicts the non-rigid motion of dynamic objects per camera on top of a learned static scene prior, explicitly modeling dynamics via scene flow. We also introduce a coarse-to-fine training paradigm that circumvents the instabilities common to end-to-end approaches. Experiments on the nuScenes dataset show that our image-only method simultaneously generates high-quality depth, scene flow, and 3D Gaussian point clouds online, significantly outperforming state-of-the-art methods in both dynamic reconstruction and novel view synthesis.
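
To make the static-dynamic decoupling concrete, the following is a minimal sketch of how a residual flow could be combined with a rigid motion component to form a total motion field. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the composition rule, so the additive form, the function name `total_scene_flow`, and the use of a single 4x4 rigid transform for the static part are all hypothetical.

```python
import numpy as np


def total_scene_flow(points, rigid_transform, residual_flow):
    """Combine a rigid flow with a per-point residual flow (illustrative only).

    points:          (N, 3) 3D points at time t, e.g. Gaussian centres.
    rigid_transform: (4, 4) homogeneous transform for the static scene
                     (ASSUMPTION: relative camera motion between frames).
    residual_flow:   (N, 3) non-rigid offsets, standing in for the output
                     of a residual flow network (hypothetical input here).
    Returns an (N, 3) array of total per-point displacements.
    """
    points = np.asarray(points, dtype=float)
    residual_flow = np.asarray(residual_flow, dtype=float)
    rigid_transform = np.asarray(rigid_transform, dtype=float)

    # Lift to homogeneous coordinates and apply the rigid transform.
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    moved = (homogeneous @ rigid_transform.T)[:, :3]

    # Rigid flow is the displacement a static point undergoes.
    rigid_flow = moved - points

    # ASSUMPTION: total motion = rigid part + non-rigid residual.
    return rigid_flow + residual_flow


# Toy example: the scene shifts 0.5 m along -z relative to the camera,
# and only the second point carries extra non-rigid motion.
points = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 2.0]])
transform = np.eye(4)
transform[2, 3] = -0.5
residual = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0]])
flow = total_scene_flow(points, transform, residual)
```

In this toy case the first point moves only with the rigid component, while the second also receives its residual offset. Under this reading, a zero residual recovers the purely static scene, which is one reason a static prior can be learned first and the dynamic term added afterwards.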
