On-the-fly Large-scale 3D Reconstruction from Multi-Camera Rigs
Yijia Guo, Tong Hu, Zhiwei Li, Liwen Hu, Keming Qian, Xitong Lin, Shengbo Chen, Tiejun Huang, Lei Ma
TL;DR
This work introduces the first on-the-fly large-scale 3D reconstruction framework for multi-camera rigs, enabling kilometer-scale outdoor scenes to be reconstructed in minutes without explicit calibration. It combines a central-camera initialization with lightweight multi-camera bundle adjustment, redundancy-free Gaussian sampling, and a frequency-aware optimization scheduler to deliver drift-free trajectories and high-fidelity reconstructions in real time. The approach achieves state-of-the-art novel-view synthesis and pose accuracy across road and aerial datasets, and is validated on the new RigScapes dataset featuring synchronized multi-camera captures. The combination of hierarchical initialization, efficient pruning of Gaussian primitives, and adaptive optimization enables robust, scalable, and fast 3D scene reconstruction from camera rigs.
Abstract
Recent advances in 3D Gaussian Splatting (3DGS) have enabled efficient free-viewpoint rendering and photorealistic scene reconstruction. While on-the-fly extensions of 3DGS have shown promise for real-time reconstruction from monocular RGB streams, they often fail to achieve complete 3D coverage due to the limited field of view (FOV). Employing a multi-camera rig fundamentally addresses this limitation. In this paper, we present the first on-the-fly 3D reconstruction framework for multi-camera rigs. Our method incrementally fuses dense RGB streams from multiple overlapping cameras into a unified Gaussian representation, achieving drift-free trajectory estimation and efficient online reconstruction. We propose a hierarchical camera initialization scheme that enables coarse inter-camera alignment without calibration, followed by a lightweight multi-camera bundle adjustment that stabilizes trajectories while maintaining real-time performance. Furthermore, we introduce a redundancy-free Gaussian sampling strategy and a frequency-aware optimization scheduler to reduce the number of Gaussian primitives and the required optimization iterations, thereby maintaining both efficiency and reconstruction fidelity. Our method reconstructs hundreds of meters of 3D scenes within just 2 minutes using only raw multi-camera video streams, demonstrating unprecedented speed, robustness, and fidelity for on-the-fly 3D scene reconstruction.
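To make the rig idea concrete, the sketch below shows a generic way to express every camera's pose as the central camera's pose composed with a fixed per-camera relative transform. The abstract does not specify the parameterization used in the paper, so this is only an illustrative assumption: the helper names (`make_pose`, `compose`, `rig_camera_poses`, `num_pose_params`) and the 6-parameter-per-pose count are hypothetical and chosen for this example.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_pose(R, t):
    """Build a 4x4 homogeneous rigid transform from a 3x3 rotation and a translation."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]

def compose(A, B):
    """Matrix product A @ B for 4x4 transforms stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rig_camera_poses(T_world_central, rel_poses):
    """World pose of each rig camera for one frame.

    rel_poses maps a camera name to its (assumed fixed) pose relative to
    the central camera, so T_world_cam = T_world_central @ T_central_cam.
    """
    return {name: compose(T_world_central, T_rel)
            for name, T_rel in rel_poses.items()}

def num_pose_params(num_frames, num_cameras, shared_rig=True):
    """Count pose parameters, assuming 6 degrees of freedom per rigid pose."""
    if shared_rig:
        # One pose per frame for the central camera, plus one fixed
        # relative pose for each of the remaining cameras.
        return 6 * num_frames + 6 * (num_cameras - 1)
    # Every camera in every frame is optimized independently.
    return 6 * num_frames * num_cameras

# Central camera at x = 10 m with a 90 degree yaw; a side camera mounted
# 0.5 m along the rig's x-axis.
T_central = make_pose(rot_z(math.pi / 2), [10.0, 0.0, 0.0])
rel = {"left": make_pose(rot_z(0.0), [0.5, 0.0, 0.0])}
poses = rig_camera_poses(T_central, rel)
print([round(row[3], 6) for row in poses["left"][:3]])
print(num_pose_params(100, 6), num_pose_params(100, 6, shared_rig=False))
```

Under these assumptions, sharing the relative transforms across frames keeps the number of optimized pose variables close to that of a single-camera trajectory, which is one plausible reason a rig-level bundle adjustment can stay lightweight.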
