On-the-fly Large-scale 3D Reconstruction from Multi-Camera Rigs

Yijia Guo, Tong Hu, Zhiwei Li, Liwen Hu, Keming Qian, Xitong Lin, Shengbo Chen, Tiejun Huang, Lei Ma

TL;DR

This work introduces the first on-the-fly large-scale 3D reconstruction framework for multi-camera rigs, enabling kilometer-scale outdoor scenes to be reconstructed in minutes without explicit calibration. It combines a central-camera initialization with lightweight multi-camera bundle adjustment, redundancy-free Gaussian sampling, and a frequency-aware optimization scheduler to deliver drift-free trajectories and high-fidelity reconstructions in real time. The approach achieves state-of-the-art novel-view synthesis and pose accuracy across road and aerial datasets, and is validated on the new RigScapes dataset featuring synchronized multi-camera captures. The combination of hierarchical initialization, efficient pruning of Gaussian primitives, and adaptive optimization enables robust, scalable, and fast 3D scene reconstruction from camera rigs.

Abstract

Recent advances in 3D Gaussian Splatting (3DGS) have enabled efficient free-viewpoint rendering and photorealistic scene reconstruction. While on-the-fly extensions of 3DGS have shown promise for real-time reconstruction from monocular RGB streams, they often fail to achieve complete 3D coverage due to the limited field of view (FOV). Employing a multi-camera rig fundamentally addresses this limitation. In this paper, we present the first on-the-fly 3D reconstruction framework for multi-camera rigs. Our method incrementally fuses dense RGB streams from multiple overlapping cameras into a unified Gaussian representation, achieving drift-free trajectory estimation and efficient online reconstruction. We propose a hierarchical camera initialization scheme that enables coarse inter-camera alignment without calibration, followed by a lightweight multi-camera bundle adjustment that stabilizes trajectories while maintaining real-time performance. Furthermore, we introduce a redundancy-free Gaussian sampling strategy and a frequency-aware optimization scheduler to reduce the number of Gaussian primitives and the required optimization iterations, thereby maintaining both efficiency and reconstruction fidelity. Our method reconstructs hundreds of meters of 3D scenes within just 2 minutes using only raw multi-camera video streams, demonstrating unprecedented speed, robustness, and fidelity for on-the-fly 3D scene reconstruction.
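
The abstract does not say how the frequency-aware scheduler distributes optimization iterations. As a rough illustration of the general idea only, the sketch below splits a fixed iteration budget across scene regions in proportion to a per-region frequency score. The function name `allocate_iterations`, the score itself, the per-region floor `min_iters`, and the proportional rule are all assumptions made for this example and are not taken from the paper.

```python
from typing import List


def allocate_iterations(freq_scores: List[float], total_iters: int,
                        min_iters: int = 1) -> List[int]:
    """Split an iteration budget across regions in proportion to a score.

    Hypothetical helper: the score, the floor, and the proportional rule
    are illustrative assumptions, not the paper's scheduler.
    """
    n = len(freq_scores)
    if n == 0:
        return []
    if any(s < 0 for s in freq_scores):
        raise ValueError("scores must be non-negative")
    if total_iters < n * min_iters:
        raise ValueError("budget too small for the per-region floor")

    # Budget left over once every region has received its floor.
    spare = total_iters - n * min_iters
    total_score = sum(freq_scores)
    if total_score == 0:
        shares = [spare / n] * n  # no signal: split evenly
    else:
        shares = [spare * s / total_score for s in freq_scores]

    # Largest-remainder rounding so the integer allocations sum to the budget.
    base = [int(share) for share in shares]
    leftover = spare - sum(base)
    order = sorted(range(n), key=lambda i: shares[i] - base[i], reverse=True)
    for i in order[:leftover]:
        base[i] += 1
    return [min_iters + b for b in base]
```

With scores `[1.0, 3.0]` and a budget of 10 iterations, the second region receives 7 iterations and the first receives 3, so regions with more high-frequency content get proportionally more refinement while no region is starved.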

Paper Structure

This paper contains 12 sections, 12 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Overview of our on-the-fly pipeline. We first initialize a central multi-camera rig system via feature matching and estimate the corresponding 3DGS scene and camera pose. Next, a lightweight multi-camera bundle adjustment refines trajectories across wide baselines for subsequent keyframes (Sec. \ref{subsec_pose}). Instead of relying on traditional density-boosting or oversampling heuristics, we perform redundancy-free Gaussian sampling and merging across overlapping camera views, which accelerates convergence and reduces computational overhead. The sampled Gaussian primitives are progressively fused into the global scene during sequential frame updates (Sec. \ref{subsec_Gaussian}). Finally, a frequency-aware optimization scheduler dynamically allocates more iterations to regions requiring rapid refinement, enabling fast and stable scene convergence with strong global consistency (Sec. \ref{subsec_optimize}).
  • Figure 2: Left: Our helmet rig with five DJI Action 5 Pro cameras. Right: Our two synchronized DJI Action 5 Pro cameras mounted on a DJI Mavic 3 Pro drone.
  • Figure 3: Visualizations under a changed FOV. Using monocular video streams often leads to severe scene incompleteness. Directly applying monocular methods to multi-camera streams results in significant scene inconsistency and noticeable artifacts. In contrast, our approach achieves complete and high-fidelity scene reconstruction.
  • Figure 4: Visualization results after camera rotation. Only our method produces visually satisfactory results under such extreme novel-view settings.
  • Figure 5: Visualization of multi-camera rig trajectories.
  • ...and 2 more figures