A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds
Jizong Peng, Tze Ho Elden Tse, Kai Xu, Wenchao Gao, Angela Yao
TL;DR
The paper tackles robust 3D reconstruction with Gaussian Splatting when camera poses are only coarsely estimated and the lidar point clouds are noisy. It introduces a constrained optimization framework that decouples each camera pose into device-centered and world-centered components, and jointly refines intrinsics, extrinsics, and the 3DGS representation, using sensitivity-based pre-conditioning and a log-barrier to keep parameters within feasible regions. Two geometric priors, a soft epipolar constraint and a reprojection regularizer, further constrain the optimization and, aided by line-intersection depth estimation, improve rendering fidelity. A new multimodal SLAM dataset is collected to evaluate the method. Results show substantial improvements over COLMAP-based approaches, with competitive performance at significantly reduced preprocessing time, highlighting practical applicability to large-scale, noisy multi-camera scenes.
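To make the log-barrier idea concrete, here is a minimal sketch in Python. It uses the standard textbook log-barrier for a symmetric box constraint, which is not necessarily the exact form used in the paper; the function name `log_barrier`, the per-parameter `bound`, and the temperature `t` are illustrative assumptions.

```python
import numpy as np

def log_barrier(delta, bound, t=10.0):
    """Textbook log-barrier for the box constraint |delta_i| < bound.

    Hypothetical helper, not the paper's implementation.
    The penalty is zero at delta = 0, grows as any entry approaches
    +/- bound, and is +inf outside the feasible region.
    """
    delta = np.asarray(delta, dtype=float)
    slack_hi = bound - delta   # distance to the upper bound
    slack_lo = bound + delta   # distance to the lower bound
    if np.any(slack_hi <= 0) or np.any(slack_lo <= 0):
        return float("inf")
    return float(-(np.log(slack_hi) + np.log(slack_lo)).sum() / t)

# Example: a pose correction limited to +/- 1.0 (arbitrary units).
near_center = log_barrier([0.1], bound=1.0)
near_edge = log_barrier([0.9], bound=1.0)
outside = log_barrier([2.0], bound=1.0)
```

Adding such a term to the training loss discourages a parameter update from leaving its allowed range, and a larger `t` makes the penalty flatter in the interior of that range.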
Abstract
3D Gaussian Splatting (3DGS) is a powerful reconstruction technique, but it needs to be initialized from accurate camera poses and high-fidelity point clouds. Typically, the initialization is taken from Structure-from-Motion (SfM) algorithms; however, SfM is time-consuming and restricts the application of 3DGS in real-world scenarios and large-scale scene reconstruction. We introduce a constrained optimization method for simultaneous camera pose estimation and 3D reconstruction that does not require SfM support. Core to our approach is decomposing a camera pose into a sequence of camera-to-(device-)center and (device-)center-to-world optimizations. To facilitate this, we propose two optimization constraints that are conditioned on the sensitivity of each parameter group and restrict each parameter's search space. In addition, as we learn the scene geometry directly from the noisy point clouds, we propose geometric constraints to improve the reconstruction quality. Experiments demonstrate that the proposed method significantly outperforms the existing (multi-modal) 3DGS baseline and methods supplemented by COLMAP on both our collected dataset and two public benchmarks.
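The pose decomposition described above can be illustrated with homogeneous transforms. The sketch below is an assumption-laden illustration, not the authors' code: it treats each pose as a 4x4 rigid transform and composes a camera-to-device transform with a device-to-world transform. The helper names (`make_transform`, `rot_z`, `compose_camera_to_world`) and the example values are hypothetical.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous rigid transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def compose_camera_to_world(T_device_to_world, T_cam_to_device):
    """Camera-to-world pose as device-to-world applied after camera-to-device."""
    return T_device_to_world @ T_cam_to_device

# Assumed example: a camera mounted 0.1 units along the device x-axis,
# with the device at (1, 2, 0) and rotated 90 degrees about z.
T_cam_to_device = make_transform(np.eye(3), [0.1, 0.0, 0.0])
T_device_to_world = make_transform(rot_z(np.pi / 2), [1.0, 2.0, 0.0])
T_cam_to_world = compose_camera_to_world(T_device_to_world, T_cam_to_device)

# Position of the camera origin in world coordinates.
cam_origin_world = T_cam_to_world[:3, 3]
```

One motivation for this factorization is that the camera-to-device part is shared by every frame captured with the same rig, while the device-to-world part varies per frame, so the two groups can be given different constraints during optimization.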
