Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields
Bo-Yu Cheng, Wei-Chen Chiu, Yu-Lun Liu
TL;DR
This work addresses robust joint optimization of camera poses and a 3D scene represented by decomposed low-rank tensors using only 2D supervision, noting that naive voxel-based pose optimization can converge to sub-optimal minima due to high-frequency content. It introduces a spectrum-control framework built on separable component-wise Gaussian convolution over decomposed tensors, enabling a coarse-to-fine training regime, plus robustness techniques including smoothed 2D supervision, randomly scaled kernels, and edge-guided loss. A key contribution is an efficient separable convolution approach that distributes 3D Gaussian filtering across tensor components, achieving significant computational savings while preserving expressivity, and allowing a single voxel grid to be trained with accelerated convergence. Empirically, the method delivers state-of-the-art novel view synthesis and robust pose recovery on NeRF-Synthetic and LLFF datasets, converging an order of magnitude faster than prior methods that require hundreds of thousands of iterations. Overall, the paper advances robust joint optimization for voxel-based radiance fields with decomposed representations, making unknown-pose 3D reconstruction more practical and scalable.
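The separability argument behind component-wise Gaussian convolution can be checked numerically. The sketch below assumes a TensoRF-style vector-matrix (VM) decomposition. In that decomposition the dense grid is a sum of rank-R terms, each a 1D vector along one axis times a 2D matrix over the other two axes. The sketch also assumes zero padding at the grid boundary. A 3D Gaussian kernel factors as g(x)g(y)g(z), so blurring each vector with one 1D pass and each matrix with two 1D passes should reproduce a direct 3D convolution of the dense grid. The function names (`blur_vm_factors`, `brute_force_blur_3d`, and so on) and the chosen sigma and radius are illustrative assumptions, not the authors' implementation.

```python
import itertools

import numpy as np


def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1D Gaussian weights sampled at integer offsets -radius..radius."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur_axis(a: np.ndarray, k: np.ndarray, axis: int) -> np.ndarray:
    """Zero-padded, same-size 1D convolution of `a` along a single axis."""
    return np.apply_along_axis(lambda s: np.convolve(s, k, mode="same"), axis, a)


def compose_vm(vx, vy, vz, m_yz, m_xz, m_xy) -> np.ndarray:
    """Rebuild the dense grid: sum over rank r of (vector along one axis) x (matrix over the other two)."""
    return (np.einsum("rx,ryz->xyz", vx, m_yz)
            + np.einsum("ry,rxz->xyz", vy, m_xz)
            + np.einsum("rz,rxy->xyz", vz, m_xy))


def blur_vm_factors(vx, vy, vz, m_yz, m_xz, m_xy, k):
    """Apply the separable 3D Gaussian to each factor instead of to the dense grid.

    Vectors get one 1D pass along their spatial axis; matrices get one pass per axis.
    """
    vx, vy, vz = [blur_axis(v, k, axis=1) for v in (vx, vy, vz)]
    m_yz, m_xz, m_xy = [blur_axis(blur_axis(m, k, axis=1), k, axis=2)
                        for m in (m_yz, m_xz, m_xy)]
    return vx, vy, vz, m_yz, m_xz, m_xy


def brute_force_blur_3d(t: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Reference: direct 3D convolution with the full (2r+1)^3 kernel and zero padding."""
    size = len(k)
    r = size // 2
    k3 = np.einsum("i,j,k->ijk", k, k, k)
    p = np.pad(t, r)
    nx, ny, nz = t.shape
    out = np.zeros_like(t)
    for i, j, h in itertools.product(range(size), repeat=3):
        out += k3[i, j, h] * p[i:i + nx, j:j + ny, h:h + nz]
    return out


rng = np.random.default_rng(0)
R, X, Y, Z = 2, 10, 9, 8
factors = (rng.standard_normal((R, X)), rng.standard_normal((R, Y)),
           rng.standard_normal((R, Z)), rng.standard_normal((R, Y, Z)),
           rng.standard_normal((R, X, Z)), rng.standard_normal((R, X, Y)))
kernel = gaussian_kernel_1d(sigma=1.0, radius=2)

via_factors = compose_vm(*blur_vm_factors(*factors, kernel))
via_grid = brute_force_blur_3d(compose_vm(*factors), kernel)
print(np.allclose(via_factors, via_grid))  # True
```

In this sketch, the factor-level blur costs O(R·N²·(2r+1)) per matrix. The direct convolution costs O(N³·(2r+1)³). That gap is the kind of saving the separable, component-wise approach targets, and the rank-R factors never have to be expanded into a dense grid.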
Abstract
In this paper, we propose an algorithm that jointly refines camera poses and scene geometry represented by decomposed low-rank tensors, using only 2D images as supervision. First, we conduct a pilot study on a 1D signal and relate our findings to 3D scenarios, where naive joint pose optimization on voxel-based NeRFs can easily converge to sub-optimal solutions. Based on an analysis of the frequency spectrum, we then propose applying convolutional Gaussian filters to 2D and 3D radiance fields, yielding a coarse-to-fine training schedule that enables joint camera pose optimization. By leveraging the decomposition property of the low-rank tensor, our method achieves the same effect as brute-force 3D convolution while incurring only a small computational overhead. To further improve the robustness and stability of joint optimization, we also propose smoothed 2D supervision, randomly scaled kernel parameters, and an edge-guided loss mask. Extensive quantitative and qualitative evaluations demonstrate that our proposed framework achieves superior performance in novel view synthesis and converges rapidly during optimization.
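A toy shift-estimation problem can mimic the 1D pilot study and the coarse-to-fine idea. The sketch below assumes a signal built from two sinusoids, one low-frequency and one high-frequency. It uses closed forms for the Gaussian-blurred signal and its alignment loss, so it is only an analogue of the paper's setup, not a reproduction of it. With no blur, plain gradient descent on the shift should stall in a spurious minimum created by the high-frequency component. A schedule that starts with a strong blur and anneals sigma to zero should recover the true shift. The signal, learning rate, and sigma schedule are all hypothetical choices.

```python
import numpy as np

# Hypothetical toy signal f(x) = sum_i a_i * sin(w_i * x):
# a low-frequency "shape" term plus a high-frequency "texture" term.
AMPS = np.array([1.0, 0.5])
FREQS = np.array([1.0, 8.0])


def alignment_loss_grad(delta: float, sigma: float) -> tuple[float, float]:
    """Loss and gradient for aligning blurred f(x - delta) to blurred f(x).

    A Gaussian blur of std sigma scales sin(w x) by exp(-sigma^2 w^2 / 2), and the
    squared error of one component, averaged over a full period, is
    a^2 * (1 - cos(w * delta)). Cross terms between the two frequencies average out.
    """
    a2 = (AMPS * np.exp(-0.5 * (sigma * FREQS) ** 2)) ** 2
    loss = float(np.sum(a2 * (1.0 - np.cos(FREQS * delta))))
    grad = float(np.sum(a2 * FREQS * np.sin(FREQS * delta)))
    return loss, grad


def optimize_shift(delta0: float, sigmas, steps_per_stage: int = 300, lr: float = 0.05) -> float:
    """Plain gradient descent on the shift, running one stage per blur level in `sigmas`."""
    delta = delta0
    for sigma in sigmas:
        for _ in range(steps_per_stage):
            _, grad = alignment_loss_grad(delta, sigma)
            delta -= lr * grad
    return delta


# The true shift is 0; both runs start from a misaligned estimate of 2.0.
no_blur = optimize_shift(2.0, sigmas=[0.0, 0.0, 0.0, 0.0])
coarse_to_fine = optimize_shift(2.0, sigmas=[1.0, 0.5, 0.25, 0.0])
# Expected: the no-blur run stalls near 1.5, while coarse-to-fine ends near 0.
print(f"no blur: {no_blur:.3f}, coarse-to-fine: {coarse_to_fine:.3f}")
```

A strong blur damps the high-frequency term by a factor of exp(-σ²ω²/2). Early in training, this leaves a smooth loss with essentially one basin around the true shift. Fine detail returns only once the estimate is already close. This mirrors, in miniature, why the abstract argues for spectrum control before full-resolution pose refinement.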
