Disentangled Generation and Aggregation for Robust Radiance Fields
Shihe Shen, Huachen Gao, Wangze Xu, Rui Peng, Luyang Tang, Kaiqiang Xiong, Jianbo Jiao, Ronggang Wang
TL;DR
This work addresses the difficulty of jointly optimizing camera poses and a triplane-based radiance field, where local updates and plane-wise entanglement hinder robust optimization. It introduces Disentangled Triplane Generation to inject global context via a per-plane generator, and Disentangled Plane Aggregation (DPA) to decouple pose updates from plane feature updates, complemented by a two-stage warm-start that preserves high-frequency detail. Empirical results on LLFF and NeRF-Synthetic show state-of-the-art performance in novel view synthesis and camera pose estimation under noisy or unknown poses, with faster convergence and robust optimization. Together, these components enable robust, efficient joint pose–NeRF optimization using a disentangled, explicit triplane representation suitable for practical deployment with uncertain camera parameters.
Abstract
Triplane-based radiance fields have gained attention in recent years for their ability to disentangle 3D scenes effectively, offering a high-quality representation at low computational cost. A key requirement of this approach is accurate camera poses as input. However, because triplane features are updated locally, jointly estimating poses and the radiance field in the manner of previous joint pose-NeRF optimization methods easily falls into local minima. To this end, we propose the Disentangled Triplane Generation module, which introduces global feature context and smoothness into triplane learning and mitigates the errors caused by local updating. We then propose Disentangled Plane Aggregation to mitigate the entanglement that the common triplane feature aggregation causes during camera pose updating. In addition, we introduce a two-stage warm-start training strategy to reduce the implicit constraints imposed by the triplane generator. Quantitative and qualitative results demonstrate that our method achieves state-of-the-art performance in novel view synthesis with noisy or unknown camera poses, as well as efficient convergence of optimization. Project page: https://gaohchen.github.io/DiGARR/.
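
To make the "local update" and "feature aggregation" ideas in the abstract concrete, here is a minimal NumPy sketch of a generic triplane lookup. It is not the authors' implementation: the function names `bilinear_sample` and `triplane_feature`, the plane keys `"xy"`, `"xz"`, `"yz"`, the `[0, 1]` coordinate convention, and the sum/product aggregation modes are all assumptions chosen for illustration. The sketch shows that a single 3D point reads from only four cells per plane, so a gradient from that point can reach only those cells, and that the three plane features are then fused into one vector.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Sample a (C, H, W) feature plane at continuous coords u, v in [0, 1].

    Returns the interpolated feature of shape (C,) and the four
    (row, col, weight) cells that contributed. Only these four cells
    would receive gradient from this sample (hypothetical helper).
    """
    _, h, w = plane.shape
    x = u * (w - 1)
    y = v * (h - 1)
    # Clamp so that the 2x2 neighbourhood stays inside the grid.
    x0 = min(int(np.floor(x)), w - 2)
    y0 = min(int(np.floor(y)), h - 2)
    fx, fy = x - x0, y - y0
    cells = [
        (y0, x0, (1 - fx) * (1 - fy)),
        (y0, x0 + 1, fx * (1 - fy)),
        (y0 + 1, x0, (1 - fx) * fy),
        (y0 + 1, x0 + 1, fx * fy),
    ]
    feat = sum(wgt * plane[:, r, c] for r, c, wgt in cells)
    return feat, cells


def triplane_feature(planes, point, mode="sum"):
    """Project a 3D point onto three axis-aligned planes and fuse the features.

    `mode` selects an assumed aggregation: element-wise "sum" or "product".
    """
    x, y, z = point
    f_xy, _ = bilinear_sample(planes["xy"], x, y)
    f_xz, _ = bilinear_sample(planes["xz"], x, z)
    f_yz, _ = bilinear_sample(planes["yz"], y, z)
    if mode == "sum":
        return f_xy + f_xz + f_yz
    if mode == "product":
        return f_xy * f_xz * f_yz
    raise ValueError(f"unknown aggregation mode: {mode}")
```

In the product mode, the gradient with respect to one plane's feature is scaled by the other two planes' features, which is one simple way to see how aggregation can couple the planes during optimization.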
