Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting

Chengqi Li, Zhihao Shi, Yangdi Lu, Wenbo He, Xiangyu Xu

TL;DR

The paper tackles robust 3D scene reconstruction from unconstrained, in-the-wild images by exploiting the stochastic nature of artifacts. It introduces Asymmetric Dual 3D Gaussian Splatting (AsymGS), which trains two Gaussian-based models under mutual consistency while applying complementary masks to encourage divergent learning and suppress shared errors. A Dynamic EMA proxy variant further improves training efficiency by replacing one model with a dynamically updated EMA copy and using an alternating masking strategy. Across three real-world datasets, AsymGS achieves state-of-the-art reconstruction quality with significant efficiency gains, demonstrating strong robustness to distractors and varying illumination, and highlighting practical potential for in-the-wild 3D scene modeling.

Abstract

3D reconstruction from in-the-wild images remains a challenging task due to inconsistent lighting conditions and transient distractors. Existing methods typically rely on heuristic strategies to handle the low-quality training data; these strategies often struggle to produce stable and consistent reconstructions, frequently resulting in visual artifacts. In this work, we propose Asymmetric Dual 3D Gaussian Splatting (AsymGS), a novel framework that leverages the stochastic nature of these artifacts: they tend to vary across different training runs due to minor randomness. Specifically, our method trains two 3D Gaussian Splatting (3DGS) models in parallel, enforcing a consistency constraint that encourages convergence on reliable scene geometry while suppressing inconsistent artifacts. To prevent the two models from collapsing into similar failure modes due to confirmation bias, we introduce a divergent masking strategy that applies two complementary masks: a multi-cue adaptive mask and a self-supervised soft mask. This leads to an asymmetric training process for the two models and reduces shared error modes. In addition, to improve training efficiency, we introduce a lightweight variant called Dynamic EMA Proxy, which replaces one of the two models with a dynamically updated Exponential Moving Average (EMA) proxy and employs an alternating masking strategy to preserve divergence. Extensive experiments on challenging real-world datasets demonstrate that our method consistently outperforms existing approaches while achieving high efficiency. See the project website at https://steveli88.github.io/AsymGS.
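To make the EMA proxy idea in the abstract concrete, the following is a minimal sketch of a generic exponential moving average update over a flat list of parameters. It is not the authors' implementation: the names `ema_update` and `pick_mask`, the fixed `decay` value, and the strict even/odd alternation between mask types are all assumptions made for illustration. This section does not say how the paper makes the proxy "dynamic" or how it schedules the alternating masks.

```python
from typing import List


def ema_update(proxy: List[float], params: List[float], decay: float = 0.99) -> List[float]:
    """Return the proxy after one EMA step: decay * proxy + (1 - decay) * params."""
    if len(proxy) != len(params):
        raise ValueError("proxy and params must have the same length")
    return [decay * p + (1.0 - decay) * q for p, q in zip(proxy, params)]


def pick_mask(step: int) -> str:
    """Hypothetical schedule: alternate the two mask types on successive steps."""
    return "hard" if step % 2 == 0 else "soft"


# Toy usage: the trained model's parameters drift, and the proxy follows smoothly.
params = [1.0, 2.0]
proxy = list(params)  # the proxy starts as a copy of the trained model
for step in range(3):
    mask_kind = pick_mask(step)            # which mask would supervise this step
    params = [p + 1.0 for p in params]     # stand-in for one optimizer step
    proxy = ema_update(proxy, params, decay=0.5)
```

Because the proxy is only averaged and never optimized, it needs no gradients or optimizer state of its own, which is one plausible source of the efficiency gain the abstract mentions.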

Paper Structure

This paper contains 23 sections, 10 equations, 9 figures, 16 tables, 1 algorithm.

Figures (9)

  • Figure 1: Left: The key insight of this work is that artifacts arising from low-quality in-the-wild inputs are typically stochastic across different runs of the same model (Baseline: Run 1 vs. Run 2). This motivates the design of the Asymmetric Dual 3DGS framework, which enhances true scene structure while suppressing errors through cross-model consistency (w/ consistency). Right: Our method compares favorably against state-of-the-art approaches in terms of reconstruction quality while maintaining high training efficiency. Results are on the NeRF On-the-go dataset [ren_nerfonthego_2024].
  • Figure 2: Overview of the AsymGS framework. Two 3DGS models $\mathbb{G}_1$ and $\mathbb{G}_2$ are concurrently optimized with the reconstruction losses $\mathcal{L}_{r1}^{\mathbf{M}_h}$ and $\mathcal{L}_{r2}^{\mathbf{M}_s}$ (Eq. \ref{eq:dualrecon1}), along with the mutual consistency losses $\mathcal{L}_{m1}$ and $\mathcal{L}_{m2}$ (Eq. \ref{eq:dualmutual}). In addition, we apply a mask loss (Eq. \ref{eq:mask}) to learn the soft mask in a self-supervised manner. For improved efficiency, we also propose an EMA version of our framework that replaces $\mathbb{G}_2$ with a dynamic EMA proxy. Both the mask loss and the EMA proxy are omitted here for clarity. Note that the color transform in this figure is shown for illustration purposes; in practice, it undergoes a rasterization process, as introduced in Section \ref{sec:preliminary}.
  • Figure 3: Comparison of hard and soft masks. Distractors are highlighted with red boxes in the input. The right four columns show the evolution of the self-supervised soft mask across epochs.
  • Figure 4: Qualitative results on the NeRF On-the-go [ren_nerfonthego_2024] (top) and RobustNeRF [sabour_robustnerf_2023] (bottom) datasets.
  • Figure 5: Qualitative results on the PhotoTourism dataset [jin_phototourism_2020].
  • ...and 4 more figures
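
The training setup described in the Figure 2 caption (two models, each with a masked reconstruction loss plus a mutual consistency loss) can be sketched roughly as below. This is an illustrative toy, not the paper's equations: the referenced equations are not reproduced in this section, so the use of an L1 distance, the normalization by mask weight, the `weight` parameter, and the function names `masked_l1`, `l1`, and `dual_losses` are all assumptions. Images are represented as flat lists of pixel intensities for simplicity.

```python
from typing import Sequence, Tuple


def masked_l1(pred: Sequence[float], target: Sequence[float], mask: Sequence[float]) -> float:
    """Mask-weighted mean absolute error (mask 1 = trusted pixel, 0 = ignored)."""
    total_weight = sum(mask)
    if total_weight == 0:
        return 0.0
    return sum(m * abs(p - t) for p, t, m in zip(pred, target, mask)) / total_weight


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain mean absolute difference between two renders."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dual_losses(
    render1: Sequence[float],
    render2: Sequence[float],
    image: Sequence[float],
    hard_mask: Sequence[float],
    soft_mask: Sequence[float],
    weight: float = 0.1,
) -> Tuple[float, float]:
    """Return (loss for model 1, loss for model 2) under the assumed L1 form."""
    loss_r1 = masked_l1(render1, image, hard_mask)  # model 1 uses the hard mask
    loss_r2 = masked_l1(render2, image, soft_mask)  # model 2 uses the soft mask
    # With automatic differentiation, each model would treat the other's render
    # as a constant target; numerically the two mutual terms coincide here.
    loss_m = l1(render1, render2)
    return loss_r1 + weight * loss_m, loss_r2 + weight * loss_m


# Toy usage: the second pixel is a distractor that only the soft mask down-weights.
total1, total2 = dual_losses(
    render1=[0.0, 1.0],
    render2=[0.0, 0.0],
    image=[0.0, 1.0],
    hard_mask=[1, 1],
    soft_mask=[1.0, 0.0],
    weight=0.5,
)
```

The two models see the same image through different masks, so their reconstruction terms can disagree on which pixels matter, while the mutual term pulls their renders toward each other.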