ERGO: Excess-Risk-Guided Optimization for High-Fidelity Monocular 3D Gaussian Splatting

Zehua Ma, Hanhui Li, Zhenyu Xie, Xiaonan Luo, Michael Kampffmeyer, Feng Gao, Xiaodan Liang

TL;DR

ERGO introduces excess-risk-guided optimization to adaptively weight multi-view supervision when reconstructing 3D content from a single image. By decomposing empirical risk into excess risk and Bayes error, it dynamically emphasizes more informative views and losses, while adding geometry-aware and texture-aware objectives for global consistency and local detail. The method leverages 3D Gaussian splatting and refines auxiliary views through geometry correction, enabling robust cross-view fidelity and texture realism under imperfect supervision. Experiments on Google Scanned Objects and OmniObject3D demonstrate superior performance over state-of-the-art optimization-based and feed-forward methods, with clear gains in PSNR, SSIM, and LPIPS and in qualitative texture quality and geometric coherence.

Abstract

Generating 3D content from a single image remains a fundamentally challenging and ill-posed problem due to the inherent absence of geometric and textural information in occluded regions. While state-of-the-art generative models can synthesize auxiliary views to provide additional supervision, these views inevitably contain geometric inconsistencies and textural misalignments that propagate and amplify artifacts during 3D reconstruction. To effectively harness these imperfect supervisory signals, we propose an adaptive optimization framework guided by excess risk decomposition, termed ERGO. Specifically, ERGO decomposes the optimization losses in 3D Gaussian splatting into two components, i.e., excess risk that quantifies the suboptimality gap between current and optimal parameters, and Bayes error that models the irreducible noise inherent in synthesized views. This decomposition enables ERGO to dynamically estimate the view-specific excess risk and adaptively adjust loss weights during optimization. Furthermore, we introduce geometry-aware and texture-aware objectives that complement the excess-risk-derived weighting mechanism, establishing a synergistic global-local optimization paradigm. Consequently, ERGO demonstrates robustness against supervision noise while consistently enhancing both geometric fidelity and textural quality of the reconstructed 3D content. Extensive experiments on the Google Scanned Objects dataset and the OmniObject3D dataset demonstrate the superiority of ERGO over existing state-of-the-art methods.
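The weighting idea in the abstract can be illustrated with a small sketch. This is not the authors' formulation, which is not given in this summary. It assumes each view's empirical loss splits into a reducible part (excess risk) and an irreducible noise floor (Bayes error), that an estimate of the noise floor is already available, and that views with more excess risk should receive more weight. The function name `excess_risk_weights`, the softmax rule, and the `temperature` parameter are all hypothetical.

```python
import numpy as np

def excess_risk_weights(view_losses, bayes_error_estimates, temperature=0.1):
    """Hypothetical rule: weight each view by a softmax over its estimated excess risk."""
    losses = np.asarray(view_losses, dtype=float)
    noise_floor = np.asarray(bayes_error_estimates, dtype=float)
    # Assumed decomposition: empirical risk = excess risk + Bayes error,
    # so excess risk is the loss minus the noise floor, clipped at zero.
    excess = np.clip(losses - noise_floor, 0.0, None)
    # Softmax (with max-subtraction for numerical stability) turns
    # excess risk into normalized per-view weights.
    logits = excess / temperature
    logits = logits - logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()

# Views 0 and 1 have the same loss, but most of view 0's loss is assumed
# to be irreducible noise, so view 1 ends up with the larger weight.
weights = excess_risk_weights([0.30, 0.30, 0.10], [0.25, 0.05, 0.05])
print(weights)
```

In this sketch a view whose loss is mostly noise is down-weighted even when its raw loss is high, which is the behavior the abstract attributes to separating excess risk from Bayes error.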

Paper Structure

This paper contains 18 sections, 15 equations, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: Given single-view images as inputs, the proposed ERGO method generates 3D objects with better texture consistency and fidelity than state-of-the-art optimization-based methods (e.g., liu2024syncdreamer) and feed-forward large models (e.g., openlrm).
  • Figure 2: Comparison of optimization paradigms for single-image-to-3D generation: (a) optimization-based methods, (b) direct reconstruction from synthesized multi-view images, and (c) the proposed ERGO framework with adaptive objective weights. (d) Illustration of two types of inconsistency caused by direct reconstruction from inconsistent multi-view images.
  • Figure 3: The proposed ERGO framework for single-image 3D content generation. Given coarse 3D Gaussians and synthesized images with inconsistencies, ERGO not only estimates excess risk to mitigate inconsistencies and modulates iterative optimization globally, but also leverages the geometry-aware objective and the texture-aware objective to achieve localized refinement.
  • Figure 4: The performance of the MVD baseline degrades as the viewpoint transformation magnitude increases.
  • Figure 5: Illustration of visibility map generation. We lift the pixels from one image into 3D space and project them onto the adjacent image to identify the corresponding pixels. We then calculate the differences between these corresponding pixels to generate the visibility map.
  • ...and 5 more figures
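
The visibility-map procedure described in the Figure 5 caption (lift pixels to 3D, project them into the adjacent image, compare corresponding pixels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a per-pixel depth map for the source view, a shared pinhole intrinsic matrix `K`, a known 4x4 transform `T_ab` from camera A to camera B, nearest-neighbour sampling, and a hypothetical `threshold` for turning differences into a binary mask. The function name `visibility_map` is also hypothetical.

```python
import numpy as np

def visibility_map(img_a, depth_a, img_b, K, T_ab, threshold=0.1):
    """Return (mask, diff) for view A against adjacent view B.

    img_a, img_b: (H, W, 3) float arrays; depth_a: (H, W) depth of view A;
    K: (3, 3) pinhole intrinsics (assumed shared); T_ab: (4, 4) camera-A to camera-B.
    """
    H, W = depth_a.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    # Lift: back-project each pixel along its ray, scaled by its depth.
    pts_a = (pix @ np.linalg.inv(K).T) * depth_a.reshape(-1, 1)
    # Move the 3D points into camera B's coordinate frame.
    pts_b = pts_a @ T_ab[:3, :3].T + T_ab[:3, 3]
    # Project into image B (perspective divide, nearest pixel).
    z = pts_b[:, 2]
    safe_z = np.where(z > 1e-8, z, 1.0)
    proj = pts_b @ K.T
    ub = np.rint(proj[:, 0] / safe_z).astype(int)
    vb = np.rint(proj[:, 1] / safe_z).astype(int)
    inside = (z > 1e-8) & (ub >= 0) & (ub < W) & (vb >= 0) & (vb < H)
    # Mean absolute color difference between corresponding pixels;
    # pixels that project outside image B keep an infinite difference.
    diff = np.full(H * W, np.inf)
    a_flat = img_a.reshape(-1, 3)
    diff[inside] = np.abs(a_flat[inside] - img_b[vb[inside], ub[inside]]).mean(axis=1)
    mask = diff < threshold  # assumed rule: small difference means "visible"
    return mask.reshape(H, W), diff.reshape(H, W)
```

The sketch ignores occlusion in view B and sub-pixel interpolation; both would matter in practice and are left out to keep the example short.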