
Leveling3D: Leveling Up 3D Reconstruction with Feed-Forward 3D Gaussian Splatting and Geometry-Aware Generation

Yiming Huang, Baixiang Huang, Beilei Cui, Chi Kit Ng, Long Bai, Hongliang Ren

Abstract

Feed-forward 3D reconstruction has revolutionized 3D vision, providing a powerful baseline for downstream tasks such as novel-view synthesis with 3D Gaussian Splatting (3DGS). Previous works use a diffusion model to repair corrupted renderings, but they disregard geometry and fail to fill missing areas in extrapolated views. In this work, we introduce Leveling3D, a novel pipeline that integrates feed-forward 3D reconstruction with geometrically consistent generation, enabling simultaneous reconstruction and generation. We propose a geometry-aware leveling adapter, a lightweight technique that aligns the internal knowledge of the diffusion model with the geometry prior from the feed-forward model. The leveling adapter enables generation in the artifact areas of extrapolated novel views, which are caused by underconstrained regions of the 3D representation. Specifically, to learn a more diverse generation distribution, we introduce a palette filtering strategy for training, together with a test-time masking refinement that prevents messy boundaries along the repaired regions. More importantly, the enhanced extrapolated novel views from Leveling3D can be used as inputs to feed-forward 3DGS, leveling up the 3D reconstruction. We achieve state-of-the-art (SOTA) performance on public datasets on tasks including novel-view synthesis and depth estimation.

Paper Structure

This paper contains 24 sections, 22 equations, 11 figures, 11 tables, 1 algorithm.

Figures (11)

  • Figure 1: We propose Leveling3D, a new method for extrapolated view refinement for Feed-Forward 3DGS with sparse input. Previous 3DGS refinement methods with naive reference or text prompts as diffusion control have limited geometry generalization ability within the artifact area in extrapolated views. In contrast, our method utilizes geometry-prior control, which generates fine RGB details with geometric consistency. The refined views further level up the 3D reconstruction with extrapolation to the unseen area, achieving SOTA performance on both image and depth synthesis.
  • Figure 2: Overview of the Leveling3D pipeline. Our pipeline integrates a geometry-aware leveling adapter that fuses geometry tokens with diffusion control, enabling robust refinement of extrapolated views and leveling up the 3D reconstruction with extended, geometrically consistent extrapolation areas.
  • Figure 3: Qualitative results on the MipNeRF360 barron2022mip and VRNeRF xu2023vr datasets. Previous refinement methods suffer from severe geometry collapse, texture corruption, and missing content. In contrast, our Leveling3D method robustly generates fine details and fills large extrapolated areas with plausible content.
  • Figure 4: Qualitative results for depth estimation on the TartanAir wang2020tartanair and ScanNet dai2017scannet datasets. Our method refines geometry-consistent novel views within the extrapolated artifact areas, leveling up the 3DGS representation to achieve complete scene reconstruction with superior geometry detail recovery and boundary coherence.
  • Figure 5: Qualitative results with the test-time mask refinement removed. Our mask refinement robustly prevents corrupted generation near the boundary.
  • ...and 6 more figures
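
The abstract and the Figure 5 caption mention a test-time mask refinement that avoids corrupted content along the boundary of the repaired regions, but this listing does not say how it works. The sketch below is not the authors' procedure. It illustrates one common, generic way to keep mask boundaries clean when compositing a generated image into a rendered view: grow the artifact mask slightly, feather its edge, and alpha-blend. All names (`box_blur`, `dilate`, `composite`) and the parameters `grow` and `feather` are hypothetical and chosen for illustration only.

```python
import numpy as np


def box_blur(mask, radius):
    """Separable box blur of a 2D array using an edge-padded square window."""
    mask = mask.astype(np.float64)
    if radius <= 0:
        return mask
    size = 2 * radius + 1
    kernel = np.ones(size) / size
    padded = np.pad(mask, radius, mode="edge")
    # Blur along rows first, then along columns; "valid" restores the input shape.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


def dilate(mask, radius):
    """Binary dilation: a pixel is set if any pixel in its square window is set."""
    return box_blur(mask, radius) > 1e-9


def composite(rendered, generated, artifact_mask, grow=2, feather=2):
    """Blend `generated` into `rendered` inside a grown, feathered artifact mask.

    Assumption (not from the paper): growing the mask covers unreliable pixels
    next to the artifact region, and feathering avoids a hard seam.
    """
    grown = dilate(artifact_mask, grow)
    alpha = box_blur(grown, feather)[..., None]  # shape (H, W, 1), values in [0, 1]
    return alpha * generated + (1.0 - alpha) * rendered


# Toy example: a black rendered view, a white generated view, a 2x2 artifact.
rendered = np.zeros((8, 8, 3))
generated = np.ones((8, 8, 3))
artifact_mask = np.zeros((8, 8), dtype=bool)
artifact_mask[3:5, 3:5] = True
out = composite(rendered, generated, artifact_mask, grow=1, feather=1)
```

In this toy setup, pixels far from the mask keep the rendered values unchanged, the mask interior takes the generated values, and the band in between transitions smoothly. How the actual method chooses or refines its mask is not specified in this listing.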