DeclutterNeRF: Generative-Free 3D Scene Recovery for Occlusion Removal

Wanzhou Liu, Zhexiao Xiong, Xinyu Li, Nathan Jacobs

TL;DR

DeclutterNeRF tackles occlusion removal in NeRF-based 3D scene reconstruction without relying on generative priors. It introduces DeclutterSet, a dataset with realistic, multi-depth occlusions and motion, and DeclutterNeRF, which uses joint multi-view camera-parameter optimization, Occlusion Annealing Regularization, and S3IM to achieve artifact-free recovery. The approach delivers state-of-the-art results on DeclutterSet with substantial improvements in PSNR/SSIM and LPIPS while maintaining efficient training on a single RTX 4090. This work provides a practical, robust baseline for occlusion-aware 3D reconstruction and establishes a dataset and framework for future research, including potential integration with generative priors.

Abstract

Recent novel view synthesis (NVS) techniques, including Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have greatly advanced 3D scene reconstruction with high-quality rendering and realistic detail recovery. Effectively removing occlusions while preserving scene details can further enhance the robustness and applicability of these techniques. However, existing approaches for object and occlusion removal predominantly rely on generative priors, which, despite filling the resulting holes, introduce new artifacts and blurriness. Moreover, existing benchmark datasets for evaluating occlusion removal methods lack realistic complexity and viewpoint variations. To address these issues, we introduce DeclutterSet, a novel dataset featuring diverse scenes with pronounced occlusions distributed across foreground, midground, and background, exhibiting substantial relative motion across viewpoints. We further introduce DeclutterNeRF, an occlusion removal method free from generative priors. DeclutterNeRF combines joint multi-view optimization of learnable camera parameters, occlusion annealing regularization, and an explainable stochastic structural similarity loss, ensuring high-quality, artifact-free reconstructions from incomplete images. Experiments demonstrate that DeclutterNeRF significantly outperforms state-of-the-art methods on our proposed DeclutterSet, establishing a strong baseline for future research.

Paper Structure

This paper contains 25 sections, 14 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Comparison of Mainstream Occlusion Removal Datasets. DeclutterSet is a new dataset reflecting real-world challenges and complexity in occlusion removal. For each dataset, we show four evenly spaced views per scene. As seen in both the RGB images and masks, DeclutterSet exhibits: (i) wider distance distribution, (ii) larger occluded regions, (iii) greater relative motion between viewpoints and occluders, and (iv) more uncertain occluder shapes and mask layouts. In contrast, the OCC-NeRF dataset [zhu2023occlusion] does not employ masks during selection, limiting it to foreground occlusions and requiring a strict separation between foreground and background, reducing its suitability for complex scenarios. SPIn-NeRF [spinnerf] provides limited challenge for cross-view consistency, as it is constrained to small viewpoint variations, keeping occluders and background nearly static across rendered views. A detailed analysis is provided in Sec. \ref{sec:experimental_setup}.
  • Figure 2: Overview of Our Optimization Framework. Our method builds on the NeRF architecture to recover occluded scenes without generative priors. Starting with a single-view SAM segmentation method [yin2023ornerf], we propagate occluder masks across views via stereo matching. Camera parameters are jointly optimized with masked photometric supervision to correct occlusion-induced pose errors (Sec. \ref{methodcam}). To stabilize training and mitigate overfitting to visible regions, we propose Occlusion Annealing Regularization (Sec. \ref{methodann}). The Stochastic Structural Similarity loss (Sec. \ref{methodssim}) enforces global consideration across views and improves reconstruction under long-tail visibility.
  • Figure 3: Visualization of the Impact of Obstacles on Pose Estimation. Structure-from-motion methods, including the widely used COLMAP [schoenberger2016sfm, schoenberger2016mvs] and the recently proposed GLOMAP [pan2024glomap], struggle to maintain stable camera pose estimation after occlusion is removed. This is illustrated in the Ladder scene (left) and the Lamp Post scene (right). Green dashed lines connect corresponding samples before and after occlusion removal, highlighting positional shifts. Axes are rotated for clearer visualization.
  • Figure 4: Visualization of Sampling Distribution. This figure illustrates the principle behind global patched S3IM. Before reorganization, the distribution of pixels exhibits a marked imbalance, which our patch reorganization addresses: the distribution within each patch becomes more concentrated and uniform, eliminating the regional long-tail distribution of pixels and promoting stable model iteration. Darker regions indicate more extreme long-tailed visibility and require targeted optimization.
  • Figure 5: The DeclutterSet. (a) Orchids and (f) Lamp Post illustrate occluders at different distances: in (a), both the buds and flowers lie on the same near-depth plane close to the camera, while in (f), the occluding object is situated farther away in the mid-background; (b) Railing and (c) Statue resemble traditional occlusion and object removal settings commonly found in existing benchmarks; (e) Stone Column and (g) Chain Fence exhibit occlusions that scatter across different image regions as the viewpoint shifts; (d) Ladder and (h) Chair Back feature larger, irregularly shaped occluders and more pronounced viewpoint variations, posing further challenges to cross-view consistency and geometry recovery. Further details are provided in the supplementary material.
  • ...and 5 more figures
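
The captions for Figures 2 and 4 mention masked photometric supervision and a stochastic structural similarity loss computed over reorganized patches. The NumPy sketch below illustrates both ideas under stated assumptions and is not the authors' implementation. The function names, the patch size, the number of repeats, and the use of whole-patch statistics in place of a sliding-window SSIM are all hypothetical simplifications chosen for brevity.

```python
import numpy as np


def masked_photometric_loss(pred, target, occluder_mask):
    """Mean squared error over pixels that are NOT covered by an occluder.

    pred, target: (N, 3) float arrays of RGB values in [0, 1].
    occluder_mask: (N,) bool array, True where an occluder covers the pixel.
    (Hypothetical helper; the masking convention is an assumption.)
    """
    visible = ~occluder_mask
    if not visible.any():
        return 0.0
    diff = pred[visible] - target[visible]
    return float(np.mean(diff ** 2))


def patch_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM from whole-patch statistics (no sliding window)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def stochastic_ssim_loss(pred, target, patch=8, repeats=4, rng=None):
    """1 - mean SSIM over randomly assembled "virtual" patches.

    Pixels are shuffled and regrouped into patches of patch*patch pixels,
    so each patch mixes pixels from distant image regions and views.
    (Assumed parameters: patch=8, repeats=4.)
    """
    rng = np.random.default_rng() if rng is None else rng
    n, channels = pred.shape
    per_patch = patch * patch
    if n < per_patch:
        raise ValueError("need at least patch*patch pixels")
    usable = (n // per_patch) * per_patch
    scores = []
    for _ in range(repeats):
        idx = rng.permutation(n)[:usable]
        p = pred[idx].reshape(-1, per_patch, channels)
        t = target[idx].reshape(-1, per_patch, channels)
        scores.extend(patch_ssim(pp, tt) for pp, tt in zip(p, t))
    return 1.0 - float(np.mean(scores))
```

In a training loop, the two terms would typically be summed with a weighting factor; that weight, like the rest of this sketch, would need to be taken from the paper's method section rather than from this example.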