3DGS-HPC: Distractor-free 3D Gaussian Splatting with Hybrid Patch-wise Classification

Jiahao Chen, Yipeng Qin, Ganlong Zhao, Xin Li, Wenping Wang, Guanbin Li

TL;DR

3DGS-HPC is a framework for distractor-free 3D Gaussian Splatting that circumvents the limitations of semantics-based distractor suppression by combining two complementary principles: a patch-wise classification strategy that leverages local spatial consistency for robust region-level decisions, and a hybrid classification metric that adaptively integrates photometric and perceptual cues for more reliable separation.

Abstract

3D Gaussian Splatting (3DGS) has demonstrated remarkable performance in novel view synthesis and 3D scene reconstruction, yet its quality often degrades in real-world environments due to transient distractors, such as moving objects and varying shadows. Existing methods commonly rely on semantic cues extracted from pre-trained vision models to identify and suppress these distractors, but such semantics are misaligned with the binary distinction between static and transient regions and remain fragile under the appearance perturbations introduced during 3DGS optimization. We propose 3DGS-HPC, a framework that circumvents these limitations by combining two complementary principles: a patch-wise classification strategy that leverages local spatial consistency for robust region-level decisions, and a hybrid classification metric that adaptively integrates photometric and perceptual cues for more reliable separation. Extensive experiments demonstrate the superiority and robustness of our method in mitigating distractors to improve 3DGS-based novel view synthesis.

Paper Structure

This paper contains 22 sections, 7 equations, 17 figures, 3 tables.

Figures (17)

  • Figure 1: Comparison between previous methods and our proposed Hybrid Patch-wise Classification (HPC). When training 3DGS with static scenes disturbed by transient distractors, previous methods utilize image semantic priors but fail to separate transient distractors (e.g., pedestrians and shadows) from static backgrounds (e.g., mountains). By addressing their inherent semantic limitations, our method can generate binary masks that better represent transient distractors, thereby removing related artifacts in rendered results (yellow boxes).
  • Figure 2: Overview of Hybrid Patch-wise Classification (HPC). Given training images disturbed by transient distractors, our method (a) optimizes 3D Gaussian Splatting following standard practice, (b) classifies static and transient regions using a novel hybrid metric that combines the strengths of photometric and perceptual cues, and (c) performs classification at the patch level to exploit local spatial consistency, thereby enhancing robustness and efficiency without relying on external semantics such as segmentation or detection models.
  • Figure 3: Visualization of Different Partitioning Strategies. When grouping pixels into multiple semantically consistent regions with blue borders, existing methods (b-d) fail to faithfully represent transient regions (e.g., pedestrians and shadows) due to semantic mismatch between different tasks, while our patch-wise partition strategy (e) can accurately extract these regions.
  • Figure 4: Visualization of Different Error Metrics. Given training images with distractors and corresponding rendered images, traditional photometric metrics (L1) produce noisy static maps due to low-level appearance ambiguities (e.g., similar colors). Instead, perceptual metrics based on vision foundation models (DINOv2) produce clean static maps with clear boundaries.
  • Figure 5: Comparison of different classification approaches with different error metrics. Bottom row: Although perceptual error metrics are effective in capturing semantic differences, they can produce anomalous results in textureless regions (e.g., green wall and white tablecloth) due to visual perturbations (e.g., blurring) between training and rendered images. Top row: In contrast, while inherently noisier, photometric error metrics are more reliable for estimating the overall proportion of transient pixels within an image, and can thus guide perceptual error metrics toward more accurate classification.
  • ...and 12 more figures
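
The captions for Figures 2 and 5 describe the idea only qualitatively: errors are aggregated per patch, the photometric error estimates what fraction of the image is transient, and that fraction guides the perceptual error in the final classification. The Python sketch below is one possible reading of that description, not the authors' formulation. The function names, the patch size, the fixed photometric threshold `photo_thresh`, and the quantile rule are all assumptions made for illustration. The perceptual error map is taken as an input array; how it would be computed (e.g., from DINOv2 features) is outside the sketch.

```python
import numpy as np


def patch_mean(err, patch):
    """Average a per-pixel (H, W) error map over non-overlapping patch x patch blocks."""
    h, w = err.shape
    gh, gw = h // patch, w // patch
    # Drop any border rows/columns that do not fill a whole patch.
    cropped = err[: gh * patch, : gw * patch]
    return cropped.reshape(gh, patch, gw, patch).mean(axis=(1, 3))


def hybrid_patch_mask(photo_err, percep_err, patch=8, photo_thresh=0.1):
    """Return (transient_mask, ratio) on the patch grid.

    Assumed rule (hypothetical, not taken from the paper):
      1. the photometric error estimates the fraction of transient patches;
      2. the patches with the highest perceptual error, up to that fraction,
         are labelled transient.
    """
    photo_p = patch_mean(photo_err, patch)
    percep_p = patch_mean(percep_err, patch)

    # Step 1: photometric cue -> estimated proportion of transient patches.
    ratio = float((photo_p > photo_thresh).mean())
    if ratio == 0.0:
        return np.zeros(photo_p.shape, dtype=bool), ratio

    # Step 2: perceptual cue -> select patches above the matching quantile.
    cutoff = np.quantile(percep_p, 1.0 - ratio)
    return percep_p >= cutoff, ratio
```

Given two error maps of the same resolution, `hybrid_patch_mask(photo_err, percep_err)` returns a boolean mask with one entry per patch plus the estimated transient proportion. Deciding per patch rather than per pixel is what lets a few noisy pixels be averaged out, which is the local spatial consistency argument made in the abstract.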