Bridging Geometry and Appearance: Topological Features for Robust Self-Supervised Segmentation

Kebin Peng, Haotang Li, Zhenyu Qi, Huashan Chen, Zi Wang, Wei Zhang, Sen He, Huanrui Yang, Qing Guo

TL;DR

This work tackles the fragility of monocular depth estimation (MDE) under challenging, low-visibility conditions by introducing PhysDepth, a plug-and-play framework that fuses geometric cues with physically grounded priors. Central to the approach are the Physical Prior Module (PPM), which extracts robust red-channel features and injects them into the base MDE backbone, and the Red Channel Attenuation Loss (RCA), which leverages the Beer-Lambert law and Rayleigh scattering to supervise depth via $d_R = -\frac{1}{\mu} \ln f(R) + \frac{1}{\mu}(g\lambda - 1)$. The authors report state-of-the-art performance on RobotCar-Night, nuScenes-Night, and nuScenes-Rain, and show that the framework's plug-and-play benefits carry across multiple backbones, including in daytime evaluation on KITTI. Their results indicate that incorporating physical priors yields more stable, physically meaningful depth estimates than purely data-driven methods, with practical implications for autonomous driving and robotics in adverse conditions. Halos and complex lighting remain challenges for future work.
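As a rough illustration of the depth expression quoted above, the following sketch simply evaluates $d_R$ for scalar inputs. The summary does not define the symbols, so the readings used here are assumptions: `f_r` stands for the value of $f(R)$ at a pixel, `mu` for an attenuation coefficient, and `g` and `lam` for the remaining two factors. The function name `red_channel_depth` is hypothetical and is not taken from the paper.

```python
import math


def red_channel_depth(f_r: float, mu: float, g: float, lam: float) -> float:
    """Evaluate d_R = -(1/mu) * ln f(R) + (1/mu) * (g * lam - 1).

    All parameter meanings are assumptions made for illustration:
    f_r is the value of f(R), mu an attenuation coefficient, and
    g, lam the two factors in the constant offset term.
    """
    if f_r <= 0.0:
        raise ValueError("f_r must be positive so that ln f(R) is defined")
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    # Both terms share the 1/mu factor, so it is applied once.
    return (-math.log(f_r) + (g * lam - 1.0)) / mu


if __name__ == "__main__":
    # A weaker (more attenuated) red response maps to a larger depth.
    for f_r in (0.9, 0.5, 0.1):
        print(f_r, red_channel_depth(f_r, mu=0.5, g=1.0, lam=1.0))
```

Under these assumptions the expression is monotonically decreasing in $f(R)$, which matches the Beer-Lambert intuition that a signal attenuated more strongly has travelled farther.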

Abstract

Self-supervised semantic segmentation methods often fail when faced with appearance ambiguities. We argue that this is due to an over-reliance on unstable, appearance-based features such as shadows, glare, and local textures. We propose GASeg, a novel framework that bridges appearance and geometry by leveraging stable topological information. The core of our method is the Differentiable Box-Counting (DBC) module, which quantifies multi-scale topological statistics from two parallel streams: geometry-based features and appearance-based features. To force the model to learn these stable structural representations, we introduce Topological Augmentation (TopoAug), an adversarial strategy that simulates real-world ambiguities by applying morphological operators to the input images. A multi-objective loss, GALoss, then explicitly enforces cross-modal alignment between geometry-based and appearance-based features. Extensive experiments demonstrate that GASeg achieves state-of-the-art performance on four benchmarks, including COCO-Stuff, Cityscapes, and PASCAL, validating our approach of bridging geometry and appearance via topological information.
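To make the box-counting idea in the abstract concrete, here is a minimal sketch of the classical multi-scale box-counting statistic on a 2D map with values in [0, 1]. It is not the authors' DBC module: the function names, the box sizes, and the use of a per-box maximum as a soft occupancy are all assumptions chosen for illustration, and plain Python lists stand in for feature tensors.

```python
import math


def box_counts(mask, box_sizes=(1, 2, 4, 8)):
    """Sum of soft occupancies over non-overlapping s x s boxes, per size s.

    mask is a list of equal-length rows with values in [0, 1].
    The occupancy of a box is the maximum value inside it (an assumption).
    """
    h, w = len(mask), len(mask[0])
    counts = []
    for s in box_sizes:
        total = 0.0
        # Rows and columns that do not fill a whole box are ignored.
        for top in range(0, h - h % s, s):
            for left in range(0, w - w % s, s):
                total += max(
                    mask[r][c]
                    for r in range(top, top + s)
                    for c in range(left, left + s)
                )
        counts.append(total)
    return counts


def box_counting_dimension(mask, box_sizes=(1, 2, 4, 8), eps=1e-8):
    """Least-squares slope of log N(s) against log(1/s)."""
    counts = box_counts(mask, box_sizes)
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(n + eps) for n in counts]  # eps guards against log(0)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    filled = [[1.0] * 16 for _ in range(16)]
    line = [[1.0 if r == 8 else 0.0 for _ in range(16)] for r in range(16)]
    print(box_counting_dimension(filled))  # close to 2 for a filled region
    print(box_counting_dimension(line))    # close to 1 for a straight line
```

In a trainable setting the per-box maximum would typically be computed with a pooling operation on tensors so that gradients can flow through the statistic; that step is omitted here to keep the sketch dependency-free.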

Paper Structure

This paper contains 22 sections, 9 equations, 9 figures, 20 tables.

Figures (9)

  • Figure 1: Plug-and-play use of PhysDepth. Adding PhysDepth to baselines such as MonoDepth2 [godard2019digging], md4ALL [gasperini2023robust], and RNW [wang2021regularizing] effectively corrects common failure modes.
  • Figure 2: Quantitative Analysis of Robustness to Atmospheric Attenuation. Comparison of model robustness against increasing atmospheric attenuation ($\beta$). Our method (purple line) achieves the lowest Absolute Relative Error (AbsRel) and the flattest performance curve.
  • Figure 3: The PhysDepth architecture has three parts: (a) the Physical Prior Module (PPM), which uses a ConvNeXt backbone to extract hierarchical features from the red channels ($R_t$, $R_{t-1}$); (b) the Base MDE Model, which uses a ViT encoder and a decoder to predict depth from the full image, with the PPM's features fused into the base decoder at multiple scales; (c) the Loss Function, which includes our RCA Loss that supervises the PPM.
  • Figure 4: Qualitative Results - RobotCar-Night: Depth estimated on five test images (row: Inputs) using our model (row: PhysDepth) and three SOTA methods (remaining rows) for comparison.
  • Figure 5: Qualitative Results - nuScenes-Rain and nuScenes-Night: Depth estimated on different test images using PhysDepth and two SOTA methods for comparison.
  • ...and 4 more figures