Bridging Geometry and Appearance: Topological Features for Robust Self-Supervised Segmentation

Kebin Peng; Haotang Li; Zhenyu Qi; Huashan Chen; Zi Wang; Wei Zhang; Sen He; Huanrui Yang; Qing Guo

TL;DR

This work tackles the fragility of monocular depth estimation under challenging, low-visibility conditions by introducing PhysDepth, a plug-and-play framework that fuses geometric cues with physically grounded priors. Central to the approach are the Physical Prior Module (PPM), which extracts robust red-channel features and injects them into the base MDE backbone, and the Red Channel Attenuation Loss (RCA), which leverages the Beer-Lambert law and Rayleigh scattering to supervise depth via $d_R = -\frac{1}{\mu} \ln f(R) + \frac{1}{\mu}(g\lambda - 1)$. The authors demonstrate state-of-the-art performance across RobotCar-Night, nuScenes-Night, and nuScenes-Rain, and show the framework’s plug-and-play benefits across multiple backbones, including during daytime evaluation on KITTI. Their results establish that incorporating physical priors yields more stable, physically meaningful depth estimates than purely data-driven methods, with practical implications for autonomous driving and robotics in adverse conditions. Halos and complex lighting remain open challenges for future work.

Abstract

Self-supervised semantic segmentation methods often fail when faced with appearance ambiguities. We argue that this is due to an over-reliance on unstable, appearance-based features such as shadows, glare, and local textures. We propose GASeg, a novel framework that bridges appearance and geometry by leveraging stable topological information. The core of our method is the Differentiable Box-Counting (DBC) module, which quantifies multi-scale topological statistics from two parallel streams: geometry-based features and appearance-based features. To force the model to learn these stable structural representations, we introduce Topological Augmentation (TopoAug), an adversarial strategy that simulates real-world ambiguities by applying morphological operators to the input images. A multi-objective loss, GALoss, then explicitly enforces cross-modal alignment between the geometry-based and appearance-based features. Extensive experiments demonstrate that GASeg achieves state-of-the-art performance on four benchmarks, including COCO-Stuff, Cityscapes, and PASCAL, validating our approach of bridging geometry and appearance via topological information.
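
To make the idea of multi-scale box-counting statistics more concrete, the following is a minimal sketch of a soft box-counting estimate in plain Python. It is not the authors' DBC module: the function names, the box sizes, and the choice of a per-box maximum over a soft occupancy map are all assumptions made for illustration. In an autograd framework, the per-box maximum would correspond to max pooling, which is what would let gradients flow through the count.

```python
import math


def soft_box_count(prob, box):
    """Soft count of occupied boxes of side `box` (hypothetical helper).

    `prob` is a 2D list of occupancy values in [0, 1]. Each box contributes
    its maximum value, so a box is "counted" in proportion to its strongest
    response rather than by a hard threshold.
    """
    height, width = len(prob), len(prob[0])
    total = 0.0
    for top in range(0, height, box):
        for left in range(0, width, box):
            total += max(
                prob[y][x]
                for y in range(top, min(top + box, height))
                for x in range(left, min(left + box, width))
            )
    return total


def box_counting_dimension(prob, boxes=(1, 2, 4, 8)):
    """Least-squares slope of log N(s) against log(1/s) over box sizes s."""
    xs = [math.log(1.0 / s) for s in boxes]
    # Small epsilon (an assumption) guards against log(0) on empty maps.
    ys = [math.log(soft_box_count(prob, s) + 1e-8) for s in boxes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Two toy occupancy maps: a filled square and a single horizontal line.
filled = [[1.0] * 16 for _ in range(16)]
line = [[1.0 if y == 0 else 0.0 for _ in range(16)] for y in range(16)]

print(box_counting_dimension(filled))  # close to 2 for a filled region
print(box_counting_dimension(line))    # close to 1 for a thin line
```

The two toy inputs illustrate why such a statistic might be considered structural: the estimate depends on how occupancy scales across box sizes, not on the intensity or texture of individual pixels.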
