Denoise and Align: Towards Source-Free UDA for Robust Panoramic Semantic Segmentation

Yaowen Chang, Zhen Cao, Xu Zheng, Xiaoxin Mi, Zhen Dong

Abstract

Panoramic semantic segmentation is pivotal for comprehensive 360° scene understanding in critical applications like autonomous driving and virtual reality. However, progress in this domain is constrained by two key challenges: the severe geometric distortions inherent in panoramic projections and the prohibitive cost of dense annotation. While Unsupervised Domain Adaptation (UDA) from label-rich pinhole-camera datasets offers a viable alternative, many real-world tasks impose a stricter source-free (SFUDA) constraint where source data is inaccessible for privacy or proprietary reasons. This constraint significantly amplifies the core problems of domain shift, leading to unreliable pseudo-labels and dramatic performance degradation, particularly for minority classes. To overcome these limitations, we propose the DAPASS framework. DAPASS introduces two synergistic modules to robustly transfer knowledge without source data. First, our Panoramic Confidence-Guided Denoising (PCGD) module generates high-fidelity, class-balanced pseudo-labels by enforcing perturbation consistency and incorporating neighborhood-level confidence to filter noise. Second, a Contextual Resolution Adversarial Module (CRAM) explicitly addresses scale variance and distortion by adversarially aligning fine-grained details from high-resolution crops with global semantics from low-resolution contexts. DAPASS achieves state-of-the-art performances on outdoor (Cityscapes-to-DensePASS) and indoor (Stanford2D3D) benchmarks, yielding 55.04% (+2.05%) and 70.38% (+1.54%) mIoU, respectively.
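The abstract's description of PCGD — enforcing perturbation consistency and using neighborhood-level confidence to filter noisy pseudo-labels — can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the function name, the square averaging window, and the threshold value are all assumptions made for the example.

```python
import numpy as np

def denoise_pseudo_labels(probs, probs_aug, conf_thresh=0.8, k=1):
    """Illustrative confidence-guided pseudo-label filter (not the exact PCGD).

    probs, probs_aug: (H, W, C) softmax outputs for an image and a
    perturbed view of the same image.
    Returns an (H, W) label map with -1 marking pixels rejected as noisy.
    """
    labels = probs.argmax(-1)
    labels_aug = probs_aug.argmax(-1)
    conf = probs.max(-1)

    # Neighborhood-level confidence: mean confidence over a (2k+1)^2 window,
    # so isolated high-confidence pixels in noisy regions are still rejected.
    H, W = conf.shape
    padded = np.pad(conf, k, mode="edge")
    neigh = np.zeros_like(conf)
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            neigh += padded[k + dy : k + dy + H, k + dx : k + dx + W]
    neigh /= (2 * k + 1) ** 2

    # Keep a pixel only if (i) both views agree (perturbation consistency)
    # and (ii) its neighborhood is confident on average.
    keep = (labels == labels_aug) & (neigh >= conf_thresh)
    return np.where(keep, labels, -1)
```

In a self-training loop, the surviving pixels would supervise the student model while `-1` pixels are ignored by the loss.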

Paper Structure

This paper contains 13 sections, 6 equations, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Teaser of our DAPASS. Compared with the source-only baseline (training the model only on pinhole source-domain images, without adaptation) and the existing SOTA SFUDA method zheng2024360sfuda++, DAPASS effectively removes pseudo-label noise and recovers fine details across classes, benefiting from its PCGD and CRAM modules. Performance improvements are shown across cross-domain tasks.
  • Figure 2: An overview of the proposed DAPASS. The framework consists of two key modules: a Panoramic Confidence-Guided Denoising (PCGD) module to filter noisy pseudo-labels, and a Contextual Resolution Adversarial Module (CRAM) to enhance detail-level segmentation by combining high-resolution (HR) and low-resolution (LR) contexts.
  • Figure 3: Illustration of the proposed Panoramic Confidence-Guided Denoising (PCGD) module.
  • Figure 4: Visualization results on C-to-D scenario. From top to bottom: input image, Source-Only, SFDA, 360SFUDA++ zheng2024360sfuda++, our DAPASS, and ground truth (GT). Compared to prior methods, DAPASS yields more accurate boundaries, reduces noise, and better recognizes occluded objects (highlighted in dashed boxes).
  • Figure 5: Visualization results on Spin-to-Span task. From left to right: input image, 360SFUDA++ zheng2024360sfuda++, our DAPASS, and GT.