360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes

Xu Zheng, Pengyuan Zhou, Athanasios V. Vasilakos, Lin Wang

TL;DR

This work tackles source-free unsupervised domain adaptation for panoramic semantic segmentation, transferring knowledge from a pinhole-trained source model to unlabeled 360-degree panoramas. It leverages multi-projection knowledge extraction (TP and FFP) and introduces RP$^2$AM for reliable panoramic prototype adaptation, complemented by CDAM for cross-projection feature alignment. The framework demonstrates substantial gains over prior SFUDA methods and achieves competitive performance with some UDA methods that access source data, across Cityscapes-to-DensePASS, SynPASS-to-DensePASS, and Stanford indoor panoramas. The proposed approach advances practical panoramic segmentation under privacy and data-access constraints by robustly bridging semantic, distortion, and style gaps between domains.

Abstract

In this paper, we address the challenging problem of source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation, given only a model pre-trained on pinhole images (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is non-trivial due to three critical challenges: 1) semantic mismatches caused by the distinct Field-of-View (FoV) between domains, 2) style discrepancies inherent in the UDA problem, and 3) inevitable distortion of the panoramic images. To tackle these problems, we propose 360SFUDA++, which effectively extracts knowledge from the source pinhole model with only unlabeled panoramic images and transfers the reliable knowledge to the target panoramic domain. Specifically, we first utilize Tangent Projection (TP), as it has less distortion, and meanwhile split the equirectangular projection (ERP) into patches with fixed FoV projection (FFP) to mimic the pinhole images. Both projections are shown to be effective in extracting knowledge from the source model. However, as the distinct projections make it difficult to directly transfer knowledge between domains, we then propose the Reliable Panoramic Prototype Adaptation Module (RP$^2$AM) to transfer knowledge at both the prediction and prototype levels. RP$^2$AM selects the confident knowledge and integrates panoramic prototypes for reliable knowledge adaptation. Moreover, we introduce the Cross-projection Dual Attention Module (CDAM), which better aligns the spatial and channel characteristics across projections at the feature level between domains. Both the knowledge extraction and transfer processes are synchronously updated to reach the best performance. Extensive experiments on synthetic and real-world benchmarks, including outdoor and indoor scenarios, demonstrate that our 360SFUDA++ achieves significantly better performance than prior SFUDA methods.
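To make the idea of selecting confident knowledge and building category prototypes more concrete, the following is a minimal NumPy sketch of confidence-filtered class prototypes computed by masked average pooling. It is an illustration under stated assumptions, not the RP$^2$AM implementation: the function name `reliable_prototypes`, the fixed confidence `threshold`, and the array shapes are all hypothetical choices that do not come from the paper.

```python
import numpy as np


def reliable_prototypes(features, probs, threshold=0.9):
    """Average the features of confidently predicted pixels, per class.

    Hypothetical helper for illustration only (not the paper's code).
    features: (C, H, W) feature map for one image or projection patch.
    probs:    (K, H, W) softmax probabilities over K classes.
    Returns (prototypes, valid): prototypes has shape (K, C); valid[k]
    is False when no pixel of class k passed the confidence threshold.
    """
    num_classes = probs.shape[0]
    channels = features.shape[0]

    flat_feat = features.reshape(channels, -1).T   # (N, C), one row per pixel
    flat_prob = probs.reshape(num_classes, -1)     # (K, N)

    labels = flat_prob.argmax(axis=0)              # pseudo-label per pixel
    confidence = flat_prob.max(axis=0)             # confidence per pixel
    keep = confidence >= threshold                 # assumed fixed threshold

    prototypes = np.zeros((num_classes, channels), dtype=flat_feat.dtype)
    valid = np.zeros(num_classes, dtype=bool)
    for k in range(num_classes):
        mask = keep & (labels == k)
        if mask.any():
            prototypes[k] = flat_feat[mask].mean(axis=0)
            valid[k] = True
    return prototypes, valid


# Toy example: 2 feature channels, 2 classes, a 2x2 image.
features = np.array([[[1.0, 3.0], [5.0, 7.0]],
                     [[2.0, 4.0], [6.0, 8.0]]])
probs = np.array([[[0.95, 0.60], [0.05, 0.02]],
                  [[0.05, 0.40], [0.95, 0.98]]])
protos, valid = reliable_prototypes(features, probs, threshold=0.9)
print(protos, valid)
```

In this toy input the second pixel (confidence 0.60) is excluded, so the class-0 prototype comes from a single pixel. A fuller pipeline would presumably compute such prototypes separately for the TP and FFP branches before combining them, but how the paper integrates them is not specified in the abstract.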

Paper Structure

This paper contains 22 sections, 13 equations, 10 figures, 7 tables, and 1 algorithm.

Figures (10)

  • Figure 1: (a) Performance comparison between zheng2024semantics and 360SFUDA++; (b) Prototype comparison between zheng2024semantics and 360SFUDA++ on outdoor C-to-D scenario, different colors stand for different categories.
  • Figure 2: Overall framework of our proposed 360SFUDA++.
  • Figure 3: Illustration of the cross-projection pixel-wise prediction assessment.
  • Figure 4: Illustration of the cross-projection prototype extraction.
  • Figure 5: Visualization results on C-to-D scenario. (a) source, (b) SFDA liu2021source, (c) DATC yang2022source, (d) 360SFUDA, (e) 360SFUDA++, (f) Ground Truth (GT).
  • ...and 5 more figures