Multi-source Domain Adaptation for Panoramic Semantic Segmentation
Jing Jiang, Sicheng Zhao, Jiankun Zhu, Wenbo Tang, Zhaopan Xu, Jidong Yang, Guoping Liu, Tengfei Xing, Pengfei Xu, Hongxun Yao
TL;DR
The paper introduces MSDA4PASS, a multi-source domain adaptation task for panoramic semantic segmentation that leverages labeled real pinhole images and synthetic panoramic data to improve segmentation on unlabeled real panoramas. It presents DTA4PASS, which comprises Unpaired Semantic Morphing (USM), which bridges distortion gaps with a learnable deformation trained without paired panoramic views, and Distortion Gating Alignment (DGA), which bridges texture gaps through pin-like and pan-like feature gating and uncertainty-guided alignment. The approach achieves state-of-the-art results on outdoor and indoor panoramic benchmarks, with strong gains over single-source, multi-source, and panoramic-domain baselines, and ablations support the necessity of both USM and DGA. The work offers a practical pathway to scalable panoramic scene understanding in applications such as autonomous driving and robotics: it exploits readily available pinhole and synthetic panoramic data and avoids heavy reliance on costly real panoramic annotations.
Abstract
Unsupervised domain adaptation methods for panoramic semantic segmentation utilize real pinhole images or low-cost synthetic panoramic images to transfer segmentation models to real panoramic images. However, these methods struggle to understand the panoramic structure when using only real pinhole images, and they lack real-world scene perception when using only synthetic panoramic images. Therefore, in this paper, we propose a new task, Multi-source Domain Adaptation for Panoramic Semantic Segmentation (MSDA4PASS), which leverages both real pinhole and synthetic panoramic images to improve segmentation on unlabeled real panoramic images. There are two key issues in the MSDA4PASS task: (1) distortion gaps between the pinhole and panoramic domains: panoramic images exhibit global and local distortions absent in pinhole images; (2) texture gaps between the source and target domains: scenes and styles differ across domains. To address these two issues, we propose a novel framework, Deformation Transform Aligner for Panoramic Semantic Segmentation (DTA4PASS), which converts all pinhole images in the source domains into distorted images and aligns the source distorted and panoramic images with the target panoramic images. Specifically, DTA4PASS consists of two main components: Unpaired Semantic Morphing (USM) and Distortion Gating Alignment (DGA). First, in USM, the Dual-view Discriminator (DvD) assists in training the diffeomorphic deformation network at both the image and pixel levels, enabling effective deformation of pinhole images without paired panoramic views and thereby alleviating distortion gaps. Second, DGA assigns pinhole-like (pin-like) and panoramic-like (pan-like) features to each image by gating, and aligns these two features through uncertainty estimation, thereby reducing texture gaps.
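
To make the gating idea in DGA more concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a scalar sigmoid gate that splits one feature vector into a pin-like and a pan-like part, and it assumes that "uncertainty" is measured as the normalized entropy of a softmax prediction. The names `gate_features`, `normalized_entropy`, and `alignment_weight` are hypothetical, as are the gate parameters; the abstract does not specify how DGA computes either quantity.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_features(feature, gate_weights, gate_bias=0.0):
    """Split a feature vector into pin-like and pan-like parts.

    Assumption: a single scalar soft gate g is computed from the feature
    itself; g * feature is treated as pin-like, (1 - g) * feature as pan-like.
    """
    score = sum(w * f for w, f in zip(gate_weights, feature)) + gate_bias
    g = sigmoid(score)
    pin_like = [g * f for f in feature]
    pan_like = [(1.0 - g) * f for f in feature]
    return g, pin_like, pan_like


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Entropy divided by log(K): 0 means confident, 1 means uniform."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def alignment_weight(logits):
    """Assumed weighting: confident predictions contribute more to alignment."""
    return 1.0 - normalized_entropy(softmax(logits))


# Usage: the two gated parts always sum back to the original feature.
feature = [1.0, -2.0, 0.5]
g, pin_like, pan_like = gate_features(feature, gate_weights=[0.3, 0.1, -0.4])
weight_confident = alignment_weight([10.0, 0.0, 0.0])
weight_uncertain = alignment_weight([0.0, 0.0, 0.0])
```

Under these assumptions, a uniform prediction receives an alignment weight of zero and a peaked prediction receives a weight close to one, so uncertain pixels would influence the alignment less.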
