
Fusion-then-Distillation: Toward Cross-modal Positive Distillation for Domain Adaptive 3D Semantic Segmentation

Yao Wu, Mingwei Xing, Yachao Zhang, Yuan Xie, Yanyun Qu

TL;DR

This work proposes a novel fusion-then-distillation (FtD++) method that explores cross-modal positive distillation across the source and target domains for 3D semantic segmentation. It also proposes cross-modal debiased pseudo-labeling, which models the uncertainty of pseudo-labels in a self-training manner.

Abstract

In cross-modal unsupervised domain adaptation, a model trained on source-domain data (e.g., synthetic) is adapted to target-domain data (e.g., real-world) without access to target annotations. Previous methods make the cross-modal outputs in each domain mimic each other, which enforces a class probability distribution that agrees across domains. However, they overlook the complementarity brought by heterogeneous fusion in cross-modal learning. In light of this, we propose a novel fusion-then-distillation (FtD++) method to explore cross-modal positive distillation of the source and target domains for 3D semantic segmentation. FtD++ achieves distribution consistency between outputs not only across 2D images and 3D point clouds but also across the source domain and the augmented domain. Specifically, our method contains three key ingredients. First, we present a model-agnostic feature fusion module that generates a cross-modal fusion representation to establish a latent space, in which the two modalities are enforced to reach maximum correlation and complementarity. Second, the proposed cross-modal positive distillation preserves the complete information of the multi-modal input and combines the semantic content of the source domain with the style of the target domain, thereby achieving domain-modality alignment. Finally, cross-modal debiased pseudo-labeling is devised to model the uncertainty of pseudo-labels in a self-training manner. Extensive experiments report state-of-the-art results on several domain adaptive scenarios under unsupervised and semi-supervised settings. Code is available at https://github.com/Barcaaaa/FtD-PlusPlus.
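
The sketch below contrasts vanilla mutual mimicry with the general fusion-then-distillation idea for a single point. It makes several assumptions that are not part of FtD++:

- KL divergence is used as the distillation loss.
- The learned fusion module is replaced by a fixed weighted average of the 2D and 3D logits, computed by the hypothetical helper `fuse_logits`.

It is an illustration only, not the authors' implementation, which is in the linked repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def fuse_logits(logits_2d, logits_3d, weight_2d=0.5):
    """Hypothetical stand-in for a learned fusion module:
    a fixed convex combination of the 2D and 3D logits."""
    return [weight_2d * a + (1.0 - weight_2d) * b
            for a, b in zip(logits_2d, logits_3d)]


def mutual_mimicry_loss(logits_2d, logits_3d):
    """Vanilla cross-modal mimicry: each modality mimics the other."""
    p_2d, p_3d = softmax(logits_2d), softmax(logits_3d)
    return kl_divergence(p_3d, p_2d) + kl_divergence(p_2d, p_3d)


def fusion_then_distillation_loss(logits_2d, logits_3d):
    """Fuse first, then distill the fused distribution into
    each single-modality prediction."""
    p_fused = softmax(fuse_logits(logits_2d, logits_3d))
    p_2d, p_3d = softmax(logits_2d), softmax(logits_3d)
    return kl_divergence(p_fused, p_2d) + kl_divergence(p_fused, p_3d)


if __name__ == "__main__":
    logits_2d = [2.0, 0.5, -1.0]  # illustrative per-point 2D logits
    logits_3d = [0.2, 1.5, -0.5]  # illustrative per-point 3D logits
    print("mimicry:", mutual_mimicry_loss(logits_2d, logits_3d))
    print("fusion-then-distillation:", fusion_then_distillation_loss(logits_2d, logits_3d))
```

In an autograd framework, the fused target would commonly be detached so that gradients flow only into the single-modality branches. That choice is also an assumption here.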

Paper Structure

This paper contains 37 sections, 26 equations, 11 figures, and 8 tables.

Figures (11)

  • Figure 1: Per-class IoU (%) on the "Day→Night" scenario without cross-modal distillation, with vanilla cross-modal distillation, and with our fusion-then-distillation.
  • Figure 2: Comparison of four types of cross-modal UDA methods, from top to bottom: (a) Bridging the cross-modality gap; (b) Bridging the cross-domain gap; (c) Bidirectional Distillation (BFtD-xMUDA); and (d) Cross-modal Positive Distillation, proposed in this work. The core idea of (c) and (d) is to explore whether and how a cross-modal fusion representation facilitates an efficient and robust UDA model.
  • Figure 3: Overview of the FtD++ framework, which contains two learning mechanisms: Modality-Preserving Distillation (MPD) and Domain-Preserving Distillation (DPD). Notably, the Student and Teacher share the same 2D and 3D backbones, but the Teacher's parameters are updated from the Student's counterparts via Exponential Moving Average (EMA) after each training iteration. Here, "Only S" means that only the source prediction is used for DPD.
  • Figure 4: The diagram of MFFM with memorized modality attention modules.
  • Figure 5: Diagram of xDPL for the target domain in the self-training stage. Fuzzy regions denote high prediction variance.
  • ...and 6 more figures
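
Figures 3 and 5 mention two mechanisms: a Teacher updated by EMA and the use of prediction variance in xDPL. The sketch below illustrates both under assumptions that are not part of the paper:

- Parameters are a plain dict of floats.
- The momentum of 0.999 and both thresholds are hypothetical values.
- The pseudo-labeling rule keeps a label only when the averaged confidence is high and the variance across views is low. This is a generic uncertainty filter, not the paper's xDPL formulation.

```python
from statistics import mean, pvariance

IGNORE_INDEX = -1  # hypothetical marker for "no pseudo-label"


def ema_update(teacher, student, momentum=0.999):
    """EMA update in place: teacher <- m * teacher + (1 - m) * student."""
    for name, s_value in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * s_value
    return teacher


def debiased_pseudo_label(view_probs, conf_threshold=0.9, var_threshold=0.01):
    """Hypothetical uncertainty-aware pseudo-labeling for one point.

    view_probs: list of per-class probability vectors, one per view.
    Returns (label, variance); label is IGNORE_INDEX when the averaged
    confidence is too low or the views disagree too much.
    """
    num_classes = len(view_probs[0])
    avg = [mean(p[c] for p in view_probs) for c in range(num_classes)]
    label = max(range(num_classes), key=lambda c: avg[c])
    # Variance of the chosen class's probability across views.
    var = pvariance([p[label] for p in view_probs])
    if avg[label] >= conf_threshold and var <= var_threshold:
        return label, var
    return IGNORE_INDEX, var


if __name__ == "__main__":
    teacher = {"w": 1.0}
    ema_update(teacher, {"w": 0.0}, momentum=0.9)
    print(teacher)
    print(debiased_pseudo_label([[0.95, 0.05], [0.96, 0.04]]))
    print(debiased_pseudo_label([[0.99, 0.01], [0.40, 0.60]]))
```

Each "view" could be, for example, a Teacher prediction from the 2D branch, the 3D branch, or an augmented input. Which views and thresholds FtD++ actually uses is an assumption in this sketch.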