Semi Supervised Heterogeneous Domain Adaptation via Disentanglement and Pseudo-Labelling

Cassio F. Dantas, Raffaele Gaetano, Dino Ienco

TL;DR

Semi-supervised heterogeneous domain adaptation (SSHDA) addresses domain shift when source and target data come from different modalities and labels in the target domain are scarce. The authors propose SHeDD, an end-to-end framework that disentangles domain-invariant features from domain-specific information using separate encoders and classifiers, and enforces consistency via pseudo-labeling on unlabeled target samples. The model combines four losses (classification, domain, orthogonality, and pseudo-label consistency) applied across labeled source, labeled target, and unlabeled target data. Experiments on RESISC45-Euro and EuroSat-MS-SAR show that SHeDD consistently outperforms competitive baselines, especially when target labels are limited, supporting the approach's effectiveness for SSHDA in remote sensing.

Abstract

Semi-supervised domain adaptation methods leverage information from a labelled source domain with the goal of generalizing over a scarcely labelled target domain. While this setting already poses challenges due to potential distribution shifts between domains, an even more complex scenario arises when source and target data differ in modality representation (e.g. they are acquired by sensors with different characteristics). For instance, in remote sensing, images may be collected via various acquisition modes (e.g. optical or radar), with different spectral characteristics (e.g. RGB or multi-spectral) and spatial resolutions. Such a setting is denoted as Semi-Supervised Heterogeneous Domain Adaptation (SSHDA), and it exhibits an even more severe distribution shift due to modality heterogeneity across domains. To cope with the challenging SSHDA setting, here we introduce SHeDD (Semi-supervised Heterogeneous Domain Adaptation via Disentanglement), an end-to-end neural framework tailored to learning a target domain classifier by leveraging both labelled and unlabelled data from heterogeneous data sources. SHeDD is designed to effectively disentangle domain-invariant representations, which are relevant for the downstream task, from domain-specific information, which can hinder cross-modality transfer. Additionally, SHeDD adopts an augmentation-based consistency regularization mechanism that takes advantage of reliable pseudo-labels on the unlabelled target samples to further boost its generalization ability on the target domain. Empirical evaluations on two remote sensing benchmarks, encompassing heterogeneous data in terms of acquisition modes and spectral/spatial resolutions, demonstrate the quality of SHeDD compared to both baseline and state-of-the-art competing approaches. Our code is publicly available here: https://github.com/tanodino/SSHDA/
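
As a rough illustration of the pseudo-label consistency mechanism mentioned in the abstract, the sketch below computes such a loss in plain Python. It assumes a FixMatch-style rule: the prediction on the original view of an unlabelled target sample becomes a pseudo-label only if its confidence reaches a threshold, and the prediction on the augmented view is then penalised with cross-entropy against it. The function names, the 0.95 threshold and the toy logits are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def pseudo_label_consistency(logits_orig, logits_aug, threshold=0.95):
    """Cross-entropy of augmented-view predictions against confident pseudo-labels.

    logits_orig / logits_aug: per-sample logits for the original and the
    augmented view of the same unlabelled target samples.
    threshold: assumed confidence cut-off (hypothetical default).
    Returns (loss, number_of_samples_kept).
    """
    total, kept = 0.0, 0
    for orig, aug in zip(logits_orig, logits_aug):
        probs = softmax(orig)
        confidence = max(probs)
        if confidence < threshold:
            continue  # pseudo-label judged unreliable: sample is ignored
        label = probs.index(confidence)
        total += -math.log(softmax(aug)[label])
        kept += 1
    # Average over the whole batch, so ignored samples contribute zero.
    return total / max(len(logits_orig), 1), kept


# Toy batch: the first sample is confident, the second is not.
logits_orig = [[8.0, 0.0, 0.0], [0.2, 0.1, 0.0]]
logits_aug = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
loss, kept = pseudo_label_consistency(logits_orig, logits_aug)
print(f"kept={kept}, loss={loss:.4f}")
```

Averaging over the full batch rather than over the kept samples is one possible design choice; it makes the term shrink automatically when few pseudo-labels pass the threshold.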

Paper Structure (9 sections, 5 equations, 3 figures, 6 tables, 1 algorithm)

This paper contains 9 sections, 5 equations, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: Schematic view of the proposed method architecture with a separate encoder for each of the data modalities (source and target). Feature disentanglement enables domain-specific and domain-invariant information to be encoded separately into each half of the generated embedding vectors (depicted in orange and green respectively). The domain-invariant information ($z^{inv}$) is used by the task classifier, while the domain classifier receives the domain-specific portion of the embedding vector ($z^{spe}$). At inference time, only the bottom part of the architecture is used, the top part being instrumental in the training stage to enable the feature disentanglement procedure.
  • Figure 2: Schematic view of the data flow during the training phase. The four proposed loss terms (framed in grey) are illustrated with their corresponding inputs.
  • Figure 3: Visualization of the embeddings extracted by the competing approaches: (a) Target Only, (b) FixMatch, (c) SS-HIDA and (d) SHeDD, trained on the RESISC45-Euro benchmark with RGB as source and MS as target domain (RGB $\rightarrow$ MS) and with only 25 labelled samples per class in the target domain. For this visual inspection, 50 samples per class are drawn at random from the test set (which comes from the target domain). The two-dimensional representation is obtained via the t-SNE algorithm.
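
The embedding split described in the Figure 1 caption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the first half of each embedding is taken as $z^{inv}$ and the second half as $z^{spe}$ (the caption does not say which half is which), and the orthogonality term is written as the mean squared cosine similarity between the two halves, which is one common formulation and not necessarily the one used in the paper. All function names are hypothetical.

```python
import math


def split_embedding(z):
    """Split an embedding into two equal halves.

    Assumption: the first half plays the role of z_inv and the
    second half the role of z_spe.
    """
    if len(z) % 2 != 0:
        raise ValueError("embedding length must be even")
    half = len(z) // 2
    return z[:half], z[half:]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def orthogonality_penalty(batch):
    """Mean squared cosine similarity between the two halves over a batch.

    The value is 0 when every pair of halves is orthogonal and approaches 1
    when they are parallel (assumed formulation, see lead-in).
    """
    total = 0.0
    for z in batch:
        z_inv, z_spe = split_embedding(z)
        total += cosine_similarity(z_inv, z_spe) ** 2
    return total / len(batch)


batch = [
    [1.0, 0.0, 0.0, 1.0],  # halves (1, 0) and (0, 1): orthogonal
    [1.0, 2.0, 2.0, 4.0],  # halves (1, 2) and (2, 4): parallel
]
penalty = orthogonality_penalty(batch)
print(f"orthogonality penalty: {penalty:.3f}")
```

In such a setup, only the $z^{inv}$ half would be passed to the task classifier and only the $z^{spe}$ half to the domain classifier, as the caption describes.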