Learning from Unlabelled Data with Transformers: Domain Adaptation for Semantic Segmentation of High Resolution Aerial Images

Nikolaos Dionelis; Francesco Pro; Luca Maiano; Irene Amerini; Bertrand Le Saux

TL;DR

The paper tackles semantic segmentation on unlabelled Earth Observation data by introducing NEOS, a Transformer-based framework (SegFormer B5) with a two-head architecture that performs domain adaptation to align features across labelled and unlabelled datasets. NEOS optimizes a three-term loss $L = L_0 + \lambda_1 L_1 + \lambda_2 L_2$, where $L_0$ is pixel-wise cross-entropy, $L_1$ is Dice loss, and $L_2$ enforces latent-feature alignment for domain invariance, enabling segmentation on the unlabelled CVUSA data. Evaluations on Potsdam, Vaihingen, and CVUSA show NEOS surpasses baselines (including SAM and related models) in accuracy, F1-score, and IoU, confirming effective domain adaptation under distribution shifts due to scene, sensor, and temporal factors. This work reduces labeling requirements for cross-domain EO segmentation and supports robust deployment across changing acquisition conditions.
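The three-term objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The summary does not give the form of $L_2$, so the alignment term below (squared distance between the mean source and mean target feature vectors) is an assumption chosen for simplicity. The function names, the flattened `(pixels, classes)` layout, and the default weights `lam1` and `lam2` are also hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(probs, labels, eps=1e-8):
    # L_0: mean pixel-wise cross-entropy.
    # probs: (N, C) class probabilities, labels: (N,) integer class ids.
    n = labels.shape[0]
    return float(-np.log(probs[np.arange(n), labels] + eps).mean())

def dice_loss(probs, labels, eps=1e-8):
    # L_1: one minus the soft Dice score, averaged over classes.
    one_hot = np.eye(probs.shape[1])[labels]
    intersection = (probs * one_hot).sum(axis=0)
    denominator = probs.sum(axis=0) + one_hot.sum(axis=0)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return float(1.0 - dice.mean())

def alignment_loss(src_feats, tgt_feats):
    # L_2 (ASSUMED form, not taken from the paper): squared distance
    # between the mean latent feature of each domain.
    diff = src_feats.mean(axis=0) - tgt_feats.mean(axis=0)
    return float(np.dot(diff, diff))

def combined_loss(src_logits, src_labels, src_feats, tgt_feats,
                  lam1=1.0, lam2=1.0):
    # L = L_0 + lam1 * L_1 + lam2 * L_2. Only the labelled source domain
    # contributes to L_0 and L_1; the unlabelled target domain enters
    # through the feature-alignment term alone.
    probs = softmax(src_logits)
    l0 = cross_entropy(probs, src_labels)
    l1 = dice_loss(probs, src_labels)
    l2 = alignment_loss(src_feats, tgt_feats)
    return l0 + lam1 * l1 + lam2 * l2
```

In this sketch the supervised terms never see target labels, which mirrors the setting in which the target data (here, CVUSA) has no ground truth masks. Setting `lam2=0` reduces the objective to ordinary supervised training on the source domain.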

Abstract

Data from satellites or aerial vehicles are, most of the time, unlabelled. Annotating such data accurately is difficult, requires expertise, and is time-consuming. Even if Earth Observation (EO) data were correctly labelled, labels might change over time. Learning from unlabelled data within a semi-supervised learning framework for segmentation of aerial images is challenging. In this paper, we develop a new model for semantic segmentation of unlabelled images, the Non-annotated Earth Observation Semantic Segmentation (NEOS) model. NEOS performs domain adaptation because the target domain does not have ground truth semantic segmentation masks. The distribution inconsistencies between the target and source domains are due to differences in acquisition scenes, environmental conditions, sensors, and times. Our model aligns the learned representations of the different domains to make them coincide. The evaluation results show that NEOS is successful and outperforms other models for semantic segmentation of unlabelled data.

Paper Structure (4 sections, 4 equations, 3 figures)

Introduction
Related Work
Proposed Methodology
Evaluation and Results

Figures (3)

Figure 1: Flowchart of NEOS for semantic segmentation using domain adaptation on datasets with no ground truth labels.
Figure 2: Evaluation of NEOS in accuracy (Acc), F1-score (F1) and IoU on the dataset Potsdam with the class Clutter [12].