Cross-Domain Few-Shot Learning via Adaptive Transformer Networks

Naeem Paeedeh, Mahardhika Pratama, Muhammad Anwar Ma'sum, Wolfgang Mayer, Zehong Cao, Ryszard Kowalczyk

TL;DR

This work tackles cross-domain few-shot learning under large domain shifts by presenting ADAPTER, a bidirectional cross-attention transformer built on a compact transformer backbone. ADAPTER combines a self-supervised representation learning phase using DINO with a subsequent few-shot phase that adapts a target classifier and employs label propagation-based smoothing to counteract small-sample bias. The method demonstrates strong improvements over diverse baselines on the BSCD-FSL benchmark, highlighting the importance of explicit domain alignment via cross-attention in transformer architectures. The results suggest practical impact for rapid adaptation across heterogeneous data sources, with potential extensions to continual cross-domain learning scenarios.

Abstract

Most few-shot learning works assume that the base and target tasks share the same domain, which hinders their practical application. This paper proposes an adaptive transformer network (ADAPTER), a simple but effective solution for cross-domain few-shot learning where there are large domain shifts between the base task and the target task. ADAPTER is built upon the idea of bidirectional cross-attention to learn transferable features between the two domains. The proposed architecture is trained with DINO to produce diverse and less biased features, avoiding the supervision collapse problem. Furthermore, a label smoothing approach is proposed to improve the consistency and reliability of the predictions by also considering the predicted labels of nearby samples in the embedding space. The performance of ADAPTER is rigorously evaluated on the BSCD-FSL benchmarks, where it outperforms prior art by significant margins.
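
The label smoothing idea in the abstract (refining each prediction using the predicted labels of nearby samples in the embedding space) can be illustrated with a generic label propagation step. The sketch below is not the authors' implementation: it assumes a cosine-similarity k-nearest-neighbour graph and the standard closed-form propagation $F = (I - \alpha S)^{-1} Y$, and the function name `smooth_predictions` and the parameters `k` and `alpha` are hypothetical choices made for illustration.

```python
import numpy as np

def smooth_predictions(embeddings, probs, k=2, alpha=0.8):
    """Generic label-propagation smoothing (an assumed formulation, not the paper's).

    embeddings: (n, d) feature vectors of the query samples.
    probs:      (n, c) class probabilities predicted by the classifier.
    Returns an (n, c) array of smoothed class probabilities.
    """
    n = embeddings.shape[0]
    # Cosine similarity between all pairs of samples.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # a sample is not its own neighbour

    # Keep only the k most similar neighbours of each sample.
    idx = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    w = np.zeros((n, n))
    w[rows, cols] = np.maximum(sim[rows, cols], 0.0)
    w = np.maximum(w, w.T)  # make the graph undirected

    # Symmetric normalisation: S = D^{-1/2} W D^{-1/2}.
    d = w.sum(axis=1)
    d[d == 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    s = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Closed-form propagation, then renormalise rows to probabilities.
    f = np.linalg.solve(np.eye(n) - alpha * s, probs)
    f = np.clip(f, 0.0, None)
    return f / f.sum(axis=1, keepdims=True)
```

A larger `alpha` gives neighbours more influence relative to the classifier's own prediction, so a sample whose prediction disagrees with a tight cluster of confident neighbours tends to be pulled toward the cluster's label.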

Paper Structure

This paper contains 16 sections, 8 equations, 2 figures, 5 tables, 1 algorithm.

Figures (2)

  • Figure 1: ADAPTER learns the base task $B$ and the unlabelled samples of the target task $U$ in a self-supervised manner using the DINO method and bidirectional features of the quadruple transformer block. A target classification head is created using a few labelled samples of the target task $S$. The label smoothing procedure via the label propagation method is carried out to refine the model's predictions.
  • Figure 2: t-SNE plots of 10 classes from the test sets of EuroSAT and CropDisease datasets for 2000 samples.
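
The bidirectional cross-attention mentioned in the abstract and in the Figure 1 caption can be sketched as two cross-attention passes: base tokens attend to target tokens, and target tokens attend to base tokens. This is a minimal single-head illustration only. The text above does not specify how the quadruple transformer block is wired, so the residual connections, the absence of layer normalisation and MLP layers, and all names (`cross_attention`, `bidirectional_cross_attention`, the `params` keys) are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: queries come from one domain,
    keys and values come from the other."""
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    return softmax(scores, axis=-1) @ v

def bidirectional_cross_attention(base_tokens, target_tokens, params):
    """Apply cross-attention in both directions with residual connections.

    params is a hypothetical dict holding one (w_q, w_k, w_v) triple per direction.
    """
    base_out = base_tokens + cross_attention(
        base_tokens, target_tokens, *params["base_to_target"])
    target_out = target_tokens + cross_attention(
        target_tokens, base_tokens, *params["target_to_base"])
    return base_out, target_out
```

Each direction keeps its own projection weights in this sketch, so the two domains are free to learn different ways of querying each other. Whether ADAPTER shares or separates these weights is not stated in the text above.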