Cross-Domain Few-Shot Learning with Coalescent Projections and Latent Space Reservation

Naeem Paeedeh, Mahardhika Pratama, Imam Mustafa Kamal, Wolfgang Mayer, Jimmy Cao, Ryszard Kowalczyk

TL;DR

This work tackles cross-domain few-shot learning under extreme domain shift by freezing a pre-trained transformer backbone and introducing two components: Coalescent Projection (CP), a per-head learnable matrix that replaces soft prompts and adapts attention with a lower risk of overfitting, and Latent Space Reservation (LSR), which generates pseudo-novel embeddings and rotated-input augmentations to reserve room in the latent space for unseen domains. The latent-space pseudo-classes repel base-class embeddings and expand decision boundaries, while input-space rotations diversify the base domain; together they improve generalization on the BSCD-FSL benchmark. Theoretical analysis links LSR to tighter generalization bounds through reduced domain discrepancy and more stable class distributions. Empirically, CPLSR consistently outperforms DINO and competitive baselines with Mini-ImageNet and Tiered-ImageNet as base datasets, across four target datasets in 1- and 5-shot settings. The method adds few parameters, and its code is open source.
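As a rough illustration of the input-space part of LSR, the sketch below builds rotated copies of base images with NumPy. It assumes square images and the common self-supervised convention that each rotation of a base class is treated as its own class; the function name `rotate_and_relabel` and the label layout are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def rotate_and_relabel(images, labels, num_classes):
    """Return the images rotated by 0, 90, 180 and 270 degrees.

    images: array of shape (N, H, W, C) with H == W (assumption).
    labels: integer array of shape (N,) with values in [0, num_classes).
    Assumption: rotation k of class c gets the new label c + k * num_classes.
    """
    rotated, new_labels = [], []
    for k in range(4):
        # Rotate in the spatial plane (H, W); batch and channel axes are untouched.
        rotated.append(np.rot90(images, k=k, axes=(1, 2)))
        new_labels.append(labels + k * num_classes)
    return np.concatenate(rotated), np.concatenate(new_labels)

# Toy usage: two 4x4 RGB "images" from a 5-class base set.
images = np.arange(2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)
labels = np.array([0, 1])
aug_images, aug_labels = rotate_and_relabel(images, labels, num_classes=5)
```

Under this labelling the classifier sees four times as many base classes, which is one simple way rotations can diversify the base domain.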

Abstract

Despite the progress in cross-domain few-shot learning, a model pre-trained with DINO combined with a prototypical classifier outperforms the latest SOTA methods. A crucial limitation that needs to be overcome is that updating too many parameters of the transformers leads to overfitting due to the scarcity of labeled samples. To address this challenge, we propose a new concept, coalescent projection, as an effective successor to soft prompts. Additionally, we propose a novel pseudo-class generation method, combined with self-supervised transformations, that relies solely on the base domain to prepare the network to encounter unseen samples from different domains. The proposed method exhibits its effectiveness in comprehensive experiments on the extreme domain-shift problem of the BSCD-FSL benchmark. Our code is published at https://github.com/Naeem-Paeedeh/CPLSR.
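For readers unfamiliar with the prototypical classifier mentioned above, here is a minimal sketch of how one works on top of frozen embeddings. The embeddings are synthetic stand-ins for backbone outputs, cosine similarity is an assumed choice of metric (the abstract does not specify one), and the helper names `class_prototypes` and `classify` are hypothetical.

```python
import numpy as np

def class_prototypes(support_emb, support_labels, n_way):
    """Average the support embeddings of each class into one prototype."""
    return np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(n_way)]
    )

def classify(query_emb, prototypes):
    """Assign each query to the prototype with the highest cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return (q @ p.T).argmax(axis=1)

# Toy 5-way 5-shot episode with well-separated synthetic embeddings.
rng = np.random.default_rng(0)
centers = 10.0 * np.eye(5, 16)                       # one center per class
support = np.repeat(centers, 5, axis=0) + 0.1 * rng.normal(size=(25, 16))
support_labels = np.repeat(np.arange(5), 5)
queries = centers + 0.1 * rng.normal(size=(5, 16))   # one query per class

protos = class_prototypes(support, support_labels, n_way=5)
preds = classify(queries, protos)
```

Because the classifier has no trainable parameters of its own, all adaptation has to come from the embedding function, which is why the number of updated backbone parameters matters in the few-shot regime.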

Paper Structure

This paper contains 26 sections, 3 theorems, 36 equations, 4 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

CP can change the relative ordering of original image tokens, whereas soft prompts cannot.
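One way to read this statement is through a single attention head. The sketch below assumes that soft prompts act as extra key tokens and that CP inserts a learnable d x d matrix C between queries and keys, giving logits Q C K^T; both modelling choices, and the extreme value C = -I, are illustrative assumptions and not the paper's exact formulation. Extra prompt keys only change each row's softmax normalizer, so the attention ranking among the original tokens is preserved, whereas C alters the logits themselves.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def order(attn):
    """Per-query ranking of key tokens by attention weight."""
    return np.argsort(attn, axis=-1)

rng = np.random.default_rng(0)
n, p, d = 6, 3, 4                 # image tokens, prompt tokens, head dimension
Q = rng.normal(size=(n, d))       # queries of the image tokens
K = rng.normal(size=(n, d))       # keys of the image tokens
P = rng.normal(size=(p, d))       # hypothetical soft-prompt keys
scale = np.sqrt(d)

# Plain attention over the original image tokens.
base = softmax(Q @ K.T / scale)

# Soft prompts: extra key columns; keep only the image-token columns.
attn_prompts = softmax(Q @ np.concatenate([P, K], axis=0).T / scale)[:, p:]

# CP (assumed form): logits Q C K^T, here with an extreme illustrative C.
C = -np.eye(d)
attn_cp = softmax(Q @ C @ K.T / scale)

prompts_keep_order = np.array_equal(order(base), order(attn_prompts))
cp_keeps_order = np.array_equal(order(base), order(attn_cp))
```

In this toy setting `prompts_keep_order` is true and `cp_keeps_order` is false. The example covers only one layer; in a deep network, prompts can still influence later layers indirectly through the value pathway.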

Figures (4)

  • Figure 1: Comparison of inductive methods with DINO on BSCD-FSL benchmark.
  • Figure 2: From top, the calculation of the soft prompts, AttnScale, and CP in the attention module. $\odot$ is the Hadamard product.
  • Figure 3: Latent Space Reservation. The arrows show the repulsive forces.
  • Figure 4: UMAP graphs for the Mini-ImageNet and CropDisease datasets.

Theorems & Definitions (4)

  • Theorem 1
  • Corollary 1
  • Theorem 1
  • proof