Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning

Rashindrie Perera, Saman Halgamuge

TL;DR

This work tackles cross-domain few-shot learning with a parameter-efficient adaptation mechanism and a discriminative, sample-guided loss that shapes the feature space. It uses Masked Image Modelling pre-training for task-agnostic representation learning and attaches lightweight linear adapters to only a limited number of layers (the tuning depth), which significantly reduces the number of trainable parameters. A proxy-anchor-based loss weights hard positive and hard negative examples more strongly to improve inter-class separation and intra-class compactness, while multi-layer feature fusion enriches the representation. Empirically, the approach achieves state-of-the-art results on Meta-Dataset, with accuracy gains on both seen and unseen domains, while being at least $\sim3\times$ more parameter-efficient than existing methods. The result is a framework for adapting pre-trained backbones to new tasks with limited labelled data.

Abstract

In this paper, we look at cross-domain few-shot classification, which presents the challenging task of learning new classes in previously unseen domains with few labelled examples. Existing methods, though somewhat effective, encounter several limitations, which we alleviate through two significant improvements. First, we introduce a lightweight parameter-efficient adaptation strategy to address the overfitting associated with fine-tuning a large number of parameters on small datasets. This strategy employs a linear transformation of pre-trained features, significantly reducing the trainable parameter count. Second, we replace the traditional nearest centroid classifier with a discriminative sample-aware loss function, enhancing the model's sensitivity to the inter- and intra-class variances within the training set for improved clustering in feature space. Empirical evaluations on the Meta-Dataset benchmark show that our approach not only improves accuracy by up to 7.7\% and 5.3\% on previously seen and unseen datasets, respectively, but also does so while being at least $\sim3\times$ more parameter-efficient than existing methods, establishing a new state-of-the-art in cross-domain few-shot learning. Our code is available at https://github.com/rashindrie/DIPA.
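
To make the parameter saving concrete, the following is a minimal sketch, not the authors' implementation. It assumes the linear transformation is an element-wise scale and shift, matching the $(\gamma_m, \beta_m)$ pairs named in the Figure 2 caption. The helper names, the embedding width of 384, the tuning depth of 9 and the identity initialisation are all hypothetical choices made for illustration.

```python
import numpy as np


def linear_adapt(features, gamma, beta):
    """Element-wise scale and shift of frozen features: gamma * x + beta.

    Assumption: the adapter is a per-channel affine map; the backbone
    weights that produced `features` stay frozen.
    """
    return features * gamma + beta


def count_adapter_params(embed_dim, adapters_per_layer, tuning_depth):
    """Count trainable values: one scale and one shift vector per adapter."""
    return 2 * embed_dim * adapters_per_layer * tuning_depth


embed_dim = 384  # hypothetical embedding width, not taken from the paper
rng = np.random.default_rng(0)
tokens = rng.normal(size=(197, embed_dim))  # stand-in for frozen token features

# Assumed identity initialisation: the adapted model starts out
# identical to the pre-trained one.
gamma = np.ones(embed_dim)
beta = np.zeros(embed_dim)
adapted = linear_adapt(tokens, gamma, beta)

# Six (gamma, beta) pairs per layer follows m = 1..6 in the Figure 2
# caption; the tuning depth of 9 is a placeholder value.
n_params = count_adapter_params(embed_dim, adapters_per_layer=6, tuning_depth=9)
print(n_params)
```

Under these assumptions the trainable parameter count grows linearly with the tuning depth and the embedding width, which is why restricting adaptation to a limited depth keeps the count small compared with fine-tuning full weight matrices.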

Paper Structure (36 sections, 4 equations, 8 figures, 12 tables)


Figures (8)

  • Figure 1: Visualization of feature space adaptation using the support set by employing (a) NCC and (b) our approach for a few-shot classification task on ImageNet (Russakovsky et al., 2015). The $\Delta$ represents class centroids. The clusters formed by NCC are located close to each other, potentially generating confusing class centroids. In contrast, clusters formed in our approach are well separated.
  • Figure 2: Illustration of our framework during meta-testing. (a) A set of task-specific parameters $(\gamma_m, \beta_m) \in h_{\psi}$ are attached to a ViT backbone $f_{\theta}$, up to a pre-defined tuning depth $d_t$, where $m=1,\dots,6$. (b) $h_{\psi}$ is fine-tuned on the support set using a set of learnable anchors $A_{\phi}$. (c) Query images are classified by assigning them to the nearest class centre using the fine-tuned model $f_{\hat{\theta}}$. $L$: Number of layers in the ViT, $d_f$: feature fusion depth, $z_l$: $[cls]$ output from the $l$th ViT layer, $Z$: fused feature embedding, Norm: Layer Normalization, MLP: Multi-Layer Perceptron, and MHA: Multi-head Attention.
  • Figure 3: Loss gradient calculation during training. The example illustrates a scenario with three unique classes red, green and blue, denoted by $r, g, b$. The anchor of each class is coloured in black and denoted as $a_{class}$. (a) The gradients for positive samples of class $r$ are computed based on the relative hardness of all positive samples, so as to pull harder positives more strongly (thicker black lines). (b) The gradient calculation for negative samples for class $r$ considers the distribution of all negative samples and tries to push harder negatives more strongly (thicker black lines).
  • Figure 4: Variation of accuracies for each dataset in Meta-Dataset as $d_t$ varies in the MDL setting. Average results are reported, while more detailed results can be found in Supplementary Table \ref{appendix_tab:tuning_depth_datasets}. The dotted lines represent the relatively more challenging datasets. For each dataset, the value of $d_t$ that reports the highest accuracy is annotated with a dot.
  • Figure 5: Variation of average seen, unseen and total average accuracies across all domains in Meta-Dataset as the number of tuned layers $d_t$ varies in the MDL setting. Average results are reported, and more detailed results can be found in Supplementary Table \ref{appendix_tab:tuning_depth_datasets}. The value of $d_t$ that reports the highest average for each line is annotated with a dot.
  • ...and 3 more figures
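
The hardness-weighted pulling and pushing described in the Figure 3 caption can be illustrated with a small sketch. It assumes the standard Proxy-Anchor formulation of Kim et al. (2020) with cosine similarity; the paper's exact variant may differ. The function name and the values `alpha=32.0` and `delta=0.1` are assumptions for illustration and are not taken from this paper.

```python
import numpy as np


def proxy_anchor_loss(embeddings, labels, anchors, alpha=32.0, delta=0.1):
    """Proxy-Anchor style loss with one learnable anchor per class.

    Assumed formulation (standard Proxy-Anchor, not confirmed as the
    paper's exact loss). alpha scales similarities, delta is a margin.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    sim = e @ a.T  # shape: (n_samples, n_classes)

    n_classes = anchors.shape[0]
    pos_mask = labels[:, None] == np.arange(n_classes)[None, :]
    neg_mask = ~pos_mask

    # Low-similarity positives and high-similarity negatives are "hard"
    # and contribute the largest exponentials.
    pos_exp = np.where(pos_mask, np.exp(-alpha * (sim - delta)), 0.0)
    neg_exp = np.where(neg_mask, np.exp(alpha * (sim + delta)), 0.0)

    # Average the positive term over anchors that have at least one
    # positive sample, and the negative term over all anchors.
    has_pos = pos_mask.any(axis=0)
    pos_term = np.log1p(pos_exp.sum(axis=0))[has_pos].sum() / max(has_pos.sum(), 1)
    neg_term = np.log1p(neg_exp.sum(axis=0)).sum() / n_classes
    return pos_term + neg_term


# Toy support set: three classes, one sample each, anchors on the axes.
anchors = np.eye(3)
support = np.eye(3)
aligned = proxy_anchor_loss(support, np.array([0, 1, 2]), anchors)
misaligned = proxy_anchor_loss(support, np.array([1, 2, 0]), anchors)
print(aligned < misaligned)
```

Because each term is a log of a sum of exponentials, its gradient with respect to a sample's similarity is proportional to that sample's share of the sum. Harder positives and harder negatives therefore receive larger gradients, which corresponds to the thicker lines in the figure.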