Few and Fewer: Learning Better from Few Examples Using Fewer Base Classes

Raphael Lafargue; Yassir Bendou; Bastien Pasdeloup; Jean-Philippe Diguet; Ian Reid; Vincent Gripon; Jack Valmadre

TL;DR

This work tackles cross-domain few-shot learning by fine-tuning a pre-trained feature extractor on a subset of base classes to reduce domain mismatch, which improves downstream separability for a simple nearest-class-mean (NCM) classifier. It introduces Domain-Informed (DI), Task-Informed (TI), and Uninformed (UI) settings for selecting base-class subsets, together with practical selection strategies (Average Activation (AA) and UOT) and a static library of specialized extractors built via Ward clustering. Empirical results across eight Meta-Dataset domains show that DI (and, to a lesser extent, TI) consistently boosts accuracy, while UI approaches that pick an extractor with robust heuristics (SSA, MCS) also yield meaningful gains without task labels. The findings suggest that adapting embeddings to the task, rather than relying on a universal feature extractor, can significantly improve few-shot performance; the paper provides a concrete framework and code to reproduce these gains.
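The static library is said to be built via Ward clustering, but the summary does not say which features are clustered or which subsets are kept. As a minimal sketch, assuming each base class is represented by the mean of its feature vectors and that every merged node of the Ward dendrogram with at least `min_size` classes becomes one candidate base-class subset (one extractor would then be fine-tuned per subset), the clustering step could look like this. The names `ward_library`, `ward_cost`, and `min_size` are hypothetical.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_cost(a, b):
    """Ward's criterion: increase in within-cluster sum of squares if a and b merge."""
    (_, ca, na), (_, cb, nb) = a, b
    return na * nb / (na + nb) * sq_dist(ca, cb)


def ward_library(class_means, min_size=2):
    """Agglomerative Ward clustering over class-mean vectors (hypothetical helper).

    class_means: dict mapping a class name to its mean feature vector.
    Returns every merged cluster (sorted class names) with >= min_size classes,
    in merge order; each one is a candidate base-class subset.
    """
    # A cluster is (members, centroid, size); start with one cluster per class.
    clusters = [([name], list(vec), 1) for name, vec in class_means.items()]
    library = []
    while len(clusters) > 1:
        # Find the pair whose merge increases the within-cluster variance least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]),
        )
        (ma, ca, na), (mb, cb, nb) = clusters[i], clusters[j]
        n = na + nb
        centroid = [(na * x + nb * y) / n for x, y in zip(ca, cb)]
        # combinations yields i < j, so delete j first to keep index i valid.
        del clusters[j]
        del clusters[i]
        clusters.append((ma + mb, centroid, n))
        if n >= min_size:
            library.append(sorted(ma + mb))
    return library


toy = {"cat": [0.0, 0.0], "dog": [0.1, 0.0], "car": [5.0, 5.0], "truck": [5.2, 5.0]}
print(ward_library(toy))
```

On the toy input, the two tight pairs merge first and the root cluster comes last, so the library holds one subset per level of the hierarchy. Raising `min_size` discards small, highly specialized subsets.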

Abstract

When training data is scarce, it is common to make use of a feature extractor that has been pre-trained on a large base dataset, either by fine-tuning its parameters on the "target" dataset or by directly adopting its representation as features for a simple classifier. Fine-tuning is ineffective for few-shot learning, since the target dataset contains only a handful of examples. However, directly adopting the features without fine-tuning relies on the base and target distributions being similar enough that these features achieve separability and generalization. This paper investigates whether better features for the target dataset can be obtained by training on fewer base classes, seeking to identify a more useful base dataset for a given task. We consider cross-domain few-shot image classification in eight different domains from Meta-Dataset and entertain multiple real-world settings (domain-informed, task-informed and uninformed) where progressively less detail is known about the target task. To our knowledge, this is the first demonstration that fine-tuning on a subset of carefully selected base classes can significantly improve few-shot learning. Our contributions are simple and intuitive methods that can be implemented in any few-shot solution. We also give insights into the conditions in which these solutions are likely to provide a boost in accuracy. We release the code to reproduce all experiments from this paper on GitHub: https://github.com/RafLaf/Few-and-Fewer.git
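The "simple classifier" applied on top of the extracted features is a nearest-class-mean (NCM) classifier: each class is summarized by the mean of its support features, and a query gets the label of the closest mean. A minimal sketch, assuming features have already been computed by some extractor (frozen, or fine-tuned on a selected base-class subset) and that plain Euclidean distance is used; the helper names `class_means` and `ncm_predict` are hypothetical, and real pipelines often add feature normalization that is omitted here.

```python
import math


def class_means(support):
    """support: list of (feature_vector, label) pairs. Returns {label: mean vector}."""
    sums, counts = {}, {}
    for vec, label in support:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for k, x in enumerate(vec):
            acc[k] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def ncm_predict(means, query):
    """Assign the query to the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(means[label], query))


support = [([0.0, 0.0], "a"), ([0.2, 0.0], "a"), ([1.0, 1.0], "b"), ([1.2, 1.0], "b")]
means = class_means(support)
print(ncm_predict(means, [0.0, 0.1]), ncm_predict(means, [1.0, 0.9]))
```

Because NCM has no trainable parameters, its accuracy depends entirely on how well the extractor separates the target classes, which is why the choice of base classes matters.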

Paper Structure (37 sections, 4 equations, 21 figures, 8 tables, 1 algorithm)

Introduction
Background and related work
Feature extractors for fewer base classes
Formulation
Choosing class subsets: Informed settings
Choosing class subsets: Uninformed setting
Heuristics for selecting a feature extractor
Experiments
Effect of informed class selection
Uninformed setting
Implementation details
Discussion
Conclusion
Appendix
Impact of learning rate on fine-tuning (DI selection)
...and 22 more sections

Figures (21)

Figure 1: Difference in accuracy relative to the baseline after selecting a feature extractor with heuristics. Tasks are sampled following the Meta-Dataset (MD) protocol. In R (resp. X), heuristics select a feature extractor from the R (resp. X) library of feature extractors. The oracle OR (resp. OX) selects the best feature extractor for each task in the R (resp. X) library. The Random Heuristic (RH) picks a feature extractor at random. SSA and MCS are the two best-performing heuristics. A meaningful choice of classes (X) is particularly desirable on datasets with large boosts.
Figure 2: Relative gain in accuracy compared to the baseline after fine-tuning (Domain-Informed setting), varying the number of classes $M$ selected using the Average Activation (AA) method. The star ticks mark the points where 90% of the cumulative activation across classes is reached. Apart from Aircraft and Fungi, the 90% cumulative activation threshold is reached at roughly the same value, $M \approx 40$, which lies close to the peak gain over the baseline. A companion pie-chart figure shows the distribution of activation among classes.
Figure 3: Accuracy boost compared to the baseline for various learning rates ($lr$), using the DI-selected feature extractor of each dataset. The learning rate is set to 0 when only batch-normalization statistics are updated. The main paper shows only the case $lr = 0.001$. The choice of learning rate has a significant effect.
Figure 4: Heuristic selection of the learning rate in the DI setting, with tasks sampled following the MD protocol. Fixed corresponds to the performance with $lr = 0.001$ reported in the first table of the paper. Our methods outperform the fixed-learning-rate DI accuracy boost (Fixed) on Aircraft, Omniglot and Traffic Signs. We use the learning rates listed in the DI learning-rate heatmap table.
Figure 5: Heuristic selection of the learning rate in the DI setting, with 5-way 5-shot task sampling. Fixed corresponds to the performance with $lr = 0.001$ reported in the first table of the paper. Our methods outperform the fixed-learning-rate DI accuracy boost (Fixed) on Aircraft, Omniglot and Traffic Signs. We use the learning rates listed in the DI learning-rate heatmap table.
...and 16 more figures
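The Average Activation (AA) selection in Figure 2 ranks base classes by how strongly target-domain images activate them, and the star ticks mark where the top-ranked classes account for 90% of the cumulative activation. A minimal sketch, assuming "activation" means the softmax probability that the pre-trained base classifier assigns to each base class, averaged over target images; the paper's exact definition may differ, and the helper names `average_activation` and `select_by_cumulative_activation` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of base-class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_activation(logits_per_image):
    """Mean softmax activation of each base class over the target images (assumed definition)."""
    probs = [softmax(row) for row in logits_per_image]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]


def select_by_cumulative_activation(avg_act, threshold=0.9):
    """Smallest set of top-ranked classes whose activation reaches `threshold` of the total."""
    order = sorted(range(len(avg_act)), key=lambda c: avg_act[c], reverse=True)
    total = sum(avg_act)
    selected, running = [], 0.0
    for c in order:
        selected.append(c)
        running += avg_act[c]
        if running >= threshold * total:
            break
    return selected


print(select_by_cumulative_activation([0.5, 0.05, 0.3, 0.15]))
```

Here the three most activated classes (indices 0, 2 and 3) together reach 95% of the activation, so the threshold picks $M = 3$. In Figure 2 this rule only marks a point on the curve; the plotted results vary $M$ directly.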
