Flatness Improves Backbone Generalisation in Few-shot Classification

Rui Li; Martin Trapp; Marcus Klasson; Arno Solin

TL;DR

The paper tackles the challenge of generalising few-shot classification across multiple, heterogeneous domains by focusing on backbone quality rather than complex fusion pipelines. It proposes a simple, theoretically grounded protocol: train backbones with flatness-aware objectives (e.g., sharpness-aware minimisation, SAM), fuse information across domains via fine-tuning (including LoRA variants), and select the most compatible backbone for unseen tasks using PARC scores. The authors derive a bound linking the target-domain generalisation gap to the SAM-ERM loss gap and domain divergence, and they provide extensive empirical evidence showing that flatness improves backbone generalisation, fine-tuning effectively fuses information, and the combined SAM+FT approach rivals or surpasses state-of-the-art methods on the Meta-Dataset benchmark. The work offers a practical, competitive baseline that is simple to integrate with existing FSC adaptation methods and suggests broader applicability to other cross-domain learning settings.
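To make the flatness-aware objective concrete, the following is a minimal sketch of one SAM update, assuming the standard formulation $\min_{\bm{\theta}} \max_{\|\bm{\epsilon}\|_2 \le \rho} L(\bm{\theta} + \bm{\epsilon})$ with the usual first-order approximation of the inner maximisation. The toy quadratic loss, the function names (`sam_step`, `loss`, `grad`), and the values of `lr` and `rho` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def sam_step(params, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: perturb towards higher loss, then descend with that gradient."""
    g = grad_fn(params)
    # First-order approximation of the worst-case perturbation in the rho-ball.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient evaluated at the perturbed point, applied to the original parameters.
    g_sam = grad_fn(params + eps)
    return params - lr * g_sam

# Toy loss (assumption): L(w) = 0.5 * w^T A w with one sharp and one flat direction.
A = np.diag([10.0, 0.1])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, grad, lr=0.05, rho=0.05)

print("final loss:", loss(w))
```

With `rho=0.0` the step reduces to plain gradient descent, which corresponds to the ERM baseline that SAM is compared against.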

Abstract

Deployment of deep neural networks in real-world settings typically requires adaptation to new tasks with few examples. Few-shot classification (FSC) provides a solution to this problem by leveraging pre-trained backbones for fast adaptation to new classes. However, approaches for multi-domain FSC typically result in complex pipelines aimed at information fusion and task-specific adaptation without consideration of the importance of backbone training. In this work, we introduce an effective strategy for backbone training and selection in multi-domain FSC by utilizing flatness-aware training and fine-tuning. Our work is theoretically grounded and empirically performs on par with or better than state-of-the-art methods despite being simpler. Further, our results indicate that backbone training is crucial for good generalisation in FSC across different adaptation methods.

Paper Structure (39 sections, 6 theorems, 24 equations, 4 figures, 12 tables)

Introduction
Background
Few-shot Classification
Sharpness-aware Minimisation
Methods
Flatness Leads to a Better Backbone for Adaptation
Backbone Training
Flatness Aware Training Objective
Information Fusing using Fine-tuning
Backbone Selection
Experiments
Experimental Setup
Does Flatness Help Generalisation in FSC?
Is Fine-tuning Enough for Information Fusion?
How Does Our Approach Compare with SoTA?
...and 24 more sections

Key Result

Theorem 3.1

First, let $\{\Theta_k \subset \mathbb{R}^d, k=1, \ldots, K\}$, where $d$ is the dimension of $\Theta$, be a finite cover of the parameter space $\Theta$ consisting of $K$ closed balls with radius $\rho/2$, where $K \stackrel{\Delta}{=} \lceil(\operatorname{diam}(\Theta) / \rho)^d\rceil$. Denote the $VC$

Figures (4)

Figure 1: Average test accuracy on the Meta-Dataset benchmark for different backbone training schemes, using the task-specific adapter (TSA) adaptation of Li et al. (2022). Across different information fusion methods, sharpness-aware minimisation (SAM) leads to better performance than empirical risk minimisation (ERM), showing that flatness improves backbone generalisation.
Figure 2: Illustration that solutions in flat areas on the training loss can result in better generalisation behaviour on the test loss.
Figure 3: Decomposition of $f_{\bm{\theta}}(\cdot)$ with $\bm{\theta} = \{\bm{\phi}, \bm{\psi}\}$ into task-agnostic layers parametrised by $\bm{\phi}$ and task-specific layers parametrised by $\bm{\psi}$. Additional gates $\circ$ are used to switch between task-specific layers and the identity function. Note that this construction is only for theoretical purposes and does not imply any additional operations in practice.
Figure 4: Our training protocol: SAM-based backbone training on a large and diverse data set (e.g., ImageNet), followed by SAM-based fine-tuning on additional training data sets, backbone selection, and adaptation on the selected backbone.
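
The backbone selection step in the protocol can be illustrated with a small sketch. The code below scores each candidate backbone on a task's support set and picks the highest-scoring one. The score is a simplified stand-in in the spirit of PARC (a Spearman correlation between pairwise feature distances and pairwise label distances); it is an assumption for illustration, not the paper's exact procedure. The helper names (`average_ranks`, `parc_like_score`, `select_backbone`) and the toy "backbones" are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks of a 1-D array, with ties given their average rank."""
    _, inverse, counts = np.unique(np.ravel(x), return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    starts = ends - counts
    avg = (starts + ends + 1) / 2.0  # mean of ranks starts+1 .. ends
    return avg[inverse]

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

def parc_like_score(features, labels):
    """Correlate pairwise feature distances with pairwise label distances."""
    iu = np.triu_indices(len(labels), k=1)       # each unordered pair once
    feat_dist = 1.0 - np.corrcoef(features)      # rows are examples
    # 0/1 "different class" indicator; for one-hot labels the Pearson distance
    # takes only two values, so this gives the same ranks.
    label_dist = (labels[:, None] != labels[None, :]).astype(float)
    return spearman(feat_dist[iu], label_dist[iu])

def select_backbone(backbones, support_x, support_y):
    """Return the name of the best-scoring backbone and all scores."""
    scores = {name: parc_like_score(embed(support_x), support_y)
              for name, embed in backbones.items()}
    return max(scores, key=scores.get), scores

# Toy support set (assumption): 3 classes, 10 shots each, 16-dim inputs.
rng = np.random.default_rng(0)
support_y = np.repeat(np.arange(3), 10)
class_means = rng.normal(size=(3, 16))
support_x = class_means[support_y] + 0.1 * rng.normal(size=(30, 16))
noise = rng.normal(size=(30, 16))

# Hypothetical backbones: one preserves class structure, one ignores the input.
backbones = {
    "informative": lambda x: x,
    "uninformative": lambda x: noise[: len(x)],
}
best, scores = select_backbone(backbones, support_x, support_y)
print(best, scores)
```

The score needs at least two classes in the support set; with a single class the label distances are constant and the correlation is undefined.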

Theorems & Definitions (11)

Theorem 3.1 (Cha et al., 2021)
Theorem 3.2
Lemma A.1
proof
Lemma A.2
proof
Lemma A.3
proof
Lemma A.4
proof
...and 1 more
