Model Diffusion for Certifiable Few-shot Transfer Learning

Fady Rezk, Royson Lee, Henry Gouk, Timothy Hospedales, Minyoung Kim

TL;DR

STEEL (Sample ThEn Evaluate Learner) is a diffusion-guided, gradient-free transfer method that restricts the downstream hypothesis to a finite set of PEFT adapters sampled from a diffusion model trained on upstream tasks. By selecting the sampled adapter with the lowest empirical loss on a small support set, STEEL enables tight PAC-Bayes risk certificates even in low-shot settings. Across both large language model and vision benchmarks, STEEL achieves non-vacuous generalization guarantees for most episodes while maintaining competitive task performance, and it outperforms gradient-based and other baselines in certifiability. This approach offers a practical path to certifiable few-shot transfer learning with scalable, interpolation-enabled hypothesis generation.

Abstract

In contemporary deep learning, a prevalent and effective workflow for solving low-data problems is adapting powerful pre-trained foundation models (FMs) to new tasks via parameter-efficient fine-tuning (PEFT). However, while empirically effective, the resulting solutions lack generalisation guarantees to certify their accuracy, which may be required for ethical or legal reasons prior to deployment in high-importance applications. In this paper we develop a novel transfer learning approach that is designed to facilitate non-vacuous learning-theoretic generalisation guarantees for downstream tasks, even in the low-shot regime. Specifically, we first use upstream tasks to train a diffusion model that captures a distribution over PEFT parameters. We then learn the downstream task by a sample-and-evaluate procedure: sampling plausible PEFTs from the trained diffusion model and selecting the one with the highest likelihood on the downstream data. Crucially, this confines our model hypothesis to a finite set of PEFT samples. In contrast to the typical continuous hypothesis spaces of neural network weights, this facilitates tighter risk certificates. We instantiate our bound and show non-trivial generalisation guarantees, whereas existing learning approaches lead to vacuous bounds in the low-shot regime.
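The sample-and-evaluate procedure described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `sample_then_evaluate`, `toy_sampler`, and `toy_support_error` are hypothetical names, a one-dimensional Gaussian stands in for the trained diffusion model, and a scalar threshold stands in for a PEFT adapter.

```python
import random


def sample_then_evaluate(sample_adapter, support_loss, num_samples, seed=0):
    """Draw num_samples candidate adapters, return the best one and its loss."""
    rng = random.Random(seed)
    # Candidates are drawn without looking at the support data, so they
    # form a fixed, finite hypothesis set of size num_samples.
    candidates = [sample_adapter(rng) for _ in range(num_samples)]
    # Gradient-free selection: only forward evaluations on the support set.
    losses = [support_loss(adapter) for adapter in candidates]
    best = min(range(num_samples), key=losses.__getitem__)
    return candidates[best], losses[best]


# Toy stand-ins (assumptions for illustration only).
SUPPORT = [(-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1)]  # (input, label) pairs


def toy_sampler(rng):
    # Placeholder for sampling an adapter from the trained generative model.
    return rng.gauss(0.0, 1.0)


def toy_support_error(theta):
    # 0-1 error of the threshold classifier "predict 1 if x > theta".
    errors = sum(int(x > theta) != y for x, y in SUPPORT)
    return errors / len(SUPPORT)


best_theta, best_err = sample_then_evaluate(
    toy_sampler, toy_support_error, num_samples=50
)
print(f"selected threshold={best_theta:.3f}, support error={best_err:.2f}")
```

Because the candidates are generated independently of the downstream support set, the selected adapter belongs to a hypothesis class whose size equals the number of samples, which is what makes a finite-hypothesis risk certificate applicable.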

Paper Structure

This paper contains 30 sections, 13 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Generalization bounds for adapting CLIP to novel tasks (5-way classification with 1–16 examples per class). Plots show classification error (y-axis) versus the complexity term (x-axis, log scale; square-root terms from Equations \ref{eq:bound} and \ref{eq:qbound_best}). Top/Bottom: Mean support/query (train/test) error on new tasks. Shaded regions indicate vacuous bounds, where (support error + complexity) $\geq 1$. Non-vacuous guarantees lie in the unshaded region. Competing methods (SGD, BBPT) fail to achieve non-vacuous bounds. In contrast, our method yields non-vacuous guarantees without significantly compromising training fit (top) or test accuracy (bottom).
  • Figure 2: Distribution of generalisation guarantees (x-axis, log scale) obtained over few-shot LLM adaptation episodes. Vertical lines indicate the vacuous bound threshold. STEEL provides a dramatically better distribution of provable generalisation outcomes compared to alternatives.
  • Figure 3: Dependence of generalisation guarantee on training set size. Our finite-hypothesis class learner STEEL achieves non-vacuous guarantees from 4-shot onward. Standard approaches provide no guarantees anywhere in this low-shot range.
  • Figure 4: "Learning curves" illustrating the empirical and certified learning dynamics of STEEL with respect to the number of samples/iterations, which is equivalent to the hypothesis space size. More samples improve the training (support) error while increasing the complexity penalty. The sum of these two terms instantiates the generalisation guarantee (Eq. \ref{eq:bound}) achieved for a given number of samples.
  • Figure 5: Support/query error and certified risk vs. support set size on iNaturalist birds. As the number of shots increases, support and query errors converge, while the bound continues to tighten. The gap between the certified risk and empirical query error drops to 6% at 128 shots, demonstrating the ability to produce tight certificates.
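
The captions above describe a guarantee of the form (support error + complexity term), which is vacuous once it reaches 1. The sketch below illustrates that trade-off using the textbook finite-hypothesis bound (Hoeffding's inequality with a union bound), where the complexity term is sqrt((ln|H| + ln(1/delta)) / (2n)). This is an assumption made for illustration: the paper's own Equations (eq:bound, eq:qbound_best) are not reproduced in this excerpt and may differ. The function names, the confidence level `delta=0.05`, and the example numbers are likewise hypothetical.

```python
import math


def complexity_term(num_hypotheses, num_examples, delta=0.05):
    """Square-root complexity term of the textbook finite-hypothesis bound."""
    numerator = math.log(num_hypotheses) + math.log(1.0 / delta)
    return math.sqrt(numerator / (2.0 * num_examples))


def certified_risk(support_error, num_hypotheses, num_examples, delta=0.05):
    """Upper bound on the true error, holding with probability >= 1 - delta."""
    return support_error + complexity_term(num_hypotheses, num_examples, delta)


def is_vacuous(bound):
    # An error bound of 1 or more says nothing about a classifier.
    return bound >= 1.0


# Illustrative numbers only: 5-way episodes with 1 to 16 shots per class,
# 1000 sampled adapters, and an assumed support error of 0.10.
for shots in (1, 2, 4, 8, 16):
    n = 5 * shots
    bound = certified_risk(0.10, num_hypotheses=1000, num_examples=n)
    label = "vacuous" if is_vacuous(bound) else "non-vacuous"
    print(f"{shots:>2}-shot (n={n:>2}): bound={bound:.3f} ({label})")
```

The complexity term grows only logarithmically with the number of sampled adapters but shrinks with the square root of the support set size, which is consistent with the behaviour described for Figures 3 and 4.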