Model Diffusion for Certifiable Few-shot Transfer Learning
Fady Rezk, Royson Lee, Henry Gouk, Timothy Hospedales, Minyoung Kim
TL;DR
STEEL (Sample ThEn Evaluate Learner) is a diffusion-guided, gradient-free transfer method that restricts the downstream hypothesis space to a finite set of PEFT adapters, sampled from a diffusion model trained on upstream tasks. By selecting the sampled adapter with the lowest empirical loss on a small support set, STEEL enables tight PAC-Bayes risk certificates even in low-shot settings. Across both large language model and vision benchmarks, STEEL achieves non-vacuous generalization guarantees for most episodes while maintaining competitive task performance, and it outperforms gradient-based and other baselines in certifiability. This approach offers a practical path to certifiable few-shot transfer learning with scalable, interpolation-enabled hypothesis generation.
Abstract
In contemporary deep learning, a prevalent and effective workflow for solving low-data problems is adapting powerful pre-trained foundation models (FMs) to new tasks via parameter-efficient fine-tuning (PEFT). However, while empirically effective, the resulting solutions lack generalisation guarantees to certify their accuracy, which may be required for ethical or legal reasons prior to deployment in high-importance applications. In this paper we develop a novel transfer learning approach that is designed to facilitate non-vacuous learning-theoretic generalisation guarantees for downstream tasks, even in the low-shot regime. Specifically, we first use upstream tasks to train a diffusion model that captures a distribution over PEFT parameters. We then learn the downstream task by a sample-and-evaluate procedure: sampling plausible PEFTs from the trained diffusion model and selecting the one with the highest likelihood on the downstream data. Crucially, this confines our model hypothesis to a finite set of PEFT samples. In contrast to the typical continuous hypothesis spaces of neural network weights, this facilitates tighter risk certificates. We instantiate our bound and show non-trivial generalisation guarantees, in contrast to existing learning approaches, which yield vacuous bounds in the low-shot regime.
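To illustrate why a finite hypothesis set helps, the sample-and-evaluate idea can be sketched as follows. This is a minimal toy sketch, not the paper's implementation: the candidates are scalar thresholds drawn from a Gaussian that stands in for the trained diffusion model, the loss is the 0-1 loss, and the certificate is the classical Hoeffding-plus-union bound for finite hypothesis classes, which may differ from the bound the paper instantiates. The names `select_adapter` and `finite_hypothesis_bound` are hypothetical.

```python
import math
import random


def select_adapter(candidates, support_set, loss_fn):
    """Return the candidate with the lowest mean loss on the support set.

    No gradients are used: each candidate is only evaluated, never updated.
    """
    best, best_loss = None, float("inf")
    for cand in candidates:
        loss = sum(loss_fn(cand, x, y) for x, y in support_set) / len(support_set)
        if loss < best_loss:
            best, best_loss = cand, loss
    return best, best_loss


def finite_hypothesis_bound(emp_loss, num_hypotheses, num_samples, delta=0.05):
    """Hoeffding + union bound for a loss bounded in [0, 1].

    With probability at least 1 - delta over the support set, every one of the
    num_hypotheses candidates has true risk at most the returned value, provided
    the candidates were fixed without looking at the support set.
    """
    slack = math.sqrt(
        (math.log(num_hypotheses) + math.log(1.0 / delta)) / (2.0 * num_samples)
    )
    return min(1.0, emp_loss + slack)


# Toy stand-in (assumption): a Gaussian over thresholds replaces the diffusion
# model over PEFT parameters; candidates are drawn before the support set is seen.
rng = random.Random(0)
candidates = [rng.gauss(0.0, 1.0) for _ in range(100)]

# Small support set for a 1-D task whose true decision threshold is 0.3.
support = [(x, int(x > 0.3)) for x in (rng.uniform(-2.0, 2.0) for _ in range(50))]


def zero_one_loss(threshold, x, y):
    """0-1 loss of the classifier 'predict 1 if x > threshold'."""
    return float(int(x > threshold) != y)


best, emp_loss = select_adapter(candidates, support, zero_one_loss)
bound = finite_hypothesis_bound(emp_loss, len(candidates), len(support))
print(f"selected threshold={best:.3f}, empirical loss={emp_loss:.3f}, bound={bound:.3f}")
```

In this sketch the slack term grows only with the logarithm of the number of sampled candidates, so the certificate can stay below 1 (non-vacuous) with a few dozen support examples. Bounds over continuous weight spaces typically carry complexity terms that are much larger in this regime.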
