MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning
Baoquan Zhang, Chuyao Luo, Demin Yu, Huiwei Lin, Xutao Li, Yunming Ye, Bowen Zhang
TL;DR
This work reframes gradient-based meta-learning for few-shot tasks as a diffusion denoising process, yielding a diffusion-style meta-optimizer (MetaDiff) that does not need to differentiate through inner-loop updates. By treating weight updates as denoising steps and conditioning the denoiser on the support set via a task-conditional UNet, MetaDiff derives a learnable gradient-descent form with momentum and uncertainty, without second-order backpropagation. The framework is trained episodically on base data to predict target base-learner weights from Gaussian initializations. It is evaluated on MiniImagenet and TieredImagenet, where it outperforms key gradient-based baselines and performs robustly in ablation studies. Overall, MetaDiff offers a memory-efficient, theoretically grounded approach that improves few-shot learning performance by unifying diffusion modeling with meta-optimization, producing practical gains for rapid adaptation in low-data regimes.
Abstract
Equipping a deep model with the ability of few-shot learning, i.e., learning quickly from only a few examples, is a core challenge for artificial intelligence. Gradient-based meta-learning approaches effectively address this challenge by learning how to learn novel tasks. Their key idea is to learn a deep model in a bi-level optimization manner, where the outer-loop process learns a shared gradient descent algorithm (i.e., its hyperparameters), while the inner-loop process leverages it to optimize a task-specific model using only a few labeled examples. Although these existing methods have shown superior performance, the outer-loop process requires calculating second-order derivatives along the inner optimization path, which imposes considerable memory burdens and the risk of vanishing gradients. Drawing inspiration from recent progress in diffusion models, we find that the inner-loop gradient descent process can actually be viewed as the reverse process (i.e., denoising) of diffusion, where the target of denoising is the model weights rather than the original data. Based on this fact, we propose to model the gradient descent optimizer as a diffusion model and present a novel task-conditional diffusion-based meta-learning method, called MetaDiff, that models the optimization of model weights from Gaussian noise to target weights in a denoising manner. Thanks to the training efficiency of diffusion models, MetaDiff does not need to differentiate through the inner-loop path, so the memory burdens and the risk of vanishing gradients can be effectively alleviated. Experimental results show that MetaDiff outperforms state-of-the-art gradient-based meta-learning methods on few-shot learning tasks.
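
The analogy between inner-loop gradient descent and reverse diffusion can be illustrated with a toy example. The following is a minimal sketch under strong assumptions, not the authors' implementation: it uses a 1-D linear model, and the analytic gradient of the support-set loss stands in for the task-conditional denoiser, which in MetaDiff is a learned UNet. The momentum term is omitted. The names `support_loss_grad` and `denoise_weights`, and all hyperparameter values, are hypothetical.

```python
import random


def support_loss_grad(w, support):
    """Gradient of the mean squared error of the model y = w * x on the support set."""
    return sum(2 * (w * x - y) * x for x, y in support) / len(support)


def denoise_weights(support, steps=50, eta=0.1, noise_scale=0.05, seed=0):
    """Move a weight from Gaussian noise toward a task-specific value.

    Each iteration has the shape of a reverse diffusion step: subtract a
    predicted "noise" direction, then add a small amount of fresh noise
    that shrinks to zero at the final step.
    """
    rng = random.Random(seed)
    w = rng.gauss(0.0, 1.0)  # weights start as Gaussian noise
    for t in range(steps, 0, -1):
        # Assumption: the analytic gradient replaces the learned denoiser.
        eps_hat = support_loss_grad(w, support)
        sigma = noise_scale * (t - 1) / steps  # no noise is added when t == 1
        w = w - eta * eps_hat + sigma * rng.gauss(0.0, 1.0)
    return w


# Toy few-shot task: three support examples drawn from y = 2x.
support = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = denoise_weights(support)
print(f"denoised weight: {w:.4f}")
```

In this sketch the denoising loop is plain gradient descent with decaying noise, so nothing is learned. The point is only that the update rule and a reverse diffusion step share the same form, which is what allows the optimizer itself to be trained as a diffusion model without backpropagating through the loop.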
