
MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

Baoquan Zhang, Chuyao Luo, Demin Yu, Huiwei Lin, Xutao Li, Yunming Ye, Bowen Zhang

TL;DR

This work reframes gradient-based meta-learning for few-shot tasks as a diffusion denoising process, yielding a diffusion-style meta-optimizer (MetaDiff) that eliminates the need to differentiate through inner-loop updates. By treating weight updates as denoising steps and conditioning the denoiser on the support set via a task-conditional UNet, MetaDiff derives a learnable gradient-descent form with momentum and uncertainty terms, without second-order backpropagation. The framework is trained episodically on base data to predict target base-learner weights from Gaussian initializations. It is evaluated on miniImageNet and tieredImageNet, where it outperforms key gradient-based baselines, and ablation studies support its design. Overall, MetaDiff offers a memory-efficient, theoretically motivated approach that improves few-shot learning by unifying diffusion modeling with meta-optimization, enabling rapid adaptation in low-data regimes.
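To make the "weight updates as denoising steps" reading concrete, the standard DDPM reverse update can be written for a weight vector $w_t$ with noise predictor $\epsilon_\theta$ and rearranged into a gradient-descent-like form. This is an illustrative derivation under the standard DDPM parameterization ($\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$), assumed here; the exact form used by MetaDiff may differ.

```latex
\begin{aligned}
w_{t-1} &= \frac{1}{\sqrt{\alpha_t}}\left(w_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(w_t, t)\right) + \sigma_t z,
  \qquad z \sim \mathcal{N}(0, I) \\
&= w_t \;-\; \eta_t\,\epsilon_\theta(w_t, t) \;+\; \left(\frac{1}{\sqrt{\alpha_t}} - 1\right) w_t \;+\; \sigma_t z,
  \qquad \eta_t = \frac{1-\alpha_t}{\sqrt{\alpha_t}\,\sqrt{1-\bar{\alpha}_t}}
\end{aligned}
```

Compared with a plain step $w_{t-1} = w_t - \eta\,\nabla_w \mathcal{L}(w_t)$, the learned $\epsilon_\theta$ takes the place of the gradient, $\eta_t$ acts as a step-dependent learning rate, the $(1/\sqrt{\alpha_t} - 1)\,w_t$ term rescales the current weights, and $\sigma_t z$ injects noise. Associating these extra terms with the "momentum and uncertainty" mentioned above is an interpretive assumption, not the paper's own derivation.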

Abstract

Equipping a deep model with the ability of few-shot learning, i.e., learning quickly from only a few examples, is a core challenge for artificial intelligence. Gradient-based meta-learning approaches effectively address this challenge by learning how to learn novel tasks. Their key idea is to train a deep model in a bi-level optimization manner, where the outer-loop process learns a shared gradient descent algorithm (i.e., its hyperparameters), while the inner-loop process leverages it to optimize a task-specific model using only a few labeled examples. Although these existing methods have shown superior performance, the outer-loop process requires calculating second-order derivatives along the inner optimization path, which imposes considerable memory burdens and the risk of vanishing gradients. Drawing inspiration from recent progress in diffusion models, we find that the inner-loop gradient descent process can actually be viewed as a reverse process (i.e., denoising) of diffusion, where the target of denoising is the model weights rather than the original data. Based on this fact, in this paper we propose to model the gradient descent optimizer as a diffusion model and present a novel task-conditional diffusion-based meta-learning method, called MetaDiff, that effectively models the optimization process of model weights from Gaussian noise to target weights in a denoising manner. Thanks to the training efficiency of diffusion models, MetaDiff does not need to differentiate through the inner-loop path, so the memory burdens and the risk of vanishing gradients are effectively alleviated. Experimental results show that MetaDiff outperforms state-of-the-art gradient-based meta-learning methods on few-shot learning tasks.
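The contrast with second-order meta-learning can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: a flattened base-learner weight vector is treated as the diffused "data", the noise schedule is the standard DDPM linear schedule, and the task-conditional UNet is replaced by a hypothetical linear noise predictor (`denoiser`, `params`, and `cond` are illustrative names). Training touches a single random timestep per step, so nothing is unrolled along an inner-loop path; at test time, `sample_weights` runs the reverse process from Gaussian noise to task weights.

```python
import numpy as np

# Standard DDPM linear noise schedule (an assumption; the paper's schedule may differ).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def q_sample(w0, t, eps):
    """Forward (noising) process: draw w_t ~ q(w_t | w_0) in closed form."""
    return np.sqrt(alpha_bars[t]) * w0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def denoiser(params, w_t, t, cond):
    """Hypothetical linear noise predictor eps_theta(w_t, t, cond).

    Stands in for the task-conditional UNet; `params` has shape
    (dim, dim + cond_dim + 1).
    """
    x = np.concatenate([w_t, cond, [t / T]])
    return params @ x


def diffusion_loss(params, w0, cond, rng):
    """Noise-prediction loss at ONE random timestep.

    No inner-loop path is unrolled, so backpropagating through this loss
    involves a single denoiser call and no second-order terms.
    """
    t = rng.integers(0, T)
    eps = rng.standard_normal(w0.shape)
    w_t = q_sample(w0, t, eps)
    return np.mean((eps - denoiser(params, w_t, t, cond)) ** 2)


def reverse_step(params, w_t, t, cond, rng):
    """DDPM reverse update w_t -> w_{t-1}, read as one learned 'descent' step."""
    eps_hat = denoiser(params, w_t, t, cond)
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    mean = (w_t - coef * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # the final step adds no noise
    return mean + np.sqrt(betas[t]) * rng.standard_normal(w_t.shape)


def sample_weights(params, cond, dim, rng):
    """Start from Gaussian noise and denoise T times to obtain task weights."""
    w = rng.standard_normal(dim)
    for t in reversed(range(T)):
        w = reverse_step(params, w, t, cond, rng)
    return w
```

In practice, `params` would be fit by gradient descent on `diffusion_loss` over many training episodes (for example with an autodiff framework). Each gradient involves one denoiser evaluation regardless of how many reverse steps are used at test time, which is in line with the abstract's argument about memory and vanishing gradients.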

Paper Structure (17 sections, 12 equations, 6 figures, 4 tables, 2 algorithms)

Figures (6)

  • Figure 1: Connection between the gradient descent algorithm (GDA) and diffusion models. The gradient descent process (a) of GDA is similar to the denoising process (b) of diffusion models. Based on this, we propose to model GDA as the denoising process of a diffusion model (c) and learn it in a diffusion manner, which does not require differentiating through the inner-loop path and thus alleviates memory burdens and vanishing gradients, improving FSL.
  • Figure 2: (a) The overall framework of our MetaDiff-based FSL method. (b) Illustration of our MetaDiff meta-optimizer $\epsilon_{\theta}(\cdot)$.
  • Figure 3: Illustration of our task-conditional UNet (i.e., TCUNet). "EB", "BB", and "DB" denote the encoder, bottleneck, and decoder blocks, respectively. The three block types share a similar design; for clarity, the figure shows only the design details of "BB".
  • Figure 4: GPU memory usage on 1-shot tasks of miniImageNet.
  • Figure 5: Convergence analysis on miniImageNet.
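The TL;DR states that the denoiser is conditioned on the support set, and Figure 3 names the conditioned network TCUNet. One plausible way to build such a condition, assumed here and not taken from the paper's TCUNet design, is to summarize each class by the mean of its support features (a class prototype, as in Prototypical Networks) and pair it with the sinusoidal timestep embedding commonly used in diffusion UNets. `support_condition`, `timestep_embedding`, and `denoiser_input` are hypothetical helpers.

```python
import numpy as np


def support_condition(features, labels, n_way):
    """Hypothetical task condition: one prototype (mean support feature) per class,
    flattened in class order into a vector of length n_way * feat_dim."""
    protos = np.stack([features[labels == c].mean(axis=0) for c in range(n_way)])
    return protos.reshape(-1)


def timestep_embedding(t, dim):
    """Sinusoidal embedding of diffusion step t (dim must be even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])


def denoiser_input(w_t, t, features, labels, n_way, t_dim=16):
    """Concatenate noisy weights, timestep embedding, and task condition.

    A UNet-style denoiser could consume these pieces separately (e.g. injecting
    the condition into each block); flattening them is a simplification.
    """
    return np.concatenate([
        w_t,
        timestep_embedding(t, t_dim),
        support_condition(features, labels, n_way),
    ])
```

For a 5-way 1-shot episode, each prototype is simply the single support feature of its class, so the condition is the five support features laid end to end. Averaging makes the condition independent of the order in which support examples are listed within a class.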