Learning to Learn Weight Generation via Local Consistency Diffusion

Yunchuan Guan, Yu Liu, Ke Zhou, Zhiqi Shen, Jenq-Neng Hwang, Lei Li

TL;DR

This work addresses the limited cross-task transfer of diffusion-based weight generation by combining it with bi-level meta-learning. The proposed method, Mc-Di, uses local consistency diffusion to learn from local optimization targets along a task's optimization trajectory while staying consistent with the global optimum, and uses SAM to improve convergence. Empirically, Mc-Di achieves higher accuracy and lower inference latency than state-of-the-art weight-generation methods on transfer learning, few-shot learning, domain generalization, and large-language-model adaptation. The approach provides gradient-free, rapid weight updates for settings that require frequent adaptation, and is shown to scale across diverse architectures and tasks.

Abstract

Diffusion-based algorithms have emerged as promising techniques for weight generation. However, existing solutions are limited by two challenges: generalizability and local target assignment. The former arises from the inherent lack of cross-task transferability in existing single-level optimization methods, limiting the model's performance on new tasks. The latter lies in existing research modeling only global optimal weights, neglecting the supervision signals in local target weights. Moreover, naively assigning local target weights causes local-global inconsistency. To address these issues, we propose Mc-Di, which integrates the diffusion algorithm with meta-learning for better generalizability. Furthermore, we extend the vanilla diffusion into a local consistency diffusion algorithm. Our theory and experiments demonstrate that it can learn from local targets while maintaining consistency with the global optima. We validate Mc-Di's superior accuracy and inference efficiency in tasks that require frequent weight updates, including transfer learning, few-shot learning, domain generalization, and large language model adaptation.

Paper Structure

This paper contains 35 sections, 6 theorems, 45 equations, 11 figures, 8 tables, 2 algorithms.

Key Result

Theorem 1

Given the number of diffusion steps $T$, an increasing schedule $\{\alpha_0,...,\alpha_T\}$, and local target weights $\{\theta_d,...,\theta_{k\times d}\}$, let the inference process align with the vanilla diffusion algorithm, i.e., […]. Then the denoiser $\epsilon_{\phi}$ can recover the target sequence $\{\theta_d,...,\theta_{M=k\times d}\}$ from standard Gaussian noise $x_0$ at even intervals of $T/k$ steps.
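
The even $T/k$ spacing in the theorem can be illustrated with a small index computation. The sketch below is only a reading of the statement, not the paper's algorithm: it assumes the chain starts from noise at step 0, ends at step $T$, and that $T$ is divisible by $k$. The function names `target_steps` and `segment_of_step` are hypothetical.

```python
def target_steps(T, k):
    """Steps at which the k local targets would be reached,
    assuming evenly spaced segments of T/k steps (an assumption)."""
    if k <= 0 or T % k != 0:
        raise ValueError("this sketch assumes k > 0 and T divisible by k")
    seg = T // k
    return [i * seg for i in range(1, k + 1)]


def segment_of_step(t, T, k):
    """1-based index of the local target the chain is heading toward
    at inference step t, for 0 <= t < T."""
    if not 0 <= t < T:
        raise ValueError("t must satisfy 0 <= t < T")
    seg = T // k
    return t // seg + 1


steps = target_steps(T=1000, k=4)   # one step index per local target
first = segment_of_step(0, 1000, 4)
last = segment_of_step(999, 1000, 4)
```

With $T=1000$ and $k=4$, each segment spans 250 steps, so the $i$-th local target $\theta_{i\times d}$ would correspond to step $250i$ under these assumptions.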

Figures (11)

  • Figure 1: Visualization of inference chains in Omniglot’s 2D weight-reduced space. Darker areas indicate lower task loss. The model trained with local target weights produces accurate and efficient inference chains.
  • Figure 1: Ablation of main components on Omniglot and Mini-Imagenet datasets. We record the accuracy of each variant on 5-way 1-shot tasks.
  • Figure 2: Workflow of Mc-Di. In the weight-preparation stage, a real-world optimizer (Adam) is used to optimize the downstream task. We collect the optimization trajectory, i.e., $\{\theta_i\}_{i=0}^M$, and sample from them to obtain local target weights $\{\theta_{i\times d}\}_{i=1}^k$. In the meta-training stage, a meta-learner $f^G_{\phi}$ assigns a base learner $f^G_{\phi_i}$ to each local target weight $\theta_{i\times d}$. Within each inner-loop, the base learner models local targets using local consistency diffusion.
  • Figure 3: Naively assigning local targets creates inconsistency.
  • Figure 4: Segment Number vs. MSE trade-off on 5-way 1-shot and 5-shot tasks.
  • ...and 6 more figures
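
The weight-preparation stage described in the Figure 2 caption (collect an optimization trajectory $\{\theta_i\}_{i=0}^M$, then sample local targets $\{\theta_{i\times d}\}_{i=1}^k$) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: plain gradient descent on a toy quadratic loss stands in for Adam on a real downstream task, $M$ is assumed divisible by $k$, and the helper names `collect_trajectory` and `sample_local_targets` are hypothetical.

```python
def collect_trajectory(theta0, grad_fn, lr, num_steps):
    """Record every iterate theta_0..theta_M of an optimizer.
    Plain gradient descent is used here as a stand-in for Adam."""
    theta = list(theta0)
    trajectory = [list(theta)]
    for _ in range(num_steps):
        grad = grad_fn(theta)
        theta = [t - lr * g for t, g in zip(theta, grad)]
        trajectory.append(list(theta))
    return trajectory


def sample_local_targets(trajectory, k):
    """Pick theta_{i*d} for i = 1..k, where M = k*d is the final index."""
    M = len(trajectory) - 1
    if k <= 0 or M % k != 0:
        raise ValueError("this sketch assumes k > 0 and M divisible by k")
    d = M // k
    return [trajectory[i * d] for i in range(1, k + 1)]


# Toy loss 0.5 * ||theta||^2, whose gradient is theta itself (an assumption
# for illustration; a real task would supply its own gradient).
trajectory = collect_trajectory([1.0, -2.0], lambda th: list(th), lr=0.1, num_steps=12)
local_targets = sample_local_targets(trajectory, k=4)
```

Here $M=12$ and $k=4$ give $d=3$, so the local targets are $\theta_3, \theta_6, \theta_9, \theta_{12}$, with the last one coinciding with the final weights of the trajectory.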

Theorems & Definitions (10)

  • Theorem 1
  • Theorem 2
  • Theorem 1
  • proof
  • Proposition 1
  • proof
  • Lemma 1
  • proof
  • Theorem 2
  • proof