Learning to Learn Weight Generation via Local Consistency Diffusion
Yunchuan Guan, Yu Liu, Ke Zhou, Zhiqi Shen, Jenq-Neng Hwang, Lei Li
TL;DR
This work tackles the limited cross-task transferability of diffusion-based weight generation by combining it with bi-level meta-learning. The core component of the proposed method, Mc-Di, is a local consistency diffusion algorithm that learns from local optimization targets along a task trajectory while staying aligned with the global optimum, with SAM used to improve convergence. Empirically, Mc-Di achieves higher accuracy and lower inference latency than state-of-the-art weight-generation methods across transfer learning, few-shot learning, domain generalization, and large-language-model adaptation. The approach enables rapid, gradient-free weight updates in scenarios that demand frequent adaptation, and the reported experiments span diverse architectures and tasks.
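As a toy illustration of "local targets along a task trajectory", the sketch below runs plain gradient descent on a single quadratic task loss and records every iterate. It assumes a quadratic loss and a gradient-descent inner loop, and the helper name `inner_loop_trajectory` is hypothetical. This is not the paper's procedure; it only shows how intermediate iterates (candidate local targets) approach the task's global optimum.

```python
import numpy as np

def inner_loop_trajectory(A, b, w0, lr=0.1, steps=50):
    """Hypothetical helper: gradient descent on the quadratic loss
    0.5 * w^T A w - b^T w, returning every iterate.

    Intermediate iterates stand in for 'local targets'; the last
    iterate approximates the task's global optimum.
    """
    traj = [w0.copy()]
    w = w0.copy()
    for _ in range(steps):
        grad = A @ w - b          # gradient of the quadratic task loss
        w = w - lr * grad         # one inner-loop update
        traj.append(w.copy())
    return np.stack(traj)

# Toy task with a known closed-form optimum.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 1.0])
traj = inner_loop_trajectory(A, b, np.zeros(2))
w_star = np.linalg.solve(A, b)                 # global optimum of this task
dists = np.linalg.norm(traj - w_star, axis=1)  # distance of each local target to it
```

On this convex toy task every local target is strictly closer to the optimum than the previous one; on real non-convex losses that need not hold, which is one way naively chosen local targets can drift away from the global target.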
Abstract
Diffusion-based algorithms have emerged as promising techniques for weight generation. However, existing solutions are limited by two challenges: generalizability and local target assignment. The former arises from the inherent lack of cross-task transferability in existing single-level optimization methods, which limits the model's performance on new tasks. The latter arises because existing research models only globally optimal weights and neglects the supervision signals in local target weights. Moreover, naively assigning local target weights causes local-global inconsistency. To address these issues, we propose Mc-Di, which integrates the diffusion algorithm with meta-learning for better generalizability. Furthermore, we extend vanilla diffusion into a local consistency diffusion algorithm. Our theory and experiments demonstrate that it can learn from local targets while maintaining consistency with the global optima. We validate Mc-Di's superior accuracy and inference efficiency in tasks that require frequent weight updates, including transfer learning, few-shot learning, domain generalization, and large language model adaptation.
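For reference, here is a minimal sketch of the vanilla (DDPM-style) diffusion forward process applied to a flattened weight vector, the baseline that local consistency diffusion extends. The linear noise schedule, the helper names `make_schedule`, `q_sample` and `eps_loss`, and the random stand-in weights are assumptions for illustration. The sketch omits the denoiser network, the meta-learning loop, and the local consistency mechanism itself.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Linear DDPM noise schedule; returns the cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(w0, t, alpha_bar, rng):
    """Forward process: w_t = sqrt(alpha_bar_t) * w0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(w0.shape)
    w_t = np.sqrt(alpha_bar[t]) * w0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return w_t, eps

def eps_loss(eps_pred, eps):
    """Standard epsilon-prediction MSE used to train a denoiser."""
    return float(np.mean((eps_pred - eps) ** 2))

rng = np.random.default_rng(0)
alpha_bar = make_schedule()
w0 = rng.standard_normal(128)                # stand-in for a flattened weight vector
w_t, eps = q_sample(w0, 999, alpha_bar, rng)  # almost pure noise at the last step
```

In a training step, `eps_loss` would compare a denoiser's prediction at `(w_t, t)` with `eps`. In a meta-learning setting the denoiser would additionally be conditioned on a task representation, which this sketch does not model.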
