Neural Network Diffusion
Kai Wang, Dongwen Tang, Boya Zeng, Yida Yin, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, Yang You
TL;DR
Neural Network Diffusion (p-diff) tackles the problem of generating high-performing neural network parameters without full gradient optimization. It uses a simple two-stage architecture: an autoencoder to learn latent representations of parameter subsets and a diffusion model to synthesize these latents from random noise, decoded back into parameters. Empirically, p-diff matches or exceeds the performance of trained baselines across multiple datasets and architectures, while producing novel, non-memorized parameter configurations. The work demonstrates diffusion models’ versatility beyond image generation and suggests broader potential for parameter-space learning.
Abstract
Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a diffusion model. The autoencoder extracts latent representations of a subset of the trained neural network parameters. Next, a diffusion model is trained to synthesize these latent representations from random noise. This model then generates new representations, which are passed through the autoencoder's decoder to produce new subsets of high-performing network parameters. Across various architectures and datasets, our approach consistently generates models with comparable or improved performance over trained networks, with minimal additional cost. Notably, we empirically find that the generated models are not memorizing the trained ones. Our results encourage more exploration into the versatile use of diffusion models. Our code is available at https://github.com/NUS-HPC-AI-Lab/Neural-Network-Diffusion.
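The two-stage data flow described in the abstract (encode parameter subsets, diffuse in latent space, decode samples back into parameters) can be illustrated with a toy NumPy sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: the "checkpoints" are synthetic, a PCA-fitted linear map stands in for the learned autoencoder, and a closed-form Gaussian noise predictor stands in for the trained denoising network. The names `params`, `encode`, `decode`, `predict_noise`, and `sample_latents`, the sizes, and the linear noise schedule are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for K trained checkpoints of one flattened parameter subset.
# (Synthetic data; real checkpoints would come from actual training runs.)
K, D, d = 200, 64, 4  # checkpoints, parameter dimension, latent dimension
basis = rng.normal(size=(d, D))
params = rng.normal(size=(K, d)) @ basis + 0.01 * rng.normal(size=(K, D))

# Stage 1: linear autoencoder fitted by PCA (stand-in for a learned autoencoder).
mean = params.mean(axis=0)
_, _, vt = np.linalg.svd(params - mean, full_matrices=False)
W = vt[:d]  # rows are the top-d principal directions, shape (d, D)

def encode(x):
    return (x - mean) @ W.T

def decode(z):
    return z @ W + mean

latents = encode(params)
mu, var = latents.mean(axis=0), latents.var(axis=0)

# Stage 2: DDPM-style diffusion over the latents with a linear noise schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
abar = np.cumprod(alphas)

def predict_noise(x_t, t):
    """Closed-form E[eps | x_t] when latents are modeled as N(mu, diag(var)).

    Stand-in for a trained noise-prediction network (an assumption of this sketch).
    """
    marginal_var = abar[t] * var + (1.0 - abar[t])
    return np.sqrt(1.0 - abar[t]) * (x_t - np.sqrt(abar[t]) * mu) / marginal_var

def sample_latents(n):
    x = rng.normal(size=(n, d))  # start from pure Gaussian noise
    for t in range(T - 1, -1, -1):
        eps_hat = predict_noise(x, t)
        # Ancestral sampling step: remove predicted noise, then rescale.
        x = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # no fresh noise is added at the final step
            x = x + np.sqrt(betas[t]) * rng.normal(size=x.shape)
    return x

z_new = sample_latents(500)   # newly sampled latent representations
generated = decode(z_new)     # decoded into new parameter vectors
print(generated.shape)
```

In a realistic setting, each row of `generated` would be reshaped and loaded back into the corresponding layers of a network and then evaluated. The closed-form predictor only reproduces a Gaussian fit of the latents, so the sketch shows the pipeline's shape rather than the expressive power of a learned denoiser.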
