Recurrent Diffusion for Large-Scale Parameter Generation

Kai Wang, Dongwen Tang, Wangbo Zhao, Konstantin Schürholt, Zhangyang Wang, Yang You

TL;DR

RPG is a scalable framework for generating full neural network parameters on commodity GPUs by decoupling the modeling of global parameter relationships from the synthesis step. It tokenizes parameters layer by layer with layer-wise normalization, introduces a permutation state to resolve symmetry, and uses 2D position embeddings to preserve structure. A recurrent model learns inter-token dependencies and produces prototypes that condition a 1D diffusion process, which denoises parameter tokens into coherent, high-performing weight vectors. Across Vision Transformers, ConvNeXts, ResNets, and LoRA-based LLMs, RPG achieves accuracy on par with fully trained models while generating up to hundreds of millions of parameters within modest memory and time budgets, and it generalizes to unseen tasks. This work pushes toward AI-generating-AI by enabling efficient, large-scale weight generation across diverse architectures and tasks.

Abstract

Parameter generation has long struggled to match the scale of today's large vision and language models, curbing its broader utility. In this paper, we introduce Recurrent Diffusion for Large-Scale Parameter Generation (RPG), a novel framework that generates full neural network parameters, up to hundreds of millions, on a single GPU. Our approach first partitions a network's parameters into non-overlapping tokens, each corresponding to a distinct portion of the model. A recurrent mechanism then learns the inter-token relationships, producing prototypes that serve as conditions for a diffusion process that ultimately synthesizes the full parameters. Across a spectrum of architectures and tasks, including ResNets, ConvNeXts, and ViTs on ImageNet-1K and COCO, and even LoRA-based LLMs, RPG achieves performance on par with fully trained networks while avoiding excessive memory overhead. Notably, it generalizes beyond its training set to generate valid parameters for previously unseen tasks, highlighting its flexibility in dynamic and open-ended scenarios. By overcoming the longstanding memory and scalability barriers, RPG serves as a critical advance in AI generating AI, potentially enabling efficient weight generation at scales previously deemed infeasible.
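
To make the tokenization step concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each layer arrives as a flat list of floats, that normalization is a per-layer mean and standard deviation, and that short layers are zero-padded to a multiple of the token size. The function name `tokenize_layers`, the `token_size` parameter, and the padding scheme are all hypothetical. Each token also receives a 2D position of the form (layer index, token index within the layer), which is one plausible reading of the 2D position embeddings mentioned in the TL;DR.

```python
from statistics import fmean, pstdev


def tokenize_layers(layers, token_size, eps=1e-8):
    """Split each layer's flat weights into fixed-size, non-overlapping tokens.

    layers: list of flat lists of floats, one per layer (assumed input format).
    Returns (tokens, positions, stats), where positions[i] is the pair
    (layer_index, token_index_within_layer) and stats[j] is (mean, std)
    for layer j, kept so the normalization can be inverted later.
    """
    tokens, positions, stats = [], [], []
    for layer_idx, weights in enumerate(layers):
        # Layer-wise normalization: statistics are computed per layer.
        mean = fmean(weights)
        std = pstdev(weights)
        stats.append((mean, std))
        normed = [(w - mean) / (std + eps) for w in weights]

        # Zero-pad so the layer length is a multiple of token_size
        # (the padding scheme is an assumption of this sketch).
        pad = (-len(normed)) % token_size
        normed.extend([0.0] * pad)

        # Cut the padded layer into non-overlapping tokens.
        for tok_idx, start in enumerate(range(0, len(normed), token_size)):
            tokens.append(normed[start:start + token_size])
            positions.append((layer_idx, tok_idx))
    return tokens, positions, stats


# Toy example: two "layers" with 5 and 3 weights, token size 4.
layers = [[0.1, -0.2, 0.3, 0.4, -0.5], [1.0, 2.0, 3.0]]
tokens, positions, stats = tokenize_layers(layers, token_size=4)
print(len(tokens), positions)
```

With this toy input the first layer yields two tokens and the second yields one, so the positions are (0, 0), (0, 1), and (1, 0). Tokens never span two layers, so each one corresponds to a distinct portion of the model, as the abstract describes.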

Paper Structure

This paper contains 75 sections, 5 equations, 8 figures, 20 tables.

Figures (8)

  • Figure 1: Partial roadmap of vision, language, and parameter generation models. The number of parameters in vision or language models is at least $\mathbf{10^{3}}$ times larger than the number of generated parameters.
  • Figure 2: Illustration of parameter processing (left) and inference of recurrent diffusion (right). The recurrent model integrates permutation states and position embeddings, generating prototypes that condition the diffusion model to synthesize the full parameters.
  • Figure 3: Trade-off between accuracy and similarity with ViT-Tiny on ImageNet-1K. The shaded area covers the approximate range of noise-added checkpoints. The plot demonstrates the strong trade-off between accuracy and similarity and highlights our advantages over trivial interpolation.
  • Figure 4: An illustration of our binary embedding strategy and dataset construction. Left: binary embeddings (1022 in total) encode different CIFAR-10 classification tasks, where 1s indicate classes to be classified together (e.g., 'ship' and 'truck' in the first example). Right: the dataset consists of parameter-encoding pairs, formed by network parameters with their corresponding binary embeddings. These pairs are split into non-overlapping training and validation sets.
  • Figure 5: Illustration of the parameters of original and generated models for seen and unseen embeddings. We select 100 parameters of the classification head and visualize their normalized values.
  • ...and 3 more figures
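
The count of 1022 binary embeddings in the Figure 4 caption can be checked with a short sketch. It assumes the embeddings are all 0/1 vectors of length 10 (one bit per CIFAR-10 class), excluding the all-zeros and all-ones vectors, which would give $2^{10} - 2 = 1022$. The caption does not state this exclusion rule; it is inferred from the count. The function name `binary_task_embeddings` is hypothetical, and the class order is the standard CIFAR-10 label order.

```python
from itertools import product

# Standard CIFAR-10 label order.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def binary_task_embeddings(num_classes=len(CIFAR10_CLASSES)):
    """Enumerate 0/1 vectors of length num_classes with at least one 1
    and at least one 0 (assumed rule: the two trivial vectors are dropped)."""
    return [
        bits
        for bits in product((0, 1), repeat=num_classes)
        if 0 < sum(bits) < num_classes
    ]


def embedding_for(grouped_classes):
    """Build the embedding whose 1s mark the classes grouped together."""
    return tuple(1 if name in grouped_classes else 0 for name in CIFAR10_CLASSES)


embeddings = binary_task_embeddings()
ship_truck = embedding_for({"ship", "truck"})
print(len(embeddings))
print(ship_truck)
```

Under these assumptions the enumeration has 1022 entries, matching the caption, and the 'ship' and 'truck' example corresponds to a vector with 1s in the last two positions.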