Instruction-Guided Autoregressive Neural Network Parameter Generation

Soro Bedionita, Bruno Andreis, Song Chong, Sung Ju Hwang

TL;DR

IGPG introduces an instruction-guided, autoregressive framework for neural network parameter generation that unifies weight synthesis across tasks and architectures. By coupling a Gumbel-VQVAE encoder–decoder with a transformer prior conditioned on dataset embeddings and architecture descriptions, IGPG preserves inter-layer coherence and scales to large models through chunked, token-level generation. Empirically, IGPG achieves competitive or superior performance across Tiny Model Zoo benchmarks, cross-architecture transfer, and LoRA-based extensions, while supporting rapid adaptation to unseen tasks. The approach enables efficient pretrained-weight retrieval, model selection, and fast task-specific fine-tuning, offering a scalable path to leveraging diverse model collections in practical deployments.
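The TL;DR mentions chunked, token-level generation over vectorized parameters. The following is a minimal sketch of what such chunking could look like, assuming the weights of each layer are flattened in order into one vector and cut into fixed-length, zero-padded chunks. The function names and the padding scheme are illustrative assumptions, not the paper's exact procedure; the per-layer lengths are kept so the split can be undone after generation.

```python
from typing import List, Tuple


def vectorize_and_chunk(layers: List[List[float]], chunk_size: int) -> Tuple[List[List[float]], List[int]]:
    """Flatten per-layer weights into one vector and split it into equal-size chunks.

    Hypothetical helper: the final chunk is zero-padded so every chunk has length
    `chunk_size`. Returns the chunks plus the per-layer lengths needed to undo the split.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    lengths = [len(layer) for layer in layers]
    flat = [w for layer in layers for w in layer]
    chunks = []
    for start in range(0, len(flat), chunk_size):
        chunk = flat[start:start + chunk_size]          # slice is a fresh list
        chunk += [0.0] * (chunk_size - len(chunk))      # pad only the last chunk
        chunks.append(chunk)
    return chunks, lengths


def unchunk(chunks: List[List[float]], lengths: List[int]) -> List[List[float]]:
    """Invert vectorize_and_chunk: drop the padding and restore per-layer boundaries."""
    flat = [w for chunk in chunks for w in chunk][:sum(lengths)]
    layers, offset = [], 0
    for n in lengths:
        layers.append(flat[offset:offset + n])
        offset += n
    return layers


# Toy "network" with three layers holding 3, 2 and 4 weights (9 in total).
layers = [[0.1, -0.2, 0.3], [0.4, 0.5], [-0.6, 0.7, 0.8, 0.9]]
chunks, lengths = vectorize_and_chunk(layers, chunk_size=4)
print(len(chunks))  # 3
assert unchunk(chunks, lengths) == layers
```

Fixed-length chunks let a single encoder handle networks of any size: a larger model simply yields more chunks, and therefore more tokens, rather than requiring a differently shaped input.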

Abstract

Learning to generate neural network parameters conditioned on task descriptions and architecture specifications is pivotal for advancing model adaptability and transfer learning. Existing methods, especially those based on diffusion models, suffer from limited scalability to large architectures, rigidity in handling varying network depths, and disjointed parameter generation that undermines inter-layer coherence. In this work, we propose IGPG (Instruction Guided Parameter Generation), an autoregressive framework that unifies parameter synthesis across diverse tasks and architectures. IGPG combines a VQ-VAE with an autoregressive model to generate neural network parameters conditioned on task instructions, dataset, and architecture details. By autoregressively generating the tokens that encode network weights, IGPG ensures inter-layer coherence and enables efficient adaptation across models and datasets. Operating at the token level, IGPG effectively captures complex parameter distributions aggregated from a broad spectrum of pretrained models. Extensive experiments on multiple vision datasets demonstrate that IGPG consolidates diverse pretrained models into a single, flexible generative framework. The synthesized parameters achieve competitive or superior performance relative to state-of-the-art methods, especially in terms of scalability and efficiency when applied to large architectures. These results underscore IGPG's potential as a powerful tool for pretrained weight retrieval, model selection, and rapid task-specific fine-tuning.
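The abstract describes turning weights into discrete tokens with a VQ-VAE. As a minimal sketch of the quantization step, assuming latent vectors have already been produced by an encoder (omitted here), each latent is replaced by the index of its nearest codebook entry under squared Euclidean distance, which is the standard VQ-VAE rule. The TL;DR mentions a Gumbel-VQVAE, which instead samples code assignments through a Gumbel-softmax relaxation; this sketch deliberately uses the simpler deterministic rule, and the codebook values are made up for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[List[float]], codebook: List[List[float]]) -> List[int]:
    """Map each latent vector to the index (token id) of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(tokens: List[int], codebook: List[List[float]]) -> List[List[float]]:
    """Look token ids back up in the codebook; a decoder would map these to weights."""
    return [list(codebook[t]) for t in tokens]


# Hypothetical 3-entry codebook of 2-D code vectors and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
latents = [[0.9, 1.2], [-0.8, 0.4], [0.1, -0.1]]
tokens = quantize(latents, codebook)
print(tokens)  # [1, 2, 0]
```

The resulting integer sequence is what an autoregressive model can be trained on, exactly like text tokens.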

Paper Structure

This paper contains 32 sections, 3 equations, 5 figures, 9 tables, 1 algorithm.

Figures (5)

  • Figure 1: Our approach integrates a VQ-VAE autoencoder ($\mathbf{E}$–$\mathbf{D}$) with a transformer prior. First, the VQ-VAE encodes vectorized network parameters (see Section \ref{approach:enc}), and then the transformer is trained on the resulting codebook tokens (see Section \ref{sec:autoreg}). Additionally, prompts, including data, task, or architecture details, are processed using multimodal or language modeling techniques (see Section \ref{sec:autoreg}); a simplified example of the training prompt template is given in Remark 1.
  • Figure 2: Transfer learning evaluation on novel datasets (CIFAR-100, CIFAR-10, Aircraft30, and PETS10) compared to random initialization.
  • Figure 3: Performance evaluation with seen and unseen ResNet architectures on CIFAR-10 against models pretrained on CIFAR-100 and Random Initialization.
  • Figure 4: Comparison of IGPG's conditionally sampled weight initialization versus pretrained models across diverse architectures on CIFAR-10 and CIFAR-100.
  • Figure 5: Parameter distributions of diverse architectures pretrained on CIFAR-10 and CIFAR-100, all jointly encoded by IGPG.
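Figure 1 describes a transformer prior trained on codebook tokens, with the prompt (data, task, or architecture details) acting as conditioning. A minimal sketch of that setup, assuming the encoded prompt is prepended to the weight tokens as a prefix: training uses next-token prediction with the loss masked on prefix targets, and generation extends the prefix one token at a time. The conditioning ids 100 and 101, the greedy decoding rule, and `toy_prior` (a stand-in for a trained transformer) are hypothetical; the paper's actual prompt template is the one given in Remark 1.

```python
from typing import Callable, List, Tuple


def make_training_pair(cond_tokens: List[int], weight_tokens: List[int]) -> Tuple[List[int], List[int], List[bool]]:
    """Build next-token-prediction inputs, targets and a loss mask.

    The sequence is [conditioning prefix] + [weight tokens]. Targets are the
    inputs shifted by one; the mask keeps only targets that are weight tokens,
    because the prefix is given rather than predicted.
    """
    seq = cond_tokens + weight_tokens
    inputs, targets = seq[:-1], seq[1:]
    # targets[i] == seq[i + 1], which is a weight token iff i + 1 >= len(prefix)
    mask = [i + 1 >= len(cond_tokens) for i in range(len(targets))]
    return inputs, targets, mask


def generate(cond_tokens: List[int], num_tokens: int, next_token: Callable[[List[int]], int]) -> List[int]:
    """Greedy autoregressive decoding: each new token sees the prefix and all earlier tokens."""
    seq = list(cond_tokens)
    for _ in range(num_tokens):
        seq.append(next_token(seq))
    return seq[len(cond_tokens):]


def toy_prior(context: List[int]) -> int:
    """Hypothetical stand-in for a trained prior over an 8-entry codebook."""
    return sum(context) % 8


inputs, targets, mask = make_training_pair([100, 101], [3, 5, 7])
print(inputs)   # [100, 101, 3, 5]
print(targets)  # [101, 3, 5, 7]
print(mask)     # [False, True, True, True]
print(generate([100, 101], 3, toy_prior))  # [1, 2, 4]
```

Because every weight token is predicted from all earlier tokens, tokens for later layers are conditioned on those already generated for earlier layers, which is the mechanism the abstract relies on for inter-layer coherence.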

Theorems & Definitions (1)

  • Remark 1