NNiT: Width-Agnostic Neural Network Generation with Structurally Aligned Weight Spaces

Jiwoo Kim, Swarajh Mehta, Hao-Lun Hsu, Hyunwoo Ryu, Yudong Liu, Miroslav Pajic

TL;DR

This work introduces Neural Network Diffusion Transformers (NNiTs), which generate weights in a width-agnostic manner by tokenizing weight matrices into patches and modeling them as locally structured fields, producing fully functional networks across a range of architectures.

Abstract

Generative modeling of neural network parameters is often tied to architectures because standard parameter representations rely on known weight-matrix dimensions. Generation is further complicated by permutation symmetries that allow networks to model similar input-output functions while having widely different, unaligned parameterizations. In this work, we introduce Neural Network Diffusion Transformers (NNiTs), which generate weights in a width-agnostic manner by tokenizing weight matrices into patches and modeling them as locally structured fields. We establish that Graph HyperNetworks (GHNs) with a convolutional neural network (CNN) decoder structurally align the weight space, creating the local correlation necessary for patch-based processing. Focusing on MLPs, where permutation symmetry is especially apparent, NNiT generates fully functional networks across a range of architectures. Our approach jointly models discrete architecture tokens and continuous weight patches within a single sequence model. On ManiSkill3 robotics tasks, NNiT achieves >85% success on architecture topologies unseen during training, while baseline approaches fail to generalize.
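
The two ideas in the abstract, permutation symmetry and patch tokenization, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `mlp_forward` and `patchify`, the patch size of 4, the zero-padding scheme, and the layer sizes are all assumptions chosen for illustration. The sketch shows that reordering hidden units leaves an MLP's function unchanged while changing the patch tokens of its weights, which is why unaligned weights are hard to model as locally structured fields.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer MLP with a ReLU hidden layer."""
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

def patchify(W, patch=4):
    """Zero-pad W to a multiple of `patch` per axis, then split it into
    non-overlapping patch x patch blocks, each flattened into one token.
    (Hypothetical tokenizer; the padding choice is an assumption.)"""
    rows, cols = W.shape
    Wp = np.pad(W, ((0, (-rows) % patch), (0, (-cols) % patch)))
    R, C = Wp.shape[0] // patch, Wp.shape[1] // patch
    blocks = Wp.reshape(R, patch, C, patch).transpose(0, 2, 1, 3)
    return blocks.reshape(R * C, patch * patch)

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 6, 10, 3
W1 = rng.normal(size=(d_in, d_hidden))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_hidden, d_out))
b2 = rng.normal(size=d_out)

# Reorder the hidden units: permute columns of W1, entries of b1, rows of W2.
perm = np.arange(d_hidden)[::-1]
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

# Both parameterizations compute the same input-output function.
x = rng.normal(size=(5, d_in))
same_function = np.allclose(
    mlp_forward(x, W1, b1, W2, b2),
    mlp_forward(x, W1p, b1p, W2p, b2),
)

# The patch tokens differ, even though the function is identical.
tokens = patchify(W1, patch=4)
tokens_perm = patchify(W1p, patch=4)
tokens_match = np.allclose(tokens, tokens_perm)

print(same_function, tokens.shape, tokens_match)
```

Because the number of tokens depends only on the padded matrix size, the same tokenizer applies to any layer width, which is the sense in which a patch-based sequence model can be width-agnostic.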

Paper Structure (31 sections, 5 equations, 9 figures, 7 tables)

Figures (9)

  • Figure 1: Width-Agnostic Synthesis via Multimodal Tokenization. Unlike previous models, NNiT decouples functional logic from fixed matrix dimensions, allowing the zero-shot synthesis of optimal weights for architectural topologies entirely unseen during training.
  • Figure 2: NNiT Framework Overview. Left: Unified Generative Architecture. We formulate neural synthesis as a multimodal sequence task. Discrete architecture tokens (orange) and continuous weight matrices (blue) are unified into a single sequence, with weights processed as spatially correlated patches. A Diffusion Transformer (DiT) models the joint distribution using per-modality timestep conditioning ($\mu_a, \Sigma_a, \mu_w, \Sigma_w$) via the Mixture of Noise Levels (MoNL) framework, enabling both co-design $p(\mathbf{a},\mathbf{w})$ and conditional synthesis $p(\mathbf{w}|\mathbf{a})$. Right: Deployment Pipeline. During inference, sampled architecture tokens are decoded into layer widths. The generated weight tensors are then extracted to match these target dimensions, assembling a directly executable MLP.
  • Figure 3: Visualizing Structural Alignment and Induced Geometry. Comparison of weight magnitude profiles across 35 independent seeds. Top (GHN): The consistent alignment across seeds shows that the GHN (1) spatially aligns the weight spaces, resolving the permutation ambiguity inherent in neural networks. The visible structural banding further indicates that the GHN (2) imposes meaningful geometric structure (spatial correlation), transforming independent parameters into a spatially aligned space. Bottom (SGD): In contrast, SGD weights exhibit unstructured noise due to arbitrary permutations, lacking both structural alignment and local spatial geometry.
  • Figure 4: Visualization of Topological Anchoring. Heatmaps of neuron-wise weight magnitudes for 3 selected architectures across 100 filtered seeds. The vertical banding visually confirms the induction of spatial correlation, validating the premise that these weights can be treated as continuous fields.
  • Figure 5: Dataset Diversity Analysis. Histograms of pairwise $L_2$ distances and cosine similarities across all three environments. The consistently high $L_2$ distances and low cosine similarities confirm that the structural alignment imposed by the GHN does not result in mode collapse.
  • ...and 4 more figures
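
The diversity check described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's analysis code: the helper name `pairwise_diversity`, the synthetic "diverse" and "collapsed" weight sets, and the array sizes are hypothetical. It computes pairwise $L_2$ distances and cosine similarities over flattened weight vectors; a collapsed dataset would show near-zero distances and cosine similarities near 1.

```python
import numpy as np

def pairwise_diversity(weights):
    """weights: (n_networks, n_params) array of flattened weight vectors.
    Returns L2 distances and cosine similarities over all unique pairs."""
    X = np.asarray(weights, dtype=float)
    i, j = np.triu_indices(X.shape[0], k=1)  # index pairs with i < j
    l2 = np.linalg.norm(X[i] - X[j], axis=1)
    norms = np.linalg.norm(X, axis=1)
    cos = np.sum(X[i] * X[j], axis=1) / (norms[i] * norms[j])
    return l2, cos

rng = np.random.default_rng(0)

# Synthetic stand-ins for a dataset of networks (assumed sizes).
diverse = rng.normal(size=(20, 256))
base = rng.normal(size=256)
collapsed = base + 1e-3 * rng.normal(size=(20, 256))  # near-identical copies

l2_div, cos_div = pairwise_diversity(diverse)
l2_col, cos_col = pairwise_diversity(collapsed)

# Histogram counts, as one would plot them for each environment.
l2_counts, l2_edges = np.histogram(l2_div, bins=10)

print(f"diverse:   mean L2 = {l2_div.mean():.3f}, mean |cos| = {np.abs(cos_div).mean():.3f}")
print(f"collapsed: mean L2 = {l2_col.mean():.3f}, min cos    = {cos_col.min():.5f}")
```

Reading the two statistics together matters: high distances alone could reflect differences in scale, while low cosine similarity confirms that the weight vectors also point in different directions.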