Universal Hypernetworks for Arbitrary Models

Xuanfeng Zhou

Abstract

Conventional hypernetworks are typically engineered around a specific base-model parameterization, so changing the target architecture often entails redesigning the hypernetwork and retraining it from scratch. We introduce the \emph{Universal Hypernetwork} (UHN), a fixed-architecture generator that predicts weights from deterministic parameter, architecture, and task descriptors. This descriptor-based formulation decouples the generator architecture from target-network parameterization, so one generator can instantiate heterogeneous models across the tested architecture and task families. Our empirical claims are threefold: (1) one fixed UHN remains competitive with direct training across vision, graph, text, and formula-regression benchmarks; (2) the same UHN supports both multi-model generalization within a family and multi-task learning across heterogeneous models; and (3) UHN enables stable recursive generation with up to three intermediate generated UHNs before the final base model. Our code is available at https://github.com/Xuanfeng-Zhou/UHN.


Figures (5)

  • Figure 1: UHN settings: (a) single-model, UHN $H$ generates a single base model $f$; (b) multi-model, UHN $H$ generates a set of base models $\{f_i\}$ within one model family; (c) multi-task, UHN $H$ generates base models $f, g, q, \ldots$ across potentially heterogeneous architectures for different tasks; and (d) recursive generation, in which the root UHN $H_0$ generates an intermediate UHN $H_1$, which then generates the base model $f$.
  • Figure 2: Method overview. We sample the task and architecture of the base model $f_{\mathbf{w}}$ and collect their task-structure descriptors. For each parameter in $f_{\mathbf{w}}$, we collect its index descriptor $\mathbf{v}_{i}$ and feed it into UHN $H_{\boldsymbol{\theta}}$ along with the task-structure descriptors to obtain each individual weight $w_{i}$ of $f$. These weights are then stacked into the weight vector $\mathbf{w}$, which parameterizes $f_{\mathbf{w}}$.
  • Figure 3: Model-family performance scatter plots.
  • Figure 4: Visualization of the third convolution-layer kernels of the generated base model.
  • Figure 5: Multi-model Transformer Mixed epoch-averaged training loss with and without the initialization stage ("w/ init" and "w/o init"; log-scale $y$-axis). Loss is averaged over all training samples within each epoch.
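The generation scheme summarized in Figure 2 can be sketched as a descriptor-conditioned generator: for each target parameter, a shared task-structure descriptor is concatenated with that parameter's index descriptor $\mathbf{v}_i$, passed through a fixed network $H_{\boldsymbol{\theta}}$, and the resulting scalars are stacked into the weight vector $\mathbf{w}$. The sketch below is a minimal illustration of this interface; the descriptor dimensions and the MLP body are assumptions for illustration, not the paper's actual UHN architecture.

```python
import torch
import torch.nn as nn


class UHN(nn.Module):
    """Hypothetical sketch of a descriptor-conditioned generator H_theta.

    Each target weight w_i is predicted from the concatenation of a
    task-structure descriptor (shared across the whole target model) and a
    per-parameter index descriptor v_i. Because the generator's input is a
    fixed-size descriptor rather than the target parameterization itself,
    one fixed generator can emit weight vectors of any length.
    """

    def __init__(self, desc_dim: int, index_dim: int, hidden: int = 128):
        super().__init__()
        # Illustrative MLP body; the real UHN architecture may differ.
        self.net = nn.Sequential(
            nn.Linear(desc_dim + index_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # one scalar weight per index descriptor
        )

    def forward(self, task_desc: torch.Tensor, index_desc: torch.Tensor) -> torch.Tensor:
        # task_desc:  (desc_dim,)            shared task/architecture descriptor
        # index_desc: (num_params, index_dim) one row per target parameter
        shared = task_desc.unsqueeze(0).expand(index_desc.size(0), -1)
        x = torch.cat([shared, index_desc], dim=-1)
        # Predicted weights, stacked into the vector w of shape (num_params,)
        return self.net(x).squeeze(-1)


# Generate a weight vector for a toy base model with 10 parameters.
uhn = UHN(desc_dim=16, index_dim=8)
w = uhn(torch.zeros(16), torch.randn(10, 8))
```

Note that the number of generated weights is set entirely by how many index descriptors are supplied, which is what decouples the generator from the target parameterization.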