HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories

Eric Hedlin, Munawar Hayat, Fatih Porikli, Kwang Moo Yi, Shweta Mahajan

TL;DR

This work introduces Hypernetwork Fields, a gradient-supervised framework that learns the full training trajectory of a task network by conditioning a hypernetwork on the optimization step $t$ and input $\mathbf{x}$. By supervising the gradient along the convergence path rather than matching final converged weights, the method eliminates the need for per-sample ground-truth target weights and substantially reduces precomputation time. The authors demonstrate competitive results on personalized image generation (DreamBooth) and 3D shape reconstruction (Occupancy Net) with roughly 4× lower training cost and fast inference, validating the approach's generality across domains. The framework promises scalable hypernetwork applications by trading explicit target supervision for trajectory-aware gradient consistency, enabling efficient adaptation in large-scale and diverse settings.

Abstract

Hypernetworks have drawn interest as a way to efficiently adapt large models or to train generative models of neural representations. While hypernetworks work well, training them is cumbersome and often requires ground truth optimized weights for each sample. However, obtaining each of these weights is a training problem of its own: one needs to train, e.g., adaptation weights or even an entire neural field for hypernetworks to regress to. In this work, we propose a method to train hypernetworks without the need for any per-sample ground truth. Our key idea is to learn a Hypernetwork `Field` and estimate the entire trajectory of network weight training instead of simply its converged state. In other words, we introduce an additional input to the Hypernetwork, the convergence state, which then makes it act as a neural field that models the entire convergence pathway of a task network. A critical benefit in doing so is that the gradient of the estimated weights at any convergence state must then match the gradients of the original task -- this constraint alone is sufficient to train the Hypernetwork Field. We demonstrate the effectiveness of our method on the tasks of personalized image generation and 3D shape reconstruction from images and point clouds, achieving competitive results without any per-sample ground truth.
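
The gradient-matching constraint described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scalar task loss, the one-parameter `field`, the step size `LR`, and the grid search standing in for gradient-based training are all assumptions made for illustration. The point it shows is that the supervision signal uses only the task gradient at the weights the field currently predicts, never a precomputed converged weight.

```python
# Toy illustration of gradient-matching supervision for a hypernetwork field.
# All names (task_grad, field, trajectory_loss, LR) are hypothetical.

LR = 0.1  # assumed step size of the task optimizer whose trajectory is modeled


def task_grad(w, x):
    """Gradient of the toy task loss 0.5 * (w - x)**2 with respect to w."""
    return w - x


def field(a, t, x):
    """One-parameter stand-in for a hypernetwork field.

    Returns the task weight at convergence state t for input x.
    It starts at w = 0 for t = 0 and approaches x at a rate set by a.
    """
    return x * (1.0 - a ** t)


def trajectory_loss(a, xs, steps):
    """Mean squared mismatch between the field's step and a gradient step."""
    total, count = 0.0, 0
    for x in xs:
        for t in range(steps):
            w_t = field(a, t, x)
            predicted_step = field(a, t + 1, x) - w_t
            # The target needs only the task gradient at the predicted weight.
            target_step = -LR * task_grad(w_t, x)
            total += (predicted_step - target_step) ** 2
            count += 1
    return total / count


xs = [-2.0, 0.5, 1.0, 3.0]                # toy conditioning inputs
candidates = [i / 20 for i in range(21)]  # crude search in place of SGD
best_a = min(candidates, key=lambda a: trajectory_loss(a, xs, steps=10))
best_loss = trajectory_loss(best_a, xs, steps=10)
print(best_a, best_loss)
```

In this toy setting the residual vanishes only when `a` equals `1 - LR`, so the search selects the field that reproduces the gradient-descent trajectory from the shared starting point `w = 0`. A real hypernetwork field would replace `field` with a neural network and the grid search with stochastic optimization over its parameters.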

Paper Structure

This paper contains 33 sections, 5 equations, 12 figures, 5 tables, 1 algorithm.

Figures (12)

  • Figure 1: Teaser -- We propose hypernetwork fields, where we learn the entire weight space trajectory instead of only the final state, which allows us to train without ever needing to know the final converged weights. Our method can be applied to any application of hypernetworks, including diffusion model personalization and modeling of 3D neural representations.
  • Figure 2: The HyperNet Fields Framework. Left: Our Hypernetwork with gradient-based supervision from a task-specific network -- DreamBooth. Right: The sampling process to generate personalized images with our hypernetwork framework.
  • Figure 3: Qualitative examples (CelebA HQ) -- Qualitative examples of personalized human face generation using the CelebA HQ dataset are shown. Our hypernetwork field achieves fast adaptation, producing results comparable to DreamBooth and Textual Inversion while preserving individual features in a personalized manner. Additional examples are provided in the supplementary material.
  • Figure 4: Qualitative examples (AFHQ) -- Qualitative results of personalized animal image generation from the AFHQ dataset demonstrate the adaptability of our hypernetwork field. Compared to DreamBooth and Textual Inversion, our method effectively captures the specific animal's characteristics while enabling rapid adaptation. Further examples are available in the supplementary material.
  • Figure 5: Fast training (AFHQ) -- We show example outcomes of our method after only 100 iterations of training, as raw output from our Hypernetwork without fast fine-tuning. As shown, even after only 100 iterations, our model is able to generate images that are visually very similar to the conditioning images.
  • ...and 7 more figures