Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, Zeynep Akata

TL;DR

The paper addresses the inefficiency of test-time optimization in diffusion-model generation by introducing Noise Hypernetworks, which learn an optimized initial noise distribution that steers a fixed, distilled generator toward a reward-tilted output distribution. It provides a theoretical foundation in noise space, deriving a tractable KL-based objective that reduces to an $L_2$ penalty on the noise modification plus a reward term, and amortizes the optimization through a LoRA-based adapter. Empirically, the method improves redness and human-preference rewards across several distilled models (SD-Turbo, SANA-Sprint, FLUX-Schnell) at a much lower inference cost than per-sample test-time optimization. The improvements hold across prompts and model scales, supporting the approach as a practical, efficient alternative to per-sample optimization for reward-aligned diffusion. Limitations include dependence on meaningful reward signals and the need for further exploration of reward-model design and broader domain applicability.

Abstract

The new paradigm of test-time scaling has yielded remarkable breakthroughs in Large Language Models (LLMs) (e.g. reasoning models) and in generative vision models, allowing models to allocate additional computation during inference to effectively tackle increasingly complex problems. Despite the improvements of this approach, an important limitation emerges: the substantial increase in computation time makes the process slow and impractical for many applications. Given the success of this paradigm and its growing usage, we seek to preserve its benefits while eschewing the inference overhead. In this work we propose one solution to the critical problem of integrating test-time scaling knowledge into a model during post-training. Specifically, we replace reward-guided test-time noise optimization in diffusion models with a Noise Hypernetwork that modulates the initial input noise. We propose a theoretically grounded framework for learning this reward-tilted distribution for distilled generators, through a tractable noise-space objective that maintains fidelity to the base model while optimizing for desired characteristics. We show that our approach recovers a substantial portion of the quality gains from explicit test-time optimization at a fraction of the computational cost. Code is available at https://github.com/ExplainableML/HyperNoise
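The noise-space objective summarized above (an $L_2$ penalty on the noise modification plus a reward term) can be illustrated with a minimal, dependency-free sketch. Everything in it is a hypothetical stand-in chosen for illustration and is not taken from the HyperNoise code: `noise_modification` replaces the LoRA-based hypernetwork with a two-parameter affine map, `generator` is a toy frozen generator, `reward` is a toy mean-intensity score, and the `0.5` factor and `penalty_weight` are assumed weightings.

```python
import math

def noise_modification(x0, phi):
    """Hypothetical f_phi: per-coordinate affine map standing in for the LoRA adapter."""
    scale, shift = phi
    return [scale * v + shift for v in x0]

def generator(x):
    """Toy frozen generator g (assumption for illustration, not a diffusion model)."""
    return [math.tanh(v) for v in x]

def reward(image):
    """Toy reward: mean intensity, standing in for a score such as redness."""
    return sum(image) / len(image)

def noise_space_loss(x0, phi, penalty_weight=1.0):
    """L2 penalty on the noise modification minus the reward of the generated sample."""
    delta = noise_modification(x0, phi)
    x_hat = [v + d for v, d in zip(x0, delta)]    # modified initial noise
    l2_penalty = 0.5 * sum(d * d for d in delta)  # keeps x_hat close to the base noise
    return penalty_weight * l2_penalty - reward(generator(x_hat))

x0 = [0.0, 0.0, 0.0, 0.0]
baseline = noise_space_loss(x0, (0.0, 0.0))  # no modification
shifted = noise_space_loss(x0, (0.0, 0.1))   # small shift that raises the toy reward
```

In this toy setting a small shift lowers the loss relative to the unmodified noise, because the reward gain outweighs the quadratic penalty; larger shifts are increasingly penalized, which is the mechanism that keeps the modified noise near the base distribution.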

Paper Structure

This paper contains 59 sections, 7 theorems, 59 equations, 9 figures, 10 tables, 1 algorithm.

Key Result

Theorem 1

Let $A = J_{f_\phi}(\mathbf{x}_0)$ be the $d \times d$ Jacobian matrix of $f_\phi$ at $\mathbf{x}_0$. Assume $f_\phi$ is $L$-Lipschitz continuous with Lipschitz constant $L < 1$. This implies that the spectral radius satisfies $\rho(A) \le L < 1$. Then, the error term $\mathcal{E}(A) = \mathop{\mathrm{T
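The step from the Lipschitz assumption to the spectral-radius bound can be spelled out with a standard argument. It is sketched here under the assumptions that $f_\phi$ is differentiable at $\mathbf{x}_0$ and that $\|\cdot\|_2$ denotes the Euclidean operator norm, and it is not quoted from the paper. For any eigenvalue $\lambda$ of $A$ with unit-norm eigenvector $\mathbf{v}$:

$$|\lambda| = \|A\mathbf{v}\|_2 \le \|A\|_2 \le L \quad\Longrightarrow\quad \rho(A) = \max_i |\lambda_i| \le L < 1.$$

The middle inequality $\|A\|_2 \le L$ holds because each directional derivative of an $L$-Lipschitz map is bounded in norm by $L$.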

Figures (9)

  • Figure 1: The same initial random noise is used for the base generation and as the input to the noise hypernetwork. HyperNoise significantly improves upon the initially generated image with respect to both prompt faithfulness and aesthetic quality for both SANA-Sprint and FLUX-Schnell.
  • Figure 2: Illustration of our proposed HyperNoise approach. During training, the LoRA parameters are trained to predict improved noises and are optimized by reward maximization subject to KL regularization. During inference, the noise hypernetwork directly predicts the improved noise initialization which is used for the final generation.
  • Figure 3: An illustrative example of learning the tilted distribution with an image redness reward. We show direct LoRA fine-tuning of SANA-Sprint in comparison to training a noise hypernetwork with our proposed objective. Notably, when training with our objective, the model optimizes the desired reward while staying considerably closer to $p^{\text{base}}$, as showcased by the model not diverging from the image manifold, unlike in direct LoRA fine-tuning.
  • Figure 4: Trade-off between the redness reward objective and an image quality metric, ImageReward, for direct fine-tuning and Noise Hypernetworks. As opposed to direct fine-tuning, our proposed method optimizes the redness objective while not significantly dropping image quality as indicated by the ImageReward score.
  • Figure 5: Qualitative comparison of our proposed noise hypernetwork with popular distilled models such as FLUX-Schnell, SD3.5-Turbo, and SANA-Sprint for 4-step generation. Both SANA-Sprint and FLUX-Schnell share the initial noise for the base and HyperNoise generation.
  • ...and 4 more figures

Theorems & Definitions (14)

  • Theorem 1: Bound on Log-Determinant Approximation Error
  • Definition 1: Reward-Tilted Output Distribution
  • Proposition 2: KL Objective for Generator Fine-tuning
  • proof
  • Definition 2: Tilted Noise Distribution
  • Theorem 3: Properties of the Tilted Noise Distribution
  • proof
  • Proposition 4: KL Objective for Learning Tilted Noise Density
  • proof
  • Lemma 5: Lipschitz Condition for Invertibility
  • ...and 4 more