Deep Parameter Interpolation for Scalar Conditioning

Chicago Y. Park, Michael T. McCann, Cristina Garcia-Cardona, Brendt Wohlberg, Ulugbek S. Kamilov

TL;DR

The paper tackles scalar conditioning in diffusion and flow matching models by introducing Deep Parameter Interpolation (DPI), which conditions a single network on a scalar by interpolating between two learnable parameter sets per layer using a learnable monotonic function $\lambda(s)$. The method preserves the original architecture and adds only lightweight interpolation, enabling expressive conditioning across time or noise level without specialized conditioning modules. Empirical results on DRUNet and ADM demonstrate improved denoising accuracy and unconditional image generation quality in both diffusion and flow matching frameworks, with minimal computational overhead compared to embedding-based or input-level conditioning. DPI offers a flexible, efficient conditioning mechanism with broad applicability to scalar-conditioned generative modeling.

Abstract

We propose deep parameter interpolation (DPI), a general-purpose method for transforming an existing deep neural network architecture into one that accepts an additional scalar input. Recent deep generative models, including diffusion models and flow matching, employ a single neural network to learn a time- or noise level-dependent vector field. Designing a network architecture to accurately represent this vector field is challenging because the network must integrate information from two different sources: a high-dimensional vector (usually an image) and a scalar. Common approaches either encode the scalar as an additional image input or combine scalar and vector information in specific network components, which restricts architecture choices. Instead, we propose to maintain two learnable parameter sets within a single network and to introduce the scalar dependency by dynamically interpolating between the parameter sets based on the scalar value during training and sampling. DPI is a simple, architecture-agnostic method for adding scalar dependence to a neural network. We demonstrate that our method improves denoising performance and enhances sample quality for both diffusion and flow matching models, while achieving computational efficiency comparable to standard scalar conditioning techniques. Code is available at https://github.com/wustl-cig/parameter_interpolation.
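The interpolation described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it applies the rule $\bm{\theta}(s) = (1 - \lambda(s))\,\bm{\theta}^0 + \lambda(s)\,\bm{\theta}^1$ to a single linear layer in plain Python. The function names (`lam`, `interpolate`, `dpi_linear`) and the sigmoid-with-positive-slope form of $\lambda$ are assumptions made for illustration; the source states only that $\lambda$ is learnable, monotonic, and takes values in $[0,1]$.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # Smooth, numerically stable map from the reals to positive values.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def lam(s, raw_slope, offset):
    """Hypothetical monotone interpolation weight lambda(s) in (0, 1).

    softplus(raw_slope) > 0 keeps lambda increasing in s. This
    parameterization is an assumption, not the paper's definition.
    """
    return sigmoid(softplus(raw_slope) * s + offset)


def interpolate(theta0, theta1, weight):
    # theta(s) = (1 - lambda) * theta0 + lambda * theta1, applied entrywise.
    return [(1.0 - weight) * a + weight * b for a, b in zip(theta0, theta1)]


def dpi_linear(x, s, w0, w1, b0, b1, raw_slope=1.0, offset=0.0):
    """Toy linear layer y = W(s) x + b(s) with interpolated parameters."""
    weight = lam(s, raw_slope, offset)
    w = [interpolate(r0, r1, weight) for r0, r1 in zip(w0, w1)]
    b = interpolate(b0, b1, weight)
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(w, b)]


# Two parameter sets for the same 2x2 layer (illustrative values only).
w0, b0 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w1, b1 = [[0.0, 1.0], [1.0, 0.0]], [1.0, 1.0]
x = [2.0, 3.0]

y_low = dpi_linear(x, -50.0, w0, w1, b0, b1)   # lambda near 0: about W0 x + b0
y_high = dpi_linear(x, 50.0, w0, w1, b0, b1)   # lambda near 1: about W1 x + b1
```

In this sketch the layer's forward pass is unchanged; only the parameters it uses depend on $s$, which is why the approach does not require conditioning modules inside the architecture. In a real network the two parameter sets and the parameters of $\lambda$ would be trained jointly.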

Paper Structure

This paper contains 16 sections, 18 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Comparison of scalar conditioning mechanisms for diffusion and flow matching models. Each method integrates a scalar variable $s$ (e.g., time or noise level) into the network, where learnable modules at layer $\ell$ are parameterized by $\bm{\theta}_{\ell}$. (a) Embedding-based conditioning injects a sinusoidal embedding of $s$ through an MLP to modulate features; it is effective but architecturally constrained, as it requires conditioning modules (e.g., normalization layers). (b) Input-level conditioning concatenates a constant-valued scalar map with the input tensor; it is simple and architecture-agnostic but less expressive. (c) Deep parameter interpolation (ours) maintains two learnable parameter sets, $\bm{\theta}^0$ and $\bm{\theta}^1$, and introduces scalar dependency at the parameter level by interpolating between them based on the scalar value $s$, where a learnable monotonic function $\lambda(s) \in [0,1]$ controls the interpolation to enable smooth adaptation across scalar values.
  • Figure 2: Examples of FFHQ $64 \times 64$ samples generated by DRUNet [zhang2021dpir] and ADM [dhariwal2021beat] using diffusion SDE and flow ODE solvers with our deep parameter interpolation (DPI). FID scores are reported in parentheses.
  • Figure 3: Learned interpolation functions $\lambda$ for DRUNet and ADM under both diffusion and flow matching frameworks. The monotonic function $\lambda$, defined in the section on the learnable interpolation function, determines how the model interpolates between the two parameter sets within a single network as the scalar variable (time or noise level) progresses. Differences in the shape of $\lambda$ indicate architecture- and framework-specific adaptation behavior.
  • Figure 4: Denoising performance across diffusion noise levels for different scalar conditioning methods using DRUNet and ADM. The plot shows the noise-scaled mean squared error (MSE), $\frac{\sigma^2_t}{N}\sum^N_{i=1} \| \bm{\epsilon}_{\theta}(\bm{x}_t^{(i)} ; \sigma_t) - \bm{\epsilon}^{(i)} \|^2_2$, where $\sigma_t^{2}$ denotes the noise variance at each diffusion step and $N$ the number of test samples. Our method consistently achieves lower error across a wide range of noise levels.
  • Figure 5: Examples of FFHQ $64 \times 64$ samples generated by the ADM architecture [dhariwal2021beat] using the DDIM sampler [song2021ddim] with five different scalar conditioning methods.
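
The noise-scaled MSE in the Figure 4 caption, $\frac{\sigma^2_t}{N}\sum^N_{i=1} \| \bm{\epsilon}_{\theta}(\bm{x}_t^{(i)} ; \sigma_t) - \bm{\epsilon}^{(i)} \|^2_2$, can be computed as in the sketch below. This is an illustrative plain-Python version, not the authors' evaluation code; the function name `noise_scaled_mse` and the list-of-lists layout are assumptions, and the predicted noise is passed in directly rather than produced by a network.

```python
def noise_scaled_mse(pred_noise, true_noise, sigma_t):
    """Return sigma_t^2 / N * sum_i ||pred_i - true_i||_2^2.

    pred_noise, true_noise: lists of N flattened samples (lists of floats).
    sigma_t: noise standard deviation at the evaluated diffusion step.
    """
    n = len(true_noise)
    total = sum(
        sum((p - e) ** 2 for p, e in zip(pred_i, eps_i))
        for pred_i, eps_i in zip(pred_noise, true_noise)
    )
    return sigma_t ** 2 * total / n


# Two toy samples with squared errors 1 and 4, evaluated at sigma_t = 0.5.
pred = [[1.0, 0.0], [0.0, 0.0]]
true = [[0.0, 0.0], [0.0, 2.0]]
value = noise_scaled_mse(pred, true, 0.5)  # 0.25 * (1 + 4) / 2 = 0.625
```

Multiplying by $\sigma_t^2$ rescales the noise-prediction error by the noise variance at each step, so errors at different noise levels are weighted accordingly on the plot.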