Training-Free Generative Modeling via Kernelized Stochastic Interpolants

Florentin Coeurdoux, Etienne Lempereur, Nathanaël Cuvelle-Magar, Thomas Eboli, Stéphane Mallat, Anastasia Borovykh, Eric Vanden-Eijnden

TL;DR

The paper introduces training-free generative modeling by kernelizing stochastic interpolants: the drift is restricted to the finite-dimensional form $\hat b_t(x)=\nabla\phi(x)^\top\eta_t$, and the coefficients $\eta_t$ are obtained by solving a $P\times P$ linear system, where $P$ is independent of the data dimension $d$. The diffusion coefficient is chosen as $D_t^* = \alpha_t\gamma_t/\beta_t$, which minimizes a bound on the path KL divergence and so limits the impact of drift estimation error on the generated samples. A dedicated integrator handles the endpoint behavior of this schedule ($D_0^* = \infty$, $D_1^* = 0$). The framework accommodates diverse feature maps, including scattering spectra and pretrained velocity fields, enabling training-free generation and cross-model combination. Applications span financial time series, turbulence, and high-resolution image generation; the ensemble experiments show that combining weak models through the linear system can surpass the individual weak learners. The approach is complementary to moment-guided diffusion methods.

Abstract

We develop a kernel method for generative modeling within the stochastic interpolant framework, replacing neural network training with linear systems. The drift of the generative SDE is $\hat b_t(x) = \nabla\phi(x)^\top\eta_t$, where $\eta_t\in\mathbb{R}^P$ solves a $P\times P$ system computable from data, with $P$ independent of the data dimension $d$. Since estimates are inexact, the diffusion coefficient $D_t$ affects sample quality; the optimal $D_t^*$ from Girsanov diverges at $t=0$, but this poses no difficulty and we develop an integrator that handles it seamlessly. The framework accommodates diverse feature maps -- scattering transforms, pretrained generative models etc. -- enabling training-free generation and model combination. We demonstrate the approach on financial time series, turbulence, and image generation.

Paper Structure

This paper contains 33 sections, 3 theorems, 27 equations, 5 figures, and 1 algorithm.

Key Result

Proposition 2.1

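Define the Gram matrix of the feature gradients $\{\nabla\phi_i\}_{i=1}^P$ under the law of $I_t$:

$$K_t = \mathbb{E}\big[\nabla\phi(I_t)\,\nabla\phi(I_t)^\top\big] \in \mathbb{R}^{P\times P}.$$

Assume $K_t$ is positive-definite. Then the minimizer of $L_b$ over $\hat{b}_t(x) = \nabla\phi(x)^\top\eta_t$ is

$$\eta_t = K_t^{-1}\,\mathbb{E}\big[\nabla\phi(I_t)\,\dot I_t\big].$$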
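A minimal numerical sketch of the drift regression in Proposition 2.1 may help make the $P\times P$ system concrete. It assumes that $L_b$ is the standard quadratic stochastic-interpolant loss, so that the normal equations read $K_t\eta_t = \mathbb{E}[\nabla\phi(I_t)\,\dot I_t]$, and that expectations are replaced by sample averages. The linear interpolant, the random Fourier features, the optional `ridge` term, and all function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def solve_drift_coefficients(grad_phi, I_t, dI_t, ridge=0.0):
    """Solve K_t eta_t = E[grad_phi(I_t) dI_t] from samples at one fixed time t.

    grad_phi : callable mapping (n, d) points to (n, P, d) feature gradients
    I_t      : (n, d) samples of the interpolant at time t
    dI_t     : (n, d) matching samples of its time derivative
    ridge    : optional Tikhonov term (added here for stability; an assumption)
    """
    G = grad_phi(I_t)                              # (n, P, d)
    n, P, _ = G.shape
    K = np.einsum("npd,nqd->pq", G, G) / n         # empirical Gram matrix, (P, P)
    rhs = np.einsum("npd,nd->p", G, dI_t) / n      # empirical right-hand side, (P,)
    eta = np.linalg.solve(K + ridge * np.eye(P), rhs)
    return eta, K, rhs

def drift(grad_phi, eta, x):
    """Evaluate b_hat_t(x) = grad_phi(x)^T eta_t for a batch x of shape (n, d)."""
    return np.einsum("npd,p->nd", grad_phi(x), eta)

# Hypothetical feature map: phi_i(x) = sin(w_i . x), so grad phi_i(x) = cos(w_i . x) w_i.
rng = np.random.default_rng(0)
n, d, P = 2000, 50, 8
W = rng.normal(size=(P, d)) / np.sqrt(d)

def grad_phi(x):
    return np.cos(x @ W.T)[:, :, None] * W[None, :, :]   # (n, P, d)

# Assumed linear interpolant I_t = (1 - t) z + t x1 with Gaussian z and toy data x1.
t = 0.5
z = rng.normal(size=(n, d))
x1 = 2.0 + 0.5 * rng.normal(size=(n, d))
I_t = (1 - t) * z + t * x1
dI_t = x1 - z

eta, K, rhs = solve_drift_coefficients(grad_phi, I_t, dI_t)
b_hat = drift(grad_phi, eta, I_t)                  # estimated drift at the samples
```

The linear solve involves only a $P\times P$ matrix; the data dimension $d$ enters solely through the sample averages that build `K` and `rhs`. In a sampler, this solve would be repeated on a grid of times $t$ to obtain $\eta_t$ along the path.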

Figures (5)

  • Figure 1: Generation from a single S&P 500 daily log-returns realization ($d = 6{,}064$) using scattering features ($P=217$). Top: Original (blue) and generated sample (orange), both showing volatility clustering. Bottom left: Log-return densities, with near-perfect agreement including heavy tails. Bottom right: Leverage effect (correlation between past returns and current volatility).
  • Figure 2: Generation of two-dimensional physical fields using scattering-transform features. Top row: ground-truth samples from each dataset. Bottom row: samples generated by Algorithm 1 with $K=5{,}000$ integration steps. From left to right: 3D turbulence (pressure), dark matter (log-density), 3D magnetic turbulence (vorticity), and weak lensing (convergence).
  • Figure 3: Left: MNIST samples. Rows 1--2: individual models (50 and 100 SGD steps). Row 3: kernelized ensemble ($P=20$, 100-step cohort). Right: Oracle log-likelihood vs. ensemble size $P$. Blue: 50-step; orange: 100-step. Error bars: $\pm 1$ std over 5 subsets.
  • Figure 4: CelebA generation ($128\times 128$). Top: samples from an individual weak model (5 epochs of training). Bottom: samples from the kernelized ensemble of $P=25$ weak models via Algorithm 1. The ensemble produces coherent face images despite each constituent model being severely under-trained.
  • Figure 5: Cross-domain model composition for MNIST generation. Rows 1--4: samples from individual weak models trained on Kuzushiji-MNIST, Fashion-MNIST, EMNIST (letters), and MNIST, respectively. Row 5: samples generated by composing all 40 source- and target-domain models via Algorithm 1, using MNIST data to solve the linear system. The cross-domain ensemble produces sharper digits than the MNIST-only models (row 4).

Theorems & Definitions (7)

  • Proposition 2.1: Drift estimation
  • proof
  • Proposition 2.2: Optimal diffusion coefficient (Ma et al., 2024; Chen et al., 2024)
  • proof
  • Remark 2.3: Behavior at the endpoints
  • Theorem A.1: Exact recovery under characteristic kernels
  • proof