Beyond Linear Diffusions: Improved Representations for Rare Conditional Generative Modeling

Kulunu Dharmakeerthi, Yousef El-Laham, Henry H. Wong, Vamsi K. Potluru, Changhong He, Taosong He

TL;DR

The paper tackles modeling $P(Y|X=x)$ under rare conditioning, where standard linear Gaussian diffusions struggle due to limited tail data. It introduces a tail-adaptive diffusion framework grounded in conditional extreme value theory, transforming $(X,Y)$ to $(X^{\star},Z)$ so that $P(Z|X^{\star}=x)\approx G$ in tails and selecting a forward diffusion with equilibrium $e^{-g}$. A time-dependent conditional score model $s_\theta(z;x,t)$ is trained to approximate $\nabla \log p_{\mu_t(\cdot|x)}(z) - \nabla g(z)$, enabling sampling via a reversed SDE and inversion of transformations to recover $Y$. Experiments on two synthetic tasks and stock-return data conditioned on the VIX show superior tail capture compared to standard diffusion with a Gaussian base, validating improved rare-event conditioning. The work offers a practical path to more accurate conditional generation in tail regions and paves the way for learning data-driven transformations and scaling to high-dimensional conditioning.
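The idea of a forward diffusion with equilibrium $e^{-g}$ can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes the overdamped Langevin form $dZ_t = -\nabla g(Z_t)\,dt + \sqrt{2}\,dW_t$, picks $g(z) = |z|$ (a standard Laplace equilibrium, one of the base distributions named in the figure captions), and discretizes with an Euler-Maruyama (unadjusted Langevin) step. The function names, step size, horizon, and sample counts are all hypothetical choices for illustration.

```python
import numpy as np


def grad_g_laplace(z):
    """Gradient of g(z) = |z|; exp(-g) is proportional to a standard Laplace density."""
    return np.sign(z)


def ula_forward(z0, grad_g, step=1e-2, n_steps=2000, rng=None):
    """Euler-Maruyama discretization of dZ = -grad g(Z) dt + sqrt(2) dW (assumed SDE form)."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z - step * grad_g(z) + np.sqrt(2.0 * step) * noise
    return z


rng = np.random.default_rng(0)
n = 10_000

# "far": samples concentrated away from the equilibrium (tight Gaussian around 3).
far = rng.normal(loc=3.0, scale=0.1, size=n)
# "near": samples already drawn from the Laplace equilibrium.
near = rng.laplace(loc=0.0, scale=1.0, size=n)

far_T = ula_forward(far, grad_g_laplace, rng=rng)
near_T = ula_forward(near, grad_g_laplace, rng=rng)

# A standard Laplace variable has E|Z| = 1, so both runs should end near 1,
# but only the "far" run had to move a long way to get there.
print("far : mean |z| before/after:", np.mean(np.abs(far)), np.mean(np.abs(far_T)))
print("near: mean |z| before/after:", np.mean(np.abs(near)), np.mean(np.abs(near_T)))
```

Samples that start at the equilibrium barely change in distribution under the forward process, while samples that start far away must travel a long way. This is the intuition behind transforming the data so that the tail conditional law is already close to the chosen base distribution: the score model then has little to learn in regions where data are scarce.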

Abstract

Diffusion models have emerged as powerful generative frameworks with widespread applications across machine learning and artificial intelligence systems. While current research has predominantly focused on linear diffusions, these approaches can face significant challenges when modeling a conditional distribution, $P(Y|X=x)$, when $P(X=x)$ is small. In these regions, few samples, if any, are available for training, so modeling the corresponding conditional density may be difficult. Recognizing this, we show it is possible to adapt the data representation and forward scheme so that the sample complexity of learning a score-based generative model is small in low-probability regions of the conditioning space. Drawing inspiration from conditional extreme value theory, we characterize this method precisely in the special case of the tail regions of the conditioning variable, $X$. We show how diffusion with a data-driven choice of nonlinear drift term is best suited to model tail events under an appropriate representation of the data. Through empirical validation on two synthetic datasets and a real-world financial dataset, we demonstrate that our tail-adaptive approach significantly outperforms standard diffusion models in accurately capturing response distributions under extreme tail conditions.

Paper Structure

This paper contains 39 sections, 3 theorems, 50 equations, 17 figures, 4 algorithms.

Key Result

Theorem 1

Denote by $p(y)$ the target density. Let $\{Y_t\}_{t \in [0,T]}$ be the stochastic process defined by the SDE in eqn:langevin, where $Y_0 \sim p$ and $Y_t \sim p_t$. Suppose $\pi(y)$ is the stationary density of this SDE as $T \to \infty$. Let $\hat{Y}^\leftarrow_0 \sim p_{\theta}(y)$ be the result Under some regularity conditions (see Appendix A of song2021maximum),

Figures (17)

  • Figure 1: We visualize a forward diffusion before and after the transformation outlined in Section 3.2. Before transformation, the Langevin diffusion induces dramatic changes in the conditional density at tail events ($\{X=x\}$ with $x$ very large). This can be seen in the blue particle paths (top left) or in the evolving density, $p_{\mu_t(\cdot|x)}(y)$, visualized in the top right plot. After the steps outlined in Section 3.2, the tail conditional density no longer changes dramatically under the forward diffusion. Compare the new particle paths in blue (bottom left plot), or the new conditional densities at time $t$ (bottom right plot). For low-probability tail conditions, the conditional density after transformation is already (nearly) at stationarity. Details can be found in Appendix A.2.
  • Figure 2: In each subfigure, the left plot shows the standard diffusion with Gaussian base distribution, and the right plot shows our proposed method with a standard Laplace base distribution for the mean-shift example (no transformation) and a Gumbel base distribution for the multivariate Gaussian example (with learned CEVT transformation).
  • Figure 3: QQ plots on test datasets for COVID period for various technology stocks.
  • Figure 4: Performance comparison of Gaussian versus Laplace base distributions based on different values of VIX level for the GFC regime.
  • Figure 5: Top row: Before transformation. Bottom row: After transformation.
  • ...and 12 more figures

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Theorem 2
  • Theorem 3: Convergence of ULA chewi2023log