Diffusion Secant Alignment for Score-Based Density Ratio Estimation

Wei Chen, Shigui Li, Jiacheng Li, Jian Xu, Zhiqi Lin, Junmei Yang, Delu Zeng, John Paisley, Qibin Zhao

TL;DR

This work tackles density-ratio estimation under large distribution shifts by switching from high-variance tangent learning to interval-averaged secant learning along diffusion interpolants. The authors prove variance reduction and smoothness for the secant, and introduce the Secant Alignment Identity to enforce self-consistency with tangents, plus Contraction Interval Annealing to stabilize training. Empirically, ISA-DRE achieves robust, state-of-the-art performance with fewer function evaluations across density and mutual information estimation tasks, mitigating the density-chasm problem. The approach emphasizes stability-first learning, offering practical gains in efficiency and reliability for score-based density estimation in challenging settings.

Abstract

Estimating density ratios has become increasingly important with the recent rise of score-based and diffusion-inspired methods. However, current tangent-based approaches rely on a high-variance learning objective, which leads to unstable training and costly numerical integration during inference. We propose Interval-annealed Secant Alignment Density Ratio Estimation (ISA-DRE), a score-based framework along diffusion interpolants that replaces the instantaneous tangent with its interval integral, the secant, as the learning target. We show theoretically that the secant is a provably lower-variance and smoother target for neural approximation, and also a strictly more general representation that contains the tangent as the infinitesimal limit. To make secant learning feasible, we introduce the Secant Alignment Identity (SAI) to enforce self-consistency between secant and tangent representations, and Contraction Interval Annealing (CIA) to ensure stable convergence. Empirically, this stability-first formulation produces high efficiency and accuracy. ISA-DRE achieves comparable or superior results with fewer function evaluations, demonstrating robustness under large distribution discrepancies and effectively mitigating the density-chasm problem.

Paper Structure

This paper contains 37 sections, 3 theorems, 43 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Theorem 4.1

Let $l$ and $t$ be independent random variables with joint probability density $p(l,t)$ on $[0,1]^2$, conditioned on $l \leq t$. For a fixed data point $\boldsymbol{x} \sim p_1$, define the secant variable $U \triangleq u(\boldsymbol{x}, l, t)$ and tangent variable $S \triangleq s^{(t)}(\boldsymbol{x})$ […] with equality if and only if $S$ is constant for $p$-almost every $\tau \in [0,1]$.
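
As a rough numerical illustration of the idea named in the theorem title, the sketch below compares the empirical variance of an interval-averaged target with that of a pointwise target. Everything in it is an assumption made for illustration and is not taken from the paper: the toy Gaussian path with variance interpolated linearly between `V0` and `V1`, the uniform sampling of $l$, $t$ and $\tau$, and the reading of the secant as the interval average of the tangent. Under those assumptions the ordering of the two variances follows from the law of total variance.

```python
import math
import random

# Assumed toy path (not from the paper): p_t = N(0, v_t), v_t = (1 - t) * V0 + t * V1.
V0, V1 = 1.0, 9.0


def log_p(x, t):
    """Log density of the toy Gaussian path at time t."""
    v = (1.0 - t) * V0 + t * V1
    return -0.5 * math.log(2.0 * math.pi * v) - x * x / (2.0 * v)


def tangent(x, t):
    """Closed-form time score d/dt log p_t(x) for the toy path."""
    v = (1.0 - t) * V0 + t * V1
    dv = V1 - V0
    return -0.5 * dv / v + x * x * dv / (2.0 * v * v)


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def sample_targets(x, n, seed=0):
    """Draw n (secant, tangent) target pairs for a fixed data point x."""
    rng = random.Random(seed)
    secants, tangents = [], []
    while len(secants) < n:
        # Sorting two i.i.d. uniforms matches conditioning on l <= t.
        l, t = sorted((rng.random(), rng.random()))
        if t - l < 1e-6:
            continue  # skip near-degenerate intervals to avoid dividing by ~0
        tau = rng.uniform(l, t)  # pointwise time drawn inside the same interval
        # Interval average of the tangent over [l, t] (assumed secant definition).
        secants.append((log_p(x, t) - log_p(x, l)) / (t - l))
        tangents.append(tangent(x, tau))
    return secants, tangents


secants, tangents = sample_targets(x=2.0, n=20000)
var_u, var_s = variance(secants), variance(tangents)
print(f"Var(secant target)  = {var_u:.3f}")
print(f"Var(tangent target) = {var_s:.3f}")
```

With this sampling scheme each secant value is the conditional mean of the tangent given $(l, t)$, which is why averaging cannot increase the variance in this toy setting.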

Figures (8)

  • Figure 1: Secant-based density ratio estimation generalizes tangent-based methods. The curve represents the time score function $\partial_t \log p_t(\boldsymbol{x})$, whose integral over $t \in [t_0, t_5]$ gives the log density ratio $\log r(\boldsymbol{x})$. (a) The tangent-based method approximates this integral by estimating the instantaneous score $\partial_t \log p_t(\boldsymbol{x})$ at discrete points and summing Riemann rectangles (blue), incurring numerical error (red hatched regions). (b) In contrast, the secant-based method directly predicts the exact integral over each sub-interval (orange shaded areas), eliminating discretization error and enabling accurate few-step inference. (c) By the mean value theorem for integrals, the secant integral over $[l,t]$ equals $(t-l)$ times the tangent evaluated at some $\xi \in [l,t]$; thus, the tangent method corresponds precisely to a secant method constrained to infinitesimal intervals. This establishes the tangent-based approach as a limiting case of our general secant framework.
  • Figure 2: Mutual information estimation on the $\mathsf{AdditiveNoise}$ dataset with CIA and fixed tangent ratios. The tangent ratio denotes the proportion of samples with $l = t$, corresponding to tangent-only ($100\%$) or secant-only ($0\%$) supervision (see the section on practical choices). Shaded areas show the standard deviation across samples. CIA ensures stable and consistent convergence.
  • Figure 3: Comparison of the learned secant function $u_{\boldsymbol{\theta}}(\boldsymbol{x}, 0, t)$ (left) and tangent function $u_{\boldsymbol{\theta}}(\boldsymbol{x}, t, t)$ (right). Each orange curve shows $u$ over time $t$ for a fixed $\boldsymbol{x}$. The secant curves are smoother and more concentrated.
  • Figure 4: Qualitative density estimates on structured multi-modal data. Baselines (e.g., NCE) blur discontinuities and merge modes. D$^3$RE yields noisy estimates. ISA-DRE preserves structural fidelity and accurately captures density chasms.
  • Figure 5: Density estimation (NLL, lower is better) on five non-Gaussian tabular datasets (POWER, GAS, HEPMASS, MINIBOONE, BSDS300) across $\text{NFE}\in\{2,5,10,50\}$. Shown: DRE-$\infty$, D$^3$RE, and ISA-DRE (ours). Error bars: standard deviation over $3$ runs. ISA-DRE consistently achieves the lowest NLL.
  • ...and 3 more figures
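
The contrast drawn in the Figure 1 caption between panels (a) and (b) can be sketched numerically. The example below is a minimal illustration under stated assumptions, not the paper's method: it uses a hypothetical one-dimensional Gaussian path whose variance is interpolated linearly between `V0` and `V1`, so that both the time score and the exact interval integral are available in closed form, and it uses a left Riemann sum as the tangent-side integrator. No network is trained; the helper names `time_score`, `secant`, `tangent_estimate`, and `secant_estimate` are invented for this sketch.

```python
import math

# Assumed toy path (not from the paper): p_t = N(0, v_t), v_t = (1 - t) * V0 + t * V1.
V0, V1 = 1.0, 9.0


def log_p(x, t):
    """Log density of the toy Gaussian path at time t."""
    v = (1.0 - t) * V0 + t * V1
    return -0.5 * math.log(2.0 * math.pi * v) - x * x / (2.0 * v)


def time_score(x, t):
    """Tangent: closed-form d/dt log p_t(x) for the toy path."""
    v = (1.0 - t) * V0 + t * V1
    dv = V1 - V0
    return -0.5 * dv / v + x * x * dv / (2.0 * v * v)


def secant(x, l, t):
    """Secant: exact integral of the time score over [l, t]."""
    return log_p(x, t) - log_p(x, l)


def tangent_estimate(x, nfe):
    """Left Riemann sum of the time score with nfe evaluations."""
    h = 1.0 / nfe
    return sum(time_score(x, k * h) * h for k in range(nfe))


def secant_estimate(x, nfe):
    """Sum of exact sub-interval integrals; the terms telescope."""
    h = 1.0 / nfe
    return sum(secant(x, k * h, (k + 1) * h) for k in range(nfe))


x = 2.0
exact = log_p(x, 1.0) - log_p(x, 0.0)  # log r(x) = log p_1(x) - log p_0(x)
for nfe in (2, 5, 10, 50):
    err_tan = abs(tangent_estimate(x, nfe) - exact)
    err_sec = abs(secant_estimate(x, nfe) - exact)
    print(f"NFE={nfe:3d}  tangent error={err_tan:.4f}  secant error={err_sec:.2e}")
```

In this toy setting the secant sum matches the exact log ratio up to floating-point rounding for any number of sub-intervals, while the Riemann-sum error shrinks only as the number of evaluations grows. A learned secant would of course carry approximation error that this closed-form version does not.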

Theorems & Definitions (7)

  • Theorem 4.1: Low-Variance Secant Target
  • Remark 4.2
  • Proposition 4.2: Smoothness of the Secant Function
  • Proposition 4.2
  • proof
  • proof
  • proof