Diffusion Secant Alignment for Score-Based Density Ratio Estimation
Wei Chen, Shigui Li, Jiacheng Li, Jian Xu, Zhiqi Lin, Junmei Yang, Delu Zeng, John Paisley, Qibin Zhao
TL;DR
This work tackles density ratio estimation under large distribution shifts by switching from high-variance tangent learning to interval-averaged secant learning along diffusion interpolants. The authors prove that the secant is a lower-variance and smoother learning target than the tangent, and introduce the Secant Alignment Identity to enforce self-consistency between secant and tangent, plus Contraction Interval Annealing to stabilize training. Empirically, ISA-DRE achieves comparable or superior results with fewer function evaluations across density and mutual information estimation tasks, and it mitigates the density-chasm problem. The approach puts stability first, which yields practical gains in efficiency and reliability for score-based density ratio estimation in challenging settings.
Abstract
Estimating density ratios has become increasingly important with the recent rise of score-based and diffusion-inspired methods. However, current tangent-based approaches rely on a high-variance learning objective, which leads to unstable training and costly numerical integration during inference. We propose Interval-annealed Secant Alignment Density Ratio Estimation (ISA-DRE), a score-based framework along diffusion interpolants that replaces the instantaneous tangent with its interval integral, the secant, as the learning target. We show theoretically that the secant is a provably lower-variance and smoother target for neural approximation, and also a strictly more general representation that contains the tangent as its infinitesimal limit. To make secant learning feasible, we introduce the Secant Alignment Identity (SAI) to enforce self-consistency between the secant and tangent representations, and Contraction Interval Annealing (CIA) to ensure stable convergence. Empirically, this stability-first formulation yields both high efficiency and high accuracy. ISA-DRE achieves comparable or superior results with fewer function evaluations, demonstrating robustness under large distribution discrepancies and effectively mitigating the density-chasm problem.
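The relationship between tangent and secant can be sketched in a few lines of math. The notation below is an assumption made for illustration and is not taken from the paper: $p_\tau$ denotes the interpolant density at time $\tau \in [0,1]$, the tangent $s$ is taken to be the time score, and the secant $u$ is taken to be the interval integral normalized by the interval length. The paper's own definitions and its exact form of the Secant Alignment Identity may differ.

```latex
% Assumed notation (hypothetical, for illustration only):
%   p_\tau : interpolant density between p_0 and p_1
%   s      : tangent (time score),  u : secant (interval average of s)
\begin{align*}
  s(x,\tau) &= \partial_\tau \log p_\tau(x)
    && \text{(tangent)} \\
  \log \frac{p_1(x)}{p_0(x)} &= \int_0^1 s(x,\tau)\,d\tau
    && \text{(tangent-based estimate needs numerical integration)} \\
  u(x,l,t) &= \frac{1}{t-l}\int_l^t s(x,\tau)\,d\tau
    = \frac{\log p_t(x) - \log p_l(x)}{t-l}
    && \text{(secant over } [l,t],\ l<t\text{)} \\
  s(x,t) &= u(x,l,t) + (t-l)\,\partial_t u(x,l,t)
    && \text{(differentiate } (t-l)\,u = \textstyle\int_l^t s\,d\tau \text{ in } t\text{)} \\
  \lim_{t \to l} u(x,l,t) &= s(x,l)
    && \text{(tangent as the infinitesimal limit)}
\end{align*}
```

Under these assumptions, the log density ratio equals $u(x,0,1)$, so a model of the secant could return it in a single evaluation, whereas a model of the tangent has to be integrated over $[0,1]$. The fourth line is the kind of self-consistency relation between secant and tangent that an alignment identity can enforce during training.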
