On Conditional Stochastic Interpolation for Generative Nonlinear Sufficient Dimension Reduction
Shuntuo Xu, Zhou Yu, Jian Huang
TL;DR
The paper tackles nonlinear sufficient dimension reduction (SDR) by reframing the problem through conditional stochastic interpolation and a flow-based generative model. The proposed method, GenSDR, learns a low-dimensional sufficient transformation via a joint optimization over a velocity-field predictor and a representation map, achieving exhaustiveness at the population level and distributional consistency at the sample level. It further extends to non-Euclidean responses using an ensemble-based approach and demonstrates strong empirical performance across synthetic Euclidean and symmetric positive definite (SPD) settings, as well as a real-world STL-10 case study. Together, these results present GenSDR as a theoretically grounded and practically effective framework for extracting exhaustive low-dimensional structure in complex regression problems.
Abstract
Identifying low-dimensional sufficient structures in nonlinear sufficient dimension reduction (SDR) has long been a fundamental yet challenging problem. Most existing methods lack theoretical guarantees of exhaustiveness in identifying lower-dimensional structures, either at the population level or at the sample level. We tackle this issue by proposing a new method, generative sufficient dimension reduction (GenSDR), which leverages modern generative models. We show that GenSDR is able to fully recover the information contained in the central $\sigma$-field at both the population and sample levels. In particular, at the sample level, we establish a consistency property for the GenSDR estimator from the perspective of conditional distributions, capitalizing on the distributional learning capabilities of deep generative models. Moreover, by incorporating an ensemble technique, we extend GenSDR to accommodate scenarios with non-Euclidean responses, thereby substantially broadening its applicability. Extensive numerical results demonstrate the strong empirical performance of GenSDR and highlight its potential for addressing a wide range of complex, real-world tasks.
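To make the joint optimization over a velocity-field predictor and a representation map more concrete, the following is a minimal illustrative sketch rather than the authors' implementation. It assumes a linear interpolant between Gaussian noise and the response, a linear representation map, and a linear velocity predictor; the names `interpolate`, `representation`, `velocity`, and `flow_matching_loss`, and all dimensions, are hypothetical. The paper's actual interpolant, networks, and training procedure may differ.

```python
import numpy as np

def interpolate(y, eta, t):
    """Assumed linear interpolant: equals the noise eta at t=0 and the response y at t=1."""
    return (1.0 - t) * eta + t * y

def representation(x, B):
    """Hypothetical linear representation map R(x) = x B, with output shape (n, d)."""
    return x @ B

def velocity(r, y_t, t, W):
    """Hypothetical linear velocity predictor on the features [R(x), y_t, t, 1]."""
    feats = np.concatenate([r, y_t, t, np.ones_like(t)], axis=1)
    return feats @ W

def flow_matching_loss(x, y, B, W, rng):
    """Squared-error loss between the predicted velocity and the interpolant's time derivative.

    The predictor sees x only through R(x), so minimizing jointly over (B, W)
    pushes R(x) to retain the information about y that the velocity field needs.
    """
    n = x.shape[0]
    t = rng.uniform(size=(n, 1))          # random interpolation times in [0, 1)
    eta = rng.standard_normal(y.shape)    # Gaussian noise endpoint
    y_t = interpolate(y, eta, t)
    target = y - eta                      # d/dt of the assumed linear interpolant
    pred = velocity(representation(x, B), y_t, t, W)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))

# Toy data: a scalar response that depends on x only through its first coordinate.
rng = np.random.default_rng(0)
n, p, d, q = 256, 10, 2, 1
x = rng.standard_normal((n, p))
y = np.sin(x[:, :1]) + 0.1 * rng.standard_normal((n, q))

B = 0.1 * rng.standard_normal((p, d))     # representation parameters
W = np.zeros((d + q + 2, q))              # velocity-predictor parameters
loss = flow_matching_loss(x, y, B, W, rng)
```

In practice both maps would presumably be neural networks trained by stochastic gradient descent on this kind of objective; the linear maps here only keep the sketch short and dependency-free.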
