CINDES: Classification induced neural density estimator and simulator
Dehao Dai, Jianqing Fan, Yihong Gu, Debarghya Mukherjee
TL;DR
This work addresses high-dimensional density estimation by exploiting unknown low-dimensional structure with a structure-agnostic neural approach. It introduces the Classification induced neural density estimator and simulator (CINDES), which reframes density estimation as a classification problem and couples explicit density estimation with score-based diffusion for efficient implicit sampling. The theory provides non-asymptotic convergence bounds and minimax-like rates under low-dimensional factorizable and hierarchical composition structures, with explicit guidance on hyperparameters. Extensive simulations and a real-data application show that CINDES performs better in both unconditional and conditional density estimation and in generating high-quality samples.
Abstract
Neural network-based methods for (un)conditional density estimation have recently gained substantial attention, as various neural density estimators have outperformed classical approaches in real-data experiments. Despite these empirical successes, implementation can be challenging due to the need to ensure non-negativity and unit-mass constraints, and theoretical understanding remains limited. In particular, it is unclear whether such estimators can adaptively achieve faster convergence rates when the underlying density exhibits a low-dimensional structure. This paper addresses these gaps by proposing a structure-agnostic neural density estimator that is (i) straightforward to implement and (ii) provably adaptive, attaining faster rates when the true density admits a low-dimensional composition structure. Another key contribution of our work is to show that the proposed estimator integrates naturally into generative sampling pipelines, most notably score-based diffusion models, where it achieves provably faster convergence when the underlying density is structured. We validate its performance through extensive simulations and a real-data application.
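To make the classification framing in the title concrete, the sketch below uses the standard density-ratio trick. It is an illustration under stated assumptions, not necessarily the construction used by CINDES. A classifier is trained to separate observed data from samples of a known reference density q. With balanced classes, the optimal logit equals log p(x)/q(x), so q(x) * exp(logit(x)) estimates p(x). The one-dimensional standard normal target, the uniform reference, the quadratic feature map (standing in for a neural network), and all names such as `features`, `fit_logistic`, and `density_hat` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
LOW, HIGH = -5.0, 5.0  # support of the assumed uniform reference density

# Observed data from the "unknown" density (standard normal in this toy setup).
data = rng.standard_normal(n)
# Synthetic samples from the known reference density q = Uniform(LOW, HIGH).
reference = rng.uniform(LOW, HIGH, size=n)

def features(x):
    """Quadratic feature map [1, x, x^2]; a stand-in for a neural network."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Fit logistic regression with Newton's method (IRLS)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        logits = np.clip(X @ w, -30.0, 30.0)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-logits))
        grad = X.T @ (p - y)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        w = w - np.linalg.solve(hess, grad)
    return w

# Label observed data 1 and reference samples 0, using equal sample sizes.
X = features(np.concatenate([data, reference]))
y = np.concatenate([np.ones(n), np.zeros(n)])
w = fit_logistic(X, y)

def density_hat(x):
    """Estimated density q(x) * exp(logit(x)), meaningful on [LOW, HIGH]."""
    q = 1.0 / (HIGH - LOW)
    return q * np.exp(features(x) @ w)

# Sanity checks: compare with the true N(0, 1) density at 0 and check total mass.
grid = np.linspace(LOW, HIGH, 2001)
mass = np.sum(density_hat(grid)) * (grid[1] - grid[0])
true_at_zero = 1.0 / np.sqrt(2.0 * np.pi)
print(f"estimate at 0: {density_hat(0.0):.3f}, truth: {true_at_zero:.3f}")
print(f"approximate total mass: {mass:.3f}")
```

The estimate is non-negative by construction because it is a positive reference density times an exponential. Unit mass is not enforced and is only checked numerically here.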
