Reducing Estimation Uncertainty Using Normalizing Flows and Stratification
Paweł Lorek, Rafał Topolnicki, Tomasz Trzciński, Maciej Zięba, Aleksandra Krystecka
TL;DR
The paper addresses estimating $I=\mathbb{E}[f(\mathbf{X})]$ when the distribution of $\mathbf{X}$ is unknown and only samples are available. It fits a flow-based density model that maps a Gaussian base distribution to the complex data distribution, then applies stratified sampling in the latent space to reduce the variance of the estimator. Two stratification schemes, Cartesian (M1) and spherical (M2), are developed together with high-dimensional approximations (M_rad, High3, Rand3) and an optimal-allocation scheme that further reduces variance. Experiments on synthetic and real data, including high-dimensional cases up to $d=128$, show substantial improvements over crude Monte Carlo and Gaussian mixture models while requiring on the order of hundreds to a few thousand training samples; the authors provide reproducible code. The approach thus offers a practical, scalable route to variance reduction for complex, unknown distributions, including high-dimensional estimation tasks.
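For orientation, the textbook form of a stratified estimator with optimal (Neyman) allocation is sketched below. The notation is assumed here for illustration and may differ from the paper's: $T$ denotes the flow mapping a latent $\mathbf{Z}$ to data space, $A_1,\dots,A_m$ are latent-space strata with probabilities $p_j=\mathbb{P}(\mathbf{Z}\in A_j)$, $\mathbf{Z}_{j,i}$ are i.i.d. draws from $\mathbf{Z}$ conditioned on $A_j$, $n_j$ is the number of draws in stratum $j$, and $\sigma_j^2=\mathrm{Var}\big(f(T(\mathbf{Z}))\mid \mathbf{Z}\in A_j\big)$.

$$
\hat{I}_{\mathrm{str}}=\sum_{j=1}^{m}\frac{p_j}{n_j}\sum_{i=1}^{n_j} f\big(T(\mathbf{Z}_{j,i})\big),
\qquad
\mathrm{Var}\big(\hat{I}_{\mathrm{str}}\big)=\sum_{j=1}^{m}\frac{p_j^{2}\sigma_j^{2}}{n_j}.
$$

$$
n_j^{\ast}=n\,\frac{p_j\sigma_j}{\sum_{k=1}^{m}p_k\sigma_k}
\quad\Longrightarrow\quad
\mathrm{Var}\big(\hat{I}_{\mathrm{str}}\big)=\frac{1}{n}\Big(\sum_{j=1}^{m}p_j\sigma_j\Big)^{2}.
$$

With proportional allocation, $n_j=n\,p_j$, the variance is $\frac{1}{n}\sum_j p_j\sigma_j^2$, which never exceeds the crude Monte Carlo variance $\frac{1}{n}\mathrm{Var}\big(f(T(\mathbf{Z}))\big)$; optimal allocation can only lower it further. In practice the $\sigma_j$ are unknown and would have to be estimated, for example from a pilot run.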
Abstract
Estimating the expectation of a real-valued function of a random variable from sample data is a critical aspect of statistical analysis, with far-reaching implications in various applications. Current methodologies typically assume (semi-)parametric distributions, such as a Gaussian or a Gaussian mixture, which leads to significant estimation uncertainty when these assumptions do not hold. We propose a flow-based model, integrated with stratified sampling, that leverages a parametrized neural network to offer greater flexibility in modeling unknown data distributions, thereby mitigating this limitation. Our model shows a marked reduction in estimation uncertainty across multiple datasets, including high-dimensional ones (30 and 128 dimensions), outperforming crude Monte Carlo estimators and Gaussian mixture models. Reproducible code is available at https://github.com/rnoxy/flowstrat.
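The idea of stratifying in the latent space of a flow can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: `toy_flow` is a hypothetical hand-written invertible map standing in for a trained normalizing flow, `f` is an arbitrary example integrand, and the stratification is a simple radial (spherical-style) split of a two-dimensional Gaussian latent into equiprobable shells with proportional allocation.

```python
import math
import random
import statistics


def toy_flow(z1, z2):
    """Hypothetical stand-in for a trained normalizing flow (latent -> data)."""
    # Invertible triangular map; a real flow would be a learned neural network.
    return z1, z2 + 0.5 * z1 ** 2


def f(x1, x2):
    """Example integrand; any real-valued function of the data point works."""
    return x1 ** 2 + x2 ** 2


def crude_mc(n, rng):
    """Crude Monte Carlo: push i.i.d. Gaussian latents through the flow."""
    total = 0.0
    for _ in range(n):
        total += f(*toy_flow(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))
    return total / n


def stratified_mc(n, m, rng):
    """Stratify the latent radius into m equiprobable shells (d = 2 only)."""
    per_stratum = n // m  # proportional allocation for equal-probability strata
    estimate = 0.0
    for j in range(m):
        stratum_sum = 0.0
        for _ in range(per_stratum):
            u = (j + rng.random()) / m            # uniform on [j/m, (j+1)/m)
            r = math.sqrt(-2.0 * math.log1p(-u))  # inverse CDF of ||Z|| for d = 2
            theta = rng.uniform(0.0, 2.0 * math.pi)  # direction is uniform
            z1, z2 = r * math.cos(theta), r * math.sin(theta)
            stratum_sum += f(*toy_flow(z1, z2))
        estimate += (1.0 / m) * (stratum_sum / per_stratum)
    return estimate


# Compare the spread of both estimators over repeated runs.
rng = random.Random(0)
crude = [crude_mc(400, rng) for _ in range(200)]
strat = [stratified_mc(400, 20, rng) for _ in range(200)]
print("crude variance:     ", statistics.variance(crude))
print("stratified variance:", statistics.variance(strat))
```

For this toy setup the exact value is $I = 1 + 1 + 0.25\cdot 3 = 2.75$, so both estimators can be checked against it. The closed-form radius sampler relies on $\|\mathbf{Z}\|^2$ being exponentially distributed when $d=2$; in higher dimensions one would instead invert the chi-square CDF numerically, and the variance gain depends on how strongly $f$ varies with the latent radius.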
