Enabling stratified sampling in high dimensions via nonlinear dimensionality reduction

Gianluca Geraci, Daniele E. Schiavazzi, Andrea Zanoni

TL;DR

This work proposes a simple methodology for constructing an effective stratification of the input domain that is adapted to the model response and shows that this approach is effective in high dimensions and can be used to further reduce the variance of multifidelity Monte Carlo estimators.

Abstract

We consider the problem of propagating the uncertainty from a possibly large number of random inputs through a computationally expensive model. Stratified sampling is a well-known variance reduction strategy, but its application, thus far, has focused on models with a limited number of inputs due to the challenges of creating uniform partitions in high dimensions. To overcome these challenges, we propose a simple methodology for constructing an effective stratification of the input domain that is adapted to the model response. Our approach leverages neural active manifolds, a recently introduced nonlinear dimensionality reduction technique based on neural networks that identifies a one-dimensional manifold capturing most of the model variability. The resulting one-dimensional latent space is mapped to the unit interval, where stratification is performed with respect to the uniform distribution. The corresponding strata in the original input space are then recovered through the neural active manifold, generating partitions that tend to follow the level sets of the model. We show that our approach is effective in high dimensions and can be used to further reduce the variance of multifidelity Monte Carlo estimators.
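The stratification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the neural active manifold is replaced here by a known linear direction `a_hat` for a hypothetical toy model with standard Gaussian inputs, so the one-dimensional latent variable and its CDF are available in closed form. All function names, the toy model, and the sample sizes are assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

def model(x, a_hat):
    """Hypothetical toy model: varies mostly along a_hat, plus a small perturbation."""
    t = x @ a_hat
    return np.exp(0.5 * t) + 0.05 * np.sin(x[:, 0])

def sample_stratum(rng, a_hat, lo, hi, n):
    """Draw n standard-normal inputs conditioned on Phi(a_hat . x) lying in [lo, hi)."""
    # Sample the latent coordinate uniformly inside the stratum of the unit interval.
    u = np.clip(rng.uniform(lo, hi, size=n), 1e-12, 1.0 - 1e-12)
    # Map back to the latent variable through the inverse standard-normal CDF.
    t = np.array([NormalDist().inv_cdf(ui) for ui in u])
    # Components orthogonal to a_hat are independent of t for Gaussian inputs.
    z = rng.standard_normal((n, a_hat.size))
    z_perp = z - np.outer(z @ a_hat, a_hat)
    return z_perp + np.outer(t, a_hat)

def stratified_estimate(rng, a_hat, n_strata, n_per_stratum):
    """Stratified estimator with equal-probability strata and uniform allocation."""
    edges = np.linspace(0.0, 1.0, n_strata + 1)
    means = [
        model(sample_stratum(rng, a_hat, lo, hi, n_per_stratum), a_hat).mean()
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
    # Each stratum has probability 1/S, so the estimator is the plain average.
    return float(np.mean(means))

def mc_estimate(rng, a_hat, n):
    """Standard Monte Carlo estimator with the same total budget."""
    x = rng.standard_normal((n, a_hat.size))
    return float(model(x, a_hat).mean())

rng = np.random.default_rng(0)
d = 50                                   # input dimension (assumed)
a = rng.standard_normal(d)
a_hat = a / np.linalg.norm(a)            # stand-in for the learned 1D manifold

reps = 200
strat = np.array([stratified_estimate(rng, a_hat, 10, 10) for _ in range(reps)])
plain = np.array([mc_estimate(rng, a_hat, 100) for _ in range(reps)])
var_strat, var_plain = strat.var(), plain.var()
print(f"variance, stratified: {var_strat:.2e}   variance, Monte Carlo: {var_plain:.2e}")
```

Both estimators use 100 model evaluations per repetition. Because the strata are slabs orthogonal to `a_hat`, they follow the level sets of the dominant term of the toy model, which is the property the abstract attributes to the NeurAM-based partitions. In the paper's setting the direction, the latent map, and the CDF would be learned from data rather than known exactly.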

Paper Structure

This paper contains 17 sections, 4 theorems, 79 equations, 10 figures, 1 algorithm.

Key Result

Lemma 2.1

Let $\{ D_s \}_{s=1}^S$ be defined as in equation eq:Ds_def. Then where $\lambda$ denotes the one-dimensional Lebesgue measure.

Figures (10)

  • Figure 1: Comparison between the NeurAM-based stratification (left) and the standard stratification made with a regular grid (right), for the simple linear model in \ref{ex:example}.
  • Figure 2: Contour plot of the model $\mathcal{Q}_0$, which is used as a test case for the numerical experiments in \ref{sec:num_param,sec:num_heuristic,sec:num_AS,sec:num_multifidelity}.
  • Figure 3: TOP: Comparison between standard Monte Carlo estimator $\widehat{q}_\mathrm{MC}$ (dashed line) and the proposed estimator $\widehat{q}_\mathrm{sMC}$ (solid line), varying the number of data $M = 5, 10, 50, 100$ (left), and $K = 10^3, 10^4, 10^5, 10^6$ (right), used to learn the NeurAM and the CDF, respectively. The gray dash-dotted vertical line represents the exact value of the quantity of interest, while the colored solid vertical lines (left) represent the bias obtained using only the surrogate model $\mathcal{Q}_{\mathrm S}$ for the different values of $M$. BOTTOM: Mean squared error (MSE) of the estimators to be compared with the value 1.93e-4 for Monte Carlo obtained with $N=1024$. The numbers in parentheses denote the ratio between the MSE of each estimator and the Monte Carlo reference value.
  • Figure 4: TOP: NeurAM-based stratification of the domain for the model $\mathcal{Q}_0$, varying the number of strata $S = 4, 9, 16, 25$. MIDDLE-TOP: Comparison between standard Monte Carlo estimator $\widehat{q}_\mathrm{MC}$ (dashed line) and the stratified estimators $\widehat{q}_\mathrm{sMC}$ (solid line) with both standard and NeurAM-based stratification. The gray dash-dotted vertical line represents the exact value of the quantity of interest. MIDDLE-BOTTOM: Mean squared error (MSE) of the estimators to be compared with the value 5.69e-5 for Monte Carlo obtained with $N=3600$. The numbers in parentheses denote the ratio between the MSE of each estimator and the Monte Carlo reference value. BOTTOM: MSE as a function of the computational budget $N$.
  • Figure 5: TOP: Comparison between standard Monte Carlo estimator $\widehat{q}_\mathrm{MC}$ (dashed line) and the proposed estimator $\widehat{q}_\mathrm{sMC}$ (solid line), for different allocation strategies and stratification approaches: uniform (u), halved (h), optimal (o). The gray dash-dotted vertical line represents the exact value of the quantity of interest. MIDDLE: Mean squared error (MSE) of the estimators to be compared with the value 2.02e-4 for Monte Carlo obtained with $N=1000$. The numbers in parentheses denote the ratio between the MSE of each estimator and the Monte Carlo reference value. BOTTOM: MSE as a function of the computational budget $N$.
  • ...and 5 more figures

Theorems & Definitions (15)

  • Lemma 2.1
  • proof
  • Remark 2.2
  • Example 2.3: Uniform stratification
  • Example 2.4
  • Theorem 2.5
  • proof
  • Corollary 2.6
  • proof
  • Remark 2.7
  • ...and 5 more