Quasi-Monte Carlo with one categorical variable

Valerie N. P. Ho, Art B. Owen, Zexin Pan

TL;DR

This work advances randomized quasi-Monte Carlo (RQMC) for integrals with a single categorical input by modeling the problem as a mixture across $L$ strata with weights $\alpha_\ell$. It develops rate-aware stratum allocations that oversample smaller mixture components when within-stratum errors decay at an RQMC rate, and it shows that dyadic (power-of-two) sample splits are advantageous for scrambled Sobol' points. The paper provides a minimax justification for near-equal allocations, a forward allocation algorithm under dyadic constraints, and demonstrations on a toy example and a Saint-Venant flood model that oversampling reduces variance. In practice, these results guide the design of RQMC sampling for mixture and importance-sampling problems, particularly when convergence rates are faster than Monte Carlo.

Abstract

We study randomized quasi-Monte Carlo (RQMC) estimation of a multivariate integral where one of the variables takes only a finite number of values. This problem arises when the variable of integration is drawn from a mixture distribution, as is common in importance sampling, and it also arises in some recent work on transport maps. We find that when integration error decreases at an RQMC rate, it is important to oversample the smallest mixture components instead of using a proportional allocation. This can even improve the rate of convergence. The optimal allocations depend on the possibly unknown convergence rate. Designing the sample with an incorrect assumption on the rate still attains that convergence rate, with an inferior implied constant. The penalty for using a pessimistic rate is typically higher than for using an optimistic one. We also find that for the most accurate RQMC sampling methods, it is advantageous to arrange that our $n=2^m$ randomized Sobol' points split into subsample sizes that are also powers of $2$.
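To make the oversampling claim concrete, here is a minimal sketch under a simplified variance model that is an assumption of this example, not a formula taken from the paper. Suppose the stratum estimates are independent and the estimator has variance $\sum_{\ell=1}^L \alpha_\ell^2\sigma_\ell^2 n_\ell^{-\rho}$, with a common constant $\sigma_\ell=\sigma$ (hypothetical symbol) and rate $\rho\geqslant1$. Minimizing over real-valued $n_\ell$ with $\sum_\ell n_\ell=n$ by a Lagrange multiplier gives $n_\ell\propto\alpha_\ell^{2/(\rho+1)}$. This is proportional allocation at the Monte Carlo rate $\rho=1$ and approaches equal allocation as $\rho\to\infty$. The function name `rate_aware_allocation` is illustrative, and the sketch ignores integer and power-of-two constraints.

```python
import numpy as np

def rate_aware_allocation(alpha, n, rho):
    """Real-valued stratum sizes minimizing sum_l alpha_l^2 * n_l^(-rho)
    subject to sum_l n_l = n (assumed variance model, equal constants)."""
    alpha = np.asarray(alpha, dtype=float)
    if np.isinf(rho):
        # Limit rho -> infinity: every stratum gets the same share.
        weights = np.ones_like(alpha)
    else:
        # Lagrange condition gives n_l proportional to alpha_l^(2/(rho+1)).
        weights = alpha ** (2.0 / (rho + 1.0))
    return n * weights / weights.sum()

alpha = [0.7, 0.2, 0.1]   # hypothetical mixture weights
n = 1024

proportional = rate_aware_allocation(alpha, n, rho=1)       # Monte Carlo rate
rqmc_like = rate_aware_allocation(alpha, n, rho=3)          # faster assumed rate
equal = rate_aware_allocation(alpha, n, rho=float("inf"))   # limiting case

print("rho=1  :", proportional)
print("rho=3  :", rqmc_like)
print("rho=inf:", equal)
```

Under this model the smallest component receives more than its proportional share as soon as $\rho>1$, which is the qualitative behavior described in the abstract.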

Paper Structure

This paper contains 24 sections, 7 theorems, 54 equations, 4 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

For $n\geqslant1$, let $v_1,\dots,v_n$ be stratified. For $[A,B)\subset[0,1)$, let $n_*$ be the number of $v_i\in[A,B)$. Then where $\beta=B-A$.
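As an illustration of the quantities $n_*$ and $n\beta$, here is a minimal sketch that assumes "stratified" means $v_i$ is uniform on $[(i-1)/n,\,i/n)$ independently for $i=1,\dots,n$ (an assumption of this example). Under that reading, every stratum lying wholly inside $[A,B)$ contributes exactly one point and at most two strata straddle the endpoints, so $|n_*-n\beta|<2$. This is an elementary property of the assumed construction and is not necessarily the bound stated in the proposition. The helper names and the values of `n`, `A`, `B` are hypothetical.

```python
import numpy as np

def stratified_points(n, rng):
    """One uniform point in each interval [(i-1)/n, i/n), i = 1..n (assumed construction)."""
    return (np.arange(n) + rng.random(n)) / n

def count_in_interval(v, A, B):
    """n_*: the number of points falling in [A, B)."""
    return int(np.sum((v >= A) & (v < B)))

rng = np.random.default_rng(0)
n, A, B = 64, 0.13, 0.58   # hypothetical values
beta = B - A

# Largest observed deviation of n_* from n*beta over repeated samples.
worst = max(
    abs(count_in_interval(stratified_points(n, rng), A, B) - n * beta)
    for _ in range(1000)
)
print("largest |n_* - n*beta| observed:", worst)
```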

Figures (4)

  • Figure 1: These are the partitions of unity into $L$ negative powers of $2$ for $L\in\{4,5,6,7,8\}$.
  • Figure 2: Variance of the 5 estimators for the estimation of ${\mathbb{E}}(f(\boldsymbol{z}))$ versus the sample size $n$, for the toy integrand. The top to bottom ordering of the curves in the legend is the same as they have at $n=4096$. Four of the curves are nearly parallel to reference lines of a given slope.
  • Figure 3: We plot variance versus sample size for $\hat{\mu}_{\mathrm{MC}}$, $\hat{\mu}_{\mathrm{RQMC}}$, $\hat{\mu}_{\mathrm{RQMC}(\mathrm{ADJ})}$, $\hat{\mu}_{\mathrm{RQMC}(2)}$ and $\hat{\mu}_{\mathrm{RQMC}(L)}$ for the Saint-Venant flood depth. The top to bottom ordering of the curves in the legend is the same as they have at $n=4096$. The sample size allocations for $\hat{\mu}_{\mathrm{RQMC}(\mathrm{ADJ})}$, $\hat{\mu}_{\mathrm{RQMC}(2)}$, and $\hat{\mu}_{\mathrm{RQMC}(L)}$ used $\rho=2$.
  • Figure 4: Variance of $\hat{\mu}_{\mathrm{RQMC}(\mathrm{ADJ})}$ versus the sample size $n$ for $\rho \in \{1,2,3,+\infty\}$. The reference line has slope $-1.8$.
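The partitions of unity into $L$ negative powers of $2$ mentioned in the Figure 1 caption can be enumerated with a short sketch. This is an illustrative construction, not the paper's algorithm: it starts from the single part $1$ and repeatedly splits one part $2^{-k}$ into two parts $2^{-(k+1)}$. Every such partition with $L\geqslant2$ parts is reached this way, because its smallest part must occur an even number of times, so two copies can be merged to give a partition with $L-1$ parts. The function name `dyadic_partitions` is hypothetical.

```python
from fractions import Fraction

def dyadic_partitions(L):
    """All multisets of L powers 2**-k (k >= 0) summing to 1.

    Each partition is a sorted tuple of exponents k, so (1, 2, 3, 3)
    stands for 1/2 + 1/4 + 1/8 + 1/8.
    """
    current = {(0,)}  # L = 1: the single part 2**0 = 1
    for _ in range(L - 1):
        nxt = set()
        for part in current:
            for k in set(part):
                pieces = list(part)
                pieces.remove(k)                   # take one part 2**-k ...
                pieces.extend([k + 1, k + 1])      # ... and split it in half
                nxt.add(tuple(sorted(pieces)))     # the set removes duplicates
        current = nxt
    return sorted(current)

for L in (4, 5, 6):
    parts = dyadic_partitions(L)
    # Exact check that every partition sums to 1.
    assert all(sum(Fraction(1, 2 ** k) for k in p) == 1 for p in parts)
    print(L, len(parts), parts)
```

For $L=4$ this yields the two partitions $(1/4,1/4,1/4,1/4)$ and $(1/2,1/4,1/8,1/8)$. Multiplying a partition by $n=2^m$ gives candidate power-of-two subsample sizes, provided $m$ is at least the largest exponent.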

Theorems & Definitions (21)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • Proposition 5
  • proof
  • ...and 11 more