
Insight from the Kullback--Leibler divergence into adaptive importance sampling schemes for rare event analysis in high dimension

Jason Beh, Yonatan Shadmi, Florian Simatos

TL;DR

The paper investigates adaptive importance sampling (AIS) for rare-event probabilities in high dimension, focusing on two AIS families: the cross-entropy (CE) method and projection-based densities ${g_{proj}}$ (and their estimators). It proves that if the adaptation sample size ${n_g}$ grows polynomially with the dimension $d$ and the rare-event probability ${p_f(A)}$ is bounded away from zero, then the AIS estimators are efficient in high dimension and weight degeneracy is avoided, contrary to common belief. For projection methods, efficiency is achieved as soon as ${n_g\gg rd}$, with $r$ the dimension of the projection subspace, which highlights the advantage of low-dimensional projections in high-dimensional settings; in particular, taking ${r=d}$ recovers the optimal Gaussian ${g_A}$, whose estimator ${\hat g_A}$ requires ${n_g\gg d^2}$. The CE method is shown to require a polynomial growth rate (with an explicit dependence on the smallest eigenvalue of covariance estimates) to guarantee efficiency, while a simple computational framework for projection methods makes the results transparent. Overall, the work provides KL-divergence-based conditions and CD-type tail bounds that explain when AIS can beat the curse of dimensionality in rare-event analysis, and it offers insight into the trade-offs between projection dimension, adaptation sample size, and estimator accuracy.

Abstract

We study two adaptive importance sampling schemes for estimating the probability of a rare event in the high-dimensional regime $d \to \infty$ with $d$ the dimension. The first scheme is the prominent cross-entropy (CE) method, and the second scheme, motivated by recent results, uses as auxiliary distribution a projection of the optimal auxiliary distribution on a lower dimensional subspace. In these schemes, two samples are used: the first one to learn the auxiliary distribution and the second one, drawn according to the learned distribution, to perform the final probability estimation. Contrary to the common belief that the sample size needs to grow exponentially in the dimension to make the estimator consistent and avoid the weight degeneracy phenomenon, we find that a polynomial sample size in the first learning step is enough. We prove this result assuming that the sought probability is bounded away from 0. For CE, insight is provided on the polynomial growth rate which remains implicit. In contrast, we study the second scheme in a simple computational framework assuming that samples from the conditional distribution are available. This makes it possible to show that the sample size only needs to grow like $rd$ with $r$ the effective dimension of the projection, which highlights the potential benefits of these projection methods.
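
The two-sample structure described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the initial density is a standard Gaussian, the rare event is the half-space $\{x : \sum_i x_i / \sqrt{d} \geq q\}$ (so the exact probability is known), conditional samples are obtained by plain rejection sampling, and the auxiliary density is a Gaussian fitted by matching the empirical mean and covariance of the conditional sample (the case $r = d$). The function names `in_A`, `conditional_sample` and `ais_estimate` are hypothetical.

```python
import math
import numpy as np

def in_A(x, q):
    # Illustrative rare event (an assumption): sum(x) / sqrt(d) >= q.
    return x.sum(axis=1) / math.sqrt(x.shape[1]) >= q

def log_gauss(x, mean, cov):
    # Log-density of N(mean, cov) evaluated at each row of x.
    d = x.shape[1]
    diff = x - mean
    sol = np.linalg.solve(cov, diff.T).T
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum("ij,ij->i", diff, sol)
    return -0.5 * (quad + logdet + d * math.log(2 * math.pi))

def conditional_sample(rng, d, q, n_g):
    # Stand-in for "samples from the conditional distribution are available":
    # rejection sampling from the standard Gaussian, feasible here only
    # because the probability of A is not too small.
    kept, count = [], 0
    while count < n_g:
        x = rng.standard_normal((4 * n_g, d))
        x = x[in_A(x, q)]
        kept.append(x)
        count += len(x)
    return np.concatenate(kept)[:n_g]

def ais_estimate(d=5, q=1.5, n_g=2000, n_p=20000, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1 (learning): fit a Gaussian auxiliary density to n_g conditional samples.
    x_learn = conditional_sample(rng, d, q, n_g)
    mean = x_learn.mean(axis=0)
    cov = np.cov(x_learn, rowvar=False)
    # Step 2 (estimation): draw n_p points from the fitted density and
    # average the likelihood ratios f/g over the points that fall in A.
    y = rng.multivariate_normal(mean, cov, size=n_p)
    log_w = log_gauss(y, np.zeros(d), np.eye(d)) - log_gauss(y, mean, cov)
    return float(np.mean(in_A(y, q) * np.exp(log_w)))

def exact_probability(q):
    # For this event, sum(x)/sqrt(d) is standard normal, so P(A) = 1 - Phi(q).
    return 0.5 * math.erfc(q / math.sqrt(2))
```

Calling `ais_estimate()` and comparing the result with `exact_probability(1.5)` gives a quick sanity check of the scheme on this toy event. The sketch does not reproduce the paper's analysis; it only shows where the learning sample size $n_g$ and the estimation sample size enter.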

Paper Structure

This paper contains 31 sections, 39 theorems, 186 equations, 2 algorithms.

Key Result

Theorem 2.2

Assume that: Then for every $t \geq 0$, $g_t$ is efficient in high dimension for $A$. Assume in addition that $m \to \infty$. Then for each $t \geq 0$, there exists a finite constant $\kappa_t > 0$ such that if ${n_g} \gg d^{\kappa_t}$, then $\hat{g}_t$ is efficient in high dimension.

Theorems & Definitions (76)

  • Definition 2.1: High-dimensional efficiency for $A$
  • Theorem 2.2
  • Remark 2.3
  • Remark 2.4
  • Theorem 2.5
  • Corollary 2.6
  • Remark 2.7
  • Proposition 2.8
  • Theorem 2.9
  • Lemma 3.1
  • ...and 66 more