Insight from the Kullback--Leibler divergence into adaptive importance sampling schemes for rare event analysis in high dimension

Jason Beh; Yonatan Shadmi; Florian Simatos

TL;DR

The paper investigates adaptive importance sampling (AIS) for estimating rare-event probabilities in high dimension, focusing on two AIS families: the cross-entropy (CE) method and projection methods, which use projected auxiliary densities $g_{proj}$ (and their estimators). It proves that if the adaptation sample size $n_g$ grows polynomially with the dimension and the rare-event probability $p_f(A)$ is bounded away from zero ($\inf_d p_f(A) > 0$), then the AIS estimators are efficient in high dimension and weight degeneracy is avoided, contrary to common belief. For projection methods, efficiency can be achieved with $n_g \gg rd$, highlighting the advantage of low-dimensional projections in high-dimensional settings; in particular, using $r = d$ recovers the optimal Gaussian $g_A$, whose estimator $\hat g_A$ requires $n_g \gg d^2$. For CE, a polynomial growth rate $n_g \gg d^{\kappa_t}$ is shown to guarantee efficiency, but the exponent $\kappa_t$ remains implicit; the analysis involves the smallest eigenvalue of the covariance estimates. For projection methods, by contrast, a simple computational framework makes the results transparent. Overall, the work provides KL-divergence-based conditions and Chatterjee-Diaconis-type tail bounds that explain when AIS can beat the curse of dimensionality in rare-event analysis, and offers insight into the trade-offs between projection dimension, adaptation sample size, and estimator accuracy.
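As a rough illustration of the projection idea, the sketch below builds a projected Gaussian auxiliary density from samples of the conditional distribution of the input given $A$. It is a minimal sketch under assumptions, not the authors' construction: it assumes that $g_{proj}$ keeps the sample mean and replaces the sample covariance by the identity except along $r$ eigen-directions, ranked by $v - 1 - \log v$ with $v$ the corresponding eigenvalue. The toy setting ($f = N(0, I_d)$, $A = \{x : \sum_i x_i/\sqrt{d} \ge a\}$) and the function names are hypothetical. With $r = d$ the construction returns the full sample covariance, matching the statement that $r = d$ recovers $\hat g_A$. As a heuristic parameter count only, a full covariance has $d(d+1)/2$ free entries while the projected one is described by about $rd$ numbers.

```python
import numpy as np


def sample_conditional_halfspace(rng, d, a, n):
    """Hypothetical toy setting: f = N(0, I_d) and A = {x : sum(x) / sqrt(d) >= a}.

    Draws n samples of f conditioned on A by rejection; this is cheap because
    P(A) does not depend on d.
    """
    kept, total = [], 0
    while total < n:
        x = rng.standard_normal((n, d))
        x = x[x.sum(axis=1) / np.sqrt(d) >= a]
        kept.append(x)
        total += len(x)
    return np.concatenate(kept)[:n]


def projected_gaussian(samples, r):
    """Return (mu, sigma_proj) for a projected Gaussian auxiliary density.

    Assumed construction: sigma_proj = I + sum_{i in S} (v_i - 1) d_i d_i^T,
    where (v_i, d_i) are eigenpairs of the sample covariance and S holds the
    r indices with the largest value of v - 1 - log(v).
    """
    mu = samples.mean(axis=0)
    sigma_hat = np.cov(samples, rowvar=False)
    v, vecs = np.linalg.eigh(sigma_hat)  # orthonormal eigenvectors in columns
    psi = v - 1.0 - np.log(v)  # zero iff v == 1, positive otherwise
    keep = np.argsort(psi)[::-1][:r]  # r most informative directions
    sigma_proj = np.eye(samples.shape[1])
    for i in keep:
        sigma_proj += (v[i] - 1.0) * np.outer(vecs[:, i], vecs[:, i])
    return mu, sigma_proj


# Usage: d = 20, one projection direction, n_g = 4000 samples of f given A.
rng = np.random.default_rng(0)
d, r, n_g = 20, 1, 4000
xs = sample_conditional_halfspace(rng, d, a=1.0, n=n_g)
mu, sigma_proj = projected_gaussian(xs, r)
```

In this toy example the only direction where the conditional distribution differs from $f$ is $(1, \dots, 1)/\sqrt{d}$, so with $r = 1$ the retained eigen-direction should align with it.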

Abstract

We study two adaptive importance sampling schemes for estimating the probability of a rare event in the high-dimensional regime $d \to \infty$ with $d$ the dimension. The first scheme is the prominent cross-entropy (CE) method, and the second scheme, motivated by recent results, uses as auxiliary distribution a projection of the optimal auxiliary distribution on a lower-dimensional subspace. In these schemes, two samples are used: the first one to learn the auxiliary distribution and the second one, drawn according to the learned distribution, to perform the final probability estimation. Contrary to the common belief that the sample size needs to grow exponentially in the dimension to make the estimator consistent and avoid the weight degeneracy phenomenon, we find that a polynomial sample size in the first learning step is enough. We prove this result assuming that the sought probability is bounded away from 0. For CE, insight is provided into the polynomial growth rate, which remains implicit. In contrast, we study the second scheme in a simple computational framework assuming that samples from the conditional distribution are available. This makes it possible to show that the sample size only needs to grow like $rd$ with $r$ the effective dimension of the projection, which highlights the potential benefits of these projection methods.
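The two-sample scheme described in the abstract can be sketched as follows, in the simple computational framework where samples from the conditional distribution given $A$ are available. This is a minimal illustration under stated assumptions, not the authors' code: the input density is taken to be $f = N(0, I_d)$, the event $A = \{x : \sum_i x_i/\sqrt{d} \ge a\}$ is a hypothetical toy example whose probability does not depend on $d$, conditional samples are produced by rejection, and the learned density is the Gaussian with the empirical mean and covariance of those samples (moment matching is what minimizes $D(g^* \| g)$ over Gaussians $g$, with $g^*$ the conditional distribution). The effective sample size $(\sum_i w_i)^2 / \sum_i w_i^2$ is reported as a common diagnostic of weight degeneracy.

```python
import numpy as np
from math import erfc, sqrt


def gaussian_logpdf(y, mu, sigma):
    """Log-density of N(mu, sigma) evaluated at each row of y."""
    chol = np.linalg.cholesky(sigma)
    z = np.linalg.solve(chol, (y - mu).T)  # whitened residuals, shape (d, N)
    d = y.shape[1]
    return (-0.5 * np.sum(z**2, axis=0)
            - np.sum(np.log(np.diag(chol)))
            - 0.5 * d * np.log(2.0 * np.pi))


def in_A(x, a):
    """Hypothetical toy rare event A = {x : sum(x) / sqrt(d) >= a}."""
    return x.sum(axis=1) / np.sqrt(x.shape[1]) >= a


def sample_conditional(rng, d, a, n):
    """Stand-in for the conditional-sampling oracle: rejection from f = N(0, I_d)."""
    kept, total = [], 0
    while total < n:
        x = rng.standard_normal((n, d))
        x = x[in_A(x, a)]
        kept.append(x)
        total += len(x)
    return np.concatenate(kept)[:n]


def two_stage_ais(rng, d, a, n_g, n_is):
    # Stage 1: learn a Gaussian auxiliary density from n_g samples of f given A.
    xs = sample_conditional(rng, d, a, n_g)
    mu_hat = xs.mean(axis=0)
    sigma_hat = np.cov(xs, rowvar=False)
    # Stage 2: fresh samples from the learned density, weighted by f / g_hat on A.
    chol = np.linalg.cholesky(sigma_hat)
    y = mu_hat + rng.standard_normal((n_is, d)) @ chol.T
    hit = in_A(y, a)
    w = np.zeros(n_is)
    w[hit] = np.exp(gaussian_logpdf(y[hit], np.zeros(d), np.eye(d))
                    - gaussian_logpdf(y[hit], mu_hat, sigma_hat))
    p_hat = w.mean()
    ess = w.sum() ** 2 / np.sum(w**2)  # effective sample size of the weights
    return p_hat, ess


rng = np.random.default_rng(1)
p_hat, ess = two_stage_ais(rng, d=20, a=1.0, n_g=4000, n_is=20000)
p_true = 0.5 * erfc(1.0 / sqrt(2.0))  # exact P(A) = 1 - Phi(1), independent of d
print(f"p_hat={p_hat:.4f}  p_true={p_true:.4f}  ESS={ess:.0f} / 20000")
```

Here $n_g = 4000$ is an order of magnitude above $d^2 = 400$, so the printed estimate should be close to $p_f(A) = 1 - \Phi(1) \approx 0.159$.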

Paper Structure (31 sections, 39 theorems, 186 equations, 2 algorithms)

Introduction
Main results
Minimal notation
High-dimensional efficiency of CE densities
High-dimensional efficiency of auxiliary distributions using projection methods in a simple computational framework
Discussion of the assumption $\inf_d p_f(A) > 0$
Literature overview
Importance sampling as a sampling scheme
Importance sampling in a reliability context
Proof overview
Preliminary results
Further notation
Results from Chatterjee and Diaconis [Chatterjee18:0]
General formula for the Kullback--Leibler divergence
Results on the function $\Psi$
...and 16 more sections

Key Result

Theorem 2.2

Assume that: … Then for every $t \geq 0$, $g_t$ is efficient in high dimension for $A$. Assume in addition that $m \to \infty$. Then for each $t \geq 0$, there exists a finite constant $\kappa_t > 0$ such that if $n_g \gg d^{\kappa_t}$, then $\hat{g}_t$ is efficient in high dimension.
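Theorem 2.2 concerns the CE densities $g_t$ and their sample-based counterparts $\hat g_t$. For readers unfamiliar with CE, the sketch below implements the standard multilevel CE update with a full Gaussian family: at each iteration, $n_g$ samples are drawn from the current density, an intermediate level is set at an empirical quantile, and the new mean and covariance are the importance-weighted moments of the samples above that level. Treating $\hat g_t$ as the Gaussian produced at iteration $t$ is an assumption, since this summary does not give the exact definition; the role of $m$ is not modelled, and the toy event, the quantile parameter `rho = 0.1` and all function names are hypothetical.

```python
import numpy as np
from math import erfc, sqrt


def gaussian_logpdf(y, mu, sigma):
    """Log-density of N(mu, sigma) evaluated at each row of y."""
    chol = np.linalg.cholesky(sigma)
    z = np.linalg.solve(chol, (y - mu).T)
    return (-0.5 * np.sum(z**2, axis=0)
            - np.sum(np.log(np.diag(chol)))
            - 0.5 * y.shape[1] * np.log(2.0 * np.pi))


def score(x):
    """Hypothetical performance function; the rare event is {score(x) >= gamma}."""
    return x.sum(axis=1) / np.sqrt(x.shape[1])


def sample_gaussian(rng, mu, sigma, n):
    chol = np.linalg.cholesky(sigma)
    return mu + rng.standard_normal((n, mu.size)) @ chol.T


def cross_entropy(rng, d, gamma, n_g, rho=0.1, max_iter=50):
    """Textbook multilevel CE with a full Gaussian family, starting from f = N(0, I_d)."""
    mu, sigma = np.zeros(d), np.eye(d)
    levels = []
    for _ in range(max_iter):
        x = sample_gaussian(rng, mu, sigma, n_g)
        s = score(x)
        level = min(gamma, np.quantile(s, 1.0 - rho))  # intermediate threshold
        levels.append(level)
        elite = x[s >= level]
        # Log importance weights log(f / g_t) on the elite samples.
        log_w = (gaussian_logpdf(elite, np.zeros(d), np.eye(d))
                 - gaussian_logpdf(elite, mu, sigma))
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # CE update for a Gaussian family: weighted mean and covariance.
        mu = w @ elite
        centred = elite - mu
        sigma = (w[:, None] * centred).T @ centred
        if level >= gamma:
            break
    return mu, sigma, levels


def is_estimate(rng, mu, sigma, gamma, n):
    """Final importance sampling estimate of P(score >= gamma) under f."""
    d = mu.size
    y = sample_gaussian(rng, mu, sigma, n)
    hit = score(y) >= gamma
    w = np.zeros(n)
    w[hit] = np.exp(gaussian_logpdf(y[hit], np.zeros(d), np.eye(d))
                    - gaussian_logpdf(y[hit], mu, sigma))
    return w.mean()


rng = np.random.default_rng(2)
d, gamma = 10, 3.0
mu, sigma, levels = cross_entropy(rng, d, gamma, n_g=2000)
p_hat = is_estimate(rng, mu, sigma, gamma, n=20000)
p_true = 0.5 * erfc(gamma / sqrt(2.0))  # exact probability, about 1.35e-3
print(levels, p_hat, p_true)
```

The weights are rescaled by their maximum before normalization; this does not change the weighted moments but avoids underflow when $f/g_t$ is small.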

Theorems & Definitions (76)

Definition 2.1: High-dimensional efficiency for $A$
Theorem 2.2
Remark 2.3
Remark 2.4
Theorem 2.5
Corollary 2.6
Remark 2.7
Proposition 2.8
Theorem 2.9
Lemma 3.1
...and 66 more
