Phase transition for conditional covariance matrices estimated by importance sampling, and implications for cross-entropy schemes in high dimension

Jason Beh, Jerome Morio, Florian Simatos

TL;DR

The paper analyzes high-dimensional covariance estimation in cross-entropy schemes via a random-matrix model with dependent, heavy-tailed weights, proving a phase transition in the polynomial regime $n = d^\kappa$ governed by a threshold $\kappa_*$. The threshold is tied to the tail behavior of likelihood ratios and to the smallest eigenvalue $\lambda_{\min}(\Sigma)$ of the auxiliary covariance; in particular, $\kappa_* = 1/\lambda_1$ in the bad projection case $V \subset U_\perp$, and $1 \le \kappa_* \le 1/\lambda_1$ in the good case $V \subset U$. The authors connect this spectral viewpoint to CE schemes with projection, showing that larger $\lambda_1$ (and thus larger $\lambda_{\min}$ of the projected covariance) yields more stable and accurate estimators, a finding supported by numerical experiments across several test functions. The results offer a spectral criterion for designing efficient high-dimensional CE algorithms and open directions for integrating projection strategies with broader CE frameworks and advanced subspace methods.

Abstract

Motivated by the estimation of covariance matrices by importance sampling arising in the cross-entropy (CE) algorithm, we study a random matrix model $\hat\Sigma = \mathbf{X} L \mathbf{X}^\top$ with two distinct features: $\mathbf{X}$ and $L$ are dependent, and $L$ is heavy-tailed. In the high-dimensional regime $d \to \infty$, we prove under suitable assumptions that a phase transition occurs in the polynomial regime $n = d^\kappa$, with $n$ the sample size. Namely, we prove that $\lVert \hat\Sigma - E \hat\Sigma \rVert \Rightarrow 0$ if and only if $\kappa > \kappa_*$ for some threshold $\kappa_*$ determined by the behavior of the maximum likelihood ratios. Moreover, we identify general situations where $\kappa_* = 1/\lambda_1$, with $\lambda_1$ the smallest eigenvalue of the covariance matrix of the auxiliary distribution used to estimate $\hat\Sigma$ by importance sampling. This suggests that importance sampling will work better with covariance matrices having a large smallest eigenvalue. We carry this insight into recent CE schemes proposed to estimate the probability of high-dimensional rare events. Through numerical simulations, we demonstrate that better CE schemes are also the ones with larger smallest eigenvalue, even though these algorithms were not designed to smooth the spectrum. This new spectral interpretation raises stimulating questions and opens research directions for the design of efficient high-dimensional algorithms.
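
The estimator in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the target is a standard Gaussian $N(0, I_d)$ with no conditioning event, the auxiliary distribution is $N(0, \lambda I_d)$, and the function names `is_covariance_estimate` and `spectral_error` are hypothetical. In this toy setup the expectation of the estimator is the identity, so the spectral-norm error can be computed directly.

```python
import numpy as np


def is_covariance_estimate(d, n, lam, rng):
    """Return X L X^T for a toy Gaussian target/auxiliary pair (assumed setup)."""
    # Columns of X are n i.i.d. draws from the auxiliary law g = N(0, lam * I_d).
    x = rng.normal(scale=np.sqrt(lam), size=(d, n))
    sq_norm = np.sum(x**2, axis=0)
    # Log likelihood ratio log f(x) - log g(x), with target f = N(0, I_d).
    log_ratio = -0.5 * sq_norm * (1.0 - 1.0 / lam) + 0.5 * d * np.log(lam)
    # Diagonal of L: likelihood ratios divided by the sample size n.
    weights = np.exp(log_ratio) / n
    # (x * weights) scales column i of X by L_ii, so this is X L X^T.
    return (x * weights) @ x.T


def spectral_error(sigma_hat):
    """Spectral norm of sigma_hat - I; I is the expectation in this toy setup."""
    d = sigma_hat.shape[0]
    return np.linalg.norm(sigma_hat - np.eye(d), ord=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 10
    for lam in (1.0, 0.8):
        for kappa in (1.5, 3.0):
            n = int(round(d**kappa))  # polynomial regime n = d^kappa
            err = spectral_error(is_covariance_estimate(d, n, lam, rng))
            print(f"lam={lam}, kappa={kappa}, n={n}, error={err:.3f}")
```

With `lam = 1.0` the weights are all equal to `1/n` and the estimator reduces to the ordinary sample second-moment matrix; smaller values of `lam` make the weights heavier-tailed. A single run at small `d` only gives a rough feel for the behavior and does not verify the asymptotic result.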

Paper Structure

This paper contains 24 sections, 16 theorems, 117 equations, 3 figures, 1 table, 4 algorithms.

Key Result

Theorem 2.1

Assume that Assumptions ass:Sigma, ass:FID and ass:MW hold, and let $\kappa_* = 1/(1-\gamma_*)$. Assume in addition that $\inf_d p > 0$ and that either $V \subset U$ or $V \subset U_\perp$. Then in the regime $n = d^\kappa$, the following phase transition holds: $\lVert \hat\Sigma - E \hat\Sigma \rVert \Rightarrow 0$ if and only if $\kappa > \kappa_*$. Moreover, if $V \subset U_\perp$ then $\kappa_* = 1/\lambda_1$, while if $V \subset U$ then $1 \leq \kappa_* \leq 1/\lambda_1$.
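
As a rough arithmetic illustration of the threshold in the case $V \subset U_\perp$, the sketch below evaluates $\kappa_* = 1/\lambda_1$ and the sample size $n = d^\kappa$ at that exponent. The helper names `kappa_star_bad_case` and `sample_size` are hypothetical, and the numbers chosen ($d = 100$, $\lambda_1 = 0.5$) are illustrative only. The theorem is asymptotic in $d$, so the value of $n$ should be read as a growth rate to exceed, not as a finite-sample guarantee.

```python
def kappa_star_bad_case(lambda_1):
    """Threshold exponent 1 / lambda_1, stated for the case V in U_perp."""
    if lambda_1 <= 0:
        raise ValueError("lambda_1 must be positive")
    return 1.0 / lambda_1


def sample_size(d, kappa):
    """Sample size in the polynomial regime n = d^kappa, rounded to an integer."""
    return int(round(d**kappa))


if __name__ == "__main__":
    d = 100
    lambda_1 = 0.5  # illustrative smallest eigenvalue of the auxiliary covariance
    k_star = kappa_star_bad_case(lambda_1)
    # Convergence requires kappa strictly above k_star, so n must grow
    # faster than d^k_star as d increases.
    print(f"kappa_* = {k_star}, d^kappa_* = {sample_size(d, k_star)}")
```

In this example, halving $\lambda_1$ doubles the exponent, which is why a larger smallest eigenvalue is favorable for the sample-size requirement.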

Figures (3)

  • Figure 1: Interpretation of the performance of CE, CE-eig and CE-mean via the spectral behavior. Results for CE on the test functions $\varphi_{\rm quad}$ and $\varphi_{\rm fin}$ are not displayed because this scheme did not converge. For similar reasons, results for CE are not displayed for $\lambda_{\max}(\hat{\Sigma}_4)$ on the test function $\varphi_{\rm lin}$, since most repetitions diverge beyond the third iteration. Top figures display the distribution of the relative error $\lvert \hat{p} - p \rvert / p$. Bottom figures show the distribution of $\lambda_{\min}(\hat{\Sigma}^{\textnormal{proj}}_t)$ and $\lambda_{\max}(\hat{\Sigma}_{t+1})$ for the first three iterations $t = 1,2$ and $3$.
  • Figure 2: Interpretation of the performance of iCE, iCE-eig and iCE-mean via the spectral behavior. Top figures display the distribution of the relative error $\lvert \hat{p} - p \rvert / p$. Bottom figures show the distribution of $\lambda_{\min}(\hat{\Sigma}^{\textnormal{proj}}_t)$ and $\lambda_{\max}(\hat{\Sigma}_{t+1})$ for the first three iterations $t = 1,2$ and $3$.
  • Figure 3: Comparison of CE and iCE on the linear test function $\varphi_{\rm lin}$. (a) Distribution of the relative error $\lvert \hat{p} - p \rvert / p$. (b) Evolution of $\lambda_{\min}(\hat{\Sigma}^{\textnormal{proj}}_t)$ and $\lambda_{\max}(\hat{\Sigma}_{t+1})$ during the first three iterations. Most repetitions of CE diverge beyond the third iteration, so results for $\lambda_{\max}(\hat{\Sigma}_4)$ are not displayed.

Theorems & Definitions (26)

  • Theorem 2.1
  • Proposition 2.2
  • Corollary 4.1
  • Theorem 4.2: Theorem $2.2$ in beh2023insight
  • Lemma 5.1
  • proof
  • Lemma 5.2
  • proof
  • Lemma 5.3
  • proof
  • ...and 16 more