Asymptotic Behavior of Principal Component Projections for Multivariate Extremes

Holger Drees

TL;DR

This work develops a PCA-based approach for estimating the extremal dependence structure of a regularly varying $d$-dimensional vector via its angular measure. For the projection of extreme observations onto a data-driven lower-dimensional subspace, the paper derives asymptotic results for the PCA projection and the resulting excess risk, including explicit limit distributions when the limit projection is unique. It establishes local empirical-process results around the optimal projection and proposes a dimension-selection rule that adapts the projection dimension to the data. Simulation studies demonstrate potential gains in finite samples across diverse high-dimensional extreme-value models, suggesting practical benefits for estimating angular measures in moderate-to-high dimensions. Overall, the paper provides a rigorous asymptotic framework and actionable methodology for dimension reduction in multivariate extremes with angular-dependence structure.

Abstract

The extremal dependence structure of a regularly varying $d$-dimensional random vector can be described by its angular measure. The standard nonparametric estimator of this measure is the empirical measure of the observed angles of the $k$ random vectors with largest norm, for a suitably chosen number $k$. Due to the curse of dimensionality, for moderate or large $d$, this estimator is often inaccurate. If the angular measure is concentrated on a vicinity of a lower dimensional subspace, then first projecting the data on a lower dimensional subspace obtained by a principal component analysis of the angles of extreme observations can substantially improve the performance of the estimator. We derive the asymptotic behavior of such PCA projections and the resulting excess risk. In particular, it is shown that, under mild conditions, the excess risk (as a function of $k$) decreases much faster than it was suggested by empirical risk bounds obtained in \cite{DS21}. Moreover, functional limit theorems for local empirical processes of the (empirical) reconstruction error of projections uniformly over neighborhoods of the true optimal projection are established. Based on these asymptotic results, we propose a data-driven method to select the dimension of the projection space. Finally, the finite sample performance of resulting estimators is examined in a simulation study.
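The projection step described in the abstract can be illustrated with a minimal sketch. It assumes the Euclidean norm, an uncentered second-moment matrix of the angles as the PCA target, and a fixed projection dimension `p`; the function name `pca_angular_projection` and its return values are hypothetical and are not taken from the paper, whose estimator may differ in details such as rescaling the projected angles.

```python
import numpy as np

def pca_angular_projection(X, k, p):
    """Project the angles of the k largest-norm rows of X onto a p-dim subspace.

    Assumptions (not from the paper): Euclidean norm, uncentered PCA.
    """
    norms = np.linalg.norm(X, axis=1)
    top = np.argsort(norms)[-k:]              # indices of the k largest norms
    theta = X[top] / norms[top][:, None]      # angles on the unit sphere
    sigma_hat = theta.T @ theta / k           # empirical second-moment matrix
    _, eigvecs = np.linalg.eigh(sigma_hat)    # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :p]               # eigenvectors of the p largest
    P = V @ V.T                               # orthogonal projection matrix
    projected = theta @ P                     # P is symmetric, so this is P theta_i
    # Empirical reconstruction error: mean squared distance to the subspace.
    recon_error = np.mean(np.sum((theta - projected) ** 2, axis=1))
    return theta, P, projected, recon_error
```

Under these assumptions the empirical reconstruction error equals the sum of the $d-p$ smallest eigenvalues of the second-moment matrix, so a small value indicates that the extreme angles lie close to a $p$-dimensional subspace.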

Paper Structure

This paper contains 10 sections, 7 theorems, 67 equations, 6 figures.

Key Result

Theorem 2.1

If condition (B) is met, then for $\Delta_n:= \hat{\Sigma}_{n,k}-\Sigma_{n,k}$ one has where $\mathbf{U}=(U_{ij})_{1\le i,j\le d}$ is a symmetric centered Gaussian matrix with $Cov(U_{ij},U_{\ell m}) = Cov_\infty(\theta_i\theta_j,\theta_\ell\theta_m)$ for all $1\le i,j,\ell,m\le d$.

Figures (6)

  • Figure 1: RMSE of the estimators of the probabilities (i)--(iv) based on $\hat{H}_{n,k}$ (black, solid), $\hat{H}_{n,k}^{PCA}$ (blue, dashed) and $\hat{H}_{n,k,10}^{PCA}$ (red, dashed) versus $k$ in the Dirichlet model with fixed dimensions $p=2$ and $d=10$. The colored solid lines indicate the corresponding RMSE when the dimension is chosen as in \ref{eq:choice_p}.
  • Figure 2: Empirical mean of the selected dimension $\hat{p}$ versus $k$ in the Dirichlet model with $d=10$.
  • Figure 3: RMSE of the estimators of the probabilities (i)--(iv) based on $\hat{H}_{n,k}$ (black, solid), $\hat{H}_{n,k}^{PCA}$ (blue, dashed) and $\hat{H}_{n,k,10}^{PCA}$ (red, dashed) versus $k$ for randomly rotated Dirichlet observations with fixed dimensions $p=2$ and $d=10$. The colored solid lines indicate the corresponding RMSE when the dimension is chosen as in \ref{eq:choice_p}.
  • Figure 4: RMSE of the estimators of the probabilities (i)--(iv) based on $\hat{H}_{n,k}$ (black, solid), $\hat{H}_{n,k}^{PCA}$ (blue, dashed) and $\hat{H}_{n,k,10}^{PCA}$ (red, dashed) versus $k$ in the Gumbel model with fixed dimensions $p=2$ and $d=10$. The colored solid lines indicate the corresponding RMSE when the dimension is chosen as in \ref{eq:choice_p}.
  • Figure 5: RMSE of the estimators of the probabilities (i)--(iv) based on $\hat{H}_{n,k}$ (black, solid), $\hat{H}_{n,k}^{PCA}$ (blue, dashed) and $\hat{H}_{n,k,15}^{PCA}$ (red, dashed) versus $k$ in the Dirichlet model with fixed dimensions $p=5$ and $d=100$. The colored solid lines indicate the corresponding RMSE when the dimension is chosen as in \ref{eq:choice_p}.
  • ...and 1 more figure

Theorems & Definitions (15)

  • Theorem 2.1
  • proof
  • Remark 2.2
  • Theorem 3.1
  • proof
  • Corollary 4.1
  • proof
  • Corollary 4.2
  • proof
  • Corollary 4.3
  • ...and 5 more