Statistical inference for extremal directions in high-dimensional spaces

Lucas Butsch, Vicky Fasen-Hartmann

Abstract

In multivariate extreme value statistics, the first step in understanding the dependence structure of extremes is identifying the directions in which they occur. The novelty of this paper is the analysis of high-dimensional extreme value models in which both the model dimension and the number of bias directions go to infinity as the number of observations tends to infinity; we estimate the number of extremal directions. To address the curse of dimensionality, we extend and investigate the information criteria (AIC, BICU, BICL, QAIC and MSEIC) from the fixed-dimensional case (Butsch and Fasen-Hartmann, 2025a; Meyer and Wintenberger, 2023), which employ the concept of sparse regular variation that is closely related to multivariate regular variation, for the estimation of the number of extremal directions. For all information criteria, we derive sufficient conditions for consistency. Unlike in the fixed-dimensional case, where only the Bayesian information criteria (BICU and BICL) and the QAIC are consistent, the AIC and MSEIC are also consistent in high dimensions under certain model assumptions. We compare the performance of the different information criteria in a simulation study that includes a detailed analysis of the model assumptions and the necessary and sufficient conditions for consistency.

Paper Structure

This paper contains 16 sections, 4 theorems, 76 equations, 5 figures.

Key Result

Theorem 3.1

Suppose asu:T holds. Then $\operatorname{BICU}$ and $\operatorname{BICL}$ are weakly consistent in the sense of

Figures (5)

  • Figure 2: Simulations for the asymptotically independent model with $s^* = 75$ and $d_n=200$: Boxplots of the empirical estimator $\widehat{s}_n/k_n$ (in (a)), the empirical estimator for $q$ (in (b)), and the empirical estimators for $g_{\operatorname{AIC}}(q,\mu)$, $g_{\operatorname{MSEIC}}(q,\mu)$ and $g_{\operatorname{QAIC}}(q)$ (in (c)) are plotted against the sample sizes $n = 10{,}000$, $25{,}000$ and $50{,}000$ on the $x$-axis, where $k_{10{,}000} = 500$, $k_{25{,}000}=2{,}200$ and $k_{50{,}000}=5{,}000$, respectively.
  • Figure 3: Simulations for the asymptotically independent model with $s^* = 75$ and $d_n=200$: Boxplots of the estimated number of extremal directions (left plot (a)) and of the Hellinger distance (right plot (b)) are plotted against the sample sizes $n = 10{,}000$, $25{,}000$ and $50{,}000$ on the $x$-axis, where $k_{10{,}000} = 500$, $k_{25{,}000}=2{,}200$ and $k_{50{,}000}=5{,}000$, respectively.
  • Figure 5: Simulations for the asymptotically dependent model with $s^* = 50$ and $d_n=300$: Boxplots of the empirical estimator $\widehat{s}_n/k_n$ (in (a)), the empirical estimator for $q$ (in (b)), and the empirical estimators for $g_{\operatorname{AIC}}(q,\mu)$, $g_{\operatorname{MSEIC}}(q,\mu)$ and $g_{\operatorname{QAIC}}(q)$ (in (c)) are plotted against the sample sizes $n = 10{,}000$, $25{,}000$ and $50{,}000$ on the $x$-axis, where $k_{10{,}000} = 750$, $k_{25{,}000}=2{,}500$ and $k_{50{,}000}=5{,}000$, respectively.
  • Figure 6: Simulations for the asymptotically dependent model with $s^* = 50$ and $d_n=300$: Boxplots of the estimated number of extremal directions (left plot (a)) and of the Hellinger distance (right plot (b)) are plotted against the sample sizes $n = 10{,}000$, $25{,}000$ and $50{,}000$ on the $x$-axis, where $k_{10{,}000} = 750$, $k_{25{,}000}=2{,}500$ and $k_{50{,}000}=5{,}000$, respectively.
  • Figure 7: A realization of $\operatorname{BICU}$, $\operatorname{BICL}$, $\operatorname{AIC}$, $\operatorname{QAIC}$ and $\operatorname{MSEIC}$ in the asymptotically dependent model with $s^* = 50$, $n = 50{,}000$ and $d_n=300$: The solid lines indicate $\mathrm{IC}_{k_n}(s)-\mathrm{IC}_{k_n}(s^*)$ for $48\leq s\leq 70$ and the dashed lines indicate $[(\mathrm{IC}_{k_n}(s)-P(s))-(\mathrm{IC}_{k_n}(s^*)-P(s^*))]$, that is, the same difference without the penalty term $P$.
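
Figures 3 and 6 report a Hellinger distance, but this summary does not show how the paper defines it. As a rough illustration only, the sketch below computes the standard Hellinger distance between two discrete probability vectors on a common support. The function name `hellinger`, the normalization by 1/2 (so that values lie in [0, 1]), and the example vectors are assumptions made here, not taken from the paper.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support.

    Assumed convention: H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))**2),
    which lies in [0, 1]. The paper may use a different normalization.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    squared_gaps = ((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * sum(squared_gaps))


# Hypothetical example: probabilities assigned to three directions.
p_true = [0.5, 0.3, 0.2]
p_hat = [0.4, 0.4, 0.2]
print(hellinger(p_true, p_hat))
```

Under this convention the distance is 0 for identical vectors and 1 for vectors with disjoint support, so smaller values in such a comparison would indicate a closer match between an estimated and a reference probability vector.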

Theorems & Definitions (9)

  • Remark 2.1
  • Remark 2.2
  • Theorem 3.1
  • Theorem 3.2
  • Remark 3.3
  • Theorem 3.4
  • Remark 3.5
  • Theorem 3.6
  • Remark 3.7