Extreme-PLS with missing data under weak dependence
Stéphane Girard, Cambyse Pakzad
TL;DR
This work extends Extreme-PLS (EPLS), a dimension-reduction method that targets the tail of a response given high-dimensional covariates, to settings with missing data and weak temporal dependence. It combines a single-index inverse regression model, a missing-at-random (MAR) mechanism on the covariates, and an $\alpha$-mixing framework to derive consistent, and in some regimes asymptotically Gaussian, estimators of the tail projection direction $\beta$. The paper establishes an asymptotic theory for the EPLS estimator, which is built from tail-moment estimators. This theory is complemented by extensive simulations across eleven dependence schemes (including ARMA, GARCH, and ESTAR) and by a real-data NOAA application in which EPLS recovers meaningful tail directions. Together, the results indicate robustness to heavy tails, missing data, and serial dependence, and point to applications in environmental and financial time series, where extremes are the most informative observations.
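As a sketch of why tail moments can identify $\beta$, assume, following the earlier EPLS methodology this work builds on, that the EPLS direction at a threshold $y$ maximizes $\mathrm{cov}(w^\top X, Y \mid Y \ge y)$ over unit vectors $w$. Assume also, for simplicity, the inverse model $X = g(Y)\beta + \varepsilon$ with $\|\beta\| = 1$ and $E(\varepsilon \mid Y) = 0$. The notation $\bar F$, $m_X$, $m_Y$, $m_{XY}$ and $v$ is introduced here for illustration only, and the paper's exact conditions may differ.

$$
\bar F(y) = P(Y \ge y), \quad m_X(y) = E\big(X\,\mathbb{1}\{Y \ge y\}\big), \quad m_Y(y) = E\big(Y\,\mathbb{1}\{Y \ge y\}\big), \quad m_{XY}(y) = E\big(XY\,\mathbb{1}\{Y \ge y\}\big),
$$

$$
\mathrm{cov}(w^\top X, Y \mid Y \ge y) = \frac{w^\top m_{XY}(y)}{\bar F(y)} - \frac{w^\top m_X(y)\, m_Y(y)}{\bar F(y)^2} = \frac{w^\top v(y)}{\bar F(y)^2}, \qquad v(y) = \bar F(y)\, m_{XY}(y) - m_X(y)\, m_Y(y).
$$

By the Cauchy–Schwarz inequality, the maximizer over unit vectors is $w(y) = v(y)/\|v(y)\|$. Under the inverse model, $m_X(y) = \beta\, E\big(g(Y)\mathbb{1}\{Y \ge y\}\big)$ and $m_{XY}(y) = \beta\, E\big(g(Y)\,Y\,\mathbb{1}\{Y \ge y\}\big)$, so

$$
v(y) = \bar F(y)^2\, \mathrm{cov}\big(g(Y), Y \mid Y \ge y\big)\, \beta ,
$$

and $w(y) = \beta$ whenever this conditional covariance is positive. Replacing each expectation by its empirical counterpart at a high threshold $y_n$, for instance $\hat m_{XY}(y_n) = n^{-1}\sum_{i=1}^n X_i Y_i\, \mathbb{1}\{Y_i \ge y_n\}$, gives the plug-in direction $\hat\beta = \hat v(y_n)/\|\hat v(y_n)\|$.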
Abstract
This paper develops a theoretical framework for Extreme Partial Least Squares (EPLS) dimension reduction in the presence of missing data and weak temporal dependence. Building upon the recent EPLS methodology for modeling extremal dependence between a response variable and high-dimensional covariates, we extend the approach to more realistic settings in which serial correlation and missingness may both be present. Specifically, we consider a single-index inverse regression model under heavy-tailed conditions and introduce a Missing-at-Random (MAR) mechanism on the covariates, in which the probability of missingness depends on the extremeness of the response. The asymptotic behavior of the proposed estimator is established within an $\alpha$-mixing framework, leading to consistency results under regularly varying tails. Extensive Monte Carlo experiments covering eleven dependence schemes (including ARMA, GARCH, and nonlinear ESTAR processes) demonstrate that the method performs robustly across a wide range of heavy-tailed and dependent scenarios, even when a substantial proportion of the data is missing. An application to real environmental data further confirms the method's capacity to recover meaningful tail directions.
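The Python sketch below puts the ingredients named in the abstract together on a toy problem: a single-index inverse regression model $X = g(Y)\beta + \varepsilon$, a heavy-tailed and weakly dependent response, and a MAR mechanism whose observation probability depends on the response. Everything in it is an illustrative assumption rather than the paper's design or estimator: the Gaussian AR(1) copula with Pareto margins, the link $g(y) = y$, the observation probability `pi_obs`, and the use of inverse-probability weighting (with the probability treated as known) to correct the tail moments of the covariates. The names `simulate`, `pi_obs` and `epls_direction_ipw` are hypothetical.

```python
import math
import random


def pi_obs(y):
    # Hypothetical MAR mechanism: P(X observed | X, Y = y) = 0.4 + 0.5 / y for y >= 1,
    # so the more extreme the response, the more likely its covariates are missing.
    return 0.4 + 0.5 / y


def simulate(n, beta, phi=0.5, gamma=0.2, seed=0):
    # Hypothetical design, not one of the paper's simulation schemes.
    # Y_t: stationary Gaussian AR(1) mapped to Pareto margins with tail index 1/gamma,
    #      so (Y_t) is heavy-tailed and geometrically alpha-mixing.
    # X_t = g(Y_t) * beta + eps_t with g(y) = y and eps_t ~ N(0, I_p);
    # X_t is replaced by None (missing) with probability 1 - pi_obs(Y_t).
    rng = random.Random(seed)
    z = rng.gauss(0.0, 1.0)  # stationary start for the AR(1)
    ys, xs = [], []
    for _ in range(n):
        z = phi * z + math.sqrt(1.0 - phi ** 2) * rng.gauss(0.0, 1.0)
        # 1 - Phi(z) via erfc, which stays accurate (and nonzero) for large z.
        upper = 0.5 * math.erfc(z / math.sqrt(2.0))
        y = upper ** (-gamma)  # P(Y > y) = y ** (-1 / gamma) for y >= 1
        x = [y * b + rng.gauss(0.0, 1.0) for b in beta]
        ys.append(y)
        xs.append(x if rng.random() < pi_obs(y) else None)
    return ys, xs


def epls_direction_ipw(ys, xs, k, pi):
    # Tail-moment EPLS direction with inverse-probability weighting, an assumed
    # way of handling missing covariates (pi is treated as known).
    n = len(ys)
    p = len(next(x for x in xs if x is not None))
    y_n = sorted(ys)[n - k]  # k-th largest response used as the threshold
    f_bar = m_y = 0.0
    m_x = [0.0] * p
    m_xy = [0.0] * p
    for y, x in zip(ys, xs):
        if y < y_n:
            continue
        f_bar += 1.0 / n  # estimates P(Y >= y_n)
        m_y += y / n  # estimates E(Y 1{Y >= y_n}); Y is always observed
        if x is None:
            continue
        w = 1.0 / (n * pi(y))  # IPW weight for a complete case
        for j in range(p):
            m_x[j] += w * x[j]
            m_xy[j] += w * x[j] * y
    # v is proportional to the covariance between X and Y given Y >= y_n.
    v = [f_bar * m_xy[j] - m_x[j] * m_y for j in range(p)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


beta = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0, 0.0]
ys, xs = simulate(50000, beta, seed=1)
beta_hat = epls_direction_ipw(ys, xs, k=2500, pi=pi_obs)
cosine = abs(sum(a * b for a, b in zip(beta_hat, beta)))
print(f"|cos(beta_hat, beta)| = {cosine:.3f}")
```

Because missingness in this toy setup depends only on the fully observed response, reweighting each complete case by $1/\pi(Y_i)$ keeps the tail moments of $X$ unbiased; a complete-case variant would simply drop the weights. The printed cosine measures how closely the estimated direction aligns with $\beta$, with 1 meaning perfect recovery up to sign.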
