Propagation of Input Tail Uncertainty in Rare-Event Estimation: A Light versus Heavy Tail Dichotomy
Zhiyuan Huang, Henry Lam, Zhenyuan Liu
TL;DR
This work analyzes how uncertainty about input tail information propagates to rare-event estimates for sums of i.i.d. inputs, showing that heavy-tailed problems are far more sensitive to tail misspecification than light-tailed ones. It develops a theory contrasting the effects of tail truncation under heavy versus light tails, derives data-driven thresholds via empirical truncation levels, and connects these to practical uncertainty quantification using the bootstrap and extreme value theory. Numerical experiments confirm that the standard bootstrap often under-covers for heavy-tailed problems, while tail extrapolation via generalized Pareto models can improve coverage but may introduce bias; extreme-value index estimators can effectively signal when additional data collection is warranted. The paper provides a data-informed roadmap for reliable rare-event estimation under tail uncertainty, emphasizing that in heavy-tailed settings the data size must typically exceed roughly $n/p$ to avoid substantial underestimation and misleading uncertainty quantification.
Abstract
We consider the estimation of small probabilities or other risk quantities associated with rare but catastrophic events. In the model-based literature, much of the focus has been on efficient Monte Carlo computation or analytical approximation, assuming the model is accurately specified. In this paper, we study a distinct direction: the propagation of model uncertainty and its impact on the reliability of rare-event estimates. Specifically, we consider the basic setup of the exceedance of an i.i.d. sum, and investigate how a lack of tail information on each input summand can affect the output probability. We argue that heavy-tailed problems are much more vulnerable to input uncertainty than light-tailed problems, as seen through their large deviations behavior and numerical evidence. We also investigate some approaches to quantifying model errors in this problem using a combination of the bootstrap and extreme value theory, showing some positive outcomes but also uncovering some statistical challenges.
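The light versus heavy tail contrast described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure. It assumes Lomax (Pareto II) inputs with shape 1.5 as the heavy-tailed case and unit-rate exponential inputs as the light-tailed case. It also assumes that missing tail information can be mimicked by capping each input at a level exceeded with probability 1/1000, roughly the largest value one might see in about a thousand observations. All names and parameter values (`exceedance_prob`, `truncation_ratio`, `n`, `b`, `cap`) are hypothetical choices made for illustration.

```python
import numpy as np


def exceedance_prob(draws, b, cap=None):
    """Monte Carlo estimate of P(X_1 + ... + X_n > b).

    `draws` has shape (reps, n). If `cap` is given, every input is capped
    at that level, an assumed stand-in for missing tail information.
    """
    x = draws if cap is None else np.minimum(draws, cap)
    return float(np.mean(x.sum(axis=1) > b))


def truncation_ratio(draws, b, cap):
    """Return (full estimate, capped estimate, capped / full)."""
    p_full = exceedance_prob(draws, b)
    p_cap = exceedance_prob(draws, b, cap)
    return p_full, p_cap, p_cap / p_full


rng = np.random.default_rng(0)
n, reps, tail_q = 5, 400_000, 1e-3   # illustrative choices, not from the paper
alpha = 1.5                          # assumed Lomax shape (heavy tail)

# Heavy tail: Lomax with P(X > x) = (1 + x)^(-alpha); cap at the 1 - tail_q quantile.
heavy = rng.pareto(alpha, size=(reps, n))
cap_heavy = tail_q ** (-1.0 / alpha) - 1.0

# Light tail: exponential with rate 1; cap at the same quantile level.
light = rng.exponential(1.0, size=(reps, n))
cap_light = -np.log(tail_q)

# Thresholds b are chosen so both uncapped probabilities are on the order of 1e-3.
heavy_result = truncation_ratio(heavy, b=300.0, cap=cap_heavy)
light_result = truncation_ratio(light, b=15.0, cap=cap_light)

print("heavy (full, capped, ratio):", heavy_result)
print("light (full, capped, ratio):", light_result)
```

In this toy setup the capped heavy-tailed estimate should collapse to nearly zero, because the exceedance is driven by a single very large summand that capping removes. The capped light-tailed estimate should keep a sizeable share of the uncapped value. Capping with `np.minimum` is only one way to model truncation. Conditioning on the input staying below the cap is another, and may give somewhat different numbers.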
