Propagation of Input Tail Uncertainty in Rare-Event Estimation: A Light versus Heavy Tail Dichotomy

Zhiyuan Huang, Henry Lam, Zhenyuan Liu

TL;DR

This work analyzes how uncertainty about input tail information propagates to rare-event estimates for sums of i.i.d. inputs, showing heavy-tailed problems are far more sensitive to tail misspecification than light-tailed ones. It develops a theory contrasting tail-truncation effects under heavy versus light tails, derives data-driven thresholds via empirical truncation levels, and connects these to practical uncertainty quantification using bootstrap and extreme-value theory. Numerical experiments confirm that standard bootstrap often under-covers for heavy-tailed problems, while tail-extrapolation via generalized Pareto models can improve coverage but may introduce bias; extreme-value index estimators can effectively signal when additional data collection is warranted. The paper provides a clear, data-informed roadmap for reliable rare-event estimation under tail uncertainty, emphasizing that data size must typically exceed roughly $n/p$ in risky heavy-tail settings to avoid substantial under-estimation and misleading uncertainty quantification.
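
As a rough illustration of the truncation effect described above, the following Python sketch estimates $P(X_1+\dots+X_n>\gamma)$ by crude Monte Carlo, once with the full input distribution and once with every input capped at a level $u$. Everything in it is an assumption made for illustration and is not taken from the paper: the Pareto and exponential inputs, the parameter values, the function names, and in particular the choice to model truncation by capping each input at its upper $10^{-4}$ tail quantile (the paper's own definition of truncation may differ).

```python
import math
import random


def pareto_tail_quantile(s, alpha=3.0):
    """Value x with P(X > x) = s for a Pareto input, P(X > x) = x**(-alpha), x >= 1."""
    return s ** (-1.0 / alpha)


def exp_tail_quantile(s):
    """Value x with P(X > x) = s for a unit-rate exponential input."""
    return -math.log(s)


def exceedance_full_vs_capped(tail_quantile, n, gamma, u, trials, seed=0):
    """Crude Monte Carlo estimates of P(S_n > gamma) with and without capping at u.

    Both estimates reuse the same draws, so the capped estimate can never
    exceed the full one.
    """
    rng = random.Random(seed)
    hits_full = 0
    hits_capped = 0
    for _trial in range(trials):
        total_full = 0.0
        total_capped = 0.0
        for _i in range(n):
            # 1 - random() lies in (0, 1], so the tail quantile is always finite.
            x = tail_quantile(1.0 - rng.random())
            total_full += x
            total_capped += min(x, u)  # assumed truncation model: cap at u
        hits_full += total_full > gamma
        hits_capped += total_capped > gamma
    return hits_full / trials, hits_capped / trials


if __name__ == "__main__":
    n, trials, q = 10, 100_000, 1e-4  # illustrative values only
    cases = {
        "heavy (Pareto, alpha=3)": (pareto_tail_quantile, 40.0),
        "light (exponential)": (exp_tail_quantile, 24.0),
    }
    for name, (tail_quantile, gamma) in cases.items():
        u = tail_quantile(q)  # cap at the same input tail probability in both cases
        p_full, p_capped = exceedance_full_vs_capped(tail_quantile, n, gamma, u, trials)
        print(f"{name}: u={u:.2f}, full={p_full:.2e}, capped={p_capped:.2e}")
```

The thresholds 40 and 24 were picked so that the two target probabilities are of roughly the same order of magnitude. In a run of this kind one would expect the capped heavy-tailed estimate to fall well below its full counterpart while the capped light-tailed estimate stays close to it; the exact numbers depend on the seed and on these arbitrary parameter choices.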

Abstract

We consider the estimation of small probabilities or other risk quantities associated with rare but catastrophic events. In the model-based literature, much of the focus has been devoted to efficient Monte Carlo computation or analytical approximation assuming the model is accurately specified. In this paper, we study a distinct direction on the propagation of model uncertainty and how it impacts the reliability of rare-event estimates. Specifically, we consider the basic setup of the exceedance of an i.i.d. sum, and investigate how the lack of tail information on each input summand can affect the output probability. We argue that heavy-tailed problems are much more vulnerable to input uncertainty than light-tailed problems, as reasoned through their large-deviations behavior and numerical evidence. We also investigate some approaches to quantify model errors in this problem using a combination of the bootstrap and extreme value theory, showing some positive outcomes but also uncovering some statistical challenges.
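
On the extreme value theory side, one standard tool is the Hill estimator of the extreme-value index $\xi$ (the reciprocal of the tail index for regularly varying tails). The sketch below is a generic textbook version in Python, not necessarily the estimator or tuning used in the paper; the function name `hill_estimator`, the choice of `k`, and the synthetic Pareto data are assumptions for illustration.

```python
import math
import random


def hill_estimator(data, k):
    """Hill estimate of the extreme-value index from the k largest observations.

    Assumes positive data with a regularly varying (Pareto-like) right tail.
    """
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    ordered = sorted(data, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    # Average log-excess of the top k observations over the threshold.
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = 3.0  # synthetic Pareto tail index, so the true xi is 1/alpha
    sample = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(20_000)]
    estimate = hill_estimator(sample, k=2_000)
    print(f"Hill estimate: {estimate:.3f} (true xi = {1 / alpha:.3f})")
```

A larger estimate of $\xi$ points to a heavier tail, which is the regime the abstract identifies as more vulnerable to input uncertainty. The estimate depends noticeably on `k`, so several values of `k` are usually examined before drawing conclusions.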

Paper Structure

This paper contains 22 sections, 9 theorems, 179 equations, 9 figures, 2 tables.

Key Result

Theorem 1

Suppose $X_i$'s are i.i.d. random variables with regularly varying tail distribution $\bar{F}$ in the form rv with $\alpha>2$ and $E|X|^{2+\delta}<\infty$. Let $n\to\infty$ and $\gamma=n\mu+\omega(\sqrt{n\log n})$. Assume $u\leq \mu+O((\gamma-n\mu)/\sqrt{\log n})$. The discrepancy between using a tr

Figures (9)

  • Figure 1: Probability estimation with untruncated and truncated distributions. Each $p$ approximately has the same magnitude.
  • Figure 2: Relative errors (the ratios of error to the ground truth). The compared tail distributions include generalized Pareto distributions with various tail indices (i.e. $\xi^{-1}$) and light-tailed distributions, including exponential (Exp), Gaussian (Gaus), and light-tailed Weibull (L.Weib).
  • Figure 3: Relative errors (the ratios of error to the ground truth) in simulation results with empirical distribution. The tail distributions include generalized Pareto distributions and $t$-distributions with various tail indices (i.e. $\xi^{-1}$ for Pareto and $\nu$ for Student's $t$).
  • Figure 4: Relative errors (the ratios of error to the ground truth) in simulation results with empirical distribution. The tail distributions include $t$-distributions with various choices of $\nu$, log-normal distribution (Log.N), heavy-tailed Weibull (H.Weib) distribution and light-tailed distributions, including exponential (Exp) and light-tailed Weibull (L.Weib).
  • Figure 5: Results of plain bootstrap with varying sample sizes based on 100 replications for each sample size under two settings: (1) Light-tailed case (Gaussian tail) with sample sizes ranging from $10^2$ to $10^4$ and (2) Heavy-tailed case (Student's $t$ tails, $\nu=4$) with sample sizes ranging from $10^4$ to $10^7$. (a) Coverages of the 95% bootstrap CIs. (b) Distributions of CI width/target probability.
  • ...and 4 more figures

Theorems & Definitions (21)

  • Theorem 1: Unreliable approximation of $p(F)$ for heavy-tailed distributions
  • Theorem 2: Reliable approximation of $p(F)$ for heavy-tailed distributions
  • Lemma 1: Exact asymptotics for truncated distributions
  • Theorem 3: Reliable approximation of $p(F)$ for light-tailed distributions
  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Theorem 4: Unreliable estimation of $\tilde{p}(F)$ for the heavy-tailed distribution
  • Theorem 5: Reliable estimation of $\tilde{p}(F)$ for the heavy-tailed distribution
  • ...and 11 more