Censored extreme value estimation

Martin Bladt; Igor Rodionov

TL;DR

The paper develops a unified framework for censored extreme value analysis by marrying Kaplan--Meier survival methods with extreme value theory through extreme Kaplan--Meier integrals (EKMI). It provides a central decomposition that expresses EKMI as sums of (conditionally) i.i.d. terms plus a vanishing remainder, yielding consistency and asymptotic normality under regular variation, and it extends residual-based methods to all max-domains of attraction. By introducing generalized EKMI and generalized residual estimators, the authors derive censored versions of Hill-type and moment estimators for the extreme value index, including bias and second-order considerations, with extensive finite-sample validation via simulations and a brain cancer dataset. The work enables tail inference and tail-quantile estimation under random censoring, offering robust tools across Fréchet, Gumbel, and Weibull domains and opening paths to broader tail-characteristic estimation under censoring. Overall, the methodology enhances tail inference in censored data and provides practical, threshold-robust procedures for real-data applications.

Abstract

A novel and comprehensive methodology designed to tackle the challenges posed by extreme values in the context of random censorship is introduced. The main focus is on the analysis of integrals based on the product-limit estimator of normalized upper order statistics, called extreme Kaplan--Meier integrals. These integrals allow for the transparent derivation of various important asymptotic distributional properties, offering an alternative approach to conventional plug-in estimation methods. Notably, this methodology demonstrates robustness and wide applicability among various tail regimes. A noteworthy by-product is the extension of generalized Hill-type estimators of extremes to encompass arbitrary tail behavior, which is of independent interest. The theoretical framework is applied to construct novel estimators for real-valued extreme value indices for right-censored data. Simulation studies confirm the asymptotic results and, in a competitor case, mostly show superiority in mean square error. An application to brain cancer data demonstrates that censoring effects are properly accounted for, even when focusing solely on tail classification.
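The construction described above can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: it assumes that an extreme Kaplan--Meier integral is the integral of a function $\varphi$ with respect to the product-limit estimator built from the top $k$ observations divided by the threshold $Z_{n-k:n}$, and that ties are absent. The function names `km_weights` and `extreme_km_integral` are hypothetical. With $\varphi=\log$ and no censoring, the quantity reduces to the classical Hill estimator.

```python
import numpy as np


def km_weights(delta_sorted):
    """Kaplan-Meier jump sizes for observations sorted in increasing order.

    delta_sorted[i] is 1 if the i-th smallest observation is uncensored, else 0.
    Assumes no ties among the observations.
    """
    d = np.asarray(delta_sorted, dtype=float)
    k = d.size
    at_risk = k - np.arange(k)                 # k, k-1, ..., 1
    factors = 1.0 - d / at_risk                # survival factor at each point
    # Product-limit survival just before each observation.
    surv_before = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    return surv_before * d / at_risk


def extreme_km_integral(z, delta, k, phi=np.log):
    """Integrate phi against the Kaplan-Meier estimator of the top-k
    observations normalized by the threshold Z_{n-k:n} (assumed definition)."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    if not 1 <= k < z.size:
        raise ValueError("k must satisfy 1 <= k < n")
    order = np.argsort(z, kind="stable")
    z_sorted, d_sorted = z[order], delta[order]
    threshold = z_sorted[-k - 1]               # Z_{n-k:n}
    top = z_sorted[-k:] / threshold            # normalized upper order statistics
    return float(np.sum(km_weights(d_sorted[-k:]) * phi(top)))


if __name__ == "__main__":
    # Illustrative simulation (parameters are arbitrary assumptions).
    rng = np.random.default_rng(0)
    n, k = 5000, 500
    x = rng.pareto(2.0, size=n) + 1.0          # Pareto variable of interest
    c = rng.pareto(1.0, size=n) + 1.0          # Pareto censoring variable
    z, delta = np.minimum(x, c), (x <= c).astype(int)
    print("censoring-adjusted:", extreme_km_integral(z, delta, k))
    print("ignoring censoring:", extreme_km_integral(z, np.ones(n, dtype=int), k))
```

When the largest observation is censored, the weights sum to less than one, which is the usual behavior of the product-limit estimator. Setting all indicators to one targets the tail of the observed minimum rather than that of the variable of interest.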

Paper Structure (18 sections, 30 theorems, 290 equations, 4 figures)

Introduction
Preliminaries and setting
Random censoring
Extreme value theory
Extreme Kaplan--Meier Integrals
Generalized residual estimators for non-censored data
Generalized extreme Kaplan--Meier integrals in all max-domains of attraction
Extreme value index estimation
Finite-sample behavior
Simulation study for $\gamma_F>0$
Simulation study for $\gamma_F\in \mathbb{R}$
Finite versus asymptotic behavior for $\gamma_F\in\mathbb{R}$
Brain cancer dataset
Applications of generalized residual estimators
Proofs of the section "Extreme Kaplan--Meier Integrals"
...and 3 more sections

Key Result

Theorem 2.1

Let $X$ be a random variable with distribution function $F$. Then the only possible non-degenerate limit of $\mathbb{P}(X - u \leq a(u) x +b(u) \mid X > u)$, as $u$ tends to the right endpoint of $F$, with measurable functions $a(u)>0$ and $b(u)$, is $G_\gamma$.
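As an illustration of the statement, here is a worked example for an exact Pareto tail. The parametrization of $G_\gamma$ used below is the standard generalized Pareto form; it is an assumption here, since this summary does not spell out the definition used in the paper.

```latex
% Assumed standard generalized Pareto form of the limit:
G_\gamma(x) = 1 - (1 + \gamma x)^{-1/\gamma}, \quad 1 + \gamma x > 0,
\qquad G_0(x) = 1 - e^{-x}.

% Worked example: F(x) = 1 - x^{-\alpha} for x \ge 1, with \alpha > 0.
% Choose a(u) = u/\alpha and b(u) = 0. For u \ge 1 and x \ge 0:
\mathbb{P}(X - u \le a(u)\, x \mid X > u)
  = 1 - \frac{(u + u x/\alpha)^{-\alpha}}{u^{-\alpha}}
  = 1 - (1 + x/\alpha)^{-\alpha}
  = G_{1/\alpha}(x).
```

In this example the conditional excess distribution equals $G_\gamma$ with $\gamma = 1/\alpha$ for every threshold $u \geq 1$, so the limit is attained exactly rather than only asymptotically.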

Figures (4)

Figure 1: Mean Square Error (MSE) of the estimators $\widehat{m}^r_{k}$ (solid), $\widehat{m}^{KM,r}_{k}$ (dashed), and $\widehat{m}^{B,r}_{k}$ (dotted) as a function of $k$, for $n=1000$ and across $1000$ simulations. We consider $k\in\{5,\dots,n/2\}$ and $r=1,2,3$ in black, red and blue, respectively.
Figure 2: Top panels: Mean Square Error (MSE) of the estimators $\mathbb{g}_{k,n}$ (solid) and $\widehat{\gamma}^M_{k}$ (dashed) as a function of $k/n$, for $n=10^3,\,10^4$ (black and purple, respectively) and across $1000$ simulations. Bottom panels: corresponding misclassification rates.
Figure 3: Top panels: difference between empirical and desired coverage probabilities of the estimator $\mathbb{g}_{k,n}$, for three distributions in different max-domains of attraction, sample sizes $n=10^3,\,10^4$ (solid and dashed, respectively), as a function of $k/n$, and across $1000$ simulations. Bottom panels: corresponding difference between finite-sample and asymptotic standard deviations.
Figure 4: Brain cancer dataset. Left panel: estimators $\mathbb{g}_{k,n}$ (solid) and $\widehat{\gamma}^M_{k}$ (dashed) as a function of $k$, where the sample size is $n=1342$. For reference, we also include the usual moment estimator (dotted), given by either of the two previous estimators when setting all censoring indicators to one. Right panel: proportion of non-censored observations given by $(1/k)\sum_{i=1}^k \delta_{[n-i+1:n]}$.
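The diagnostic in the right panel of Figure 4, $(1/k)\sum_{i=1}^k \delta_{[n-i+1:n]}$, can be computed for all $k$ at once. The following is a minimal sketch; the function name `top_k_uncensored_proportion` and the toy data are hypothetical and are not taken from the brain cancer dataset.

```python
import numpy as np


def top_k_uncensored_proportion(z, delta):
    """Return p with p[k-1] = (1/k) * sum of the censoring indicators
    attached to the k largest observations, for k = 1, ..., n."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=float)
    order = np.argsort(z, kind="stable")[::-1]   # indices, largest observation first
    d_desc = delta[order]                        # delta_[n:n], delta_[n-1:n], ...
    ks = np.arange(1, z.size + 1)
    return np.cumsum(d_desc) / ks


# Toy data: observed values and indicators (1 = uncensored, 0 = censored).
z = [3.0, 1.0, 2.0, 5.0]
delta = [1, 0, 1, 0]
print(top_k_uncensored_proportion(z, delta))
```

A proportion that drops as $k$ decreases indicates heavier censoring in the far tail, which is where ignoring the indicators would distort the estimates most.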

Theorems & Definitions (63)

Theorem 2.1: Pickands--Balkema--de Haan Theorem
Definition 1
Theorem 3.1: Key decomposition
Theorem 3.2: Weak consistency of EKM integrals
Remark 3.3
Theorem 3.4
Remark 3.5
Theorem 3.6
Corollary 3.7
Remark 3.8
...and 53 more
