A Flexible Modeling of Extremes in the Presence of Inliers

Shivshankar Nila, Ishapathik Das, N. Balakrishna

TL;DR

This work addresses extreme-value analysis for data with a point mass at zero and a nontrivial tail by introducing FEVIMM, a three-component mixture that jointly models inliers at zero, a bulk below a threshold, and a GPD tail above the threshold, with the tail fraction treated as a parameter. It develops a complete-likelihood maximum likelihood framework, derives the asymptotic distribution of the estimators, and provides explicit score functions for both nonzero and zero shape parameter $\xi$, while including the threshold $u$ in the estimation process. In simulations and real-data applications, FEVIMM yields lower bias and MSE for key extreme-value parameters, improves threshold and tail estimation, and delivers better goodness-of-fit and risk measures than EVMM and FEVMM. The framework is relevant to reliability, environmental, and epidemiological risk assessment, and it opens avenues for extensions such as multimodal bulk components, change-point models, and Bayesian estimation.
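
As a rough illustration of the three-component structure described in the TL;DR, the sketch below evaluates a CDF and a quantile function for a point mass at zero, a truncated bulk on $(0, u]$, and a GPD tail above $u$. The exponential bulk, the normalization of the bulk by its CDF at $u$, and all function and variable names are assumptions made for illustration and are not taken from the paper; the numeric values are borrowed from Scenario 1 in the figure captions, reading $\eta = 1$ as an exponential bulk with scale $\beta$ (also an assumption).

```python
import math

# Scenario 1 values from the figure captions; the dictionary keys are hypothetical names.
PARAMS = dict(phi1=0.4, phi2=0.15, beta=5.0, u=11.5129, sigma=5.0, xi=0.2)


def gpd_cdf(y, sigma, xi):
    """CDF of the generalized Pareto distribution for an excess y over the threshold."""
    if y <= 0:
        return 0.0
    if abs(xi) < 1e-12:  # xi = 0 limit: exponential tail
        return 1.0 - math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0:  # beyond the finite upper endpoint when xi < 0
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


def mixture_cdf(x, phi1, phi2, beta, u, sigma, xi):
    """CDF of an assumed inlier + truncated-bulk + GPD-tail mixture."""
    if x < 0:
        return 0.0
    if x == 0:
        return phi1  # point mass at zero (inliers)
    if x <= u:
        # Assumed exponential bulk, truncated to (0, u] and rescaled.
        bulk = (1.0 - math.exp(-x / beta)) / (1.0 - math.exp(-u / beta))
        return phi1 + (1.0 - phi1 - phi2) * bulk
    # Tail fraction phi2 is spread over (u, inf) by the GPD.
    return 1.0 - phi2 * (1.0 - gpd_cdf(x - u, sigma, xi))


def mixture_quantile(p, phi1, phi2, beta, u, sigma, xi):
    """Inverse of mixture_cdf for 0 <= p < 1; in the tail this is a Value-at-Risk."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if p <= phi1:
        return 0.0
    if p <= 1.0 - phi2:
        frac = (p - phi1) / (1.0 - phi1 - phi2)
        h_u = 1.0 - math.exp(-u / beta)
        return -beta * math.log(1.0 - frac * h_u)
    tail_prob = (1.0 - p) / phi2  # conditional exceedance probability
    if abs(xi) < 1e-12:
        return u - sigma * math.log(tail_prob)
    return u + sigma / xi * (tail_prob ** (-xi) - 1.0)


if __name__ == "__main__":
    var_99 = mixture_quantile(0.99, **PARAMS)
    print(var_99, mixture_cdf(var_99, **PARAMS))
```

Under these assumptions the CDF jumps to $\phi_1$ at zero and reaches $1 - \phi_2$ at the threshold, so the tail fraction is a free parameter rather than being implied by the bulk distribution, which is the flexibility the TL;DR refers to.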

Abstract

Many random phenomena, including life-testing and environmental data, show positive values and excess zeros, which pose modeling challenges. In life testing, immediate failures result in zero lifetimes, often due to defects or poor quality, especially in electronics and clinical trials. These failures, called inliers at zero, are difficult to model using standard approaches. The presence and proportion of inliers may influence the accuracy of extreme value analysis, bias parameter estimates, or even lead to severe events or extreme effects, such as drought or crop failure. In such scenarios, a key issue in extreme value analysis is determining a suitable threshold to capture tail behaviour accurately. Although some extreme value mixture models address threshold and tail estimation, they often handle inliers inadequately, resulting in suboptimal results. Bulk model misspecification can affect the threshold, extreme value estimates, and, in particular, the tail proportion. There is no unified framework for defining extreme value mixture models, especially with respect to the tail proportion. This paper proposes a flexible model that handles extremes, inliers, and the tail proportion. Parameters are estimated using maximum likelihood estimation. The proposed model's estimates are compared with those obtained from the classical mean excess plot, parameter stability plot, and Pickands plot. Theoretical results are established, and the proposed model outperforms traditional methods in both simulation studies and real data analysis.

Paper Structure (19 sections, 6 theorems, 53 equations, 8 figures, 14 tables, 1 algorithm)

Key Result

Proposition 1

The density function of the FEVIMM is continuous at the threshold $u$, i.e., $f(u^{-}) = f(u^{+})$, if and only if [condition not preserved in this extract]. This condition ensures that the transition between the bulk and the GPD tail is continuous at $u$.

Figures (8)

  • Figure 1: Comparison of MEP, PSP, and PP based on the full dataset including inliers ($x$) and the reduced dataset excluding inliers ($x[x > 0]$). The plots show how the presence of inliers may influence the estimate of the threshold ($u$), subsequently affecting the estimates of the scale ($\sigma$) and shape ($\xi$) parameters of the GPD.
  • Figure 2: Bias and MSE comparison plots under Scenario 1: $\phi_1 = 0.4$, $\phi_2 = 0.15$, $\eta = 1$, $\beta = 5$, $u = 11.5129$, $\xi = 0.2$, $\sigma = 5$.
  • Figure 3: Bias and MSE comparison plots under Scenario 2: $\phi_1 = 0.4$, $\phi_2 = 0.10$, $\eta = 4$, $\beta = 1$, $u = 6.6807$, $\xi = 0.2$, $\sigma = 4$.
  • Figure 4: Bias and MSE comparison plots under Scenario 3: $\phi_1 = 0.4$, $\phi_2 = 0.10$, $\eta = 4$, $\beta = 1$, $u = 6.6808$, $\xi = -0.2$, $\sigma = 4$.
  • Figure 5: Sensitivity of FEVIMM parameter estimates (Bias) to inlier proportion ($\phi_1$).
  • ...and 3 more figures

Theorems & Definitions (12)

  • Proposition 1: Continuity at the threshold $u$
  • Proposition 2: Differentiability at the threshold $u$
  • Proposition 3: Quantile function
  • Proposition 4: Risk measures: Value-at-Risk and Tail-Value-at-Risk
  • Remark 1
  • Remark 2: Threshold assumption
  • Remark 3: Regularity conditions for the GPD shape parameter ($\xi$)
  • Theorem 1: Asymptotic normality of the MLE (for known threshold $u_0 > 0$)
  • Remark 4: Asymptotic confidence intervals for the MLE
  • Theorem 2: Asymptotic normality of the MLE (for known threshold $u_0 > 0$, $\xi=0$)
  • ...and 2 more