Modelling heavy tail data with Bayesian nonparametric mixtures
Luis E. Nieto-Barajas
TL;DR
This paper tackles the challenge of modelling heavy-tailed data without discarding information from the bulk by proposing a Bayesian nonparametric mixture based on a four-parameter shifted gamma-gamma kernel $SGG(\mu,\gamma,\alpha,\beta)$ with a normalised stable process prior $G\sim NS(\nu,G_0)$. The authors develop an adaptive Metropolis-Hastings scheme within a marginal sampler to infer the posterior distribution over the latent cluster structure and the kernel parameters, allowing some clusters to describe the body while others capture the tail through the tail parameter $\alpha$. They demonstrate the method on simulated data and real datasets (insurance claims and English population sizes), showing accurate recovery of bulk and tail components, informative posterior distributions for $\alpha$, and improved predictive densities over single-kernel models. The approach offers a flexible, threshold-free framework for extreme-value analysis, with potential extensions to covariates and multivariate tails, and provides practical tools for risk assessment and inference in domains with heavy-tailed phenomena.
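The section does not spell out the density of $SGG(\mu,\gamma,\alpha,\beta)$, so the sketch below assumes a standard gamma-gamma hierarchy: $X=\mu+Z$, $Z\mid Y\sim\text{Ga}(\gamma,Y)$ (rate $Y$) and $Y\sim\text{Ga}(\alpha,\beta)$ (rate $\beta$). Integrating out $Y$ gives a density whose tail decays like $x^{-(\alpha+1)}$, which illustrates why a single parameter $\alpha$ can govern the tail. The functions `sgg_logpdf` and `sgg_sample` are hypothetical names, and this parameterisation may differ from the paper's.

```python
import math
import random


def sgg_logpdf(x: float, mu: float, gam: float, alpha: float, beta: float) -> float:
    """Log-density of the assumed SGG(mu, gam, alpha, beta) kernel.

    Assumed construction (hypothetical parameterisation):
        X = mu + Z,  Z | Y ~ Gamma(shape=gam, rate=Y),  Y ~ Gamma(shape=alpha, rate=beta).
    Integrating out Y gives, for x > mu,
        f(x) = Gamma(gam + alpha) / (Gamma(gam) Gamma(alpha)) * beta**alpha
               * (x - mu)**(gam - 1) / (beta + x - mu)**(gam + alpha),
    so f(x) decays like x**-(alpha + 1) and alpha controls the tail.
    """
    z = x - mu
    if z <= 0.0:
        return -math.inf  # x <= mu is treated as outside the support
    return (math.lgamma(gam + alpha) - math.lgamma(gam) - math.lgamma(alpha)
            + alpha * math.log(beta) + (gam - 1.0) * math.log(z)
            - (gam + alpha) * math.log(beta + z))


def sgg_sample(mu: float, gam: float, alpha: float, beta: float,
               rng: random.Random) -> float:
    """Draw one value through the same gamma-gamma hierarchy."""
    # random.gammavariate(shape, scale) takes a scale, so scale = 1 / rate.
    y = rng.gammavariate(alpha, 1.0 / beta)
    return mu + rng.gammavariate(gam, 1.0 / y)


rng = random.Random(0)
draws = [sgg_sample(0.0, 1.0, 2.0, 1.0, rng) for _ in range(200_000)]
# With gam = 1 the kernel is a Lomax law: P(X > 1) = (1 / (1 + 1))**2 = 0.25.
tail_freq = sum(d > 1.0 for d in draws) / len(draws)

# Local log-log slope of the density far in the tail; close to -(alpha + 1) = -2.5.
slope = (sgg_logpdf(1e7, 0.0, 2.0, 1.5, 1.0)
         - sgg_logpdf(1e6, 0.0, 2.0, 1.5, 1.0)) / math.log(10.0)
print(f"P(X > 1) ~ {tail_freq:.3f}, tail slope ~ {slope:.3f}")
```

With $\gamma=1$ the assumed kernel reduces to a shifted Lomax law with $P(X-\mu>t)=(\beta/(\beta+t))^{\alpha}$, which gives a simple exact check on the sampler; larger $\gamma$ only changes the shape of the body.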
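The latent cluster structure induced by $G\sim NS(\nu,G_0)$ can be simulated through its prediction rule: the normalised stable process coincides with a Pitman-Yor process with discount $\nu\in(0,1)$ and strength parameter $0$. The sketch below generates cluster sizes only (the atoms drawn from $G_0$ are omitted) and compares the average number of clusters with $E[K_n]=\Gamma(n+\nu)/(\Gamma(1+\nu)\Gamma(n))$, which follows from $E[K_{m+1}]=E[K_m](1+\nu/m)$. The helper names `ns_urn_partition` and `expected_clusters` are hypothetical.

```python
import math
import random
from typing import List


def ns_urn_partition(n: int, nu: float, rng: random.Random) -> List[int]:
    """Cluster sizes after allocating n observations under NS(nu).

    Prediction rule (Pitman-Yor with discount nu and strength 0): given m
    earlier observations in k clusters with sizes n_1, ..., n_k,
        P(new cluster)    = nu * k / m,
        P(join cluster j) = (n_j - nu) / m.
    """
    if not 0.0 < nu < 1.0:
        raise ValueError("nu must lie in (0, 1)")
    sizes: List[int] = []
    for m in range(n):
        if m == 0:
            sizes.append(1)  # the first observation always opens a cluster
            continue
        # The unnormalised weights nu*k and n_j - nu add up to m.
        u = rng.random() * m
        acc = nu * len(sizes)
        if u < acc:
            sizes.append(1)
            continue
        for j, n_j in enumerate(sizes):
            acc += n_j - nu
            if u < acc:
                sizes[j] += 1
                break
        else:
            sizes[-1] += 1  # floating-point round-off at the upper boundary
    return sizes


def expected_clusters(n: int, nu: float) -> float:
    """E[K_n] = prod_{m=1}^{n-1} (1 + nu/m) = Gamma(n + nu) / (Gamma(1 + nu) Gamma(n))."""
    return math.exp(math.lgamma(n + nu) - math.lgamma(1.0 + nu) - math.lgamma(n))


rng = random.Random(1)
ks = [len(ns_urn_partition(100, 0.5, rng)) for _ in range(3000)]
print(f"mean clusters {sum(ks) / len(ks):.2f} vs exact {expected_clusters(100, 0.5):.2f}")
```

Since $E[K_n]$ grows like $n^{\nu}$, values of $\nu$ close to 1 favour many small clusters, while values close to 0 favour a few large ones; this is the mechanism that lets the mixture dedicate some clusters to the body and others to the tail.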
Abstract
In the study of heavy-tailed data, several modelling approaches have been introduced. If the interest is in the tail of the distribution, block maxima or excesses over a threshold are the typical approaches, but they waste relevant information contained in the bulk of the data. To avoid this, mixture models with two building blocks, one for the body (below the threshold) and one for the tail (above the threshold), have been proposed. In this paper, we exploit the richness of nonparametric mixture models to model heavy-tailed data. We specifically consider mixtures of four-parameter shifted gamma-gamma distributions with a normalised stable process as the mixing distribution. One of these four parameters is associated with the tail. By studying the posterior distribution of the tail parameter, we are able to estimate the proportion of the data that supports a heavy-tailed component. We develop an efficient MCMC method with adaptive Metropolis-Hastings steps to obtain posterior inference, and illustrate it with simulated and real datasets.
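An adaptive Metropolis-Hastings step can be illustrated on the simplest case: updating the tail parameter $\alpha$ of a single cluster whose other kernel parameters are held fixed. Several ingredients here are assumptions rather than details taken from the paper: a gamma-gamma form of the density, $f(x)\propto (x-\mu)^{\gamma-1}(\beta+x-\mu)^{-(\gamma+\alpha)}$ for $x>\mu$ with normalising constant $\beta^{\alpha}\Gamma(\gamma+\alpha)/(\Gamma(\gamma)\Gamma(\alpha))$, a $\text{Ga}(a_0,b_0)$ prior on $\alpha$, a random walk on $\log\alpha$, and a batch rule that nudges the log proposal scale towards a 0.44 acceptance rate with diminishing steps. The names `make_loglik` and `adaptive_rw_mh_alpha` are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def make_loglik(data: List[float], mu: float, gam: float,
                beta: float) -> Callable[[float], float]:
    """Return alpha -> sum_i log f(x_i | mu, gam, alpha, beta) for the assumed density
    f(x) = Gamma(gam+alpha)/(Gamma(gam)Gamma(alpha)) beta**alpha (x-mu)**(gam-1) / (beta+x-mu)**(gam+alpha).
    With mu, gam, beta fixed the data enter only through n, sum log z_i and
    sum log(beta + z_i), so each evaluation costs O(1)."""
    z = [x - mu for x in data]
    if min(z) <= 0.0:
        raise ValueError("all observations must exceed mu")
    n = len(z)
    s_log_z = sum(math.log(v) for v in z)
    s_log_bz = sum(math.log(beta + v) for v in z)

    def loglik(alpha: float) -> float:
        return (n * (math.lgamma(gam + alpha) - math.lgamma(gam) - math.lgamma(alpha)
                     + alpha * math.log(beta))
                + (gam - 1.0) * s_log_z - (gam + alpha) * s_log_bz)

    return loglik


def adaptive_rw_mh_alpha(loglik: Callable[[float], float], n_iter: int = 5000,
                         a0: float = 1.0, b0: float = 1.0, target: float = 0.44,
                         batch: int = 50, seed: int = 0) -> Tuple[List[float], float]:
    """Random-walk MH on eta = log(alpha), prior alpha ~ Gamma(a0, rate=b0).
    After each batch the log proposal scale moves by min(0.1, k**-0.5) towards the target rate."""
    rng = random.Random(seed)

    def log_target(eta: float) -> float:
        alpha = math.exp(eta)
        # Gamma prior on alpha plus the Jacobian of alpha = exp(eta): a0*eta - b0*alpha.
        return a0 * eta - b0 * alpha + loglik(alpha)

    eta, lp = 0.0, log_target(0.0)
    log_scale, accepted, draws = 0.0, 0, []
    for it in range(1, n_iter + 1):
        prop = eta + math.exp(log_scale) * rng.gauss(0.0, 1.0)
        lp_prop = log_target(prop)
        # log(1 - U) with U in [0, 1) never evaluates log(0).
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            eta, lp = prop, lp_prop
            accepted += 1
        draws.append(math.exp(eta))
        if it % batch == 0:
            step = min(0.1, (it // batch) ** -0.5)  # diminishing adaptation
            log_scale += step if accepted / batch > target else -step
            accepted = 0
    return draws, math.exp(log_scale)


# Simulated data from the assumed hierarchy with mu=0, gam=2, alpha=3, beta=1.
data_rng = random.Random(42)
data = [data_rng.gammavariate(2.0, 1.0 / data_rng.gammavariate(3.0, 1.0)) for _ in range(500)]
draws, scale = adaptive_rw_mh_alpha(make_loglik(data, 0.0, 2.0, 1.0))
kept = draws[1000:]  # discard burn-in
print(f"posterior mean of alpha ~ {sum(kept) / len(kept):.2f}, final scale ~ {scale:.3f}")
```

In a full marginal sampler the sums inside `make_loglik` would be recomputed for whichever observations are currently allocated to the cluster, and the same adaptive step could be reused for the other kernel parameters.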
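One way to turn posterior draws of the tail parameters into an estimate of the proportion of data that supports a heavy-tailed component is to average, over MCMC iterations, the fraction of observations allocated to clusters whose $\alpha$ falls below a cut-off. Both the cut-off $\alpha<2$ and the function `heavy_tail_summary` are illustrative assumptions, not the paper's definition; the cut-off is motivated by a gamma-gamma form of the kernel, whose moments of order $r$ exist only for $r<\alpha$, so $\alpha<2$ means infinite variance. The toy draws are made up to show the bookkeeping and are not results.

```python
from typing import Dict, List, Sequence, Tuple


def heavy_tail_summary(labels: Sequence[Sequence[int]],
                       alphas: Sequence[Dict[int, float]],
                       alpha_cut: float = 2.0) -> Tuple[float, List[float]]:
    """Posterior share of observations allocated to heavy-tailed clusters.

    labels[t][i]: cluster label of observation i at MCMC iteration t.
    alphas[t][c]: tail parameter of cluster c at iteration t.
    alpha_cut:    assumed rule, a cluster counts as heavy-tailed if alpha < alpha_cut.
    Returns (posterior mean proportion, per-observation posterior probabilities).
    """
    n_iter, n_obs = len(labels), len(labels[0])
    per_obs = [0.0] * n_obs
    total = 0.0
    for lab_t, alpha_t in zip(labels, alphas):
        heavy = [alpha_t[c] < alpha_cut for c in lab_t]
        total += sum(heavy) / n_obs
        for i, is_heavy in enumerate(heavy):
            per_obs[i] += is_heavy
    return total / n_iter, [p / n_iter for p in per_obs]


# Three made-up iterations for four observations (illustrative, not real output).
labels = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
alphas = [{0: 5.0, 1: 1.2}, {0: 4.0, 1: 1.5}, {0: 6.0, 1: 2.5}]
mean_prop, per_obs = heavy_tail_summary(labels, alphas)
print(mean_prop)  # 0.25
```

Here the per-observation probabilities are $0$, $0$, $1/3$ and $2/3$: the largest observations are the ones most often allocated to a cluster with a small $\alpha$, which is the kind of bulk/tail split the mixture is meant to reveal.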
