
Modelling heavy tail data with bayesian nonparametric mixtures

Luis E. Nieto-Barajas

TL;DR

This paper tackles the challenge of modelling heavy-tailed data without discarding information from the bulk by proposing a Bayesian nonparametric mixture based on a four-parameter shifted gamma-gamma kernel $SGG(\mu,\gamma,\alpha,\beta)$ with a normalised stable process prior $G\sim NS(\nu,G_0)$ on the mixing distribution. The author develops an adaptive Metropolis-Hastings scheme within a marginal sampler to infer the posterior distribution over the latent cluster structure and kernel parameters, allowing some clusters to describe the body while others capture the tail through the tail parameter $\alpha$. The method is demonstrated on simulated data and real datasets (insurance claims and English population sizes), showing accurate recovery of bulk/tail components, informative posterior distributions for $\alpha$, and improved predictive densities over single-kernel models. The approach offers a flexible, threshold-free framework for extreme-value analysis with potential extensions to covariates and multivariate tails, providing practical tools for risk assessment and inference in domains with heavy-tailed phenomena.

Abstract

In the study of heavy tail data, several models have been introduced. If the interest is in the tail of the distribution, block maxima or excesses over thresholds are the typical approaches, wasting relevant information in the bulk of the data. To avoid this, two-building-block mixture models for the body (below the threshold) and the tail (above the threshold) have been proposed. In this paper, we exploit the richness of nonparametric mixture models to model heavy tail data. We specifically consider mixtures of shifted gamma-gamma distributions with four parameters and a normalised stable process as the mixing distribution. One of these parameters is associated with the tail. By studying the posterior distribution of the tail parameter, we are able to estimate the proportion of the data that supports a heavy tail component. We develop an efficient MCMC method with adaptive Metropolis-Hastings steps to obtain posterior inference and illustrate with simulated and real datasets.
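
To make the kernel concrete, the following is a minimal sketch of a shifted gamma-gamma density and sampler. The abstract does not state the density, so the form used here is an assumption: the standard gamma-gamma hierarchical construction ($X-\mu\mid\lambda\sim\hbox{Ga}(\gamma,\lambda)$ with $\lambda\sim\hbox{Ga}(\alpha,\beta)$, both in rate parametrisation), shifted by $\mu$. Under that assumption the density decays like $x^{-(\alpha+1)}$, which is consistent with $\alpha$ acting as the tail parameter. The function names `sgg_logpdf` and `sgg_sample` are hypothetical and the paper's exact parametrisation may differ.

```python
import math
import random


def sgg_logpdf(x, mu, gamma, alpha, beta):
    """Log-density of an ASSUMED shifted gamma-gamma kernel SGG(mu, gamma, alpha, beta).

    Assumed form, for z = x - mu > 0:
        f(x) = Gamma(alpha + gamma) / (Gamma(alpha) * Gamma(gamma))
               * beta**alpha * z**(gamma - 1) / (beta + z)**(alpha + gamma)
    """
    z = x - mu
    if z <= 0:
        return -math.inf  # no mass at or below the shift mu
    return (
        math.lgamma(alpha + gamma) - math.lgamma(alpha) - math.lgamma(gamma)
        + alpha * math.log(beta)
        + (gamma - 1.0) * math.log(z)
        - (alpha + gamma) * math.log(beta + z)
    )


def sgg_sample(mu, gamma, alpha, beta, rng):
    """One draw via the assumed hierarchy: lam ~ Ga(alpha, rate=beta),
    then x - mu | lam ~ Ga(gamma, rate=lam)."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    lam = rng.gammavariate(alpha, 1.0 / beta)
    return mu + rng.gammavariate(gamma, 1.0 / lam)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sgg_sample(0.0, 1.0, 1.5, 1.0, rng) for _ in range(20000)]
    # With gamma = 1 the assumed kernel reduces to a Lomax (Pareto type II)
    # law, whose survival function is (beta / (beta + z))**alpha.
    empirical = sum(d > 1.0 for d in draws) / len(draws)
    analytic = (1.0 / (1.0 + 1.0)) ** 1.5
    print(f"P(X > 1): empirical {empirical:.3f}, analytic {analytic:.3f}")
```

Smaller values of `alpha` give heavier tails in this parametrisation, so a mixture component with a small $\alpha$ would play the role of a tail cluster and one with a large $\alpha$ would describe the body.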

Paper Structure (9 sections, 14 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: SGG densities for varying parameters. $\mu\in\{0,3,6\}$ (top left); $\gamma\in\{0.5,1,3\}$ (top right); $\alpha\in\{0.5,1,3\}$ (bottom left); $\beta\in\{0.5,1,3\}$ (bottom right). Darker colours mean larger parameter values.
  • Figure 2: Simulated data. Average acceptance rates in batches of size $b=50$ iterations for $\mu_j^*$ (left) and $\alpha_j^*$ (right), $j=1,\ldots,m$, when we take $\nu\sim\hbox{Be}(1/2,1/2)$. Target rate limits are shown as dotted horizontal lines.
  • Figure 3: Simulated data. Posterior distribution of the number of groups $m$. Fixed $\nu=0.05$ (left) and prior $\nu\sim\hbox{Be}(1/2,1/2)$ (right).
  • Figure 4: Simulated data. Histogram of aggregated posterior samples from $\mu_i$ (left) and $\alpha_i$ (right), $i=1,\ldots,n$.
  • Figure 5: Simulated data and fit. Histogram of data (background), true density (dashed line), posterior density estimate (solid line), 95% posterior CI (dotted lines).
  • ...and 4 more figures