Heavy-Tailed NGG Mixture Models

Vianey Palacios Ramirez, Miguel de Carvalho, Luis Gutierrez Inostroza

TL;DR

The paper characterizes the tails of the normalized generalized gamma (NGG) process. It shows that, unlike those of the Dirichlet process, NGG tails closely track the heaviness of the centering distribution when that distribution is heavy-tailed. Building on this, it develops two classes of heavy-tailed NGG mixtures, scale mixtures and shape mixtures, each with multivariate and predictor-dependent extensions. It also derives precise tail-index results. For NGG processes in the non-Dirichlet class $\mathcal{N}$, scale mixtures yield an explicit tail index for the mixture distribution $F$, while shape mixtures align with the tail properties of the kernel. The work further extends to multivariate settings and covariate-dependent densities via a dependent stable process, proving that heavy-tailed behavior is preserved in the marginals and in the conditional joint densities. Through simulations and a neuroscience application to EEG data, the authors show that NGG-$\mathcal{N}$ scale mixtures outperform DP mixtures in both the tail and the bulk. This makes them practically useful for modeling extreme events and covariate effects in multivariate heavy-tailed data.
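To make the scale-mixture idea concrete, here is a minimal sketch in plain Python. It assumes an Erlang kernel with a fixed integer shape `m` and a mixed scale, and it replaces the random NGG mixing measure with a hypothetical finite set of weights and scales. The helper names `erlang_survival` and `scale_mixture_survival` are illustrative, not the authors' code. Far in the tail, the mixture survival is governed by the atom with the largest scale. This hints at why the tail of the scale-mixing distribution drives the tail of the mixture.

```python
import math

def erlang_survival(y: float, shape: int, scale: float) -> float:
    """P(Y > y) for Erlang(shape, scale): sum_{n<shape} e^{-z} z^n / n!, z = y / scale."""
    if y <= 0:
        return 1.0
    z = y / scale
    term = math.exp(-z)  # n = 0 term; underflows harmlessly to 0.0 for huge z
    total = term
    for n in range(1, shape):
        term *= z / n  # builds e^{-z} z^n / n! from the previous term
        total += term
    return total

def scale_mixture_survival(y, weights, scales, shape):
    """Survival of sum_k w_k * Erlang(shape, scale_k); a finite stand-in for a random mixing measure."""
    return sum(w * erlang_survival(y, shape, s) for w, s in zip(weights, scales))

# Hypothetical atoms: most mass on small scales, a little mass on a large scale
weights = [0.5, 0.3, 0.2]
scales = [1.0, 5.0, 50.0]
for y in (10.0, 100.0, 1000.0):
    print(y, scale_mixture_survival(y, weights, scales, shape=2))
```

With finitely many atoms the tail is still exponential. Heavy tails need scale atoms that are unbounded, as in an NGG draw with a heavy-tailed centering distribution.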

Abstract

Heavy tails are often found in practice, and yet they are an Achilles heel of a variety of mainstream random probability measures such as the Dirichlet process (DP). The first contribution of this paper focuses on characterizing the tails of the so-called normalized generalized gamma (NGG) process. We show that the right tail of an NGG process is heavy-tailed provided that the centering distribution is itself heavy-tailed; the DP is the only member of the NGG class that fails to obey this convenient property. A second contribution of the paper rests on the development of two classes of heavy-tailed mixture models and the assessment of their relative merits. Multivariate extensions of the proposed heavy-tailed mixtures are devised here, along with a predictor-dependent version, to learn about the effect of covariates on a multivariate heavy-tailed response. The simulation study suggests that the proposed method performs well in various scenarios, and we showcase the application of the proposed methods in a neuroscience dataset.
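The contrast between the DP and the other NGG members can be explored numerically. The sketch below uses truncated stick-breaking for a Dirichlet process with mass 1 and for a normalized 0.5-stable process. These are the Pitman–Yor cases with (discount, concentration) equal to (0, 1) and (0.5, 0). As an assumption, we take them to match the DP and stable members $\text{NGG}(1, 1, 0, G_0)$ and $\text{NGG}(1, 0, 0.5, G_0)$ used in Figure 1, with a standard unit Pareto $G_0$. The truncation level, replicate count and evaluation point `y = 50` are hypothetical choices.

```python
import numpy as np

def stick_breaking_weights(rng, n_atoms, discount, concentration):
    """Truncated Pitman-Yor weights: V_k ~ Beta(1 - d, c + k d), w_k = V_k prod_{j<k} (1 - V_j)."""
    k = np.arange(1, n_atoms + 1)
    v = rng.beta(1.0 - discount, concentration + k * discount)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # stick left before break k
    return v * remaining

def log_survival(y, weights, atoms):
    """log of sum_k w_k 1{atom_k > y}, floored at the smallest positive float to avoid log(0)."""
    mass = weights[atoms > y].sum()
    return np.log(max(mass, np.finfo(float).tiny))

def median_log_survival(y, discount, concentration, n_rep=200, n_atoms=2000, seed=0):
    """Median over random draws of the log survival of the random measure at y."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_rep):
        w = stick_breaking_weights(rng, n_atoms, discount, concentration)
        atoms = 1.0 + rng.pareto(1.0, size=n_atoms)  # unit Pareto: P(theta > y) = 1 / y, y >= 1
        values.append(log_survival(y, w, atoms))
    return float(np.median(values))

dp = median_log_survival(50.0, discount=0.0, concentration=1.0)
stable = median_log_survival(50.0, discount=0.5, concentration=0.0)
print(f"median log-survival at y = 50: DP {dp:.1f}, stable {stable:.1f}")
```

The sketch compares medians rather than means because both random measures have mean measure $G_0$. Averaging the survival would therefore hide the difference, which shows up in typical trajectories. Under the DP, atoms far in the tail tend to receive geometrically small weights.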

Paper Structure (17 sections, 8 theorems, 47 equations, 5 figures, 2 tables)

Key Result

Theorem 1

Let $G(y)$ be the distribution of an $\text{NGG}(M, \tau, D, G_0)$ process with $(M, \tau, D) \in \mathcal{N}$ and with non-atomic $G_0$. Then, with $g_r(t) = \exp\{-r\log|\log t|/t\}$ and $h_r(t) = \exp\{-1/(t|\log t|^{r})\}$ defined for $0 < t < 1$, …
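The two functions $g_r$ and $h_r$ underflow double precision almost immediately as $t \to 0$; for example, $g_1(10^{-3}) = e^{-1932.6\ldots}$. They are therefore easiest to inspect on the log scale, which matches the log survival functions plotted in Figure 1. The sketch below evaluates only the two functions as defined. It does not reproduce how the theorem uses them, and the names `log_g` and `log_h` are illustrative.

```python
import math

def log_g(t: float, r: float) -> float:
    """log g_r(t) = -r * log|log t| / t, for 0 < t < 1."""
    return -r * math.log(abs(math.log(t))) / t

def log_h(t: float, r: float) -> float:
    """log h_r(t) = -1 / (t * |log t|**r), for 0 < t < 1."""
    return -1.0 / (t * abs(math.log(t)) ** r)

for t in (1e-2, 1e-3, 1e-4):
    print(f"t={t:g}  log g_1={log_g(t, 1.0):.1f}  log h_1={log_h(t, 1.0):.1f}")
```

At $t = 10^{-3}$ and $r = 1$, these give roughly $-1932.6$ for $\log g_1$ and $-144.8$ for $\log h_1$. For small $t$, $h_r$ therefore sits far above $g_r$.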

Figures (5)

  • Figure 1: Asymptotic envelopes for Example 1. Left: asymptotic envelopes that follow from the NGG tail theorem, along with random trajectories of the log survival function of a stable process $\text{NGG}(1, 0, 0.5, G_0)$. Right: the same envelopes for the log survival function of a stable process $\text{NGG}(1, 0, 0.5, G_0)$ (solid) against those of a Dirichlet process $\text{NGG}(1, 1, 0, G_0)$ (dashed); $G_0$ is the standard unit Pareto distribution.
  • Figure 2: Monte Carlo mean of the fitted log-survival function (dashed line) for the univariate scenario, obtained with a specific NGG-$\mathcal{N}$ mixture model (a stable process scale mixture) from the mixture-model section, shown between the $95\%$ and $99\%$ quantiles and plotted against the true log-survival function (solid line). The dotted line shows the Monte Carlo mean of the fits from an NGG-$\mathcal{D}$ mixture with the same Erlang kernel. The dash-dotted line shows the Monte Carlo mean of the fits from an NGG-$\mathcal{D}$ mixture with a Pareto kernel and a gamma centering distribution.
  • Figure 3: Monte Carlo simulation for bivariate Scenarios 1–3: contours of the joint density estimates (gray) obtained with the proposed stable process scale mixture model from the multivariate section, for the 100 simulated data sets, plotted against the true density (black).
  • Figure 4: Spectral power ($\mu V^2$) of alpha and beta brainwave data plotted against time in seconds ($s$).
  • Figure 6: Contours of the posterior conditional joint density estimates of alpha and beta power for each stimulus, along with the raw data; the fit was obtained using the stable process scale mixture from the section on conditional (predictor-dependent) models.
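Figure 2 summarizes a simulation by averaging fitted log-survival curves over Monte Carlo replicates between the $95\%$ and $99\%$ quantiles. The sketch below reproduces only that evaluation loop, under several assumptions:

- The empirical survival function stands in for the NGG mixture fit; it is not the authors' estimator.
- A unit Pareto plays the role of the hypothetical true distribution.
- The 100 data sets of 5000 observations each are hypothetical sizes.

```python
import numpy as np

def empirical_log_survival(sample, grid):
    """log of the empirical P(Y > y) on a grid; a stand-in for a fitted model's log-survival."""
    sorted_sample = np.sort(sample)
    n = sorted_sample.size
    exceed = n - np.searchsorted(sorted_sample, grid, side="right")  # count of obs > y
    return np.log(exceed / n)

rng = np.random.default_rng(42)
n_datasets, n_obs = 100, 5000  # hypothetical simulation sizes
# Grid between the true 95% and 99% quantiles of a unit Pareto, q_p = 1 / (1 - p): 20 to 100
grid = np.linspace(1.0 / 0.05, 1.0 / 0.01, 50)
true_log_surv = -np.log(grid)  # log P(Y > y) = -log y for a unit Pareto

fits = np.array([
    empirical_log_survival(1.0 + rng.pareto(1.0, size=n_obs), grid)
    for _ in range(n_datasets)
])
mc_mean = fits.mean(axis=0)  # Monte Carlo mean curve, the analogue of the dashed line
print("max |MC mean - truth|:", np.max(np.abs(mc_mean - true_log_surv)))
```

To compare models, swap `empirical_log_survival` for each model's fitted log-survival and plot `mc_mean` against `true_log_surv` over `grid`.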

Theorems & Definitions (13)

  • Theorem 1: Tails of $\text{NGG}$ in $\mathcal{D}$
  • Proof
  • Theorem 2: Tails of $\text{NGG}$ in $\mathcal{N}$
  • Corollary 3: Stability of the heavy-tail property in $\mathcal{N}$
  • Example 1: Pareto centering distribution: $\mathcal{D}$ versus $\mathcal{N}$
  • Theorem 4: Heavy-tailed NGG-mixtures
  • Example 2: NGG scale mixtures with an Erlang kernel
  • Example 3: NGG shape mixtures with a Pareto-type kernel
  • Theorem 5: Multivariate heavy-tailed NGG-mixtures
  • Remark 1
  • ...and 3 more
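Example 3 mixes over the shape of a Pareto-type kernel rather than its scale. A finite version shows how the tail of such a mixture is read off. This minimal sketch assumes a hypothetical kernel with survival $y^{-\alpha}$ on $y \ge 1$ and illustrative weights and shape atoms. The local log-log slope of the mixture survival moves from a weighted average of the shapes near $y = 1$ toward minus the smallest shape in the tail. In this finite illustration, the heaviest-tailed kernel component therefore dominates.

```python
import numpy as np

def shape_mixture_log_survival(log_y, weights, shapes):
    """log sum_k w_k y^{-alpha_k} for a hypothetical Pareto(alpha_k) kernel on y >= 1."""
    terms = np.log(weights) - np.outer(log_y, shapes)  # shape (len(log_y), K)
    m = terms.max(axis=1, keepdims=True)  # log-sum-exp keeps large y numerically stable
    return (m + np.log(np.exp(terms - m).sum(axis=1, keepdims=True))).ravel()

weights = np.array([0.7, 0.2, 0.1])  # hypothetical mixing weights
shapes = np.array([3.0, 1.5, 0.8])   # hypothetical shape atoms
log_y = np.linspace(0.0, 40.0, 401)
log_s = shape_mixture_log_survival(log_y, weights, shapes)
slope = np.diff(log_s) / np.diff(log_y)  # local log-log slope of the survival
print("slope near y = 1:", slope[0], " slope far in the tail:", slope[-1])
```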