Heavy-Tailed NGG Mixture Models
Vianey Palacios Ramirez, Miguel de Carvalho, Luis Gutierrez Inostroza
TL;DR
The paper characterizes the tails of the normalized generalized gamma (NGG) process. It shows that, unlike the Dirichlet process (DP), the NGG process has a heavy right tail whenever its centering distribution is heavy-tailed, so its tails closely track the heaviness of a heavy-tailed centering distribution. Building on this, it develops two classes of heavy-tailed NGG mixtures, scale mixtures and shape mixtures, with multivariate and predictor-dependent extensions. It also derives precise tail-index results: for every non-Dirichlet member of the NGG class, scale mixtures admit an explicit expression for the tail index of the mixture $F$, while shape mixtures inherit the tail properties of the kernel. The work further extends to multivariate settings and covariate-dependent densities via a dependent stable process, proving that heavy-tailed behavior is preserved in the marginals and in the conditional joint densities. Through simulations and a neuroscience application to EEG brain data, the authors show that NGG-$\text{N}$ scale mixtures outperform DP mixtures in both the tail and the bulk, supporting their use for modeling extreme events and covariate effects in multivariate heavy-tailed data.
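To make the idea of a heavy-tailed scale mixture concrete, the sketch below shows, under stated assumptions, how a normal scale mixture inherits a heavy right tail from its mixing distribution. It does not reproduce the paper's NGG construction or its tail-index formula. As an assumption for illustration, the mixing distribution on the scale is a classical Pareto with index `alpha`, so $Y = S Z$ with $Z \sim \text{N}(0,1)$. By Breiman's lemma, $Y$ is then regularly varying with the same index. The tail index is checked with the standard Hill estimator, a generic technique that is not taken from the paper. The helper names `hill_estimator` and `sample_normal_scale_mixture` are hypothetical.

```python
import numpy as np


def hill_estimator(x, k):
    """Hill estimate of the tail index alpha from the k largest values of x (x > 0)."""
    xs = np.sort(x)[::-1]  # order statistics in descending order
    # Estimate of 1/alpha: mean of log(X_(i) / X_(k+1)) over the top k values
    gamma_hat = np.mean(np.log(xs[:k]) - np.log(xs[k]))
    return 1.0 / gamma_hat


def sample_normal_scale_mixture(n, alpha, rng):
    """Draw Y = S * Z with Z ~ N(0, 1) and S ~ classical Pareto(alpha) on [1, inf).

    Assumption for illustration: a Pareto mixing law, not an NGG random measure.
    """
    # numpy's pareto() samples the Lomax (Pareto II) law; adding 1 gives classical Pareto
    s = 1.0 + rng.pareto(alpha, size=n)
    z = rng.standard_normal(n)
    return s * z


rng = np.random.default_rng(2024)
alpha = 2.0
n, k = 200_000, 2_000

# Heavy-tailed normal scale mixture: right tail should have index close to alpha
y = sample_normal_scale_mixture(n, alpha, rng)
alpha_hat = hill_estimator(y[y > 0], k)

# Light-tailed baseline (plain normal): the Hill estimate is much larger
z = rng.standard_normal(n)
alpha_hat_normal = hill_estimator(z[z > 0], k)

print(f"mixture: true alpha = {alpha}, Hill estimate = {alpha_hat:.2f}")
print(f"plain normal: Hill estimate = {alpha_hat_normal:.2f}")
```

For the plain normal sample, the Hill estimate keeps growing as the threshold moves further into the tail, which is the usual signature of a light tail. For the mixture, the estimate stays near `alpha`.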
Abstract
Heavy tails are often found in practice, and yet they are an Achilles heel of a variety of mainstream random probability measures such as the Dirichlet process (DP). The first contribution of this paper focuses on characterizing the tails of the so-called normalized generalized gamma (NGG) process. We show that the right tail of an NGG process is heavy-tailed provided that the centering distribution is itself heavy-tailed; the DP is the only member of the NGG class that fails to obey this convenient property. A second contribution of the paper rests on the development of two classes of heavy-tailed mixture models and the assessment of their relative merits. Multivariate extensions of the proposed heavy-tailed mixtures are devised here, along with a predictor-dependent version, to learn about the effect of covariates on a multivariate heavy-tailed response. The simulation study suggests that the proposed method performs well in various scenarios, and we showcase the application of the proposed methods to a neuroscience dataset.
