Discrete Sequential Barycenter Arrays: Representation, Approximation, and Modeling of Probability Measures

Alejandro Jara, Carlos Sing-Long

TL;DR

This paper introduces discrete sequential barycenter arrays (SBA) as a principled representation for univariate probability measures, with strong theoretical guarantees on approximation (weak and Wasserstein convergence) and exactness for finite-support distributions. Leveraging SBA, the authors develop discrete SBA (DSBA) mixture models that preserve a prescribed mean (or a mean distribution) while maintaining large support and flexible nonparametric behavior. They extend DSBA to location-scale mixtures (DSBAS) and provide full Bayesian posterior computation tools, including parsimonious and general variants, with practical updates for latent allocations and kernel parameters. Through simulations and real data (galaxy velocities, AIS data, SIMCE), DSBA models demonstrate competitive predictive performance and transparent incorporation of mean constraints, highlighting their usefulness for mean-constrained nonparametric density estimation. The work positions SBA-based representations as a rigorous, flexible framework for integrating functional constraints into nonparametric modeling, with promising directions for trans-dimensional extensions and multivariate generalizations.

Abstract

Constructing flexible probability models that respect constraints on key functionals -- such as the mean -- is a fundamental problem in nonparametric statistics. Existing approaches lack systematic tools for enforcing such constraints while retaining full modeling flexibility. This paper introduces a new representation for univariate probability measures based on discrete sequential barycenter arrays (SBA). We study structural properties of SBA representations and establish new approximation results. In particular, we show that for any target distribution, its SBA-based discrete approximations converge in both the weak topology and in Wasserstein distances, and that the representation is exact for all distributions with finite discrete support. We further characterize a broad class of measures whose SBA partitions exhibit regularity and induce increasingly fine meshes, and we prove that this class is dense in standard probabilistic topologies. These theoretical results enable the construction of probability models that preserve prescribed values -- or full distributions -- of the mean while maintaining large support. As an application, we derive a mixture model for density estimation whose induced mixing distribution has a fixed or user-specified mean. The resulting framework provides a principled mechanism for incorporating mean constraints in nonparametric modeling while preserving strong approximation properties. The approach is illustrated using both simulated and real data.
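The mean-preservation and finite-support exactness claims can be illustrated with a small sketch. It assumes a sequential-barycenter construction in which each cell of the current partition is split at its conditional mean (barycenter), and the level-$n$ approximation places the mass of each cell at that cell's barycenter. The paper's formal definition is not reproduced in this summary and may differ in details, such as which side receives an atom lying exactly on a barycenter. The function names `barycenter` and `sba_approximation` are hypothetical.

```python
from fractions import Fraction


def barycenter(atoms):
    """Mean of a finite discrete measure given as (point, weight) pairs."""
    total = sum(w for _, w in atoms)
    return sum(x * w for x, w in atoms) / total


def sba_approximation(atoms, depth):
    """Discrete approximation after `depth` rounds of barycenter splitting.

    Assumption (not taken from the paper): an atom sitting exactly on a
    cell's barycenter is assigned to the right-hand sub-cell.
    """
    cells = [list(atoms)]
    for _ in range(depth):
        refined = []
        for cell in cells:
            m = barycenter(cell)
            left = [(x, w) for x, w in cell if x < m]
            right = [(x, w) for x, w in cell if x >= m]
            # A single-atom cell yields an empty left part; drop empty parts.
            refined.extend(part for part in (left, right) if part)
        cells = refined
    # Each cell collapses to one atom: its barycenter, carrying the cell mass.
    return [(barycenter(c), sum(w for _, w in c)) for c in cells]


# Four equally weighted atoms; exact rational arithmetic avoids rounding.
atoms = [(0, Fraction(1, 4)), (1, Fraction(1, 4)),
         (2, Fraction(1, 4)), (7, Fraction(1, 4))]

for depth in range(4):
    approx = sba_approximation(atoms, depth)
    mean = sum(x * w for x, w in approx)
    print(depth, approx, "mean =", mean)
```

Under these assumptions every level has the same mean as the original measure, because the barycenter of each cell weighted by the cell mass sums back to the overall first moment. For this four-atom example the approximation coincides with the original measure from depth 3 onward.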

Paper Structure

This paper contains 47 sections, 30 theorems, 370 equations, 4 figures.

Key Result

Lemma 1

Let $G\in \mathcal{P}_1(\Theta)$ be such that $\operatorname{supp}(G)$ is a non-degenerate interval. Then, $G$ has a regular SBA. Furthermore, for every $n \in\mathbb{N}$ the collection of intervals $\{\bar{\Theta}_{n,l}\}_{l= 1}^{2^{n}}$ defined in eq:regSBA:partitionOfSupport are non-degenerate an

Figures (4)

  • Figure 1: Galaxy data: Posterior mean and $95\%$ pointwise credible bands for the density. Panels (a), (d), and (g) correspond to the parsimonious DSBAS model with $n = 4$, $5$, and $6$, respectively. Panels (b), (e), and (h) correspond to the general DSBAS model with $n = 4$, $5$, and $6$, respectively. Panels (c), (f), and (i) correspond to the parametric approximation to the DPM model with $L = 8$, $16$, and $32$, respectively.
  • Figure 2: Simulated data: Posterior mean (solid line) and $95\%$ pointwise HPD credible band for the error density. The true density is shown as a dotted line. Panels (a) and (b) correspond to the parsimonious and general DSBAS models, respectively.
  • Figure 3: Australian data: Posterior mean (solid line) and $95\%$ pointwise HPD credible band for the error density. Panels (a) and (b) correspond to the parsimonious and general DSBAS models, respectively.
  • Figure 4: SIMCE data: Posterior mean (solid line) and $95\%$ pointwise HPD credible band for the abilities density. Panels (a) and (b) correspond to the parsimonious and general DSBAS models, respectively.

Theorems & Definitions (36)

  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • ...and 26 more