\texttt{BayesBreak}: Generalized Hierarchical Bayesian Segmentation with Irregular Designs, Multi-Sample Hierarchies, and Grouped/Latent-Group Designs

Omid Shams Solari

Abstract

Bayesian change-point and segmentation models provide uncertainty-aware piecewise-constant representations of ordered data, but exact inference is often tied to narrow likelihood classes, single-sequence settings, or index-uniform designs. We present \texttt{BayesBreak}, a modular offline Bayesian segmentation framework built around a simple separation: each candidate block contributes a marginal likelihood and any required moment numerators, and a global dynamic program combines those block scores into posterior quantities over segment counts, boundary locations, and latent signals. For weighted exponential-family likelihoods with conjugate priors, block evidences and posterior moments are available in closed form from cumulative sufficient statistics, yielding exact sum-product inference for $P(y\mid k)$, $P(k\mid y)$, boundary marginals, and Bayes regression curves. We also distinguish these quantities from the \emph{joint} MAP segmentation, which is recovered by a separate max-sum backtracking recursion.
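The separation described in the abstract (per-block evidences from cumulative sufficient statistics, then a global sum-product dynamic program) can be sketched for one simple case. The sketch below is illustrative only and is not the BayesBreak implementation: it assumes a Gaussian likelihood with known variance `sigma2`, a Normal prior `N(m0, tau2)` on each segment mean, and a uniform prior over the $\binom{n-1}{k-1}$ boundary placements given $k$. The function names and default hyperparameters are hypothetical. The block table is filled in $O(n^2)$ for clarity.

```python
import math
import numpy as np


def block_log_evidence(y, sigma2=1.0, m0=0.0, tau2=10.0):
    """Table E with E[i, j] = log marginal likelihood of y[i:j] as one segment.

    Assumed model (illustrative): y_t ~ N(mu, sigma2), mu ~ N(m0, tau2).
    Only cumulative sums of y and y**2 are needed per block.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    S = np.concatenate([[0.0], np.cumsum(y)])       # cumulative sum of y
    Q = np.concatenate([[0.0], np.cumsum(y ** 2)])  # cumulative sum of y^2
    E = np.full((n + 1, n + 1), -np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            m = j - i
            s = S[j] - S[i]
            q = Q[j] - Q[i]
            centered = q - 2.0 * m0 * s + m * m0 ** 2  # sum of (y_t - m0)^2
            E[i, j] = (
                -0.5 * m * math.log(2.0 * math.pi * sigma2)
                - 0.5 * math.log1p(m * tau2 / sigma2)
                - 0.5 * centered / sigma2
                + 0.5 * tau2 * (s - m * m0) ** 2 / (sigma2 * (sigma2 + m * tau2))
            )
    return E


def logsumexp(a):
    """Numerically stable log(sum(exp(a))) for a 1-D array-like."""
    a = np.asarray(a, dtype=float)
    top = a.max()
    if not np.isfinite(top):
        return -np.inf
    return float(top + math.log(np.exp(a - top).sum()))


def log_evidence_by_k(E, k_max):
    """Return log P(y | k) for k = 1..k_max via the sum-product recursion.

    L[k, j] = log of the summed evidence over all splits of y[0:j] into k blocks.
    Requires k_max <= n.
    """
    n = E.shape[0] - 1
    L = np.full((k_max + 1, n + 1), -np.inf)
    L[1, 1:] = E[0, 1:]
    for k in range(2, k_max + 1):
        for j in range(k, n + 1):
            # last block is y[i:j]; the prefix y[0:i] holds k-1 blocks, so i >= k-1
            L[k, j] = logsumexp(L[k - 1, k - 1:j] + E[k - 1:j, j])
    # assumed uniform prior over the C(n-1, k-1) boundary placements given k
    return np.array(
        [L[k, n] - math.log(math.comb(n - 1, k - 1)) for k in range(1, k_max + 1)]
    )


def posterior_over_k(log_py_k):
    """P(k | y) under an assumed uniform prior on k = 1..k_max."""
    return np.exp(log_py_k - logsumexp(log_py_k))


y = np.array([0.1, -0.3, 0.2, 4.1, 3.8, 4.3])
E = block_log_evidence(y)
log_py_k = log_evidence_by_k(E, k_max=3)
print(posterior_over_k(log_py_k))
```

The joint MAP segmentation mentioned in the abstract would replace `logsumexp` with a maximum and record the maximizing index for backtracking; that max-sum variant is a separate recursion and is not shown here.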

Paper Structure (132 sections, 17 theorems, 110 equations, 5 figures, 5 tables, 17 algorithms)

This paper contains 132 sections, 17 theorems, 110 equations, 5 figures, 5 tables, 17 algorithms.

Key Result

Theorem 5.1

Fix a block $(i,j]$ and a conjugate prior $p(\theta\mid\alpha_0,\beta_0)$. Under the weighted exponential-family (EF) model of Eq. (eq:EFweighted), the integrated single-segment evidence and the $r$th moment integral of the observation-scale EF mean $m(\theta):=\mathbb{E}[Y\mid\theta]$ are available in closed form from the block's cumulative sufficient statistics.
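As a worked special case (an illustration under stated assumptions, not the theorem's general statement), take unweighted Poisson observations $y_t\mid\theta\sim\mathrm{Poisson}(\theta)$ on the block, so that $m(\theta)=\theta$, with a Gamma prior in shape-rate form. Reading $\alpha_0$ as the shape and $\beta_0$ as the rate is an assumption about the parameterization. Write $n_{ij}=j-i$ and $S_{ij}=\sum_{t=i+1}^{j}y_t$; the symbols $Z_{ij}$ and $M^{(r)}_{ij}$ are introduced here for illustration only.

```latex
\begin{align*}
Z_{ij}
  &= \int_0^\infty \Biggl[\prod_{t=i+1}^{j}\frac{\theta^{y_t}e^{-\theta}}{y_t!}\Biggr]
     \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\,\theta^{\alpha_0-1}e^{-\beta_0\theta}\,d\theta
   = \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\,
     \frac{\Gamma(\alpha_0+S_{ij})}{(\beta_0+n_{ij})^{\alpha_0+S_{ij}}}
     \prod_{t=i+1}^{j}\frac{1}{y_t!},\\[4pt]
M^{(r)}_{ij}
  &= \int_0^\infty \theta^{r}\Biggl[\prod_{t=i+1}^{j}\frac{\theta^{y_t}e^{-\theta}}{y_t!}\Biggr]
     \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\,\theta^{\alpha_0-1}e^{-\beta_0\theta}\,d\theta
   = Z_{ij}\,\frac{\Gamma(\alpha_0+S_{ij}+r)}{\Gamma(\alpha_0+S_{ij})\,(\beta_0+n_{ij})^{r}}.
\end{align*}
```

Both quantities depend on the data only through $n_{ij}$, $S_{ij}$, and the factorial term, each of which is a difference of cumulative sums. Setting $r=1$ gives the familiar posterior mean $M^{(1)}_{ij}/Z_{ij}=(\alpha_0+S_{ij})/(\beta_0+n_{ij})$.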

Figures (5)

  • Figure 1: Single-sequence Gaussian example. Top: marginal posterior probability that each interior index $i\in\{1,\dots,n-1\}$ is a changepoint, with dashed vertical lines marking the true changepoints. Bottom: observed data, true latent mean, exported joint MAP segmentation, and 90% segment-level posterior intervals.
  • Figure 2: Likelihood-family showcase. The first three panels use closed-form conjugate block integrals (Gaussian, Poisson, and Binomial), while the final panel uses the one-dimensional Beta-response quadrature routine. In each case the fitted BayesBreak signal is close to the true piecewise-constant latent parameter.
  • Figure 3: Calibration of marginal boundary posterior probabilities under synthetic Gaussian data. The dashed diagonal denotes perfect calibration; the orange curve is the empirical frequency within bins of predicted probability.
  • Figure 4: Latent-group pooling on synthetic Gaussian sequences. Left: posterior responsibilities for each sequence and latent group (sequences sorted for display). Right: group-specific marginal boundary probabilities under the fitted latent templates.
  • Figure 5: Runtime scaling for the Gaussian implementation as a function of series length $n$ for two values of $k_{\max}$. Points are empirical means over repeated runs; error bars denote one standard deviation.

Theorems & Definitions (38)

  • Theorem 5.1: EF–conjugate block integration
  • proof
  • Theorem 6.1: Correctness of the DP
  • proof
  • Theorem 6.2: Time and space complexity
  • proof
  • Proposition 7.1: Homogeneous Poisson-process prior: regular-grid index-uniform as a special case
  • proof
  • Proposition 7.2: Coarse-to-fine consistency of inherited partitions
  • proof
  • ...and 28 more