Bayesian local clustering of functional data via semi-Markovian random partitions

Giovanni Toto, Antonio Canale

Abstract

We introduce a Bayesian framework for indirect local clustering of functional data, leveraging B-spline basis expansions and a novel dependent random partition model. By exploiting the local support properties of B-splines, our approach allows partially coincident functional behaviors, achieved when shared basis coefficients span sufficiently contiguous regions. This is accomplished through a cutting-edge dependent random partition model that enforces semi-Markovian dependence across a sequence of partitions. By matching the order of the B-spline basis with the semi-Markovian dependence structure, the proposed model serves as a highly flexible prior, enabling efficient modeling of localized features in functional data. Furthermore, we extend the utility of the dependent random partition model beyond functional data, demonstrating its applicability to a broad class of problems where sequences of dependent partitions are central, and standard Markovian assumptions prove overly restrictive. Empirical illustrations, including analyses of simulated data and tide level measurements from the Venice Lagoon, showcase the effectiveness and versatility of the proposed methodology.
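
The local-support argument in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes cubic B-splines (order 4) on a clamped integer knot vector over $[0,6]$, and the coefficient values are arbitrary illustrative choices. Two curves that share four consecutive basis coefficients coincide on the knot interval where only those four basis functions are nonzero, and differ elsewhere.

```python
def bspline_basis(k, degree, knots, x):
    """Cox-de Boor recursion: value of the k-th B-spline basis function at x."""
    if degree == 0:
        return 1.0 if knots[k] <= x < knots[k + 1] else 0.0
    left = right = 0.0
    span_left = knots[k + degree] - knots[k]
    if span_left > 0:  # skip terms with zero-width support (repeated knots)
        left = (x - knots[k]) / span_left * bspline_basis(k, degree - 1, knots, x)
    span_right = knots[k + degree + 1] - knots[k + 1]
    if span_right > 0:
        right = ((knots[k + degree + 1] - x) / span_right
                 * bspline_basis(k + 1, degree - 1, knots, x))
    return left + right


def evaluate(coefs, degree, knots, x):
    """Evaluate the spline sum_k coefs[k] * B_k(x)."""
    return sum(c * bspline_basis(k, degree, knots, x) for k, c in enumerate(coefs))


degree = 3  # cubic B-splines, i.e. order 4 (assumed for illustration)
knots = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 6.0, 6.0]
# 13 knots and degree 3 give 13 - 3 - 1 = 9 basis functions (indices 0..8).

# Arbitrary coefficients: the two curves share indices 2, 3, 4, 5 only.
coefs_a = [1.0, 2.0, 0.5, -1.0, 0.0, 1.5, 2.0, 0.0, 1.0]
coefs_b = [3.0, -1.0, 0.5, -1.0, 0.0, 1.5, -2.0, 1.0, 0.0]

# On [2, 3) only basis functions 2..5 are nonzero, so the curves coincide there.
grid_shared = [2.0 + 0.05 * i for i in range(20)]
max_gap_shared = max(
    abs(evaluate(coefs_a, degree, knots, x) - evaluate(coefs_b, degree, knots, x))
    for x in grid_shared
)

# Outside that interval a non-shared coefficient is active and the curves differ.
gap_before = abs(evaluate(coefs_a, degree, knots, 1.5) - evaluate(coefs_b, degree, knots, 1.5))
gap_after = abs(evaluate(coefs_a, degree, knots, 3.5) - evaluate(coefs_b, degree, knots, 3.5))

print(max_gap_shared, gap_before, gap_after)
```

Sharing one more coefficient (index 6 as well) would extend the region of coincidence to $[2,4)$, which is the sense in which shared coefficients must span contiguous basis indices.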

Paper Structure

This paper contains 21 sections, 2 theorems, 61 equations, 6 figures, 1 algorithm.

Key Result

Proposition 1

The conditional probability reported in Equation \ref{eq:rho_prior} can be written as ...

Figures (6)

  • Figure 1: Illustrative example of local-global functional clustering. Gray dashed lines represent functional data. Colored thick lines represent three global clusters that collapse in certain subregions of the domain.
  • Figure 2: Boxplots of the average posterior ARI across the sequence of partitions computed on $R=50$ simulated datasets under Markovian (top) and second-order dependence (bottom). Each panel considers a different combination of number of functions $n^{(rep)}$ and dependence order: clockwise from top left, $(10,1)$, $(10,2)$, $(30,2)$ and $(30,1)$. Each panel shows the low-noise scenario on the left ($\sigma^{*2}=1$) and the high-noise one on the right ($\sigma^{*2}=4$).
  • Figure 3: Boxplots of the average posterior yARI across the sequence of functional partitions computed on $R=50$ simulated datasets. The panel on the left considers $n^{(rep)}=10$, while the one on the right considers $n^{(rep)}=30$. Each panel shows the low-noise scenario on the left ($\sigma^{2}=1$) and the high-noise one on the right ($\sigma^{2}=4$).
  • Figure 4: Boxplots of the average posterior fARI across the sequence of functional observations (top) and posterior RMSE between functional observations and their predicted counterparts (bottom) computed on $R=50$ simulated datasets under misspecification of the B-splines. The panels on the left consider $n^{(rep)}=10$, while the ones on the right consider $n^{(rep)}=30$. Each panel shows the low-noise setting on the left ($\sigma^{2}=1$) and the high-noise one on the right ($\sigma^{2}=4$).
  • Figure 5: Top: comparison of the observed tide levels (grey) and the corresponding predicted curves for the time interval October 19-26, 2023. Bottom: posterior distributions of the number of clusters $J_k$ at each basis $k$; the posterior probability is represented with a color scale ranging from white for low probability to red for higher probability.
  • ...and 1 more figure
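
The captions above summarize clustering accuracy through the adjusted Rand index (ARI), averaged across a sequence of partitions. As a hedged illustration, the sketch below computes the standard ARI in pure Python and averages it over a sequence. The fARI and yARI variants named in the captions are not defined in this excerpt and are not reproduced here; the function names and the toy label sequences are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of items placed together in both partitions, in A only, in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Both partitions are trivial (one block, or all singletons): identical.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def average_ari(true_sequence, estimated_sequence):
    """Mean ARI over a sequence of (true, estimated) partition pairs."""
    scores = [adjusted_rand_index(t, e) for t, e in zip(true_sequence, estimated_sequence)]
    return sum(scores) / len(scores)


# Toy sequence of two partitions of four items (illustrative labels only).
true_sequence = [[0, 0, 1, 1], [0, 0, 1, 1]]
estimated_sequence = [[1, 1, 0, 0], [0, 0, 1, 2]]
print(average_ari(true_sequence, estimated_sequence))
```

In a posterior analysis, a score of this kind would be computed for each MCMC draw and then averaged; that outer loop is omitted here.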

Theorems & Definitions (5)

  • Definition 1
  • Proposition 1
  • Proposition 2
  • Proof of Proposition \ref{prop:rhok}
  • Proof of Proposition \ref{prop:Rk}