
Bayesian repulsive mixture model for multivariate functional data

Ricardo Cunha Pedroso, Fernando Andrés Quintana, Rosangela Helena Loschi

TL;DR

A repulsive mixture model clusters observation units represented by multivariate functional data according to the similarity of curve shapes and individual-specific covariates. The repulsion favors the identification of well-differentiated clusters and avoids redundant ones.

Abstract

We introduce a repulsive mixture model to cluster observation units represented by multivariate functional data, based on similarity of curve shapes and individual-specific covariates. We propose a repulsive prior distribution for the component-specific location parameters that depends on a distance tailored to B-spline curves, extending existing repulsive priors to the context of multivariate functional data. The proposed model favors the identification of well-differentiated clusters, avoiding the presence of redundant ones. To sample from the posterior distribution, we propose an MCMC algorithm that includes a novel split-merge step that significantly improves the mixing of the chain. Different features of the proposed model, including the effects of repulsion and covariates on the clustering, are evaluated through simulation. The proposed model is fitted to Chronic Ankle Instability (CAI) data, with a focus on identifying individuals with similar types of physical dysfunction based on the similarity of their movement patterns.
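
To illustrate how a repulsive prior penalizes components whose location parameters lie close together, the sketch below evaluates a generic pairwise repulsion term. It is not the prior proposed in the paper: the factor `1 - exp(-d^2 / (2 * tau))`, the parameter name `tau`, and the Euclidean distance between coefficient vectors are all assumptions chosen for illustration, whereas the paper uses its own distance tailored to B-spline curves.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two coefficient vectors (illustrative choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def log_repulsion(locations, tau=1.0, dist=euclidean):
    """Log of a generic pairwise repulsion term.

    Hypothetical form: the product over all pairs of 1 - exp(-d^2 / (2 * tau)).
    The term tends to 0 (log tends to -inf) when two locations coincide and
    tends to 1 (log tends to 0) when all locations are far apart.
    """
    total = 0.0
    for u, v in combinations(locations, 2):
        d = dist(u, v)
        # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
        factor = -math.expm1(-0.5 * d * d / tau)
        if factor <= 0.0:
            return -math.inf
        total += math.log(factor)
    return total


close = [[0.0, 0.0], [0.1, 0.1]]
apart = [[0.0, 0.0], [5.0, 5.0]]
print(log_repulsion(close), log_repulsion(apart))
```

Under this assumed form, nearly identical components receive a strongly negative log-prior contribution, which is the mechanism that discourages redundant clusters.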

Paper Structure

This paper contains 18 sections, 2 theorems, 38 equations, 15 figures, 3 tables.

Key Result

Theorem A.1

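(Theorem 1 of Polson et al., 2013) Let $f(\omega)$ denote the density of the random variable $\omega\sim\textup{PG}(b,0)$, $b>0$. Then the following integral identity holds for all $a\in\mathbb{R}$: $$\frac{(e^{\psi})^{a}}{(1+e^{\psi})^{b}} = 2^{-b}e^{k\psi}\int_{0}^{\infty}e^{-\omega\psi^{2}/2}f(\omega)\,d\omega,$$ where $k=a-b/2$.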
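
The identity in Theorem A.1 can be checked numerically. The sketch below assumes the usual statement of the Pólya-Gamma identity (written out in the code comments, with `psi` as the free real argument) and the sum-of-gammas representation of $\textup{PG}(b,0)$, whose Laplace transform is an infinite product. The function names and the truncation level `n_terms` are illustrative choices, not part of the paper.

```python
import math


def pg_laplace(t, b, n_terms=100_000):
    """Approximate E[exp(-t * omega)] for omega ~ PG(b, 0).

    Uses omega = (1 / (2 pi^2)) * sum_k g_k / (k - 1/2)^2 with g_k ~ Gamma(b, 1),
    so the expectation is prod_k (1 + t / (2 pi^2 (k - 1/2)^2))^(-b).
    The infinite product is truncated at n_terms factors.
    """
    log_val = 0.0
    for k in range(1, n_terms + 1):
        log_val -= b * math.log1p(t / (2.0 * math.pi ** 2 * (k - 0.5) ** 2))
    return math.exp(log_val)


def lhs(psi, a, b):
    """Left-hand side: (e^psi)^a / (1 + e^psi)^b."""
    return math.exp(a * psi) / (1.0 + math.exp(psi)) ** b


def rhs(psi, a, b):
    """Right-hand side: 2^(-b) e^(k psi) E[exp(-omega psi^2 / 2)], with k = a - b/2."""
    k = a - b / 2.0
    return 2.0 ** (-b) * math.exp(k * psi) * pg_laplace(psi ** 2 / 2.0, b)


for psi, a, b in [(1.5, 1.0, 2.0), (-0.7, 0.0, 1.0), (2.0, 3.0, 0.5)]:
    print(psi, a, b, lhs(psi, a, b), rhs(psi, a, b))
```

The two sides should agree up to the truncation error of the product, which shrinks roughly in proportion to `1 / n_terms`.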

Figures (15)

  • Figure 1: Cluster-specific mean curves of each dimension $d=1,\dots,4$. Each cluster is identified by curves with the same line type and color across the four dimensions.
  • Figure 2: Box-plots of the logarithm of the average pairwise distance between the component-specific mean curves in each dimension $d$, obtained by fitting the MFRMMx model to the 20 simulated data sets, with $p=10$, $A=10$ and the $\phi$ values specified on the horizontal axes.
  • Figure 3: Box-plots of the logarithm of the mean Rand index obtained by fitting the MFRMMx model assuming $p=4,7,10$, $A=5,10$ and the $\phi$ values specified on the horizontal axes.
  • Figure 4: Average co-clustering matrices estimated over the 20 simulated data sets by the MFPPMx and MFRMMx models, assuming $p=10$, $A=10$ and a few different values of $\phi$. The true clusters are identified by the red boxes. The three values below each co-clustering matrix are the minimum, mean and maximum values of the Rand index computed over the simulated data sets.
  • Figure 5: Average co-clustering matrices computed over the co-clustering matrices estimated by the dependent MFRMMx for each simulated data set, with $p=4$, $A=10$ and $\phi=50,75,100$. The true clusters are identified by the red boxes. The three values below each co-clustering matrix are the minimum, mean and maximum values of the Rand index computed over the simulated data sets.
  • ...and 10 more figures
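
The captions above rely on two standard clustering summaries, the Rand index and the co-clustering matrix. The following minimal sketch shows how each is commonly computed from label vectors. The function names and the toy labels are hypothetical, and the paper's own implementation may differ, for example in how posterior samples are stored.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of unit pairs on which two partitions agree.

    A pair agrees when it is placed together in both partitions
    or placed apart in both partitions.
    """
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("label vectors must have the same length")
    if n < 2:
        raise ValueError("at least two units are required")
    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


def co_clustering_matrix(partitions):
    """Entry (i, j) is the share of partitions that put units i and j together."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


truth = [0, 0, 1, 1]
estimate = [1, 1, 0, 0]  # same grouping as truth, with different label names
print(rand_index(truth, estimate))
print(co_clustering_matrix([[0, 0, 1], [0, 1, 1]]))
```

The Rand index is invariant to how clusters are labeled, which is why it can compare an estimated partition with the true one directly. Averaging co-clustering indicators over MCMC samples sidesteps label switching for the same reason.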

Theorems & Definitions (3)

  • Theorem A.1
  • Proposition B.1
  • proof