Dirichlet process mixture model based on topologically augmented signal representation for clustering infant vocalizations

Guillem Bonafos, Clara Bourot, Pierre Pudlo, Jean-Marc Freyermuth, Laurence Reboul, Samuel Tronçon, Arnaud Rey

TL;DR

This work uses a topologically augmented representation of the vocalizations, employing two persistence diagrams for each vocalization, and fits a non-parametric Bayesian mixture model with a Dirichlet process prior to model the number of components.

Abstract

Based on audio recordings made once a month during the first 12 months of a child's life, we propose a new method for clustering this set of vocalizations. We use a topologically augmented representation of the vocalizations, employing two persistence diagrams for each vocalization: one computed on the surface of its spectrogram and one on the Takens' embeddings of the vocalization. A synthetic persistent variable is derived for each diagram and added to the MFCCs (Mel-frequency cepstral coefficients). Using this representation, we fit a non-parametric Bayesian mixture model with a Dirichlet process prior to model the number of components. This procedure leads to a novel data-driven categorization of vocal productions. Our findings reveal the presence of 8 clusters of vocalizations, allowing us to compare their temporal distribution and acoustic profiles in the first 12 months of life.
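Two of the ingredients named in the abstract can be illustrated with a minimal, standard-library sketch: a Takens' (time-delay) embedding of a one-dimensional signal, and truncated stick-breaking weights, which is one common way to represent a Dirichlet process prior over mixture proportions. The function names, the `dimension`, `delay`, `alpha`, and `truncation` parameters, and the toy signal are assumptions made for illustration. The abstract does not state the embedding parameters, how the persistence diagrams are computed, or which inference scheme the authors use, so this is not their implementation.

```python
import random


def takens_embedding(signal, dimension, delay):
    """Return delay vectors (x[t], x[t+delay], ..., x[t+(dimension-1)*delay])."""
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    span = (dimension - 1) * delay
    # Each point of the embedded cloud is built from samples spaced `delay` apart.
    return [
        tuple(signal[t + k * delay] for k in range(dimension))
        for t in range(len(signal) - span)
    ]


def stick_breaking_weights(alpha, truncation, rng):
    """Draw truncated stick-breaking weights with concentration `alpha`."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, alpha)  # Beta(1, alpha) break point
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights


# Toy signal standing in for a vocalization waveform (hypothetical data).
points = takens_embedding([0, 1, 2, 3, 4, 5], dimension=3, delay=2)

# Hypothetical concentration and truncation level, seeded for reproducibility.
weights = stick_breaking_weights(alpha=1.0, truncation=20, rng=random.Random(0))
```

In a pipeline like the one described, a persistence diagram would be computed on the point cloud returned by the embedding. In the stick-breaking view, smaller values of `alpha` concentrate the mass on fewer components, which is how such a prior lets the data inform the number of clusters.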

Paper Structure

This paper contains 11 sections, 1 equation, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Proportion of monthly production of vocalizations per cluster. Parents did not record for three months, hence the gap.