
All-pole centroids in the Wasserstein metric with applications to clustering of spectral densities

Rumeshika Pallewela, Filip Elvander

Abstract

In this work, we propose a method for computing centroids, or barycenters, in the spectral Wasserstein-2 metric for sets of power spectral densities, where the barycenters are restricted to the set of all-pole spectra of a given model order. This may be interpreted as finding an autoregressive representative for a set of second-order stationary Gaussian processes. While Wasserstein, or optimal transport, barycenters have previously been used successfully in spectral estimation and clustering, the resulting barycenters are non-parametric, and the complexity of representing and storing them depends on, e.g., the choice of discretization grid. In contrast, the method proposed herein yields compact, low-dimensional, and interpretable spectral centroids that can be used in downstream tasks. Computing the all-pole centroids amounts to solving a non-convex optimization problem in the model parameters, and we present a gradient descent scheme for this problem. Although convergence to a globally optimal point cannot be guaranteed, the sub-optimality of the obtained centroids can be quantified. The proposed method is illustrated on a phoneme classification problem.
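The centroid problem described above can be illustrated with a minimal sketch. It rests on our own simplifying assumptions, not on the paper's formulation: spectra are normalized to unit mass on the half-band [0, π]; the Wasserstein-2 distance uses the closed-form 1-D quantile formula (no entropic regularization); the all-pole model is parameterized through reflection coefficients k = tanh(θ) via the Levinson step-up recursion so that it stays stable; and gradients are central finite differences rather than the authors' gradient scheme. All function names, grid sizes, and step sizes are hypothetical.

```python
import numpy as np

# Hypothetical discretization (assumption): N bins on [0, pi], half-band only.
N_BINS = 512
EDGES = np.linspace(0.0, np.pi, N_BINS + 1)
OMEGA = 0.5 * (EDGES[:-1] + EDGES[1:])   # bin midpoints
T = (np.arange(256) + 0.5) / 256         # quantile levels in (0, 1)


def reflection_to_ar(k):
    """Levinson step-up: reflection coefficients k_1..k_P -> a_1..a_P of
    A(z) = 1 + sum_j a_j z^{-j}; |k_m| < 1 for all m gives a stable model."""
    a = np.zeros(0)
    for km in k:
        a = np.concatenate([a + km * a[::-1], [km]])
    return a


def ar_psd(a):
    """All-pole spectrum 1/|A(e^{iw})|^2 on the grid, normalized to unit mass."""
    lags = np.arange(1, len(a) + 1)
    A = 1.0 + np.exp(-1j * np.outer(OMEGA, lags)) @ a
    psd = 1.0 / np.abs(A) ** 2
    return psd / psd.sum()


def quantiles(p):
    """Quantile function of p, spreading each bin's mass uniformly over the bin."""
    cdf = np.concatenate([[0.0], np.cumsum(p)])
    return np.interp(T, cdf, EDGES)


def w2_sq(p, q):
    """Squared 1-D W2 distance: mean squared gap between quantile functions."""
    return np.mean((quantiles(p) - quantiles(q)) ** 2)


def cost(theta, targets):
    """Average W2^2 from the all-pole spectrum with k = tanh(theta) to each target."""
    p = ar_psd(reflection_to_ar(np.tanh(theta)))
    return np.mean([w2_sq(p, q) for q in targets])


def all_pole_centroid(targets, order, steps=100, lr=1.0, h=1e-6):
    """Gradient descent on theta with finite-difference gradients and step
    halving, so the cost never increases. Returns (a_1..a_P, final cost)."""
    theta = np.zeros(order)               # flat (white-noise) initialization
    f = cost(theta, targets)
    for _ in range(steps):
        grad = np.array([
            (cost(theta + h * e, targets) - cost(theta - h * e, targets)) / (2 * h)
            for e in np.eye(order)
        ])
        step = lr
        while step > 1e-10:               # backtracking: halve until the cost drops
            cand = theta - step * grad
            f_cand = cost(cand, targets)
            if f_cand < f:
                theta, f = cand, f_cand
                break
            step *= 0.5
    return reflection_to_ar(np.tanh(theta)), f


# Synthetic targets (hypothetical): AR(2) spectra with poles at 0.9 * exp(+-i*phi).
targets = [ar_psd(np.array([-2 * 0.9 * np.cos(phi), 0.81])) for phi in (0.8, 1.0, 1.2)]
f0 = cost(np.zeros(2), targets)
a_hat, f_hat = all_pole_centroid(targets, order=2)
print("AR coefficients:", a_hat, "initial cost:", f0, "final cost:", f_hat)
```

Because the 1-D W2 cost in this sketch is a squared L2 distance between quantile functions, the cost of any all-pole candidate is bounded below by that of the non-parametric barycenter, whose quantile function is the average of the targets' quantile functions. The gap between the two is one way to gauge sub-optimality in this simplified setting; it is not necessarily the measure used in the paper.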

Paper Structure

This paper contains 9 sections, 17 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Phoneme /iy/: (a) two original normalized PSD samples; (b) centroids obtained using the arithmetic mean, the OT barycenter (OT-BC), and the parametric OT barycenter (OT-P) with $P = 10$.
  • Figure 2: Comparison of average entropic costs for OT-BC, YW-initialization, and OT-P ($P = 2, \dots, 10$, $\varepsilon = 0.07$).
  • Figure 3: Confusion matrices for multi-phoneme classification using IS, KL, $\ell_2$, OT-BC, and OT-P.
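Figure 2 reports entropic costs with $\varepsilon = 0.07$. Assuming this refers to entropy-regularized optimal transport with a squared-distance ground cost on the frequency axis, a standard Sinkhorn sketch is given below. The paper's exact definition (grid, normalization, and whether the entropy term is included in the reported cost) may differ; `entropic_ot`, `ar2_psd`, the 128-point grid, and the test spectra are hypothetical.

```python
import numpy as np


def entropic_ot(p, q, omega, eps=0.07, iters=2000):
    """Sinkhorn iterations for entropy-regularized OT with cost |w - w'|^2.
    Returns the transport cost <C, Pi> (entropy term excluded) and the plan Pi.
    Column sums of Pi equal q exactly; row sums converge to p."""
    C = (omega[:, None] - omega[None, :]) ** 2
    K = np.exp(-C / eps)          # Gibbs kernel
    u, v = np.ones_like(p), np.ones_like(q)
    for _ in range(iters):
        u = p / (K @ v)           # rescale to match row marginal p
        v = q / (K.T @ u)         # rescale to match column marginal q
    Pi = u[:, None] * K * v[None, :]
    return float(np.sum(Pi * C)), Pi


def ar2_psd(omega, r, phi):
    """Normalized AR(2) spectrum with poles at r * exp(+-i*phi)."""
    z = np.exp(-1j * omega)
    A = 1 - 2 * r * np.cos(phi) * z + r**2 * z**2
    psd = 1 / np.abs(A) ** 2
    return psd / psd.sum()


omega = np.linspace(0, np.pi, 128, endpoint=False) + np.pi / 256  # bin midpoints
p = ar2_psd(omega, 0.9, 0.5)
q = ar2_psd(omega, 0.9, 2.5)
c_same, _ = entropic_ot(p, p, omega)
c_far, Pi = entropic_ot(p, q, omega)
print("cost(p, p):", c_same, "cost(p, q):", c_far)
```

Even for identical spectra the cost is positive, since the regularized plan spreads mass off the diagonal. With $\varepsilon = 0.07$ the smallest kernel entry is about $e^{-\pi^2/0.07}$, which is still representable in double precision; for much smaller $\varepsilon$, a log-domain implementation avoids underflow.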