A Simple HMM with Self-Supervised Representations for Phone Segmentation

Gene-Ping Yang; Hao Tang

TL;DR

Peak detection on Mel spectrograms is shown to be a strong baseline for unsupervised phone segmentation, better than many self-supervised approaches. Building on this, the paper proposes a simple hidden Markov model that uses self-supervised representations together with features at the boundaries.

Abstract

Despite recent advances in self-supervised representations, unsupervised phonetic segmentation remains challenging. Most approaches focus on improving phonetic representations with self-supervised learning, in the hope that the improvement transfers to phonetic segmentation. In this paper, contrary to recent approaches, we show that peak detection on Mel spectrograms is a strong baseline, better than many self-supervised approaches. Based on this finding, we propose a simple hidden Markov model that uses self-supervised representations and features at the boundaries for phone segmentation. Our results demonstrate consistent improvements over previous approaches, and the generalized formulation allows versatile design adaptations.
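
To make the baseline idea concrete, the following is a minimal sketch of boundary detection by peak picking on frame-to-frame change. It is an illustration under stated assumptions, not the authors' implementation: the Euclidean distance, the strict local-maximum rule, the `threshold` parameter, and all function names are hypothetical choices, and the toy frames stand in for real Mel spectrogram frames.

```python
import math


def spectral_variation(frames):
    """Euclidean distance between each pair of consecutive feature frames.

    Assumption: the distance measure is illustrative; the summary above
    does not specify how spectral variation is computed in the paper.
    """
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]


def pick_peaks(curve, threshold=0.0):
    """Indices of strict local maxima of `curve` whose value exceeds `threshold`."""
    return [
        i
        for i in range(1, len(curve) - 1)
        if curve[i] > curve[i - 1]
        and curve[i] > curve[i + 1]
        and curve[i] > threshold
    ]


def detect_boundaries(frames, threshold=0.0):
    """Hypothesized boundary positions, as frame indices."""
    # variation[i] measures the change between frame i and frame i + 1,
    # so a peak at index i places the boundary at frame i + 1.
    variation = spectral_variation(frames)
    return [i + 1 for i in pick_peaks(variation, threshold)]


# Toy input: three steady segments of 2-dimensional frames (not real Mel features).
frames = [[0.0, 0.0]] * 4 + [[1.0, 0.0]] * 4 + [[1.0, 2.0]] * 4
boundaries = detect_boundaries(frames, threshold=0.5)
print(boundaries)
```

On this toy input the variation curve is zero inside each steady segment and spikes where the segments meet, so the detected boundaries fall at the first frame of the second and third segments. Real speech would need smoothing and a tuned threshold, neither of which is modeled here.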

Paper Structure (12 sections, 7 equations, 2 figures, 4 tables)

Introduction
Boundary Features in Mel Spectrogram
Applying HMMs to unsupervised phone segmentation
HMM Formulation
Boundary Features as Transition Penalty
Related Work
Experiments
Self-supervised Features using Peak Detection
Proposed HMMs
HMM Training vs. Two-stage Decoding
HMM Phone Purity Analysis
Conclusion

Figures (2)

Figure 1: Peak detection using Mel spectrogram on the sample utterance fadg0_sx289 from TIMIT. From top to bottom: Mel spectrogram, spectral variations, and ground truth phone segments.
Figure 2: Comparison of the detected boundaries by different HMMs using HuBERT features on fadg0_si1909 from TIMIT.
