
Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator

Woo-Jin Chung, Hong-Goo Kang

TL;DR

This work leverages representations from a pre-trained self-supervised learning model to more effectively estimate global, local, and kinematic pattern information in Electromagnetic Articulography signals during acoustic-to-articulatory inversion (AAI), overcoming the limitations of conventional AAI models that rely on acoustic features derived from restricted datasets.

Abstract

We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern information in Electromagnetic Articulography (EMA) signals during the AAI process. We train our model using an adversarial approach and introduce an attention-based multi-duration phoneme discriminator (MDPD) designed to fully capture the intricate relationships among multi-channel articulatory signals. Our method achieves a Pearson correlation coefficient of 0.847, marking state-of-the-art performance among speaker-independent AAI models. The implementation details and code can be found online.
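The headline metric is the Pearson correlation coefficient (PCC) between predicted and measured EMA trajectories. As a point of reference, here is a minimal sketch of how a channel-averaged PCC is commonly computed for AAI evaluation; the function name `pearson_cc`, the `(T, C)` array layout, and the per-channel averaging scheme are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def pearson_cc(pred: np.ndarray, target: np.ndarray) -> float:
    """Channel-averaged Pearson correlation between EMA trajectories.

    pred, target: arrays of shape (T, C) holding predicted and
    ground-truth articulator positions (T frames, C EMA channels).
    """
    assert pred.shape == target.shape
    ccs = []
    for c in range(pred.shape[1]):
        x = pred[:, c] - pred[:, c].mean()
        y = target[:, c] - target[:, c].mean()
        # Dot product of centered signals over the product of their norms.
        ccs.append((x @ y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-8))
    return float(np.mean(ccs))
```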

Paper Structure

This paper contains 10 sections, 7 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Illustration of the proposed model.
  • Figure 2: Illustration of PNP convolution module.
  • Figure 3: Illustration of depthwise PNP-convolution. Here, $a$ denotes the pre-fixed constant influencing the frequency range emphasized by the Snake activation function. DConv signifies the depthwise convolution operation. (A code sketch of this block follows the list.)
  • Figure 4: Illustration of the EMA reshaping process for sub-phoneme discriminators. The left figure illustrates the EMA signals with $C$ EMA traces and a length of $T$. The middle figure shows the EMA signals after the addition of channel split embeddings ($e_{cs}$) and the channel end embedding ($e_{ce}$). The right figure displays the EMA signals for the MDPD input, reshaped to have a channel size corresponding to the phoneme duration ($t_{pd}$) and a length of $C_D\cdot(T/t_{pd})$. (A code sketch of this reshaping also follows the list.)
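
The PNP-convolution internals are not spelled out in these captions, but the Snake activation referenced in Figure 3 has the standard closed form $x + \frac{1}{a}\sin^2(ax)$ with a pre-fixed constant $a$. Below is a minimal PyTorch sketch pairing it with a depthwise convolution (DConv); the module names and exact composition are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Snake(nn.Module):
    """Snake activation: x + (1/a) * sin^2(a * x), with pre-fixed constant a."""
    def __init__(self, a: float = 1.0):
        super().__init__()
        self.a = a

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + torch.sin(self.a * x) ** 2 / self.a

class DepthwiseSnakeConv(nn.Module):
    """Depthwise 1-D convolution followed by Snake, a stand-in for the
    DConv + Snake stage sketched in Figure 3 (composition assumed)."""
    def __init__(self, channels: int, kernel_size: int = 3, a: float = 1.0):
        super().__init__()
        # groups=channels makes the convolution depthwise (one filter per channel).
        self.dconv = nn.Conv1d(channels, channels, kernel_size,
                               padding=kernel_size // 2, groups=channels)
        self.act = Snake(a)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        return self.act(self.dconv(x))
```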
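The Figure 4 caption pins down the input shape $(C, T)$ and output shape $(t_{pd}, C_D\cdot(T/t_{pd}))$, but not the exact embedding layout or fold order. The PyTorch sketch below is one consistent reading; `reshape_for_mdpd`, the $C_D = 2C$ channel layout, and the reshape/permute order are illustrative assumptions, not the authors' implementation.

```python
import torch

def reshape_for_mdpd(ema: torch.Tensor, e_cs: torch.Tensor,
                     e_ce: torch.Tensor, t_pd: int) -> torch.Tensor:
    """Prepare EMA signals for the MDPD input, following Figure 4.

    ema : (C, T) tensor of C articulatory traces of length T.
    e_cs: (1, T) channel split embedding placed between adjacent traces.
    e_ce: (1, T) channel end embedding appended after the last trace.
    t_pd: sub-phoneme duration used as the new channel size.
    """
    C, T = ema.shape
    assert T % t_pd == 0, "T must be divisible by the phoneme duration"

    # Interleave split embeddings between traces, then append the end
    # embedding, yielding C_D = 2 * C channels (layout assumed).
    rows = []
    for c in range(C):
        rows.append(ema[c:c + 1])
        rows.append(e_cs if c < C - 1 else e_ce)
    x = torch.cat(rows, dim=0)                  # (C_D, T)
    c_d = x.shape[0]

    # Fold time so each chunk of t_pd samples becomes a channel:
    # (C_D, T) -> (C_D, T/t_pd, t_pd) -> (t_pd, C_D * (T/t_pd)).
    x = x.view(c_d, T // t_pd, t_pd).permute(2, 0, 1)
    return x.reshape(t_pd, c_d * (T // t_pd))
```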