Group-Aware Partial Model Merging for Children's Automatic Speech Recognition

Thomas Rolland, Alberto Abad

Abstract

While supervised fine-tuning of adult pre-trained models for children's ASR has shown promise, it often fails to capture group-specific characteristics and variations among children. To address this, we introduce GRoup-Aware PARtial model Merging (GRAPAM), a parameter-efficient approach that combines unsupervised clustering, partial fine-tuning, and model merging. Our approach adapts adult pre-trained models to children by first grouping the children's data based on acoustic similarity. Each group is then used to partially fine-tune an adult pre-trained model, and the resulting models are merged at the parameter level. Experiments conducted on the MyST children's speech corpus indicate that GRAPAM achieves a relative WER improvement of 6% and, using the same amount of data, outperforms full fine-tuning while training fewer parameters.
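The final merging step described above can be illustrated with a minimal sketch. The abstract does not state the merging rule, so this example assumes uniform averaging of the fine-tuned parameters across groups, with all untouched parameters copied from the adult model. The function `merge_partial`, the parameter names, and the toy values are hypothetical and serve only as illustration; they are not the authors' implementation.

```python
from typing import Dict, List

# A "model" here is just a mapping from parameter name to a flat list of floats.
Params = Dict[str, List[float]]


def merge_partial(base: Params, group_models: List[Params], tuned_keys: List[str]) -> Params:
    """Average the partially fine-tuned parameters across group models.

    Assumption: uniform averaging. Parameters not listed in `tuned_keys`
    were frozen during fine-tuning, so they are copied from `base`.
    """
    if not group_models:
        raise ValueError("need at least one group model")
    # Start from a copy of the adult pre-trained parameters (theta_0).
    merged: Params = {name: list(values) for name, values in base.items()}
    for key in tuned_keys:
        # Walk the same parameter position across all group models.
        columns = zip(*(model[key] for model in group_models))
        merged[key] = [sum(col) / len(group_models) for col in columns]
    return merged


# Toy example: two groups, only "encoder.w" was fine-tuned (hypothetical names).
theta_0 = {"encoder.w": [1.0, 2.0], "decoder.w": [0.5, 0.5]}
group_a = {"encoder.w": [1.5, 2.0], "decoder.w": [0.5, 0.5]}
group_b = {"encoder.w": [0.5, 3.0], "decoder.w": [0.5, 0.5]}

merged = merge_partial(theta_0, [group_a, group_b], tuned_keys=["encoder.w"])
print(merged)
```

Because only the entries in `tuned_keys` are averaged, the merged model has the same size as a single adult model regardless of the number of groups.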

Paper Structure

This paper contains 16 sections, 6 equations, 1 figure, 5 tables.

Figures (1)

  • Figure 1: Overview of the four stages of the GRAPAM pipeline, where $\mathcal{D}$ denotes the full dataset and $\theta_0$ the pre-trained adult model parameters. The fire icon denotes stages with ASR fine-tuning.