Dynamic Data Pruning for Automatic Speech Recognition

Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu

TL;DR

This paper presents the first investigation of dynamic data pruning for ASR, finding that full-data performance can be reached by dynamically selecting only 70% of the data, and introduces Dynamic Data Pruning for ASR (DDP-ASR), which offers several fine-grained pruning granularities specifically tailored to speech datasets.

Abstract

The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed substantial computational demands. While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application in ASR has barely been explored, and existing works often entail significant overhead to achieve meaningful results. To fill this gap, this paper presents the first investigation of dynamic data pruning for ASR, finding that we can reach full-data performance by dynamically selecting 70% of the data. Furthermore, we introduce Dynamic Data Pruning for ASR (DDP-ASR), which offers several fine-grained pruning granularities specifically tailored to speech datasets, going beyond the conventional pruning of entire time sequences. Our extensive experiments show that DDP-ASR can save up to 1.6x training time with negligible performance loss.

Paper Structure

This paper contains 17 sections, 1 equation, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Instance-wise pruning approaches: (a) Easy: Instances with the lowest scores are selected. (b) Hard: Instances with the highest scores are selected. (c) Easy2hard: Following the essence of curriculum learning, models initially train on relatively easy instances and progressively shift focus to more challenging ones as the training progresses.
  • Figure 2: A toy example comparing different time-wise pruning approaches: (a) Point Dropping, (b) Chunk Dropping, where signals highlighted in gray are pruned during training.
  • Figure 3: The distribution of length on Librispeech and LRS3.
  • Figure 4: Comparing instance-wise pruning strategies across three subsets of the Librispeech "test-clean" set, with an instance-wise kept ratio of 50% for each method.
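The instance-wise strategies described in Figure 1 (easy, hard, easy2hard) and the time-wise chunk dropping of Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the use of per-instance difficulty scores, and the sliding-window interpretation of easy2hard are assumptions introduced here.

```python
import random

def select_instances(scores, keep_ratio, strategy, progress=0.0):
    """Select a subset of training instances by difficulty score.

    scores: per-instance difficulty scores (e.g. loss), higher = harder (assumed).
    keep_ratio: fraction of instances to keep.
    strategy: "easy" keeps the lowest-scoring instances, "hard" the highest,
              "easy2hard" shifts from easy to hard as training progresses.
    progress: training progress in [0, 1], used only by "easy2hard".
    Returns the indices of the kept instances.
    """
    n_keep = max(1, int(len(scores) * keep_ratio))
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # easiest first
    if strategy == "easy":
        return order[:n_keep]
    if strategy == "hard":
        return order[-n_keep:]
    if strategy == "easy2hard":
        # slide the selection window from the easy end toward the hard end
        start = int(progress * (len(scores) - n_keep))
        return order[start:start + n_keep]
    raise ValueError(f"unknown strategy: {strategy}")

def chunk_drop(signal, drop_ratio, chunk_size, seed=0):
    """Time-wise chunk dropping: prune contiguous chunks of one utterance
    rather than the entire sequence (cf. Figure 2b)."""
    rng = random.Random(seed)
    n_chunks = len(signal) // chunk_size
    n_drop = int(n_chunks * drop_ratio)
    dropped = set(rng.sample(range(n_chunks), n_drop))
    kept = []
    for i in range(n_chunks):
        if i not in dropped:
            kept.extend(signal[i * chunk_size:(i + 1) * chunk_size])
    kept.extend(signal[n_chunks * chunk_size:])  # trailing remainder, if any
    return kept
```

For example, with four instances and a 50% kept ratio, "easy" returns the two lowest-scoring indices and "hard" the two highest; at `progress=0.0`, "easy2hard" coincides with "easy" and gradually moves toward "hard" as training advances.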