
Complexity boosted adaptive training for better low resource ASR performance

Hongxuan Lu, Shenjian Wang, Biao Li

TL;DR

The paper tackles rigid training strategies in low-resource ASR by introducing complexity-boosted adaptive (CBA) training. CBA is a two-stage framework that adjusts data augmentation and intermediate CTC regularization according to sample complexity. It defines a MinMax-IBF policy that computes per-sample complexity via $x_i = \frac{L_i - L_{min}}{L_{max} - L_{min}}$ and $f_{DA} = 1 - \mathrm{IBF}(x_i)$. It couples this with batch-level regularization through $L_{total} = (1-\lambda)L_{CTC} + f_{CTC}\lambda L_{InterCTC}$, where $f_{CTC}$ aggregates complexity across the batch. The method is evaluated on AISHELL-1 and LibriSpeech 100h using a WeNet Conformer setup, with SpecAugment baselines for comparison. It achieves up to 13.4% and 14.1% relative WER reductions on the LibriSpeech 100h test-clean and test-other sets, respectively, and up to a 6.3% relative CER reduction on AISHELL-1. Ablations confirm the contributions of the MinMax-IBF policy and of the two-stage training. The approach improves accuracy without extra inference cost and may generalize to other internal modules and datasets.
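The TL;DR formulas can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' WeNet code:

- $L_i$ is taken to be some per-sample score, for example a per-utterance loss, with the min and max taken over the batch.
- IBF is not defined in this summary, so the caller supplies it as a function. The example uses the identity as a placeholder.
- The helper `batch_factor` is hypothetical. It reduces the $x_i$ to $f_{CTC}$ with a simple mean.
- Falling back to zeros when all scores are equal is also a hypothetical choice.

```python
from typing import Callable, List, Sequence


def minmax_complexity(scores: Sequence[float]) -> List[float]:
    """Min-max normalise per-sample scores L_i to x_i in [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Hypothetical fallback: all samples look equally complex.
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def augmentation_factors(x: Sequence[float],
                         ibf: Callable[[float], float]) -> List[float]:
    """f_DA = 1 - IBF(x_i); IBF is supplied by the caller (not defined here)."""
    return [1.0 - ibf(xi) for xi in x]


def batch_factor(x: Sequence[float]) -> float:
    """Hypothetical aggregation of per-sample complexity into f_CTC (batch mean)."""
    return sum(x) / len(x)


def total_loss(loss_ctc: float, loss_inter_ctc: float,
               f_ctc: float, lam: float) -> float:
    """L_total = (1 - lambda) * L_CTC + f_CTC * lambda * L_InterCTC."""
    return (1.0 - lam) * loss_ctc + f_ctc * lam * loss_inter_ctc


if __name__ == "__main__":
    scores = [2.0, 4.0, 6.0]                          # made-up per-sample scores
    x = minmax_complexity(scores)                     # [0.0, 0.5, 1.0]
    f_da = augmentation_factors(x, ibf=lambda v: v)   # identity placeholder for IBF
    f_ctc = batch_factor(x)
    print(x, f_da, f_ctc, total_loss(10.0, 20.0, f_ctc, lam=0.5))
```

With the identity placeholder, the least complex sample in the batch gets $f_{DA} = 1$ and the most complex gets $f_{DA} = 0$. A real IBF would reshape this mapping.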

Abstract

Throughout the training of an ASR model, the intensity of data augmentation and the way the training loss is computed are typically fixed by preset parameters. For example, SpecAugment masks parts of the time-frequency spectrum with a predefined augmentation strength. Similarly, in CTC-based multi-layer models, the loss is generally computed from the output of the encoder's final layer. However, ignoring the dynamic characteristics of the training samples may lead to suboptimally trained models. To address this issue, we present a two-stage training method called complexity-boosted adaptive (CBA) training. CBA dynamically adjusts the data augmentation strategy and CTC loss propagation according to the complexity of the training samples. In the first stage, we train the model with intermediate-CTC-based regularization and data augmentation without any adaptive policy. In the second stage, we propose a novel adaptive policy, called MinMax-IBF, which calculates the complexity of samples. We apply this policy to data augmentation and intermediate CTC loss regularization to continue training. The proposed CBA training approach yields considerable improvements over the Conformer architecture in WeNet. It achieves up to 13.4% and 14.1% relative WER reduction on the LibriSpeech 100h test-clean and test-other sets, respectively, and up to 6.3% relative reduction on the AISHELL-1 test set.
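One way the complexity factor $f_{DA}$ could modulate SpecAugment-style masking is sketched below. Several details are illustrative choices, not settings taken from the paper:

- $f_{DA}$ linearly scales the maximum time and frequency mask widths.
- Only a single time mask and a single frequency mask are applied.
- The default widths are placeholders.

The helper `adaptive_mask` is hypothetical and operates on a plain list-of-lists spectrogram.

```python
import random
from typing import List, Optional

Spectrogram = List[List[float]]  # indexed as spec[time][freq]


def adaptive_mask(spec: Spectrogram, f_da: float,
                  max_time_width: int = 40, max_freq_width: int = 10,
                  rng: Optional[random.Random] = None) -> Spectrogram:
    """Apply one time mask and one frequency mask whose maximum widths are
    scaled by the per-sample factor f_DA (clamped to [0, 1])."""
    if rng is None:
        rng = random.Random()
    f_da = min(max(f_da, 0.0), 1.0)
    out = [row[:] for row in spec]  # copy so the caller's input is untouched
    n_time, n_freq = len(out), len(out[0])

    # Assumption: f_DA linearly scales the largest allowed mask width.
    t_max = min(n_time, int(round(f_da * max_time_width)))
    f_max = min(n_freq, int(round(f_da * max_freq_width)))

    # Sample widths uniformly in [0, max], then a start that keeps the mask in range.
    t_w, f_w = rng.randint(0, t_max), rng.randint(0, f_max)
    t0, f0 = rng.randint(0, n_time - t_w), rng.randint(0, n_freq - f_w)

    for t in range(t0, t0 + t_w):      # time mask: zero whole frames
        out[t] = [0.0] * n_freq
    for row in out:                    # frequency mask: zero a band of bins
        for f in range(f0, f0 + f_w):
            row[f] = 0.0
    return out
```

For example, calling `adaptive_mask(spec, f_da=0.0)` returns an unchanged copy of `spec`. Calling it with `f_da=1.0` allows masks up to the full default widths.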

Paper Structure

This paper contains 12 sections, 10 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: The two stages of proposed CBA training.
  • Figure 2: The two passes for the continued training stage.