DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models

Tzu-Quan Lin; Hung-yi Lee; Hao Tang

TL;DR

DAISY (Data Adaptive Self-Supervised Early Exit) decides when to exit based on the self-supervised loss, eliminating the need for multiple rounds of training and for fine-tuning the entire pretrained model.

Abstract

Self-supervised speech models have been shown to be useful for various tasks, but their large size limits their use on devices with low computing power and memory. In this work, we explore early exit, an approach for reducing latency by exiting the forward process of a network early. Most early exit approaches need a separate early exit model for each task, and some even require fine-tuning the entire pretrained model. We introduce Data Adaptive Self-Supervised Early Exit (DAISY), an approach that decides when to exit based on the self-supervised loss, eliminating the need for multiple rounds of training and fine-tuning. DAISY matches the performance of HuBERT on the MiniSUPERB benchmark with much faster inference. Our analysis of the adaptivity of DAISY shows that the model exits early (using fewer layers) on clean data and exits late (using more layers) on noisy data, dynamically adjusting the computational cost of inference to the noise level of each sample.
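
The exit decision described above can be illustrated with a minimal sketch. It assumes, following the Figure 1 caption, that each early exit branch applies a linear classifier to the layer output and computes an entropy. The threshold rule (exit at the first layer whose mean frame entropy falls below a fixed value) and the names `choose_exit_layer`, `mean_frame_entropy`, and `threshold` are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame's branch logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_frame_entropy(frame_logits: Sequence[Sequence[float]]) -> float:
    """Average the per-frame entropy over all frames of one utterance."""
    return sum(entropy(softmax(f)) for f in frame_logits) / len(frame_logits)


def choose_exit_layer(
    branch_logits_per_layer: Iterable[Sequence[Sequence[float]]],
    threshold: float,  # assumed hyperparameter, not a value from the paper
) -> Tuple[int, float]:
    """Return (1-based exit layer, entropy at that layer).

    Layers are consumed one at a time, so a generator can be passed in and
    later layers are never computed once the exit condition is met. If no
    layer is confident enough, the last layer is used.
    """
    layer, h = 0, float("inf")
    for layer, frame_logits in enumerate(branch_logits_per_layer, start=1):
        h = mean_frame_entropy(frame_logits)
        if h < threshold:
            break
    return layer, h


if __name__ == "__main__":
    uncertain = [[0.0, 0.0, 0.0]]   # uniform prediction, high entropy
    confident = [[10.0, 0.0, 0.0]]  # peaked prediction, low entropy
    print(choose_exit_layer([uncertain, uncertain, confident], threshold=0.5))
```

Under this assumed rule, an utterance whose branch predictions stay uncertain in the lower layers keeps running through more layers, which matches the qualitative behavior the abstract reports for noisy inputs.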

Paper Structure (12 sections, 4 equations, 5 figures, 1 table)

Introduction
Related Work
Methodology
Training early exit branches
Training downstream tasks
Exit strategies at inference time
Experiments
Downstream performance and speed-up
Noise adaptivity of DAISY
Applications of noise adaptivity
Conclusion
Acknowledgement

Figures (5)

Figure 1: Three stages of DAISY: (a) the training of early exit branches, (b) the training of downstream models, and (c) early exit at inference time. EEB is used to denote early exit branches, where the linear classifier and entropy computation happen. Model parameters are frozen when the boxes are in gray; model parameters are being trained when the boxes are in orange.
Figure 2: The average entropy of each early exit branch on four datasets of MiniSUPERB [wang2023minisuperb].
Figure 3: Comparison of DAISY and the early exit baseline at a fixed layer. The colored dots represent DAISY, while the black crosses represent early exit at the 6th layer. For each downstream task, we present the results of three different $\rho$ values (0.7, 0.76, 1.0) combined with the three exit strategies at inference time, for a total of 9 dots.
Figure 4: Violin plot of the early exit probability at each layer when applying different levels of MUSAN noise [snyder2015musan] to the Librispeech test-clean set. The horizontal lines mark the maximum, average, and minimum early exit layer.
Figure 5: The word error rate of DAISY on samples with different levels of noise. HuBERT-first-6L represents the results of statically exiting at the 6th layer.
