DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition

Hang Shao, Bei Liu, Wei Wang, Xun Gong, Yanmin Qian

TL;DR

DQ-Whisper is a novel joint distillation and quantization framework that compresses Whisper for efficient inference. The proposed distillation approach effectively enhances the multilingual capabilities of small Whisper models without increasing computational costs.

Abstract

Whisper, a popular multilingual and multitask pre-trained speech model, suffers from the curse of multilinguality. To enhance multilingual capabilities in small Whisper models, we propose DQ-Whisper, a novel joint distillation and quantization framework to compress Whisper for efficient inference. First, we propose a novel dynamic matching distillation strategy. Then, we introduce a quantization-aware distillation framework that integrates quantization with distillation. Experimental results on various multilingual datasets show that the proposed distillation approach can effectively enhance the multilingual capabilities of small Whisper models without increasing computational costs. Up to a 5.18x reduction in model size is achieved with marginal performance degradation. In addition, quantization is compatible with distillation, so combining the two can yield a higher compression rate.

Paper Structure

This paper contains 14 sections, 8 equations, 1 figure, and 4 tables.

Figures (1)

  • Figure 1: Comparison of three different layer selection methods in hidden layer distillation. Red denotes the teacher layers and green denotes the student layers.