AFD-SLU: Adaptive Feature Distillation for Spoken Language Understanding

Yan Xie; Yibo Cui; Liang Xie; Erwei Yin

TL;DR

This paper tackles data scarcity and deployment cost in SLU by introducing Adaptive Feature Distillation for SLU (AFD-SLU), which uses a frozen General Text Embeddings (GTE) teacher to guide a lightweight joint SLU student through a dynamic adapter. The adapter combines a Residual Projection Neural Network (RPNN), which aligns teacher and student features, with a Dynamic Distillation Coefficient (DDC), which balances distillation against task learning. The training loss combines L_task and L_distill under a cosine-annealed schedule. Experiments on the Chinese ProSLU benchmark show state-of-the-art intent accuracy, slot F1, and overall accuracy, and ablations confirm that both RPNN and DDC contribute to these gains. The work highlights the practical potential of task-aligned GTE teachers for resource-constrained SLU and points to future improvements via data augmentation and model compression.

Abstract

Spoken Language Understanding (SLU) is a core component of conversational systems, enabling machines to interpret user utterances. Despite its importance, developing effective SLU systems remains challenging due to the scarcity of labeled training data and the computational burden of deploying Large Language Models (LLMs) in real-world applications. To alleviate these issues, we propose an Adaptive Feature Distillation framework that transfers rich semantic representations from a General Text Embeddings (GTE)-based teacher model to a lightweight student model. Our method introduces a dynamic adapter equipped with a Residual Projection Neural Network (RPNN) to align heterogeneous feature spaces, and a Dynamic Distillation Coefficient (DDC) that adaptively modulates the distillation strength based on real-time feedback from intent and slot prediction performance. Experiments on the Chinese profile-based ProSLU benchmark demonstrate that AFD-SLU achieves state-of-the-art results, with 95.67% intent accuracy, 92.02% slot F1 score, and 85.50% overall accuracy.
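As a rough illustration of how a task loss and a distillation loss might be combined under a cosine-annealed weight, the sketch below uses plain Python. It is not the authors' implementation: the function names, the default bounds `lam_max` and `lam_min`, the decaying direction of the schedule, and the additive form `L_task + lambda * L_distill` are all assumptions made for illustration. The performance-based feedback that the DDC uses is not modeled here.

```python
import math


def cosine_annealed_coefficient(step, total_steps, lam_max=1.0, lam_min=0.0):
    """Hypothetical schedule: move the distillation weight from lam_max
    (at step 0) to lam_min (at total_steps) along a half cosine curve."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so out-of-range steps stay bounded.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lam_min + 0.5 * (lam_max - lam_min) * (1.0 + math.cos(math.pi * progress))


def combined_loss(task_loss, distill_loss, step, total_steps):
    """Assumed combination: L_task + lambda(step) * L_distill."""
    lam = cosine_annealed_coefficient(step, total_steps)
    return task_loss + lam * distill_loss
```

Under these assumptions the distillation term dominates early in training, when the student benefits most from the teacher's features, and fades so that the task loss drives the final updates. A schedule running in the opposite direction would only require swapping `lam_max` and `lam_min`.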
