Evaluating and Improving Continual Learning in Spoken Language Understanding
Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj
TL;DR
This work addresses the challenge of evaluating continual learning in Spoken Language Understanding (SLU) by introducing the Dual-transfer Matching Index (DMI), a unified metric that disentangles stability, plasticity, and generalizability. It couples DMI with a class-incremental SLU pipeline and multiple knowledge distillation (KD) techniques (Audio-KD, Seq-KD, Sent-KD) to improve the three properties during sequential task learning. The authors demonstrate that DMI provides a more sensitive and comprehensive view of model behavior than existing metrics and that KD strategies can simultaneously enhance stability, plasticity, and generalizability. The framework is validated on FSC and SLURP, where it shows improved continual learning performance and reveals task-order effects that prior metrics often miss. This has practical relevance for robust SLU systems in evolving environments.
Abstract
Continual learning has emerged as an increasingly important challenge across various tasks, including Spoken Language Understanding (SLU). In SLU, its objective is to effectively handle the emergence of new concepts and evolving environments. Continual learning algorithms are typically evaluated on three fundamental properties of the model: stability, plasticity, and generalizability. However, existing continual learning metrics primarily focus on only one or two of these properties. They neglect the overall performance across all tasks and do not adequately disentangle the trade-off between plasticity and stability/generalizability within the model. In this work, we propose an evaluation methodology that provides a unified evaluation of stability, plasticity, and generalizability in continual learning. By employing the proposed metric, we demonstrate how introducing various knowledge distillation techniques can improve different aspects of these three properties of the SLU model. We further show that our proposed metric is more sensitive in capturing the impact of task ordering in continual learning, making it better suited for practical use-case scenarios.
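To make the point about existing metrics concrete, the sketch below computes three quantities that are widely used in the general continual learning literature: average accuracy, backward transfer, and forward transfer. It is not the metric proposed in this work, whose definition is not given in this section. The accuracy matrix `R`, the `baseline` vector, and all function names are illustrative assumptions, and the numbers are made up. Here `R[i][j]` is taken to be the test accuracy on task `j` after training on tasks `0..i`, and `baseline[j]` the accuracy of an untrained model on task `j`.

```python
def average_accuracy(R):
    """Mean accuracy over all tasks after training on the final task."""
    T = len(R)
    return sum(R[T - 1][j] for j in range(T)) / T


def backward_transfer(R):
    """Mean change on earlier tasks between when each was learned and the end.

    Negative values indicate forgetting, so this is commonly read as a
    stability signal.
    """
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


def forward_transfer(R, baseline):
    """Mean gain on each task just before it is trained, relative to an
    untrained baseline. Commonly read as generalization to unseen tasks.
    """
    T = len(R)
    return sum(R[j - 1][j] - baseline[j] for j in range(1, T)) / (T - 1)


# Hypothetical 3-task accuracy matrix (rows: after training task i,
# columns: evaluated on task j). Values are invented for illustration.
R = [
    [0.90, 0.20, 0.10],
    [0.70, 0.80, 0.30],
    [0.60, 0.70, 0.85],
]
baseline = [0.10, 0.10, 0.10]  # assumed untrained-model accuracy per task

acc = average_accuracy(R)
bwt = backward_transfer(R)
fwt = forward_transfer(R, baseline)
print(acc, bwt, fwt)
```

Each of these numbers summarizes a single aspect of the accuracy matrix, which illustrates why reading them separately makes it hard to see the trade-off among the three properties at once.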
