Evaluating and Improving Continual Learning in Spoken Language Understanding

Muqiao Yang; Xiang Li; Umberto Cappellazzo; Shinji Watanabe; Bhiksha Raj

TL;DR

This work addresses the challenge of evaluating continual learning in Spoken Language Understanding (SLU) by introducing the Dual-transfer Matching Index (DMI), a unified metric that disentangles stability, plasticity, and generalizability. It couples DMI with a class-incremental SLU pipeline and three knowledge distillation techniques (Audio-KD, Seq-KD, Sent-KD) to improve these properties during sequential task learning. The authors demonstrate that DMI gives a more sensitive and comprehensive view of model behavior than existing metrics, and that KD strategies can enhance stability, plasticity, and generalizability at the same time. The framework is validated on FSC and SLURP, where it shows improved continual learning performance and reveals task-order effects that prior metrics often miss, which matters for building robust SLU systems in evolving environments.

Abstract

Continual learning has emerged as an increasingly important challenge across various tasks, including Spoken Language Understanding (SLU). In SLU, the objective is to handle newly emerging concepts and evolving environments effectively. Continual learning algorithms are typically evaluated on three fundamental properties of the model: stability, plasticity, and generalizability. However, existing continual learning metrics focus primarily on only one or two of these properties. They neglect the overall performance across all tasks and do not adequately disentangle the trade-off between plasticity and stability/generalizability within the model. In this work, we propose an evaluation methodology that provides a unified evaluation of stability, plasticity, and generalizability in continual learning. Using the proposed metric, we demonstrate how introducing various knowledge distillation techniques can improve different aspects of these three properties of the SLU model. We further show that the proposed metric is more sensitive in capturing the impact of task ordering in continual learning, making it better suited to practical use cases.

Paper Structure (21 sections, 12 equations, 4 figures, 2 tables)

Introduction
Evaluation in Continual Learning
Stability-plasticity Dilemma
Formulation of Continual Metrics
Continual Learning with Unified Evaluation
DMI: Dual-transfer Matching Index
Pipeline Formulation
Knowledge Distillations
Audio-KD.
Seq-KD.
Sent-KD.
Experiments
Experimental Setup
Evaluation Results and Analysis
Effect of Task Ordering on Evaluation
...and 6 more sections

Figures (4)

Figure 1: An illustration of the proposed Dual-transfer Matching Index (DMI) vs. other continual learning metrics, including backward transfer and forward transfer (Lopez-Paz and Ranzato, 2017). The vertical and horizontal axes represent the sequence of tasks presented to the network for learning. $T$ is the total number of tasks. A circle at index $(i,j)$ denotes the evaluation on task $j$ after training on task $i$ has finished. A green circle indicates the evaluated performance on a seen task, while a grey circle indicates an evaluation on unseen classes from future tasks. By covering the whole $T \times T$ matrix, DMI evaluates three aspects of model capability: stability, plasticity, and generalizability.
Figure 2: Train-evaluation performance matrix $\mathbf{A}$ in continual learning. The diagonal entries (red) capture plasticity, i.e., the performance on the current task. The lower triangular part (blue) is the region relevant to stability, while the upper triangular part (grey) measures generalizability to unseen tasks.
Figure 3: Pipeline overview of our SLU training. Dashed blocks indicate the knowledge distillation from previous tasks.
Figure 4: Qualitative results showing how clustering changes across tasks.
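
The three regions of the train-evaluation matrix described in the Figure 1 and Figure 2 captions can be made concrete with a small sketch. It assumes the caption's convention that entry $(i,j)$ is the performance on task $j$ after training on task $i$, and it simply averages each region. This is not the DMI formula, which is not given on this page; the function name `summarize_regions` and the accuracy values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def summarize_regions(A):
    """Average the three regions of a T x T train-evaluation matrix.

    A[i][j] is assumed to hold the performance on task j after
    training on task i (the convention in the figure captions).
    This is an illustrative summary, NOT the paper's DMI formula.
    """
    T = len(A)
    if any(len(row) != T for row in A):
        raise ValueError("A must be square (T x T)")

    # Diagonal: performance on the task that was just learned.
    diagonal = [A[i][i] for i in range(T)]
    # Lower triangle (i > j): earlier tasks evaluated after later training.
    lower = [A[i][j] for i in range(T) for j in range(i)]
    # Upper triangle (i < j): future tasks that have not been trained yet.
    upper = [A[i][j] for i in range(T) for j in range(i + 1, T)]

    return {
        "plasticity": mean(diagonal),
        "stability": mean(lower) if lower else None,
        "generalizability": mean(upper) if upper else None,
    }


# Hypothetical accuracies for T = 3 tasks (made-up numbers, not from the paper).
A = [
    [0.90, 0.20, 0.10],
    [0.80, 0.85, 0.15],
    [0.70, 0.75, 0.80],
]
summary = summarize_regions(A)
print(summary)
```

Reading the regions separately shows why a metric restricted to one triangle can miss part of the picture: a model can score well on the diagonal while the lower triangle degrades as more tasks are added.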
