TaSL: Continual Dialog State Tracking via Task Skill Localization and Consolidation

Yujie Feng, Xu Chu, Yongxin Xu, Guangyuan Shi, Bo Liu, Xiao-Ming Wu

TL;DR

TaSL tackles continual dialogue state tracking (DST) by separating task-specific from task-shared knowledge with a group-wise skill localization metric, then consolidating the two with fine-grained model averaging. It does not rely on memory replay, yet it achieves forward and backward knowledge transfer while mitigating catastrophic forgetting, as demonstrated across multiple backbones, including large models tuned with parameter-efficient fine-tuning (PEFT). The key contributions are a gradient-based importance score with smoothing, cumulative accumulation of importance over past tasks, and a case-based averaging strategy that preserves past-task knowledge while absorbing new-task knowledge. Empirical results on tasks from the Schema-Guided Dialogue (SGD) dataset show gains in average joint goal accuracy (Avg. JGA) and more favorable backward transfer, approaching memory-replay baselines while remaining memory-efficient.

Abstract

A practical dialogue system requires the capacity for ongoing skill acquisition and adaptability to new tasks while preserving prior knowledge. However, current methods for Continual Dialogue State Tracking (DST), a crucial function of dialogue systems, struggle with the catastrophic forgetting issue and knowledge transfer between tasks. We present TaSL, a novel framework for task skill localization and consolidation that enables effective knowledge transfer without relying on memory replay. TaSL uses a novel group-wise technique to pinpoint task-specific and task-shared areas. Additionally, a fine-grained skill consolidation strategy protects task-specific knowledge from being forgotten while updating shared knowledge for bi-directional knowledge transfer. As a result, TaSL strikes a balance between preserving previous knowledge and excelling at new tasks. Comprehensive experiments on various backbones highlight the significant performance improvements of TaSL over existing state-of-the-art methods. The source code is provided for reproducibility.
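
To make the importance scoring with smoothing and the cumulative accumulation mentioned in the TL;DR more concrete, here is a minimal sketch. It is not the authors' implementation: the `|w * g|` sensitivity form, the max-normalisation, the function names, and the values of `alpha` and `beta` are all assumptions chosen for illustration.

```python
import numpy as np


def unit_importance(weights, grads):
    """Score one skill unit as the mean of |w * g| over its entries (assumed form)."""
    return float(np.mean(np.abs(weights * grads)))


def smooth(running, new_score, alpha=0.85):
    """Exponential moving average over training steps; alpha is a hypothetical value."""
    return alpha * running + (1.0 - alpha) * new_score


def normalise(scores):
    """Scale a {unit: score} dict so that its largest score becomes 1."""
    top = max(scores.values())
    return {name: (s / top if top > 0 else 0.0) for name, s in scores.items()}


def accumulate(past, current, beta=0.7):
    """Blend normalised past and current scores; beta is a hypothetical weight."""
    past_n, current_n = normalise(past), normalise(current)
    return {name: beta * past_n[name] + (1.0 - beta) * current_n[name]
            for name in current}
```

In this sketch a "skill unit" is just a named group of parameters (for example one weight matrix), so each unit gets a single score per task. Normalising before blending keeps tasks with different gradient scales comparable.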

Paper Structure

This paper contains 33 sections, 11 equations, 5 figures, 11 tables, 2 algorithms.

Figures (5)

  • Figure 1: Conceptual illustration of TaSL. By identifying task-relevant areas across both previously accumulated and current tasks, we can consolidate the task-specific and task-shared parameters to facilitate efficient knowledge transfer and mitigate forgetting.
  • Figure 2: Overview of TaSL. Step 1: We compute the importance scores of skill units for the current task $\mathcal{T}_{k}$ using our importance-aware skill localization method during fine-tuning. Step 2: Based on a fine-grained model averaging strategy, the skill consolidation method merges the model $\hat{f}_{k-1}$, which accumulates knowledge of all previous tasks, with the current task's model $f_k$. The integration is guided by the importance distributions of skill units across various tasks. We then update the cumulative importance scores for all skill units up to task $\mathcal{T}_{k}$ using Eq. (\ref{eq:norm}). This process is repeated each time a subsequent task is introduced.
  • Figure 3: Performance of TaSL with different backbones.
  • Figure 4: Performance trajectory of Task 1 during the Continual DST learning process.
  • Figure 5: Visualization of importance scores for skill units across different backbone models and tasks.
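
Step 2 in the Figure 2 caption, merging $\hat{f}_{k-1}$ with $f_k$ under the guidance of importance scores, can be sketched as a case-based average over skill units. This is only an illustration under stated assumptions: the single `threshold`, the equal `mix` weight, and the handling of units that matter to neither side are hypothetical choices, not values or rules taken from the paper.

```python
import numpy as np


def consolidate(f_hat_prev, f_k, prev_imp, curr_imp, threshold=0.5, mix=0.5):
    """Merge two models unit by unit, guided by per-unit importance scores.

    f_hat_prev / f_k: {unit name: ndarray} for the accumulated and current model.
    prev_imp / curr_imp: {unit name: score in [0, 1]} (assumed normalised).
    threshold and mix are hypothetical hyperparameters.
    """
    merged = {}
    for name in f_k:
        past_important = prev_imp[name] >= threshold
        new_important = curr_imp[name] >= threshold
        if past_important and not new_important:
            # Specific to earlier tasks: keep the old weights to avoid forgetting.
            merged[name] = f_hat_prev[name].copy()
        elif new_important and not past_important:
            # Specific to the new task: adopt the freshly trained weights.
            merged[name] = f_k[name].copy()
        else:
            # Shared (or important to neither, an assumption here): average.
            merged[name] = mix * f_hat_prev[name] + (1.0 - mix) * f_k[name]
    return merged
```

Because the merge needs only the two sets of weights and their importance scores, no samples from earlier tasks have to be stored, which is the sense in which the approach avoids memory replay.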