TaSL: Continual Dialog State Tracking via Task Skill Localization and Consolidation
Yujie Feng, Xu Chu, Yongxin Xu, Guangyuan Shi, Bo Liu, Xiao-Ming Wu
TL;DR
TaSL tackles continual dialogue state tracking (DST) by separating task-specific from task-shared knowledge with a group-wise skill localization metric, then consolidating that knowledge through fine-grained model averaging. Without relying on memory replay, TaSL achieves strong forward and backward knowledge transfer while mitigating catastrophic forgetting, as demonstrated across multiple backbones, including large models trained with parameter-efficient fine-tuning (PEFT). The key contributions are a gradient-based importance scoring mechanism with smoothing, cumulative accumulation of importance from past tasks, and a case-based averaging strategy that preserves past-task integrity while incorporating new-task knowledge. Empirical results on the Schema-Guided Dialogue (SGD) tasks show notable gains in average joint goal accuracy (Avg. JGA) and more favorable backward transfer, approaching memory-replay baselines and offering a practical, memory-efficient approach to continual learning for DST.
Abstract
A practical dialogue system requires the capacity for ongoing skill acquisition and adaptability to new tasks while preserving prior knowledge. However, current methods for continual Dialogue State Tracking (DST), a crucial function of dialogue systems, struggle with catastrophic forgetting and with knowledge transfer between tasks. We present TaSL, a novel framework for task skill localization and consolidation that enables effective knowledge transfer without relying on memory replay. TaSL uses a novel group-wise technique to pinpoint task-specific and task-shared areas of the model. Additionally, a fine-grained skill consolidation strategy protects task-specific knowledge from being forgotten while updating shared knowledge for bi-directional knowledge transfer. As a result, TaSL strikes a balance between preserving previous knowledge and excelling at new tasks. Comprehensive experiments on various backbones highlight the significant performance improvements of TaSL over existing state-of-the-art methods. The source code is provided for reproducibility.
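
The localize-then-consolidate idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring rule (mean of |weight × gradient| per parameter group), the exponential smoothing factor, the fixed importance threshold, the mixing weight for shared groups, and every function name below are assumptions chosen for illustration. The handling of groups that matter to neither old nor new tasks (here, simply taking the new weights) is also an assumption.

```python
from typing import Dict, List

Group = List[float]


def group_importance(weights: Group, grads: Group) -> float:
    """Mean |w * g| over one parameter group (assumed sensitivity proxy)."""
    return sum(abs(w * g) for w, g in zip(weights, grads)) / len(weights)


def smooth(prev: float, current: float, beta: float = 0.85) -> float:
    """Exponential moving average, used to damp noisy per-batch scores."""
    return beta * prev + (1.0 - beta) * current


def accumulate(past: Dict[str, float], latest: Dict[str, float],
               beta: float = 0.85) -> Dict[str, float]:
    """Fold the latest task's scores into the running past-task importance."""
    return {g: smooth(past.get(g, 0.0), latest[g], beta) for g in latest}


def consolidate(prev_w: Dict[str, Group], new_w: Dict[str, Group],
                past_imp: Dict[str, float], cur_imp: Dict[str, float],
                threshold: float, shared_mix: float = 0.5) -> Dict[str, Group]:
    """Case-based averaging of old and new weights, one group at a time."""
    merged = {}
    for g in new_w:
        past_hit = past_imp[g] >= threshold
        cur_hit = cur_imp[g] >= threshold
        if past_hit and not cur_hit:
            alpha = 0.0          # past-task-specific: keep the old weights
        elif past_hit and cur_hit:
            alpha = shared_mix   # task-shared: blend old and new
        else:
            alpha = 1.0          # new-task-specific or unimportant: take new
        merged[g] = [(1.0 - alpha) * p + alpha * n
                     for p, n in zip(prev_w[g], new_w[g])]
    return merged


# Toy usage with three hypothetical parameter groups.
prev_w = {"a": [1.0], "b": [1.0], "c": [1.0]}
new_w = {"a": [3.0], "b": [3.0], "c": [3.0]}
past_imp = {"a": 0.9, "b": 0.1, "c": 0.9}
cur_imp = {"a": 0.1, "b": 0.9, "c": 0.9}
merged = consolidate(prev_w, new_w, past_imp, cur_imp, threshold=0.5)
```

In this toy run, group "a" is important only to past tasks and keeps its old weights, group "b" is important only to the new task and takes the new weights, and group "c" is shared and is averaged. No replay buffer is involved: only the previous weights and the accumulated importance scores need to be stored.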
