TA&AT: Enhancing Task-Oriented Dialog with Turn-Level Auxiliary Tasks and Action-Tree Based Scheduled Sampling
Longxiang Liu, Xiuxing Li, Yang Feng
TL;DR
The paper tackles two core challenges in task-oriented dialog (TOD): intermediate dialog-state annotations are underused when training the encoder, and errors accumulate during generation when the decoder follows an incorrect action sequence. It introduces turn-level auxiliary tasks that use these annotations to enrich the encoder representation, and action-tree based scheduled sampling that makes the decoder more robust to perturbations in the action sequence. Empirically, TA&AT achieves state-of-the-art results among methods without continual pre-training on MultiWOZ 2.0, 2.1, and 2.2 and performs strongly in low-resource settings, with ablations confirming that both components contribute. The work advances end-to-end TOD by combining turn-level supervision in the encoder with robustness to policy errors in the decoder, enabling more reliable and fluent task-oriented responses.
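The turn-level auxiliary objective can be pictured as adding a state-prediction loss for every labeled turn to the usual response-generation loss. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each auxiliary task is per-turn slot-value classification, averages the auxiliary terms, and scales them by a hypothetical weight `aux_weight`. The helper names `nll` and `turn_level_multitask_loss` and the slot `hotel-area` are illustrative only.

```python
import math
from typing import Dict, List

# Hypothetical layout: for each turn, a predicted distribution over values per slot.
SlotProbs = Dict[str, Dict[str, float]]


def nll(probs: Dict[str, float], gold: str) -> float:
    """Negative log-likelihood of the gold value under a predicted distribution."""
    return -math.log(probs[gold])


def turn_level_multitask_loss(
    gen_loss: float,
    turn_state_probs: List[SlotProbs],
    turn_state_labels: List[Dict[str, str]],
    aux_weight: float = 0.5,  # hypothetical weighting, not taken from the paper
) -> float:
    """Add an auxiliary state-prediction loss over every labeled turn,
    not only the latest one, to the response-generation loss."""
    aux_total, n_terms = 0.0, 0
    for probs, labels in zip(turn_state_probs, turn_state_labels):
        for slot, gold_value in labels.items():
            aux_total += nll(probs[slot], gold_value)
            n_terms += 1
    aux_mean = aux_total / n_terms if n_terms else 0.0
    return gen_loss + aux_weight * aux_mean


# Usage: a two-turn dialog where each turn has its own state label.
probs = [
    {"hotel-area": {"north": 0.5, "south": 0.5}},  # turn 1: uncertain prediction
    {"hotel-area": {"north": 1.0}},                 # turn 2: confident and correct
]
labels = [{"hotel-area": "north"}, {"hotel-area": "north"}]
print(turn_level_multitask_loss(2.0, probs, labels))
```

Because every turn contributes a term, the state labels of earlier turns also shape the encoder, rather than only the latest turn's label.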
Abstract
Task-oriented dialog systems have made substantial progress thanks to conversational pre-training techniques. Yet two significant challenges persist. First, most systems primarily use the latest turn's state label for the generator. This practice overlooks the broader value of the state labels from all turns in improving the model's understanding for subsequent generation. Second, overreliance on the generated policy often leads to error accumulation: when the model adheres to incorrect actions, its responses are suboptimal. To address these challenges, we propose turn-level multi-task objectives for the encoder. Guided by essential information from the labeled intermediate states, the encoder builds a more robust representation for both understanding and generation. For the decoder, we introduce an action-tree based scheduled sampling technique. Specifically, we model the hierarchical policy as a tree and use the similarity between trees to sample negative policies during scheduled sampling, encouraging the model to generate invariant responses under perturbations. By sampling similar negative policies, this method simulates potential pitfalls and bridges the gap between task-oriented dialog training and inference. Among methods without continual pre-training, our approach achieves state-of-the-art (SOTA) performance on the MultiWOZ dataset series and is also competitive with pre-trained SOTA methods.
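Action-tree based scheduled sampling can be sketched as follows, under several assumptions the abstract does not pin down: a policy is a set of hypothetical (domain, act, slot) triples arranged as a domain, act, slot tree; tree similarity is the Jaccard overlap of the two trees' node sets; and the perturbation probability grows on a linear schedule capped at `max_prob`. All function names are hypothetical, and this is an illustration of the idea rather than the paper's method. Negatives are drawn in proportion to their similarity to the gold tree, while the response target stays the gold response.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical flat encoding of one action: (domain, act, slot).
Action = Tuple[str, str, str]
Policy = Sequence[Action]


def tree_nodes(policy: Policy) -> FrozenSet[Tuple[str, ...]]:
    """Represent a policy as a domain -> act -> slot tree, returning each node
    as its path from the root so that shared prefixes become shared nodes."""
    nodes = set()
    for domain, act, slot in policy:
        nodes.update({(domain,), (domain, act), (domain, act, slot)})
    return frozenset(nodes)


def tree_similarity(a: Policy, b: Policy) -> float:
    """Assumed metric: Jaccard overlap of the two trees' node sets."""
    na, nb = tree_nodes(a), tree_nodes(b)
    union = na | nb
    return len(na & nb) / len(union) if union else 1.0


def sampling_prob(step: int, total_steps: int, max_prob: float = 0.5) -> float:
    """Assumed linear schedule: perturb more often as training progresses."""
    return max_prob * min(step / total_steps, 1.0)


def choose_policy(gold: Policy, candidates: List[Policy], step: int,
                  total_steps: int, rng: random.Random) -> List[Action]:
    """Pick the policy the decoder is conditioned on for this example.
    The response target stays the gold response either way, so the model
    is pushed to produce the same response under a perturbed policy."""
    if rng.random() >= sampling_prob(step, total_steps):
        return list(gold)
    gold_tree = tree_nodes(gold)
    negatives = [c for c in candidates if tree_nodes(c) != gold_tree]
    weights = [tree_similarity(gold, c) for c in negatives]
    if not negatives or sum(weights) == 0:
        return list(gold)  # no similar negative available; keep the gold policy
    # Trees closer to the gold tree are proportionally more likely to be drawn.
    return list(rng.choices(negatives, weights=weights, k=1)[0])


gold = [("hotel", "inform", "area"), ("hotel", "request", "price")]
near = [("hotel", "inform", "area"), ("hotel", "request", "stars")]
far = [("train", "inform", "day")]
print(tree_similarity(gold, near), tree_similarity(gold, far))
rng = random.Random(0)
print(choose_policy(gold, [near, far], step=100, total_steps=100, rng=rng))
```

Weighting by similarity rather than sampling uniformly reflects the abstract's aim of simulating plausible mistakes: in the example, `far` shares no tree nodes with the gold policy, so it receives zero weight and is never drawn.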
