TA&AT: Enhancing Task-Oriented Dialog with Turn-Level Auxiliary Tasks and Action-Tree Based Scheduled Sampling

Longxiang Liu; Xiuxing Li; Yang Feng

TL;DR

The paper tackles two core TOD challenges: underutilization of intermediate state annotations for encoder understanding and error accumulation during generation. It introduces turn-level auxiliary tasks to enrich the encoder representation and action-tree based scheduled sampling to improve robustness of the decoder against perturbations in action sequences. Empirically, TA&AT achieves state-of-the-art results among non-continual pre-training methods on MultiWOZ 2.0/2.1/2.2 and exhibits strong performance in low-resource settings, with ablations confirming the contributions of both components. The work advances end-to-end TOD by bridging task-level supervision with sequence-level robustness, enabling more reliable and fluent task-oriented dialogue generation.

Abstract

Task-oriented dialog systems have witnessed substantial progress due to conversational pre-training techniques. Yet two significant challenges persist. First, most systems feed only the latest turn's state label to the generator. This practice overlooks the broader value of state labels in improving the model's understanding for subsequent generation. Second, overreliance on the generated policy often leads to error accumulation, resulting in suboptimal responses when the model adheres to incorrect actions. To combat these challenges, we propose turn-level multi-task objectives for the encoder. Guided by essential information from labeled intermediate states, we establish a more robust representation for both understanding and generation. For the decoder, we introduce an action tree-based scheduled sampling technique. Specifically, we model the hierarchical policy as trees and use the similarity between trees to sample negative policies under scheduled sampling, so that the model learns to generate invariant responses under perturbations. This method simulates potential pitfalls by sampling similar negative policies, bridging the gap between task-oriented dialog training and inference. Among methods without continual pre-training, our approach achieved state-of-the-art (SOTA) performance on the MultiWOZ dataset series and was also competitive with pre-trained SOTA methods.

Paper Structure (30 sections, 13 equations, 5 figures, 3 tables)

Introduction
Related Work
Model Framework
Definitions
Objectives
Methodology
Turn-Level Auxiliary Tasks
Turn representation
Slot Type Prediction
Slot Change Prediction
Action Prediction
Response Keywords Prediction
Action-Tree Based Scheduled Sampling
Action Tree
Scheduled Sampling
...and 15 more sections

Figures (5)

Figure 1: Illustration of task-oriented dialog system.
Figure 2: Illustration of our task-oriented dialog system framework. For simplicity, we show an example dialog in which a user books a restaurant, at turn $t=1$ (turn indices start from 0). The memory module keeps track of the newly generated belief states, db states, acts, and responses.
Figure 3: Overall framework of our proposed methods. The left part shows the process of extracting turn-level representations and passing them to four multi-dimensional Bernoulli/Categorical classification heads. The right part shows the process of action-tree based scheduled sampling: the ground-truth action $\hat{A}$ is replaced with probability $1-p(t)$, in which case a replacement action $A'$ is sampled according to the normalized similarity score. The similarity score is based on the action-tree edit distance, which is discussed in detail in Section \ref{sec:at}.
Figure 4: Learning curve for different tasks in training. X-axis represents the number of training steps and Y-axis represents macro F1-score.
Figure 5: Case Study: Delexicalized responses generated by Mars and TA&AT on MultiWOZ 2.0 test data. 'GT' is short for ground truth.
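
The sampling step described in the Figure 3 caption can be illustrated with a minimal Python sketch. Everything concrete here is an assumption rather than the paper's implementation: actions are represented as sets of hypothetical (domain, act, slot) triples, the distance is a simplified insert/delete count over tree nodes standing in for the action-tree edit distance, the similarity is taken as 1 / (1 + distance), and the linear schedule `keep_probability` with its `floor` parameter is a placeholder for $p(t)$.

```python
import random


def tree_nodes(action):
    """Expand (domain, act, slot) triples into the set of tree-node paths."""
    nodes = set()
    for domain, act, slot in action:
        nodes.update({(domain,), (domain, act), (domain, act, slot)})
    return nodes


def tree_distance(a, b):
    """Simplified stand-in for an edit distance: node insertions plus deletions."""
    return len(tree_nodes(a) ^ tree_nodes(b))


def keep_probability(step, total_steps, floor=0.5):
    """Hypothetical linear schedule p(t): 1.0 at step 0, decaying to `floor`."""
    frac = min(step / total_steps, 1.0)
    return 1.0 - (1.0 - floor) * frac


def sample_action(gold, candidates, step, total_steps, rng=random):
    """Keep the gold action with probability p(t); otherwise draw a negative
    action, weighted so that trees closer to the gold tree are more likely."""
    if rng.random() < keep_probability(step, total_steps):
        return gold
    negatives = [c for c in candidates if c != gold]
    if not negatives:
        return gold
    # Assumed similarity score; random.choices normalizes the weights itself.
    weights = [1.0 / (1.0 + tree_distance(gold, c)) for c in negatives]
    return rng.choices(negatives, weights=weights, k=1)[0]
```

Under these assumptions, a candidate that differs from the gold action in a single slot is drawn more often than one in an unrelated domain, which matches the idea of perturbing the decoder with plausible but incorrect policies.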
