
Multi-Step Dialogue Workflow Action Prediction

Ramya Ramakrishnan, Ethan R. Elenberg, Hashan Narangodage, Ryan McDonald

TL;DR

This work introduces multi-step Action State Tracking (AST) for task-oriented dialogue, proposing that future workflow actions be represented as a branching graph to capture uncertainty in user responses. It compares three practical approaches (fine-tuning on multi-step sequences, few-shot in-context learning with retrieval prompts, and zero-shot graph traversal) for predicting extended action sequences on the ABCD and MultiWoz datasets. The study shows that fine-tuned multi-step models generally achieve the strongest performance across both traditional and new metrics, while graph traversal robustly models future uncertainty without any training, and in-context learning can be competitive, albeit at higher cost. Beyond prediction, multi-step AST improves downstream tasks such as dialogue statistics estimation and automated action execution, yielding up to roughly 20% more automation of system actions than one-step baselines and enabling more efficient human-in-the-loop workflows.
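The few-shot in-context learning approach pairs retrieval with large language model prompting. As a rough illustration, the sketch below retrieves the training dialogues most similar to the current one and formats them, together with their future action sequences, into a prompt. The bag-of-words cosine retriever, the prompt layout, the `build_prompt` helper, and the action names (hypothetical, loosely ABCD-style labels) are all assumptions for illustration; the paper's retriever and prompt format may differ.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector for a dialogue context."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(query, train, k=2):
    """Format a few-shot prompt from the k training examples most similar to query.

    train holds (dialogue context, future action sequence) pairs (hypothetical format).
    """
    q = bow(query)
    nearest = sorted(train, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)[:k]
    shots = [f"Dialogue: {ctx}\nNext actions: {', '.join(actions)}" for ctx, actions in nearest]
    # The current dialogue goes last, leaving the action slot for the LLM to complete.
    shots.append(f"Dialogue: {query}\nNext actions:")
    return "\n\n".join(shots)


# Toy training data with hypothetical action names.
train = [
    ("customer wants a refund for a shirt", ["pull-up-account", "validate-purchase", "offer-refund"]),
    ("customer forgot their password", ["pull-up-account", "reset-password"]),
    ("customer asks about shipping status", ["pull-up-account", "shipping-status"]),
]
prompt = build_prompt("i want a refund for my shirt", train, k=1)
print(prompt)
```

Returning a plain string keeps retrieval independent of any particular LLM client; the model's completion after `Next actions:` would then be parsed back into a multi-step action sequence.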

Abstract

In task-oriented dialogue, a system often needs to follow a sequence of actions, called a workflow, that complies with a set of guidelines in order to complete a task. In this paper, we propose the novel problem of multi-step workflow action prediction, in which the system predicts multiple future workflow actions. Accurate prediction of multiple steps allows for multi-turn automation, which can free up time to focus on more complex tasks. We propose three modeling approaches that are simple to implement yet lead to more action automation: 1) fine-tuning on a training dataset, 2) few-shot in-context learning leveraging retrieval and large language model prompting, and 3) zero-shot graph traversal, which aggregates historical action sequences into a graph for prediction. We show that multi-step action prediction produces features that improve accuracy on downstream dialogue tasks like predicting task success, and can increase automation of steps by 20% without requiring as much feedback from a human overseeing the system.
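The zero-shot graph traversal approach needs no training: historical action sequences are aggregated into a graph that is then traversed to predict future steps. Below is a minimal sketch, assuming a first-order graph whose edge weights are transition counts and a traversal that keeps every successor whose empirical probability is at least a hypothetical `min_prob` cutoff, so the output is itself a branching prediction graph rather than a single sequence. The function names, sentinel nodes, cutoff, and toy action sequences are illustrative assumptions, not the paper's procedure.

```python
from collections import defaultdict

START, END = "<start>", "<end>"  # hypothetical sentinel nodes


def build_graph(histories):
    """Aggregate historical action sequences into transition counts: counts[a][b] = #(a -> b)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in histories:
        path = [START, *seq, END]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_graph(counts, last_action, n_steps=3, min_prob=0.3):
    """Expand a branching prediction graph for up to n_steps after last_action.

    Returns {action: [(successor, empirical probability), ...]} for every expanded node.
    """
    graph = {}
    frontier = [last_action]
    for _ in range(n_steps):
        next_frontier = []
        for node in frontier:
            if node in graph or node == END:
                continue  # already expanded (e.g. a cycle) or terminal
            successors = counts.get(node, {})
            total = sum(successors.values())
            # Keep every sufficiently likely successor so the prediction can branch.
            edges = [(nxt, c / total) for nxt, c in successors.items() if c / total >= min_prob]
            graph[node] = edges
            next_frontier.extend(nxt for nxt, _ in edges)
        frontier = next_frontier
    return graph


histories = [
    ["pull-up-account", "validate-purchase", "offer-refund"],
    ["pull-up-account", "validate-purchase", "record-reason"],
    ["pull-up-account", "verify-identity", "reset-password"],
]
graph = predict_graph(build_graph(histories), "pull-up-account", n_steps=2)
for action, edges in graph.items():
    print(action, "->", [(nxt, round(p, 2)) for nxt, p in edges])
```

Keeping all sufficiently likely successors, rather than only the most frequent one, is what lets the prediction branch where the next step is uncertain (here, right after `pull-up-account` and again after `validate-purchase`).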

Paper Structure (35 sections, 2 equations, 3 figures, 5 tables, 1 algorithm)

This paper contains 35 sections, 2 equations, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: We propose the problem of multi-step Action State Tracking (AST), which involves predicting many future workflow actions while prior work only predicts one step. We represent predictions as graphs that capture potential branching in future action sequences.
  • Figure 2: Varying the max number of steps in predicted action sequences on the ABCD dataset. When $N\!=\!1$ (the 1-step AST problem), our multi-step model performs as well as or better than the 1-step AST model. As we increase $N$, our models perform much better.
  • Figure 3: Automation results comparing methods along two axes: % of steps automated and number of suggestions. Multi-step prediction (full) achieves 20% more automation of steps compared with 1-step prediction. Multi-step prediction (dynamic) with various model confidence thresholds shows the tradeoff of more automation vs. more human involvement.
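Figure 3 measures automation along two axes: the percentage of steps automated and the number of suggestions. One hypothetical way to account for both, sketched below, is to let the system propose the leading predicted steps whose confidence clears a threshold, count each proposal as one suggestion a human reviews, and credit only proposed steps that match the gold workflow. The `simulate` function, its acceptance rule, and the toy predictor are assumptions for illustration, not the paper's evaluation protocol.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical predictor interface: actions taken so far -> [(next action, confidence), ...]
Predictor = Callable[[Sequence[str]], List[Tuple[str, float]]]


def simulate(gold: Sequence[str], predict: Predictor, threshold: float) -> Tuple[float, int]:
    """Return (fraction of gold steps automated, number of suggestions shown to a human)."""
    if not gold:
        return 0.0, 0
    i, automated, suggestions = 0, 0, 0
    while i < len(gold):
        # Keep only the leading predicted steps whose confidence clears the threshold.
        prefix = []
        for action, conf in predict(gold[:i]):
            if conf < threshold:
                break
            prefix.append(action)
        if not prefix:
            i += 1  # nothing confident enough: the human performs this step manually
            continue
        suggestions += 1  # the human reviews one multi-step suggestion
        matched = 0
        for action, expected in zip(prefix, gold[i:]):
            if action != expected:
                break
            matched += 1
        automated += matched
        # A wrong first action is rejected and done manually, so always advance.
        i += max(matched, 1)
    return automated / len(gold), suggestions


GOLD = ["pull-up-account", "validate-purchase", "offer-refund"]  # hypothetical workflow


def toy_predict(history):
    # Hypothetical model: always correct, but less confident further ahead.
    return list(zip(GOLD[len(history):], [0.9, 0.6, 0.3]))


for t in (0.2, 0.5, 0.8):
    print(t, simulate(GOLD, toy_predict, t))
```

With this always-correct toy predictor, raising the threshold shortens each proposal, so the same three steps need more suggestions. With a model that makes mistakes, lowering the threshold would also let more wrong steps into each proposal, where they are rejected and fall back to the human.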