Simulating Task-Oriented Dialogues with State Transition Graphs and Large Language Models

Chris Samarinas; Pracha Promthaw; Atharva Nijasure; Hansi Zeng; Julian Killingback; Hamed Zamani

TL;DR

Task-oriented dialogue (TOD) systems require large, diverse training data, and synthetic data generated from a single prompt often under-covers task variation. SynTOD addresses this by using a state transition graph to guide multi-prompt LLM generation, combined with retrieval augmentation, to produce end-to-end TOD data. Compared to single-prompt simulation, the approach yields substantial improvements in intent classification, slot filling, and response relevance across the cooking and e-commerce domains. The paper also analyzes model size, data efficiency, and how well LLM-based evaluation aligns with human judgments. The authors provide synthetic datasets, models, and code to support rapid development of domain-specific TOD systems in low-resource settings.

Abstract

This paper explores SynTOD, a new synthetic data generation approach for developing end-to-end Task-Oriented Dialogue (TOD) Systems capable of handling complex tasks such as intent classification, slot filling, conversational question-answering, and retrieval-augmented response generation, without relying on crowdsourcing or real-world data. SynTOD utilizes a state transition graph to define the desired behavior of a TOD system and generates diverse, structured conversations through random walks and response simulation using large language models (LLMs). In our experiments, using graph-guided response simulations leads to significant improvements in intent classification, slot filling and response relevance compared to naive single-prompt simulated conversations. We also investigate the end-to-end TOD effectiveness of different base and instruction-tuned LLMs, with and without the constructed synthetic conversations. Finally, we explore how various LLMs can evaluate responses in a TOD system and how well they are correlated with human judgments. Our findings pave the path towards quick development and evaluation of domain-specific TOD systems. We release our datasets, models, and code for research purposes.
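
The graph-guided simulation described in the abstract can be illustrated with a minimal sketch. It assumes a toy state transition graph for a recipe assistant and samples one conversation outline by random walk; the state names, the `random_walk` and `build_prompt` helpers, and the prompt wording are all hypothetical and are not taken from the SynTOD release. The call to an LLM is deliberately left out: each sampled state is only turned into a prompt string that a simulator could send to a model.

```python
import random

# Hypothetical state transition graph: each key is a dialogue state (user intent),
# each value lists the states reachable from it. Not the graph used in the paper.
GRAPH = {
    "start": ["search_recipe"],
    "search_recipe": ["show_results"],
    "show_results": ["select_recipe", "search_recipe"],
    "select_recipe": ["begin_task"],
    "begin_task": ["next_step", "ask_question"],
    "next_step": ["next_step", "ask_question", "finish"],
    "ask_question": ["next_step", "finish"],
    "finish": [],  # terminal state: no outgoing transitions
}


def random_walk(graph, start="start", max_steps=12, rng=None):
    """Sample a path through the graph, stopping at a terminal state or after max_steps transitions."""
    rng = rng or random.Random()
    path = [start]
    while len(path) <= max_steps and graph[path[-1]]:
        path.append(rng.choice(graph[path[-1]]))
    return path


def build_prompt(state, history):
    """Build an illustrative per-state prompt; a real simulator would send this to an LLM."""
    context = " -> ".join(history) if history else "(conversation start)"
    return f"Previous states: {context}\nWrite the next user utterance with intent '{state}'."


def simulate(path):
    """Turn a sampled path into one prompt per transition (the multi-prompt part of the idea)."""
    return [build_prompt(state, path[:i]) for i, state in enumerate(path) if i > 0]


if __name__ == "__main__":
    walk = random_walk(GRAPH, rng=random.Random(0))
    for prompt in simulate(walk):
        print(prompt, end="\n\n")
```

Because every sampled path follows the graph's edges, each simulated conversation is structurally valid by construction, while different random seeds produce different intent sequences. This is the intuition behind the diversity gains the abstract reports over single-prompt simulation, shown here only in toy form.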

Paper Structure (9 sections, 7 figures, 11 tables)

Introduction
Methodology
Experimental Setup
Datasets
Training and Inference
Results and Analyses
Related Work
Conclusion & Future Work
Appendix

Figures (7)

Figure 1: Overview of an end-to-end retrieval-augmented TOD system. An LLM and a retriever are the main components. A conversation history is given as input, and the response, intent, slots, and documents comprise the output system state.
Figure 2: The state transition graph we defined for the recipe assistant domain. On the right we see transitions to nodes that are possible from any other state.
Figure 2: Diversity of data generated with and without a graph.
Figure 3: Overview of the SynTOD conversation simulation framework.
Figure 4: Frequency distribution of user intents for the recipe domain with and without state transition graph.
...and 2 more figures
