Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems

Ivan Sekulić, Silvia Terragni, Victor Guimarães, Nghia Khau, Bruna Guedes, Modestas Filipavicius, André Ferreira Manso, Roland Mathis

TL;DR

This work introduces DAUS, a domain-aware user simulator for task-oriented dialogue systems, built by fine-tuning a pre-trained LLM with LoRA on in-domain dialogues to generate goal-aligned, coherent user utterances. By conditioning on natural-language user goals and dialogue history, DAUS reduces hallucinations and improves goal fulfillment across internal TOD systems and the MultiWOZ benchmark, outperforming rule-based baselines and prior LLM-based approaches. The approach demonstrates strong performance with moderate-sized models (13B Llama-2) and limited training data, while maintaining lexical diversity and enabling robust evaluation and synthetic data generation for TOD systems. The study also discusses generalization limits to unseen subtasks and highlights practical considerations, ethical issues, and environmental costs of deploying LLM-based simulators in research and production contexts.
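The TL;DR names LoRA as the fine-tuning method. For reference, the standard LoRA formulation keeps a pretrained weight matrix W_0 frozen and trains only a low-rank update BA. B is initialized to zero, so the adapted model starts out identical to the base model. The rank r and scaling factor alpha below are generic symbols; the paper's actual rank, scaling, and choice of adapted matrices are not stated in this section.

```latex
\begin{aligned}
h &= W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x, \\
&\quad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k), \\
\#\text{params}(\Delta W) &= r(d + k) \quad \text{vs.} \quad dk \text{ for full fine-tuning of } W_0.
\end{aligned}
```

As a worked example under illustrative assumptions (a 5120 x 5120 projection, matching the hidden size of Llama-2 13B, and rank r = 8), one adapted matrix trains r(d + k) = 8 x 10,240 = 81,920 parameters instead of dk = 26,214,400, about 0.31% of the full matrix. This is why a 13B model can be adapted with limited compute.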

Abstract

User simulation has become a key technique for evaluating and improving task-oriented dialogue (TOD) systems. By replicating real user interactions, simulators enable applications such as synthetic data augmentation, error detection, and robust evaluation. However, existing approaches often rely on rigid rules or on annotated data. This paper introduces DAUS, a Domain-Aware User Simulator, which we build by fine-tuning a large language model on real task-oriented dialogues. Results on two benchmarks show significant improvements in user goal fulfillment. Notably, fine-tuning improves the simulator's coherence with user goals and mitigates hallucinations, a major source of inconsistency in simulator responses.
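The simulator is fine-tuned on real task-oriented dialogues and conditioned on a natural-language user goal plus the dialogue history. The following is a minimal sketch of how one dialogue could be turned into supervised (prompt, target) pairs under that setup: every user turn becomes a target, and its prompt contains the goal and all preceding turns. The prompt template, the Turn and Example classes, build_examples, and the sample dialogue are all hypothetical illustrations, not the authors' code or data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str  # "USER" or "SYSTEM"
    text: str


@dataclass
class Example:
    prompt: str  # goal + history, fed to the model as input
    target: str  # the user turn the model should learn to produce


# Hypothetical prompt template; the paper's actual template is not reproduced here.
TEMPLATE = (
    "You are a user talking to a task-oriented assistant.\n"
    "Your goal: {goal}\n"
    "Conversation so far:\n{history}\n"
    "USER:"
)


def render_history(turns: List[Turn]) -> str:
    """Serialize previous turns as 'SPEAKER: text' lines."""
    return "\n".join(f"{t.speaker}: {t.text}" for t in turns)


def build_examples(goal: str, dialogue: List[Turn]) -> List[Example]:
    """Create one (prompt, target) pair per user turn in the dialogue."""
    examples = []
    for i, turn in enumerate(dialogue):
        if turn.speaker != "USER":
            continue  # system turns only appear as context, never as targets
        prompt = TEMPLATE.format(goal=goal, history=render_history(dialogue[:i]))
        # Leading space because the prompt ends with "USER:".
        examples.append(Example(prompt=prompt, target=" " + turn.text))
    return examples


if __name__ == "__main__":
    # Illustrative dialogue, not taken from the paper's data.
    goal = "Book a cheap hotel in the north for 3 nights starting Friday."
    dialogue = [
        Turn("USER", "Hi, I need a cheap hotel in the north."),
        Turn("SYSTEM", "Sure. How many nights will you stay?"),
        Turn("USER", "Three nights, starting Friday."),
    ]
    for ex in build_examples(goal, dialogue):
        print(ex.prompt + ex.target)
        print("---")
```

A common choice when fine-tuning on such pairs is to compute the loss only on the target tokens, so the model learns to produce user turns rather than to reproduce system turns; whether DAUS does exactly this is not stated in this section.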

Paper Structure

This paper contains 37 sections, 2 equations, 1 figure, 10 tables.

Figures (1)

  • Figure 1: Example conversation between a user simulator and a TOD system. We aim to minimize common simulator hallucinations (right), making genuine TOD system failures (left) easier to detect.