User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation

Krisztian Balog; ChengXiang Zhai

TL;DR

The paper addresses the challenge of personalizing and evaluating interactive AI systems in the Generative AI era. It formalizes user simulation as a policy-driven framework operating over states $\mathcal{S}=(T,U,S,H)$, with a policy $\pi: \mathcal{S} \rightarrow \mathcal{A}$ mapping states to actions in a Markov decision process (MDP) setting, and then surveys definitions, scope, methodologies, and applications. It analyzes three main strands (user modeling, data augmentation, and system evaluation) alongside interdisciplinary connections and the limitations of current LLM-based simulators, and underscores the potential of neurosymbolic and cognitively plausible approaches to improve realism. The authors discuss the synergy among intelligent agents, machine learning, and knowledge representation, and argue that open ecosystems, industry validation, and cross-domain collaboration are essential for moving toward more capable and safe human–AI interaction and, ultimately, toward AGI. The paper highlights practical infrastructure, such as open-source simulators and cross-disciplinary workshops, as levers to accelerate progress and to ensure reproducible, scalable evaluation of interactive AI systems.
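The policy formulation summarized in the TL;DR can be made concrete with a small sketch. The summary does not define T, U, S, and H, so this example assumes, purely for illustration, that they denote the task, a user profile (here just an ordered tuple of queries the user is willing to try), the system's latest response, and the interaction history. All names (`State`, `rule_based_policy`, `toy_system`, `simulate`) and the toy search system are hypothetical and are not the authors' method. Because the history H is folded into the state, a policy that looks only at the current state still satisfies the Markov property.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class State:
    """Simulator state s = (T, U, S, H); the field meanings are assumptions."""
    task: str                      # T: information need the simulated user pursues
    user: Tuple[str, ...]          # U: user profile, here an ordered tuple of queries to try
    system: str                    # S: the system's most recent response
    history: Tuple[str, ...] = ()  # H: actions the user has taken so far


# A policy pi maps a state to an action; actions are plain strings here.
Policy = Callable[[State], str]


def rule_based_policy(state: State) -> str:
    """Hypothetical hand-written policy pi: S -> A."""
    if state.task in state.system:
        return "STOP:satisfied"    # the latest response covers the task
    if len(state.history) >= len(state.user):
        return "STOP:abandon"      # no reformulations left to try
    # Issue the next query from the (assumed) user profile.
    return "QUERY:" + state.user[len(state.history)]


def toy_system(query: str) -> str:
    """Hypothetical stand-in for the interactive system under evaluation."""
    index = {"vegan lasagna recipe": "Recipe: vegan lasagna with spinach"}
    return index.get(query, "no results")


def simulate(policy: Policy, respond: Callable[[str], str], task: str,
             user: Tuple[str, ...], max_turns: int = 10) -> List[str]:
    """Roll out one simulated session and return the user's action sequence."""
    state = State(task=task, user=user, system="")
    for _ in range(max_turns):
        action = policy(state)
        history = state.history + (action,)
        if action.startswith("STOP"):
            return list(history)
        response = respond(action.split(":", 1)[1])
        # Transition: the next state records the response and the extended history.
        state = State(task, user, response, history)
    return list(state.history)
```

For example, `simulate(rule_based_policy, toy_system, "vegan lasagna", ("lasagna", "vegan lasagna recipe"))` returns `["QUERY:lasagna", "QUERY:vegan lasagna recipe", "STOP:satisfied"]`. An LLM-based simulator could replace the hand-written rules in `rule_based_policy` with a prompted model while keeping the same state-to-action interface.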

Abstract

User simulation is an emerging interdisciplinary topic with multiple critical applications in the era of Generative AI. It involves creating an intelligent agent that mimics the actions of a human user interacting with an AI system, enabling researchers to model and analyze user behaviour, generate synthetic data for training, and evaluate interactive AI systems in a controlled and reproducible manner. User simulation has profound implications for diverse fields and plays a vital role in the pursuit of Artificial General Intelligence. This paper provides an overview of user simulation, highlighting its key applications and connections to various disciplines, and outlining future research directions to advance this increasingly important technology.

Paper Structure

This paper contains 13 sections, 3 figures, and 1 table.

Introduction
User Simulation
Definition
Scope
Approaches
Uses of Simulation
Requirements and Desiderata
User Simulation for User Modeling
User Simulation for Data Augmentation
User Simulation for Evaluating Interactive AI Systems
User Simulation as an Interdisciplinary Research Field
User Simulation as a Step toward AGI
Conclusion and Outlook

Figures (3)

Figure 1: Overview of the various uses of user simulation.
Figure 2: Illustration of evaluation methodologies and how user simulation complements them (adapted from Balog:2024:FnTIR).
Figure 3: An innovation ecosystem, where academic researchers develop open-source user simulators, which industry partners validate using real user data, thereby bridging the data divide between academia and industry.