Context-Driven Interactive Query Simulations Based on Generative Large Language Models

Björn Engelmann; Timo Breuer; Jana Isabelle Friese; Philipp Schaer; Norbert Fuhr

TL;DR

This work addresses the gap between traditional Cranfield-style IR evaluation and real user behavior by introducing context-driven, interactive simulations that account for the user's knowledge state and session dynamics. It combines two query-generation approaches (LLM-based prompting and Doc2Query with evolving knowledge states) with two retrieval models (lexical BM25 and the neural reranker MonoT5) to simulate realistic search sessions. The study applies context-aware evaluation measures (Effort vs. Effect, sDCG, sRBP) and documents its implementation and datasets. It reports that context and feedback substantially improve information gain, with probabilistic prompt strategies outperforming rule-based ones. The open experimental setup and multiple analysis perspectives offer a practical path toward higher-fidelity user simulations and more informative IR evaluations. The findings are relevant to researchers and developers who benchmark IR systems in user-centric settings and want to understand the trade-offs between retrieval efficiency and user effort.

Abstract

Simulating user interactions enables a more user-oriented evaluation of information retrieval (IR) systems. While user simulations are cost-efficient and reproducible, many approaches lack fidelity regarding real user behavior. Most notably, current user models neglect the user's context, which is the primary driver of perceived relevance and of the interactions with the search results. To this end, this work introduces the simulation of context-driven query reformulations. The proposed query generation methods build upon recent Large Language Model (LLM) approaches and consider the user's context throughout the simulation of a search session. Compared to simple context-free query generation approaches, these methods show better effectiveness and allow the simulation of more efficient IR sessions. Similarly, our evaluations consider more interaction context than current session-based measures and reveal complementary insights in addition to the established evaluation protocols. We conclude with directions for future work and provide an entirely open experimental setup.

Paper Structure (18 sections, 6 equations, 5 figures, 4 tables)

Introduction
Related Work
Methodology
Retrieval Models
Click and Stop Decisions
Query Generation
Prompting query reformulations
Doc2Query
Evaluation Measures
Effort vs. Effect
Session-based DCG
Session RBP
Implementation Details and Datasets
Experimental Results
Retrieval Models and Users
...and 3 more sections

Figures (5)

Figure 1: Focus of this work in the context of the Complex Searcher Model (Maxwell et al., CIKM 2015).
Figure 2: Results of the simulated sessions for different click behaviors. Core18 (top) and Core17 (bottom). BM25 is shown with solid lines, MonoT5 with dashed lines.
Figure 3: Results of the simulated sessions for different stop decisions. Core18 (top) and Core17 (bottom). BM25 is shown with solid lines, MonoT5 with dashed lines.
Figure 4: Distributions of the average number of snippets examined for the $i$-th query. Core18 (left) and Core17 (right).
Figure 5: Results of the simulated retrieval sessions for different query generation methods. Core18 (top) and Core17 (bottom).
