Exploring Human-Like Thinking in Search Simulations with Large Language Models
Erhan Zhang, Xingzhu Wang, Peiyuan Gong, Zixuan Yang, Jiaxin Mao
TL;DR
This paper aims to make search simulations more realistic by adding human-like thinking to LLM-driven user models. It introduces a think-aloud dataset collected from 31 participants (10 tasks each, 296 sessions in total) and uses supervised fine-tuning to train LLMs to generate both a user's thoughts and the action that follows. Across two thinking conditions (with and without explicit thinking) and datasets, the approach proves feasible and yields modest, task-dependent improvements, pointing toward more interpretable and cognitively informed user behavior modeling in information retrieval. Gains are most notable in query-generation fidelity and stopping decisions, while traditional baselines remain strong for fine-grained actions such as clicks, which suggests that cognitive modeling in IR simulations should be targeted at the actions where it helps.
Abstract
Simulating user search behavior is a critical task in information retrieval, which can be employed for user behavior modeling, data augmentation, and system evaluation. Recent advancements in large language models (LLMs) have opened up new possibilities for generating human-like actions including querying, browsing, and clicking. In this work, we explore the integration of human-like thinking into search simulations by leveraging LLMs to simulate users' hidden cognitive processes. Specifically, given a search task and context, we prompt LLMs to first think like a human before executing the corresponding action. As existing search datasets do not include users' thought processes, we conducted a user study to collect a new dataset enriched with users' explicit thinking. We investigate the impact of incorporating such human-like thinking on simulation performance and apply supervised fine-tuning (SFT) to teach LLMs to emulate both human thinking and actions. Our experiments span two dimensions in leveraging LLMs for user simulation: (1) with or without explicit thinking, and (2) with or without fine-tuning on the thinking-augmented dataset. The results demonstrate the feasibility and potential of incorporating human-like thinking in user simulations, though performance improvements on some metrics remain modest. We believe this exploration provides new avenues and inspirations for advancing user behavior modeling in search simulations.
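As a rough illustration of the think-then-act setup described in the abstract, the following Python sketch builds a prompt from a search task and its context, optionally asks for an explicit thought before the action, and enumerates the four experimental conditions (thinking on or off, fine-tuning on or off). The prompt wording, the `<think>`/`<action>` tags, and all names (`SearchContext`, `build_prompt`, `parse_response`, `CONDITIONS`) are assumptions made for this example and are not taken from the paper.

```python
import re
from dataclasses import dataclass, field
from itertools import product
from typing import List, Optional, Tuple


@dataclass
class SearchContext:
    """Hypothetical container for a search task and what the user has done so far."""
    task: str
    history: List[str] = field(default_factory=list)  # prior queries/clicks as text


def build_prompt(ctx: SearchContext, with_thinking: bool) -> str:
    """Assemble a prompt; the tag format is an illustrative assumption."""
    lines = [f"Search task: {ctx.task}", "Interaction history:"]
    lines += [f"- {h}" for h in ctx.history] or ["- (none)"]
    if with_thinking:
        lines.append(
            "First write what the user is thinking inside <think>...</think>, "
            "then give the next action inside <action>...</action>."
        )
    else:
        lines.append("Give the next action inside <action>...</action>.")
    return "\n".join(lines)


def parse_response(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (thought, action) from a model response; either may be missing."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    action = re.search(r"<action>(.*?)</action>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        action.group(1).strip() if action else None,
    )


# The two experimental dimensions from the abstract, crossed into four conditions.
CONDITIONS = [
    {"with_thinking": t, "fine_tuned": f}
    for t, f in product([False, True], repeat=2)
]
```

In a setup like this, the `with_thinking` flag only changes the prompt, while `fine_tuned` would select which model checkpoint receives it; the model call itself is left out because it depends on the serving stack.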
