Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search
Kelong Mao, Zhicheng Dou, Fengran Mo, Jiewen Hou, Haonan Chen, Hongjin Qian
TL;DR
The paper addresses the challenge of accurately inferring a user's contextual search intent in multi-turn conversational search. It introduces LLM4CS, a prompting framework that uses an LLM to generate multiple query rewrites and hypothetical responses, then aggregates them into a single robust intent representation for retrieval. In experiments on CAsT-19, CAsT-20, and CAsT-21, LLM4CS, especially in its RAR+Mean+CoT configuration, outperforms strong baselines and even human rewrites on key metrics, which underscores the potential of LLM-driven intent understanding in IR. The study also analyzes prompting and aggregation strategies, showing that carefully combining rewrites, responses, and reasoning improves robustness and effectiveness; human evaluation confirms the high quality of the generated rewrites.
Abstract
Precisely understanding users' contextual search intent has been an important challenge for conversational search. As conversational search sessions are much more diverse and long-tailed, existing methods trained on limited data still show unsatisfactory effectiveness and robustness to handle real conversational search scenarios. Recently, large language models (LLMs) have demonstrated amazing capabilities for text generation and conversation understanding. In this work, we present a simple yet effective prompting framework, called LLM4CS, to leverage LLMs as a text-based search intent interpreter to help conversational search. Under this framework, we explore three prompting methods to generate multiple query rewrites and hypothetical responses, and propose to aggregate them into an integrated representation that can robustly represent the user's real contextual search intent. Extensive automatic evaluations and human evaluations on three widely used conversational search benchmarks, including CAsT-19, CAsT-20, and CAsT-21, demonstrate the remarkable performance of our simple LLM4CS framework compared with existing methods and even using human rewrites. Our findings provide important evidence to better understand and leverage LLMs for conversational search.
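The aggregation step described above can be illustrated with a minimal sketch. It assumes that the "integrated representation" is a dense vector, and that the "Mean" option named in the TL;DR averages the embeddings of all generated rewrites and hypothetical responses. The abstract does not spell out these details, so treat this as one plausible reading rather than the authors' implementation. The names `toy_embed` and `mean_aggregate`, the hashing-based encoder, and the example texts are hypothetical stand-ins; a real system would call an LLM to produce the texts and a trained dense retriever to encode them.

```python
import hashlib
import math
from typing import List

DIM = 8  # toy embedding size, chosen only for illustration


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a dense retriever's text encoder.

    Hashes each lowercased token into one of `dim` buckets, counts hits,
    and L2-normalizes the result. Not a real semantic encoder.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def mean_aggregate(texts: List[str], dim: int = DIM) -> List[float]:
    """Average the embeddings of all generated texts into one vector."""
    if not texts:
        raise ValueError("need at least one rewrite or response")
    vectors = [toy_embed(t, dim) for t in texts]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Made-up LLM outputs for a follow-up turn such as "How is it treated?"
rewrites = [
    "How is throat cancer treated?",
    "What are the treatment options for throat cancer?",
]
responses = [
    "Throat cancer is commonly treated with surgery, radiation, or chemotherapy.",
    "Treatment for throat cancer depends on the stage of the disease.",
]

# One vector standing in for the user's contextual search intent.
intent_vector = mean_aggregate(rewrites + responses)
```

In a retrieval pipeline, `intent_vector` would then be compared against passage embeddings, for example by inner product. Averaging several generations is one way to dampen the effect of any single poor rewrite, which fits the robustness motivation given in the abstract.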
