Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search

Kelong Mao; Zhicheng Dou; Fengran Mo; Jiewen Hou; Haonan Chen; Hongjin Qian

TL;DR

The paper addresses the challenge of precisely understanding users' contextual search intent in multi-turn conversational search. It introduces LLM4CS, a prompting framework that uses an LLM to generate multiple query rewrites and hypothetical responses, then aggregates them into a single, robust intent representation for retrieval. In experiments on CAsT-19, CAsT-20, and CAsT-21, LLM4CS (especially the RAR + Mean + CoT configuration) outperforms strong baselines and even human rewrites on key metrics, underscoring the potential of LLM-driven intent understanding in IR. The study also analyzes prompting and aggregation strategies, showing that carefully combining rewrites, responses, and chain-of-thought reasoning improves both robustness and effectiveness, and a human evaluation confirms that the generated rewrites are of high quality.

Abstract

Precisely understanding users' contextual search intent has been an important challenge for conversational search. As conversational search sessions are much more diverse and long-tailed, existing methods trained on limited data still show unsatisfactory effectiveness and robustness in handling real conversational search scenarios. Recently, large language models (LLMs) have demonstrated impressive capabilities for text generation and conversation understanding. In this work, we present a simple yet effective prompting framework, called LLM4CS, to leverage LLMs as a text-based search intent interpreter to help conversational search. Under this framework, we explore three prompting methods to generate multiple query rewrites and hypothetical responses, and propose to aggregate them into an integrated representation that can robustly represent the user's real contextual search intent. Extensive automatic evaluations and human evaluations on three widely used conversational search benchmarks, CAsT-19, CAsT-20, and CAsT-21, demonstrate the remarkable performance of our simple LLM4CS framework compared with existing methods and even with human rewrites. Our findings provide important evidence to better understand and leverage LLMs for conversational search.
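To make the aggregation idea concrete, here is a minimal sketch of a mean-style fusion followed by dense retrieval. It assumes every generated rewrite and hypothetical response has already been encoded into a vector of the same dimension by some dense retriever (the two-dimensional toy vectors in the usage note stand in for real encoder outputs), gives rewrites and responses equal weight, and scores passages by inner product. The helper names are hypothetical, and the exact weighting used by LLM4CS may differ from this sketch.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise average of a non-empty list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse_intent(rewrite_embs: Sequence[Sequence[float]],
                response_embs: Sequence[Sequence[float]]) -> Vector:
    """Combine N rewrite vectors and N response vectors into one intent vector.

    Assumption for illustration: rewrites and responses are averaged
    separately, then the two means are averaged with equal weight.
    """
    return mean_vector([mean_vector(rewrite_embs), mean_vector(response_embs)])


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_passages(intent: Sequence[float],
                  passage_embs: Sequence[Sequence[float]],
                  k: int) -> List[int]:
    """Indices of the k passages with the highest inner-product score."""
    order = sorted(range(len(passage_embs)),
                   key=lambda i: dot(intent, passage_embs[i]),
                   reverse=True)
    return order[:k]
```

With rewrites `[[1.0, 0.0], [0.8, 0.2]]` and responses `[[0.6, 0.4], [1.0, 0.0]]`, `fuse_intent` returns approximately `[0.85, 0.15]`, and ranking the passages `[[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]` with `k=2` returns `[1, 2]`. One motivation for averaging is that a single off-target sample shifts the fused vector only partially instead of determining the query outright.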

Paper Structure (27 sections, 7 equations, 4 figures, 5 tables)

Introduction
Related Work
LLM4CS: Prompting Large Language Models for Conversational Search
Task Formulation
Prompting Methods
Rewriting Prompt (REW)
Rewriting-Then-Response (RTR)
Rewriting-And-Response (RAR)
Incorporating Chain-of-Thought
Content Aggregation
MaxProb
Self-Consistency (SC)
Mean
Retrieval
Experiments
...and 12 more sections
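The outline names two selection-style aggregation strategies, MaxProb and Self-Consistency, alongside Mean. Their exact definitions are not reproduced in this summary, so the sketch below encodes one plausible reading under stated assumptions: MaxProb keeps the sampled candidate with the highest generation log-probability, and Self-Consistency keeps the candidate whose embedding is most cosine-similar to the centroid of all candidate embeddings. The function names, the log-probability inputs, and the toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence


def maxprob_select(candidates: Sequence[str], logprobs: Sequence[float]) -> str:
    """Assumed MaxProb: keep the candidate generated with the highest log-probability."""
    best = max(range(len(candidates)), key=lambda i: logprobs[i])
    return candidates[best]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def self_consistency_select(candidates: Sequence[str],
                            embeddings: Sequence[Sequence[float]]) -> str:
    """Assumed Self-Consistency: keep the candidate closest to the centroid of all samples."""
    n = len(embeddings)
    centroid: List[float] = [sum(col) / n for col in zip(*embeddings)]
    best = max(range(len(candidates)),
               key=lambda i: cosine(embeddings[i], centroid))
    return candidates[best]
```

For candidates `["what is the capital of france", "capital of france", "france population"]` with log-probabilities `[-1.0, -1.5, -3.0]` and embeddings `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]`, MaxProb picks the first candidate while Self-Consistency picks the second, which lies closest to the centroid. The contrast illustrates the design choice: MaxProb trusts the model's single most confident sample, whereas Self-Consistency favors the sample that agrees most with the others.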

Figures (4)

Figure 1: An overview of LLM4CS.
Figure 2: NDCG@3 comparisons between incorporating our tailored CoT or not across different prompting and aggregation methods on CAsT-20 and CAsT-21 datasets.
Figure 3: Human evaluation results for LLM4CS (REW + MaxProb) and T5QR on the three CAsT datasets.
Figure 4: A general illustration of the prompt of LLM4CS. The prompt consists of three parts, i.e., Instruction, Demonstration, and Input. The red part is for REW prompting, the blue part is for the RTR and RAR prompting, and the orange part is for RTR prompting. The green part is for our designed chain-of-thought.
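The three-part prompt layout described in the Figure 4 caption (Instruction, Demonstration, Input) can be sketched as plain string assembly. The instruction wording, the demonstration format, the field labels ("Context:", "Rewrite:", "Response:") and the helper name are hypothetical placeholders, not the paper's actual prompt text. The `mode` switch is an assumption based on the method names: REW and RAR end with a rewrite cue, while RTR supplies a rewrite from an earlier call and asks only for a response. The chain-of-thought part is not modeled.

```python
from typing import List, Optional, Tuple

# One earlier turn of the session: (user question, system answer).
Turn = Tuple[str, str]


def build_prompt(instruction: str,
                 demonstrations: List[str],
                 context: List[Turn],
                 current_question: str,
                 mode: str = "REW",
                 given_rewrite: Optional[str] = None) -> str:
    """Assemble an Instruction / Demonstration / Input prompt (hypothetical format)."""
    if mode not in {"REW", "RTR", "RAR"}:
        raise ValueError(f"unknown prompting mode: {mode}")
    if mode == "RTR" and given_rewrite is None:
        raise ValueError("RTR needs a rewrite produced by an earlier LLM call")

    # Instruction part.
    lines = [instruction, ""]
    # Demonstration part: few-shot examples, already formatted as text.
    for demo in demonstrations:
        lines.extend([demo, ""])
    # Input part: the conversation history, then the current question.
    lines.append("Context:")
    for i, (question, answer) in enumerate(context, start=1):
        lines.append(f"Q{i}: {question}")
        lines.append(f"A{i}: {answer}")
    lines.append(f"Current question: {current_question}")

    # Output cue. Assumption: for RAR the demonstrations show a "Response:" line
    # after each rewrite, so the model continues with both fields.
    if mode == "RTR":
        lines.append(f"Rewrite: {given_rewrite}")
        lines.append("Response:")
    else:
        lines.append("Rewrite:")
    return "\n".join(lines)
```

Keeping the three parts separate lets the same instruction and demonstrations be reused across REW, RTR, and RAR; in this sketch only the final output cue changes between methods.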
