A Simple but Effective Elaborative Query Reformulation Approach for Natural Language Recommendation

Qianfeng Wen, Yifan Liu, Justin Cui, Joshua Zhang, Anton Korikov, George-Kirollos Saad, Scott Sanner

TL;DR

The paper tackles the challenge of retrieving relevant items for broad or indirect free-form NL queries in recommender systems. It introduces Elaborative Subtopic Query Reformulation (EQR), a simple yet effective LLM-based QR method that expands query breadth by inferring multiple subtopics and adds depth by enriching each subtopic with information-rich elaborations, producing a reformulated query $q'$; $k$ denotes the number of elaborations generated per subtopic. EQR is evaluated on three new benchmarks (TravelDest, TripAdvisor Hotel, and Yelp Restaurant) against baselines including No QR, Q2E, and Q2D, using two dense-retrieval encoders and GPT-4o for reformulation; EQR consistently achieves higher NDCG and Precision than these baselines across datasets. An ablation over $n$, the number of top-scoring passages averaged into each item score, indicates a practical operating point around $n=50$, and expert labeling shows fair agreement (measured by Kappa scores), offering a check on ground-truth quality. Overall, the work demonstrates that a unified, prompt-driven QR approach can significantly improve NL recommender performance for queries expressing broad and indirect intents, with practical implications for multi-source item representations and real-world retrieval tasks.
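
The NDCG and Precision metrics used to compare EQR with the baselines can be sketched as follows. This is a minimal sketch assuming binary relevance (an item either is or is not in the ground-truth set) and the standard log2 position discount; the paper's exact metric variant and cutoffs are not given here, so `precision_at`, `ndcg_at`, and the `cutoff` parameter are hypothetical names. The cutoff is called `cutoff` rather than `k` to avoid clashing with $k$, the number of elaborations per subtopic.

```python
import math
from typing import Sequence, Set


def precision_at(ranked: Sequence[str], relevant: Set[str], cutoff: int) -> float:
    """Fraction of the top-`cutoff` ranked items that appear in the ground-truth set."""
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    hits = sum(1 for item in ranked[:cutoff] if item in relevant)
    return hits / cutoff


def ndcg_at(ranked: Sequence[str], relevant: Set[str], cutoff: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the DCG of an ideal ranking."""
    # rank is 0-based, so the first position is discounted by log2(2) = 1
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:cutoff])
        if item in relevant
    )
    # Ideal ranking: every top position filled by a relevant item, as far as possible
    ideal_hits = min(len(relevant), cutoff)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, with a hypothetical ranking `["Amsterdam", "Bucharest", "Bangkok", "Vancouver"]` and the Figure 1 ground truth {Amsterdam, Bangkok, Vancouver}, `precision_at(ranking, relevant, 3)` is 2/3, because Bucharest occupies one of the top three slots; NDCG additionally penalizes that Bucharest sits above Bangkok.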

Abstract

Natural Language (NL) recommender systems aim to retrieve relevant items from free-form user queries and item descriptions. Existing systems often rely on dense retrieval (DR), which struggles to interpret challenging queries that express broad (e.g., "cities for youth-friendly activities") or indirect (e.g., "cities for a high school graduation trip") user intents. While query reformulation (QR) has been widely adopted to improve such systems, existing QR methods tend to focus either on expanding the range of query subtopics (breadth) or on elaborating the potential meaning of a query (depth), but not both. In this paper, we propose EQR (Elaborative Subtopic Query Reformulation), a large language model-based QR method that combines breadth and depth by generating potential query subtopics with information-rich elaborations. We also introduce three new natural language recommendation benchmarks in the travel, hotel, and restaurant domains to enable evaluation of NL recommendation with challenging queries. Experiments show that EQR substantially outperforms state-of-the-art QR methods across various evaluation metrics, highlighting that a simple yet effective QR approach can significantly improve NL recommender systems for queries with broad and indirect user intents.
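
The breadth-plus-depth structure of an EQR reformulation can be illustrated with a small data model. This is a hypothetical sketch, not the authors' code: it assumes the LLM (GPT-4o in the paper) has already returned a list of subtopics with elaborations, and that $q'$ is formed by appending each subtopic and up to $k$ of its elaborations to the original query. The class `Subtopic`, the function `build_reformulated_query`, the concatenation format, and the example elaborations are all illustrative assumptions; the paper's actual prompts appear in Figure 3.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subtopic:
    """One inferred facet of the user intent (breadth) with its elaborations (depth)."""
    name: str
    elaborations: List[str] = field(default_factory=list)


def build_reformulated_query(query: str, subtopics: List[Subtopic], k: int) -> str:
    """Hypothetical assembly of q': the original query, then one line per subtopic
    carrying at most k of its elaborations. In EQR the subtopics and elaborations
    would come from an LLM; here they are supplied directly."""
    lines = [query]
    for sub in subtopics:
        kept = sub.elaborations[:k]  # depth: cap at k elaborations per subtopic
        if kept:
            lines.append(f"{sub.name}: " + " ".join(kept))
        else:
            lines.append(sub.name)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical subtopics loosely based on the Figure 1 example
    subtopics = [
        Subtopic("Nightlife", ["Cities with lively bars, clubs, and live music."]),
        Subtopic("Outdoor activities", ["Cities near hiking trails, beaches, or parks."]),
    ]
    print(build_reformulated_query("Cities for youth-friendly activities", subtopics, k=1))
```

Each `Subtopic` contributes breadth (a distinct facet of the intent) and its elaborations contribute depth; under this sketch's assumptions, the resulting string is what the dense-retrieval encoder would embed in place of the original query.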

Paper Structure

This paper contains 17 sections, 5 figures, 7 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Example recommendation results for the query "Cities for youth-friendly activities" under different QR methods. We show results for four representative cities. Amsterdam (known for nightlife), Bangkok (known for vibrant street life and budget accommodations), and Vancouver (known for outdoor activities) are part of the ground truth, while Bucharest, although budget-friendly, is not considered youth-friendly. Q2D focuses solely on depth, generating an in-depth reformulation that highlights Amsterdam but fails to surface the other relevant candidates. Q2E emphasizes breadth by listing diverse keywords, but incorrectly ranks Bucharest highly because of its affordability. In contrast, EQR combines breadth and depth in its reformulation and thereby separates the relevant candidates from the non-relevant one.
  • Figure 2: Pipeline overview of an NL recommender system with LLM-driven query reformulation (QR). Passage scores represent the cosine similarity between the reformulated query and each passage in the embedding space. Item-level scores are computed by averaging the top-$n$ passage scores.
  • Figure 3: Prompts for EQR, as discussed in the query reformulation (QR) section. The first bullet point (in red) adds breadth to the query, while the second bullet point (in blue) adds depth.
  • Figure 4: Effect of the top-$n$ parameter on performance across all datasets
  • Figure 5: Distribution of Kappa scores on TravelDest
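
The scoring pipeline in Figure 2 (cosine similarity between the reformulated query and each passage, then averaging the top-$n$ passage scores per item) can be sketched with NumPy. This sketch assumes query and passage embeddings have already been produced by a dense-retrieval encoder (no encoder is included), and it assumes that an item with fewer than $n$ passages is scored by averaging all of its passages; the paper may handle that case differently. The names `item_score` and `rank_items` are hypothetical.

```python
from typing import Dict, List

import numpy as np


def item_score(query_emb: np.ndarray, passage_embs: np.ndarray, n: int) -> float:
    """Cosine similarity of the query to each passage of one item, then the mean
    of the top-n passage scores (all passages if the item has fewer than n)."""
    q = query_emb / np.linalg.norm(query_emb)
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    sims = p @ q  # one cosine similarity per passage, shape (num_passages,)
    top = np.sort(sims)[::-1][:n]  # highest-scoring passages first
    return float(top.mean())


def rank_items(
    query_emb: np.ndarray, items: Dict[str, np.ndarray], n: int = 50
) -> List[str]:
    """Rank item names by their aggregated top-n passage score, best first."""
    scores = {name: item_score(query_emb, embs, n) for name, embs in items.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

The default `n=50` mirrors the operating point suggested by the top-$n$ ablation. A small `n` tends to reward items with a few strongly matching passages, while a larger `n` tends to favor items whose passages match the query consistently.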