AgentDR: Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents

Mingdai Yang, Nurendra Choudhary, Jiangshu Du, Edward W. Huang, Philip S. Yu, Karthik Subbian, Danai Koutra

TL;DR

AgentDR tackles hallucination and scalability in LLM-driven recommendations by grounding full-ranking tasks in traditional tools and using LLMs to infer user intent over implicit substitutes and complements. The framework combines a profile and two memories with substitute/complement generation, personalized tool selection, and a Dual S&C reranking pipeline, achieving superior full-ranking performance on three grocery datasets. A new Vicinity-DCG (VDCG) metric is introduced to jointly assess semantic alignment and ranking correctness, highlighting the value of relational reasoning in matching user intent. Practically, AgentDR demonstrates scalable, semantically aware recommendations with strong gains over baselines and adaptable fusion of multiple ranking signals for large catalogs.

Abstract

Recent agent-based recommendation frameworks aim to simulate user behaviors by incorporating memory mechanisms and prompting strategies, but they struggle with hallucinating non-existent items and with full-catalog ranking. In addition, a largely underexplored opportunity lies in leveraging LLMs' commonsense reasoning to capture user intent through substitute and complement relationships between items, which are usually implicit in datasets and difficult for traditional ID-based recommenders to capture. In this work, we propose a novel LLM-agent framework, AgentDR, which bridges LLM reasoning with scalable recommendation tools. Our approach delegates full-ranking tasks to traditional models while utilizing LLMs to (i) integrate multiple recommendation outputs based on personalized tool suitability and (ii) reason over substitute and complement relationships grounded in user history. This design mitigates hallucination, scales to large catalogs, and enhances recommendation relevance through relational reasoning. Through extensive experiments on three public grocery datasets, we show that our framework achieves superior full-ranking performance, yielding on average a twofold improvement over its underlying tools. We also introduce a new LLM-based evaluation metric that jointly measures semantic alignment and ranking correctness.
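
To make point (i) concrete, the following is a minimal sketch of how several tools' full rankings might be merged under per-user suitability weights. The abstract does not state the aggregation rule, so the weighted reciprocal-rank fusion used here, the function name `fuse_rankings`, the smoothing constant `k`, and the example tool names and weights are all illustrative assumptions rather than AgentDR's actual method. The sketch does reflect one property the abstract describes: every fused item comes from a tool's output, so nothing outside the catalog can appear.

```python
from collections import defaultdict


def fuse_rankings(tool_rankings, tool_weights, k=60):
    """Merge ranked item lists from several tools into one ranking.

    Hypothetical illustration: weighted reciprocal-rank fusion.
    tool_rankings maps a tool name to its ranked list of item IDs.
    tool_weights maps a tool name to a per-user suitability weight
    (an assumed stand-in for what the paper calls RecTool memory).
    """
    scores = defaultdict(float)
    for tool, ranking in tool_rankings.items():
        weight = tool_weights.get(tool, 0.0)
        for position, item in enumerate(ranking, start=1):
            # Higher-ranked items and more suitable tools contribute more.
            scores[item] += weight / (k + position)
    # Sort by descending score; break ties by item ID for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Assumed example: two tools ranking a tiny grocery catalog for one user.
catalog = {"milk", "bread", "eggs", "butter"}
tool_rankings = {
    "collaborative_filtering": ["milk", "bread", "eggs"],
    "sequential_model": ["eggs", "milk", "butter"],
}
tool_weights = {"collaborative_filtering": 0.7, "sequential_model": 0.3}

fused = fuse_rankings(tool_rankings, tool_weights)
print(fused)
```

Because the LLM in such a design would only adjust weights and reorder candidates, the output stays a permutation of items the tools already returned.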

Paper Structure

This paper contains 43 sections, 14 equations, 5 figures, 13 tables.

Figures (5)

  • Figure 1: Two major drawbacks in directly deploying LLMs for recommendation: LLMs can hallucinate items not existing in the actual product catalog, and token limits of LLMs make them unsuitable for ranking items at scale. AgentDR addresses both limitations by delegating full-ranking tasks to recommendation tools.
  • Figure 2: The framework of AgentDR. Each user agent is equipped with two memory modules: RecTool memory to store tool suitability, and intent memory to track user intent. Ranking results from recommendation tools are aggregated based on RecTool memory. The aggregated result is further refined by dual S&C and general ranking modules.
  • Figure 3: Performance of AgentDR on VDCG without LLM-based modules. The "w/o LLM" bars show AgentDR with both reranking and tool comparison removed, on three datasets.
  • Figure 4: Ablation study on the tool comparison and dual S&C reranking modules. The S/C bars denote reranking based on substitutes or complements according to intent memory.
  • Figure 5: KDE plot of the distributions of RecTool memory across all user agents after optimization.