AgenticTagger: Structured Item Representation for Recommendation with LLM Agents

Zhouhang Xie, Bo Peng, Zhankui He, Ziqi Chen, Alice Han, Isabella Ye, Benjamin Coleman, Noveen Sachdeva, Fernando Pereira, Julian McAuley, Wang-Cheng Kang, Derek Zhiyuan Cheng, Beidou Wang, Randolph Brown

TL;DR

The paper tackles the challenge of producing high-quality, low-cardinality item representations for recommender systems by generating structured, hierarchical descriptors with LLMs. It introduces AgenticTagger, a two-stage framework that builds a ground-truth-like vocabulary via a multi-agent self-refinement loop (Architect maintains the vocabulary while Annotator LLMs annotate items in parallel) and then assigns vocabulary descriptors to items to create discrete semantic IDs. Across public Amazon domains and a private corpus, AgenticTagger yields consistent improvements in generative retrieval and ranking, and enables term-based retrieval and critique-based controllability, while also offering interpretability and scalability. The work demonstrates the practical impact of agentic feature generation for RecSys, providing a scalable, flexible approach that can adapt to evolving LLM capabilities and downstream models. Future work could integrate collaborative filtering signals during vocabulary construction and explore cross-feature interactions to further boost performance and explainability.
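The TL;DR says assigned descriptors become discrete semantic IDs, but this page does not give the paper's exact encoding. Below is a minimal sketch under assumed conventions. Each item carries an ordered, coarse-to-fine descriptor list. Every distinct descriptor at a given level gets an integer code in first-seen order. A trailing counter, a common semantic-ID trick assumed here, separates items that share the same descriptor sequence. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_level_codebooks(item_descriptors: Dict[str, List[str]]) -> List[Dict[str, int]]:
    """Give every distinct descriptor at each hierarchy level an integer code (first-seen order)."""
    depth = max((len(d) for d in item_descriptors.values()), default=0)
    codebooks: List[Dict[str, int]] = [{} for _ in range(depth)]
    for descriptors in item_descriptors.values():
        for level, tag in enumerate(descriptors):
            # setdefault keeps the existing code if this tag was already seen at this level
            codebooks[level].setdefault(tag, len(codebooks[level]))
    return codebooks


def assign_semantic_ids(item_descriptors: Dict[str, List[str]]) -> Dict[str, Tuple[int, ...]]:
    """Map each item to (level-0 code, level-1 code, ..., collision counter)."""
    codebooks = build_level_codebooks(item_descriptors)
    collisions: Dict[Tuple[int, ...], int] = {}
    ids: Dict[str, Tuple[int, ...]] = {}
    for item, descriptors in item_descriptors.items():
        prefix = tuple(codebooks[level][tag] for level, tag in enumerate(descriptors))
        # Hypothetical disambiguation: items sharing a prefix get suffixes 0, 1, 2, ...
        counter = collisions.get(prefix, 0)
        collisions[prefix] = counter + 1
        ids[item] = prefix + (counter,)
    return ids


items = {
    "cd_1": ["Rock", "Classic Rock"],
    "cd_2": ["Rock", "Punk"],
    "cd_3": ["Jazz", "Bebop"],
    "cd_4": ["Rock", "Punk"],
}
print(assign_semantic_ids(items))
# {'cd_1': (0, 0, 0), 'cd_2': (0, 1, 0), 'cd_3': (1, 2, 0), 'cd_4': (0, 1, 1)}
```

Per-level codebooks keep each ID short and readable. Giving child descriptors codes relative to their parent is an equally plausible alternative.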

Abstract

High-quality representations are a core requirement for effective recommendation. In this work, we study the problem of LLM-based descriptor generation, i.e., frameworks that generate keyphrase-like, natural-language item representations while placing minimal constraints on downstream applications. We propose AgenticTagger, a framework that queries LLMs to represent items with sequences of text descriptors. However, open-ended generation provides little control over the generation space, leading to high-cardinality, low-performance descriptors that render downstream modeling challenging. To this end, AgenticTagger features two core stages: (1) a vocabulary building stage, where a set of hierarchical, low-cardinality, high-quality descriptors is identified, and (2) a vocabulary assignment stage, where LLMs assign in-vocabulary descriptors to items. To ground the vocabulary in the item corpus of interest effectively and efficiently, we design a multi-agent reflection mechanism in which an architect LLM iteratively refines the vocabulary, guided by parallelized feedback from annotator LLMs that validate the vocabulary against item data. Experiments on public and private data show that AgenticTagger brings consistent improvements across diverse recommendation scenarios, including generative and term-based retrieval, ranking, and controllability-oriented, critique-based recommendation.
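The two stages described in the abstract can be sketched as follows. This is a hedged illustration, not the authors' implementation. Plain Python callables stand in for the LLM agents, and the vocabulary is flat rather than hierarchical. The feedback format is also an assumption: annotators propose out-of-vocabulary descriptors, and the architect admits one once enough items request it. `build_vocabulary`, `assign_vocabulary`, `toy_annotator`, and `toy_architect` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set, Tuple

# Annotator: (item text, vocabulary) -> (in-vocabulary tags, proposed missing tags)
Annotator = Callable[[str, Set[str]], Tuple[List[str], List[str]]]
# Architect: (vocabulary, {proposed tag: support count}) -> revised vocabulary
Architect = Callable[[Set[str], Dict[str, int]], Set[str]]


def _annotate_all(items: List[str], vocab: Set[str], annotator: Annotator, workers: int):
    # Annotators run in parallel over the corpus, all seeing the same vocabulary.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: annotator(item, vocab), items))


def build_vocabulary(items: List[str], vocab: Set[str], architect: Architect,
                     annotator: Annotator, max_rounds: int = 5, workers: int = 8) -> Set[str]:
    """Stage 1: refine the vocabulary until annotator feedback stops changing it."""
    for _round in range(max_rounds):
        feedback: Dict[str, int] = {}
        for _tags, missing in _annotate_all(items, vocab, annotator, workers):
            for tag in missing:
                feedback[tag] = feedback.get(tag, 0) + 1
        revised = architect(vocab, feedback)
        if revised == vocab:  # converged: the architect made no change
            break
        vocab = revised
    return vocab


def assign_vocabulary(items: List[str], vocab: Set[str], annotator: Annotator,
                      workers: int = 8) -> Dict[str, List[str]]:
    """Stage 2: assign descriptors, keeping only in-vocabulary ones."""
    results = _annotate_all(items, vocab, annotator, workers)
    return {item: [t for t in tags if t in vocab] for item, (tags, _) in zip(items, results)}


# Toy stand-ins for the LLM agents (hypothetical; real agents would be prompted LLMs).
GENRES = {"jazz", "rock", "punk", "bebop"}


def toy_annotator(item: str, vocab: Set[str]) -> Tuple[List[str], List[str]]:
    hits = set(item.lower().split()) & GENRES
    return sorted(hits & vocab), sorted(hits - vocab)


def toy_architect(vocab: Set[str], feedback: Dict[str, int], min_support: int = 2) -> Set[str]:
    # Admit a proposed tag only if enough items asked for it, which keeps cardinality low.
    return vocab | {tag for tag, n in feedback.items() if n >= min_support}


items = ["Rock Punk anthology", "Punk singles", "Jazz bebop session", "Rock ballads"]
vocab = build_vocabulary(items, {"rock"}, toy_architect, toy_annotator)
print(sorted(vocab))  # ['punk', 'rock']
print(assign_vocabulary(items, vocab, toy_annotator))
```

In a real system the toy callables would wrap LLM prompts, and the architect could also merge, split, or prune descriptors. The in-vocabulary filter in `assign_vocabulary` mirrors the abstract's requirement that LLMs assign only in-vocabulary descriptors.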

Paper Structure (39 sections, 1 equation, 6 figures, 9 tables, 2 algorithms)

Figures (6)

  • Figure 1: Illustration of the proposed method. We represent items using an ordered sequence of coarse-to-fine LLM-generated natural language descriptors, directly extracting task-relevant information, such as genres for an item corpus of CDs and Vinyls.
  • Figure 2: The AgenticTagger Framework. AgenticTagger exploits LLMs' content-understanding ability to automatically construct an interpretable, hierarchical descriptor vocabulary from data. Then, relevant descriptors in the vocabulary could be assigned to corresponding items by querying LLMs in parallel over the item corpus. These features could then power various downstream applications, such as generative and term-based retrieval, ranking, and critique-based recommendation.
  • Figure 3: Comparison against feature-crossing-based methods in NDCG@10. For ActionPiece, we compare against its ablated variant without inference-time model ensembling, to isolate the effect on item-content understanding.
  • Figure 4: Comparison between AgenticTagger (AT) and free-form generated descriptor tags (FT) on term-based retrieval. Top: Recall@k across baselines. Bottom: performance-utilization trade-off. AgenticTagger achieves the best cardinality-performance trade-off, with further gains when combined with FT. Our method achieves up to 81.9% feature utilization, compared to only 6.1% for free-form generation.
  • Figure 5: Distribution of coverage changes between optimization steps across layers. We report levels 2-6, which have enough trials to yield meaningful results.
  • ...and 1 more figure
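Figure 4 reports Recall@k and a feature-utilization rate. The caption does not define utilization, so the sketch below assumes it is the share of distinct descriptors assigned to at least `min_items` items. Under this reading, a descriptor held by a single item cannot link that item to any other in term-based retrieval. Recall@k is the standard fraction of relevant items that appear in the top k. Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def feature_utilization(item_tags: Dict[str, List[str]], min_items: int = 2) -> float:
    """Share of distinct descriptors assigned to at least `min_items` items (assumed definition)."""
    counts = Counter(tag for tags in item_tags.values() for tag in set(tags))
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n >= min_items)
    return shared / len(counts)


def recall_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k of a ranked list."""
    rel = set(relevant)
    if not rel:
        return 0.0
    return len(rel & set(ranked[:k])) / len(rel)


vocab_tags = {"a": ["rock", "punk"], "b": ["rock"], "c": ["jazz"], "d": ["punk"]}
free_form = {"a": ["raw garage rock"], "b": ["wistful late-night ballad"]}
print(feature_utilization(vocab_tags))  # 0.6666666666666666
print(feature_utilization(free_form))   # 0.0
print(recall_at_k(["x", "y", "z"], {"y", "w"}, k=2))  # 0.5
```

The toy free-form tags illustrate the high-cardinality problem the figure describes: each phrase is unique, so none is shared across items.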