AgenticTagger: Structured Item Representation for Recommendation with LLM Agents
Zhouhang Xie, Bo Peng, Zhankui He, Ziqi Chen, Alice Han, Isabella Ye, Benjamin Coleman, Noveen Sachdeva, Fernando Pereira, Julian McAuley, Wang-Cheng Kang, Derek Zhiyuan Cheng, Beidou Wang, Randolph Brown
TL;DR
The paper addresses how to produce high-quality, low-cardinality item representations for recommender systems by having LLMs generate structured, hierarchical text descriptors. It introduces AgenticTagger, a two-stage framework. The first stage builds a vocabulary grounded in the item corpus through a multi-agent self-refinement loop, in which an Architect LLM maintains the vocabulary while Annotator LLMs annotate items in parallel and report feedback. The second stage assigns in-vocabulary descriptors to each item, yielding discrete semantic IDs. Across public Amazon domains and a private corpus, AgenticTagger yields consistent improvements in generative retrieval and ranking, and it supports term-based retrieval and critique-based controllability while keeping the representations interpretable and the pipeline scalable. The approach is flexible enough to adapt to evolving LLM capabilities and downstream models. Future work could integrate collaborative filtering signals during vocabulary construction and explore cross-feature interactions to further improve performance and explainability.
Abstract
High-quality representations are a core requirement for effective recommendation. In this work, we study the problem of LLM-based descriptor generation, i.e., frameworks that generate keyphrase-like natural language item representations while placing minimal constraints on downstream applications. We propose AgenticTagger, a framework that queries LLMs to represent items as sequences of text descriptors. However, open-ended generation provides little control over the generation space, leading to high-cardinality, low-performance descriptors that render downstream modeling challenging. To this end, AgenticTagger features two core stages: (1) a vocabulary building stage, where a set of hierarchical, low-cardinality, and high-quality descriptors is identified, and (2) a vocabulary assignment stage, where LLMs assign in-vocabulary descriptors to items. To ground the vocabulary in the item corpus of interest both effectively and efficiently, we design a multi-agent reflection mechanism in which an architect LLM iteratively refines the vocabulary, guided by parallelized feedback from annotator LLMs that validate the vocabulary against item data. Experiments on public and private data show that AgenticTagger brings consistent improvements across diverse recommendation scenarios, including generative and term-based retrieval, ranking, and controllability-oriented, critique-based recommendation.
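The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_vocabulary`, `assign_descriptors`), the two-level category-to-descriptor vocabulary, the fixed round limit, and the stopping rule are all assumptions made for illustration. The LLM calls are replaced by deterministic toy functions backed by a hypothetical lexicon, so the example runs without any model access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set, Tuple

Vocabulary = Dict[str, Set[str]]   # category -> descriptors (assumed two-level hierarchy)
Feedback = List[Tuple[str, str]]   # (category, descriptor) pairs the vocabulary is missing


def build_vocabulary(items: List[str],
                     architect: Callable[[Vocabulary, Feedback], Vocabulary],
                     annotator: Callable[[str, Vocabulary], Feedback],
                     rounds: int = 3, workers: int = 4) -> Vocabulary:
    """Stage 1 (sketch): the architect refines the vocabulary from annotator feedback."""
    vocab: Vocabulary = {}
    for _ in range(rounds):
        current = vocab  # read-only snapshot shared by all annotators this round
        with ThreadPoolExecutor(max_workers=workers) as pool:
            per_item = list(pool.map(lambda item: annotator(item, current), items))
        feedback = [pair for item_feedback in per_item for pair in item_feedback]
        if not feedback:  # assumed stopping rule: no annotator reported a gap
            break
        vocab = architect(vocab, feedback)
    return vocab


def assign_descriptors(items: List[str], vocab: Vocabulary,
                       tagger: Callable[[str, Vocabulary], List[str]]) -> Dict[str, List[str]]:
    """Stage 2 (sketch): keep only in-vocabulary descriptors for each item."""
    allowed = {d for descriptors in vocab.values() for d in descriptors}
    return {item: [d for d in tagger(item, vocab) if d in allowed] for item in items}


# --- Hypothetical stand-ins for the LLM agents (illustration only) ---
HYPOTHETICAL_LEXICON = {"wireless": "connectivity", "bluetooth": "connectivity",
                        "headphones": "product_type", "speaker": "product_type"}


def toy_annotator(item: str, vocab: Vocabulary) -> Feedback:
    # Reports lexicon words in the item that the vocabulary does not yet cover.
    return [(HYPOTHETICAL_LEXICON[w], w) for w in item.split()
            if w in HYPOTHETICAL_LEXICON
            and w not in vocab.get(HYPOTHETICAL_LEXICON[w], set())]


def toy_architect(vocab: Vocabulary, feedback: Feedback) -> Vocabulary:
    # Returns a new vocabulary with the reported descriptors merged in.
    updated = {category: set(ds) for category, ds in vocab.items()}
    for category, descriptor in feedback:
        updated.setdefault(category, set()).add(descriptor)
    return updated


def toy_tagger(item: str, vocab: Vocabulary) -> List[str]:
    # Proposes every word as a descriptor; out-of-vocabulary words are filtered in stage 2.
    return item.split()


items = ["wireless headphones black", "bluetooth speaker"]
vocab = build_vocabulary(items, toy_architect, toy_annotator)
semantic_ids = assign_descriptors(items, vocab, toy_tagger)
```

In this toy run the first round fills the vocabulary, the second round produces no feedback, and the loop stops. The word "black" is proposed by the tagger but dropped during assignment because it is not in the vocabulary, which is the mechanism that keeps descriptor cardinality bounded. A real system would replace the three toy functions with prompted LLM calls.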
