Order-agnostic Identifier for Large Language Model-based Generative Recommendation

Xinyu Lin; Haihan Shi; Wenjie Wang; Fuli Feng; Qifan Wang; See-Kiong Ng; Tat-Seng Chua

TL;DR

This work targets the limitations of current item identifiers in LLM-based generative recommendation, notably the local optima problem that beam search faces with token-sequence identifiers and the inefficiency of step-by-step autoregressive generation. It proposes two design principles: integrating semantic and collaborative filtering (CF) information, and using order-agnostic set identifiers. It then introduces SETRec, a framework that tokenizes each item into a set of order-agnostic tokens derived from CF and semantic information. SETRec generates all tokens of an item simultaneously via query-guided generation and employs a sparse attention mask to eliminate intra-item token dependencies, paired with grounding heads that map generated tokens to real items. Empirical results across four real-world datasets and two LLM architectures (T5 and Qwen, from 1.5B to 7B) show that SETRec achieves superior recommendation performance and substantial efficiency gains, with strong generalization to cold-start items and promising scalability on them as model size increases. The findings highlight the practical potential of multi-dimensional, order-agnostic item representations for deploying effective and efficient LLM-based recommender systems.
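To make the grounding step concrete, here is a minimal sketch of how a set of simultaneously generated tokens might be mapped back to real items. This summary does not describe how the grounding heads are actually built, so everything here is an assumption for illustration: the function name `ground`, the dot-product scoring, the summed per-slot score, and the toy two-dimensional embeddings are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ground(generated: List[Vector],
           catalog: Dict[str, List[Vector]],
           k: int = 1) -> List[str]:
    """Rank catalog items against a generated token set (hypothetical scheme).

    Each slot (e.g. one CF slot, one semantic slot) is scored on its own and
    the slot scores are summed, so no slot depends on a previously chosen one.
    """
    scores = {
        item: sum(dot(g, t) for g, t in zip(generated, tokens))
        for item, tokens in catalog.items()
    }
    # Sort by descending score; break ties by item name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Toy catalog: every item is a set of two token embeddings (assumed layout).
catalog = {
    "item_a": [[1.0, 0.0], [0.0, 1.0]],
    "item_b": [[0.0, 1.0], [1.0, 0.0]],
    "item_c": [[1.0, 0.0], [1.0, 0.0]],
}
generated = [[0.9, 0.1], [0.2, 0.8]]  # both slots produced in one pass
top = ground(generated, catalog, k=2)
print(top)
```

Because every slot contributes to the final score at once, a weak match in one slot cannot prune an item early. That early pruning is the failure mode the TL;DR attributes to beam search over token sequences.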

Abstract

Leveraging Large Language Models (LLMs) for generative recommendation has attracted significant research interest, where item tokenization is a critical step. It involves assigning item identifiers for LLMs to encode user history and generate the next item. Existing approaches leverage either token-sequence identifiers, representing items as discrete token sequences, or single-token identifiers, using ID or semantic embeddings. Token-sequence identifiers face issues such as the local optima problem in beam search and low generation efficiency due to step-by-step generation. In contrast, single-token identifiers fail to capture rich semantics or encode Collaborative Filtering (CF) information, resulting in suboptimal performance. To address these issues, we propose two fundamental principles for item identifier design: 1) integrating both CF and semantic information to fully capture multi-dimensional item information, and 2) designing order-agnostic identifiers without token dependency, mitigating the local optima issue and achieving simultaneous generation for generation efficiency. Accordingly, we introduce a novel set identifier paradigm for LLM-based generative recommendation, representing each item as a set of order-agnostic tokens. To implement this paradigm, we propose SETRec, which leverages CF and semantic tokenizers to obtain order-agnostic multi-dimensional tokens. To eliminate token dependency, SETRec uses a sparse attention mask for user history encoding and a query-guided generation mechanism for simultaneous token generation. We instantiate SETRec on T5 and Qwen (from 1.5B to 7B). Extensive experiments demonstrate its effectiveness under various scenarios (e.g., full ranking, warm- and cold-start ranking, and various item popularity groups). Moreover, results validate SETRec's superior efficiency and show promising scalability on cold-start items as model sizes increase.
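The sparse attention mask for user history encoding can be sketched as follows. The abstract only states that the mask eliminates token dependency within an item, so the exact rule used here is an assumed reading: each token attends to itself and to all tokens of earlier items, but not to the other tokens of its own item. The helper name `sparse_history_mask` and the fixed number of tokens per item are hypothetical.

```python
from typing import List


def sparse_history_mask(num_items: int, tokens_per_item: int) -> List[List[bool]]:
    """Build a boolean attention mask; mask[q][k] is True if q may attend to k.

    Assumed rule: a token sees itself and every token of strictly earlier
    items, but not its sibling tokens from the same item.
    """
    n = num_items * tokens_per_item
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        q_item = q // tokens_per_item
        for k in range(n):
            k_item = k // tokens_per_item
            mask[q][k] = (k_item < q_item) or (q == k)
    return mask


# Two history items, two tokens each (e.g. one CF and one semantic token).
mask = sparse_history_mask(num_items=2, tokens_per_item=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

Under this reading, the tokens of one item can be reordered without changing what any of them attends to, which is the order-agnostic property the abstract asks of the identifier.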
