Unleash the Potential of Long Semantic IDs for Generative Recommendation

Ming Xia, Zhiqin Zhou, Guoxin Ma, Dongmin Huang

TL;DR

ACERec tackles the granularity mismatch in generative recommendation by decoupling long semantic IDs from sequential modeling. An Attentive Token Merger distills long IDs into compact latent tokens, and an Intent Token serves as a dynamic prediction anchor. The model is trained with a dual-granularity objective that combines token-level reconstruction with item-level semantic alignment, and at inference it performs parallel, exact holistic scoring of candidate items. Across six Amazon datasets, ACERec achieves state-of-the-art performance, with an average NDCG@10 improvement of 14.40%, remains robust in cold-start scenarios, and significantly improves inference efficiency. By reconciling semantic richness with computational feasibility, ACERec offers a scalable, semantically aware approach to generative sequential recommendation.
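The TL;DR names a dual-granularity training objective and parallel holistic scoring without giving their formulas. The sketch below is one plausible reading under stated assumptions, not the paper's definition: the token-level term is assumed to be cross-entropy over the target item's M semantic-ID tokens, the item-level term is assumed to be an in-batch InfoNCE loss between a user-intent vector and the target item embedding (a swapped-in, standard contrastive loss, weighted by a hypothetical `lam` with temperature `tau`), and an item's holistic score is assumed to be the sum of its per-position token log-probabilities. All function names are hypothetical.

```python
import numpy as np

def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

# ---- Training: assumed form of the dual-granularity objective ----
def token_level_loss(position_logits: np.ndarray, target_codes: np.ndarray) -> float:
    """Cross-entropy of the target item's M codewords, averaged over batch and positions.
    position_logits: (B, M, K); target_codes: (B, M) integer codes."""
    lp = log_softmax(position_logits, axis=-1)
    picked = np.take_along_axis(lp, target_codes[..., None], axis=-1)  # (B, M, 1)
    return float(-picked.mean())

def item_level_loss(intent: np.ndarray, item_emb: np.ndarray, tau: float = 0.1) -> float:
    """In-batch InfoNCE (assumption): intent row b should be closest to item row b.
    intent, item_emb: (B, d)."""
    intent = intent / np.linalg.norm(intent, axis=1, keepdims=True)
    item_emb = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    sim = intent @ item_emb.T / tau                     # (B, B) cosine similarity / temperature
    return float(-np.mean(np.diag(log_softmax(sim, axis=1))))

def dual_granularity_loss(position_logits, target_codes, intent, item_emb,
                          lam: float = 0.5, tau: float = 0.1) -> float:
    """Token-level term plus a hypothetical lam-weighted item-level term."""
    return (token_level_loss(position_logits, target_codes)
            + lam * item_level_loss(intent, item_emb, tau))

# ---- Inference: score every catalog item in one vectorized pass ----
def holistic_scores(position_logits: np.ndarray, item_codes: np.ndarray) -> np.ndarray:
    """position_logits: (M, K) for one user; item_codes: (N, M) semantic IDs.
    Returns (N,) scores: sum over positions of each item's token log-probability."""
    lp = log_softmax(position_logits, axis=-1)            # (M, K)
    positions = np.arange(item_codes.shape[1])            # (M,)
    return lp[positions, item_codes].sum(axis=1)          # gather (N, M), then sum -> (N,)

# Usage: rank a hypothetical 1,000-item catalog of 4-token IDs for one user.
rng = np.random.default_rng(0)
user_logits = rng.normal(size=(4, 256))
catalog = rng.integers(0, 256, size=(1000, 4))
top10 = np.argsort(-holistic_scores(user_logits, catalog))[:10]
```

Under these assumptions every candidate is scored from a single (M, K) table with one gather, so ranking the full catalog needs no beam search and prunes no candidate before scoring it; that is one way to read "exact" scoring here.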

Abstract

Semantic ID-based generative recommendation represents items as sequences of discrete tokens, but it inherently faces a trade-off between representational expressiveness and computational efficiency. Residual Quantization (RQ)-based approaches keep semantic IDs short so that sequential modeling remains tractable, while Optimized Product Quantization (OPQ)-based methods compress long semantic IDs through naive, rigid aggregation, inevitably discarding fine-grained semantic information. To resolve this dilemma, we propose ACERec, a novel framework that decouples fine-grained tokenization from efficient sequential modeling, bridging the granularity gap between them. It employs an Attentive Token Merger to distill long, expressive semantic tokens into compact latents and introduces a dedicated Intent Token that serves as a dynamic prediction anchor. To capture cohesive user intents, we guide learning with a dual-granularity objective that harmonizes fine-grained token prediction with global item-level semantic alignment. Extensive experiments on six real-world benchmarks demonstrate that ACERec consistently outperforms state-of-the-art baselines, achieving an average improvement of 14.40% in NDCG@10 and effectively reconciling semantic expressiveness with computational efficiency.
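To make the abstract's contrast concrete, here is a minimal sketch of OPQ-style tokenization followed by mean pooling as a stand-in for "rigid aggregation". It assumes pre-trained centroids and a given orthogonal rotation; OPQ learns the rotation jointly with the codebooks, and that step is omitted. The function names and the toy embedding table are hypothetical, and the example only illustrates how a fixed average can map distinct IDs to the same vector, not how any particular baseline aggregates.

```python
import numpy as np

def opq_encode(x: np.ndarray, rotation: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Turn item embeddings into long semantic IDs, one token per subspace.
    x: (N, D) item embeddings; rotation: (D, D) orthogonal matrix
    (the identity reduces this to plain PQ; learning it is not shown);
    codebooks: (M, K, D // M) centroids for each of the M subspaces.
    Returns (N, M) integer codes."""
    M, K, dsub = codebooks.shape
    sub = (x @ rotation).reshape(x.shape[0], M, dsub)                    # (N, M, dsub)
    # Squared distance from each subvector to every centroid of its own subspace.
    d2 = ((sub[:, :, None, :] - codebooks[None, :, :, :]) ** 2).sum(-1)  # (N, M, K)
    return d2.argmin(axis=-1)

def mean_pool(codes: np.ndarray, token_emb: np.ndarray) -> np.ndarray:
    """Rigid aggregation: average each item's M token embeddings into one vector.
    token_emb: (M, K, d) table with one embedding per (position, codeword)."""
    positions = np.arange(codes.shape[1])
    return token_emb[positions, codes].mean(axis=1)                      # (N, d)

# Toy example: two items with different IDs collapse to the same pooled vector.
emb = np.array([[[1.0, 0.0], [0.0, 1.0]],    # position 0: codewords 0 and 1
                [[0.0, 1.0], [1.0, 0.0]]])   # position 1: codewords 0 and 1
pooled = mean_pool(np.array([[0, 0], [1, 1]]), emb)
print(pooled)  # both rows are [0.5, 0.5]
```

Any fixed, input-independent average discards which subspace contributed which signal; an attentive merger instead weights subspaces with learned, input-dependent scores.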

Paper Structure (45 sections, 9 equations, 12 figures, 6 tables, 2 algorithms)

Introduction
Methodology
Sequential Recommendation Task
Tokenization and Compression
Semantic Tokenization via OPQ
Latent Token Distillation via ATM
Intent-Centric Sequential Modeling
Modeling Evolving User Intent
Dual-Granularity Alignment
Efficient Inference via Holistic Scoring
Discussion
Comparison with Generative Baselines
Complexity and Efficiency
Experiment
Experimental Setup
...and 30 more sections

Figures (12)

Figure 1: Comparison of representation paradigms. Left: RQ (short/serial codes) with limited expressiveness. Middle: PQ + pooling causes semantic blurring. Right: ACERec merges long semantic tokens for expressiveness and efficiency.
Figure 2: Overview of the ACERec architecture. The left panel illustrates sequence encoding: historical items are tokenized, distilled into compact latents, and then aggregated into user intents. The middle panel details the ATM, which uses cross-attention to adaptively filter subspace signals into latent representations (see Latent Token Distillation via ATM). The right panel shows the training and inference paradigms: the upper part depicts the dual-granularity optimization (see Intent-Centric Sequential Modeling), while the lower part shows the holistic candidate scoring strategy for efficient parallel retrieval (see Efficient Inference via Holistic Scoring).
Figure 3: Impact of compression ratio $r$ on recommendation performance.
Figure 4: Performance comparison between ACERec and short-digit OPQ baseline. Both models use the same input length for the recommender.
Figure 5: Cold-start analysis on Instruments and Baby datasets, with NDCG@10 as performance metric.
...and 7 more figures
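Figure 2's middle panel describes the Attentive Token Merger as cross-attention that filters subspace signals into latent representations, and Figure 3 varies a compression ratio r. The following is a minimal single-head sketch, assuming M // r learnable query vectors attend over one item's M token embeddings; the paper's exact projections, head count, and normalization are not given here. It also computes the resulting sequence length, assuming one Intent Token is appended per user sequence. All names and sizes (M = 8, d = 16, a 50-item history, a 32-token ID) are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_token_merge(tokens, queries, w_k, w_v):
    """Cross-attention from M // r latent queries to each item's M semantic tokens.
    tokens: (N, M, d) token embeddings for N items; queries: (M // r, d), learnable;
    w_k, w_v: (d, d) key/value projections.
    Returns latents (N, M // r, d) and attention weights (N, M // r, M)."""
    keys = tokens @ w_k                                                        # (N, M, d)
    values = tokens @ w_v                                                      # (N, M, d)
    scores = queries @ np.swapaxes(keys, -1, -2) / np.sqrt(tokens.shape[-1])   # (N, M//r, M)
    attn = softmax(scores, axis=-1)           # each latent's weights over the M tokens sum to 1
    return attn @ values, attn

def sequence_positions(history_len: int, id_len: int, r: int) -> int:
    """Positions the sequential model attends over: M // r latents per item,
    plus one Intent Token (assumed)."""
    if id_len % r:
        raise ValueError("r must divide the semantic ID length")
    return history_len * (id_len // r) + 1

rng = np.random.default_rng(0)
M, d, r = 8, 16, 4                           # hypothetical sizes
tokens = rng.normal(size=(3, M, d))
queries = rng.normal(size=(M // r, d))
w_k = rng.normal(size=(d, d)) / np.sqrt(d)
w_v = rng.normal(size=(d, d)) / np.sqrt(d)
latents, attn = attentive_token_merge(tokens, queries, w_k, w_v)
print(latents.shape)                         # (3, 2, 16)
```

With all-zero queries the attention becomes uniform and the merger collapses to mean pooling of the value vectors, the rigid aggregation Figure 1 associates with semantic blurring; learned queries let each latent weight the M subspaces unevenly. Under the toy numbers, r = 4 shrinks a 1,601-position sequence (r = 1) to 401 positions, roughly a 16x reduction in pairwise self-attention cost.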
