Unleash the Potential of Long Semantic IDs for Generative Recommendation
Ming Xia, Zhiqin Zhou, Guoxin Ma, Dongmin Huang
TL;DR
ACERec addresses the granularity mismatch in generative recommendation by decoupling long semantic IDs from sequential modeling. It introduces an Attentive Token Merger that distills long IDs into compact latent tokens and an Intent Token that serves as a dynamic prediction anchor. The model is trained with a dual-granularity objective that combines token-level reconstruction with item-level semantic alignment, which enables parallel, exact holistic scoring at inference time. Across six Amazon datasets, ACERec achieves state-of-the-art performance, with an average NDCG@10 improvement of 14.40% over baselines, remains robust in cold-start scenarios, and significantly improves inference efficiency. By bridging semantic richness and computational feasibility, it offers a scalable, semantically aware approach to generative sequential recommendation.
Abstract
Semantic ID-based generative recommendation represents items as sequences of discrete tokens, but it inherently faces a trade-off between representational expressiveness and computational efficiency. Residual Quantization (RQ)-based approaches keep semantic IDs short so that sequential modeling remains tractable, while Optimized Product Quantization (OPQ)-based methods compress long semantic IDs through naive, rigid aggregation, inevitably discarding fine-grained semantic information. To resolve this dilemma, we propose ACERec, a novel framework that bridges the granularity gap by decoupling fine-grained tokenization from efficient sequential modeling. It employs an Attentive Token Merger to distill long, expressive semantic token sequences into compact latents and introduces a dedicated Intent Token that serves as a dynamic prediction anchor. To capture cohesive user intents, we guide learning with a dual-granularity objective that harmonizes fine-grained token prediction with global item-level semantic alignment. Extensive experiments on six real-world benchmarks demonstrate that ACERec consistently outperforms state-of-the-art baselines, achieving an average improvement of 14.40% in NDCG@10 and thereby reconciling semantic expressiveness with computational efficiency.
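To make the idea of distilling a long semantic ID into a few latents concrete, the following is a minimal sketch of attention-based pooling with learned queries. It is not ACERec's implementation: the abstract does not specify the Attentive Token Merger's architecture, so the function name `attentive_merge`, the single-head cross-attention form, the projection matrices `w_k` and `w_v`, and the sizes `m`, `k`, `d` are all assumptions chosen for illustration, and random arrays stand in for trained parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attentive_merge(token_emb, queries, w_k, w_v):
    """Pool m token embeddings into k latents with single-head cross-attention.

    Hypothetical helper, not taken from the paper.
    token_emb: (m, d) embeddings of one item's semantic-ID tokens
    queries:   (k, d) query vectors (learned in a real model), with k << m
    w_k, w_v:  (d, d) key and value projections
    Returns the latents (k, d) and the attention weights (k, m).
    """
    keys = token_emb @ w_k
    values = token_emb @ w_v
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    attn = softmax(scores, axis=-1)  # each query distributes weight over tokens
    return attn @ values, attn


# Assumed sizes: a 32-token semantic ID compressed into 4 latents of width 16.
rng = np.random.default_rng(0)
m, k, d = 32, 4, 16
token_emb = rng.normal(size=(m, d))
queries = rng.normal(size=(k, d))
w_k = rng.normal(size=(d, d)) / np.sqrt(d)
w_v = rng.normal(size=(d, d)) / np.sqrt(d)

latents, attn = attentive_merge(token_emb, queries, w_k, w_v)

# With all-zero queries the attention is uniform, so every latent collapses
# to the plain mean of the value vectors, i.e. a fixed, content-blind pooling.
uniform_latents, uniform_attn = attentive_merge(
    token_emb, np.zeros((k, d)), w_k, w_v
)
```

In this sketch the sequence model would consume `k` latents per item instead of `m` tokens. The zero-query case shows that fixed averaging is a special case of attention pooling in which the weights ignore token content, which is one way to read the contrast the abstract draws with rigid aggregation.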
