Align$^3$GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation

Wencai Ye, Mingjie Sun, Shuhang Chen, Wenjin Wu, Peng Jiang

TL;DR

Problem: bridging semantic and behavioral gaps when applying LLMs to personalized recommendations. Approach: Align$^3$GR unifies token-level, behavior-level, and preference-level alignment via dual SCID tokenization, bidirectional SFT, and progressive DPO with SP-DPO and RF-DPO. Findings: achieves strong offline gains (e.g., +17.8% Recall@10 and +20.2% NDCG@10 on Instruments) and meaningful online revenue improvements in production deployments. Significance: provides a scalable, end-to-end blueprint for deploying LLM-based generative recommender systems in industry.

Abstract

Large Language Models (LLMs) demonstrate significant advantages in leveraging structured world knowledge and multi-step reasoning capabilities. However, fundamental challenges arise when transforming LLMs into real-world recommender systems due to semantic and behavioral misalignment. To bridge this gap, we propose Align$^3$GR, a novel framework that unifies token-level, behavior modeling-level, and preference-level alignment. Our approach introduces: (1) dual tokenization fusing user-item semantic and collaborative signals; (2) enhanced behavior modeling with bidirectional semantic alignment; and (3) a progressive DPO strategy combining self-play (SP-DPO) and real-world feedback (RF-DPO) for dynamic preference adaptation. Experiments show Align$^3$GR outperforms the SOTA baseline by +17.8% in Recall@10 and +20.2% in NDCG@10 on the public dataset, with significant gains in online A/B tests and full-scale deployment on an industrial large-scale recommendation platform.
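
The preference-level stage named in the abstract builds on Direct Preference Optimization (DPO). As background only, the following is a minimal sketch of the standard per-pair DPO loss, assuming the sequence log-probabilities of the chosen and rejected responses are already available from the policy and from a frozen reference model. The function name, argument names, and the default beta = 0.1 are illustrative assumptions, and the sketch does not reproduce the progressive SP-DPO/RF-DPO schedule.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) preference pair.

    All inputs are sequence log-probabilities (floats). `beta` scales how
    strongly the policy is pushed away from the reference model; 0.1 is an
    illustrative default, not a value taken from the paper.
    """
    # Implicit rewards: log-ratio of policy to reference for each response.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # loss = -log(sigmoid(logits)) = log(1 + exp(-logits)),
    # computed in a numerically stable way for either sign of logits.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# A policy identical to the reference gives logits = 0, i.e. loss = log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# A policy that raises the chosen response and lowers the rejected one,
# relative to the reference, gets a lower loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In this formulation the loss depends only on how the policy's preference margin differs from the reference model's, so which pairs are fed in (self-play pairs or pairs from real feedback) is a data question separate from the objective itself.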

Paper Structure

This paper contains 15 sections, 3 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: LLM-to-Recommendation Alignment Pipeline.
  • Figure 2: (a) The architecture of Align$^3$GR, a unified multi-level alignment framework for generative recommendation, which integrates hierarchical dual SCID, multi-task SFT, and progressive DPO. (b) Token-level alignment is achieved through user-item dual SC encoders and RQ-VAEs. (c) Preference-level alignment is accomplished via progressive SP-DPO and RF-DPO.
  • Figure 3: Behavior Modeling-level Alignment.
  • Figure 4: Recall@10 (%) under incremental alignment configurations; "Single+SEQ" denotes using item-side semantic IDs as tokens for the sequence task, while "+" indicates cumulative addition of each module.
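
The caption of Figure 2(b) names RQ-VAEs as the tokenizer. As a rough illustration of the quantization step such a model relies on, here is a minimal sketch of residual quantization: each level picks the codebook entry nearest to the current residual, and the resulting indices form a coarse-to-fine discrete ID. The codebooks below are hand-written toy values (in an RQ-VAE they are learned), and all function and variable names are assumptions made for this example rather than details from the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(vector, codebooks):
    """Quantize `vector` level by level.

    At each level, choose the nearest code to the current residual, record
    its index, and subtract the code to obtain the next residual.
    Returns (list of code indices, final residual).
    """
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        index = min(
            range(len(codebook)),
            key=lambda i: squared_distance(residual, codebook[i]),
        )
        codes.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return codes, residual


# Toy two-level codebooks over 2-D embeddings (illustrative values only).
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],      # level 1: coarse codes
    [[0.0, 0.0], [0.25, -0.25]],   # level 2: finer corrections
]

codes, residual = residual_quantize([1.2, 0.8], toy_codebooks)
```

Each additional level shrinks the residual, so items with similar embeddings share a prefix of code indices while later indices tell them apart.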