
DeepInterestGR: Mining Deep Multi-Interest Using Multi-Modal LLMs for Generative Recommendation

Yangchen Zeng

TL;DR

DeepInterestGR adopts a two-stage training pipeline: supervised fine-tuning aligns the generative model with deep interest signals and collaborative filtering patterns, followed by reinforcement learning with GRPO optimized by the authors' Interest-Aware Reward.
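As a rough illustration of the RL stage: GRPO replaces a learned value baseline with a group-relative one. Several completions are sampled for the same prompt, and each completion's reward is normalized by the mean and standard deviation of its group. The sketch below covers only that normalization step. This summary does not specify the Interest-Aware Reward, so the reward values are placeholders, and the function name `group_relative_advantages` is hypothetical.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one group of completions sampled from the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Placeholder rewards for four sampled SID sequences (hypothetical values,
# standing in for the paper's Interest-Aware Reward).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

If every completion in a group receives the same reward, all advantages are zero, so that group contributes no policy-gradient signal. Here `eps` only guards against division by zero. Some implementations use the sample standard deviation instead of the population one.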

Abstract

Recent generative recommendation frameworks have demonstrated remarkable scaling potential by reformulating item prediction as autoregressive Semantic ID (SID) generation. However, existing methods primarily rely on shallow behavioral signals, encoding items solely through surface-level textual features such as titles and descriptions. This reliance results in a critical Shallow Interest problem: the model fails to capture the latent, semantically rich interests underlying user interactions, limiting both personalization depth and recommendation interpretability. DeepInterestGR introduces three key innovations: (1) Multi-LLM Interest Mining (MLIM): We leverage multiple frontier LLMs along with their multi-modal variants to extract deep textual and visual interest representations through Chain-of-Thought prompting. (2) Reward-Labeled Deep Interest (RLDI): We employ a lightweight binary classifier to assign reward labels to mined interests, enabling effective supervision signals for reinforcement learning. (3) Interest-Enhanced Item Discretization (IEID): The curated deep interests are encoded into semantic embeddings and quantized into SID tokens via RQ-VAE. We adopt a two-stage training pipeline: supervised fine-tuning aligns the generative model with deep interest signals and collaborative filtering patterns, followed by reinforcement learning with GRPO optimized by our Interest-Aware Reward. Experiments on three Amazon Review benchmarks demonstrate that DeepInterestGR consistently outperforms state-of-the-art baselines across HR@K and NDCG@K metrics.
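IEID quantizes each interest-enhanced item embedding into SID tokens with RQ-VAE. The sketch below shows only the residual-quantization lookup that runs once an RQ-VAE has been trained. At each level it selects the nearest codeword, subtracts that codeword, and passes the remainder to the next level; the selected indices form the item's SID. The sketch assumes the codebooks are already learned and omits the encoder, decoder, and training losses. The codebook sizes, the toy vectors, and the `<L{level}_{code}>` token format are all hypothetical.

```python
def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding: list[float], codebooks: list[list[list[float]]]) -> tuple[int, ...]:
    """Greedy residual quantization: one code index per codebook level."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        # Choose the codeword nearest to what is still unexplained.
        best = min(range(len(codebook)), key=lambda k: squared_distance(residual, codebook[k]))
        codes.append(best)
        # Subtract it so the next level encodes a finer correction.
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return tuple(codes)


def to_sid_tokens(codes: tuple[int, ...]) -> str:
    """Render codes as level-tagged tokens (hypothetical token format)."""
    return "".join(f"<L{level}_{code}>" for level, code in enumerate(codes))


# Toy 2-level codebooks over 2-D embeddings (illustrative values only).
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]
codes = residual_quantize([4.2, 5.1], codebooks)
print(codes, to_sid_tokens(codes))  # prints: (1, 2) <L0_1><L1_2>
```

Each level quantizes the residual left by the previous one. As a result, items that share a code prefix roughly share a coarse region of embedding space, which suits coarse-to-fine autoregressive SID generation.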

Paper Structure

This paper contains 29 sections, 15 equations, 3 figures, and 8 tables.

Figures (3)

  • Figure 1: Overview of the DeepInterestGR framework. Left: Multi-LLM Interest Mining (MLIM) extracts deep interests using frontier LLMs with Chain-of-Thought prompting, while Interest-Enhanced Item Discretization (IEID) encodes these interests into Semantic IDs via RQ-VAE. Right: Two-stage training pipeline consisting of Supervised Fine-Tuning (SFT) and Reinforcement Learning with GRPO, guided by our Interest-Aware Reward derived from Reward-Labeled Deep Interest (RLDI).
  • Figure 2: Component ablation study on the Beauty dataset. The green bar represents the full model and the orange bars represent ablation variants, with the percentage drop shown for each removed component. Removing RL (-14.8%) and MLIM (-10.8%) produces the largest drops.
  • Figure 3: Impact of reinforcement learning with Interest-Aware Reward on three datasets. Orange bars represent SFT-only performance, while blue bars show final performance after RL training with GRPO. All datasets show consistent improvements (9.2% to 12.1%).
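The abstract reports results in HR@K and NDCG@K, but this summary does not define them. The sketch below assumes the common leave-one-out setup used in sequential recommendation, where each test user has exactly one held-out target item. HR@K checks whether the target appears in the top K. Because there is only one relevant item, the ideal DCG is 1, so NDCG@K reduces to 1 / log2(rank + 1) for a 1-based rank. The function names and toy item IDs are hypothetical.

```python
import math


def hit_rate_at_k(ranked: list[str], target: str, k: int) -> float:
    # 1 if the held-out item appears anywhere in the top-k list, else 0.
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: list[str], target: str, k: int) -> float:
    # With a single relevant item, IDCG = 1, so NDCG = 1 / log2(rank + 1).
    top = ranked[:k]
    if target not in top:
        return 0.0
    rank = top.index(target) + 1  # 1-based position
    return 1.0 / math.log2(rank + 1)


def evaluate(predictions: list[list[str]], targets: list[str], k: int) -> tuple[float, float]:
    # Average both metrics over all test users.
    n = len(targets)
    hr = sum(hit_rate_at_k(p, t, k) for p, t in zip(predictions, targets)) / n
    ndcg = sum(ndcg_at_k(p, t, k) for p, t in zip(predictions, targets)) / n
    return hr, ndcg


# Toy ranked lists for three users (illustrative only).
preds = [["a", "b", "c"], ["x", "y", "z"], ["q", "r", "s"]]
print(evaluate(preds, ["a", "z", "m"], k=3))
```

HR@K only checks whether the target is retrieved, whereas NDCG@K also rewards placing it near the top. In this toy example the second user's hit at rank 3 therefore counts fully for HR@3 but only 0.5 for NDCG@3.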