Inductive Generative Recommendation via Retrieval-based Speculation

Yijie Ding, Jiacheng Li, Julian McAuley, Yupeng Hou

TL;DR

Generative recommendation models struggle to recommend unseen items in inductive settings. SpecGR addresses this with a draft-then-verify framework: an inductive drafter proposes candidate items (including new ones), and a GR verifier scores the candidates and accepts the high-quality ones, with guided re-drafting to align later drafts with the verifier's outputs. A second variant, SpecGR++, reuses the GR encoder as the drafter and employs contrastive pretraining and large-batch fine-tuning to strengthen inductive representations. Across three real-world datasets, SpecGR achieves strong inductive generalization and the best overall performance among the compared methods, while keeping inference scalable through adaptive exiting and efficient drafting. The framework is modular and plug-and-play: it improves a range of drafters and GR backbones on both inductive and overall metrics, which makes it practical for item catalogs that change over time.

Abstract

Generative recommendation (GR) is an emerging paradigm that tokenizes items into discrete tokens and learns to autoregressively generate the next tokens as predictions. While this token-generation paradigm is expected to surpass traditional transductive methods, potentially generating new items directly based on semantics, we empirically show that GR models predominantly generate items seen during training and struggle to recommend unseen items. In this paper, we propose SpecGR, a plug-and-play framework that enables GR models to recommend new items in an inductive setting. SpecGR uses a drafter model with inductive capability to propose candidate items, which may include both existing items and new items. The GR model then acts as a verifier, accepting or rejecting candidates while retaining its strong ranking capabilities. We further introduce the guided re-drafting technique to make the proposed candidates more aligned with the outputs of generative recommendation models, improving the verification efficiency. We consider two variants for drafting: (1) using an auxiliary drafter model for better flexibility, or (2) leveraging the GR model's own encoder for parameter-efficient self-drafting. Extensive experiments on three real-world datasets demonstrate that SpecGR exhibits both strong inductive recommendation ability and the best overall performance among the compared methods. Our code is available at: https://github.com/Jamesding000/SpecGR.
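
The draft-then-verify loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `draft_then_verify`, its parameters (`threshold`, `k`, `draft_size`, `max_rounds`), and the toy drafter and verifier are all hypothetical names introduced here. Guided re-drafting is simplified to "do not re-propose items that were already verified".

```python
from typing import Callable, Dict, List, Set


def draft_then_verify(
    draft: Callable[[Set[str], int], List[str]],
    score: Callable[[str], float],
    threshold: float,
    k: int,
    draft_size: int,
    max_rounds: int = 5,
) -> List[str]:
    """Illustrative loop: draft candidates, let a verifier accept or reject them,
    and repeat until k items are accepted or the round budget is spent."""
    accepted: Dict[str, float] = {}
    verified: Set[str] = set()
    for _ in range(max_rounds):
        # Drafting: the (assumed) drafter proposes items it has not proposed before.
        candidates = draft(verified, draft_size)
        if not candidates:
            break
        # Verifying: keep a candidate only if the verifier score clears the threshold.
        for item in candidates:
            verified.add(item)
            item_score = score(item)
            if item_score >= threshold:
                accepted[item] = item_score
        # Stop as soon as enough items have been accepted.
        if len(accepted) >= k:
            break
    # Rank accepted items by verifier score (ties broken by item name).
    ranked = sorted(accepted, key=lambda item: (-accepted[item], item))
    return ranked[:k]


# Toy data (made up for illustration): a fixed drafter ranking and verifier scores.
drafter_ranking = ["new_a", "old_b", "old_c", "new_d", "old_e"]
verifier_scores = {"new_a": 0.9, "old_b": 0.2, "old_c": 0.7, "new_d": 0.6, "old_e": 0.1}


def toy_draft(excluded: Set[str], n: int) -> List[str]:
    # Return the next n items in drafter order that were not verified yet.
    return [item for item in drafter_ranking if item not in excluded][:n]


result = draft_then_verify(
    toy_draft, verifier_scores.__getitem__, threshold=0.5, k=3, draft_size=2
)
print(result)  # ['new_a', 'old_c', 'new_d']
```

In this toy run the first round accepts only `new_a`, so a second round is drafted; it adds `old_c` and `new_d`, and the loop stops because three items have been accepted. The final order comes from the verifier's scores, not the drafter's, which mirrors the abstract's point that the GR model retains its ranking role.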

Paper Structure

This paper contains 52 sections, 9 equations, 5 figures, 13 tables.

Figures (5)

  • Figure 1: (1 & 2) GR models struggle to generate unseen items in an inductive setting. (3) SpecGR, a draft-then-verify framework, leverages GR models to verify candidates from an inductive drafter, enabling new-item recommendations.
  • Figure 2: Illustration of the proposed SpecGR method, a draft-then-verify framework that iteratively performs drafting and verification until enough items are accepted. (a) Inductive Drafting. The inductive drafter first retrieves a set of candidates that contain new items. We present two drafting methods: using an auxiliary model or the GR model's encoder output (namely, self-drafting) for item retrieval. (b) Target-aware Verifying. The GR model accepts or rejects the candidates based on the likelihood of being the target. (c) Guided Re-drafting. If not enough items are accepted, the GR model filters the candidate space for the next drafting round based on the generated beam sequences.
  • Figure 3: Impact of hyperparameters on SpecGR's performance and efficiency. (Left, middle): Bars show the proportion of unseen items in recommendations. (Right): Bars represent inference latency in seconds. Lines depict the trade-off between in-sample and unseen Recall@50.
  • Figure A1: Inference speed acceleration factor w.r.t. different numbers of semantic ID digits.
  • Figure A2: (Left) Inference latency comparison for subset ranking. Both x- and y-axis use log scale. (Right) Acceptance rate comparison for different drafting strategies.
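
Step (c) in the Figure 2 caption says that the GR model filters the candidate space for the next drafting round based on the generated beam sequences. One plausible reading, sketched below, is prefix matching on semantic IDs: an item stays in the candidate space only if its semantic ID starts with one of the partial sequences currently on the beam. The prefix-matching rule, the function `filter_by_beam_prefixes`, and the toy semantic IDs are assumptions made for illustration; the caption does not specify the exact filtering rule.

```python
from typing import Dict, Iterable, List, Tuple

SemanticID = Tuple[int, ...]


def filter_by_beam_prefixes(
    item_ids: Dict[str, SemanticID], beams: Iterable[SemanticID]
) -> List[str]:
    """Keep items whose semantic ID begins with any beam prefix (assumed rule)."""
    prefixes = {tuple(beam) for beam in beams}
    lengths = {len(prefix) for prefix in prefixes}
    return [
        item
        for item, semantic_id in item_ids.items()
        if any(semantic_id[:n] in prefixes for n in lengths)
    ]


# Toy semantic IDs (made up): three tokens per item.
item_ids = {
    "new_a": (1, 2, 3),
    "old_b": (1, 5, 0),
    "new_d": (4, 2, 2),
    "old_e": (1, 2, 9),
}
# Partial sequences assumed to be on the beam after decoding one or two tokens.
beams = [(1, 2), (4,)]

remaining = filter_by_beam_prefixes(item_ids, beams)
print(remaining)  # ['new_a', 'new_d', 'old_e']
```

Here `old_b` is dropped because neither `(1,)` nor `(1, 5)` is on the beam, so the next draft would be drawn only from items the verifier is still able to generate.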