Uncertainty-aware Generative Recommendation

Chenxiao Fan, Chongming Gao, Yaxin Gong, Haoyan Liu, Fuli Feng, Xiangnan He

TL;DR

This work tackles uncertainty blindness in generative recommendation by treating uncertainty as a learning signal. It introduces Uncertainty-aware Generative Recommendation (UGR), which combines an uncertainty-weighted rollout reward, difficulty-aware optimization, and explicit confidence alignment on SID-based representations with constrained rollout. Experiments on three real-world datasets show state-of-the-art performance and notably more stable training, and the explicit confidence signals enable risk-aware downstream tasks such as dynamic ranking and selective rejection. The results indicate that explicit uncertainty modeling matters for robust, trustworthy generative recommender systems, and they point to online adaptation and broader applicability as future directions.

Abstract

Generative Recommendation has emerged as a transformative paradigm, reformulating recommendation as an end-to-end autoregressive sequence generation task. Despite its promise, existing preference optimization methods typically rely on binary outcome correctness, suffering from a systemic limitation we term uncertainty blindness. This issue manifests in the neglect of the model's intrinsic generation confidence, the variation in sample learning difficulty, and the lack of explicit confidence expression, directly leading to unstable training dynamics and unquantifiable decision risks. In this paper, we propose Uncertainty-aware Generative Recommendation (UGR), a unified framework that leverages uncertainty as a critical signal for adaptive optimization. UGR synergizes three mechanisms: (1) an uncertainty-weighted reward to penalize confident errors; (2) difficulty-aware optimization dynamics to prevent premature convergence; and (3) explicit confidence alignment to empower the model with confidence expression capabilities. Extensive experiments demonstrate that UGR not only yields superior recommendation performance but also fundamentally stabilizes training, preventing the performance degradation often observed in standard methods. Furthermore, the learned confidence enables reliable downstream risk-aware applications.
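
To make mechanism (1) concrete, the sketch below shows one way a reward could penalize confident errors more heavily than hesitant ones. The abstract does not give the reward formula, so everything here is an assumption for illustration: the function name `uncertainty_weighted_rewards`, the `penalty_scale` parameter, the use of a softmax over the candidates' sequence log-probabilities as the confidence estimate, and the +1 reward for a correct candidate. It is not the formulation used in UGR.

```python
import math


def uncertainty_weighted_rewards(log_probs, is_correct, penalty_scale=1.0):
    """Illustrative reward rule (an assumption, not the paper's formula).

    log_probs  : sequence log-probability of each rollout candidate
    is_correct : whether each candidate matches the ground-truth item
    Correct candidates receive +1. Wrong candidates receive a negative
    reward proportional to the model's relative confidence in them.
    """
    # Numerically stable softmax over the candidate group, used here as
    # a stand-in for the model's generation confidence.
    max_lp = max(log_probs)
    exps = [math.exp(lp - max_lp) for lp in log_probs]
    total = sum(exps)
    confidences = [e / total for e in exps]

    rewards = []
    for conf, ok in zip(confidences, is_correct):
        if ok:
            rewards.append(1.0)
        else:
            # A confident error is penalized more than a hesitant one.
            rewards.append(-penalty_scale * conf)
    return rewards


if __name__ == "__main__":
    # Three candidates: a confident miss, a hit, and a low-confidence miss.
    print(uncertainty_weighted_rewards([-0.1, -2.0, -5.0], [False, True, False]))
```

Under this rule a binary reward would treat both misses identically, whereas the weighted version gives the high-probability miss the larger penalty.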

Paper Structure (43 sections, 13 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Comparison between existing Uncertainty-Blind methods (Left) and our proposed UGR (Right).
  • Figure 2: Overview of the proposed UGR framework. Given the user history, the model generates a candidate set via constrained beam search. These candidates effectively drive three synergistic mechanisms: (1) Uncertainty-weighted reward penalizes confident hallucinations based on logit intensity; (2) Difficulty-aware optimization adaptively re-weights the gradient budget based on ranking difficulty; (3) Explicit confidence alignment externalizes internal certainty into quantifiable risk signals.
  • Figure 3: Training dynamics of HR@1 and HR@16 on Office and Industrial datasets under varying difficulty coefficients $\alpha$. Standard uniform weighting ($\alpha=1.0$) leads to performance collapse due to overfitting, whereas UGR's difficulty-aware optimization ($\alpha=0$) prevents degradation and enables sustained learning.
  • Figure 4: Performance of risk-aware applications on the Office dataset. User-level Rejection (Left): Filtering low-confidence requests steadily improves NDCG@10. Item-level Truncation (Right): Dynamically removing low-confidence tail items effectively enhances Precision.
  • Figure 5: Analysis of candidate collapse during training. We track the average number of unique valid items generated per rollout (Beam Size $G=16$). The standard method ($\alpha=1.0$) succumbs to the overconfidence trap, causing a rapid collapse in candidate diversity due to extreme probability distributions on easy samples. Conversely, UGR ($\alpha=0$) suppresses this overconfidence via difficulty-aware optimization dynamics, consistently maintaining a diverse search space.
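
The two risk-aware applications named in the Figure 4 caption can be sketched as simple threshold rules over confidence scores. This is a minimal illustration under stated assumptions: the function names, the `threshold` and `min_keep` parameters, the dictionary and tuple layouts, and the idea that each request and each ranked item carries a confidence in [0, 1] are all hypothetical. The captions do not specify how UGR computes or thresholds its confidence.

```python
def reject_low_confidence_requests(requests, threshold):
    """User-level rejection: serve only requests the model is confident about.

    requests: list of dicts with a "confidence" key (hypothetical layout).
    """
    return [r for r in requests if r["confidence"] >= threshold]


def truncate_low_confidence_tail(ranked_items, threshold, min_keep=1):
    """Item-level truncation: drop trailing items below the threshold.

    ranked_items: list of (item_id, confidence) pairs in ranked order.
    Only the tail is trimmed; at least `min_keep` items are retained.
    """
    kept = list(ranked_items)
    while len(kept) > min_keep and kept[-1][1] < threshold:
        kept.pop()
    return kept


if __name__ == "__main__":
    requests = [
        {"user": "u1", "confidence": 0.82},
        {"user": "u2", "confidence": 0.31},
    ]
    print(reject_low_confidence_requests(requests, threshold=0.5))

    ranked = [("a", 0.9), ("b", 0.6), ("c", 0.2), ("d", 0.1)]
    print(truncate_low_confidence_tail(ranked, threshold=0.5))
```

Trimming only from the tail keeps the ranked order intact, so a low-confidence item that sits above a high-confidence one is left in place in this sketch.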