Uncertainty-aware Generative Recommendation
Chenxiao Fan, Chongming Gao, Yaxin Gong, Haoyan Liu, Fuli Feng, Xiangnan He
TL;DR
This work tackles uncertainty blindness in generative recommendation by modeling uncertainty and exploiting it as a learning signal. It introduces Uncertainty-aware Generative Recommendation (UGR), which combines an uncertainty-weighted rollout reward, difficulty-aware optimization, and explicit confidence alignment, operating on SID-based representations with constrained rollout. Experiments on three real-world datasets show state-of-the-art performance and notably improved training stability, and the explicit confidence signals enable risk-aware downstream tasks such as dynamic ranking and selective rejection. The results indicate that explicit uncertainty modeling is important for robust, trustworthy generative recommender systems and point to online adaptation and broader applicability as future directions.
Abstract
Generative Recommendation has emerged as a transformative paradigm, reformulating recommendation as an end-to-end autoregressive sequence generation task. Despite its promise, existing preference optimization methods typically rely on binary outcome correctness, suffering from a systemic limitation we term uncertainty blindness. This issue manifests in the neglect of the model's intrinsic generation confidence, the variation in sample learning difficulty, and the lack of explicit confidence expression, directly leading to unstable training dynamics and unquantifiable decision risks. In this paper, we propose Uncertainty-aware Generative Recommendation (UGR), a unified framework that leverages uncertainty as a critical signal for adaptive optimization. UGR synergizes three mechanisms: (1) an uncertainty-weighted reward to penalize confident errors; (2) difficulty-aware optimization dynamics to prevent premature convergence; and (3) explicit confidence alignment to empower the model with confidence expression capabilities. Extensive experiments demonstrate that UGR not only yields superior recommendation performance but also fundamentally stabilizes training, preventing the performance degradation often observed in standard methods. Furthermore, the learned confidence enables reliable downstream risk-aware applications.
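To make mechanism (1) concrete, the following is a minimal illustrative sketch of what "penalizing confident errors" could look like. It is not the formulation used in UGR, which the abstract does not specify. The function names, the use of the geometric-mean token probability as a confidence proxy, and the +1 / negative-confidence reward values are all assumptions made for illustration.

```python
import math


def sequence_confidence(token_logprobs):
    """Hypothetical confidence proxy: geometric-mean token probability, in (0, 1]."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def uncertainty_weighted_reward(is_correct, token_logprobs):
    """Hypothetical reward (an assumption, not the paper's formula).

    A correct rollout earns +1. An incorrect rollout is penalized in
    proportion to how confident the model was when generating it, so a
    confident miss costs more than a hesitant one.
    """
    if is_correct:
        return 1.0
    return -sequence_confidence(token_logprobs)


# Two wrong rollouts: one generated with high token probabilities, one with low.
confident_miss = uncertainty_weighted_reward(False, [-0.1, -0.1, -0.1])
hesitant_miss = uncertainty_weighted_reward(False, [-2.0, -2.0, -2.0])
hit = uncertainty_weighted_reward(True, [-0.5, -0.5, -0.5])
```

Under a purely binary reward both misses would score the same. Here the confident miss receives the larger penalty, which is the behavior the abstract attributes to the uncertainty-weighted reward.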
