Process-Supervised LLM Recommenders via Flow-guided Tuning
Chongming Gao, Mengyao Gao, Chenxiao Fan, Shuai Yuan, Wentao Shi, Xiangnan He
TL;DR
This work addresses popularity bias and limited diversity in LLM-based recommender systems trained by supervised fine-tuning (SFT) with a cross-entropy objective. It introduces Flower, a flow-guided fine-tuning framework that uses Generative Flow Networks (GFlowNets) to propagate item-level rewards down to token-level rewards on a prefix tree, aligning next-token probabilities with reward signals. By decomposing item rewards into token rewards and incorporating personalized preferences via $p_{ui}$, Flower achieves distribution matching, improved diversity, and enhanced fairness while preserving accuracy, and it remains compatible with post-hoc alignment methods. Across three real-world datasets, Flower outperforms SFT-based baselines on distribution fitting, accuracy, fairness, and diversity, and it proves robust as a reference policy for RL/DPO-based alignment. Code is available at the authors' GitHub repository.
Abstract
While large language models (LLMs) are increasingly adapted for recommendation systems via supervised fine-tuning (SFT), this approach amplifies popularity bias due to its likelihood maximization objective, compromising recommendation diversity and fairness. To address this, we present Flow-guided fine-tuning recommender (Flower), which replaces SFT with a Generative Flow Network (GFlowNet) framework that enacts process supervision through token-level reward propagation. Flower's key innovation lies in decomposing item-level rewards into constituent token rewards, enabling direct alignment between token generation probabilities and their reward signals. This mechanism achieves three critical advancements: (1) popularity bias mitigation and fairness enhancement through empirical distribution matching, (2) preservation of diversity through GFlowNet's proportional sampling, and (3) flexible integration of personalized preferences via adaptable token rewards. Experiments demonstrate Flower's superior distribution-fitting capability and its significant advantages over traditional SFT in terms of accuracy, fairness, and diversity, highlighting its potential to improve LLM-based recommendation systems. The implementation is available via https://github.com/MrPeach0301/Flower
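The reward decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it does not reproduce Flower's training objective, which this section does not specify. It only shows the target distribution that flow-guided tuning aims for: if the flow at each prefix-tree node is the sum of the rewards of all items below it, and each next-token probability is the child flow divided by the parent flow, then generating token by token samples every item with probability proportional to its reward. The function names, the toy item titles, and the reward values are all hypothetical.

```python
from collections import defaultdict

def prefix_flows(item_tokens, item_rewards):
    """Accumulate each item's reward into every prefix of its token sequence.

    Assumes no item's token sequence is a strict prefix of another's
    (otherwise an explicit end-of-sequence token would be needed).
    """
    flow = defaultdict(float)
    for item, tokens in item_tokens.items():
        reward = item_rewards[item]
        for k in range(len(tokens) + 1):
            flow[tuple(tokens[:k])] += reward  # k = 0 is the root (empty prefix)
    return dict(flow)

def next_token_prob(flow, prefix, token):
    """Target next-token probability: child flow divided by parent flow."""
    parent = tuple(prefix)
    child = parent + (token,)
    return flow.get(child, 0.0) / flow[parent]

def item_prob(flow, tokens):
    """Probability of generating a full item as a product of token-level steps."""
    p = 1.0
    for k, tok in enumerate(tokens):
        p *= next_token_prob(flow, tokens[:k], tok)
    return p

# Hypothetical toy catalogue: item id -> tokenized title, and arbitrary rewards.
item_tokens = {
    "A": ["star", "wars"],
    "B": ["star", "trek"],
    "C": ["toy", "story"],
}
item_rewards = {"A": 6.0, "B": 2.0, "C": 2.0}

flow = prefix_flows(item_tokens, item_rewards)
for item, tokens in item_tokens.items():
    print(item, round(item_prob(flow, tokens), 3))
```

In this toy setting the shared prefix "star" carries the combined reward of items A and B, so the token-level probabilities at each branch split that mass in proportion to the rewards underneath. A pure likelihood-maximizing objective, by contrast, has no such explicit per-token target, which is the gap the abstract attributes to SFT.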
