Modeling User Retention through Generative Flow Networks
Ziru Liu, Shuchang Liu, Bin Yang, Zhenghai Xue, Qingpeng Cai, Xiangyu Zhao, Zijian Zhang, Lantao Hu, Han Li, Peng Jiang
TL;DR
The paper tackles the challenge of optimizing user retention in recommender systems, a cross-session, delayed-signal problem poorly addressed by traditional immediate-feedback objectives. It introduces GFN4Retention, which treats a user session as a generative trajectory governed by a forward flow $P_F(s_{t+1}|s_t)$ and a state flow $\text{F}(s_t)$, aiming for $P(\textbf{S}) \propto \text{F}(s_T)$. A novel integrated reward design $R(\textbf{S}) = \text{R} \times e^{\alpha \sum_{t=1}^{T-1} r_t}$ and a flow factorization $\text{F}(s_t)=\text{F}_R(s_t)(\text{F}_I(s_t))^{\alpha}$ with $\text{F}_I(s_t)=e^{\sum_{j=1}^{t-1} r_j}$ enable back-propagation of both immediate rewards and the retention signal via a refined Detailed Balance loss: $\mathcal{L}_{DB} = (\log \text{F}_R(s_t) + \log P_F(s_{t+1}|s_t) - \log \text{F}_R(s_{t+1}) - \log P_B(s_t|s_{t+1}) - \alpha r_t)^2$. The approach is validated on offline datasets and through live A/B testing, showing improvements in retention and meaningful engagement metrics while maintaining stable exploration, with notable gains for less active users. This work advances practical retention-aware recommendations by linking end-of-session outcomes to per-step actions via generative flows and integrated rewards.
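To make the integrated reward and the refined Detailed Balance loss concrete, the following is a minimal numeric sketch and not the authors' implementation. The function names (`integrated_log_reward`, `db_residual`, `db_loss`) and all toy numbers are hypothetical, chosen only for illustration; in practice the log-flows and log-probabilities would presumably come from learned networks. The sketch back-solves $\log \text{F}_R$ from the terminal retention reward so that every per-step residual vanishes, then checks that the residuals telescope to the integrated log-reward $\log \text{R} + \alpha \sum_{t=1}^{T-1} r_t$.

```python
import math


def integrated_log_reward(retention_reward, immediate_rewards, alpha):
    """log R(S) = log R + alpha * sum of immediate rewards r_1..r_{T-1}."""
    return math.log(retention_reward) + alpha * sum(immediate_rewards)


def db_residual(log_f_r_t, log_p_f, log_f_r_next, log_p_b, r_t, alpha):
    """Term inside the square of the refined Detailed Balance loss."""
    return log_f_r_t + log_p_f - log_f_r_next - log_p_b - alpha * r_t


def db_loss(log_f_r_t, log_p_f, log_f_r_next, log_p_b, r_t, alpha):
    """Squared residual for a single transition s_t -> s_{t+1}."""
    return db_residual(log_f_r_t, log_p_f, log_f_r_next, log_p_b, r_t, alpha) ** 2


# Toy session with T = 3 states, hence 2 transitions (all numbers are made up).
alpha = 0.5
retention_reward = 2.0                       # R, observed at the end of the session
immediate_rewards = [1.0, 0.0]               # r_1, r_2
log_p_f = [math.log(0.5), math.log(0.25)]    # log P_F(s_{t+1} | s_t)
log_p_b = [0.0, 0.0]                         # log P_B(s_t | s_{t+1}), assumed P_B = 1

# Back-solve log F_R so that detailed balance holds exactly,
# starting from the terminal condition F_R(s_T) = R.
log_f_r = [0.0, 0.0, math.log(retention_reward)]
for t in (1, 0):
    log_f_r[t] = (log_f_r[t + 1] + log_p_b[t] - log_p_f[t]
                  + alpha * immediate_rewards[t])

# Per-transition losses are (numerically) zero for the consistent flow.
losses = [
    db_loss(log_f_r[t], log_p_f[t], log_f_r[t + 1], log_p_b[t],
            immediate_rewards[t], alpha)
    for t in range(2)
]

# Summing the residual equations telescopes to the integrated log-reward.
trajectory_log_flow = log_f_r[0] + sum(log_p_f) - sum(log_p_b)
target = integrated_log_reward(retention_reward, immediate_rewards, alpha)

# A mis-estimated initial flow (off by 0.3 in log space) is penalized.
perturbed_loss = db_loss(log_f_r[0] + 0.3, log_p_f[0], log_f_r[1], log_p_b[0],
                         immediate_rewards[0], alpha)
```

The telescoping check illustrates why the $-\alpha r_t$ term appears: the immediate-reward factor $\text{F}_I$ contributes exactly $\alpha r_t$ to the log-flow difference between consecutive states, so only the retention part $\text{F}_R$ has to be estimated.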
Abstract
Recommender systems aim to fulfill users' daily demands. While most existing research focuses on maximizing user engagement with the system, it has recently been pointed out that how frequently users come back to the service also reflects the quality and stability of recommendations. However, optimizing this user retention behavior is non-trivial and poses several challenges, including intractable leave-and-return user activities, a sparse and delayed signal, and the uncertain relationship between users' retention and their immediate feedback on each item in the recommendation list. In this work, we regard the retention signal as an overall estimate of the user's end-of-session satisfaction and propose to estimate this signal through a probabilistic flow. This flow-based modeling technique can back-propagate the retention reward to each recommended item in the user session, and we show that the flow, combined with traditional learning-to-rank objectives, eventually optimizes a non-discounted cumulative reward for both immediate user feedback and user retention. We verify the effectiveness of our method through offline empirical studies on two public datasets and online A/B tests on an industrial platform.
