RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems
Shuo Su, Xiaoshuang Chen, Yao Wang, Yulin Wu, Ziqiang Zhang, Kaiqiao Zhan, Ben Wang, Kun Gai
TL;DR
RPAF tackles cache allocation under strict, global per-period computational budgets in large-scale recommender systems. It decomposes the problem into two stages. The prediction stage is trained with a constrained RL approach and a Relaxed Local Allocator (RLA). The allocation stage uses PoolRank to make streaming decisions while keeping the budget feasible. The approach outperforms state-of-the-art baselines in offline simulations and delivers practical gains in online deployments, including improved daily watch time and user engagement. The work advances cache-augmented recommender design by explicitly modeling the value-strategy dependency and by enabling real-time streaming allocation under tight budgets.
Abstract
Modern recommender systems are built on computation-intensive infrastructure. Because computational resources are limited, performing real-time computation for every request is challenging, especially during peak periods. Serving recommendations from user-wise result caches is a widely used fallback when the system cannot afford real-time recommendation. However, it is challenging to allocate real-time and cached recommendations across requests so as to maximize users' overall engagement. This paper identifies two key challenges in cache allocation: the value-strategy dependency and streaming allocation. We then propose a reinforcement prediction-allocation framework (RPAF) to address these issues. RPAF is a reinforcement-learning-based two-stage framework consisting of a prediction stage and an allocation stage. The prediction stage estimates the values of the cache choices while accounting for the value-strategy dependency. The allocation stage determines the cache choice for each individual request while satisfying the global budget constraint. We show that training RPAF is challenging because the budget constraints are both global and strict, and we propose a relaxed local allocator (RLA) to address this issue. Moreover, the allocation stage uses a PoolRank algorithm to handle the streaming allocation problem. Experiments show that RPAF significantly improves users' engagement under computational budget constraints.
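The abstract does not specify how the allocation stage or PoolRank works internally. The sketch below is only a rough illustration of the budgeted allocation problem. It assumes a single cached alternative per request, a unit compute cost for real-time recommendation, and a scalar predicted gain (real-time value minus cached value) supplied by the prediction stage. `offline_allocate` is the exact solution for this simplified setting when all requests are known in advance. `StreamingThresholdAllocator`, `pool_size`, and `expected_requests` are hypothetical names for a simple pool-based ranking heuristic in the streaming case. They are not the authors' PoolRank.

```python
from collections import deque
from typing import Deque, List


def offline_allocate(gains: List[float], budget: int) -> List[bool]:
    """Assumed setting: every request is known up front and real-time costs 1 unit.
    Giving real-time computation to the `budget` largest positive gains is then optimal."""
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    chosen = [False] * len(gains)
    for i in order[:budget]:
        if gains[i] > 0:  # never spend budget on a request the cache serves better
            chosen[i] = True
    return chosen


class StreamingThresholdAllocator:
    """Hypothetical streaming heuristic (not the paper's PoolRank): keep a pool of
    recent gains and admit a request to real-time computation if its gain ranks
    within the target fraction of that pool, with a hard stop at the budget."""

    def __init__(self, budget: int, expected_requests: int, pool_size: int = 100):
        self.remaining = budget
        self.rate = budget / expected_requests  # target fraction served in real time
        self.pool: Deque[float] = deque(maxlen=pool_size)  # sliding window of recent gains

    def decide(self, gain: float) -> bool:
        self.pool.append(gain)
        if self.remaining <= 0 or gain <= 0:
            return False  # budget exhausted or cache is at least as good
        # Rank of this gain within the pool (0 means it is the largest seen recently).
        rank = sum(1 for g in self.pool if g > gain)
        if rank < self.rate * len(self.pool):
            self.remaining -= 1
            return True
        return False


if __name__ == "__main__":
    gains = [1.0, 0.1, 0.2, 0.9, 0.05, 0.8, 0.3, 0.95, 0.4, 0.7]
    print(offline_allocate(gains, budget=3))
    alloc = StreamingThresholdAllocator(budget=3, expected_requests=len(gains), pool_size=10)
    print([alloc.decide(g) for g in gains])
```

The hard stop on `remaining` keeps the streaming allocation within budget regardless of how the gains arrive. This is the property the abstract calls a strict budget constraint. The rank threshold only decides how well that budget is spent. Because the heuristic cannot see future requests, it can fall short of the offline optimum, which is the gap a streaming allocator is meant to narrow.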
