RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems

Shuo Su, Xiaoshuang Chen, Yao Wang, Yulin Wu, Ziqiang Zhang, Kaiqiao Zhan, Ben Wang, Kun Gai

TL;DR

RPAF tackles cache allocation in large-scale recommender systems under strict, global per-period budgets: deciding which requests receive real-time recommendations and which are served from a user-wise result cache. It decomposes the problem into two stages. The prediction stage uses constrained reinforcement learning, trained with a Relaxed Local Allocator (RLA), to estimate the value of each cache choice. The allocation stage uses PoolRank to make streaming decisions while keeping the allocation within budget. RPAF outperforms state-of-the-art baselines in offline simulations and delivers gains in online deployment, including improved daily watch time and user engagement. The work advances cache-augmented recommender design by explicitly modeling the value-strategy dependency and enabling real-time streaming allocation under tight budgets.

Abstract

Modern recommender systems are built upon computation-intensive infrastructure, and it is challenging to perform real-time computation for each request, especially in peak periods, due to the limited computational resources. Recommending by user-wise result caches is widely used when the system cannot afford a real-time recommendation. However, it is challenging to allocate real-time and cached recommendations to maximize the users' overall engagement. This paper shows two key challenges to cache allocation, i.e., the value-strategy dependency and the streaming allocation. Then, we propose a reinforcement prediction-allocation framework (RPAF) to address these issues. RPAF is a reinforcement-learning-based two-stage framework containing prediction and allocation stages. The prediction stage estimates the values of the cache choices considering the value-strategy dependency, and the allocation stage determines the cache choices for each individual request while satisfying the global budget constraint. We show that the challenge of training RPAF includes globality and the strictness of budget constraints, and a relaxed local allocator (RLA) is proposed to address this issue. Moreover, a PoolRank algorithm is used in the allocation stage to deal with the streaming allocation problem. Experiments show that RPAF significantly improves users' engagement under computational budget constraints.

Paper Structure

This paper contains 29 sections, 3 theorems, 29 equations, 11 figures, 3 tables, and 2 algorithms.

Key Result

Proposition 1

Given $\mathbb{E}\left[R_t^u|a_t^u\right]$ for each $u$ and each $a_t^u\in\{0,1\}$, the solution to CacheAlloc-Simplified is given by an $\textbf{arg-top}_M$ rule over per-user scores, where $\textbf{arg-top}_M$ means that $a_t^u=1$ if $u$ is among the top $M$ users with $c_t^u=1$ ranked by the given scores, and $a_t^u=0$ otherwise.
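
The $\textbf{arg-top}_M$ rule in Proposition 1 can be illustrated with a minimal Python sketch. The function name `arg_top_m`, the toy numbers, and the choice of score are assumptions made for illustration: the scoring expression is not reproduced on this page, so the sketch ranks users by the uplift $\mathbb{E}[R_t^u|a_t^u=1]-\mathbb{E}[R_t^u|a_t^u=0]$, which is one natural choice under a unit-cost budget and not necessarily the paper's exact formula.

```python
from typing import List, Sequence


def arg_top_m(scores: Sequence[float], eligible: Sequence[int], m: int) -> List[int]:
    """Return a 0/1 list a with a[u] = 1 iff u is among the top-m eligible users.

    scores[u]   : ranking score of user u (hypothetical: an uplift estimate)
    eligible[u] : plays the role of c_t^u; only users with value 1 can be chosen
    m           : budget M, the maximum number of users with a[u] = 1
    """
    if len(scores) != len(eligible):
        raise ValueError("scores and eligible must have the same length")
    # Keep only users with c = 1.
    candidates = [u for u, c in enumerate(eligible) if c == 1]
    # Rank by score, highest first; the sort is stable, so ties keep index order.
    ranked = sorted(candidates, key=lambda u: scores[u], reverse=True)
    chosen = set(ranked[:max(m, 0)])
    return [1 if u in chosen else 0 for u in range(len(scores))]


# Toy data (made up for illustration, not taken from the paper).
reward_if_1 = [5.0, 3.0, 9.0, 4.0]   # assumed E[R | a = 1] per user
reward_if_0 = [4.0, 1.0, 8.5, 1.0]   # assumed E[R | a = 0] per user
uplift = [r1 - r0 for r1, r0 in zip(reward_if_1, reward_if_0)]
eligible = [1, 1, 1, 0]              # the last user has c = 0

allocation = arg_top_m(uplift, eligible, m=2)
print(allocation)  # [1, 1, 0, 0]
```

In this toy run the last user has the largest uplift but is skipped because $c_t^u=0$, and the user with the highest raw reward (index 2) is not selected because its uplift is the smallest among the eligible users.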

Figures (11)

  • Figure 1: Recommendation with a result cache.
  • Figure 2: WatchTime decreases when continuously receiving cached recommendations.
  • Figure 3: The Cache Allocation Problem.
  • Figure 4: The Real Cache Allocation Problem.
  • Figure 5: The RPAF method.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Proposition 1
  • Corollary 1
  • Proposition 2