Revisiting Fairness-aware Interactive Recommendation: Item Lifecycle as a Control Knob

Yun Lu, Xiaoyu Shi, Hong Xie, Chongjun Xia, Zhenhui Gong, Mingsheng Shang

TL;DR

The paper tackles fairness in interactive recommender systems under dynamic item popularity by introducing item lifecycles as a control knob. It presents LHRL, a Lifecycle-aware Hierarchical Reinforcement Learning framework, combining PhaseFormer for real-time lifecycle detection with a two-tier RL architecture that decouples long-term fairness from short-term engagement. PhaseFormer leverages STL decomposition and an iTransformer to predict lifecycle stages, while the high- and low-level RL agents coordinate via phase-aware fairness weights and engagement optimization, guided by lifecycle-aware rewards. Experiments on real-world KuaiRec/KuaiRand-derived setups show that LHRL improves both fairness (lower exposure disparity) and user engagement (long-term metrics), and that lifecycle rewards generalize to other RL baselines, underscoring practical impact for sustainable, equitable streaming platforms.

Abstract

This paper revisits fairness-aware interactive recommendation (e.g., TikTok, KuaiShou) by introducing a novel control knob, i.e., the lifecycle of items. We make threefold contributions. First, we conduct a comprehensive empirical analysis and uncover that item lifecycles in short-video platforms follow a compressed three-phase pattern, i.e., rapid growth, transient stability, and sharp decay, which significantly deviates from the classical four-stage model (introduction, growth, maturity, decline). Second, we introduce LHRL, a lifecycle-aware hierarchical reinforcement learning framework that dynamically harmonizes fairness and accuracy by leveraging phase-specific exposure dynamics. LHRL consists of two key components: (1) PhaseFormer, a lightweight encoder combining STL decomposition and attention mechanisms for robust phase detection; (2) a two-level HRL agent, where the high-level policy imposes phase-aware fairness constraints, and the low-level policy optimizes immediate user engagement. This decoupled optimization allows for effective reconciliation between long-term equity and short-term utility. Third, experiments on multiple real-world interactive recommendation datasets demonstrate that LHRL significantly improves both fairness and user engagement. Furthermore, the integration of lifecycle-aware rewards into existing RL-based models consistently yields performance gains, highlighting the generalizability and practical value of our approach.
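The decoupling described above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a per-item daily engagement series, labels the current phase with a simple moving-average slope rule (a crude stand-in for PhaseFormer, which the paper builds from STL decomposition and attention), and blends an engagement reward with a fairness reward using a phase-dependent weight. The function names, the `tol` threshold, and the weight values are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical fairness weights per phase (illustrative values, not from the paper).
PHASE_FAIRNESS_WEIGHT = {"growth": 0.6, "stable": 0.3, "decay": 0.1}


def moving_average(series, window=3):
    """Trailing moving average; early points average over what is available."""
    return [mean(series[max(0, i - window + 1): i + 1]) for i in range(len(series))]


def detect_phase(series, window=3, tol=0.05):
    """Label the latest point as 'growth', 'stable', or 'decay'.

    Uses a slope rule on the smoothed series. This stands in for a learned
    phase detector; `tol` is an assumed relative-change threshold.
    """
    if len(series) < 2:
        return "growth"  # assumption: a brand-new item is treated as growing
    smooth = moving_average(series, window)
    prev, last = smooth[-2], smooth[-1]
    rel_change = (last - prev) / max(abs(prev), 1e-9)
    if rel_change > tol:
        return "growth"
    if rel_change < -tol:
        return "decay"
    return "stable"


def lifecycle_reward(engagement, fairness, phase):
    """Blend short-term engagement and fairness with a phase-dependent weight."""
    w = PHASE_FAIRNESS_WEIGHT[phase]
    return (1.0 - w) * engagement + w * fairness


if __name__ == "__main__":
    plays = [1, 2, 4, 8, 16]  # made-up daily play counts for one item
    phase = detect_phase(plays)
    print(phase, lifecycle_reward(engagement=1.0, fairness=0.0, phase=phase))
```

In this toy setup, the phase label plays the role of the signal a high-level policy might use to set a fairness weight, while the blended reward is what a low-level policy would optimize. The direction and size of the weights here are placeholders, not values reported in the paper.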

Paper Structure

This paper contains 15 sections, 10 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The role of lifecycle in fairness recommendation
  • Figure 2: Lifecycle analysis on KuaiRec and KuaiRand: (a)-(b) aggregated play progress curves; (c) classical four-phase product lifecycle (PLC); (d) Gompertz growth curve; (e)-(f) distribution of growth-phase duration. 92.9% (9524/10253, KuaiRec) and 94.5% (6400/6777, KuaiRand) of videos reach peak engagement within the first 7 days.
  • Figure 3: The overall architecture of LHRL
  • Figure 4: Case studies of the PhaseFormer module's predictions.
  • Figure 5: Proportion of videos from different lifecycle stages in the recommendation list over time.