On Causally Disentangled State Representation Learning for Reinforcement Learning based Recommender Systems
Siyu Wang, Xiaocong Chen, Lina Yao
TL;DR
This work addresses the challenge of high-dimensional, dynamic state spaces in RL-based recommender systems by introducing Causal-Indispensable State Representations (CIDS), which identify Directly Action-Influenced State Variables (DAIS) and Action-Influence Ancestors (AIA) through causal graphical models and conditional mutual information. DAIS captures state dimensions directly affected by actions, while AIA encompasses upstream variables influencing DAIS, enabling a focused, causally informative representation (CIDS) for policy learning. The paper provides identifiability results for the causal masks and proposes objective functions to learn DAIS and AIA from data, integrating CIDS into policy learning with a three-phase procedure. Extensive experiments on online simulators and offline datasets show that CIDS-based policies (e.g., DDPG-CIDS, SAC-CIDS, TD3-CIDS) consistently outperform strong baselines, validating the approach’s efficiency and effectiveness in adapting to evolving user behaviors. The method offers a principled pathway to improve personalization by concentrating learning on causally meaningful state components, with potential extensions to confounder handling and joint optimization of representation and policy.
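To make the DAIS/AIA/CIDS decomposition concrete, here is a minimal sketch of how the three index sets could be read off once causal masks are available. It is not the authors' implementation. The mask layout is an assumption: `action_mask[i] == 1` means the action at time t affects state dimension i at time t+1, and `state_mask[j][i] == 1` means dimension j at time t affects dimension i at time t+1. The transition graph is also assumed to be time-invariant, so "ancestors" is taken as the transitive closure over repeated steps. All function names are hypothetical.

```python
from collections import deque


def dais_indices(action_mask):
    """DAIS: state dimensions with an edge a_t -> s_{t+1, i} (hypothetical layout)."""
    return {i for i, edge in enumerate(action_mask) if edge}


def aia_indices(state_mask, dais):
    """AIA: every state dimension with a directed path into some DAIS dimension.

    state_mask[j][i] == 1 encodes s_{t, j} -> s_{t+1, i}. Assuming the graph is
    the same at every step, ancestry is a backward breadth-first search over
    parent links. The result may overlap DAIS (e.g. via self-loops).
    """
    n = len(state_mask)
    parents = {i: [j for j in range(n) if state_mask[j][i]] for i in range(n)}
    ancestors = set()
    queue = deque(dais)
    while queue:
        i = queue.popleft()
        for j in parents[i]:
            if j not in ancestors:
                ancestors.add(j)
                queue.append(j)
    return ancestors


def cids_indices(action_mask, state_mask):
    """CIDS: union of DAIS and AIA, returned as sorted state indices."""
    dais = dais_indices(action_mask)
    return sorted(dais | aia_indices(state_mask, dais))


# Toy 5-dimensional state: the action touches s1 and s4.
n = 5
state_mask = [[0] * n for _ in range(n)]
state_mask[0][1] = 1  # s0 -> s1
state_mask[1][1] = 1  # s1 -> s1 (persistence)
state_mask[2][0] = 1  # s2 -> s0, so s2 is an indirect ancestor of s1
state_mask[3][3] = 1  # s3 -> s3, never reaches a DAIS dimension
action_mask = [0, 1, 0, 0, 1]

print(cids_indices(action_mask, state_mask))
```

In this toy graph, s3 is dropped because it neither responds to the action nor feeds into a dimension that does; a policy restricted to the CIDS indices would see only s0, s1, s2, and s4.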
Abstract
In Reinforcement Learning-based Recommender Systems (RLRS), the complexity and dynamism of user interactions often result in high-dimensional and noisy state spaces, making it challenging to discern which aspects of the state truly drive the decision-making process. This issue is exacerbated by the evolving nature of user preferences and behaviors, which requires the recommender system to focus adaptively on the most relevant information for decision-making while preserving generalizability. To tackle this problem, we introduce an innovative causal approach for decomposing the state and extracting Causal-InDispensable State Representations (CIDS) in RLRS. Our method concentrates on identifying the Directly Action-Influenced State Variables (DAIS) and Action-Influence Ancestors (AIA), which are essential for making effective recommendations. By leveraging conditional mutual information, we develop a framework that not only discerns the causal relationships within the generative process but also isolates critical state variables from the typically dense and high-dimensional state representations. We provide theoretical evidence for the identifiability of these variables. Then, using the identified causal relationships, we construct causal-indispensable state representations, enabling policies to be trained over a more advantageous subset of the agent's state space. We demonstrate the efficacy of our approach through extensive experiments, showing that our method outperforms state-of-the-art methods.
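The role of conditional mutual information can be illustrated with a simple test: a next-state dimension is a DAIS candidate if it still carries information about the action once the current state is known, i.e. if I(s'_i; a | s) > 0. The sketch below is an assumption-laden stand-in, not the paper's objective functions (which are not given in this section). It swaps in a plug-in CMI estimator for discrete samples, and the helper `is_directly_action_influenced` and its `threshold` value are hypothetical.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from paired discrete samples.

    Uses I = sum p(x,y,z) * log[p(z) p(x,y,z) / (p(x,z) p(y,z))],
    written directly in terms of counts so the 1/n factors cancel.
    """
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        ratio = (n_xyz * c_z[z]) / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (n_xyz / n) * math.log(ratio)
    return cmi


def is_directly_action_influenced(next_dim, actions, states, threshold=0.05):
    """Hypothetical DAIS test: does s'_i depend on a beyond what s explains?"""
    return conditional_mutual_information(next_dim, actions, states) > threshold


# Toy transitions: binary state s and binary action a, every pair seen 25 times.
pairs = [(s, a) for s in (0, 1) for a in (0, 1)] * 25
states = [s for s, _ in pairs]
actions = [a for _, a in pairs]
next_dim0 = [s ^ a for s, a in pairs]  # depends on the action
next_dim1 = [s for s, _ in pairs]      # fully determined by the current state

print(conditional_mutual_information(next_dim0, actions, states))  # about log 2
print(conditional_mutual_information(next_dim1, actions, states))  # 0.0
```

Conditioning on the current state is what separates the two dimensions: both may correlate with past behavior, but only `next_dim0` changes when the action changes with the state held fixed. With continuous, high-dimensional states a plug-in estimator like this becomes impractical, so a learned estimator would be needed in practice.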
