On Causally Disentangled State Representation Learning for Reinforcement Learning based Recommender Systems

Siyu Wang, Xiaocong Chen, Lina Yao

TL;DR

This work addresses the challenge of high-dimensional, dynamic state spaces in RL-based recommender systems by introducing Causal-Indispensable State Representations (CIDS), which identify Directly Action-Influenced State Variables (DAIS) and Action-Influence Ancestors (AIA) through causal graphical models and conditional mutual information. DAIS captures state dimensions directly affected by actions, while AIA encompasses upstream variables influencing DAIS, enabling a focused, causally informative representation (CIDS) for policy learning. The paper provides identifiability results for the causal masks and proposes objective functions to learn DAIS and AIA from data, integrating CIDS into policy learning with a three-phase procedure. Extensive experiments on online simulators and offline datasets show that CIDS-based policies (e.g., DDPG-CIDS, SAC-CIDS, TD3-CIDS) consistently outperform strong baselines, validating the approach’s efficiency and effectiveness in adapting to evolving user behaviors. The method offers a principled pathway to improve personalization by concentrating learning on causally meaningful state components, with potential extensions to confounder handling and joint optimization of representation and policy.

Abstract

In Reinforcement Learning-based Recommender Systems (RLRS), the complexity and dynamism of user interactions often result in high-dimensional and noisy state spaces, making it challenging to discern which aspects of the state truly drive the decision-making process. This issue is exacerbated by the evolving nature of user preferences and behaviors, which requires the recommender system to adaptively focus on the most relevant information for decision-making while preserving generalizability. To tackle this problem, we introduce an innovative causal approach for decomposing the state and extracting Causal-Indispensable State Representations (CIDS) in RLRS. Our method concentrates on identifying the Directly Action-Influenced State Variables (DAIS) and Action-Influence Ancestors (AIA), which are essential for making effective recommendations. By leveraging conditional mutual information, we develop a framework that not only discerns the causal relationships within the generative process but also isolates critical state variables from the typically dense and high-dimensional state representations. We provide theoretical evidence for the identifiability of these variables. Then, using the identified causal relationships, we construct causal-indispensable state representations, enabling the training of policies over a more advantageous subset of the agent's state space. We demonstrate the efficacy of our approach through extensive experiments, showing that our method outperforms state-of-the-art methods.

Paper Structure (29 sections, 3 theorems, 12 equations, 4 figures, 2 tables, 1 algorithm)

Key Result

Theorem 3.1

Under the assumptions A1-A4, $s^i_{t+1} \in \text{DAIS}_{t+1}$ if and only if $a_{t} \centernot{\perp\mkern-9.5mu\perp} s^i_{t+1} | \text{DAIS}_{t}$.
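The criterion in Theorem 3.1 is a conditional dependence statement, which the abstract says is assessed with conditional mutual information (CMI). The following is a minimal sketch of that idea and not the authors' estimator: it uses a plug-in CMI estimate for discrete samples on synthetic data. The variable `z` stands in for the conditioning set, and the function name, the toy data, and the threshold `0.05` are all assumptions made for illustration.

```python
import numpy as np
from collections import Counter

def conditional_mutual_information(a, s, z):
    """Plug-in estimate of I(a; s | z) in nats for discrete 1-D samples."""
    triples = list(zip(a.tolist(), s.tolist(), z.tolist()))
    n = len(triples)
    c_asz = Counter(triples)
    c_az = Counter((ai, zi) for ai, _, zi in triples)
    c_sz = Counter((si, zi) for _, si, zi in triples)
    c_z = Counter(zi for _, _, zi in triples)
    cmi = 0.0
    for (ai, si, zi), count in c_asz.items():
        # p(a,s,z) * log[ p(a,s,z) p(z) / (p(a,z) p(s,z)) ], written with counts
        ratio = count * c_z[zi] / (c_az[(ai, zi)] * c_sz[(si, zi)])
        cmi += (count / n) * np.log(ratio)
    return cmi

# Synthetic toy data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
n = 5000
a = rng.integers(0, 2, size=n)            # action
z = rng.integers(0, 2, size=n)            # stand-in for the conditioning set
flip = (rng.random(n) < 0.1).astype(int)  # 10% noise
s_dep = a ^ z      # next-state dimension that the action influences
s_ind = z ^ flip   # next-state dimension driven only by z and noise

cmi_dep = conditional_mutual_information(a, s_dep, z)
cmi_ind = conditional_mutual_information(a, s_ind, z)

threshold = 0.05  # hypothetical cut-off for "dependent"
in_dais = {"s_dep": cmi_dep > threshold, "s_ind": cmi_ind > threshold}
print(in_dais)
```

In this toy setting the action-driven dimension has a CMI near $\log 2$ while the other stays close to zero, so only the former would be flagged. With continuous or high-dimensional states, a plug-in estimator like this would not be adequate and a learned estimator would be needed.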

Figures (4)

  • Figure 1: An illustrative causal graphical model for an MDP is depicted. The state $s_t$ is decomposed into six different dimensions, denoted as $s_t = (s^1_t, \ldots, s^6_t)$. The purple nodes signify DAIS, while the blue nodes symbolize AIA. The nodes enclosed within the gray area collectively represent CIDS, which contains both DAIS and AIA.
  • Figure 2: Performance comparison of baseline algorithms and corresponding CIDS-enhanced methods in the VirtualTaobao simulation.
  • Figure 3: The 1-step CTR performance in the VirtualTaobao simulation is presented as the mean with error bars.
  • Figure 4: Evaluation with different RL frameworks: (a) DDPG as the backbone, (b) SAC as the backbone, and (c) TD3 as the backbone. Ablation versions with only DAIS representation and AIA representation are also included in each backbone.
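The relationship among DAIS, AIA, and CIDS described in the Figure 1 caption can be sketched as a small graph computation. This is only an illustration under stated assumptions: the edge list below is hypothetical (the caption does not give the actual edges), the graph is collapsed over time steps, and AIA is taken here to mean state ancestors of DAIS that are not themselves in DAIS.

```python
def ancestors(parents, targets):
    """Return all nodes with a directed path into any node in `targets`."""
    seen = set()
    stack = list(targets)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# Hypothetical parent lists for a six-dimensional state and one action "a".
parents = {
    "s1": ["a"],
    "s2": ["a", "s3"],
    "s3": ["s4"],
    "s4": [],
    "s5": ["s6"],
    "s6": [],
}
state_dims = ["s1", "s2", "s3", "s4", "s5", "s6"]

# DAIS: state dimensions with the action as a direct parent.
dais = {d for d in state_dims if "a" in parents[d]}
# AIA (assumed definition): state ancestors of DAIS, excluding DAIS itself.
aia = {n for n in ancestors(parents, dais) if n in state_dims} - dais
# CIDS: the union, expressed as a binary mask over the state vector.
cids = dais | aia
mask = [1 if d in cids else 0 for d in state_dims]
print(sorted(dais), sorted(aia), mask)
```

A policy could then be trained on the masked state, for example by multiplying the state vector element-wise with `mask`, so that dimensions outside CIDS (here `s5` and `s6`) are ignored.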

Theorems & Definitions (8)

  • Definition 3.1: Causal Decomposition of State
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Definition A.1: d-Separation [10.5555/3202377]
  • Definition A.2: Structural Causal Models [pearl2009causality]
  • Definition A.3: Markov property [10.5555/3202377]
  • Definition A.4: Causal Faithfulness [10.5555/3202377; pearl2009causality; spirtes2000causation]