Causal Reinforcement Learning based Agent-Patient Interaction with Clinical Domain Knowledge

Wenzheng Zhao, Ran Zhang, Ruth Palan Lopez, Shu-Fen Wung, Fengpei Yuan

TL;DR

The paper tackles the challenge of deploying reinforcement learning (RL) in data-scarce, high-stakes clinical settings, especially dementia care, by introducing Causal Structure-Aware Reinforcement Learning (CRL). CRL combines causal discovery and inference with RL: it learns a DAG-based world model and uses conditional average treatment effect (CATE) estimation to inject clinical knowledge into policy optimization. In a simulated reminiscence therapy setting between a person with dementia (PwD) and a robot, CRL outperforms model-free baselines in cumulative reward, patient-state maintenance, and interpretability, remains robust across hyperparameter settings, and supports a lightweight LLM-based dialogue deployment. These results suggest that combining causal reasoning with RL can enable safe, data-efficient, and clinically aligned human-robot interaction; future work targets real-world validation and expanded causal modeling.
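To make the CATE idea concrete, here is a minimal sketch of a stratified difference-in-means estimate: for each state, the mean outcome under one action minus the mean outcome under a reference action. This summary does not describe the estimator the paper actually uses, so the function `estimate_cate`, the state and action labels, and the toy log are all illustrative assumptions. The estimate is only causally meaningful if the state captures every confounder of action and outcome.

```python
from collections import defaultdict
from statistics import mean


def estimate_cate(transitions, treatment, control):
    """Stratified difference-in-means CATE estimate per state.

    transitions: iterable of (state, action, outcome) tuples.
    Returns {state: mean outcome under `treatment` minus mean under `control`}
    for every state in which both actions were observed.
    """
    # Group observed outcomes by state, then by action.
    outcomes = defaultdict(lambda: defaultdict(list))
    for state, action, outcome in transitions:
        outcomes[state][action].append(outcome)

    cate = {}
    for state, by_action in outcomes.items():
        # Skip states where one of the two actions was never tried.
        if treatment in by_action and control in by_action:
            cate[state] = mean(by_action[treatment]) - mean(by_action[control])
    return cate


# Hypothetical toy log; states, actions, and outcomes are made up.
log = [
    ("low_engagement", "prompt", 1.0),
    ("low_engagement", "prompt", 0.0),
    ("low_engagement", "wait", 0.0),
    ("low_engagement", "wait", 0.0),
    ("high_engagement", "prompt", 1.0),
    ("high_engagement", "wait", 1.0),
]
effects = estimate_cate(log, treatment="prompt", control="wait")
```

In this toy log the estimated effect of prompting is positive only in the low-engagement stratum, which is the kind of state-dependent signal a policy could use to prefer one action over another.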

Abstract

Reinforcement Learning (RL) faces significant challenges in adaptive healthcare interventions, such as dementia care, where data is scarce, decisions require interpretability, and underlying patient-state dynamics are complex and causal in nature. In this work, we present a novel framework called Causal structure-aware Reinforcement Learning (CRL) that explicitly integrates causal discovery and reasoning into policy optimization. This method enables an agent to learn and exploit a directed acyclic graph (DAG) that describes the causal dependencies between human behavioral states and robot actions, facilitating more efficient, interpretable, and robust decision-making. We validate our approach in a simulated robot-assisted cognitive care scenario, where the agent interacts with a virtual patient exhibiting dynamic emotional, cognitive, and engagement states. The experimental results show that CRL agents outperform conventional model-free RL baselines by achieving higher cumulative rewards, maintaining desirable patient states more consistently, and exhibiting interpretable, clinically aligned behavior. We further demonstrate that CRL's performance advantage remains robust across different weighting strategies and hyperparameter settings. In addition, we demonstrate a lightweight LLM-based deployment: a fixed policy is embedded into a system prompt that maps inferred states to actions, producing consistent, supportive dialogue without LLM fine-tuning. Our work illustrates the promise of causal reinforcement learning for human-robot interaction applications, where interpretability, adaptiveness, and data efficiency are paramount.
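The LLM-based deployment described above can be pictured as rendering a fixed state-to-action table into a system prompt. The sketch below is only one plausible way to do this: the helper `build_system_prompt`, the prompt wording, and the state and action labels are hypothetical and are not taken from the paper.

```python
def build_system_prompt(policy):
    """Render a fixed state-to-action policy as system prompt text.

    policy: dict mapping a state label to an action description.
    """
    lines = [
        "You are a supportive conversational companion.",
        "At each turn, infer the patient's state, then follow this fixed policy:",
    ]
    # Sort states so the prompt text is deterministic across runs.
    for state, action in sorted(policy.items()):
        lines.append(f"- If the state is '{state}', take the action '{action}'.")
    lines.append("Keep responses brief, warm, and consistent with the chosen action.")
    return "\n".join(lines)


# Hypothetical policy table; labels are placeholders, not the paper's.
policy = {
    "disengaged": "offer a gentle prompt",
    "engaged": "ask a follow-up question",
}
prompt = build_system_prompt(policy)
```

Because the policy is fixed text in the prompt, changing the agent's behavior means regenerating the prompt from a new table, with no change to the LLM's weights.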

Paper Structure

This paper contains 26 sections, 7 equations, 15 figures, 2 tables, 1 algorithm.

Figures (15)

  • Figure 1: The Agent-Environment interaction from Causal Reinforcement Learning (CRL).
  • Figure 2: Learned causal structure illustrating dependencies among robot actions ($A_t$), patient’s current states ($S_t=\{RP_t, E_t, C_t\}$), and subsequent states ($S_{t+1}=\{RP_{t+1}, E_{t+1}, C_{t+1}\}$).
  • Figure 3: Smoothed average return across epochs under the final-episode policy of each epoch for different methods (evaluation using RL-only execution).
  • Figure 4: Proportion of high-return episodes (return $>$ 150) under RL-only execution.
  • Figure 5: Example conversation showing CRL policy integration with LLM for patient interaction. State transitions and policy decisions (highlighted in orange) guide the LLM to generate contextually appropriate, emotionally supportive responses.
  • ...and 10 more figures