CIER: A Novel Experience Replay Approach with Causal Inference in Deep Reinforcement Learning

Jingwen Wang, Dehui Du, Yida Li, Yiyang Li, Yikang Chen

TL;DR

This work tackles data efficiency and explainability in DRL by exploiting temporal correlations in training time series. It introduces Time Series Causal Factors (TSCFs) to segment training histories, and a Causal Inference Experience Replay (CIER) that uses causal discovery (a partial ancestral graph, PAG, learned with GFCI) to assign causal-strength-based weights to experiences, tying these weights to the average treatment effect (ATE) within the replay process. An extension, CIPER, integrates CIER with Prioritized Experience Replay to further improve learning performance across DRL tasks, including autonomous driving and MuJoCo environments. The results show improved sample efficiency and convergence speed, along with enhanced explainability via scenario-based causal explanations and visualizations of the learned relationships between temporal factors and rewards.
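
For reference, the ATE mentioned in the TL;DR is conventionally defined as follows for a binary treatment $T$ and an outcome $Y$. This is the standard textbook form, not an equation taken from the paper. How the paper maps TSCFs to treatments and rewards to outcomes is not stated in this summary, so $T$ and $Y$ here are generic placeholders.

$$
\mathrm{ATE} = \mathbb{E}\left[Y \mid do(T = 1)\right] - \mathbb{E}\left[Y \mid do(T = 0)\right]
$$

A larger absolute ATE indicates that intervening on the treatment changes the expected outcome more strongly, which is the sense in which it can serve as a measure of causal strength.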

Abstract

In the training process of Deep Reinforcement Learning (DRL), agents require repeated interactions with the environment. As training volume and model complexity increase, improving data utilization and explainability in DRL training remains a challenging problem. This paper addresses these challenges by focusing on the temporal correlations within time series. We propose a novel approach to segment multivariate time series into meaningful subsequences and to represent the time series based on these subsequences. Furthermore, the subsequences are employed for causal inference to identify fundamental causal factors that significantly impact training outcomes. We design a module that feeds this causality back into DRL training. Several experiments demonstrate the feasibility of our approach in common environments, confirming its ability to enhance the effectiveness of DRL training and to impart a certain level of explainability to the training process. Additionally, we extend our approach with the prioritized experience replay algorithm, and experimental results show that it remains effective.
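
The idea of feeding causality back into experience replay can be illustrated with a minimal sketch of a replay buffer that samples transitions in proportion to a per-transition weight. This is not the authors' implementation: the class name `WeightedReplayBuffer`, the `causal_weight` argument, and the `eps` smoothing term are all hypothetical, and the weights are supplied by hand rather than derived from causal discovery.

```python
import random
from collections import deque


class WeightedReplayBuffer:
    """Replay buffer that samples transitions in proportion to a weight.

    Illustrative only: in CIER the weights would come from causal analysis
    of the training history; here they are passed in by the caller.
    """

    def __init__(self, capacity, seed=None):
        # deque(maxlen=...) drops the oldest entry once capacity is reached.
        self.transitions = deque(maxlen=capacity)
        self.weights = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition, causal_weight):
        # causal_weight is a hypothetical non-negative importance score.
        if causal_weight < 0:
            raise ValueError("causal_weight must be non-negative")
        self.transitions.append(transition)
        self.weights.append(causal_weight)

    def sample(self, batch_size, eps=1e-6):
        # eps keeps zero-weight transitions sampleable with tiny probability.
        probs = [w + eps for w in self.weights]
        # random.choices samples with replacement, proportional to weights.
        return self.rng.choices(list(self.transitions), weights=probs, k=batch_size)


buffer = WeightedReplayBuffer(capacity=1000, seed=0)
buffer.add(("s0", "a0", 0.0, "s1"), causal_weight=0.1)
buffer.add(("s1", "a1", 1.0, "s2"), causal_weight=5.0)
batch = buffer.sample(batch_size=32)
```

Sampling with replacement keeps the sketch short. A practical buffer would likely also return importance-sampling corrections, as Prioritized Experience Replay does.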

Paper Structure (18 sections, 3 equations, 5 figures, 1 table, 1 algorithm)

This paper contains 18 sections, 3 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: A critical overtaking scenario. The entire scenario can be divided into four TSCFs based on the temporal order of actions. Factor 1 represents following the vehicle straight, Factor 2 represents the driver's intention to overtake by turning left into the oncoming lane, Factor 3 represents the driver's intention to accelerate and complete the overtaking action before a collision occurs, and the final TSCF (Consequence) represents a misjudgment by the driver in terms of distance and speed, resulting in a collision.
  • Figure 2: Causal graph of overtaking scenario based on TSCFs.
  • Figure 3: An illustration of the representation of multivariate time series. Unequal-length multivariate time series samples are transformed into equal-length sequences through the representation method of TSCF.
  • Figure 4: The architecture of CIER. It builds on the Actor-Critic algorithm and optimizes experience replay in the general DRL training process. During iterations, the Time Series Unit represents the historical training data through TSCFs. The Causal Unit, guided by the training target, conducts causal discovery on the TSCFs and maps causality to action sequences. The optional Prioritized Unit prioritizes experiences based on the TD error computed from the value function estimate provided by the Critic.
  • Figure 5: TSCFs and PAG for overtaking scenarios. The success and crash subfigures show typical trajectory samples of successful overtaking and of a crash, respectively, in which the ego-vehicle travels from the positive direction of the x-axis to the negative direction. The PAG subfigure shows a PAG generated with GFCI, where the treatments correspond to the variables in the success and crash subfigures, and the explanatory labels in the figures are arranged in chronological order.
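
The Figure 3 caption describes turning unequal-length multivariate time series into equal-length sequences. As a rough illustration of that general idea only, the sketch below splits each series into a fixed number of contiguous segments and summarizes each segment by its per-variable mean. This is a swapped-in stand-in, not the paper's TSCF segmentation method, which is not described in this summary. The function name `segment_means` and the equal-width splitting rule are assumptions.

```python
def segment_means(series, num_segments):
    """Summarize a (T x d) series as num_segments rows of per-variable means.

    series: list of T rows, each a list of d numbers, with T >= num_segments.
    Equal-width contiguous segments are an assumption made for illustration;
    the paper's TSCFs are described as meaningful subsequences instead.
    """
    n = len(series)
    if num_segments <= 0 or n < num_segments:
        raise ValueError("need 1 <= num_segments <= len(series)")
    dims = len(series[0])
    result = []
    for k in range(num_segments):
        # Integer arithmetic yields contiguous, non-empty, near-equal chunks.
        start = k * n // num_segments
        end = (k + 1) * n // num_segments
        chunk = series[start:end]
        result.append(
            [sum(row[j] for row in chunk) / len(chunk) for j in range(dims)]
        )
    return result


# Two samples of different length map to the same fixed shape (4 x 2).
short_sample = [[float(t), 2.0 * t] for t in range(5)]
long_sample = [[float(t), 2.0 * t] for t in range(20)]
short_repr = segment_means(short_sample, num_segments=4)
long_repr = segment_means(long_sample, num_segments=4)
```

Because both representations have the same shape, samples of different durations can be compared variable by variable, which a tabular causal discovery procedure needs as input.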