Explaining RL Decisions with Trajectories

Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian

TL;DR

This work tackles explainability in offline RL by attributing a policy's decisions to the trajectories encountered during training rather than to state features. It introduces a five-step trajectory attribution pipeline: (1) encode trajectories into embeddings with sequence models, (2) cluster the embeddings, (3) build data embeddings for sets of trajectories, (4) train explanation policies on complementary data, and (5) attribute decisions using cluster-level distances. Across grid-world, Seaquest, and HalfCheetah, the method yields semantically meaningful behaviours and scalable explanations, and a human study indicates reasonable alignment between human intuition and the attributions. The findings suggest that trajectory-based explanations can complement saliency methods and may support trust and interpretability in real-world RL deployments.

Abstract

Explanation is a key component for the adoption of reinforcement learning (RL) in many real-world decision-making problems. In the literature, the explanation is often provided by saliency attribution to the features of the RL agent's state. In this work, we propose a complementary approach to these explanations, particularly for offline RL, where we attribute the policy decisions of a trained RL agent to the trajectories encountered by it during training. To do so, we encode trajectories in offline training data individually as well as collectively (encoding a set of trajectories). We then attribute policy decisions to a set of trajectories in this encoded space by estimating the sensitivity of the decision with respect to that set. Further, we demonstrate the effectiveness of the proposed approach in terms of quality of attributions as well as practical scalability in diverse environments that involve both discrete and continuous state and action spaces, such as grid-worlds, video games (Atari) and continuous control (MuJoCo). We also conduct a human study on a simple navigation task to observe how human understanding of the task compares with the data attributed for a trained RL policy.

Keywords: Explainable AI, Verifiability of AI Decisions, Explainable RL.
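The attribution step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes trajectory embeddings and cluster labels are already available, uses mean pooling as a stand-in data embedding, reads "complementary data" as the dataset with one cluster held out, and uses Euclidean distance between data embeddings. The selection rule (among clusters whose removal changes the action, pick the one whose complementary data embedding is closest to the original) is likewise an assumed reading of "sensitivity of the decision with respect to that set". All function names are hypothetical.

```python
import numpy as np

def data_embedding(traj_embs):
    """Collapse a set of trajectory embeddings into one vector.
    Assumption: mean pooling; the source does not specify the pooling here."""
    return np.asarray(traj_embs, dtype=float).mean(axis=0)

def complementary_embeddings(traj_embs, labels):
    """Data embedding of the dataset with each cluster j held out in turn."""
    traj_embs = np.asarray(traj_embs, dtype=float)
    labels = np.asarray(labels)
    return {int(j): data_embedding(traj_embs[labels != j])
            for j in np.unique(labels)}

def attribute(orig_action, expl_actions, full_emb, compl_embs):
    """Return the cluster a decision is attributed to, or None.

    expl_actions[j] is the action chosen by the explanation policy trained
    without cluster j. Candidates are clusters whose removal changes the
    action; among them, pick the one whose complementary data embedding is
    closest (Euclidean, an assumption) to the full data embedding.
    """
    candidates = [j for j, a in expl_actions.items() if a != orig_action]
    if not candidates:
        return None
    return min(candidates,
               key=lambda j: np.linalg.norm(full_emb - compl_embs[j]))

# Toy data: three 2-D trajectory embeddings in two clusters (illustrative only).
toy_embs = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
toy_labels = [0, 0, 1]
full = data_embedding(toy_embs)
compl = complementary_embeddings(toy_embs, toy_labels)
```

Calling `attribute("right", {0: "left", 1: "right"}, full, compl)` on the toy data considers only cluster 0, since removing cluster 1 leaves the action unchanged.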

Paper Structure (17 sections, 14 figures, 5 tables, 6 algorithms)

Figures (14)

  • Figure 1: Trajectory Attribution in Offline RL. First, we encode trajectories in the offline data using sequence encoders and then cluster the trajectories using these encodings. We also generate a single embedding for the data. Next, we train explanation policies on variants of the original dataset and compute the corresponding data embeddings. Finally, we attribute decisions of RL agents trained on the entire dataset to trajectory clusters using action and data embedding distances.
  • Figure 2: Grid-world Trajectory Attribution. The RL agent suggests taking action 'right' in grid cell (1,1). This action is attributed to trajectories (i), (ii) and (iii). Each grid-world trajectory is drawn with annotated $\wedge$, $\vee$, $>$, $<$ arrows for the 'up', 'down', 'right' and 'left' actions respectively, along with the 0-indexed time-step of each action. RL decisions can be influenced by trajectories distant from the state under consideration, so attributing decisions to trajectories is important for understanding a decision.
  • Figure 3: Seaquest Trajectory Attribution. The agent (submarine) decides to take the action 'left' for the given observation under the provided context. The top-3 attributed trajectories are shown on the right (for each training data trajectory, we show 6 sampled observations and the corresponding actions). As depicted in the attributed trajectories, the action 'left' is explained in terms of the agent aligning itself to face the enemies coming from the left end of the frame.
  • Figure 4: Column (a): An example of the human study experiment, where users are asked to identify the attributed trajectories that best explain the state-action behavior of the agent. Column (b): Results from our human study experiments show a decent alignment between human knowledge of the navigation task and the actual factors influencing RL decision-making. This underlines the utility as well as the scope of the proposed trajectory attribution explanation method.
  • Figure 5: Overview of the Grid-world Environment. The aim of the agent is to reach any of the goal states (green squares) while avoiding lava (red square) and going around the impenetrable walls (grey squares). The reward for reaching the goal is +1; if the agent falls into the lava, it is -1. For any other transition, the agent receives -0.1. The agent can take one of four actions: up, down, left or right.
  • ...and 9 more figures
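The reward structure in the Figure 5 caption can be made concrete with a small sketch. The rewards (+1 goal, -1 lava, -0.1 otherwise) and the four actions come from the caption; the 3x3 layout, the rule that a blocked move leaves the agent in place, and episode termination on goal or lava are assumptions made for illustration, since the caption does not state them.

```python
# Hypothetical layout: '.' free cell, '#' wall, 'G' goal, 'L' lava.
# The actual map used in the paper is not given in the caption.
GRID = [
    "..G",
    ".#L",
    "...",
]

# (row, column) offsets for the four actions named in the caption.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(pos, action):
    """Apply one action; return (next_pos, reward, done)."""
    r, c = pos
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    # Assumption: moving off the grid or into a wall leaves the agent in place.
    off_grid = not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]))
    if off_grid or GRID[nr][nc] == "#":
        nr, nc = r, c
    cell = GRID[nr][nc]
    if cell == "G":
        return (nr, nc), 1.0, True    # reached a goal state
    if cell == "L":
        return (nr, nc), -1.0, True   # fell into lava
    return (nr, nc), -0.1, False      # any other transition
```

Under this layout, a trajectory is a sequence of `(pos, action, reward)` tuples produced by repeated calls to `step`, which is the kind of data the attribution method would operate on.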