Table of Contents

Selecting Decision-Relevant Concepts in Reinforcement Learning

Naveen Raman, Stephanie Milani, Fei Fang

Abstract

Training interpretable concept-based policies requires practitioners to manually select which human-understandable concepts an agent should reason with when making sequential decisions. This selection demands domain expertise, is time-consuming and costly, scales poorly with the number of candidates, and provides no performance guarantees. To overcome this limitation, we propose the first algorithms for principled automatic concept selection in sequential decision-making. Our key insight is that concept selection can be viewed through the lens of state abstraction: intuitively, a concept is decision-relevant if removing it would cause the agent to confuse states that require different actions. As a result, agents should rely on decision-relevant concepts; states with the same concept representation should share the same optimal action, which preserves the optimal decision structure of the original state space. This perspective leads to the Decision-Relevant Selection (DRS) algorithm, which selects a subset of concepts from a candidate set, along with performance bounds relating the selected concepts to the performance of the resulting policy. Empirically, DRS automatically recovers manually curated concept sets while matching or exceeding their performance, and improves the effectiveness of test-time concept interventions across reinforcement learning benchmarks and real-world healthcare environments.
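The decision-relevance criterion described above can be illustrated with a toy sketch: a concept subset is decision-relevant if states sharing the same concept representation also share the same optimal action. All names below (`states`, `optimal_action`, the brute-force `select_concepts`) are illustrative assumptions for exposition, not the paper's actual DRS algorithm.

```python
from itertools import combinations
from collections import defaultdict

def abstraction_confusions(states, concept_subset, optimal_action):
    """Count pairs of states that map to the same concept representation
    yet require different optimal actions ("confused" state pairs)."""
    groups = defaultdict(list)
    for s in states:
        key = tuple(s[c] for c in concept_subset)  # abstract state under the subset
        groups[key].append(s)
    confusions = 0
    for group in groups.values():
        for s1, s2 in combinations(group, 2):
            if optimal_action(s1) != optimal_action(s2):
                confusions += 1
    return confusions

def select_concepts(states, candidates, optimal_action, k):
    """Brute force: pick the k-concept subset that confuses the fewest
    state pairs (a stand-in for principled selection with guarantees)."""
    return min(combinations(candidates, k),
               key=lambda sub: abstraction_confusions(states, sub, optimal_action))

# Toy MiniGrid-style states as dicts of (hypothetical) concept values.
states = [
    {"door_open": 0, "key_held": 0, "wall_color": 0},
    {"door_open": 0, "key_held": 1, "wall_color": 0},
    {"door_open": 1, "key_held": 1, "wall_color": 0},
]
opt = lambda s: "open" if s["door_open"] == 0 and s["key_held"] == 1 else "move"
best = select_concepts(states, ["door_open", "key_held", "wall_color"], opt, 2)
# "wall_color" is decision-irrelevant here, so the selector drops it.
```

In this toy example, any subset omitting `door_open` or `key_held` merges states that need different actions, so the selector keeps exactly those two concepts.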

Paper Structure

This paper contains 52 sections, 12 theorems, 34 equations, 13 figures, 1 table.

Key Result

Theorem 2.1

Suppose that $\phi_{\mathrm{SA}}$ is an $\epsilon$-approximate state abstraction. Let $\pi_{\phi_{\mathrm{SA}}}$ be the policy obtained by solving the abstract MDP induced by $\phi_{\mathrm{SA}}$ and lifting it back to the original state space. Then $V^{\pi^{*}}(s) - V^{\pi_{\phi_{\mathrm{SA}}}}(s) \leq \frac{2 \epsilon}{(1-\gamma)^{2}}$ for all $s$.
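To get a feel for the bound in Theorem 2.1, the sketch below evaluates $\frac{2\epsilon}{(1-\gamma)^2}$ for illustrative values of the abstraction error $\epsilon$ and discount factor $\gamma$ (the numbers are assumptions, not from the paper):

```python
def value_loss_bound(epsilon: float, gamma: float) -> float:
    """Worst-case per-state suboptimality from Theorem 2.1:
    2 * epsilon / (1 - gamma)^2."""
    assert 0.0 <= gamma < 1.0, "discount factor must lie in [0, 1)"
    return 2.0 * epsilon / (1.0 - gamma) ** 2

# A small abstraction error is amplified by the squared effective horizon:
bound = value_loss_bound(epsilon=0.01, gamma=0.9)  # 0.02 / 0.01 = 2.0
```

The $(1-\gamma)^{-2}$ factor means that even a small abstraction error can translate into a large value loss in long-horizon (high-$\gamma$) settings, which is why the selected concepts must keep $\epsilon$ small.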

Figures (13)

  • Figure 1: Standard pipeline for training concept-based policies. Practitioners select concepts for decision-making through a labor-intensive process of iteratively selecting candidate concepts, training concept-based policies, and evaluating their performance.
  • Figure 2: Concept-based models rely on a set of decision-relevant concepts that help distinguish between different states, yet currently these concepts are manually selected. In this work, we study how to identify and select decision-relevant concepts. Our key insight is that decision-relevant concepts best separate "different" states, where difference is defined by their decision consequences. We use this insight to develop algorithms for concept selection with performance guarantees.
  • Figure 3: Normalized reward of concept selection algorithms with perfect (top) and imperfect (bottom) concept predictors. Our algorithm, DRS, improves performance compared to the random, variance, and greedy baselines for four out of five environments in the perfect setting. DRS and DRS-log improve performance or are optimal in all environments in the imperfect setting.
  • Figure 4: We vary the number of timesteps that we train policies for in MiniGrid, while also varying the accuracy of concept predictors (left) or the number of concepts selected (right). Increasing the accuracy of concept predictors speeds up training, while increasing the number of concepts increases the maximum performance.
  • Figure 5: Impact of the number of concepts selected against the accuracy of the underlying concepts. Increasing either the number or the accuracy of concepts has a similar effect on performance; both sufficiently many and sufficiently accurate concepts are needed to ensure good performance.
  • ...and 8 more figures

Theorems & Definitions (21)

  • Theorem 2.1: Approximate State Abstraction
  • Example 3.1
  • Definition 3.2: Q-Distance
  • Definition 3.3: Abstraction Error
  • Proposition 3.1
  • Theorem 3.1
  • Theorem 4.1
  • Theorem 4.2
  • Lemma 4.1
  • Theorem 12.1
  • ...and 11 more