
PRISM: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning

Thomas Pravetz

Abstract

We present PRISM (Policy Reuse via Interpretable Strategy Mapping), a framework that grounds reinforcement learning agents' decisions in discrete, causally validated concepts and uses those concepts as a zero-shot transfer interface between agents trained with different algorithms. PRISM clusters each agent's encoder features into $K$ concepts via K-means. Causal intervention establishes that these concepts directly drive - not merely correlate with - agent behavior: overriding concept assignments changes the selected action in 69.4% of interventions ($p = 8.6 \times 10^{-86}$, 2500 interventions). Concept importance and usage frequency are dissociated: the most-used concept (C47, 33.0% frequency) causes only a 9.4% win-rate drop when ablated, while ablating C16 (15.4% frequency) collapses win rate from 100% to 51.8%. Because concepts causally encode strategy, aligning them via optimal bipartite matching transfers strategic knowledge zero-shot. On Go~7$\times$7 with three independently trained agents, concept transfer achieves 69.5%$\pm$3.2% and 76.4%$\pm$3.4% win rate against a standard engine across the two successful transfer pairs (10 seeds), compared to 3.5% for a random agent and 9.2% without alignment. Transfer succeeds when the source policy is strong; geometric alignment quality predicts nothing ($R^2 \approx 0$). The framework is scoped to domains where strategic state is naturally discrete: the identical pipeline on Atari Breakout yields bottleneck policies at random-agent performance, confirming that the Go results reflect a structural property of the domain.
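The two mechanical steps named in the abstract, K-means clustering of encoder features into discrete concepts and optimal bipartite matching between two agents' concept sets, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random features, the feature dimensionality, and the use of negative cosine similarity as the matching cost are all assumptions (the abstract only states "optimal bipartite matching").

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Stand-ins for encoder features collected from two independently
# trained agents (n states x d-dimensional embeddings).
feats_a = rng.normal(size=(1000, 32))
feats_b = rng.normal(size=(1000, 32))

K = 8  # number of concepts per agent (the paper uses K = 64 for Go)

# Step 1: cluster each agent's features into K concepts.
km_a = KMeans(n_clusters=K, n_init=10, random_state=0).fit(feats_a)
km_b = KMeans(n_clusters=K, n_init=10, random_state=0).fit(feats_b)

# Step 2: align concepts across agents. Build a K x K cost matrix between
# concept centroids; negative cosine similarity is one reasonable choice.
ca = km_a.cluster_centers_
cb = km_b.cluster_centers_
ca = ca / np.linalg.norm(ca, axis=1, keepdims=True)
cb = cb / np.linalg.norm(cb, axis=1, keepdims=True)
cost = -ca @ cb.T

# Hungarian algorithm: concept i of agent A maps to concept col[i] of agent B.
row, col = linear_sum_assignment(cost)
mapping = dict(zip(row.tolist(), col.tolist()))  # K one-to-one concept pairs
```

Because the cost matrix is square, `linear_sum_assignment` returns a perfect one-to-one matching: every source concept is paired with exactly one target concept, which is what makes zero-shot policy reuse through the concept interface well-defined.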


Paper Structure

This paper contains 42 sections, 4 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Fine-tuning after zero-shot transfer. Both conditions run REINFORCE on the DQN concept bottleneck; they differ only in initialization. The transferred policy crosses 60% at generation 5 (50K steps); the from-scratch policy reaches 27% at generation 40 (400K steps) without crossing the threshold. Single seed; treat magnitude as indicative.
  • Figure 2: Concept ablation: frequency vs. win-rate drop when ablated. C16 (15.4% frequency) causes a 48.2 pp drop; C47 (33.0% frequency, the most-used concept) causes only a 9.4 pp drop. Three concepts improve win rate when removed (negative drop), suggesting they encode suboptimal patterns.
  • Figure 3: Win rate vs. $K$ (number of concepts) at 300K training steps against GnuGo L3. Transfer performance peaks at $K{=}32$ (76%); direct bottleneck performance is higher at $K{=}8$ and $K{=}64$. The paper uses $K{=}64$ based on direct performance at full training (3.4M steps), where larger $K$ captures finer-grained structure. Results at 300K steps are not directly comparable to Table \ref{['tab:agent_transfer']} (full curriculum).