Information-Theoretic State Variable Selection for Reinforcement Learning

Charles Westphal; Stephen Hailes; Mirco Musolesi

TL;DR

This work addresses the challenge of learning compact, informative state representations in reinforcement learning by introducing the Transfer Entropy Redundancy Criterion (TERC), an information-theoretic measure that quantifies how much state variables reduce the uncertainty in actions. The core idea is to include only variables that exhibit positive transfer entropy to actions, while rigorously handling perfect conditional redundancy (PCR/PMCR/CPMCR) so that informative variables are not discarded and redundant ones are not retained. The authors provide theoretical guarantees and practical algorithms for deriving the minimal informative subset of state variables: a naïve method, a CPMCR-aware algorithm, and a simplified variant that scales linearly with the number of variables. Extensive experiments on synthetic data and diverse RL benchmarks (Cart Pole, Lunar Lander, Pendulum, Secret Key Game, Iterated Prisoner’s Dilemma) show that TERC consistently identifies the optimal variable set and accelerates learning compared to UMFI and PI baselines, while also enabling interpretable tracking of information transfer during training.
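Why redundancy needs special handling can be seen in a toy example. The sketch below assumes a memoryless setting, where transfer entropy from a variable X_i to the action A given the remaining variables reduces to the conditional mutual information I(X_i; A | X_{-i}); it uses a plain plug-in estimator on discrete samples rather than the paper's TERC estimator. The variable names, the copy x2 and the policy a = x1 are all hypothetical. Because x2 duplicates x1, conditioning each variable on all the others makes both copies look uninformative, so a rule that drops every zero-scoring variable at once would discard the only information the policy uses.

```python
from collections import Counter
from math import log2
import random


def conditional_mutual_information(x, a, z):
    """Plug-in estimate of I(X; A | Z) in bits from discrete samples.

    x and a are sequences of hashable values; z is a sequence of tuples
    holding the conditioning variables at each time step.
    """
    n = len(x)
    c_xaz = Counter(zip(x, a, z))
    c_xz = Counter(zip(x, z))
    c_az = Counter(zip(a, z))
    c_z = Counter(z)
    total = 0.0
    for (xv, av, zv), c in c_xaz.items():
        # p(x,a,z) * log2[ p(x,a,z) p(z) / (p(x,z) p(a,z)) ], using counts
        total += (c / n) * log2(c * c_z[zv] / (c_xz[(xv, zv)] * c_az[(av, zv)]))
    return total


rng = random.Random(0)
n = 2000
x1 = [rng.randint(0, 1) for _ in range(n)]   # variable the policy uses
x2 = list(x1)                                 # perfect copy of x1
x3 = [rng.randint(0, 1) for _ in range(n)]   # irrelevant noise
actions = list(x1)                            # hypothetical policy: a = x1

# Test each variable against the actions, conditioning on all the others.
te_x1 = conditional_mutual_information(x1, actions, list(zip(x2, x3)))
te_x2 = conditional_mutual_information(x2, actions, list(zip(x1, x3)))
te_x3 = conditional_mutual_information(x3, actions, list(zip(x1, x2)))
print(te_x1, te_x2, te_x3)  # 0.0 0.0 0.0: both copies look uninformative

# Without its copy in the conditioning set, x1 carries about one bit.
print(conditional_mutual_information(x1, actions, [(v,) for v in x3]))
```

With finite samples the plug-in estimator is biased upwards for independent variables, so on real data a threshold or significance test would be needed instead of an exact comparison with zero.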

Abstract

Identifying the most suitable variables to represent the state is a fundamental challenge in Reinforcement Learning (RL). These variables must efficiently capture the information necessary for making optimal decisions. To address this problem, we introduce the Transfer Entropy Redundancy Criterion (TERC), an information-theoretic criterion that determines whether entropy is transferred from state variables to actions during training. We define an algorithm based on TERC that provably excludes from the state those variables that have no effect on the agent's final performance, resulting in more sample-efficient learning. Experimental results show that this speed-up is present across three different algorithm classes (represented by tabular Q-learning, Actor-Critic, and Proximal Policy Optimization (PPO)) in a variety of environments. Furthermore, to highlight the differences between the proposed methodology and current state-of-the-art feature selection approaches, we present a series of controlled experiments on synthetic data before generalizing to real-world decision-making tasks. We also introduce a Bayesian network representation of the problem that compactly captures the transfer of information from state variables to actions.
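One simple way to exclude variables without falling into the redundancy trap is to remove them one at a time and re-test the survivors after each removal. The sketch below is a hypothetical greedy backward elimination, not the authors' naïve, CPMCR-aware or linear-time algorithm. It estimates I(X_i; A | rest) through the entropy identity I(X; A | Z) = H(X,Z) + H(A,Z) - H(X,A,Z) - H(Z) with plug-in estimates, and the tolerance `tol` is an assumed parameter. On a toy state where x2 copies x1 and x3 is noise, it keeps exactly one of the two copies.

```python
from collections import Counter
from math import log2
import random


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable samples."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def cmi(x, a, z):
    """I(X; A | Z) = H(X,Z) + H(A,Z) - H(X,A,Z) - H(Z), plug-in estimate."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(a, z)))
            - entropy(list(zip(x, a, z))) - entropy(list(z)))


def backward_eliminate(columns, actions, tol=1e-9):
    """Hypothetical greedy elimination: drop one uninformative variable at a
    time and re-test the rest, so only one copy of a redundant pair goes."""
    kept = list(columns)
    removed = True
    while removed:
        removed = False
        for name in kept:
            others = [columns[o] for o in kept if o != name]
            z = list(zip(*others)) if others else [()] * len(actions)
            if cmi(columns[name], actions, z) <= tol:
                kept.remove(name)  # safe: we leave the for loop immediately
                removed = True
                break
    return kept


rng = random.Random(1)
n = 2000
x1 = [rng.randint(0, 1) for _ in range(n)]
columns = {
    "x1": x1,
    "x2": list(x1),                                 # perfect copy of x1
    "x3": [rng.randint(0, 1) for _ in range(n)],    # irrelevant noise
}
actions = list(x1)                                  # hypothetical policy: a = x1
print(backward_eliminate(columns, actions))         # ['x2']
```

Re-testing after every removal costs a quadratic number of estimates in the number of variables, which is the kind of overhead a linear-time variant is meant to avoid.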

Paper Structure (79 sections, 44 equations, 10 figures, 2 algorithms)

Introduction
Summary of contributions.
Related Work
Feature selection.
Unsupervised derivation of state representations in reinforcement learning.
Background and Notation
Reinforcement Learning
Transfer Entropy
Conditional Redundancy
Perfect conditional redundancy (PCR).
Perfect multivariate conditional redundancy (PMCR).
Constrained perfect multivariate conditional redundancy (CPMCR).
Synergy
Problem Statement
Approach
...and 64 more sections

Figures (10)

Figure 1: In this graph we illustrate TERC's effectiveness in dealing with complex redundancies and synergies. We plot the values for TERC, PI and UMFI for the Four Redundant Variables and the Two Redundant Triplets datasets.
Figure 2: Subfigure (A) illustrates how at least three secret keys are needed to decode a polynomial of order two in Shamir's secret sharing scheme. Subfigure (B) depicts how we use our method of state variable selection to distinguish these three secret-forming keys from non-secret-forming keys.
Figure 3: Subfigure (A) depicts the final feature importance values for each key in the Secret Key Game when using TERC, UMFI or PI. Subfigure (B) depicts the Bayesian network representation of TERC in the Secret Key Game if the secret keys were at indices two, six and $N$. Finally, Subfigure (C) shows how the agent's training efficiency varied as a function of state length.
Figure 4: Graphs (A) and (B) show the final values computed when verifying TERC during different training quartiles, for 25 and 50 potential secret keys. As in Figure 3, we include TERC values only for the 10 decoy keys that transferred the most entropy to the actions.
Figure 5: The left-hand subfigure depicts the final values obtained for $\Phi_{X_i;\mathcal{X} \rightarrow A}$, UMFI, and PI for Cart Pole, Lunar Lander, and Pendulum. On the right-hand side, we illustrate how failing to remove the random variables from the state of the Cart Pole agent degrades its performance.
...and 5 more figures
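The threshold property behind Figure 2 can be checked directly: a polynomial of degree two is fixed by any three of its points, so three shares recover the secret while two do not. The sketch below is the textbook Shamir scheme over a prime field, not the paper's Secret Key Game implementation; the modulus `P`, the coefficients and the share positions are arbitrary illustrative choices, and `make_shares` and `reconstruct` are hypothetical helper names.

```python
P = 2_147_483_647  # prime modulus 2**31 - 1 (arbitrary choice)


def make_shares(secret, coeffs, xs):
    """Evaluate f(x) = secret + c1*x + c2*x**2 + ... (mod P) at each x."""
    def f(x):
        return (secret + sum(c * pow(x, k + 1, P) for k, c in enumerate(coeffs))) % P
    return [(x, f(x)) for x in xs]


def reconstruct(shares):
    """Lagrange-interpolate the shares and evaluate the polynomial at x = 0."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        secret = (secret + yj * num * pow(den, -1, P)) % P
    return secret


secret = 1234
shares = make_shares(secret, coeffs=[166, 94], xs=[1, 2, 3, 4, 5])  # degree 2
print(reconstruct(shares[:3]))   # 1234: any three shares suffice
print(reconstruct(shares[2:5]))  # 1234
print(reconstruct(shares[:2]))   # 1046: two shares give a wrong value
```

In a state made of many candidate keys, only the secret-forming ones jointly determine the decoded value, which is the structure a variable selection criterion has to pick out.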