
Symbolic State Partitioning for Reinforcement Learning

Mohsen Ghaffari, Mahsa Varshosaz, Einar Broch Johnsen, Andrzej Wąsowski

TL;DR

This work tackles the challenge of applying tabular reinforcement learning to continuous state spaces by introducing SymPar, a symbolic execution–driven state partitioning method that adaptively abstracts the environment dynamics. By symbolically analysing the environment model, SymPar derives path conditions that, when intersected across actions, yield an action‑independent partition tailored to problem structure and nonlinear dependencies. The paper demonstrates that these partitions improve state-space coverage and learning efficiency, especially under sparse rewards, and provides extensive empirical comparisons against tile coding and online partitioning methods. Overall, SymPar bridges software engineering symbolic reasoning with reinforcement learning to produce explainable, scalable, and dynamics-aware discretizations with strong performance gains on a diverse set of benchmarks.
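The intersection step described above can be illustrated with a minimal sketch. It assumes a one-dimensional state and path conditions that are already simplified to half-open intervals; the action names, interval bounds, and helper functions are hypothetical and are not taken from the paper, where path conditions come from symbolic execution of the environment program and may be nonlinear.

```python
from itertools import product

# Hypothetical path conditions for two actions over a 1-D state x in [0, 10).
# Each condition is a half-open interval (lo, hi); the values are illustrative only.
PATH_CONDITIONS = {
    "left": [(0.0, 1.0), (1.0, 10.0)],
    "right": [(0.0, 9.0), (9.0, 10.0)],
}


def intersect(a, b):
    """Intersect two half-open intervals; return None if the result is empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


def action_independent_partition(conditions_by_action):
    """Pick one path condition per action, intersect them, keep non-empty cells."""
    parts = []
    for combo in product(*conditions_by_action.values()):
        cell = combo[0]
        for interval in combo[1:]:
            cell = intersect(cell, interval)
            if cell is None:
                break
        if cell is not None:
            parts.append(cell)
    return sorted(parts)


def part_index(parts, x):
    """Map a concrete state to the index of the part containing it."""
    for i, (lo, hi) in enumerate(parts):
        if lo <= x < hi:
            return i
    raise ValueError(f"state {x} is outside the partitioned space")


PARTS = action_independent_partition(PATH_CONDITIONS)
```

A tabular learner could then use `part_index(PARTS, x)` as its discrete state. In this toy setting the cells are intervals, so emptiness is a simple comparison; with general path conditions that check would presumably need a constraint solver.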

Abstract

Tabular reinforcement learning methods cannot operate directly on continuous state spaces. One solution for this problem is to partition the state space. A good partitioning enables generalization during learning and more efficient exploitation of prior experiences. Consequently, the learning process becomes faster and produces more reliable policies. However, partitioning introduces approximation, which is particularly harmful in the presence of nonlinear relations between state components. An ideal partition should be as coarse as possible, while capturing the key structure of the state space for the given problem. This work extracts partitions from the environment dynamics by symbolic execution. We show that symbolic partitioning improves state space coverage with respect to environmental behavior and allows reinforcement learning to perform better for sparse rewards. We evaluate symbolic state space partitioning with respect to precision, scalability, learning agent performance and state space coverage for the learnt policies.

Paper Structure (26 sections, 4 theorems, 2 equations, 12 figures, 4 tables, 1 algorithm)

This paper contains 26 sections, 4 theorems, 2 equations, 12 figures, 4 tables, 1 algorithm.

Key Result

Theorem

The set $\mathcal{P}$ obtained in Algorithm 1 (SymPar) is a partition (i.e., it is total): $\forall\, \overline{s} \in \overline{\mathcal{S}}\;\; \exists!\, \mathcal{P}_0 \in \mathcal{P} \cdot\; \overline{s} \in \mathcal{P}_0.$
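The property stated above (every state lies in exactly one part) can be spot-checked empirically on a candidate partition. The sketch below is only a sampling-based sanity check, not a proof and not the paper's argument; the three predicates over the unit square are hypothetical and chosen to include a nonlinear boundary.

```python
import random

# Hypothetical partition of the unit square [0, 1) x [0, 1), given as predicates.
# The first boundary is nonlinear (a quarter disc of radius 0.5).
PARTS = [
    lambda x, y: x * x + y * y < 0.25,
    lambda x, y: x * x + y * y >= 0.25 and x < 0.5,
    lambda x, y: x * x + y * y >= 0.25 and x >= 0.5,
]


def count_containing(parts, state):
    """Number of parts whose predicate holds for the given state."""
    return sum(1 for part in parts if part(*state))


def is_partition_on_samples(parts, n=10_000, seed=0):
    """True if every sampled state lies in exactly one part."""
    rng = random.Random(seed)
    return all(
        count_containing(parts, (rng.random(), rng.random())) == 1
        for _ in range(n)
    )
```

Dropping the last predicate leaves every state with `x >= 0.5` uncovered, so the check then fails; a passing check only shows that no counterexample was found among the samples.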

Figures (12)

  • Figure 1: Navigation environment. A mouse agent in a continuous rectangular board needs to find the cheese, while not stepping on the trap.
  • Figure 2: Reinforcement learning schematic.
  • Figure 3: Symbolic execution rules for an idealized probabilistic language. Each judgement is a quadruple: the program, the symbolic store ($\sigma$), the sample index ($k$), the current path condition ($\phi$).
  • Figure 4: The environment program ($T$, $R$) for the navigation problem (Figure 1).
  • Figure 5: Path conditions collected by symbolic execution. The numbers (to the left) refer to line numbers in the program of Figure 4.
  • ...and 7 more figures

Theorems & Definitions (9)

  • Example 1
  • Example 2
  • Example 3
  • Theorem
  • Theorem
  • Theorem
  • Proof
  • Theorem
  • Proof