Symbolic State Partitioning for Reinforcement Learning
Mohsen Ghaffari, Mahsa Varshosaz, Einar Broch Johnsen, Andrzej Wąsowski
TL;DR
This work addresses the problem of applying tabular reinforcement learning to continuous state spaces. It introduces SymPar, a state partitioning method driven by symbolic execution that adaptively abstracts the environment dynamics. By symbolically analyzing the environment model, SymPar derives path conditions for each action and intersects them across actions, which yields an action-independent partition tailored to the problem structure and its nonlinear dependencies. The paper shows that these partitions improve state-space coverage and learning efficiency, especially under sparse rewards, and it compares SymPar empirically against tile coding and online partitioning methods. Overall, SymPar brings symbolic reasoning from software engineering to reinforcement learning to produce explainable, scalable, and dynamics-aware discretizations, with strong performance gains on a diverse set of benchmarks.
Abstract
Tabular reinforcement learning methods cannot operate directly on continuous state spaces. One solution to this problem is to partition the state space. A good partitioning enables generalization during learning and more efficient exploitation of prior experience. Consequently, the learning process becomes faster and produces more reliable policies. However, partitioning introduces approximation, which is particularly harmful in the presence of nonlinear relations between state components. An ideal partition should be as coarse as possible while still capturing the key structure of the state space for the given problem. This work extracts partitions from the environment dynamics by symbolic execution. We show that symbolic partitioning improves state-space coverage with respect to environmental behavior and allows reinforcement learning to perform better for sparse rewards. We evaluate symbolic state-space partitioning with respect to precision, scalability, learning agent performance, and state-space coverage for the learnt policies.
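To make the idea of intersecting path conditions across actions concrete, the following is a minimal Python sketch, not the SymPar implementation. It assumes a hypothetical one-dimensional environment with state x in [0, 10) and two actions, and it hard-codes the per-action path conditions as intervals where a symbolic executor would normally derive them from the environment model. The names PATH_CONDITIONS, intersect_across_actions, and partition_index are illustrative only.

```python
from bisect import bisect_right
from itertools import product

# Hypothetical path conditions for a toy 1-D environment, state x in [0, 10).
# Each action maps to half-open intervals [lo, hi), one per execution path of
# an assumed step function. A symbolic executor would derive these; they are
# hard-coded here purely for illustration.
PATH_CONDITIONS = {
    "left": [(0.0, 2.0), (2.0, 10.0)],   # e.g. blocked by a wall when x < 2
    "right": [(0.0, 7.0), (7.0, 10.0)],  # e.g. goal region reached when x >= 7
}


def intersect_across_actions(conditions):
    """Return the non-empty intersections of one interval per action."""
    parts = []
    for combo in product(*conditions.values()):
        lo = max(a for a, _ in combo)
        hi = min(b for _, b in combo)
        if lo < hi:  # keep only non-empty intersections
            parts.append((lo, hi))
    return sorted(parts)


def partition_index(parts, x):
    """Map a continuous state x to the index of the partition containing it."""
    starts = [lo for lo, _ in parts]
    i = bisect_right(starts, x) - 1
    if i < 0 or x >= parts[i][1]:
        raise ValueError(f"state {x} lies outside the partitioned range")
    return i


# Action-independent partition for the toy environment.
PARTITION = intersect_across_actions(PATH_CONDITIONS)
```

In this toy case the per-action splits at x = 2 and x = 7 combine into three partitions, and a tabular learner could use the index returned by partition_index as its discrete state. Real path conditions over several state components, possibly nonlinear, would need a constraint representation richer than intervals.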
