Subgoal Discovery Using a Free Energy Paradigm and State Aggregations
Amirhossein Mesbah, Reshad Hosseini, Seyed Pooya Shariatpanahi, Majid Nili Ahmadabadi
TL;DR
The paper tackles subgoal discovery in reinforcement learning, with the aim of improving sample efficiency and easing reward shaping, by introducing a free energy–based framework that selects between a main state space and an aggregation space. Subgoals (bottlenecks) are identified where the aggregation space becomes uncertain, as quantified by a free energy objective $F(s,m,\pi)$. The method uses Thompson sampling to approximate action-value distributions across the two spaces and applies Otsu thresholding with non-maximum suppression to extract bottlenecks; it is reported to be effective in both discrete grid-worlds and continuous settings with deep networks. The approach avoids explicit graph construction and predefined subgoal counts, is robust to environment stochasticity, and offers a scalable, model-free route to automatic subgoal discovery for hierarchical RL (HRL) and goal-conditioned RL (GCRL).
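The bottleneck-extraction step named above (Otsu thresholding followed by non-maximum suppression) can be illustrated with a minimal sketch. It assumes a 2D grid of per-state scores is already available; in the paper these would be derived from the free energy objective, but here `scores` is a hand-made toy array. The function names `otsu_threshold`, `non_max_suppression`, and `extract_subgoals`, the bin count, and the neighborhood radius are all hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def otsu_threshold(values, bins=64):
    """Return the histogram edge that maximizes between-class variance (Otsu)."""
    hist, edges = np.histogram(np.ravel(values), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weights = hist / hist.sum()

    cum_weight = np.cumsum(weights)           # probability mass of the low class
    cum_mean = np.cumsum(weights * centers)   # unnormalized mean of the low class
    total_mean = cum_mean[-1]

    # Evaluate every split except the last one, which leaves the high class empty.
    w0 = cum_weight[:-1]
    w1 = 1.0 - w0
    denom = w0 * w1
    between = np.where(
        denom > 1e-12,
        (total_mean * w0 - cum_mean[:-1]) ** 2 / np.maximum(denom, 1e-12),
        0.0,
    )
    # Split k separates bins 0..k from bins k+1.., i.e. at edges[k + 1].
    return edges[1:-1][np.argmax(between)]


def non_max_suppression(scores, mask, radius=1):
    """Keep masked cells that are the maximum of their local window (ties kept)."""
    rows, cols = scores.shape
    keep = []
    for r, c in zip(*np.nonzero(mask)):
        r0, r1 = max(r - radius, 0), min(r + radius + 1, rows)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, cols)
        if scores[r, c] >= scores[r0:r1, c0:c1].max():
            keep.append((int(r), int(c)))
    return keep


def extract_subgoals(scores, radius=1):
    """Threshold the score map with Otsu, then suppress non-maximal neighbors."""
    threshold = otsu_threshold(scores)
    mask = scores >= threshold
    return non_max_suppression(scores, mask, radius)


# Toy score map (assumed data): mostly low scores with two local peaks.
scores = np.full((5, 7), 0.1)
scores[2, 2], scores[2, 3], scores[2, 4] = 0.6, 0.9, 0.7
scores[0, 6] = 0.8
subgoals = extract_subgoals(scores, radius=1)
print(subgoals)
```

Because the threshold is derived from the score histogram and the suppression is purely local, the number of subgoals is never fixed in advance, which matches the claim that no predefined subgoal count is needed. In a continuous setting the grid neighborhood would have to be replaced by some notion of nearby states, which this sketch does not attempt.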
Abstract
Reinforcement learning (RL) plays a major role in solving complex sequential decision-making tasks. Hierarchical and goal-conditioned RL are promising methods for dealing with two major problems in RL, namely sample inefficiency and the difficulty of reward shaping. These methods address both problems by decomposing a task into simpler subtasks and by temporally abstracting the task in the action space. A key component of task decomposition in these methods is subgoal discovery: subgoal states can be used to define hierarchies of actions and to decompose complex tasks. Under the assumption that subgoal states are more unpredictable than other states, we propose a free energy paradigm to discover them. This is achieved by using free energy to select between two spaces, the main space and an aggregation space. The model changes from neighboring states to a given state indicate how unpredictable that state is, and we therefore use them for subgoal discovery. Our empirical results on navigation tasks such as grid-world environments show that the proposed method can discover subgoals without prior knowledge of the task. The method is also robust to the stochasticity of environments.
