Hybrid Recurrent Models Support Emergent Descriptions for Hierarchical Planning and Control

Poppy Collis, Ryan Singh, Paul F Kinghorn, Christopher L Buckley

TL;DR

This work addresses the challenge of learning useful discrete abstractions for inherently continuous control problems by introducing a Hybrid Hierarchical Agent (HHA) that combines an rSLDS with a discrete MDP and a finite-horizon LQR controller, guided by Active Inference-inspired exploration. The recurrent transitions in the rSLDS yield $K$ discrete modes indexing piecewise affine dynamics, enabling temporally abstract sub-goals and information-theoretic exploration bonuses, while caching low-level solutions for efficient planning. The approach is demonstrated on the Continuous Mountain Car task, showing faster reward discovery than model-free baselines and revealing meaningful, non-grid discretizations of the state-space that support planning and control. Overall, the work provides an interpretable, scalable framework that integrates discrete planning with continuous control through emergent hierarchical representations, with potential implications for safe and efficient autonomous systems.
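
To make the low-level controller concrete, the following is a minimal sketch of a finite-horizon LQR solved by backward Riccati recursion. It is not the authors' implementation: the function names, the double-integrator system, and the cost weights are hypothetical, and the dynamics are taken to be purely linear, whereas the rSLDS modes are affine (a bias term could be handled by augmenting the state with a constant).

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, Q_final, horizon):
    """Return time-varying gains K_t for x_{t+1} = A x_t + B u_t.

    Minimises sum_t (x'Qx + u'Ru) plus a terminal cost x'Q_final x.
    """
    P = Q_final
    gains = []
    for _ in range(horizon):
        # Gain for the current step, computed from the next step's cost-to-go P.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update of the cost-to-go matrix.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains[t] is now the gain to apply at time t
    return gains

def rollout(A, B, gains, x0, x_goal):
    """Apply u_t = -K_t (x_t - x_goal); assumes x_goal is a fixed point of A."""
    x = x0.copy()
    for K in gains:
        u = -K @ (x - x_goal)
        x = A @ x + B @ u
    return x

# Hypothetical double integrator (position, velocity) with time step 0.1.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
gains = finite_horizon_lqr(A, B, Q=np.eye(2), R=0.1 * np.eye(1),
                           Q_final=10.0 * np.eye(2), horizon=50)
x_final = rollout(A, B, gains, x0=np.array([1.0, 0.0]), x_goal=np.zeros(2))
```

In a hierarchical setting, one such controller per discrete mode could be solved once and reused, which is one way to read the "caching" of low-level solutions described above.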

Abstract

An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work has demonstrated that a class of hybrid state-space model known as recurrent switching linear dynamical systems (rSLDS) discovers meaningful behavioural units via the piecewise linear decomposition of complex continuous dynamics (Linderman et al., 2016). Furthermore, these models capture how the underlying continuous states drive the discrete mode switches. We propose that the rich representations formed by an rSLDS can provide useful abstractions for planning and control. We present a novel hierarchical model-based algorithm inspired by Active Inference in which a discrete MDP sits above a low-level linear-quadratic controller. The recurrent transition dynamics learned by the rSLDS allow us to (1) specify temporally-abstracted sub-goals in a method reminiscent of the options framework, (2) lift the exploration into discrete space, allowing us to exploit information-theoretic exploration bonuses, and (3) 'cache' the approximate solutions to low-level problems in the discrete planner. We successfully apply our model to the sparse Continuous Mountain Car task, demonstrating fast system identification via enhanced exploration and non-trivial planning through the delineation of abstract sub-goals.
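
The two rSLDS ingredients named in the abstract, mode-specific linear dynamics and mode switches driven by the continuous state, can be illustrated with a single generative step. This is a toy sketch under stated assumptions rather than the model used in the paper: the softmax parameterisation with weights shared across modes, the parameter names (`W_x`, `W_u`, `r`, `A`, `B`, `b`), and the two-mode example system are all hypothetical.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(v - np.max(v))
    return e / e.sum()

def make_toy_params():
    """Hypothetical 2-mode system: mode 0 where x[0] < 0, mode 1 where x[0] > 0."""
    return {
        "W_x": np.array([[-50.0, 0.0], [50.0, 0.0]]),  # state -> mode logits
        "W_u": np.zeros((2, 1)),                        # input -> mode logits
        "r": np.zeros((2, 2)),                          # bias given previous mode
        "A": np.stack([0.9 * np.eye(2), 1.1 * np.eye(2)]),
        "B": np.stack([np.array([[0.0], [0.1]])] * 2),
        "b": np.zeros((2, 2)),
        "noise_std": 0.01,
    }

def rslds_step(x, u, z_prev, params, rng):
    """Sample the next discrete mode, then the next continuous state."""
    # Recurrent transition: mode probabilities depend on the continuous state.
    logits = params["W_x"] @ x + params["W_u"] @ u + params["r"][z_prev]
    probs = softmax(logits)
    z = int(rng.choice(len(probs), p=probs))
    # Affine dynamics indexed by the sampled mode, plus Gaussian noise.
    mean = params["A"][z] @ x + params["B"][z] @ u + params["b"][z]
    x_next = mean + params["noise_std"] * rng.standard_normal(x.shape)
    return z, x_next, probs

rng = np.random.default_rng(0)
z, x_next, probs = rslds_step(np.array([1.0, 0.0]), np.array([0.0]),
                              z_prev=0, params=make_toy_params(), rng=rng)
```

Because the logits are linear in `x`, the most likely mode changes across hyperplanes in state-space, which is what produces a non-grid partition into regions with their own linear dynamics.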

Paper Structure (21 sections, 24 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: HHA solves nonlinear problems via specifying abstract sub-goals in state-space. (a) Piecewise linear dynamics of the Continuous Mountain Car state-space found by rSLDS. Reward location shown (black triangle). While the rSLDS retrieves 5 modes in total, here we plot only the modes seen in the position-velocity ($x$) space without showing the control input ($u$) axis. (b) Example trajectory (segments coloured by mode) showing the HHA consistently navigating to the goal. (c) Continuous control input (coloured by the discrete action specified by the planner, with arrow size indicating magnitude and direction) over the same example trajectory as in (b).
  • Figure 2: HHA with an information-gain drive explores a wider range of the state-space. State-space coverage in Continuous Mountain Car after 10,000 steps (best of 3 runs) for (a) HHA with information-gain drive, (b) HHA without information-gain drive and (c) a baseline of randomly sampled continuous actions.
  • Figure 3: HHA both finds the reward and capitalises on its experience significantly quicker than model-free RL baselines. Average reward (± std) over 6 runs for Continuous Mountain Car (20 episodes, max episode length of 200 steps) for HHA (our model), Soft Actor-Critic (SAC, with 2 Q-functions), and Actor-Critic (AC) models. Note that after 20 episodes, SAC and AC are yet to find the reward and converge on a solution.
  • Figure 4: On Continuous Mountain Car, our model (HHA) competes with the state-space coverage achieved by model-based baselines on Discrete Mountain Car (an easier problem). State-space coverage after 10,000 timesteps on (a) the Continuous Mountain Car task using our model (HHA) and (b) the Discrete Mountain Car task using a Deep Q-Network with Model-Based Exploration (DQN-MBE) [Gou2019]. Exact parameters in Table \ref{dqn-table}.
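
The captions for Figures 2 and 4 compare state-space coverage visually. One hypothetical way to summarise such a comparison as a single number is the fraction of grid cells in the position-velocity plane that a run visits at least once. The sketch below is an illustration only, not the procedure used to produce the figures; the function name, the 20-bin grid, and the use of the standard Gym Continuous Mountain Car bounds (position in [-1.2, 0.6], velocity in [-0.07, 0.07]) are assumptions.

```python
import numpy as np

def state_space_coverage(states, low, high, bins=20):
    """Fraction of grid cells visited at least once.

    states: array of shape (N, D); low/high: per-dimension bounds.
    Points outside the bounds are ignored by the histogram.
    """
    hist, _ = np.histogramdd(states, bins=bins, range=list(zip(low, high)))
    return np.count_nonzero(hist) / hist.size

# Assumed bounds: standard Gym Continuous Mountain Car (position, velocity).
low, high = (-1.2, -0.07), (0.6, 0.07)

# Two visited states that fall in different cells of the 20 x 20 grid.
states = np.array([[-1.0, 0.0], [0.5, 0.05]])
coverage = state_space_coverage(states, low, high)  # 2 of 400 cells
```

A grid-based score like this depends on the bin count, so it is only meaningful when every method is evaluated with the same grid and the same number of environment steps.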