Learning in Hybrid Active Inference Models

Poppy Collis; Ryan Singh; Paul F Kinghorn; Christopher L Buckley

TL;DR

We present a novel hierarchical hybrid active inference agent, in which a high-level discrete active inference planner sits above a low-level continuous active inference controller, and apply it to the sparse Continuous Mountain Car task.

Abstract

An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work in computational neuroscience has considered this functional integration of discrete and continuous variables during decision-making under the formalism of active inference (Parr, Friston & de Vries, 2017; Parr & Friston, 2018). However, their focus is on the expressive physical implementation of categorical decisions, and the hierarchical mixed generative model is assumed to be known. As a consequence, it is unclear how this framework might be extended to learning. We therefore present a novel hierarchical hybrid active inference agent in which a high-level discrete active inference planner sits above a low-level continuous active inference controller. We make use of recent work in recurrent switching linear dynamical systems (rSLDS), which implement end-to-end learning of meaningful discrete representations via the piecewise linear decomposition of complex continuous dynamics (Linderman et al., 2016). The representations learned by the rSLDS inform the structure of the hybrid decision-making agent and allow us to (1) specify temporally-abstracted sub-goals in a method reminiscent of the options framework, (2) lift the exploration into discrete space, allowing us to exploit information-theoretic exploration bonuses, and (3) 'cache' the approximate solutions to low-level problems in the discrete planner. We apply our model to the sparse Continuous Mountain Car task, demonstrating fast system identification via enhanced exploration and successful planning through the delineation of abstract sub-goals.

Paper Structure (25 sections, 38 equations, 6 figures, 3 tables)

Introduction
Hybrid Active Inference
Recurrent Switching Systems
Emergent descriptions for planning
Related work
Framework
Generative Model: rSLDS(ro)
Active Inference
Discrete Planner
Continuous controller
Results
Discussion
Conclusion
Appendix
Framework
...and 10 more sections

Figures (6)

Figure 1: Previous discrete-continuous active inference models have focused on the physical implementation of categorical decisions in continuous space. Here, outcomes from the high-level active inference planner select from a set of discrete models of continuous dynamics, specified by a prior over their hidden causes. This mixed generative model effectively generates discrete sequences of short continuous trajectories defined in terms of their generalised coordinates of motion. The discrete planner is formulated as a standard POMDP generative model (see Sec. \ref{discrete-planner-sec}) with discrete states $s_\tau$ and observations $o_\tau$. The first action $a_\tau$ of the selected policy (see Sec. \ref{aif-sec}) is then passed down to the continuous active inference controller via the expected observation $q(o_{\tau+1}|a_\tau)$. This distribution weights a set of fixed-point means $\{\eta_m\}_{m = 0}^{M-1}$ which map the $M$ discrete latent states into continuous state space. The resulting weighted average, $\eta_\tau = \sum_m \eta_m \cdot q(o_{\tau+1,m}|a_\tau)$, serves as the mean of a prior over hidden causes, $p(\nu) = \mathcal{N}(\eta_{\tau}, \pi_{\tau}^{-1})$, which drives the dynamics of the low-level continuous latent states $\tilde{x_t} = \{x_t, x_t^\prime, x_t^{\prime\prime}\}$ and observations $\tilde{y_t} = \{y_t, y_t^\prime, y_t^{\prime\prime}\}$ represented in generalised coordinates. Inherently, there is a separation of timescales in this open-loop control setup: the discrete controller sends an action down to the continuous controller, which is then executed in a ballistic manner over several timesteps. After this low-level inner loop completes, a process of Bayesian model selection is used to infer the current discrete state description of the low-level system given the trajectory of continuous observations $\tilde{y_t}$. This is then given as an observation to the discrete planner at the top. For a full treatment of this model, see \cite{friston2017graphical}.
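The weighted average in the Figure 1 caption, $\eta_\tau = \sum_m \eta_m \cdot q(o_{\tau+1,m}|a_\tau)$, can be illustrated with a minimal pure-Python sketch. The function name `prior_mean`, the three fixed points, and the probabilities are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def prior_mean(fixed_points, q_next_obs):
    """Return eta_tau = sum_m eta_m * q(o_{tau+1,m} | a_tau).

    fixed_points: list of M continuous points eta_m (each a list of floats).
    q_next_obs:   list of M probabilities, the expected next observation.
    """
    if len(fixed_points) != len(q_next_obs):
        raise ValueError("need one probability per fixed point")
    if abs(sum(q_next_obs) - 1.0) > 1e-8:
        raise ValueError("q_next_obs must sum to 1")
    dim = len(fixed_points[0])
    return [
        sum(q * eta[d] for q, eta in zip(q_next_obs, fixed_points))
        for d in range(dim)
    ]


# Hypothetical example: M = 3 fixed points in a (position, velocity) space.
fixed_points = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
q_next_obs = [0.1, 0.2, 0.7]  # assumed q(o_{tau+1} | a_tau)
eta_tau = prior_mean(fixed_points, q_next_obs)
```

With these assumed numbers the prior mean sits close to the third fixed point, because the planner expects that outcome with the highest probability. A sharper distribution would pull $\eta_\tau$ closer to a single $\eta_m$.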
Figure 2: Recurrent switching linear dynamical systems (rSLDS) discover meaningful discrete states and explain how their switching behaviour depends on continuous latent states. This class of hybrid state space model includes a recurrent dependency of the discrete latent state $z_{t+1}$ on the continuous latent state $x_t$ and control input $u_t$. As in a standard SLDS, the continuous latent dynamics are conditionally linear (dependent on the current discrete state $z_t$) and generate observations $y_t$. Note that this figure shows the recurrent-only formulation of the rSLDS (see Sec. \ref{rslds-section}), in which the discrete latent $z_t$ has no dependency on $z_{t-1}$, unlike in the canonical form.
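One generative step of a recurrent-only switching model of the kind described in the Figure 2 caption might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the switch is modelled as a softmax regression on $(x_t, u_t)$, the continuous update is conditioned on the newly sampled mode (the indexing convention may differ in the paper), and every parameter name (`W_x`, `W_u`, `r`, `A`, `B`, `b`, `noise_std`) and value is hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def switch_probs(x, u, W_x, W_u, r):
    """p(z_{t+1} | x_t, u_t): depends on x and u only, not on z_t."""
    logits = [
        wx + wu + bias
        for wx, wu, bias in zip(matvec(W_x, x), matvec(W_u, u), r)
    ]
    return softmax(logits)


def rslds_step(x, u, params, rng):
    """Sample the next mode, then apply that mode's linear dynamics."""
    probs = switch_probs(x, u, params["W_x"], params["W_u"], params["r"])
    z_next = rng.choices(range(len(probs)), weights=probs)[0]
    A, B, b = params["A"][z_next], params["B"][z_next], params["b"][z_next]
    mean = [ax + bu + bias for ax, bu, bias in zip(matvec(A, x), matvec(B, u), b)]
    x_next = [m + rng.gauss(0.0, params["noise_std"]) for m in mean]
    return z_next, x_next


# Hypothetical two-mode model: mode 0 for negative position, mode 1 for positive.
params = {
    "W_x": [[-5.0, 0.0], [5.0, 0.0]],
    "W_u": [[0.0], [0.0]],
    "r": [0.0, 0.0],
    "A": [[[0.9, 0.1], [0.0, 0.9]], [[1.0, 0.1], [0.0, 1.0]]],
    "B": [[[0.0], [0.1]], [[0.0], [0.1]]],
    "b": [[0.0, 0.0], [0.0, 0.0]],
    "noise_std": 0.01,
}
rng = random.Random(0)
probs = switch_probs([1.0, 0.0], [0.0], params["W_x"], params["W_u"], params["r"])
z_next, x_next = rslds_step([1.0, 0.0], [0.0], params, rng)
```

Because the switching weights act directly on $x_t$, each mode occupies a region of the continuous state space, which is the property the discrete planner relies on.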
Figure 3: Our Hybrid Hierarchical Agent (HHA) learns emergent coarse-grained descriptions of the continuous state-space for planning and control. Like previous work on mixed generative models in active inference shown in Fig. \ref{discrete-cont}, we have a discrete active inference planner sitting above a low-level continuous active inference controller. The discrete planner is constructed as a standard POMDP generative model (see Sec. 3.3) with discrete states $s_\tau$ and observations $o_\tau$. However, our model departs from \cite{friston2017graphical} in the generation of coarse-grained variables, which instead emerge from the underlying rSLDS generative model. Here, the states $S$ of the planner are essentially a re-description of the discrete states $Z$ found by the rSLDS. The transition model probabilities are then constrained to reflect the adjacency structure of the polyhedral partitions of the state-space found by the softmax regression component of the rSLDS. The chosen action $a_\tau$ from the high-level planner selects from a discrete set of continuous active inference controllers based on both the linear dynamics of the current discrete state $z_\tau$ and a control prior for the desired next discrete state. Using the rSLDS generative model, this prior is a flexibly specified continuous point in the state-space that lies in the discrete region the agent wishes to move into (see Eq. \ref{gen-ctl-prior-eq}). Unlike the models in \cite{friston2017graphical}, the action $a_\tau$ is temporally abstracted with no pre-defined timescale at the lower level. Instead, the discrete planner is only re-triggered when the system enters a new discrete state (i.e. $z_{t} \neq z_{t-1}$), at which point the planner observes the new discrete state $z_\tau$ of the system and constructs a plan accordingly.
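The event-triggered replanning rule in the Figure 3 caption (replan only when $z_t \neq z_{t-1}$) can be illustrated with a short sketch. The function name `replan_steps` and the mode sequence are hypothetical, and treating $t = 0$ as a planning step is an assumption made here so that the agent has an initial plan.

```python
def replan_steps(modes):
    """Return the timesteps at which the discrete planner would be invoked.

    modes: sequence of inferred discrete states z_0, z_1, ..., z_T.
    The planner runs at t = 0 (assumed) and whenever z_t != z_{t-1}.
    """
    return [
        t for t, z in enumerate(modes)
        if t == 0 or z != modes[t - 1]
    ]


# Hypothetical low-level trajectory labelled by discrete mode.
modes = [0, 0, 0, 2, 2, 1, 1, 1, 3]
steps = replan_steps(modes)  # [0, 3, 5, 8]
```

The gaps between successive entries show the temporal abstraction: each high-level action lasts as long as the system remains in one mode, rather than for a fixed number of low-level steps.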
Figure 4: HHA solves nonlinear problems by specifying abstract sub-goals in state-space. (a) Piecewise linear dynamics of the Continuous Mountain Car state-space found by the rSLDS, represented as a vector plot where the magnitude of the arrows indicates how fast the state is changing at that point. Reward location shown (black triangle). While the rSLDS retrieves 5 modes in total, here we plot only the modes seen in the position-velocity ($x$) space without showing the control input ($u$) axis. (b) Example trajectory (segments coloured by mode) showing the HHA consistently navigating to the goal. (c) Continuous control input (coloured by the discrete action specified by the planner, with arrow size indicating magnitude and direction) over the same example trajectory as in (b).
Figure 5: HHA with information-gain drive explores a wider range of the state-space. State-space coverage in Continuous Mountain Car after 10,000 steps (best of 3 runs) for (a) HHA with information-gain drive, (b) HHA without information-gain drive and (c) a baseline of randomly sampled continuous actions. HHA with information-gain drive also shows comparable performance to (d) a Deep Q-Network with Model-Based Exploration (DQN-MBE) on the (comparatively easier) Discrete Mountain Car task \cite{Gou2019}. Exact parameters for DQN-MBE are given in Table \ref{dqn-table} in \ref{sec-DQNMBE}.
...and 1 more figures
