
Deep hybrid models: infer and plan in a dynamic world

Matteo Priorelli, Ivilin Peev Stoianov

TL;DR

The paper addresses planning in dynamic, hierarchical environments by reframing control as inference within a deep hybrid active-inference framework. It combines predictive coding with discrete policy-based planning, using the variational free energy $\mathcal{F}$ for perception and the expected free energy $\mathcal{G}_\pi$ for action selection, so that the agent operates across continuous and discrete levels. The main contributions are: a deep hierarchical architecture of hybrid units that encode factorial, hierarchical, and temporal depth; a representation of potential body configurations via intrinsic-extrinsic (IE) modules; and a demonstration on a reaching task with a moving tool and a moving ball, showing robust inference and dynamic planning under varying conditions. The work offers an interpretable, data-efficient alternative to traditional optimal control and to some deep reinforcement learning (RL) methods, with potential advantages in explainability and flexible multi-timescale planning.
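The TL;DR states that perception minimizes the variational free energy $\mathcal{F}$ through predictive coding. As a minimal sketch of that idea, assuming a single scalar hidden state mu, a linear likelihood g(mu) = a * mu, a Gaussian prior with mean eta, and fixed precisions (all simplifications not taken from the paper, which uses hierarchical IE modules and generalized coordinates), the belief can be updated by gradient descent on precision-weighted prediction errors. The function names `free_energy` and `infer` and all parameter values are hypothetical.

```python
# Minimal predictive-coding sketch: perception as gradient descent on a
# Gaussian variational free energy F for one scalar hidden state mu.
# Hypothetical assumptions: linear likelihood g(mu) = a * mu, Gaussian prior
# with mean eta, fixed precisions pi_y (sensory) and pi_x (prior).

def free_energy(mu, y, eta, a, pi_y, pi_x):
    """F up to additive constants: precision-weighted squared prediction errors."""
    eps_y = y - a * mu   # sensory prediction error
    eps_x = mu - eta     # prior prediction error
    return 0.5 * (pi_y * eps_y ** 2 + pi_x * eps_x ** 2)

def infer(y, eta, a=2.0, pi_y=1.0, pi_x=1.0, lr=0.05, steps=2000):
    """Move the belief mu along -dF/dmu, starting from the prior mean."""
    mu = eta
    for _ in range(steps):
        eps_y = y - a * mu
        eps_x = mu - eta
        dF_dmu = -a * pi_y * eps_y + pi_x * eps_x
        mu -= lr * dF_dmu
    return mu

if __name__ == "__main__":
    y, eta = 3.0, 0.0
    mu = infer(y, eta)
    # For a linear-Gaussian model, the minimum of F is the exact posterior mean.
    exact = (2.0 * 1.0 * y + 1.0 * eta) / (2.0 ** 2 * 1.0 + 1.0)
    print(round(mu, 4), round(exact, 4))
```

For this linear-Gaussian case the fixed point equals (a·π_y·y + π_x·η)/(a²·π_y + π_x), which is 1.2 for the values above; with nonlinear likelihoods the same update rule applies, but only a local minimum of F is guaranteed.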

Abstract

To determine an optimal plan for complex tasks, one often deals with dynamic and hierarchical relationships between several entities. Traditionally, such problems are tackled with optimal control, which relies on the optimization of cost functions; instead, a recent biologically motivated proposal casts planning and control as an inference process. Active inference assumes that action and perception are two complementary aspects of life whereby the role of the former is to fulfill the predictions inferred by the latter. Here, we present an active inference approach that exploits discrete and continuous processing, based on three features: the representation of potential body configurations in relation to the objects of interest; the use of hierarchical relationships that enable the agent to easily interpret and flexibly expand its body schema for tool use; the definition of potential trajectories related to the agent's intentions, used to infer and plan with dynamic elements at different temporal scales. We evaluate this deep hybrid model on a habitual task: reaching a moving object after having picked up a moving tool. We show that the model can tackle the presented task under different conditions. This study extends past work on planning as inference and advances an alternative direction to optimal control.

Paper Structure

This paper contains 15 sections, 59 equations, 10 figures, and 6 algorithms.

Figures (10)

  • Figure 1: (a) Factor graph of a hybrid unit. Continuous hidden states $\tilde{\bm{x}}$ generate predictions $\tilde{\bm{y}}$ through parallel pathways. Model dynamics are encoded by potential trajectories $\bm{f}_m$, which are hypotheses of how the world may evolve and are associated with discrete hidden causes $\bm{v}$. (b) Illustrative example of a hybrid unit. In this task, the agent has to infer which of two objects (a red circle and a gray square moving along a circular trajectory) is being tracked by another 1-DoF agent. The time step is shown in the bottom left of each frame. The hidden states $\tilde{\bm{x}}$ encode the angle and angular velocity of the arm (generating proprioceptive predictions), as well as the positions and velocities of the two objects (generating visual predictions). The blue arrow represents the actual hand trajectory, while the red and green arrows represent the two potential trajectories associated with reaching movements toward the two objects. See Priorelli2023e for more details.
  • Figure 2: (a) An IE module is composed of two units $\mathcal{U}_i$ and $\mathcal{U}_e$, which represent a signal in intrinsic and extrinsic reference frames, respectively. Different IE modules can be combined hierarchically: the extrinsic signal $\bm{x}_e^{(i)}$ is iteratively transformed through linear transformation matrices encoded in the extrinsic likelihood function $\bm{g}_e^{(i)}$. Hierarchical levels communicate via the 0th-order hidden states. (b) Illustrative examples of a hierarchical model with IE modules. In the first task, the agent (a 23-DoF human body) has to avoid a moving obstacle; in the second task, the agent (a 28-DoF kinematic tree) has to reach four target locations with the extremities of its branches. In both cases, the module in (a) is repeated for every DoF of the agents, matching their kinematic structures. Proprioceptive and exteroceptive (e.g., visual) predictions for each DoF are generated by the intrinsic and extrinsic units, respectively, via appropriate likelihood functions. See Priorelli2023b for more details.
  • Figure 3: (a) Interface between a discrete model and several hybrid units. The hidden causes $\bm{v}^{(i)}$ are directly generated, in parallel pathways, from discrete hidden states $\bm{s}_{\tau}$ via likelihood matrices $\bm{A}^{(i)}$. (b) Illustrative example with the hybrid units combined with a discrete model. In this task, the agent (a 4-DoF arm with an additional 4-DoF hand composed of two fingers) has to pick up a moving ball (the red circle) and place it at a goal position (the gray square). The discrete hidden states $\bm{s}_{\tau}$ encode the agent position (start position, at the ball, or at the goal) and the status of the hand (open or closed). These are informed by two continuous models encoding intrinsic (joint angles) and extrinsic (hand and object positions) information, respectively. The hidden causes $\bm{v}$ of the intrinsic model are related to hand opening and closing actions, while the hidden causes of the extrinsic model relate to two reaching movements, as in the previous case. Note that the object belief (purple circle) is rapidly inferred, and as soon as the picking action is complete, the belief is gradually pulled toward the goal position, resulting in a second reaching movement. The top right panel shows the hand-object distance over time, while the bottom right panel displays the dynamics of the discrete action probabilities used to infer the next discrete state. The vertical dashed lines distinguish five phases: a pure reaching movement, an intermediate phase in which the agent prepares the grasping action, a grasping phase, a second reaching movement, and, finally, the ball release. The stepped behavior of the action probabilities is due to the replanning performed by the discrete model every 10 continuous time steps. See Priorelli2023d for more details.
  • Figure 4: (a) Virtual environment of the tool-use task. An agent controlling a 4-DoF arm has to grasp a moving tool (in green) and reach a moving ball (in red) with the tool's extremity. (b) Agent's beliefs over the continuous hidden states of the arm (blue), tool (light green), and ball (light red). The real positions of the tool and ball are represented in dark green and dark red, respectively. The virtual level is plotted with more transparent colors. (c) Graphical representation of the agent's continuous generative model. Every environmental entity is encoded hierarchically by considering the whole arm's kinematic structure. For clarity, the three pathways are displayed separately, while lateral connections and the high-level discrete model are not shown. The end effector's level encodes intrinsic and extrinsic information about the end effector for three configurations: the actual end effector position, the belief over the end effector at the tool's origin, and the belief over the end effector at an appropriate position to reach the ball with the tool's extremity. The virtual level, instead, is not present in the actual configuration, since the tool is not part of the agent's kinematic chain; it is used only in the generative model for goal-directed behavior, as if it were a new joint. This level encodes intrinsic and extrinsic information about the tool for two potential configurations: the belief over the tool's extremity located at the actual tool's extremity, and located at the actual ball position. Small purple and yellow circles represent proprioceptive and exteroceptive observations, respectively.
  • Figure 5: Graphical representation of a deep hybrid model for tool use, composed of a discrete model at the top and several IE modules. Every module is factorized into three elements, related to the observations of the agent's arm (in blue), a tool (in green), and a ball (in red). Note that the last (virtual) level only considers the tool's extremity and the ball. The computation of the action for a single time step is divided into four main processes. (a) Perception. Proprioceptive and visual observations $\bm{y}_p$ and $\bm{y}_e$ are compared with the agent's predictions. The resulting prediction errors are propagated throughout the hierarchy to infer the actual kinematic configuration, as well as potential configurations related to the objects. (b) Dynamic inference. The bottom-up messages $\bm{l}_e$ from the IE modules inform the discrete model about the most likely state that may have generated the perceived arm trajectory. This is done by comparing the latter with potential trajectories $\bm{f}_m$ related to dynamic hypotheses $\bm{v}_e$ (see the equation defining the hidden causes). For instance, if the agent is reaching the tool and the ball is moving away, the bottom-up messages assign a higher probability to the tool-reaching hypothesis and a lower probability to the initial steady state. (c) Dynamic planning. The agent infers the next discrete action to take by minimizing the expected free energy $\mathcal{G}$ (see the expected free energy equation). As a result, the agent believes it is in the next discrete state, which corresponds to the ball-reaching hypothesis. In turn, this biased state generates a new combined trajectory (through the discrete extrinsic prediction $\bm{A}_e \bm{s}$ in the hidden-causes equation), which acts as a prior for the continuous hidden states of the IE modules. (d) Action. The continuous hidden states generate predictions, which are again compared with the related observations. The proprioceptive prediction errors travel back up the hierarchy as before, but they are also suppressed through movement by motor units (see the action equation). This second process eventually produces a continuous action that moves the end effector toward the ball.
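Figure 1 describes a hybrid unit that compares observed dynamics with potential trajectories $\bm{f}_m$ to infer the discrete hidden causes $\bm{v}$. The following is a minimal sketch under stated assumptions: a 2-D hand position, attractor-style trajectories $\bm{f}_m(\bm{x}) = k(\bm{o}_m - \bm{x})$ toward each object, and a Gaussian likelihood of the observed hand velocity under each hypothesis, normalized with a softmax. This is a simplified stand-in, not the authors' message-passing scheme, and every function name and parameter value is hypothetical.

```python
import math

# Sketch of dynamic inference in a hybrid unit (cf. Figure 1).
# Hypothetical assumptions: attractor-style potential trajectories toward each
# object and a Gaussian likelihood of the observed velocity under each one.

def potential_velocity(x, target, k=1.0):
    """Potential trajectory f_m: move the hand toward object m."""
    return [k * (t - xi) for xi, t in zip(x, target)]

def infer_hidden_causes(x, observed_vel, objects, k=1.0, precision=4.0):
    """Posterior over discrete hidden causes v (which object is being tracked)."""
    log_evidence = []
    for obj in objects:
        f_m = potential_velocity(x, obj, k)
        sq_err = sum((o - f) ** 2 for o, f in zip(observed_vel, f_m))
        log_evidence.append(-0.5 * precision * sq_err)
    peak = max(log_evidence)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in log_evidence]
    total = sum(weights)
    return [w / total for w in weights]

def combined_dynamics(x, objects, v, k=1.0):
    """Model dynamics as the v-weighted mixture of potential trajectories."""
    vel = [0.0] * len(x)
    for v_m, obj in zip(v, objects):
        for d, f in enumerate(potential_velocity(x, obj, k)):
            vel[d] += v_m * f
    return vel

if __name__ == "__main__":
    hand = [0.0, 0.0]
    red_circle, gray_square = [1.0, 1.0], [-1.0, 0.5]
    observed = [0.9, 1.1]  # the hand seems to move toward the red circle
    v = infer_hidden_causes(hand, observed, [red_circle, gray_square])
    print([round(p, 3) for p in v])
    print(combined_dynamics(hand, [red_circle, gray_square], v))
```

Because the observed velocity is close to the red-circle trajectory, almost all posterior mass goes to that hypothesis; `combined_dynamics` then mixes the potential trajectories with these weights, which is how the discrete causes shape the continuous dynamics in this sketch.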
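Figure 2 chains IE modules so that each extrinsic state is computed from its parent's extrinsic state and the current intrinsic state through $\bm{g}_e^{(i)}$. A minimal planar sketch, assuming each intrinsic unit holds a relative joint angle and a link length, and each extrinsic unit holds a position and an absolute orientation. The transform is written with explicit trigonometry, which for a planar chain is equivalent to composing 2-D rigid transformation matrices. The names `g_e` and `forward_hierarchy` are hypothetical, and only the 0th-order (static) states are propagated.

```python
import math

# Sketch of chained IE modules (cf. Figure 2) for a planar arm.
# Hypothetical assumptions: intrinsic state = (relative angle, link length),
# extrinsic state = (x, y, absolute orientation phi).

def g_e(prev_extrinsic, intrinsic):
    """Extrinsic likelihood: parent (x, y, phi) + (theta, length) -> child."""
    x, y, phi = prev_extrinsic
    theta, length = intrinsic
    phi_new = phi + theta  # absolute orientation accumulates joint angles
    return (x + length * math.cos(phi_new), y + length * math.sin(phi_new), phi_new)

def forward_hierarchy(intrinsics, root=(0.0, 0.0, 0.0)):
    """Pass the extrinsic signal down the hierarchy, one IE module per DoF."""
    extrinsics = []
    current = root
    for intrinsic in intrinsics:
        current = g_e(current, intrinsic)
        extrinsics.append(current)
    return extrinsics

if __name__ == "__main__":
    arm = [(0.0, 1.0), (math.pi / 2, 1.0)]  # two links of unit length
    for level, (x, y, phi) in enumerate(forward_hierarchy(arm)):
        print(level, round(x, 3), round(y, 3), round(phi, 3))
```

Attaching a new module to the end of the list extends the body schema by one level, which is the property the caption relies on when modules are repeated for every DoF.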
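Figure 3 generates the hidden causes top-down as $\bm{v}^{(i)} = \bm{A}^{(i)} \bm{s}_\tau$, while the discrete model replans every 10 continuous time steps. The sketch below assumes a 1-D hand, three discrete position states (start, at ball, at goal) mapped by an identity $\bm{A}_e$ onto three hidden causes (stay, reach ball, reach goal), and a hypothetical toy rule that advances the discrete belief once the current subgoal is reached, used here in place of expected-free-energy planning. All names and numbers are illustrative.

```python
# Sketch of the discrete/continuous interface (cf. Figure 3).
# Hypothetical assumptions: 1-D hand, identity likelihood A_e, and a toy
# replanning rule instead of expected-free-energy planning.

REPLAN_EVERY = 10  # continuous steps between discrete updates, as in Figure 3

# Rows: hidden causes (stay, reach ball, reach goal); columns: states (start, at ball, at goal).
A_E = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]

def matvec(A, s):
    """Hidden causes v = A s (a probability-weighted mix of hypotheses)."""
    return [sum(a * b for a, b in zip(row, s)) for row in A]

def run(start=0.0, ball=1.0, goal=3.0, steps=200, dt=0.1, k=1.0, tol=0.05):
    s = [0.0, 1.0, 0.0]  # after initial planning: the next state is "at ball"
    x = start
    for t in range(steps):
        if t % REPLAN_EVERY == 0:
            # Toy replanning: once near the ball, shift the belief to "at goal".
            if s[1] > 0.5 and abs(x - ball) < tol:
                s = [0.0, 0.0, 1.0]
        v = matvec(A_E, s)
        targets = [x, ball, goal]  # "stay" targets the current position
        # Continuous dynamics: v-weighted mixture of potential trajectories.
        vel = sum(v_m * k * (o - x) for v_m, o in zip(v, targets))
        x += dt * vel
    return x, s

if __name__ == "__main__":
    final_x, final_s = run()
    print(round(final_x, 4), final_s)
```

With these values the hand first settles near the ball, the belief switches at the first replanning step after arrival, and the hand then converges to the goal. Between replanning steps $\bm{v}$ stays fixed, which mirrors the stepped profile described in the caption.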
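Figure 4 treats the grasped tool as an extra (virtual) link, so the agent can express where the end effector must be for the tool's extremity to touch the ball. A minimal sketch, assuming 2-D points, a tool held rigidly at its origin, and a tool orientation that does not change while reaching. The function name and dictionary keys are hypothetical labels for the configurations listed in the caption, not identifiers from the paper.

```python
# Sketch of the potential configurations in Figure 4.
# Hypothetical assumptions: 2-D points, tool grasped rigidly at its origin,
# tool orientation unchanged, so the tool behaves like one extra link.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def potential_configurations(hand, tool_origin, tool_extremity, ball):
    """Return end-effector configurations and virtual-level configurations."""
    tool_vec = sub(tool_extremity, tool_origin)  # the "virtual link"
    end_effector = {
        "actual": hand,
        "at_tool_origin": tool_origin,            # where to grasp the tool
        "tool_tip_at_ball": sub(ball, tool_vec),  # hand position that puts the tip on the ball
    }
    virtual = {
        "tip_at_tool_extremity": tool_extremity,
        "tip_at_ball": ball,
    }
    return end_effector, virtual

if __name__ == "__main__":
    ee, virt = potential_configurations((0.5, -0.5), (0.0, 0.0), (1.0, 0.0), (3.0, 1.0))
    print(ee["tool_tip_at_ball"])
```

Here the tool points one unit along x, so the hand must sit one unit short of the ball, at (2.0, 1.0). In the paper these goals are beliefs inferred through the hierarchy rather than computed in closed form.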
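Figure 5(c) selects the next discrete action by minimizing the expected free energy $\mathcal{G}$. The sketch below uses the common decomposition of $\mathcal{G}$ into risk (the KL divergence between predicted and preferred outcomes) plus ambiguity (the expected entropy of the likelihood), for one-step policies only. It assumes categorical states, transition matrices indexed as B[a][next][current], a likelihood A[outcome][state], and log-preferences `log_C`; these names and the toy task are hypothetical, not the paper's matrices.

```python
import math

# Minimal sketch of one-step expected-free-energy (EFE) planning.
# Hypothetical conventions: B[a][next][current], A[outcome][state],
# log_C = normalized log-preferences over outcomes.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    peak = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - peak) for zi in z]
    total = sum(e)
    return [ei / total for ei in e]

def expected_free_energy(s, A, B_a, log_C, eps=1e-16):
    """G(a) = risk + ambiguity for a single action a."""
    s_next = matvec(B_a, s)     # predicted states Q(s' | a)
    o_next = matvec(A, s_next)  # predicted outcomes Q(o | a)
    # Risk: KL[Q(o | a) || C]
    risk = sum(q * (math.log(q + eps) - lc) for q, lc in zip(o_next, log_C))
    # Ambiguity: expected entropy of the likelihood columns
    ambiguity = 0.0
    for j, q_s in enumerate(s_next):
        column = [A[o][j] for o in range(len(A))]
        entropy = -sum(p * math.log(p + eps) for p in column)
        ambiguity += q_s * entropy
    return risk + ambiguity

def plan(s, A, B, log_C, gamma=1.0):
    """Return G per action and action probabilities softmax(-gamma * G)."""
    G = [expected_free_energy(s, A, B_a, log_C) for B_a in B]
    return G, softmax([-gamma * g for g in G])

if __name__ == "__main__":
    # Toy task: states 0 = start, 1 = tool grasped, 2 = ball reached.
    A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # noiseless outcomes
    stay = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    advance = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    log_C = [math.log(p) for p in (0.05, 0.05, 0.9)]  # prefer "ball reached"
    G, probs = plan([0.0, 1.0, 0.0], A, [stay, advance], log_C)
    print([round(g, 3) for g in G], [round(p, 3) for p in probs])
```

Starting from the belief "tool grasped", advancing leads to the preferred outcome, so it receives the lower $\mathcal{G}$ and most of the action probability. A multi-step policy would sum such terms over future time steps.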