Toward Universal and Interpretable World Models for Open-ended Learning Agents

Lancelot Da Costa

TL;DR

This work introduces a generic, compositional and interpretable class of generative world models that supports open-ended learning agents and enables agents to actively develop and refine their world models, which may lead to developmental learning and more robust, adaptive behavior.

Abstract

We introduce a generic, compositional and interpretable class of generative world models that supports open-ended learning agents. This is a sparse class of Bayesian networks capable of approximating a broad range of stochastic processes, which provides agents with the ability to learn world models in a manner that may be both interpretable and computationally scalable. By integrating Bayesian structure learning with intrinsically motivated (model-based) planning, this approach enables agents to actively develop and refine their world models, which may lead to developmental learning and more robust, adaptive behavior.

Paper Structure (8 sections, 2 figures)


Introduction
A generic, interpretable and agentic class of generative models
Discussion
Current challenges in developmental agents
Details on generic, interpretable and agentic class of Bayesnets
Discrete dynamics
Continuous dynamics
Hierarchical mixed dynamics

Figures (2)

Figure 1: Space of discrete-state Bayesian networks. The basic module for expressing discrete-state agent-environment interactions is a partially observed Markov decision process (POMDP). This module can have an arbitrary finite temporal horizon (i.e. temporal depth). Several such modules can be stacked atop each other finitely many times (i.e. hierarchical depth), thereby expressing multi-scale semi-Markovian latent dynamics. We can specify multiple co-evolving factors (i.e. factorial depth; e.g. position and colour of an object). In any given layer, a number of auxiliary (i.e. generalised) states can be added, accounting for velocity, acceleration, and higher orders of motion in the latent states (i.e. generalised depth), to express semi-Markovian processes (friston_supervised_2023). In each of these layers, the highest generalised states may or may not be actions, denoted by $a$ (cf. controllable states; friston_supervised_2023), while all other states are uncontrollable (i.e. part of the environment) and denoted by $s$. The controllable and uncontrollable states cause observations, denoted by $o$. The parameters of these Bayesian networks (not shown) correspond to the parameters of the causal maps as well as the initial distributions over states (da_costa_active_2020). These parameters, as well as the graphical structure of these Bayesnets, are to be inferred from data (e.g. past actions and observations). The resulting approximate posterior belief then informs action, which follows a mixture of goal-seeking and information-gathering (i.e. intrinsic motivation) imperatives. Please see friston_supervised_2023 and friston_active_2023 for more details on temporal, hierarchical and factorial depth, and for examples of controllable versus uncontrollable states.
Figure 2: Recurrent switching linear dynamical systems (rsLDS). This figure illustrates rsLDS as a way of interpretably parameterising fairly arbitrary stochastic differential equations (SDEs) as a switching linear SDE. Consider for example a generative model used to play pool. This must be able to express the non-linear trajectories in a game of pool. The behaviour of a ball on a pool table depends on whether or not the ball is by one of the four walls (i.e. bouncing). In each of these sectors the dynamics of the ball are captured by a simple dynamical equation. The rsLDS (linderman_bayesian_2017) is a simple generative model that is able to express these kinds of trajectories, where a (switching) discrete state expresses which of these sectors the ball is in. This discrete state furnishes a linear drift and volatility to the linear SDE describing the continuous motion of the ball (cf. continuous states), which yields continuous observations. Note the so-called recurrent connection (i.e. causal map) from continuous to discrete states; this connection enables the continuous dynamics to influence the discrete switching: if the continuous trajectory of the ball collides with a wall, the discrete state switches so that the continuous dynamics of the ball change course. The rsLDS layer may be supplemented with active states, acting on the discrete latent states, thereby emulating a continuous partially observable Markov decision process. This continuous POMDP can easily be extended by varying the temporal, hierarchical, factorial and generalised depth as in Figure 1, furnishing a generic model of continuous dynamics. Please see linderman_bayesian_2017 for more details on the current rsLDS architecture.
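
The switching mechanism described in the Figure 2 caption can be sketched with a toy simulation. This is an illustrative sketch only, not the paper's model: it assumes a one-dimensional "table" with walls at 0 and 1, two discrete modes (drifting right, drifting left) rather than one sector per wall, a logistic recurrent map from position to mode, and no active states. All names and parameter values (`DT`, `SPEED`, `SIGMA`, `OBS_SIGMA`, `K`, `simulate`) are hypothetical.

```python
import math
import random

# Hypothetical toy parameters (assumptions, not taken from the paper).
DT = 0.05         # time step
SPEED = 1.0       # magnitude of the drift in each discrete mode
SIGMA = 0.01      # volatility of the continuous state
OBS_SIGMA = 0.01  # observation noise
K = 200.0         # sharpness of the recurrent (continuous -> discrete) map

# Per-mode affine dynamics: x' = A[z] * x + B[z] + noise.
A = {0: 1.0, 1: 1.0}
B = {0: +SPEED * DT, 1: -SPEED * DT}  # mode 0 drifts right, mode 1 drifts left

# Recurrent switching: P(z' = 1 | z, x) = sigmoid(K * x + R_OFFSET[z]).
# In mode 0 the ball switches once x passes the wall at 1;
# in mode 1 it stays in mode 1 until x passes the wall at 0.
R_OFFSET = {0: -K * 1.0, 1: 0.0}


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def simulate(steps, x0=0.5, z0=0, seed=0):
    """Sample discrete modes, continuous states and noisy observations."""
    rng = random.Random(seed)
    x, z = x0, z0
    zs, xs, ys = [], [], []
    for _ in range(steps):
        # Recurrent connection: the continuous state drives the discrete switch.
        p_left = sigmoid(K * x + R_OFFSET[z])
        z = 1 if rng.random() < p_left else 0
        # The discrete mode selects the linear drift of the continuous state.
        x = A[z] * x + B[z] + SIGMA * math.sqrt(DT) * rng.gauss(0.0, 1.0)
        # Continuous observation of the continuous state.
        y = x + OBS_SIGMA * rng.gauss(0.0, 1.0)
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys


if __name__ == "__main__":
    zs, xs, ys = simulate(200)
    switches = sum(1 for a, b in zip(zs, zs[1:]) if a != b)
    print(f"mode switches: {switches}, min x: {min(xs):.2f}, max x: {max(xs):.2f}")
```

Each mode on its own is a linear (affine) process, yet the sampled trajectory bounces back and forth between the walls, because the position feeds back into the mode switch. A fuller treatment would replace the hand-set parameters with ones inferred from data, as the captions describe.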
