Active Inference and Reinforcement Learning: A unified inference on continuous state and action spaces under partial observability

Parvin Malekzadeh; Konstantinos N. Plataniotis

TL;DR

A unified principle is proposed that establishes a theoretical connection between active inference (AIF) and reinforcement learning (RL), enabling seamless integration of the two approaches and overcoming their respective limitations in continuous-space POMDP settings.

Abstract

Reinforcement learning (RL) has garnered significant attention for developing decision-making agents that aim to maximize rewards, specified by an external supervisor, within fully observable environments. However, many real-world problems involve partial observations, formulated as partially observable Markov decision processes (POMDPs). Previous studies have tackled RL in POMDPs by either incorporating the memory of past actions and observations or by inferring the true state of the environment from observed data. However, aggregating observed data over time becomes impractical in continuous spaces. Moreover, inference-based RL approaches often require many samples to perform well, as they focus solely on reward maximization and neglect uncertainty in the inferred state. Active inference (AIF) is a framework formulated in POMDPs that directs agents to select actions by minimizing a function called expected free energy (EFE). This supplements reward-maximizing (exploitative) behaviour, as in RL, with information-seeking (exploratory) behaviour. Despite this exploratory behaviour, the use of AIF is limited to discrete spaces due to the computational challenges associated with EFE. In this paper, we propose a unified principle that establishes a theoretical connection between AIF and RL, enabling seamless integration of these two approaches and overcoming their aforementioned limitations in continuous-space POMDP settings. We substantiate our findings with theoretical analysis, providing novel perspectives for utilizing AIF in the design of artificial agents. Experimental results demonstrate the superior learning capabilities of our method in solving continuous-space partially observable tasks. Notably, our approach harnesses information-seeking exploration, enabling it to effectively solve reward-free problems and rendering explicit task reward design by an external supervisor optional.
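
To make the exploitative and exploratory parts of EFE concrete, the block below writes out the textbook discrete-time decomposition commonly used in the AIF literature. It is a sketch for orientation only and is not the unified objective derived in this paper: the symbols $q$ (approximate posterior), $\tilde{p}$ (preference-biased generative model), $s_\tau$, $o_\tau$ and the step-wise form $G(\pi,\tau)$ are assumptions introduced here, and the approximation assumes $p(s_\tau \mid o_\tau) \approx q(s_\tau \mid o_\tau, \pi)$.

```latex
\begin{align}
G(\pi,\tau)
  &= \mathbb{E}_{q(o_\tau, s_\tau \mid \pi)}
     \big[ \ln q(s_\tau \mid \pi) - \ln \tilde{p}(o_\tau, s_\tau) \big] \\
  &\approx
     \underbrace{-\,\mathbb{E}_{q(o_\tau \mid \pi)}
       \big[ \ln \tilde{p}(o_\tau) \big]}_{\text{exploitative (preference-seeking) term}}
     \;
     \underbrace{-\,\mathbb{E}_{q(o_\tau \mid \pi)}
       \Big[ D_{\mathrm{KL}}\big[ q(s_\tau \mid o_\tau, \pi) \,\|\, q(s_\tau \mid \pi) \big] \Big]}_{\text{exploratory (information-gain) term}}
\end{align}
```

Minimizing the first term favours observations that the agent prefers (the role played by reward in RL), while minimizing the second favours actions whose outcomes are expected to reduce uncertainty about the hidden state.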

Paper Structure

This paper contains 66 sections, 11 theorems, 96 equations, 5 figures, 7 tables, 1 algorithm.

Introduction
Overview
Problem characteristics
Preliminaries and Problem modeling
Partially observed Markov decision processes (POMDPs)
Policy
Generative model
Inference
Review of the State-of-The-Art algorithms
Reinforcement learning (RL)
Active inference (AIF)
Perceptual inference and learning
Plan selection
Unified inference integrating AIF and RL in continuous space POMDPs
Problem formulation: Unified objective function for AIF and RL in POMDPs
...and 51 more sections

Key Result

Theorem 5.3

Let $\pi$ be a stochastic belief state-action policy selecting action $a_{\tau}$ according to $\pi(a_{\tau}|b_{\tau})$ for $\tau \in \{t, t+1, t+2, \ldots\}$ in a POMDP satisfying Assumption ASS:regula. Then $G^{(\pi)}_{\text{Unified}}(b_{t})$, the EFE corresponding to the policy $\pi$, can be obtained as

Figures (5)

Figure 1: Relationship between generative models of MDPs and POMDPs. Arrows indicate dependence.
Figure 2: Average state-space coverage, measured as the percentage of bins visited by the agents, for deterministic and stochastic partially observable MountainCarContinuous-v0. Greater state-space coverage in an episode indicates that the agent explores the environment more thoroughly and thus performs better in that episode.
Figure 3: Ablation study comparing the average return of G-SAC and G-DDPG algorithms across the partial observation version of Roboschool tasks.
Figure 4: Ablation study comparing the average return of the hybrid G-Dreamer-SAC algorithm and the model-based G-Dreamer algorithm across four partial observation versions of Roboschool tasks.
Figure I.1: Mean return for four Roboschool benchmarks with partial observations (left) and noisy observations (right). Shaded areas indicate standard deviation.
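
The coverage metric named in the Figure 2 caption (percentage of bins visited) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `state_space_coverage`, the uniform grid, and the choice of 10 bins per dimension are assumptions made here. The bounds used in the example are the standard MountainCarContinuous-v0 limits (position in [-1.2, 0.6], velocity in [-0.07, 0.07]).

```python
from typing import Iterable, Sequence


def state_space_coverage(
    states: Iterable[Sequence[float]],
    lows: Sequence[float],
    highs: Sequence[float],
    bins_per_dim: int = 10,  # assumed grid resolution, not taken from the paper
) -> float:
    """Return the percentage of grid bins visited at least once.

    The state space is split into a uniform grid with `bins_per_dim` bins
    along each dimension; each visited state is mapped to its bin index.
    """
    visited = set()
    for state in states:
        index = []
        for x, lo, hi in zip(state, lows, highs):
            # Normalise to [0, 1], scale to a bin index, and clamp so that
            # values on or beyond the upper bound land in the last bin.
            frac = (x - lo) / (hi - lo)
            i = int(frac * bins_per_dim)
            index.append(min(max(i, 0), bins_per_dim - 1))
        visited.add(tuple(index))
    total_bins = bins_per_dim ** len(lows)
    return 100.0 * len(visited) / total_bins


if __name__ == "__main__":
    # Hypothetical three-step trajectory; the first and last states coincide.
    trajectory = [(-1.2, -0.07), (0.6, 0.07), (-1.2, -0.07)]
    coverage = state_space_coverage(trajectory, (-1.2, -0.07), (0.6, 0.07))
    print(f"coverage: {coverage:.1f}%")
```

Averaging this value over episodes and random seeds would give a curve of the kind the caption describes; a finer grid makes the metric more sensitive to small differences in exploration but requires longer episodes to saturate.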

Theorems & Definitions (13)

Definition 4.1
Theorem 5.3
Definition 5.1
Proposition 5.4
Theorem 5.5
Corollary 5.6
Theorem 5.7
Lemma 5.8
Proposition 5.9
Theorem 5.10
...and 3 more
